1. Introduction
Effective search and rescue (SAR) in the expansive and dynamic marine environment hinges on the swift and precise detection of objects. Marine object detection, a critical field within computer vision and remote sensing, utilizes various imaging technologies, including satellite imagery, synthetic aperture radar, sonar, and underwater cameras, to identify, classify, and locate items in oceanic settings [
1]. Sophisticated algorithms, particularly convolutional neural networks (CNNs), have demonstrated an exceptional ability to automatically learn complex features from image data. The integration of deep learning into unmanned aerial vehicles (UAVs) and other autonomous systems is revolutionizing maritime SAR operations, offering rapid deployment, broad area coverage, and immediate visual data analysis [
2]. This move towards AI-driven SAR represents a major paradigm shift, overcoming the limitations of human-centric approaches and leading to more efficient and effective emergency responses at sea. Beyond SAR, this technology plays a vital role in diverse applications, such as locating shipwrecks and mines, as well as mapping marine habitats, particularly in underwater environments with limited visibility and complex acoustic conditions [
3]. Marine object detection is increasingly important for enhancing autonomous underwater vehicle (AUV) safety and the efficiency of underwater infrastructure inspection, significantly improving upon the limited human observation possible from ships or aircraft across vast maritime search areas.
The expansion of maritime activities such as shipping, offshore energy, and fishing underscores the critical need for automated detection systems in ensuring maritime security, preserving the environment, and advancing oceanographic studies. The capacity to detect and analyze marine objects yields substantial benefits for global trade, ecological balance, and defense strategies. For instance, satellite SAR-based ship detection helps to control illegal fishing, piracy, and unauthorized vessel presence [
4]. Identifying marine debris, particularly plastics, supports pollution reduction initiatives in line with the United Nations Sustainable Development Goal 14 [
5]. In marine biology, automated species detection, such as for whales and coral reefs, facilitates more effective biodiversity research [
6]. Additionally, developments in deep learning and AUVs have enhanced real-time monitoring capabilities, decreasing the need for manual inspections.
The ability to quickly locate individuals, vessels, or debris is critical for efficient rescue operations and for improving the survival rates of those in distress. The natural complexities of the marine environment, including erratic weather, reduced visibility due to fog or darkness, and the constant movement of both search platforms and the objects being sought, frequently render conventional SAR approaches inadequate. Consider a simple example of detecting a small, high-impact object in the marine SAR risk assessment process; the corresponding risk matrix can be developed as described in Table 1, in which the numbers in brackets show representative values of likelihood, impact, and detectability used to calculate the risk score of the hazard (a small target in this case). The risk score is calculated as follows:

Risk Score = (Likelihood × Impact) / Detectability

The equation suggests that, for marine SAR, higher detectability reduces the risk score, and this can only be enabled by increased technological investment, such as a deep learning-based detection system together with the fusion of thermal cameras with sonar and optical cameras.
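A minimal sketch of this calculation is given below; the numeric values are hypothetical stand-ins for the bracketed entries in Table 1, not the table's actual numbers.

```python
# Minimal sketch of the risk score calculation described above.
# The values below are hypothetical placeholders, not Table 1's entries.

def risk_score(likelihood: float, impact: float, detectability: float) -> float:
    """Risk Score = (Likelihood x Impact) / Detectability."""
    return likelihood * impact / detectability

# A small, hard-to-detect target: high impact, low detectability.
baseline = risk_score(likelihood=3, impact=5, detectability=1)   # 15.0
# Fused thermal/sonar/optical sensing raises detectability.
improved = risk_score(likelihood=3, impact=5, detectability=4)   # 3.75
print(f"risk without fusion: {baseline}, with fusion: {improved}")
```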
Progress in marine object detection is significantly hampered by limited and imbalanced data. Unlike abundant terrestrial datasets (e.g., COCO, ImageNet), marine datasets (e.g., SeaShips [
7], MODD, URPC) are often small, inadequately annotated, or environment-specific [
8]. Satellite data (e.g., Sentinel-1 SAR) suffer from low resolution and noise, while underwater data are affected by turbidity and occlusion [
9]. The lack of consistent benchmarks further complicates the comparison of different solutions [
10,
11].
Several interconnected challenges impede progress on multiple fronts. Environmental factors such as turbidity, waves, and lighting degrade visual data [
12], while sensor limitations (e.g., light refraction in water and color absorption) further complicate optical detection [
Speckle noise in SAR imagery, high acquisition costs, and sparse annotations worsen matters further [
4,
14]. Algorithmic development struggles with real-time processing on resource-limited autonomous devices and generalizing deep learning models due to training data biases [
15,
16]. Additionally, regulatory and ethical concerns, such as privacy in vessel tracking and data sharing restrictions, limit access to crucial annotated datasets [
10].
Despite recent advancements in attention mechanisms and context-aware architectures, further research is crucial to enhance model scalability and generalization across different datasets. Global research continues to tackle these limitations by pursuing several key objectives: developing algorithms, trained on well-annotated data, that generalize effectively across diverse marine environments by mitigating training data biases; enabling real-time processing on resource-limited platforms such as AUVs; addressing regulatory and ethical considerations concerning vessel tracking privacy and data sharing restrictions; and establishing consistent benchmarks that allow fair comparison of model performance across different solutions. Ultimately, the goal is a more accurate, reliable, efficient, and ethically responsible marine object detection system capable of operating effectively in the complex and varied oceanic domain.
This research aims to achieve the following:
experimentally evaluate how well various You Only Look Once (YOLO) models perform in identifying marine objects in aerial images captured under diverse weather conditions for SAR operations;
analyze YOLO models in terms of their generalization potential, computational requirements, and robustness;
identify regulatory efforts that enable robust deep learning-based marine object detection for SAR.
The paper is organized as follows:
Section 2 reviews the recent literature on marine object detection and classification;
Section 3 details the different datasets, technical challenges, and evaluation metrics used in models;
Section 4 presents the benchmarking approach to evaluate YOLO models for marine SAR; in
Section 5, the experimental results from training and testing these models are presented along with measured benchmark values. In
Section 6, the analysis of the results accumulated in
Section 5 is presented. Regulatory efforts are also discussed in this section. Finally, the concluding summary of the key advancements and their potential impact on maritime SAR operations is presented in
Section 7.
2. Literature Review
The use of YOLO models in marine SAR is underpinned by their fundamental theoretical strengths, which make them particularly well-suited for real-time object detection in complex and unpredictable environments. The key theoretical foundations include:
Single-stage object detection: Unlike traditional two-stage detectors, YOLO adopts a single-stage architecture. It processes the entire image in a single forward pass, enabling significantly faster detection.
Grid-based prediction mechanism: YOLO divides the input image into a grid, with each cell responsible for predicting a set number of bounding boxes, object confidence scores, and class probabilities if the object’s center falls within the cell. This approach facilitates simultaneous and distributed multiple object detection across a single image frame.
End-to-end learning: YOLO models are trained end-to-end to map raw pixel data directly to object locations and classes. By processing the entire image during training and inference, YOLO leverages global context, essential for distinguishing small or partially occluded targets from a noisy background such as waves or glare.
CNN-based feature extraction: YOLO employs CNNs as feature extractors to learn high-level representations of objects. These features enable reliable detection of maritime targets such as people, life rafts, and vessels, even under variable lighting, motion blur, or partial occlusions.
Non-maximum suppression (NMS): YOLO incorporates NMS to eliminate overlapping or redundant bounding boxes, retaining only the most confident detections (a minimal code sketch of this step follows this list).
Anchor boxes: Later versions of YOLO utilize anchor boxes derived from clustering ground-truth annotations. The model predicts offsets relative to these anchors, enhancing its ability to detect objects of varying sizes and aspect ratios. This is especially relevant for identifying small or irregularly shaped marine targets.
Multi-sensor fusion capability: YOLO can be integrated with data from thermal/infrared cameras or radar, enabling robust performance in low-visibility or nighttime scenarios. This multimodal fusion enhances detection reliability in challenging conditions typical of real-world SAR missions.
Continuous evolution and specialization: The YOLO family has seen rapid iteration with ongoing enhancements in accuracy, speed, and robustness. These improvements target challenges such as tiny object detection, complex backgrounds, and adverse environmental conditions such as fog, rain, or sea glare that affect marine SAR operations.
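As referenced in the NMS item above, the following minimal, pure-Python sketch illustrates the suppression step. Boxes are assumed to be (x1, y1, x2, y2) corner coordinates with associated confidence scores; production YOLO implementations apply an equivalent vectorized routine per class.

```python
# Minimal sketch of non-maximum suppression (NMS) as used in YOLO
# post-processing; boxes are (x1, y1, x2, y2) with confidence scores.

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```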
Recent literature showcases a growing body of review papers addressing critical aspects of marine environments and safety. Specifically, ref. [
17] provides an intensive review of deep learning-based object recognition for both surface and underwater targets, establishing a unified framework of the key concepts and architectures, compiling benchmark datasets, and offering a comparative analysis of deep learning methodologies. Complementing this, ref. [
18] surveys state-of-the-art deep neural network approaches for marine object detection, a capability deemed crucial for the advancement of autonomous ship navigation, maritime surveillance, and intelligent transportation systems, with a particular focus on YOLO models and the necessity of large-scale, standardized datasets.
In the context of maritime safety, ref. [
19] delves into the automated detection and tracking of small objects during person overboard (POB) incidents, conceptualizing the involved technologies as an interconnected system. It introduces a novel three-phase POB model—detection, search and track, and rescue—detailing the initial two phases and their associated responsibilities. The urgency of rapid response in maritime search and rescue is further highlighted by the advancements in technology, such as UAVs equipped with sophisticated sensors, which have spurred the development of automated person detection systems using aerial imagery. In [
20], both traditional and advanced machine learning/neural network-based techniques are analyzed, and the role of synthetic data in overcoming data limitations is also considered, ultimately guiding readers in selecting the most suitable methodologies and future trends.
Beyond safety, marine pollution, such as oil spills and litter, poses significant threats to ecosystems and industries, demanding advanced monitoring. A review of 53 recent studies [
21] highlights AI’s role in detecting this pollution, showcasing high prediction rates through various model architectures, sensing technologies, and preprocessing methods. However, challenges persist, including limited training data, sensor inconsistencies, and real-time monitoring constraints.
Underwater marine object detection is a fundamental area in marine science and engineering with significant potential for ocean exploration, ecosystem monitoring, natural resource exploration, and fisheries management. Recognizing deep learning’s impact, a recent review [
22] categorized challenges in vision-based underwater object detection, including image quality degradation, small object detection, poor generalization, and real-time detection. This article also assessed datasets, compared findings with the previous AI reviews, and discussed future trends in this dynamic field.
UAV-based object detection in maritime environments faces challenges due to limited annotated training data and complex backgrounds [
23]. To address this, researchers developed the Maritime Search and Rescue Target Dataset (MSRTD) and proposed MSR-YOLO, an efficient detection model. Furthermore, detecting submerged individuals from UAVs is difficult, especially with sunlight reflection [
24]. This led to the creation of ABT-YOLOv7, which integrates an asymptotic feature pyramid network (AFPN), a BiFormer module for small object detection, and a task-specific context decoupling (TSCODE) mechanism. These advancements significantly improve detection accuracy and robustness in challenging lighting conditions.
Beyond optical UAV imagery, deep learning is crucial for processing multimodal ocean sensor data to enable intelligent perception and maritime target detection. In [
25], these technologies were explored, emphasizing the mathematical foundations of deep learning architectures such as SSD, R-CNN, and YOLO. It also highlighted the value of combining deep learning with image enhancement, data augmentation, and transfer learning to combat issues such as underwater image degradation and nonlinear noise. For detailed spectral analysis, a framework [
26] using hyperspectral imaging and machine learning models showed that CNNs (EfficientNet B0, Inception V3) achieved a significantly higher accuracy than traditional classifiers, establishing hyperspectral imaging as a valuable asset for advanced SAR.
The challenge of detecting small maritime vessels in cluttered aerial imagery has led to several innovative solutions. Maritime Background Suppression Network (MBSDet) [
27] tackles this by combining a background suppression module with a multidimensional feature enrichment (MFE) module, demonstrating superior performance on HRSC2016 and DOTA v1.0 datasets. For time-sensitive SAR operations, SG-Det [
28] offers a lightweight, real-time detector based on Shuffle-GhostNet that prioritizes speed without sacrificing accuracy. Similarly, YOLO-BEV [
29] incorporates a PAN+ with an extra-small-object detection head, a C2fSESA attention module for feature aggregation, and an RGSPP structure to reduce computational overhead. Evaluated on the MOBDrone dataset, YOLO-BEV achieved high accuracy with real-time frame rates.
Fog presents a significant challenge for maritime object detection, leading to the development of SRC-YOLO [
30], an improved YOLOv4-tiny model. SRC-YOLO utilizes a single-scale retinex for visual distortion mitigation, a modified receptive field block to expand the receptive field, and a convolutional block attention module for enhanced feature focus, significantly improving detection in foggy maritime scenes. For underwater applications relying on a side-scan sonar, the BES-YOLO [
31] network is designed to improve detection accuracy for multi-scale seafloor targets in noisy, complex environments. By incorporating an efficient multi-scale attention mechanism and a BiFPN for feature fusion, BES-YOLO achieves gains in detection and efficiency.
YOLO-SONAR [
32] is a new model designed for marine object detection in forward-looking sonar images, addressing challenges such as low resolution and seabed interference. It incorporates a competitive coordinate attention mechanism for noise reduction, a context feature extraction module to improve small object detection, and Wise-IoU v3 loss to address class imbalance. YOLO-SONAR outperforms the existing methods, achieving mAP scores of 81.96% on MDFLS and 82.30% on the new WHFLS datasets. However, it faces computational cost and data dependency limitations.
Underwater optical imaging faces a significant hurdle in marine object detection due to color disparities caused by how light is absorbed and scattered in water. These distortions obscure object boundaries, making it difficult for both human operators and automated systems to identify crucial elements such as people, vessels, or debris, especially in lifesaving SAR scenarios. Addressing these color issues is fundamental for building effective automated detection systems. For instance, restoring a drowning victim’s thermal signature in green-tinted water can significantly aid UAV-based detection. Recent research [
33] has leveraged principles of human visual perception to dynamically adjust color balance and contrast, mimicking human adaptability in turbid conditions and producing visually enhanced images. Another approach [
34] achieves underwater image enhancement through color correction using such techniques as the gray world assumption, employing type-II fuzzy sets for visibility recovery, and contrast enhancement using curve transformations. These methods are crucial because deep learning models such as YOLO rely on high-quality input data to extract meaningful features and make accurate predictions, ultimately supporting more reliable automated analysis in challenging marine environments.
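As an illustration of the color-correction step mentioned above, the following is a minimal sketch of the gray world assumption; it assumes an RGB image held as a floating-point NumPy array in [0, 1], and is not the cited papers' full enhancement pipeline.

```python
# Minimal sketch of gray world color correction: each channel is scaled
# so its mean matches the global mean, countering the blue/green cast
# of underwater imagery. Assumes an H x W x 3 float image in [0, 1].
import numpy as np

def gray_world(image: np.ndarray) -> np.ndarray:
    channel_means = image.reshape(-1, 3).mean(axis=0)
    gain = channel_means.mean() / (channel_means + 1e-9)
    return np.clip(image * gain, 0.0, 1.0)
```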
Table 2 summarizes recent notable research articles related to marine SAR, which reflect the growing sophistication and diversity of deep learning applications in maritime object detection. Based on the literature review, several research gaps in marine object detection for search and rescue emerge for further investigation:
(a) Data scarcity and imbalance: there is a lack of annotated datasets specifically for marine SAR.
(b) Generalization: the existing models struggle to perform consistently across different marine data types.
(c) Detection of small and partially occluded objects: detecting small and partially hidden objects in complex marine environments is still a major challenge.
(d) Real-time processing challenges: achieving real-time detection and analysis on platforms with limited computational resources remains a significant technical barrier.
(e) Benchmarking and standardization: the absence of consistent benchmarks and evaluation methods makes it difficult to compare detection models across different studies and datasets.
(f) Regulatory and ethical issues: there are unaddressed concerns regarding privacy, data sharing, and ethical AI use in maritime SAR operations.
The main contributions of this research are outlined below:
Large-sized datasets were employed to examine YOLO models for robustness (research gaps “a,” “b,” and “c”).
Consistent benchmarks were used to evaluate YOLO models and recent studies (research gap “e”).
Computational load analysis of YOLO models was investigated for SAR operations (research gap “d”).
Recent benchmarking efforts were discussed for real-world utility (research gap “f”).
3. Datasets, Evaluation Metrics, and Technical Challenges
As maritime surveillance and SAR operations are becoming increasingly vital, the development and evaluation of models rely heavily on robust and diverse datasets. These datasets form the backbone for tasks such as detection, classification, and tracking of maritime objects, including vessels, buoys, humans, debris, etc. To enable UAV-based YOLO models to reliably detect marine objects in real-world settings, their training and evaluation must be grounded in datasets that represent the following operational complexities:
The dataset composition should incorporate temporal and geographic diversity and varied environmental conditions, reflecting differences in lighting, sea state, time of day, season, types of water bodies, and UAV perspectives (altitudes and angles). It should also include a diverse set of annotated marine objects, especially for applications such as SAR and maritime surveillance.
Each object should be labeled using standardized bounding box formats and consistent class definitions. A balanced class distribution is critical to avoid model bias. In the case of video datasets, every frame should be annotated individually to support object detection and multi-frame tracking.
UAV imagery must be high-resolution to capture small, overlapping, and distant targets. Both imagery and annotations should conform to widely accepted standards to ensure compatibility with YOLO training pipelines. Data (real and synthetic) must avoid extreme class imbalance to prevent bias in model predictions and enhance generalization.
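As one concrete check of the class-balance requirement above, the following minimal sketch counts instances per class across YOLO-format label files; the directory path is a placeholder.

```python
# Minimal sketch of a class-balance check over YOLO-format label files
# (one "class cx cy w h" row per object); the path is a placeholder.
from collections import Counter
from pathlib import Path

def class_distribution(label_dir: str) -> Counter:
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

print(class_distribution("labels/train"))  # e.g., Counter({1: 5400, 0: 310})
```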
A well-curated dataset ensures that a YOLO-based UAV model performs reliably in marine environments. For strong model performance, the key dataset qualities include the following:
A large number of labeled images for deep learning robustness.
Low occlusion and clutter to minimize obstructions or augment data to handle them.
Multi-scale objects to ensure objects appear at varying scales.
Normalized and augmented data, possibly filtered for noise.
Geospatial metadata may help in contextualizing detection scenarios.
Below, publicly available datasets are reviewed and summarized in
Table 3.
3.1. Dataset Descriptions
MODDv2: Used for object detection, classification, and tracking, with a focus on detecting marine vessels and debris. It lacks multiple frequency bands and horizontal view orientation for aerial view applications [
35].
Singapore Maritime Dataset (SMD): Designed for detection, classification, and tracking using RGB and NIR frequency bands. It consists of three video streams captured from various altitudes and angles [
36].
OpenSARShip: The OpenSARShip dataset is a satellite dataset including SAR images in VV and VH polarizations, with bounding box annotations. It is good for radar-based detection, especially in low-visibility conditions [
37].
S2Ships: This dataset includes satellite imagery with RGB and multispectral bands, supporting ship detection with bounding box annotations. Its multispectral bands provide flexibility across various conditions [
38].
AFO (Aerial Dataset of Floating Objects): The AFO dataset contains aerial drone imagery focused on floating objects, such as kayaks, buoys, people, and boats. It is annotated with bounding boxes for object detection [
39].
xView3 SAR: This is a large-scale dataset focused on maritime object detection using SAR imagery, with various ship types and floating objects. This dataset is valuable in low-visibility conditions [
40].
LaRs: This dataset provides top-down RGB images, specifically designed for obstacle detection in maritime environments. This dataset is useful for identifying obstacles and mapping hazards [
41].
SeaDronesSee: This dataset is used for object detection and tracking with drone footage over maritime environments. It includes various object classes such as people, boats, and floating objects, supporting SAR [
42].
Seagull: This aerial dataset is designed for maritime surveillance, including RGB and thermal images, with bounding box annotations for various objects, making it suitable for both day and night detection [
43].
Multi-Category Large-Scale Dataset for Maritime Object Detection (MCMOD): This larger dataset contains images with annotated maritime objects, all captured by three onshore high-resolution video cameras in Hainan, China [
44].
While no single “Olympics” exists for marine object detection, the field is actively advanced by dedicated workshops and challenges. These initiatives utilize standardized datasets (satellite, aerial, drone imagery of marine, coastal, and port areas) and well-defined evaluation methodologies to foster progress in marine robotics, environmental monitoring, and underwater exploration. Prominent examples that serve as evolving benchmarks for marine object detection, particularly in SAR scenarios, include the following:
The SeaDronesSee Challenge, organized within the IEEE Global Vision Challenges framework, and often associated with CVPR/IROS workshops, for detecting ships, swimmers, buoys, and other marine objects;
Maritime Object Detection Challenge (MCMOT), which focuses on multi-camera object detection and tracking in varied conditions with day/night footage and adverse weather conditions;
Maritime Computer Vision (MaCVi) Challenge, as part of the IEEE/CVF conferences, with a focus on detection and classification of ships, buoys, and other maritime objects.
3.2. Evaluation Metrics
The object detection performance on marine datasets commonly involves several metrics. These include:
Intersection over union (IoU), a measure of the spatial overlap between predicted and ground-truth bounding boxes, often with a 0.5 threshold for a positive detection (see the computation sketch at the end of this subsection).
Average precision (AP), which integrates precision across varying recall levels.
Mean average precision (mAP), the arithmetic mean of AP values across all object classes.
Precision and recall, assessing the rates of correct and complete detections, respectively.
F1-score, representing the harmonic mean of precision and recall.
Confusion matrix, a visualization of classification performance across different object categories, highlighting potential misclassifications (e.g., between rafts and speedboats).
Confidence score, indicating the model’s prediction certainty.
These metrics offer a comprehensive assessment of an object detection model’s performance and robustness in maritime environments, considering such factors as object scale, environmental conditions, and inter-class similarities. We will use these in our benchmarking experiments.
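A minimal computation sketch of these metrics is given below. It assumes detections have already been matched to ground truth at IoU ≥ 0.5, and the AP routine uses simple rectangular integration rather than the interpolated schemes of COCO-style evaluators.

```python
# Minimal sketch of the detection metrics above; tp/fp/fn are the
# counts for one class after IoU-based matching to ground truth.

def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(precisions, recalls):
    """Simplified AP: rectangular area under the (recall, precision) curve."""
    ap, prev_recall = 0.0, 0.0
    for r, p in sorted(zip(recalls, precisions)):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Example counts (hypothetical): 81 TP, 9 FP, 15 FN for one class.
print(precision_recall_f1(81, 9, 15))
# mAP is then the mean of the per-class AP values.
```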
3.3. Technical Challenges
As discussed above, marine object detection for SAR faces substantial technical hurdles. Ensuring robustness and generalization across various scenarios and integrating multi-sensor data further complicate development. These technical challenges and their underlying causes are presented in
Table 4. The existing datasets, discussed in
Section 3.1, do not account for all these challenges. Achieving generalized model performance across different datasets thus becomes crucial for an effective marine SAR system.
4. Proposed Approach
The increasing importance of maritime surveillance and SAR demands reliable and varied data to train and assess computer vision models effectively. Based on the data presented in
Table 3 and
Table 4, the AFO and SeaDronesSee datasets were chosen for benchmarking YOLO-based marine SAR models, as both were collected from aerial platforms under variable weather conditions. The SeaDronesSee dataset provides a rich and dynamic collection of high-resolution RGB images, supporting diverse tasks, tracking sequences, supplementary synthetic data, and specialized subsets. The AFO dataset, in turn, presents several key advantages for creating and assessing detection and classification models in maritime contexts: its variety of object types in real-world scenarios makes it highly effective for training resilient deep learning models for aerial surveillance, especially for SAR purposes.
A small marine object in YOLO models is generally characterized by its bounding box size relative to the overall image. Often, this means the object’s bounding box occupies less than 1% of the total image area or has dimensions smaller than 32 × 32 pixels within a 640 × 640 input image. This category includes such objects as small marine species, buoys, small boats, and floating debris.
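As a minimal sketch, this size criterion can be written as a simple helper; the 1% and 32-pixel thresholds are taken directly from the definition above, and the example boxes are hypothetical.

```python
# Minimal sketch of the small-object criterion described above: a target
# is "small" if its box covers less than 1% of the image area or is
# smaller than 32 x 32 pixels at a 640 x 640 input resolution.

def is_small_object(box_w: float, box_h: float,
                    img_w: int = 640, img_h: int = 640) -> bool:
    area_fraction = (box_w * box_h) / (img_w * img_h)
    return area_fraction < 0.01 or (box_w < 32 and box_h < 32)

print(is_small_object(20, 25))    # True: a buoy-sized target
print(is_small_object(120, 90))   # False: a boat-sized target
```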
YOLOv8’s anchor-free architecture, combined with an improved loss function, enables more precise bounding box predictions compared to YOLOv5. This enhancement is particularly beneficial for detecting small, irregular, or partially occluded objects. The model performs well in identifying ships, marine mammals, and floating debris, making it suitable for general maritime monitoring. However, YOLOv8 struggles with accurately detecting very small objects, especially under conditions of wave interference and low resolution [
45]. This limitation is particularly critical in SAR operations, where missing survivors, small debris, or life rafts due to detection failures can have severe consequences [
46]. While YOLOv8 is lightweight and optimized for real-time processing, even on edge devices, its effectiveness diminishes in complex marine scenes with cluttered backgrounds, small targets, or overlapping objects. Despite its speed and efficiency, these challenges highlight the need for further improvements in small-object detection accuracy for high-stakes marine applications.
While YOLOv9 excels in general object detection, its ability to detect very small marine objects (less than 50 pixels) in search and rescue (SAR) operations is limited [
47]. It often misses these tiny targets in complex marine scenes, especially when they are occluded or blend with such features as waves and floating debris, resulting in inaccurate bounding box localization. In contrast, such architectures as LFN-YOLO and CFSD-UAVNet improve small-object detection by integrating SPD-Conv to maintain spatial details and GFPN for multi-scale feature fusion, capabilities not natively present in YOLOv9 [
48,
49]. Furthermore, modified versions, such as MAR-YOLOv9, address these shortcomings through the implementation of enhanced loss functions and attention mechanisms, which are optimizations that YOLOv9 lacks.
YOLOv10 enhances object detection in marine environments by integrating anchor-free detection with task-specific decoupled heads, significantly improving classification, particularly for small marine targets [
50]. Its upgraded architecture captures multi-scale spatial details more effectively, boosting detection accuracy in challenging conditions such as sea clutter, sun glint, foam, and occlusions. Optimized for real-time performance, YOLOv10 achieves higher FPS and mAP than YOLOv8 across most scenarios while maintaining efficient memory usage, making it ideal for onboard UAV deployment. Despite its compact size and low parameter count, the model retains high accuracy, proving especially effective in resource-constrained environments such as UAVs. These advancements position YOLOv10 as a leading choice for UAV-based SAR missions [
50], where speed and accuracy are critical for detecting small objects in dynamic marine settings.
YOLOv11 represents a significant leap forward in maritime object detection, refining YOLOv10’s architecture through advanced Neural Architecture Search and an optimized backbone-head design [
51]. The introduction of multi-scale feature interaction modules combined with an attention-enhanced FPN significantly boosts detection capabilities, particularly for distant, occluded, or partially submerged objects in challenging marine environments. Engineered to excel in low-visibility conditions, YOLOv11 demonstrates exceptional resilience to motion blur and camera shake, critical for SAR operations, and automated port surveillance. Its parallel multi-task processing enhances efficiency in detecting small or overlapping marine targets, even in cluttered high-resolution imagery [
51]. Despite these advancements, YOLOv11 maintains real-time inference speeds with higher accuracy than its predecessors, making it ideal for multi-object detection.
Addressing the intricacies of marine object detection, particularly small targets and fluctuating maritime environments, the YOLO family has evolved with each iteration. Finally, YOLOv7's versatility shows in its ability to handle varying sea conditions and dim lighting while identifying small boats, swimmers, and other crucial objects [
52]. A comparative table (
Table 5) presents a summary of the suitability of YOLO models for detecting very small objects in a complex marine environment.
Based on the discussion, this study is restricted to the training and testing results of three YOLO models (YOLOv7, YOLOv10, and YOLOv11) on the SeaDronesSee and AFO datasets. The experimental workflow model is illustrated in
Figure 1.
Figure 1 outlines the deep learning methodology adopted in this study. The process begins by loading image data from the first dataset. This step is followed by preprocessing, which includes label assignment. The preprocessed data are then saved and subsequently divided into training and testing sets. Here, it is ensured that class distribution is balanced for better model performance. For each model (YOLOv7, YOLOv10, and YOLOv11), the learning process involves cross-validation using optimization of hyperparameters. Once the training is completed, the evaluation metrics (discussed in
Section 3.2) are measured to judge training performance. If a predefined performance criterion is met, the model is evaluated on the testing set, and the resulting evaluation metrics are recorded as testing results. This procedure is completed for each model, and the entire training and testing process is then repeated for the second dataset. Finally, the performance of all three models is analyzed on each dataset to assess generalization, computational efficiency, and robustness using the evaluation metrics and the confusion matrix.
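A minimal sketch of this per-model training and evaluation loop is shown below. It assumes the ultralytics Python package, which provides YOLOv10 and YOLO11 weights (YOLOv7 is trained from its original repository); the dataset YAML files, epochs, and batch size are placeholders rather than the settings used in this study.

```python
# Minimal sketch of the Figure 1 workflow for two of the three models;
# dataset YAMLs and hyperparameters are illustrative placeholders.
from ultralytics import YOLO

for weights in ["yolov10m.pt", "yolo11m.pt"]:
    for data_yaml in ["seadronessee.yaml", "afo.yaml"]:
        model = YOLO(weights)                                   # load pretrained weights
        model.train(data=data_yaml, epochs=100, imgsz=640, batch=16)
        metrics = model.val(data=data_yaml, split="test")       # mAP, precision, recall
        print(weights, data_yaml, metrics.box.map50)
```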
5. Experimental Methodology
This section presents the experimental setup and the results obtained by training and testing the YOLO models on the selected datasets.
5.1. Dataset Preprocessing
The SeaDronesSee and AFO datasets primarily contain RGB images and video data with associated annotations. The SeaDronesSee dataset contains a large collection of still images and frame sequences used for single-object and multi-object detection and tracking. The current version contains 14,227 images (8930 for training, 1547 for validation, and 3750 for testing) across six classes: swimmers, boats, jet skis, lifesaving appliances, buoys, and “ignored” regions. Its continuous updates ensure its relevance to real-world situations, making it a valuable resource for developing autonomous UAV-based SAR technologies. The AFO dataset contains a large collection of images featuring a broad spectrum of floating objects, including boats, debris, and natural clutter, captured from stationary and moving ground-based sensors at varied resolutions, as well as video clips captured by drone-mounted cameras. Annotations are typically provided in a structured JSON format along with metadata, which include object locations, dimensions (height and width), classes, GPS coordinates, altitude, camera angles, and environmental conditions (e.g., waves, glare). Since YOLO algorithms require a specific normalized text-based annotation format, consisting of class indices and bounding box coordinates normalized to image dimensions (values between 0 and 1), the JSON annotations were converted to meet these requirements. The datasets were then divided into training and validation sets, and separate test sets were created by reserving 10% of the original training data.
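A minimal sketch of this conversion step is given below, assuming COCO-style JSON entries with absolute-pixel [x, y, w, h] boxes; the field names are illustrative of the datasets' metadata layout rather than their exact schema.

```python
# Minimal sketch of JSON-to-YOLO annotation conversion; assumes
# COCO-style entries with absolute-pixel [x, y, w, h] boxes.
import json
from pathlib import Path

def coco_to_yolo(json_path: str, out_dir: str) -> None:
    data = json.loads(Path(json_path).read_text())
    images = {img["id"]: img for img in data["images"]}
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for ann in data["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        # YOLO format: class x_center y_center width height, all in [0, 1].
        cx, cy = (x + w / 2) / img["width"], (y + h / 2) / img["height"]
        nw, nh = w / img["width"], h / img["height"]
        # category_id may need remapping to contiguous 0-based indices.
        label = out / (Path(img["file_name"]).stem + ".txt")
        with label.open("a") as f:
            f.write(f'{ann["category_id"]} {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}\n')
```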
5.2. Results
To analyze performance, YOLOv7, YOLOv10, and YOLOv11 were trained using the training set of both datasets. The benchmark metric results for YOLOv7 are graphed in
Figure 2 and
Figure 3 for the SeaDronesSee and AFO datasets, respectively. Likewise, the training results of YOLOv10 and YOLOv11 are displayed in
Figure 4 and
Figure 5, and in
Figure 6 and
Figure 7, respectively.
The results displayed in
Figure 2 reveal strong performance for most classes (e.g., ignored at 0.95 TP, jet ski at 0.94 TP) but highlight critical weaknesses in “lifesaving appliances” (only 0.52 TP with 0.48 FP), indicating frequent misclassification as background or other objects. Moreover, as shown in
Figure 3, the model shows high precision and recall for the majority of classes, with “ignored” and “jet ski” performing particularly well. However, lifesaving appliances remain a problem class, showing a low true-positive rate and high misclassification, which suggests the model is often confusing this class with the background or other objects. This highlights a clear need for either more representative training data or better class balancing, as this class is not being learned as well as the others.
The training results for YOLOv10 (in
Figure 4) show solid learning for “ignored” and “jet ski” (≈0.85 true-positive each with <10% background spill), moderate performance on “boat” (0.61 TP, with 15% mis-routed to lifesaving gear and many misses), weak recall for swimmers (0.46 TP, 53% FN), and an almost blind spot for “lifesaving appliances” (0.08 TP, 77% FN); the overall background false-positives stayed below 10%, but a sizable share of real boats and swimmers still vanish into the background, signaling those two classes need targeted augmentation or class-balancing.
Figure 5 demonstrates that YOLOv10 on AFO effectively suppresses background noise, maintaining false positives largely below 10%. The “ignored” and “jet ski” classes continue to show strong performance. However, boat detection is only moderate, with a significant number of boats either being misclassified as lifesaving gear or missed entirely. Swimmer detection remains weak, with true positives below 0.5 and a very high false negative rate. This indicates a particular need for more data augmentation or improved class distribution for swimmers. The model also almost entirely misses lifesaving appliances, suggesting significant difficulty with this class on the current dataset.
The training results in
Figure 6 show strong performance for most classes, with “ignored” (0.97 TP) and “jet ski” (0.91 TP) demonstrating excellent detection, while “boat” (0.89 TP) and “swimmer” (0.82 TP) show good but slightly noisier performance with moderate false positives (8–10% FP). The main weakness remains “lifesaving appliances” (0.63 TP with 28% FN), indicating persistent detection challenges, likely due to complex features or data imbalance. Background suppression is effective (<10% FP for most classes). YOLOv11 yields positive results (shown in
Figure 7) on the AFO dataset. As shown in
Figure 7, the model maintains strong precision and recall for the “ignored” and “jet ski” classes. Detection for “boat” and “swimmer” also improved compared to earlier versions, although there are still some moderate false positives for these categories. The primary challenge continues to be the “lifesaving appliances” class. While its performance slightly improved, both recall and precision remain lower compared to other classes, likely due to class imbalance or difficult features. Background suppression is effective across the board, with consistently low false positives for the background. Overall, YOLOv11 demonstrates balanced results, but improvements for “lifesaving appliances” are still necessary.
After training, the YOLO models were tested using the testing set of the SeaDronesSee dataset. The testing results are shown in
Table 6. The results indicate that YOLOv7 performs slightly better than YOLOv10 and YOLOv11. To illustrate this clearly, the confusion matrix of the YOLOv7 model for the SeaDronesSee dataset is shown in
Figure 8.
The YOLOv7 testing results (
Figure 8) show excellent performance for most classes, with near-perfect detection of “ignored” (0.99 TP), “swimmer” (0.96 TP), and “jet ski” (0.94 TP), demonstrating robust generalization. However, “boat” has moderate false positives (9% FP, likely confused with the background), while lifesaving appliances show improved but still suboptimal detection (0.81 TP with 15% FN). Background noise remains well-suppressed (<10% FP overall). For the AFO dataset, the testing results are shown in
Table 7, and the confusion matrix is displayed in
Figure 9.
The results of YOLOv7 on the AFO dataset exhibit strong generalization. The “ignored”, “swimmer”, and “jet ski” classes all achieve high true positive rates with minimal misclassifications. Boat detection is solid, though not perfect, with some moderate false positives likely confused with the background. While improved compared to training, “lifesaving appliances” detection still lags behind other classes, showing a higher false negative rate. This indicates that the model continues to struggle with reliably identifying this class in real test data. Notably, background false positives are well-controlled, suggesting effective noise suppression.
6. Analysis and Discussion
The YOLO series of models achieves a strong balance between speed and accuracy. Yet, when tested on specialized datasets such as SeaDronesSee and AFO, which feature demanding maritime conditions and drone-captured imagery, certain shortcomings emerge across the YOLOv7, YOLOv10, and YOLOv11 models. These challenges underscore the necessity for further model refinement and dataset-specific enhancements to ensure reliable performance in maritime and aerial search and rescue operations.
Table 8 outlines the key performance limitations observed for YOLOv7, YOLOv10, and YOLOv11 models on these datasets.
Based on experimental results, the YOLO models and their variants developed for marine environments can be compared based on generalization, performance, and computational requirements.
Table 9 shows the comparison for each dataset. Standard YOLOv7 yields slightly better generalized performance than the YOLOv10 and YOLOv11 models across all classes. The YOLOv7 variants perform slightly better still, but their performance cannot be generalized, as they were developed specifically for relatively large objects such as ships. A similar argument holds for the YOLOv10 and YOLOv11 variants, as they were developed and tested on custom datasets with better results.
The performance of YOLO models can also be compared in the context of real-time or near-real-time SAR operations. In deep learning models such as YOLOv7, YOLOv10, and YOLOv11, the floating-point operations (FLOPs) in the forward pass primarily determine the computational complexity. For a standard convolutional layer, the FLOPs can be computed as:

FLOPs = 2 × Cin × Cout × Kh × Kw × Hout × Wout (1)

where Cin and Cout represent the input and output channels, Kh and Kw represent the kernel height and width, and Hout and Wout represent the output feature map height and width. The multiplier ‘2’ accounts for both the multiplication and the addition in each multiply-accumulate operation. The backbone and head of YOLOv7 primarily utilize small 3 × 3 convolutional kernels across most feature extraction layers, striking a balance between model capacity, computational efficiency, and receptive field effectiveness; 1 × 1 kernels are occasionally employed for channel reduction. YOLOv10 and YOLOv11 build on this foundation, maintaining the same design approach established by earlier versions.
For an intermediate layer, this can be calculated as follows. Assume a convolutional layer with kernel size Kh × Kw = 3 × 3, input channels Cin = 512, output channels Cout = 1024, and an input feature map resolution of Hin = 20, Win = 20. Without padding, the output dimensions are Hout = Hin − Kh + 1 = 18 and Wout = Win − Kw + 1 = 18. Thus, the number of FLOPs for this intermediate layer can be calculated using Equation (1) as:

FLOPs for this intermediate layer = 2 × 512 × 1024 × 3 × 3 × 18 × 18 ≈ 3.06 GFLOPs
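The per-layer formula and this worked example can be verified with a short script; this is a minimal sketch, and the layer list is illustrative rather than an actual YOLO configuration.

```python
# Minimal sketch of Equation (1); the layer list below is illustrative,
# not an actual YOLO architecture specification.

def conv_flops(c_in: int, c_out: int, k_h: int, k_w: int,
               h_out: int, w_out: int) -> int:
    """FLOPs = 2 * C_in * C_out * K_h * K_w * H_out * W_out."""
    return 2 * c_in * c_out * k_h * k_w * h_out * w_out

# The intermediate layer from the worked example above:
print(conv_flops(512, 1024, 3, 3, 18, 18) / 1e9)  # ~3.06 GFLOPs

# Total model complexity: sum the per-layer counts over all layers.
layers = [(3, 32, 3, 3, 320, 320), (32, 64, 3, 3, 160, 160)]
total_flops = sum(conv_flops(*layer) for layer in layers)

# Model size at 32-bit precision, as used for Table 10:
# size_bytes = num_parameters * 4
```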
The total complexity of a YOLO model is thus determined largely by its core stacked convolutional layers, although other operations, such as activations and upsampling, also contribute:

Total FLOPs = Σ_{l=1}^{L} FLOPs_l (2)

where L is the number of layers and FLOPs_l is the per-layer count from Equation (1). The difference in FLOPs across the YOLO models is driven by the number of parameters involved in each convolution. This comparison is detailed in Table 10, where model size is calculated by multiplying the number of parameters by four, assuming 32-bit (4-byte) storage for each parameter.
Table 10 shows that YOLOv10 (medium) involves the lowest FLOPs of the three models, making it a good candidate for deploying SAR applications on edge devices. Improved lightweight variants of YOLOv7 reduce computation and memory access time using techniques such as partial convolutions [
59]. Architectural innovations in YOLOv10 variants optimize the total FLOPs by removing redundancy and overhead [
60]. Likewise, YOLOv11 variants employ optimizations such as pruning and the removal of blocks for certain object sizes to reduce the total number of operations [
61].
Table 10 also shows the inference time computed on a dedicated NVIDIA T4 GPU platform, along with the runtime measured in this experimental work. The difference between inference time [
62,
63] and runtime in this work is due to the difference in computational power of the machines employed.
Benchmarking deep learning models for marine SAR typically encompasses several key components:
Developing specialized datasets: Due to the distinct challenges of marine environments, generic datasets often fall short. Researchers have thus focused on curating and enhancing datasets tailored specifically for SAR applications.
SARDet-100K represents a major step forward by aggregating and standardizing 10 existing SAR detection datasets into a large-scale, multi-class benchmark.
SeaDronesSee is widely used in UAV-based maritime SAR research, particularly for detecting individuals in the water.
The VTSaR dataset supports aerial person detection by incorporating diverse scenes, activities, and viewpoints. It includes both visible and infrared images, as well as synthetic data.
Enhancing existing datasets, such as the Singapore Maritime Dataset (SMD), through relabeling or augmentation efforts to better align with deep learning-based marine object detection tasks.
Practical metrics: Beyond traditional measures such as mAP@0.5, benchmarks now increasingly consider hardware-specific inference speed, FLOPs, resource efficiency, false positive rates, resilience to environmental conditions such as fog and waves, and generalization.
Multimodal data: Benchmarking frameworks need to incorporate both optical (RGB and IR) and Synthetic Aperture Radar (SAR) imagery to improve model robustness and detection performance across diverse marine conditions.
These benchmarking initiatives are supported by a growing number of workshops and challenges, such as the SeaDronesSee and MaCVi challenges discussed in Section 3.1. Such events provide platforms for standardized evaluation, promote collaboration, and accelerate the translation of research into operational AI solutions for SAR missions. Overall, current benchmarking efforts in marine SAR emphasize dataset specialization, robust evaluation metrics, and model generalizability. They serve as a bridge between experimental success and the deployment of dependable AI systems that enhance real-world rescue operations.