Article

Towards Intelligent Water Safety: Robobuoy, a Deep Learning-Based Drowning Detection and Autonomous Surface Vehicle Rescue System

by Krittakom Srijiranon, Nanmanat Varisthanist, Thanapat Tardtong, Chatchadaporn Pumthurean and Tanatorn Tanantong *
Thammasat University Research Unit in Data Innovation and Artificial Intelligence, Department of Computer Science, Faculty of Science and Technology, Thammasat University, Pathum Thani 12121, Thailand
* Author to whom correspondence should be addressed.
Appl. Syst. Innov. 2026, 9(1), 12; https://doi.org/10.3390/asi9010012
Submission received: 23 November 2025 / Revised: 23 December 2025 / Accepted: 24 December 2025 / Published: 28 December 2025
(This article belongs to the Special Issue Recent Developments in Data Science and Knowledge Discovery)

Abstract

Drowning remains the third leading cause of accidental injury-related deaths worldwide, disproportionately affecting low- and middle-income countries where lifeguard coverage is limited or absent. To address this critical gap, we present Robobuoy, an intelligent real-time rescue system that integrates deep learning-based object detection with an unmanned surface vehicle (USV) for autonomous intervention. The system employs a monitoring station equipped with two specialized object detection models: YOLO12m for recognizing drowning individuals and YOLOv5m for tracking the USV. These models were selected for their balance of accuracy, efficiency, and compatibility with resource-constrained edge devices. A geometric navigation algorithm calculates heading directions from visual detections and guides the USV toward the victim. Experimental evaluations on a combined open-source and custom dataset demonstrated strong performance, with YOLO12m achieving an mAP@0.5 of 0.9284 for drowning detection and YOLOv5m achieving an mAP@0.5 of 0.9848 for USV detection. Hardware validation in a controlled water pool confirmed successful target-reaching behavior in all nine trials, achieving a positioning error within 1 m, with traversal times ranging from 11 to 23 s. By combining state-of-the-art computer vision and low-cost autonomous robotics, Robobuoy offers an affordable and low-latency prototype to enhance water safety in unsupervised aquatic environments, particularly in regions where conventional lifeguard surveillance is impractical.

1. Introduction

Drowning is the third leading cause of unintentional injury-related deaths worldwide, with children aged 1–9 years being particularly vulnerable. On average, 876 people lose their lives to drowning every day, accounting for approximately 320,000 deaths annually [1]. About 90% of these fatalities occur in low- and middle-income countries, with the greatest burden concentrated in South-East Asia and the Western Pacific [2]. Key contributing factors include widespread access to open water sources, ranging from adults using small fishing boats to children living near rivers, ponds, and lakes. While lifeguards provide an essential preventive measure at pools and beaches, their absence in rural areas underscores a major limitation. Even in supervised locations, timely identification of drowning victims remains difficult [3], leaving unsupervised water bodies as persistent high-risk environments.
Over the past decade, technological advancements have been introduced to enhance water safety, including wearable devices and image-based monitoring systems [4]. Wearables can track oxygen saturation and movement patterns [5,6]. However, their effectiveness in open water environments is limited, primarily because no personnel are available to manage the devices and because the devices themselves are poorly suited to such conditions. High costs also restrict their adoption in lower-income countries, and many swimmers find them uncomfortable, further reducing their popularity. In response to these limitations, artificial intelligence (AI) combined with image processing and deep learning has emerged as a promising alternative. Recent studies on drowning detection using AI have reported encouraging results. For example, a study using a deep Gaussian model achieved an average precision (AP) of 97.50% and an area under the ROC curve (AUC) of 93.60% [7], while another employing Mask R-CNN obtained an accuracy of 94.10% [8]. Notably, a study that applied an object detection model for background extraction followed by a Single Shot Multi-Box Detector (SSD) for drowning detection achieved a mean Average Precision (mAP) of 97.17% [9]. These findings highlight the potential of various deep learning models in addressing the challenge of drowning detection. Beyond conventional 2D image-based methods, geometry-aware 3D vision approaches have been explored to enhance perception robustness in unstructured environments [10]. However, this research focuses exclusively on 2D vision and does not incorporate 3D sensing.
One prominent example in object detection is the You Only Look Once (YOLO) algorithm [11], which has become a leading deep learning model. Based on a Convolutional Neural Network (CNN) architecture, YOLO is recognized for its speed and efficiency. It can identify objects in an entire image with a single pass through the network, making it particularly suitable for real-time applications such as drowning detection. Several studies have examined different YOLO versions for detecting drowning incidents, with YOLOv8 implementations achieving mAP values ranging from 89.00% to 91.78% [12,13]. In particular, one study [12] incorporated additional mechanisms such as coordinate attention and Flexible Rectified Linear Unit (FReLU) activation functions to enhance performance.
Apart from AI, unmanned surface vehicles (USVs) have also been developed to address this challenge. These autonomous watercraft have shown potential for enhancing water safety and monitoring. For instance, the VIAM-USV2000 was designed with advanced autonomous capabilities for navigating narrow riverine environments [14]. Another study proposed an affordable USV approach for automated water quality monitoring using GPS guidance and telemetry operation [15]. Additionally, researchers have developed algorithms to improve USV navigation accuracy by combining thrust models, dynamic models, and event-triggering techniques [16].
Beyond water quality and navigation improvements, researchers have also advanced the integration of USVs with aerial systems and novel control strategies. Cooperative USV–UAV frameworks have demonstrated effective marine search-and-rescue operations using reinforcement learning-based control [17], while modified neuronal genetic algorithms have been proposed to optimize USV path planning in dynamic environments [18]. Other work has focused on practical USV deployments, such as autonomous bathymetric surveys with integrated sensor systems [19]. Moreover, multifunctional UAV–USV coupling systems have been introduced to expand operational flexibility [20], and multi-robot collaborations involving heterogeneous platforms have shown strong potential for disaster risk reduction and safety applications [21].
This research introduces Robobuoy, a real-time computer vision-based rescue system that combines advanced object detection with autonomous USV navigation. The term “Robobuoy” in this research refers to an academic research prototype developed for experimental validation and is not affiliated with any existing commercial systems using a similar name. The system incorporates three major components: (1) a dual-model framework employing YOLO12m for drowning detection and YOLOv5m for USV tracking, optimized for accuracy, low latency, and deployment on edge devices; (2) a localization algorithm that calculates geometric heading directions based on the bottom-center coordinates of detected objects; and (3) a USV equipped with visible markers and a radio-controlled interface, enabling precise autonomous navigation guided by continuous visual input. The objectives of this work are twofold: (1) to accurately detect and localize drowning victims using deep learning and visual geometry, and to autonomously navigate a USV toward them for rescue; and (2) to provide an affordable and efficient intervention prototype for unguarded or underserved aquatic environments, especially in low- and middle-income regions.
The main contributions of this study are as follows:
  • We introduce Robobuoy, a novel real-time drowning detection and rescue system that integrates dual deep learning models with autonomous USV navigation.
  • We design an efficient framework combining YOLO12m for drowning detection and YOLOv5m for USV tracking, supported by a lightweight geometric localization algorithm for precise and low-cost navigation.
  • We conduct comprehensive validation using both datasets and hardware experiments, demonstrating state-of-the-art detection accuracy and reliable autonomous rescue performance.
This study also builds upon a project that was previously recognized at IEEE RoboComp 2024 [22], a regional competition for IEEE Student Members in Region 10 (Asia-Pacific) that showcases advancements in robotic technology. The system was later selected to represent Thailand as the official national candidate in the international round. Such acknowledgment not only reflects the quality of the research but also highlights its potential contributions to both the robotics community and the broader pursuit of solutions for global water safety.

2. System Architecture

The proposed system, Robobuoy, is designed as an integrated framework for real-time drowning detection and autonomous rescue in unsupervised aquatic environments. The architecture is organized into two main components: (1) a USV, which executes the physical intervention by navigating toward the victim, and (2) a monitoring station, which performs perception, detection, and decision-making tasks. These two modules operate in close coordination, with the monitoring station continuously analyzing video streams, detecting drowning individuals and the USV, and transmitting real-time navigation commands to guide the vehicle toward the victim, as shown in Figure 1. This design ensures that accurate visual detection is directly translated into precise rescue actions while minimizing onboard computational demands on the USV.

2.1. Unmanned Surface Vehicle

The USV plays a critical role in the proposed system, utilizing a commercially available radio-controlled (RC) platform, the FT009 RC boat, as shown in Figure 2. This model was selected due to its low cost, wide availability, compact size, and high maneuverability, which make it well suited for rapid prototyping and controlled rescue-navigation experiments. Its RC-based control architecture also enables straightforward integration with the proposed vision-guided monitoring station without requiring extensive hardware modification.
According to the specifications provided by the manufacturer, the FT009 has physical dimensions of approximately 463 mm × 124 mm × 118 mm (length × width × height) and is powered by a 7.4 V, 1500 mAh rechargeable battery. The platform supports a maximum speed of approximately 30 km/h and an operational control range of up to 150 m using a standard 2.4 GHz radio-control system. As a small RC speedboat, the FT009 is intended for calm-water operation. In this research, the USV was validated exclusively under still or near-still water conditions, with negligible wind and no measurable current, corresponding to an indoor pool environment. Manufacturer documentation does not specify certified tolerances for maximum wave height, wind speed, or current, so no such limits are claimed. Performance under rough water, strong wind, or flowing current lies outside the validated scope of this prototype.
The core structure and functionality of the USV remain unchanged. For detection purposes, bright magenta balls are attached to the top of the USV. This simple modification enables the monitoring station to reliably detect the USV using the proposed software, without altering the original design or performance of the boat. These markers are employed solely to facilitate visual detection in this proof-of-concept research, and no claim is made regarding the tracking of unmodified rescue vessels in complex or visually cluttered environments. The ultimate purpose of the USV is to deliver rescue support. However, the focus of this prototype is on integrating the USV into the system and enabling autonomous navigation toward a designated destination. Within the integrated system, commands are transmitted from the monitoring station to the USV, enabling it to autonomously navigate to the victim location. The USV passively executes the instructions received from the control unit of the monitoring station.

2.2. Monitoring Station

The monitoring station serves as the computational and control hub of Robobuoy, as shown in Figure 3. The perception and decision-making pipeline is executed at a shore-based station to reduce onboard payload, power draw, and integration complexity on the small RC platform. The station integrates sensing, decision-making, and communication functionalities through the following components:
  • Radio Transmitter: Sends navigation commands to the USV, enabling autonomous operation by transmitting real-time instructions from the computation unit.
  • Raspberry Pi 5: Serves as the primary processing unit and was selected for this prototype due to its wide availability and sufficient capability for running object detection models in a controlled environment. It is connected to a webcam for real-time visual input. It processes video data using computer vision algorithms to detect drowning victims. The detection model is executed using the default CPU-based software configuration of the Raspberry Pi 5, without GPU support. Once a victim is identified, the Raspberry Pi locates the USV using the bright magenta balls mounted on top, calculates the required navigation path, and sends this data to the Arduino via a wired serial link.
  • Arduino Nano 3.0 Microcontroller: Receives control signals from the Raspberry Pi and relays commands to the RC controller of the USV. Using the Servo.h library, the Arduino generates pulse-width modulation (PWM) signals to drive the servos mechanically coupled to the RC boat controls, adjusting the steering and throttle according to the computed navigation path. This process is repeated continuously until the USV reaches the designated point. As a fail-safe mechanism, if no new command is received for more than 3 s, the Arduino neutralizes the control outputs and stops the USV.
  • Camera: Visual sensing is provided by a Logitech C615 webcam. It captures video at 1920 × 1080 (Full HD) resolution and provides the primary visual input for drowning detection and USV localization.
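As an illustration of the fail-safe behavior described for the microcontroller, the following Python sketch models the watchdog logic (the actual implementation runs on the Arduino; the 1500 µs neutral pulse width, class name, and attribute names are assumptions introduced here for illustration):

```python
import time

NEUTRAL_PWM_US = 1500      # assumed neutral servo pulse width (microseconds)
FAILSAFE_TIMEOUT_S = 3.0   # stop the USV if no command arrives for 3 s

class FailsafeController:
    """Tracks the last command time and neutralizes outputs on timeout."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_command_time = clock()
        self.steering_us = NEUTRAL_PWM_US
        self.throttle_us = NEUTRAL_PWM_US

    def on_command(self, steering_us, throttle_us):
        # A fresh command from the Raspberry Pi resets the watchdog timer.
        self.steering_us = steering_us
        self.throttle_us = throttle_us
        self.last_command_time = self.clock()

    def outputs(self):
        # If the link has been silent too long, neutralize both channels.
        if self.clock() - self.last_command_time > FAILSAFE_TIMEOUT_S:
            return NEUTRAL_PWM_US, NEUTRAL_PWM_US
        return self.steering_us, self.throttle_us
```

Injecting the clock makes the timeout behavior easy to test without waiting in real time.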
The software component consists of two object detection models and an autonomous navigation system. The detection models include a drowning detection model, trained using the latest YOLO12m, and a USV detection model. YOLO12 was selected for this task due to its advanced design optimized for both real-time performance and high accuracy, which are essential in time-sensitive rescue scenarios. Its Area Attention Mechanism efficiently captures broad spatial context by dividing feature maps into segments, enabling improved detection of small and partially visible objects such as a drowning victim or a distant USV, while reducing computational cost. The Residual Efficient Layer Aggregation Networks (R-ELAN) enhance gradient flow and model optimization, allowing for deeper yet faster networks suitable for edge deployments such as embedded systems within USVs. Furthermore, the optimized attention architecture of YOLO12, including FlashAttention and lightweight positional encoding via 7 × 7 separable convolutions, improves detection precision without slowing inference speed. These features make the model particularly effective for tracking moving targets in complex environments like open water. Finally, the flexibility of YOLO12 in deployment across various devices ensures compatibility with low-power hardware while maintaining robust performance, making it an ideal choice for reliable, real-time object detection in autonomous rescue operations.
YOLO12 produces output bounding boxes with coordinates (x, y, w, h), an objectness score, and class probabilities. These predictions are evaluated using Intersection over Union (IoU):
IoU = Area of Overlap / Area of Union.
A prediction is considered correct when the IoU with the ground truth box exceeds a threshold, commonly set to 0.5 in mAP@0.5 calculations. The mean Average Precision (mAP) for all classes, a standard metric for evaluating detection performance, is computed as
mAP@0.5 = (1/N) Σ_{i=1}^{N} AP_i,
where AP_i is the average precision for class i, and N is the number of classes.
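These two metrics can be sketched in Python as follows (a minimal illustration; boxes are taken here as (x, y, w, h) with (x, y) the top-left corner, which differs from YOLO's center-based output format, and per-class AP values are assumed to be precomputed):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h), with (x, y) the top-left corner."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap extent along each axis; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def mean_ap(per_class_ap):
    """mAP as the mean of per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)
```

A prediction would then count as correct when `iou(pred, gt) > 0.5` for the mAP@0.5 setting.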
The autonomous navigation system is activated once both models successfully detect the drowning person and the USV in the frame. The system calculates the bottom-center point of each detection box, denoted as (x_t, y_t) for the USV and (x_d, y_d) for the drowning person. In this research, the monitoring-station camera was mounted in a fixed position throughout all experiments, and no explicit camera calibration was performed. Consequently, the navigation strategy operates purely in the 2D image plane, computing the heading direction from pixel coordinates of the detected USV and target. This lightweight formulation was intentionally adopted to reduce system complexity and enable real-time control in the confined pool environment. Because full 3D scene geometry is not modeled, angular error may increase with camera tilt or mounting height, particularly as the USV moves farther from the camera. Although camera calibration and 3D modeling could improve physical accuracy, they were not incorporated in this research in order to maintain system simplicity, real-time performance, and affordability for resource-limited deployments.
To mitigate this effect, experiments were conducted with trajectories primarily constrained to the central region of the image. The heading angle θ required to steer the USV toward the victim is computed as θ = atan2(y_d − y_t, x_d − x_t), where (x_d, y_d) and (x_t, y_t) denote the bottom-center pixel coordinates of the detected target and USV bounding boxes, respectively. The image-plane coordinate system follows standard computer vision conventions, with the origin at the top-left corner of the image, the x-axis increasing to the right, and the y-axis increasing downward. The angle θ is measured relative to the positive x-axis, with counterclockwise rotation taken as positive. The orientation of the USV is estimated from consecutive frames by computing the displacement vector between its bottom-center positions at times t and t − 1. Based on the resulting heading error, the system adjusts the motor speeds to align the USV and guide it toward the victim efficiently. This approach ensures fast and accurate path planning while requiring minimal onboard computational resources.
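A minimal Python sketch of this image-plane heading computation, assuming (x, y, w, h) boxes with a top-left origin (the function names are illustrative, not taken from the system's source code):

```python
import math

def bottom_center(box):
    """Bottom-center pixel of a (x, y, w, h) box, top-left image origin."""
    x, y, w, h = box
    return (x + w / 2.0, y + h)

def heading_to_target(usv_box, target_box):
    """Heading angle (radians) from the USV toward the target in the image plane.

    Image coordinates: origin at the top-left, x to the right, y downward.
    """
    xt, yt = bottom_center(usv_box)
    xd, yd = bottom_center(target_box)
    return math.atan2(yd - yt, xd - xt)

def heading_error(desired, current):
    """Signed heading error wrapped to (-pi, pi]."""
    e = desired - current
    while e <= -math.pi:
        e += 2 * math.pi
    while e > math.pi:
        e -= 2 * math.pi
    return e
```

The wrapped heading error would then drive the steering command, with the USV's current heading estimated from its bottom-center displacement across consecutive frames.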

3. Experiments

This study presents a system designed to assist in the rescue of drowning individuals by integrating a monitoring station with a USV. The system operates through real-time object detection, employing two dedicated models: one for identifying drowning individuals and another for tracking the position of the USV. The experiments are structured to evaluate system performance across both software and hardware domains. The hardware experiment focuses on validating communication reliability and closed-loop navigation accuracy between the monitoring station and the USV under controlled and repeatable target conditions, while the software experiment evaluates the detection models to ensure high accuracy in identifying drowning individuals and the USV. These experiments provide a comprehensive assessment of both the physical reliability and the perception accuracy of Robobuoy, ensuring its suitability for real-world deployment.

3.1. Object Detection Experiment

The software experiment focuses on developing and testing detection models for the two core objects of the system: drowning individuals and the USV. This component aims to improve detection accuracy for these targets, ensuring precise identification under varied conditions. The effectiveness of these models is critical for reliable system performance in real-world scenarios.
To evaluate the performance of the proposed drowning detection and USV navigation system, the experiment was conducted on a high-performance workstation equipped with an NVIDIA RTX 4000 Ada Generation GPU (16 GB VRAM) and 32 GB of system RAM. This configuration provided sufficient computational resources for training high-resolution object detection models and efficiently handling multiple datasets.

3.1.1. Dataset Construction

For the drowning detection model, the training dataset was constructed by combining two open-source datasets sourced from Roboflow. Dataset 1 contains images collected from multiple sources [23], while Dataset 2 consists of images captured in an indoor swimming pool environment [24]. Datasets 1 and 2 were merged into Dataset 3 by combining the corresponding training, validation, and test splits from each dataset in order to preserve the original data distribution. After data augmentation, Dataset 3 comprises a total of 5916 images. All experiments were conducted using the default training configuration provided by the Roboflow training pipeline. No custom seed was specified, and the training followed the standard Roboflow evaluation setting. No explicit domain adaptation or domain normalization was applied when constructing Dataset 3. Potential domain shift effects between the data sources therefore remain a limitation of this research.
During annotation, two classes were defined: drowning and swimming. Drowning was annotated for individuals exhibiting visible distress or loss of motor control in water, including vertical body posture, minimal or ineffective limb movement, head tilted backward or intermittently submerged, and lack of forward progression. In contrast, swimming was annotated for individuals demonstrating controlled and intentional movement, such as horizontal body posture, coordinated arm and leg motions, and stable head positioning above water. Frames in which the state of the subject could not be confidently determined were excluded from the dataset to reduce label ambiguity.
The labeling process followed a manual annotation protocol in which bounding boxes were drawn around the primary subject in each image and assigned to one of the two classes based on the defined visual criteria. Annotations were performed consistently across datasets, and class definitions were fixed prior to labeling to ensure uniformity.
To improve generalization across varying viewpoints, environments, and lighting conditions, a standardized data augmentation pipeline was applied. Augmentations included horizontal and vertical flipping, rotations of 30 degrees and 90 degrees, and brightness adjustments in the range of −15% to +15% relative to the original image intensity. Augmentations were applied only to the training set, resulting in the final dataset size of 5916 images.
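The flip, right-angle rotation, and brightness operations can be sketched with NumPy as follows (a simplified illustration of the augmentation pipeline, not the Roboflow implementation; the 30-degree rotation, which requires interpolation, is omitted):

```python
import numpy as np

def hflip(img):
    """Mirror the image left-right (horizontal flip)."""
    return img[:, ::-1]

def vflip(img):
    """Mirror the image top-bottom (vertical flip)."""
    return img[::-1, :]

def rot90(img):
    """Rotate the image by 90 degrees counterclockwise."""
    return np.rot90(img)

def adjust_brightness(img, factor):
    """Scale pixel intensities; factors in [0.85, 1.15] match the ±15% range."""
    out = np.rint(img.astype(np.float32) * factor)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applying these transforms only to the training split, as described above, keeps the validation and test sets free of synthetic variants.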
The dataset was split into 80% for training, 10% for validation, and 10% for testing. Representative samples from each class and scenario are illustrated in Figure 4, including different scenarios: a drowning person in a pool without people (a), with people present (b), a fully submerged victim (c), with people in a wide view (d), without people in a wide view (e), and without people in a wide view from a different camera angle (f). As shown in Figure 5a–c for Dataset 1, Dataset 2, and Dataset 3, respectively, the datasets exhibit significant class imbalance, with swimming instances substantially outnumbering drowning instances across all splits, which remains a key challenge.
For the USV detection model, a total of 2112 images were manually collected, capturing the USV in different positions and lighting conditions. To improve data diversity, each training image was augmented to produce three variations using horizontal flipping and color and exposure transformations, including hue adjustment (±20°), saturation shift (±20%), brightness change (±15%), exposure variation (±15%), and Gaussian blur up to 0.3 pixels. The dataset was divided into 88% for training, 8% for validation, and 4% for testing. Representative samples are shown in Figure 6, including the USV at a far distance in the pool (a), at a close distance in the pool (b), moving at a close distance in the pool (c), moving away at a far distance in the pond (d), moving horizontally at a far distance in the pond (e), and moving closer at a far distance in the pond (f).

3.1.2. Model Selection

To identify the most suitable model architecture for drowning detection, a comprehensive evaluation was conducted using multiple versions of the YOLO object detection models, including YOLOv5 [25], YOLOv8 [26], YOLOv9 [27], YOLOv10 [28], YOLO11 [29], and YOLO12 [30]. In addition, SSD300 with a VGG16 backbone [31] was evaluated for drowning detection, and MobileNetV2 was evaluated for the USV detection model. Models were evaluated based on detection accuracy, generalization capability across aquatic environments, robustness to variations in lighting, viewpoint, and background complexity, and computational efficiency suitable for edge deployment. Given the safety-critical nature of drowning detection, priority was placed on models that maintained stable precision and recall when trained and tested on diverse datasets, thereby minimizing false negatives in unseen scenarios, and on a high mAP@0.5 at a 0.001 confidence threshold (the default YOLO threshold) for navigation purposes. For USV detection, where object appearance is consistent but environmental conditions vary, emphasis was placed on achieving high detection accuracy with minimal inference latency. In both tasks, inference speed, model complexity, and training stability were considered alongside standard performance metrics to ensure reliable operation under the resource and timing constraints of this prototype rescue system.
All models were trained for 200 epochs with no early stopping, at an input resolution of 640 × 640 pixels, using an automatic batch size targeting 90% of GPU memory. The default optimizers were used: Stochastic Gradient Descent (SGD) for YOLOv5, YOLOv8, YOLOv9, YOLOv10, and VGG16 + SSD300, and Adam with Weight Decay (AdamW) for YOLO11 and YOLO12. A cosine annealing learning rate schedule was applied, with an initial learning rate of 0.01 and a final learning rate of 0.0001. Evaluation was based on mAP@0.5, precision, and recall on the testing dataset. Computational efficiency was also assessed in terms of inference time per image. However, the experiments were conducted using a single random seed; therefore, the stability and variance of the reported metrics are not claimed.
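The cosine annealing schedule described above can be sketched as follows (a schedule of this general shape, decaying from 0.01 to 0.0001 over 200 epochs; the exact Ultralytics implementation may differ in warmup and interpolation details):

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_initial=0.01, lr_final=0.0001):
    """Cosine annealing from lr_initial (epoch 0) down to lr_final (last epoch)."""
    progress = epoch / (total_epochs - 1)              # 0.0 at start, 1.0 at end
    cos_term = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1.0 -> 0.0
    return lr_final + (lr_initial - lr_final) * cos_term
```

The learning rate thus decreases slowly at the start and end of training, with the fastest decay in the middle epochs.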
The individual evaluations showed that Dataset 2 yielded higher detection performance across all YOLO versions. Notably, YOLOv8m achieved the highest mAP@0.5 of 0.9729, followed closely by YOLO12m at 0.9718 and YOLOv9m at 0.9689, while VGG16 + SSD300 achieved 0.7490, as summarized in Table 1. All three models also demonstrated strong precision and recall values above 0.93, indicating high data quality and clear separation between classes.
In comparison, Dataset 1 resulted in slightly lower performance. The best performing model in this case was YOLOv5m, which achieved an mAP@0.5 of 0.8779. More recent models such as YOLOv8m through YOLO12m scored between 0.8588 and 0.8737. While precision across all models remained high, exceeding 0.86, recall scores varied more substantially, ranging from 0.7786 to 0.8206. This variation may be attributed to class imbalance or greater intra-class diversity.
When both datasets were combined, also shown in Table 1, the results demonstrated strong generalization. YOLO12m achieved the highest overall mAP@0.5 of 0.9284, followed by YOLOv9m, YOLOv8m, and YOLOv5m. Precision remained above 0.89 for all models, while recall ranged from 0.8435 to 0.8602. Inference time analysis indicated that newer models, such as YOLOv9m through YOLO12m, required slightly longer processing times per image (93.06 to 134.72 milliseconds) compared to earlier versions, reflecting a trade-off between speed and accuracy. While YOLOv8m achieved slightly higher accuracy on Dataset 2, YOLO12m was selected as the core detection model for the system because it provided the most consistent balance across multiple evaluation criteria. Additional comparisons with a non-YOLO architecture further reinforced the advantage of YOLO12m. VGG16 + SSD300 achieved respectable scores of 0.4401, 0.8112, and 0.7490 for precision, recall, and mAP@0.5, respectively, on Dataset 2, which is a non-diverse dataset, but when trained on a more diverse dataset its performance dropped substantially, to only 0.0238, 0.7070, and 0.5046 for precision, recall, and mAP@0.5, respectively. In addition to competitive detection accuracy, YOLO12m demonstrated strong generalization when tested on the combined dataset, maintained high precision and recall across diverse scenarios, and offered stable inference speed suitable for real-time rescue operations. Its architectural improvements, such as advanced attention mechanisms and optimized gradient flow, further enhance robustness when deployed on resource-constrained edge devices.
Among the evaluated YOLO12 model variants, as shown in Table 2, YOLO12m was selected as the final model for deployment due to its optimal balance between detection performance and computational efficiency. Smaller variants such as YOLO12n and YOLO12s demonstrated lower inference times of 69.90 and 77.07 milliseconds per image, respectively, but did not surpass YOLO12m in detection accuracy. YOLO12m achieved the highest mAP@0.5 of 0.9284, with a precision of 0.9249 and a recall of 0.8602. Larger variants, including YOLO12l and YOLO12x, provided no substantial improvement in mAP@0.5 while incurring significantly higher inference costs of 199.42 and 321.18 milliseconds per image, respectively. Therefore, YOLO12m represents a practical solution for near real-time drowning detection, particularly in resource-constrained or embedded environments where both accuracy and speed are critical.
For USV detection, YOLOv5m achieved the highest overall accuracy with a precision of 0.9768, a recall of 0.9765, and an mAP@0.5 of 0.9848. In contrast, YOLOv9m and the 92-layer variant performed better in multi-IoU evaluation, indicating stronger generalization, as shown in Table 3. Although newer versions such as YOLOv10 through YOLO12 produced competitive results, they did not surpass YOLOv5m or YOLOv9m in aggregate metrics. YOLOv9m had the longest inference time, whereas YOLOv10m achieved a favorable balance between speed and accuracy, highlighting its potential for near real-time applications.
Although several YOLOv5 variants demonstrated strong performance, YOLOv5m was chosen as the preferred model due to its superior detection accuracy and efficient runtime. It achieved the highest mAP@0.5 of 0.9848, with a precision of 0.9768 and a recall of 0.9765. Although YOLOv5s provided slightly faster inference, it did not match YOLOv5m in overall accuracy, as reported in Table 4. Larger models such as YOLOv5l and YOLOv5x incurred significantly higher inference costs of 11.34 and 32.66 milliseconds per image, respectively, without improving mAP@0.5. YOLOv5m therefore represents an effective compromise, delivering state-of-the-art detection performance while maintaining low computational overhead, making it well suited for deployment in real-time USV detection systems.
YOLOv5 was selected as the most suitable model for USV detection in this task based on its strong performance across both accuracy and efficiency metrics, particularly on a dataset characterized by environmental diversity but consistent object characteristics. Since the dataset features the same object, a boat with a magenta ball shown in Figure 2, captured under varying lighting conditions, water surface patterns, and camera angles, the ability of the model to generalize across these variations while maintaining high precision is critical. YOLOv5 has a mature and well-optimized architecture with proven robustness in small to moderate dataset scenarios, which aligns well with the limited sample size used in this study. Unlike newer models such as YOLOv9 through YOLO12, which incorporate advanced attention mechanisms and deeper layers, YOLOv5 maintains a balance between speed, stability, and accuracy without overfitting to scene-specific visual features. Its faster inference time and lighter computational load also make it suitable for edge deployment on devices with limited resources, such as a Raspberry Pi or onboard processors in USVs. These real-world constraints, combined with consistent object appearance across diverse backgrounds, make YOLOv5 a practical and dependable choice for this application.

3.2. Controlled Navigation Experiment

The controlled navigation experiment was designed to assess whether the USV could autonomously navigate toward predefined targets with sufficient precision when guided exclusively by the monitoring station. The primary objective of this experiment was to evaluate reactive heading control and closed-loop guidance behavior based on visual target direction, rather than to achieve metric localization or global position accuracy. The experiment was conducted in a real environment with real-time computation for the USV detection and navigation system. However, due to ethical and safety considerations, stationary buoys were used as repeatable victim surrogates. No hardware trials were conducted with human subjects, and victim motion dynamics (e.g., drift, intermittent submergence, or irregular movement) were not physically simulated. Accordingly, this experiment focuses on control performance under stable and repeatable target conditions rather than end-to-end operation under realistic swimmer motion.

3.2.1. Experimental Setup

The experiment was conducted in a still-water pool with dimensions of 6.0 × 4.0 m. Three target locations were designated and marked with buoys positioned to the left, center, and right relative to the initial orientation of the USV. The straight-line distance between the starting point of the USV and each target buoy was fixed at 4.0 m.
A central monitoring station equipped with a vision-based detection system (camera) was placed behind the starting position of the USV and oriented opposite the direction of the target buoy, as shown in Figure 7. The monitoring station served as the primary controller, detecting the position of the USV with the previously trained model and issuing navigation commands based on its spatial relationship to the target.
Although the USV supports a nominal communication range of up to 150 m, the effective operational range of Robobuoy in this research was constrained by the camera-based visual field and experimental geometry rather than the radio-control link. The monitoring station was positioned to provide full visual coverage of the pool area, ensuring reliable detection of both the USV and target markers. Consequently, the maximum validated monitoring distance corresponds to the camera-to-target distance achievable within the pool environment under still-water conditions and stable lighting. System performance beyond this range, as well as in larger-scale or outdoor environments, was not evaluated and is therefore outside the validated scope of this work.

3.2.2. Procedure

For each of the three target positions, namely left, center, and right, the USV was deployed in three separate trials, resulting in a total of nine experimental runs. At the beginning of each run, the USV was positioned at the starting point facing forward, while the monitoring station initiated the vision-based detection process to localize the vehicle. Once the initial position was confirmed, the system computed the heading vector toward the assigned buoy and transmitted control commands in real time, guiding the USV along its trajectory. The navigation objective was defined as approaching within 1.0 m of the target buoy, evaluated visually by verifying that the USV came within approximately one boat width (≈0.12 m) of the buoy or made physical contact with it. Throughout the trials, the process was executed fully autonomously without human intervention, relying solely on visual feedback and onboard computation. This setup was designed as a prototype experimental framework, balancing the constraints of available resources with the need for reliable and repeatable testing conditions.
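The heading computation described above can be sketched as a single reactive guidance step. The following is a simplified illustration, assuming planar coordinates, a known USV heading, and a counterclockwise-positive angle convention; the function name, the turn deadband, and the command vocabulary are hypothetical and not taken from the paper's implementation.

```python
import math

def heading_command(usv_xy, usv_heading_deg, target_xy,
                    reach_radius=1.0, turn_deadband_deg=10.0):
    """One reactive guidance step: stop inside the success radius,
    otherwise turn toward the target or drive forward.
    Positions are in a common planar frame (e.g., metres in pool coordinates).
    """
    dx = target_xy[0] - usv_xy[0]
    dy = target_xy[1] - usv_xy[1]
    if math.hypot(dx, dy) <= reach_radius:
        return "stop"  # within the 1.0 m navigation objective
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed heading error wrapped to (-180, 180]; positive = target to the left.
    error = (bearing - usv_heading_deg + 180.0) % 360.0 - 180.0
    if error > turn_deadband_deg:
        return "turn_left"
    if error < -turn_deadband_deg:
        return "turn_right"
    return "forward"
```

Running this step in a loop on each fresh detection yields the closed-loop behavior evaluated in the trials: a target directly ahead produces "forward", while an offset buoy first produces turn commands until the heading error falls inside the deadband.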

3.2.3. Results and Performance Analysis

All nine trials were completed successfully. The USV reached within 1 m of the target buoy in every case, demonstrating reliable navigation performance across all positions:
  • Left target: 3/3 successful trials;
  • Center target: 3/3 successful trials;
  • Right target: 3/3 successful trials.
In addition to spatial accuracy, the travel time from launch to target proximity was recorded, as shown in Figure 8. Completion times ranged from 11 to 23 s across the trials. The variance was attributed primarily to differences in the initial detection latency by the monitoring station. In some cases, the USV required additional time to be recognized by the camera system, which delayed the initiation of movement commands. While all nine trials were successful, the limited number of runs resulted in a wide confidence interval. The reported success rate therefore reflects consistent performance under controlled conditions rather than a statistically conclusive reliability measure.
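To make the width of that interval concrete, the one-sided 95% exact (Clopper-Pearson) lower bound for nine successes in nine trials can be computed directly. This calculation is an illustrative addition and not part of the paper's reported analysis.

```python
# For k == n (all trials successful), the one-sided Clopper-Pearson
# lower confidence bound simplifies to alpha ** (1 / n).
def cp_lower_all_success(n: int, alpha: float = 0.05) -> float:
    return alpha ** (1.0 / n)

print(round(cp_lower_all_success(9), 3))  # 0.717
```

That is, nine successes out of nine are statistically consistent with a true success rate as low as roughly 72%, which is why the observed 100% success rate should be read as evidence of consistent behavior under controlled conditions rather than a precise reliability estimate.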

4. Discussion

This study demonstrates a novel integration of real-time object detection with autonomous surface navigation for responsive drowning intervention in unsupervised aquatic environments. By combining state-of-the-art deep learning with embedded robotic platforms, the proposed Robobuoy system addresses a longstanding gap in water safety research: the lack of scalable, automated, and cost-efficient alternatives to human-dependent rescue operations. Findings from both hardware and software evaluations highlight not only the technical feasibility of the system but also its adaptability for deployment in resource-constrained or geographically remote settings where lifeguard coverage is limited or absent.
The comparative benchmarking of YOLO architectures provided important insights into model suitability for distinct tasks. For drowning detection, YOLO12m was selected due to its consistent balance between detection accuracy, generalization capacity, and efficiency on edge devices. Its architectural features, such as the Area Attention Mechanism and R-ELAN, contributed to improved recognition of partially visible individuals, an essential requirement in dynamic water environments. By contrast, YOLOv5m outperformed newer models for USV detection, reflecting the characteristics of the dataset: a visually consistent object with minimal intra-class variance. In this context, the additional complexity of advanced attention mechanisms offered little benefit, whereas the efficiency and lower inference time of YOLOv5m proved advantageous for real-time control on constrained hardware, i.e., the Raspberry Pi 5 used in this study. Together, the dual-model framework illustrates how optimal system design requires a task-specific alignment of algorithmic capabilities with operational constraints.
The controlled navigation experiment further strengthened confidence in the proposed approach. Across all trials, the USV reliably reached within 1 m of its target using only camera-based localization and lightweight geometric navigation. Although end-to-end latency on the Raspberry Pi 5 was not explicitly measured, the 3 s fail-safe threshold was implemented as a safety constraint, and it was never triggered during the experiments. These findings confirm that a low-cost visual feedback loop can achieve sufficient accuracy for short-range rescue tasks, reducing the need for expensive onboard sensors. Such results are particularly significant for enabling deployment in low- and middle-income regions, where budgetary constraints often limit access to advanced robotics.
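The 3 s fail-safe can be sketched as a simple detection watchdog that commands a stop when no fresh detection arrives within the threshold. The class below is a hypothetical illustration, not the system's actual implementation; the injectable clock exists only to make the behavior testable without real delays.

```python
import time

class DetectionWatchdog:
    """Trips when no detection has been registered within `timeout_s`.

    A control loop would call on_detection() each time the monitoring
    station recognizes the USV, and stop the vehicle whenever tripped()
    returns True. Names and structure are illustrative assumptions.
    """
    def __init__(self, timeout_s: float = 3.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def on_detection(self) -> None:
        """Record a fresh detection, resetting the timeout window."""
        self.last_seen = self.clock()

    def tripped(self) -> bool:
        """True if the fail-safe threshold has been exceeded."""
        return self.clock() - self.last_seen > self.timeout_s
```

Using a monotonic clock (rather than wall-clock time) keeps the timeout immune to system clock adjustments, which matters for a safety constraint of this kind.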
Despite these promising results, several limitations warrant attention. First, the drowning detection dataset is highly imbalanced, with more swimming examples than drowning cases. This imbalance could bias the model’s learning and lead to high mAP scores that do not truly reflect real-world conditions. Second, the reliance on clear line-of-sight image detection raises concerns regarding robustness under environmental challenges such as glare, water turbulence, or low-light conditions. Although the dataset incorporated variations in brightness and perspective, further evaluation in adverse environments is necessary. Third, the navigation strategy assumed obstacle-free water, but in natural conditions, debris, vegetation, or other vessels could disrupt performance. Future enhancements may integrate complementary sensing modalities, such as LiDAR or sonar, with SLAM-based algorithms to improve navigation safety. Finally, all experiments were conducted in a controlled pool environment. Scaling to larger and more complex water bodies will be essential to assess robustness under environmental influences such as wind, current, or turbidity.
In summary, this study delivers both technical and societal contributions by presenting an intelligent, low-cost drowning intervention system that leverages computer vision and robotics. The complementary use of YOLO12m and YOLOv5m demonstrates that a hybrid detection strategy can simultaneously achieve robustness and real-time efficiency. By validating performance through both hardware and software experiments, this work advances the development of autonomous rescue systems and provides a foundation for future interdisciplinary efforts at the intersection of artificial intelligence, robotics, and public health.

5. Conclusions

This study presented Robobuoy, a real-time drowning intervention system that integrates deep learning-based object detection with autonomous USV navigation. Designed for deployment in unsupervised and resource-limited aquatic environments, the system addresses the urgent global challenge of drowning through a cost-effective technological prototype. The architecture combines a monitoring station, equipped with dual detection models and camera-based localization, with a USV enhanced by visual markers and a radio-controlled interface.
Extensive evaluations confirmed the feasibility and reliability of the proposed approach. The YOLO12m model was identified as the most effective for drowning detection, providing high accuracy, strong generalization, and efficient edge-device performance across diverse datasets. For USV tracking, YOLOv5m achieved superior stability and efficiency, making it the optimal choice for integration with resource-constrained hardware. Hardware experiments further validated the ability of the system to achieve accurate and consistent autonomous navigation, with the USV reliably reaching designated targets in all trials.
By leveraging advances in computer vision and embedded robotics, Robobuoy delivers a robust, low-latency solution for drowning detection and intervention. Its practicality in contexts where conventional lifeguard coverage is unavailable highlights its potential societal value, particularly for low- and middle-income regions disproportionately affected by drowning. Future work will extend this research by mitigating class imbalance in drowning detection datasets through improved data collection, resampling strategies, or cost-sensitive learning, as well as addressing environmental challenges such as glare, turbulence, and low-light conditions, incorporating obstacle avoidance strategies, and validating performance through large-scale field trials in natural water bodies. Overall, this research demonstrates how AI-driven autonomous rescue systems can be applied to one of the most pressing global public health concerns, establishing a foundation for future interdisciplinary efforts in intelligent water safety systems that can ultimately contribute to saving lives.

Author Contributions

Conceptualization, K.S., N.V., T.T. (Thanapat Tardtong) and T.T. (Tanatorn Tanantong); methodology, K.S. and N.V.; software and hardware, N.V., T.T. (Thanapat Tardtong) and C.P.; validation, K.S. and T.T. (Tanatorn Tanantong); formal analysis, K.S. and N.V.; investigation, K.S.; resources, K.S. and T.T. (Tanatorn Tanantong); data curation, K.S., N.V. and T.T. (Thanapat Tardtong); writing—original draft preparation, K.S., N.V. and T.T. (Thanapat Tardtong); project administration, T.T. (Tanatorn Tanantong); writing—review and editing, K.S. and T.T. (Tanatorn Tanantong); visualization, K.S.; supervision, T.T. (Tanatorn Tanantong). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Faculty of Science and Technology, Thammasat University (grant number SciGR 25/2567).

Institutional Review Board Statement

The study was approved by the Human Research Ethics Committee of Thammasat University (Science), Thailand (COA No. 140/2567, Project No. 67SC147, approved on 24 December 2024).

Data Availability Statement

The raw data supporting the findings of this study are available online and from the corresponding author upon request. Further inquiries should be directed to the corresponding author.

Acknowledgments

During the preparation of this study, the authors used ChatGPT 5 to check the grammar. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. System architecture overview.
Figure 2. Commercially available radio-controlled USV with magenta ball for marking.
Figure 3. Monitoring station prototype integrating visual sensing, control interface, and communication.
Figure 4. Sample of drowning images: (a) alone in pool, (b) with people present, (c) fully submerged, (d) wide view with people, (e) wide view alone, (f) wide view from different angle.
Figure 5. Distribution of drowning and swimming instances across training, validation, and test splits for (a) Dataset 1, (b) Dataset 2, and (c) Dataset 3.
Figure 6. Sample USV images: (a) far in pool, (b) close in pool, (c) moving close in pool, (d) moving away in pond, (e) moving horizontally in pond, (f) moving closer in pond.
Figure 7. Experimental environment setup.
Figure 8. USV navigation trials, where colors indicate different runs and dots indicate the position of the USV every second: (a) left target, (b) center target, (c) right target.
Table 1. Comparative evaluation of YOLO versions across datasets.

Dataset  Model            Precision  Recall   mAP@0.5  Time *
1        YOLOv5m          0.8944     0.8058   0.8779    88.12
1        YOLOv8m          0.8684     0.8094   0.8730    93.74
1        YOLOv9m          0.8672     0.7990   0.8718   116.16
1        YOLOv10m         0.8921     0.7786   0.8588    88.30
1        YOLO11m          0.8829     0.8206   0.8737    95.18
1        YOLO12m          0.8889     0.8064   0.8698   124.90
1        VGG16 + SSD300   0.0182     0.5206   0.1879     8.14
2        YOLOv5m          0.9094     0.9403   0.9664    84.16
2        YOLOv8m          0.9316     0.9568   0.9729    98.97
2        YOLOv9m          0.9387     0.9434   0.9689   127.22
2        YOLOv10m         0.9321     0.9340   0.9643    94.62
2        YOLO11m          0.9112     0.9444   0.9662   106.07
2        YOLO12m          0.9401     0.9431   0.9718   136.76
2        VGG16 + SSD300   0.4401     0.8112   0.7490     7.47
3        YOLOv5m          0.9171     0.8512   0.9117    83.72
3        YOLOv8m          0.8903     0.8499   0.9138    93.28
3        YOLOv9m          0.9081     0.8589   0.9155   121.20
3        YOLOv10m         0.9011     0.8493   0.9099    93.06
3        YOLO11m          0.9154     0.8435   0.9076    99.31
3        YOLO12m          0.9249     0.8602   0.9284   134.72
3        VGG16 + SSD300   0.0238     0.7070   0.5046     7.88
* Inference time in milliseconds per image.
Table 2. Performance of YOLO12 model variants on drowning detection.

Model     Precision  Recall   mAP@0.5  Time *
YOLO12n   0.9245     0.8362   0.9182    69.90
YOLO12s   0.8968     0.8777   0.9184    77.07
YOLO12m   0.9249     0.8602   0.9284   134.72
YOLO12l   0.9054     0.8819   0.9204   199.42
YOLO12x   0.8927     0.8670   0.9191   321.18
* Inference time in milliseconds per image.
Table 3. Performance of USV tracking models.

Model            Precision  Recall   mAP@0.5  Time *
YOLOv5m          0.9778     0.9775   0.9858    9.67
YOLOv8m          0.9715     0.9720   0.9776   10.24
YOLOv9m          0.9718     0.9645   0.9795   12.43
YOLOv10m         0.9767     0.9386   0.9717    9.44
YOLO11m          0.9766     0.9434   0.9770    9.92
YOLO12m          0.9723     0.9734   0.9762   12.14
VGG16 + SSD300   0.9078     0.6782   0.6566   13.15
MobileNetV2      0.0252     0.1379   0.0252   22.82
* Inference time in milliseconds per image.
Table 4. Performance of YOLOv5 model variants on USV detection.

Model     Precision  Recall   mAP@0.5  Time *
YOLOv5n   0.9774     0.8658   0.9713   10.07
YOLOv5s   0.9832     0.9594   0.9725    8.04
YOLOv5m   0.9768     0.9765   0.9848    9.55
YOLOv5l   0.9709     0.9659   0.9734   11.34
YOLOv5x   0.9708     0.9663   0.9705   32.66
* Inference time in milliseconds per image.