Three distinct models were developed, each trained on datasets with varying compositions to assess the influence of data diversity on detection performance. The models were trained with an input resolution of 416 × 416 pixels, a batch size of 64, max_batches set to 8000, and an initial learning rate of 0.001. A detection was considered valid when the Intersection over Union (IoU) exceeded 0.5. Training and inference were performed on a workstation configured with an Intel Core i7-10700 @ 2.9 GHz CPU, 32 GB RAM, and an NVIDIA RTX 3090 GPU (24 GB), supported by CUDA 10.0 and cuDNN 7.0. Under this environment, the optimized BEACH model achieved an inference speed of approximately 40 FPS, exceeding the standard 30 FPS requirement for real-time video processing.
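The IoU criterion used above to validate detections can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixels; the function names `iou` and `is_valid_detection` are hypothetical and are not taken from the training framework used in this study.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_valid_detection(pred_box, truth_box, threshold=0.5):
    """A prediction counts as valid when its IoU with the label exceeds the threshold."""
    return iou(pred_box, truth_box) > threshold
```

For example, a predicted box shifted by half its width relative to the labeled box overlaps one third of the union and would therefore not count as a valid detection at the 0.5 threshold.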
3.1. Model Evaluation
Model performance was assessed using multiple quantitative metrics, as summarized in Table 4. The NOAA and Refined-NOAA models performed comparably, with differences of less than 3 percent in the accuracy metrics, reflecting the limited modifications made to the original NOAA dataset labels. In contrast, the BEACH model demonstrated superior performance, achieving a precision of 91%, a recall of 81%, an F1-score of 87%, and an average precision of 92%. This model consistently surpassed the other two across all evaluated metrics, making it the most robust and effective of the three. The improved performance of the BEACH model is likely attributable to its larger dataset, which offers a more extensive range of imagery for training.
As summarized in Table 5, the BEACH model exhibited a False Negative (FN) rate of 19%, a clear reduction compared with the FN rates of 36% and 38% observed in the NOAA-based models. Because channelized rip currents are typically characterized by spatial persistence and temporal stability, occasional missed detections in isolated frames are unlikely to compromise the identification of sustained rip current features. Within the precision–recall trade-off, the model was trained to prioritize precision, aligning with the operational needs of lifeguards and beach managers who require reliable information.
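The relationship between these metrics can be sketched as follows, assuming the standard definitions based on true positives (TP), false positives (FP), and false negatives (FN), with the FN rate taken as FN / (TP + FN). Under that assumption the FN rate is the complement of recall, which is consistent with the reported recall of 81% and FN rate of 19%. The function name and the counts in the usage note are illustrative only and are not the counts behind Tables 4 and 5.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, F1-score, and FN rate from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Share of true rip currents that were missed; equals 1 - recall.
    fn_rate = fn / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "fn_rate": fn_rate}
```

With hypothetical counts of 90 true positives, 10 false positives, and 30 false negatives, `detection_metrics(90, 10, 30)` gives a precision of 0.90, a recall of 0.75, and an FN rate of 0.25.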
Relying solely on numerical metrics may not comprehensively capture a model’s efficacy in real-world scenarios; thus, we analyzed differences in prediction using two test images. Figure 7a [65] shows a straightforward case with a distinctly visible dark rip current channel against white breaking waves and brown sand. In contrast, Figure 7b [29] presents a more complex scenario where the rip current occupies a smaller portion of the image. The two images, extracted from an educational site and a journal article, respectively, include marked rip current locations that serve as ground truth references for evaluating our model predictions. The prediction results in Figure 7 demonstrate that all three models (NOAA, Refined-NOAA, and BEACH) successfully identified the rip currents in the simpler Figure 7a scenario. However, for the more challenging Figure 7b, the Refined-NOAA and BEACH models outperformed the standard NOAA model. Notably, the BEACH model also outperformed the Refined-NOAA model, identifying all three rip currents present in Figure 7b, while the Refined-NOAA model detected only two. This suggests that the refined labeling approach used during training enhanced the model’s ability to recognize rip currents in complex environments in which the rip current occupies a smaller proportion of the image than the surrounding breaking waves and background elements.
3.2. Rip Current Detection in Videos
Timelapse videos were recorded during daylight hours from 11 November to 13 November 2022, at the Taitung Jinzun Recreation Area in Taiwan. The timeframe of these recordings is depicted in Figure 8 within the indicated red boxes. Additionally, we obtained videos from government-operated webcams, with the corresponding recording times detailed in the gray boxes in Figure 8. These video data represent the real-world scenarios used to assess the performance of our model and the reliability of our warning system framework. The blue curve in Figure 8 represents the normalized tidal height.
The BEACH model, identified as the most effective model in the preceding evaluation, was employed to detect rip currents in the video data. Figure 9 displays two examples of the model’s predicted results. Each image in Figure 9 is a frame extracted from the results of the inference process. The bounding boxes highlight the regions where the model has identified dark channel patterns associated with rip currents. Given that the camera is positioned slightly north of the beach’s center and rotates horizontally along the north–south axis, the perspectives captured in the images of the northern (Case A, on the left side of Figure 9) and the southern (Case B, on the right side of Figure 9) areas of the beach differ. Because the BEACH model processed the video faster than the 30 FPS frame rate, it generated numerous positive detections of rip currents within each one-minute recording interval for both cases. Due to the large volume of detections, only three representative frames from each case are shown in Figure 9. The timestamps of these frames are sequential but not continuous. The top images in Figure 9 are the earliest, followed by the middle images and then the bottom images, which have the latest timestamps. During video recording, we also conducted simultaneous visual observations of the beach, and in both cases, rip currents were indeed present and visible to the naked eye.
Case A captures the northern section of the beach at 1:04 PM on 11 November, under clear skies, with only a few clouds creating shadow patterns. In Case A, the model correctly identifies a rip current channel near the center-left area of the beach in the earliest frame. This channel exhibits typical visual characteristics of a rip current, such as a gap in the breaking waves and darker water. In the middle frame, the same rip current region remains detected, showing consistent recognition, but the model also produces a false positive near the far right of the beach, marked by an “x”. This false positive also appears in the last frame of Case A. These false positives likely result from image artifacts or obstruction: the view is partially blocked by vegetation, and shadows from transient or breaking waves resemble the dark features of rip currents, leading the model to mistakenly classify the area as a rip current.
Case B depicts the southern section at 3:47 PM on the same day, with overcast skies and reduced lighting. Although the webcam’s timestamp resolution is limited to one-second intervals (15:47:27), the model generates a prediction for every frame within each second and maintains accurate detection throughout the sequence; as a result, detections vary among frames that share the same timestamp. Four distinct rip current channels were detected throughout the sequence. While the locations of the detected rip currents remain consistent and accurate, there are minor variations in the dimensions of the bounding boxes across consecutive frames. These variations occur as the model responds to temporary changes in the surf zone, such as changes in foam distribution and lighting conditions. The model’s sensitivity to these visual variations highlights its capability to capture changes in the dynamic surf zone environment while detecting the fundamental structures of rip currents.
However, for rip current warning systems, relying on real-time detection results to trigger alerts can result in an excessive number of warnings or false alarms, leading to alarm fatigue among users. A reliable alarm system should accurately represent the true patterns of rip current activity, giving users a clearer understanding of when danger is imminent. Therefore, we use a time-based averaging technique to generate a TIMEX image that acts as a reference for alarm activation.
3.3. Time-Based Image Averaging
While time-averaging techniques have been employed in various studies on rip currents, most of these studies extracted images at regular, predetermined time intervals and computed the average pixel values across these frames to produce a single averaged output. In contrast, we selectively average only the frames in which the model has indicated a positive detection. Additionally, we established two criteria, a maximum and a minimum duration for the averaging time, which can be adjusted based on the characteristics of rip currents at different beaches. As a result, non-fixed time intervals are used in our averaging process, enabling a more dynamic and context-specific assessment.
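The selective averaging described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: it assumes that frames are NumPy arrays of identical shape, that a per-frame boolean flag marks positive detections, and that the minimum and maximum durations are interpreted as numbers of positive frames at a known frame rate. The function name `selective_timex` and its parameters are hypothetical.

```python
import numpy as np


def selective_timex(frames, is_positive, fps, min_seconds, max_seconds):
    """Average only the frames flagged as positive detections.

    Returns a uint8 image, or None when the positive frames span less
    than min_seconds (assumed interpretation of the minimum criterion).
    """
    min_frames = int(round(min_seconds * fps))
    max_frames = int(round(max_seconds * fps))

    # Keep positive frames only, capped at the maximum averaging duration.
    selected = [f for f, pos in zip(frames, is_positive) if pos][:max_frames]
    if len(selected) < max(min_frames, 1):
        return None

    # Average in floating point to avoid uint8 overflow, then convert back.
    stack = np.stack(selected).astype(np.float64)
    return stack.mean(axis=0).round().astype(np.uint8)
```

If the averaged frames are the inference outputs with bounding boxes already drawn, a box that recurs at the same location in most frames stays sharp in the result, whereas a box present in only a few frames fades toward the background.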
This specialized averaging method offers two key advantages. Firstly, it effectively filters out potential false positives generated by the model. Taking Case A in Figure 9 as an example, even though the model predicted some false positives, when applying our temporal averaging technique, these false detections become nearly imperceptible in the resultant TIMEX image (Figure 10a). Meanwhile, the model’s prediction emerges as a prominent bounding box that correctly identifies the genuine rip current channel, as confirmed by the researcher-drawn arrow in Figure 10a. Suppressing these false detections before they can trigger warnings ensures that users receive alerts only when rip currents are consistently present, thus maintaining user trust in the system. Secondly, our averaging method provides an effective means to evaluate the model’s real-world performance.
In Figure 9, the second bounding box on the left side of Case B exhibits noticeable variations in position and size across frames. These fluctuations pose challenges for a rip current alarm system; however, the averaging process mitigates this issue. In the resulting TIMEX image (Figure 10b), the second bounding box on the left side appears hazier and more diffused than the other three because it is a combination of multiple frames with shifting detections. Despite this reduced clarity, the rip current channel is still correctly identified within the averaged box, ensuring that genuine threats are not missed while preventing the system from issuing erratic or contradictory warnings. Importantly, the varying visual clarity in the TIMEX image reflects the model’s variable confidence and provides experts with crucial information for making alarm decisions. By distinguishing between sharp, well-defined detections and hazier, less stable ones, experts examining the TIMEX images can better assess which detections warrant immediate alerts and which require further observation, ensuring that warnings are issued based on robust, well-supported evidence rather than transient or uncertain predictions. Additionally, these visual characteristics provide valuable diagnostic information for identifying areas where the model performance requires refinement, enabling continuous improvement of the alarm system’s accuracy over time.
For an overall examination of the performance in the detection of rip current channels across different beach sections, Figure 11 presents representative TIMEX images alongside a panoramic image. The rectangular regions correspond to the northern (solid line), central (dash–dot line), and southern (dotted line) beach sections captured within fixed observational fields of view. The accompanying TIMEX imagery is organized by location: northern section (Figure 11a–c), central section (Figure 11d,e), and southern section (Figure 11f–i). Two special TIMEX types are highlighted: “zoomed-in” TIMEX images (Figure 11a,g), generated when the camera automatically zoomed in and paused at the northernmost and southernmost positions, and a “panning” TIMEX image (Figure 11f), generated during active camera rotation between viewing angles.
The deep learning model performs better in the southern section than in the northern section, which can primarily be attributed to vegetation interference and the spatial characteristics of each camera view. Coastal vegetation along the northern coastline blocks part of the view and leads to false positives, as shown in Figure 11b,c. This interference also affects the central section of the beach, resulting in multiple false detections in Figure 11e. In contrast, the camera has a clear view of the surf zone in the southern section, enabling more reliable detection performance. When the camera is zoomed in on the northern section, too little of the surf zone is visible (less than 50% of the total image area in Figure 11a), and the model misses one rip current feature, marked by a circle. Under the same conditions, the image of the southern section (Figure 11g) contains more surf zone coverage and more rip current features, giving the model enough information for accurate detection.
Additionally, the bounding boxes in Figure 11g appear fainter than those in Figure 11a–e because the rip current feature appears in fewer frames during the averaging period of 8 s. Under highly irregular wave conditions, the spatial coordinates and dimensions of the bounding boxes may vary between consecutive frames. In such cases, these variations lead to a fainter or more diffused bounding box in the resulting TIMEX image, as illustrated in Figure 11g. Moreover, the consecutive frame criterion, which requires at least 8 s of temporal continuity, functions as a secondary filter. This mechanism suppresses potential false positives caused by irregular lighting conditions or transient wave foam, ensuring that only temporally stable and physically meaningful rip current events are represented in the final visualization.
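The consecutive frame criterion can be illustrated with a short sketch. It assumes a per-frame sequence of boolean detection flags and a known frame rate, and it treats any single frame without a detection as breaking continuity; how brief gaps are handled in practice is not specified here, so that strictness is an assumption. The function name `stable_runs` is hypothetical.

```python
def stable_runs(is_positive, fps, min_seconds=8.0):
    """Return (start, end) frame index pairs, end exclusive, for runs of
    consecutive positive detections lasting at least min_seconds."""
    min_frames = int(round(min_seconds * fps))
    runs = []
    start = None  # index where the current run of positives began

    for i, pos in enumerate(is_positive):
        if pos and start is None:
            start = i
        elif not pos and start is not None:
            # The run ended at frame i; keep it only if it is long enough.
            if i - start >= min_frames:
                runs.append((start, i))
            start = None

    # Handle a run that extends to the final frame.
    if start is not None and len(is_positive) - start >= min_frames:
        runs.append((start, len(is_positive)))
    return runs
```

Only frames inside the returned runs would then contribute to a TIMEX image, so a detection that flickers on for a second or two because of foam or a passing shadow is discarded before averaging.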
The model demonstrates reliable detection performance even under challenging conditions such as low-light scenes and camera motion. The TIMEX images in Figure 11b,d,h were generated from frames captured at dusk, when lighting conditions were poor. Despite the reduced contrast, the model still produced positive rip current predictions, indicating its ability to operate under limited illumination. In Figure 11f, generated while the camera was actively panning from the southern to the northern view, the model still successfully identified the rip current. This suggests that the model’s inference speed is sufficient to capture transient rip current features even during motion, allowing meaningful predictions to be made despite camera movement.