eCBAM and saSIoU Co-Optimized YOLOv11 for Riverine Floating Garbage Classification Under Complex Aquatic Scenarios

Zhao, Ziyu; Huang, Wenquan; Li, Teng; Zhu, Jing

doi:10.3390/app16020651

¹ College of Water Conservancy and Civil Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China

² School of Artificial Intelligence, Anhui University, Hefei 230601, China

³ College of Art and Design, Nanning University, Nanning 530200, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(2), 651; https://doi.org/10.3390/app16020651

Submission received: 7 December 2025 / Revised: 28 December 2025 / Accepted: 1 January 2026 / Published: 8 January 2026

Abstract

This paper proposes a method for classifying floating garbage in rivers based on a modified YOLOv11 algorithm, addressing the poor recognition of river floating objects by conventional object detection algorithms. First, the approach integrates an enhanced CBAM (eCBAM) that applies multi-scale channel attention to extract the features of floating objects of different sizes, together with boundary-enhanced spatial attention to highlight target edge features. Second, a scenario-adapted SIoU loss function (saSIoU) is presented, which contains an angle-sensitivity enhancement for large targets, shape-adaptive coefficients for irregular floating objects, and dynamic boundary blur tolerance for complex aquatic environments. Experimental validation on a self-collected dataset of river floating objects, containing six categories and 12,000 images, shows that the improved model achieves an mAP@0.5 of 86.48%, an mAP@0.95 of 56.44%, a precision of 80.43%, and a recall of 84.36%. Compared with the original YOLOv11, mAP@0.5 increases by 2.65 percentage points and mAP@0.95 by 4.27 percentage points, while the model remains lightweight (2.60 M parameters, 6.44 giga floating-point operations (GFLOPs)). The proposed method offers better detection accuracy while retaining real-time performance, and can provide a relatively reliable technical approach for the intelligent cleaning of river floating garbage and for water environment management.

Keywords: river floating garbage; object detection; YOLOv11; attention mechanism; loss function optimization; environmental monitoring; computer vision

1. Introduction

1.1. Research Background and Significance

With rapid urbanization and industrialization, municipal and industrial wastes are increasingly flushed into water bodies, worsening river pollution. Floating debris such as plastic fragments, discarded furniture, and withered foliage poses considerable dangers to aquatic environments [1]. According to the UNESCO World Water Development Report 2024, more than 10 million tons of plastic waste flow into the world’s waters every year, with 30% ending up in rivers. This debris blocks hydraulic facilities, kills aquatic animals, and degrades water quality [2]. The traditional methods for dealing with riverine floating debris are manual patrols and salvage, which are inefficient, time-consuming, costly, and unsafe in bad weather [3]. Thus, there is an increasing demand for automatic, real-time, and highly accurate technologies that can detect floating objects.

Deep learning-driven object detection has emerged as an effective approach to environmental monitoring, and is well suited to complex tasks such as tracking aquatic pollution and protecting natural habitats. The YOLO (You Only Look Once) series, which detects objects in a single pass over the image, has been widely used to detect objects on the water surface. For example, YOLOv7-tiny has been used for the detection of floating debris in lake environments [4], whereas YOLOv8 has been used for the recognition of marine debris [5]. However, YOLOv11, the latest lightweight version, faces three major limitations in rivers: (1) water surface reflections and waves blur target edges, causing debris to be confused with background water ripples [6]; (2) floating objects such as crumpled plastic bags and branching logs have extremely irregular morphologies and sizes, which challenge traditional feature extraction techniques [7]; (3) larger floating items tend to rotate in the current, which creates large offsets between predicted bounding boxes and ground-truth annotations [8]. These scenario-specific challenges need to be addressed so that object detection technologies can be practically applied in river environmental management.

1.2. Related Work

1.2.1. Object Detection in Aquatic Environmental Monitoring

The application of object detection to water surface target recognition has gained momentum in recent years. Li et al. [9] proposed a YOLOv7-based system with a feature pyramid network (FPN) enhancement to detect water surface garbage, improving the small target recall by 12% but failing to address boundary blur caused by reflection. Zhao [10] integrated a triplet attention mechanism into YOLOv8, achieving 85.7% mAP@0.5 on a river floating object dataset; however, the model showed limited robustness to large target rotation. Beyond the YOLO family, Yi et al. [11] adopted a Faster R-CNN with a deformable convolution module for floating debris detection, but its inference speed of 15 frames per second (FPS) made it unsuitable for real-time monitoring.

Marine and lake environments have also been examined in related studies. For example, Li et al. [12] designed a lightweight detector based on MobileNetV3 for offshore floating object detection, achieving 78.9% mAP@0.5 at 30 FPS, but this model was not optimized for the high reflection and turbulence common in rivers. Zou et al. [13] used U-Net++ to segment floating algae in a lake, but segmentation-based methods cannot provide the category information needed for garbage sorting. These studies show that object detection has made progress in aquatic scenes but still lacks optimizations targeted at the specific challenges of rivers (strong reflection, irregular targets).

1.2.2. Attention Mechanism for Target Detection

The attention mechanism, which dynamically assigns weights to key information, has become an important way to improve feature quality. The Convolutional Block Attention Module (CBAM) [14] combines channel and spatial attention to effectively suppress background noise. However, the original CBAM relies on single-scale global pooling, which limits feature extraction for objects of different scales [15]. To tackle this issue, Wang et al. [16] proposed a multi-scale CBAM (MS-CBAM) that employs 1 × 1, 3 × 3, and 5 × 5 pooling kernels, which results in an 8.5% improvement in small-object detection accuracy on the Pascal VOC dataset. In addition, Liu et al. [17] proposed EE-CBAM, which adds Canny edge detection to spatial attention and makes boundaries more prominent in medical image segmentation.

In aquatic scenes, attention mechanisms have been used to alleviate reflection. Zhao et al. [18] added a reflection-aware attention module (RAM) to YOLOv6, which uses a brightness threshold to mask reflective areas and preserve feature integrity. Their model attained an mAP of 79.6% on a dataset of garbage on water surfaces, but RAM relies on handcrafted thresholds and adapts poorly to different lighting conditions. Overall, existing attention-based methods have not sufficiently addressed the size distribution and shape non-uniformity of floating objects on rivers, which restricts their practical application.

1.2.3. Loss Function Optimization for Bounding Box Regression

The loss function guides model training and largely determines the accuracy of bounding box regression. Traditional IoU loss [19] focuses only on the overlap of the predicted and ground-truth bounding boxes, without considering other spatial relationships such as the distance between the centers and the shape of the object. To mitigate this, Rezatofighi et al. [20] proposed Generalized IoU (GIoU), which incorporates the area of the smallest enclosing rectangle to penalize non-overlapping boxes. Distance-IoU (DIoU) [21] and Complete IoU (CIoU) [22] integrate center distance and aspect ratio constraints, respectively.

SIoU [23], a more recent development, adds an angle cost that measures the angle difference between bounding boxes, which results in a 5–10% improvement in regression accuracy over CIoU for rotated targets. Although general-purpose, SIoU lacks specializations for aquatic targets. For example, Li et al. [24] pointed out that SIoU performs poorly on irregular water-surface targets owing to its rigid shape-penalty mechanism, which does not accommodate different aspect ratios. To resolve these limitations, Zhang et al. [25] introduced an Adaptive SIoU (A-SIoU) with dynamically changing shape coefficients for marine debris detection; however, A-SIoU cannot handle boundary blurring and the rotation of large targets, both of which are important in rivers. These deficiencies emphasize the need for a custom loss function for detecting floating objects in rivers.

1.3. Research Content and Innovations

This study aims to optimize YOLOv11 for the classification of riverine floating garbage, with three main research objectives: (1) to develop an enhanced CBAM (eCBAM) attention mechanism that improves multi-scale feature extraction and boundary recognition, addressing the challenges posed by target size variation and boundary blurring in riverine environments; (2) to create a scenario-adapted SIoU (saSIoU) loss function that enhances bounding box regression for rotated large targets and morphologically irregular floating objects; (3) to validate the improvement of the model through extensive experiments, including ablation studies, comparative experiments, and robustness experiments, on a self-collected dataset of riverine floating objects.

The main innovations of this study can be summarized as follows. First, eCBAM incorporates multi-scale weighted channel attention that is guided by the empirical size distribution of riverine floating garbage (72.93% large, 18.62% medium, and 8.45% small). Specifically, it uses 1 × 1, 2 × 2, and 4 × 4 adaptive pooling layers with learnable scale weights initialized to [0.7, 0.2, 0.1], thereby ensuring balanced representation across different object sizes instead of overemphasizing large targets. Compared with the original CBAM, which relies on a single global pooling operation, this design markedly strengthens the feature extraction capability for small and medium objects that occupy fewer pixels. In addition, eCBAM employs a residual fusion strategy of the form x + α(attention − x), which enhances discriminative features while preserving the original feature distribution, thereby stabilizing training and mitigating information loss caused by overly aggressive reweighting.

Second, eCBAM integrates Sobel edge detection into the spatial attention mechanism to compute the magnitudes of horizontal and vertical gradients, thereby enhancing boundary feature representation and reducing boundary blur caused by water-surface reflections. Specifically, a dedicated edge branch extracts gradient-based boundary cues, which are then fused with traditional spatial statistics via a 1 × 1 convolution. This dual-branch spatial attention enables the network to better distinguish true object contours from wave patterns and specular highlights, thereby addressing a key failure mode of conventional CBAM in reflective aquatic environments. Furthermore, a shape-adaptive weighting mechanism is introduced to dynamically adjust the relative contributions of channel and spatial attention according to the characteristics of the input features. This mechanism allows the model to emphasize channel information for texture-rich, branching debris and spatial information for smooth, planar floating objects.

Finally, we introduce a scenario-adapted saSIoU loss tailored to riverine floating-object detection. This loss applies an angle-sensitive scaling factor of 1.4 for large targets (normalized area > 0.1) to reduce rotation-induced regression errors that commonly occur when bulky objects, such as logs or foam boards, drift and rotate with the current. It further employs a shape-adaptive coefficient of 0.9 when the aspect ratio difference between predicted and ground-truth boxes exceeds 0.3, thereby relaxing overly strict penalties on highly irregular or elongated shapes and improving the model’s adaptability to diverse floating debris. In addition, a dynamic boundary-tolerance factor of 1.05 is activated when IoU > 0.6 to avoid overpenalizing bounding boxes for targets with blurred boundaries in high-reflection regions. Together with adjusted shape weights for large targets, this design yields a loss function that jointly accounts for overlap, distance, rotation, shape irregularity, and boundary uncertainty.

In summary, the proposed eCBAM and saSIoU modules are specifically designed to address scale imbalance, boundary blur, irregular shapes, and rotation characteristics in riverine floating garbage. When integrated into YOLOv11, these modules achieve consistent gains in mAP, precision, and recall with negligible increases in parameters and FLOPs, demonstrating that the proposed innovations provide a more robust and scene-adaptive solution for real-time river-garbage detection.

2. Related Theoretical Foundations

2.1. YOLOv11 Model Architecture

In 2024, Ultralytics released YOLOv11, which enhances previous YOLO versions by redesigning the backbone, neck, and head architectures [26]. The backbone consists of a C3K2 module (a modified CSPNet block with two convolutional layers) and an SPPF (spatial pyramid pooling fast) layer to extract multi-scale features. The neck employs a path aggregation network (PANet) to fuse high-level semantic features with low-level spatial features through upsampling and concatenation operations. The head adopts a decoupled design that separates the classification branch from the regression branch, thereby improving detection accuracy. Overall, YOLOv11 features a compact architecture with a C3K2 backbone, a C2PSA neck, and an anchor-free decoupled head, providing stronger feature representation and global context while remaining lightweight (2.59 M parameters, 6.44 GFLOPs). It achieves a superior accuracy-efficiency trade-off compared with earlier versions (e.g., YOLOv7-tiny and YOLOv8-n). This performance, confirmed by Ultralytics’ benchmarks and our preliminary tests on the river-garbage dataset, makes YOLOv11 a more suitable foundation for real-time river monitoring on embedded hardware. Moreover, building on a state-of-the-art and widely used baseline (YOLOv11) increases the potential impact and reusability of our method for future work that is likely to adopt newer YOLO versions.

Figure 1 shows the integration of the two important optimizations in YOLOv11. First, C2PSA replaces ordinary convolutional layers in the neck and adds long-range feature dependency via a parallel self-attention mechanism. Second, the anchor-free detection head eliminates fixed anchors and adapts freely to targets of different sizes [26]. With 2.59 million parameters and 6.44 GFLOPs, YOLOv11 supports both efficient deployment and high-precision detection, making it a reasonable base model for detecting floating objects in rivers.

2.2. CBAM Attention Mechanism

CBAM, introduced by Woo et al. [14], is a lightweight attention mechanism that sequentially applies channel and spatial attention to refine features. The channel attention module (CAM) performs global average and maximum pooling on the feature maps and then applies two fully connected layers to produce channel weights that highlight the informative channels. The spatial attention module (SAM) concatenates the average and maximum pooling results along the channel dimension and then applies a 7 × 7 convolution to generate spatial weights that focus on target regions.

Mathematically, CAM is defined as:

$$M_c(F) = \sigma(FC_2(\mathrm{ReLU}(FC_1(\mathrm{Concat}(\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)))))) \tag{1}$$

where $F \in \mathbb{R}^{C \times H \times W}$ is the input feature map, $\sigma$ is the sigmoid function, and $FC_1$ / $FC_2$ are the fully connected layers for dimension reduction/expansion. SAM is defined as:

$$M_s(F) = \sigma(\mathrm{Conv}_{7 \times 7}(\mathrm{Concat}(\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)))) \tag{2}$$

The final attention-enhanced feature map is $F' = F \odot M_c(F) \odot M_s(F)$, where ‘$\odot$’ denotes the element-wise (Hadamard) multiplication. While effective in general scenarios, CBAM’s single-scale pooling and lack of boundary awareness limit its performance on river floating objects.

2.3. SIoU Loss Function

The SIoU loss, proposed by Zha et al. [23], extends the traditional IoU by incorporating three additional costs: distance cost (Δ), shape cost (Ω), and angle cost (Θ). The total SIoU loss is:

$$SIoU = IoU - \frac{\Delta + \Omega + \Theta}{2} \tag{3}$$

where:

(Δ): Normalized Euclidean distance between the centers of predicted and ground-truth boxes, penalizing large positional deviations;

(Ω): Aspect ratio (AR) difference between boxes, computed as $\max(AR_p / AR_g, AR_g / AR_p) - 1$, where $AR$ = width/height;

(Θ): Angle difference between boxes, calculated using $\sin(\alpha)$, where $\alpha$ is the angle between the box diagonals.

SIoU addresses the limitations of IoU-based losses by modeling spatial relationships more comprehensively. However, its fixed coefficients for the angle and shape costs are not adapted to the irregularity and rotation of river floating objects, necessitating scenario-specific modifications.

3. Design of Improved YOLOv11 Model

3.1. Enhanced CBAM (eCBAM) Attention Mechanism

To enhance the representation of floating objects under complex water-surface conditions, we design a floating-object attention module, eCBAM, which combines multi-scale channel optimization with boundary-enhanced spatial modeling (Figure 2). Given an input feature map $X \in \mathbb{R}^{B \times C \times H \times W}$, the module comprises two coordinated branches, Floating Object Channel Attention and Floating Object Spatial Attention, followed by a shape-adaptive fusion mechanism and progressive feature enhancement.

3.1.1. Floating Object Channel Attention (Multi-Scale Channel Attention)

The original CBAM uses a single global average pooling layer that fails to capture the features of small and medium-sized floating objects. Based on the statistical analysis of the dataset (Table 1), eCBAM employs three adaptive average pooling layers with kernel sizes of 1 × 1, 2 × 2, and 4 × 4 to extract the features of large, medium, and small targets, respectively.

Learnable scale weights $\omega = [\omega_1, \omega_2, \omega_3]$ are introduced to dynamically fuse multi-scale features. The weights are initialized to $[0.7, 0.2, 0.1]$ based on the size distribution, ensuring that large targets (the majority) receive sufficient attention, while small targets are not neglected. The fusion process is:

$$F_{avg}^{k} = \mathrm{AdaptiveAvgPool2d}(F, (k, k)), \quad k = 1, 2, 4 \tag{4}$$

$$X_h(c, h) = \frac{1}{W} \sum_{w=1}^{W} X(c, h, w), \quad X_w(c, w) = \frac{1}{H} \sum_{h=1}^{H} X(c, h, w) \tag{5}$$

$$F_{cam} = \sigma(FC_2(\mathrm{ReLU}(FC_1(F_{cat})))) \odot F \tag{6}$$

where $FC_1$ reduces the channel dimension to $C/4$, and $FC_2$ restores it to $C$, balancing the computational efficiency and feature expressiveness. Here, $C$ denotes the number of channels in the input feature map. In Equations (6) and (10), the operator ‘$\odot$’ denotes the element-wise (Hadamard) multiplication between tensors, used to apply the learned attention weights to the feature maps.
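The multi-scale pooling of Equation (4) and the scale weighting can be illustrated with a minimal pure-Python sketch on a single toy channel. The way the pooled outputs are combined here (a weighted concatenation standing in for $F_{cat}$) is an assumption made for illustration, since the text does not spell out that step; the function names are likewise hypothetical.

```python
def adaptive_avg_pool2d(fmap, k):
    """Average-pool a 2-D list (H x W) to k x k using adaptive-pooling bin boundaries."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(k):
        r0, r1 = (i * h) // k, ((i + 1) * h + k - 1) // k  # floor start, ceil end
        row = []
        for j in range(k):
            c0, c1 = (j * w) // k, ((j + 1) * w + k - 1) // k
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def multi_scale_descriptor(fmap, scale_weights=(0.7, 0.2, 0.1), kernels=(1, 2, 4)):
    """Weighted concatenation of the pooled maps (assumed form of F_cat)."""
    desc = []
    for weight, k in zip(scale_weights, kernels):
        pooled = adaptive_avg_pool2d(fmap, k)
        desc.extend(weight * v for row in pooled for v in row)
    return desc


# Toy 4 x 4 channel with values 0..15.
channel = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
descriptor = multi_scale_descriptor(channel)
print(len(descriptor))  # 1 + 4 + 16 = 21 values for this channel
```

In a trained module the scale weights would be learnable parameters rather than fixed constants; the initial values above are the ones reported in the text.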

3.1.2. Floating Object Spatial Attention (Boundary-Enhanced Spatial Attention)

Water-surface reflection blurs the target boundaries, making it difficult for the traditional SAM to distinguish targets from the background. eCBAM adds a boundary detection branch based on the Sobel operator, which extracts gradient features and strengthens the model’s edge representation ability. The Sobel operator computes horizontal ($F_{edge_x}$) and vertical ($F_{edge_y}$) edges:

$$F_{edge_x} = \mathrm{Sobel}(F, dx = 1, ksize = 3), \quad F_{edge_y} = \mathrm{Sobel}(F, dy = 1, ksize = 3) \tag{7}$$

The edge magnitude, which quantifies boundary strength, is calculated as:

$$F_{edge} = \sqrt{F_{edge_x}^{2} + F_{edge_y}^{2}} \tag{8}$$

Traditional spatial statistical features ($F_{stat}$) were concatenated with $F_{edge}$, and a 1 × 1 convolution was applied to fuse them into a 1-channel feature map. The spatial attention weight map was then generated via sigmoid activation as follows:

$$F_{stat} = \mathrm{Concat}(\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)) \tag{9}$$

$$F_{sam} = \sigma(\mathrm{Conv}_{1 \times 1}(\mathrm{Concat}(F_{stat}, F_{edge}))) \odot F \tag{10}$$

This design enhances the model’s ability to recognize blurred boundaries caused by reflections or waves.
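As a rough illustration of Equations (7) and (8), the sketch below computes the Sobel gradient magnitude of a single-channel map in pure Python. It is not the authors' implementation: the zero-padding at the border and the helper names are assumptions, and a real module would operate on batched tensors.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel


def filter3x3(img, kernel):
    """Cross-correlate a 2-D list with a 3 x 3 kernel, zero-padding the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += img[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out


def sobel_magnitude(img):
    """Edge magnitude sqrt(gx^2 + gy^2) at every position, as in Equation (8)."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return [[math.hypot(gx[r][c], gy[r][c]) for c in range(len(img[0]))]
            for r in range(len(img))]


# Toy 5 x 6 map: dark left half, bright right half (a vertical step edge).
img = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edge = sobel_magnitude(img)
print(edge[2])  # interior row: response concentrated around the step
```

Because of the zero-padding, positions on the outer border also show a response; only the interior values reflect the step edge itself.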

3.1.3. Shape-Adaptive Weight Fusion

Different floating objects (e.g., linear branches and irregular plastic bags) call for different attention priorities: channel attention for texture-rich objects and spatial attention for smooth objects. eCBAM introduces a learnable shape-adaptive weight α to dynamically adjust the contributions of the CAM and SAM:

$$F_{final} = F + \alpha \cdot F_{cam} + (1 - \alpha) \cdot F_{sam} \tag{11}$$

where $\alpha \in [0, 1]$ is optimized during training. For example, α increases for texture-rich objects (e.g., branched tree trunks) to emphasize channel features, whereas α decreases for smooth objects (e.g., foam boards) to prioritize spatial boundary features. This adaptive fusion allows the feature refinement to adapt to diverse floating object shapes.

3.2. Scenario-Adapted saSIoU Loss Function

The saSIoU loss function was designed to address three key issues in river floating object regression: large target rotation, irregular shape, and boundary blur. Its structure is shown in Figure 3.

3.2.1. Angle Sensitivity Enhancement for Large Targets

Large floating objects (e.g., discarded sofas and large foam boards) are highly susceptible to rotation under water flow, leading to significant deviations between the predicted and ground-truth boxes. saSIoU first identifies large targets using an area threshold ($w \times h > 0.1$, normalized to the image size) and then applies an angle sensitivity enhancement coefficient of 1.4 to amplify the angle cost penalty. The angle cost is computed as follows:

$$\Theta = \begin{cases} 1.4 \times (1 - \cos(\theta_p - \theta_g)), & \text{if } w \times h > 0.1 \\ 1 - \cos(\theta_p - \theta_g), & \text{otherwise} \end{cases} \tag{12}$$

where $\theta_p$ and $\theta_g$ are the rotation angles of the predicted and ground-truth boxes, respectively. This enhancement ensures that the model prioritizes the correction of rotation-induced deviations for large targets.
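Equation (12) can be written as a short function. The sketch below assumes that angles are given in radians and that `w` and `h` are the width and height of the ground-truth box normalized to the image size; the text does not state which box the area threshold refers to, so that choice is an assumption.

```python
import math


def angle_cost(theta_p, theta_g, w, h, area_thresh=0.1, gain=1.4):
    """Angle cost of Equation (12): amplified by `gain` for large targets."""
    base = 1.0 - math.cos(theta_p - theta_g)
    return gain * base if w * h > area_thresh else base


# A 30-degree rotation error on a large box (area 0.2) and a small box (area 0.04).
large = angle_cost(math.radians(30), 0.0, w=0.5, h=0.4)
small = angle_cost(math.radians(30), 0.0, w=0.2, h=0.2)
print(large / small)  # the large target is penalized 1.4 times as strongly
```

The cost is zero when the two angles coincide and grows smoothly with the rotation error, so the 1.4 factor only rescales the penalty for large targets without changing where the minimum lies.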

3.2.2. Shape-Adaptive Mechanism for Irregular Objects

River floating objects exhibit extreme shape irregularity, with aspect ratios (AR) ranging from 0.2 (e.g., square foam blocks) to 8.0 (e.g., long plastic strips). The original SIoU uses a fixed shape cost that over-penalizes irregular objects. saSIoU introduces a shape-adaptive coefficient based on the aspect ratio difference between the predicted and ground-truth boxes as follows:

$$\mathrm{aspect\_diff} = |AR_p - AR_g| \tag{13}$$

$$\Omega = \begin{cases} 0.9 \times \left(\max\left(\frac{AR_p}{AR_g}, \frac{AR_g}{AR_p}\right) - 1\right), & \text{if } \mathrm{aspect\_diff} > 0.3 \\ \max\left(\frac{AR_p}{AR_g}, \frac{AR_g}{AR_p}\right) - 1, & \text{otherwise} \end{cases} \tag{14}$$

When the aspect ratio difference is more than 0.3 (highly irregular), the coefficient is set to 0.9 to prevent over-constraint and improve adaptability for irregular floating objects.

3.2.3. Dynamic Boundary Blur Tolerance

Because of water surface reflection and wave interference, the target boundaries are blurred, so that even when the overall shape is correct, the predicted boxes deviate slightly from the ground-truth boxes. The original SIoU penalizes such deviations equally, which causes the loss to increase unnecessarily. saSIoU therefore adds a dynamic boundary tolerance: if IoU > 0.6 (high overall overlap), a tolerance coefficient of 1.05 is applied to reduce the penalty for small boundary differences. The final saSIoU is computed as follows:

$$saSIoU = \begin{cases} IoU - 0.5 \times (\Delta + \Omega) \times 1.05, & \text{if } IoU > 0.6 \\ IoU - 0.5 \times (\Delta + \Omega), & \text{otherwise} \end{cases} \tag{15}$$

where Δ is the normalized center distance cost, computed as:

$$\Delta = \sqrt{(x_p^{norm} - x_g^{norm})^2 + (y_p^{norm} - y_g^{norm})^2} \tag{16}$$

This dynamic tolerance can ensure that the model focuses on the overall shape and position, rather than being disturbed by minor boundary noise caused by the environment.
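The pieces of Equations (13) to (16) can be combined in a small pure-Python sketch. It follows Equation (15) as printed, so only Δ and Ω enter the final score and the angle cost of Equation (12) is left out. The box format (`cx, cy, w, h`, normalized to the image size) and all function names are assumptions for illustration.

```python
import math


def iou_cxcywh(p, g):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    px1, py1, px2, py2 = p[0] - p[2] / 2, p[1] - p[3] / 2, p[0] + p[2] / 2, p[1] + p[3] / 2
    gx1, gy1, gx2, gy2 = g[0] - g[2] / 2, g[1] - g[3] / 2, g[0] + g[2] / 2, g[1] + g[3] / 2
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = p[2] * p[3] + g[2] * g[3] - inter
    return inter / union if union > 0 else 0.0


def shape_cost(p, g, diff_thresh=0.3, relax=0.9):
    """Shape cost of Equations (13)-(14), relaxed for large aspect-ratio gaps."""
    ar_p, ar_g = p[2] / p[3], g[2] / g[3]
    ratio_term = max(ar_p / ar_g, ar_g / ar_p) - 1.0
    return relax * ratio_term if abs(ar_p - ar_g) > diff_thresh else ratio_term


def sa_siou(p, g, iou_thresh=0.6, tolerance=1.05):
    """saSIoU score following Equation (15) as printed."""
    iou = iou_cxcywh(p, g)
    delta = math.hypot(p[0] - g[0], p[1] - g[1])  # Equation (16)
    penalty = 0.5 * (delta + shape_cost(p, g))
    if iou > iou_thresh:
        penalty *= tolerance
    return iou - penalty


pred = (0.52, 0.50, 0.40, 0.40)
truth = (0.50, 0.50, 0.40, 0.40)
print(round(sa_siou(pred, truth), 4))  # high-overlap case, tolerance branch active
```

For training, a score of this kind is typically turned into a loss as `1 - sa_siou(pred, truth)`; that convention is also an assumption here, as the text defines only the score.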

3.3. Overall Architecture of the Improved YOLOv11 Model

The improved YOLOv11 model adds the eCBAM mechanism and the saSIoU loss to the base model, as illustrated in Figure 4. The key modifications are as follows: (1) eCBAM integration: eCBAM is integrated after the C3K2 module in the neck and inserted between the fusion nodes of P3-P4 and P4-P5, which strengthens multi-scale feature extraction and boundary recognition before the features are fused; (2) saSIoU adaptation: the baseline loss function is replaced with saSIoU for better bounding box regression on riverine floating objects; (3) preservation of multi-scale detection branches: the P3 (8 × 8), P4 (16 × 16), and P5 (32 × 32) branches are retained for the detection of small, medium, and large floating objects, respectively, so that targets of all sizes are covered.

4. Experiments and Results Analysis

4.1. Experimental Dataset and Environment

4.1.1. Dataset Construction and Augmentation

A self-constructed riverine floating object dataset was built to simulate real-world monitoring scenarios. Images were collected from 12 rivers in Jiangsu, Zhejiang, and Shanghai (China) from March 2023 to February 2024. The dataset is divided by water environment (40% calm water, 30% turbulent flow, and 30% high reflection) and by time of day (25% early morning, 35% afternoon, 20% in the dark, and 20% afternoon). It also covers a range of weather conditions (45% clear, 30% cloudy, 15% rainy, and 10% foggy) and target classes (35% plastic waste, 25% dead branches, 15% discarded furniture, 10% foam boards, paper waste, and 5% miscellaneous waste). The dataset contains 12,000 images, all uniformly rescaled to 640 × 640, with 38,452 manually annotated objects. It was divided into training (8000 images), validation (1600 images), and testing (2400 images) subsets in the ratio 8:1.6:2.4.

To improve the model’s ability to generalize, several data augmentation methods were applied to the training set: spatial transformations, including horizontal/vertical random flipping with a 50% probability, random cropping with a scale range between 0.8 and 1.2, and random rotation between 0 and 30 degrees; color space transformations, including brightness adjustment by ±20%, contrast adjustment by ±15%, and saturation adjustment by ±10%; and noise addition, including Gaussian blur with a 3 × 3 kernel applied with a 20% probability and motion blur, which simulates the jitter of a camera on unmanned surveillance ships.

4.1.2. Experimental Environment and Parameter Settings

Experiments were conducted on a workstation equipped with an Intel® Xeon® Gold 5418Y CPU (10 cores), an NVIDIA RTX 4090 GPU with 24 GB VRAM, and 120 GB of RAM. The software environment included Ubuntu 20.04 LTS, Python 3.10, PyTorch 2.2.2, CUDA 11.8, OpenCV 4.8.0, and Ultralytics 8.1.0. Detailed hardware and software configurations are presented in Table 2. Training was performed using the following hyperparameters: stochastic gradient descent (SGD) optimizer (initial learning rate 0.01, momentum 0.937, weight decay 0.0005), linear learning rate decay with a 3-epoch warm-up, batch size of 64 (mixed precision), 100 epochs total, and early stopping after 20 consecutive epochs without validation loss improvement. Loss weights were assigned as follows: 7.5 for bounding box regression, 0.5 for classification, and 1.5 for distribution focal loss.
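For readers who want to reproduce a comparable setup, the hyperparameters listed above map onto Ultralytics training arguments roughly as follows. This is a hedged sketch rather than the authors' configuration file: the keys are standard Ultralytics argument names, the model and dataset paths in the comment are hypothetical placeholders, and the `patience` setting in Ultralytics tracks a fitness metric rather than validation loss, so it only approximates the early-stopping rule described in the text.

```python
# Hyperparameters from the text, expressed with Ultralytics argument names.
train_args = {
    "imgsz": 640,            # images rescaled to 640 x 640
    "epochs": 100,
    "batch": 64,
    "amp": True,             # mixed-precision training
    "optimizer": "SGD",
    "lr0": 0.01,             # initial learning rate
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3,
    "cos_lr": False,         # keep the default linear decay schedule
    "patience": 20,          # early stopping (approximation, see lead-in)
    "box": 7.5,              # bounding box regression loss weight
    "cls": 0.5,              # classification loss weight
    "dfl": 1.5,              # distribution focal loss weight
}

# Hypothetical usage (paths are placeholders, not from the paper):
# from ultralytics import YOLO
# YOLO("path/to/model.yaml").train(data="path/to/dataset.yaml", **train_args)
```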

4.2. Evaluation Metrics

Five evaluation metrics covering accuracy, robustness, and efficiency were chosen:

Mean average precision (mAP): mAP@0.5 (IoU = 0.5) and mAP@0.95 (IoU = 0.95), representing accuracy at different overlap thresholds.

Precision (P): Ratio of true positives (TP) to the sum of TP and false positives (FP), indicating the model’s ability to avoid false detections, calculated as:

$$P = \frac{TP}{TP + FP} \tag{17}$$

Recall (R): Ratio of TP to the sum of TP and false negatives (FN), measuring the model’s ability to avoid missed detections, calculated as:

$$R = \frac{TP}{TP + FN} \tag{18}$$

Inference speed: FPS on the test set to evaluate real-time performance.

Parameter quantity and GFLOPs: Measuring model lightweight properties for embedded deployment.
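As a small worked example of Equations (17) and (18), the sketch below computes precision and recall from detection counts. The counts are made up for illustration and do not come from the paper's experiments.

```python
def precision_recall(tp, fp, fn):
    """Precision (Equation 17) and recall (Equation 18) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts: 80 correct detections, 20 false alarms, 10 missed objects.
p, r = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={p:.3f}, recall={r:.3f}")  # precision=0.800, recall=0.889
```

The guards against empty denominators simply return 0.0 when there are no detections or no ground-truth objects, which is one common convention among several.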

4.3. Ablation Experiments

Ablation experiments were conducted on the validation set to verify the effectiveness of each improved module. The baseline model was the original YOLOv11, and three additional variants were tested: (1) YOLOv11 + eCBAM, (2) YOLOv11 + saSIoU, (3) YOLOv11 + eCBAM + saSIoU (proposed model). The results are presented in Table 3.

4.3.1. Effect of eCBAM

Compared with the baseline, YOLOv11 + eCBAM achieved 1.69 and 2.52 percentage-point increases in the mAP@0.5 and mAP@0.95, respectively, a 2.96 percentage-point recall increase (from improved multi-scale feature extraction, especially for small targets), and a 2.23 percentage-point precision drop (due to enhanced boundary feature sensitivity causing minor reflective region false detections).

These results confirm that eCBAM effectively enhances the feature representation for river-floating objects, particularly when capturing small targets and blurred boundaries.

4.3.2. Effect of saSIoU

Compared with the baseline, YOLOv11 + saSIoU showed increases of 1.37 and 2.66 percentage points in mAP@0.5 and mAP@0.95, respectively, a 4.29 percentage-point rise in precision (from optimized bounding box regression reducing rotation-induced false positives), and a 0.24 percentage-point drop in recall (attributed to the dynamic boundary tolerance relaxing the constraints on blurred targets). These results demonstrate that saSIoU improves regression accuracy for large rotated targets and irregular shapes without compromising efficiency.

4.3.3. Synergistic Effect of eCBAM and saSIoU

The proposed model (eCBAM + saSIoU) achieved the optimal overall performance: mAP@0.5 (86.48%) and mAP@0.95 (56.44%) were 2.65 and 4.27 percentage-points higher than the baseline, precision (80.43%) and recall (84.36%) were balanced (precision recovered from eCBAM’s slight decrease via saSIoU-reduced false positives), and inference speed (107 FPS) met real-time monitoring needs (>30 FPS).

This synergistic effect confirms that eCBAM (feature extraction) and saSIoU (loss optimization) complement each other, addressing the core challenges of river floating object detection.

4.4. Comparative Experiments

The proposed model was compared with six mainstream lightweight object detection models on the test set: YOLOv5 [27], YOLOv6 [28], YOLOv8n [29], YOLOv9t [30], YOLOv10n [31], and the original YOLOv11 [26]. The results are presented in Table 4.

4.4.1. Accuracy Comparison

The proposed model outperformed all competitors in the mAP@0.5 and mAP@0.95:

mAP@0.5 was 3.54 percentage points higher than that of YOLOv8n (86.48% vs. 82.94%), and mAP@0.95 was 5.74 percentage points higher than that of YOLOv10n (56.44% vs. 50.70%).

The largest improvement was observed in irregular and rotated targets: for branched tree trunks (aspect ratio > 5), mAP@0.5 increased by 5.2% compared with YOLOv11. For rotated foam boards (angle > 30°), mAP@0.5 increased by 4.8%.

4.4.2. Precision and Recall Comparison

The proposed model balanced precision and recall better than other models:

Precision (80.43%) was 0.81 percentage points higher than that of YOLOv11 and 7.38 percentage points higher than that of YOLOv10n.

Recall (84.36%) was 2.42 percentage points higher than that of YOLOv11 and 13.36 percentage points higher than that of YOLOv5.

For small targets (e.g., 5–10 cm plastic fragments), recall increased by 6.3% compared with YOLOv8n, which is attributed to eCBAM’s multi-scale attention.
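One common way to summarize the balance between precision and recall in a single number is the F1 score, the harmonic mean of the two. Tables 3 and 4 do not report F1, so the sketch below is an illustrative assumption rather than a metric used by the authors; it takes the precision and recall columns of Table 4 as input, and the names `f1_score` and `TABLE4_PR` are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs copied from Table 4.
TABLE4_PR = {
    "YOLOv5": (74.26, 71.00),
    "YOLOv6": (73.18, 69.24),
    "YOLOv9t": (71.10, 76.47),
    "YOLOv10n": (73.05, 74.69),
    "YOLOv8n": (79.67, 80.25),
    "YOLOv11 (Baseline)": (79.62, 81.94),
    "Proposed Model": (80.43, 84.36),
}

f1_by_model = {name: f1_score(p, r) for name, (p, r) in TABLE4_PR.items()}

# List the models from the highest to the lowest F1 score.
for name, f1 in sorted(f1_by_model.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} F1 = {f1:.2f}%")
```

Because the harmonic mean penalizes a large gap between the two inputs, a model ranks highly only when precision and recall are both high.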

4.4.3. Efficiency Comparison

The proposed model maintained lightweight properties and high real-time performance:

The parameter count (2.60 M) was only 0.3% higher than that of YOLOv11, 3.4% lower than that of YOLOv8n, and 37.5% lower than that of YOLOv6;

GFLOPs (6.44) were the same as YOLOv11, but lower than YOLOv8n (6.94) and YOLOv10n (8.40).

The inference speed (107 FPS) was sufficient for real-time monitoring and was only 4.5% lower than that of YOLOv11.
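The relative differences in this subsection follow from the parameter counts and frame rates in Table 4. The sketch below recomputes them and converts the frame rate into a per-frame latency; the helper names `relative_change` and `frame_latency_ms` are assumptions made for illustration, and the 30 FPS threshold is the real-time requirement quoted in Section 4.3.3.

```python
# Parameter counts and frame rates copied from Table 4.
PARAMS = {
    "YOLOv6": 4_160_041,
    "YOLOv8n": 2_690_793,
    "YOLOv11 (Baseline)": 2_590_425,
    "Proposed Model": 2_598_864,
}
FPS = {"YOLOv11 (Baseline)": 112, "Proposed Model": 107}
REAL_TIME_FPS = 30  # real-time threshold quoted in Section 4.3.3


def relative_change(value, reference):
    """Signed change of `value` relative to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


def frame_latency_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


proposed = PARAMS["Proposed Model"]
for name in ("YOLOv11 (Baseline)", "YOLOv8n", "YOLOv6"):
    print(f"Parameters vs. {name}: {relative_change(proposed, PARAMS[name]):+.1f}%")

fps_change = relative_change(FPS["Proposed Model"], FPS["YOLOv11 (Baseline)"])
print(f"FPS vs. baseline: {fps_change:+.1f}%")
print(
    f"Latency: {frame_latency_ms(FPS['Proposed Model']):.2f} ms per frame "
    f"(budget at {REAL_TIME_FPS} FPS: {frame_latency_ms(REAL_TIME_FPS):.2f} ms)"
)
```

Each percentage is taken relative to the model being compared against, so a negative value means the proposed model is smaller or slower than that reference.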

4.5. Robustness Experiments

To evaluate the performance of the model under varying environmental conditions, robustness tests were conducted on three subsets of the test set: high-reflection (300 images), turbulent flow (300 images), and dense floating objects (300 images). The results are presented in Table 5.

The proposed model showed significant robustness advantages:

High-reflection scenes: mAP@0.5 was 5.35 percentage points higher than that of YOLOv11, as eCBAM’s boundary-enhanced attention suppressed reflection interference.

Turbulent flow scenes: mAP@0.5 was 4.65 percentage points higher than that of YOLOv11, owing to saSIoU’s dynamic boundary tolerance adapting to wave-induced blur.

Dense objects: mAP@0.5 was 3.95 percentage points higher than that of YOLOv11, as eCBAM’s multi-scale attention avoided feature competition between overlapping targets.

These results confirm that the proposed model is robust to diverse environmental challenges in river-floating object detection.
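The per-scenario gains and the average column of Table 5 can be reproduced with a few lines of Python. This is a minimal sketch under the assumption that the average is an unweighted mean, which is reasonable here because each subset contains 300 images; the names `ROBUSTNESS`, `averages`, and `gains_vs_yolov11` are introduced only for illustration.

```python
from statistics import mean

# Order of the three robustness subsets in each tuple below.
SUBSETS = ("high_reflection", "turbulent_flow", "dense_objects")

# mAP@0.5 values (in percent) copied from Table 5.
ROBUSTNESS = {
    "YOLOv11": (78.32, 79.56, 81.24),
    "YOLOv8n": (77.89, 78.91, 80.55),
    "YOLOv10n": (76.45, 77.82, 79.33),
    "Proposed Model": (83.67, 84.21, 85.19),
}

# Unweighted mean over the three equally sized subsets.
averages = {model: mean(scores) for model, scores in ROBUSTNESS.items()}

# Percentage-point gain of the proposed model over YOLOv11 per subset.
gains_vs_yolov11 = {
    subset: ROBUSTNESS["Proposed Model"][i] - ROBUSTNESS["YOLOv11"][i]
    for i, subset in enumerate(SUBSETS)
}

for model, avg in averages.items():
    print(f"{model:15s} average mAP@0.5 = {avg:.2f}%")
for subset, gain in gains_vs_yolov11.items():
    print(f"{subset:16s} gain = {gain:+.2f} percentage points")
```

If the subsets had different sizes, a mean weighted by image count would be the more appropriate summary.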

4.6. Visualization Analysis

To demonstrate the detection effect intuitively, typical river scene images were selected for a visualization comparison.

Figure 5 compares the detection performance of different models for small floating debris in urban inland river nearshore scenes, highlighting the improvement in the small-target recognition accuracy.

Figure 6 demonstrates the ability of the models to distinguish floating debris from natural objects in vegetation-dense water areas, reflecting the reduction in misclassification caused by background interference.

Figure 7 presents the detection performance for large floating debris in high-reflection open water scenes, illustrating the enhancement of the bounding box regression accuracy under strong light interference.

In summary, the preceding analysis shows that YOLOv11 (baseline) suffers from reflective boundary ambiguity, misclassification of irregular targets, and missed detections. YOLOv11 + eCBAM alleviates reflective boundary blur but still exhibits residual classification errors. YOLOv11 + saSIoU improves bounding-box alignment for irregular objects yet omits some small targets. The proposed model, which jointly optimizes eCBAM and saSIoU, achieves precise localization of reflective targets and accurate classification and bounding-box regression for irregular targets. These results support the claim that the proposed model effectively addresses scenario-specific detection challenges for riverine floating objects.

5. Conclusions

This paper presents an improved YOLOv11 for detecting riverine floating waste that integrates eCBAM and saSIoU. Riverine floating waste often exhibits unclear boundaries, irregular shapes, and frequent rotations, which pose challenges for conventional detectors. The key findings are as follows. First, eCBAM enhances feature representation: multi-scale channel attention with size-adapted weights improves multi-scale detection, and boundary-enhanced spatial attention reduces reflection-induced blur, yielding gains of 1.69 percentage points in mAP@0.5 and 2.96 percentage points in recall over the baseline YOLOv11 (Table 3). Second, saSIoU improves bounding-box regression by enhancing angle sensitivity, introducing shape-adaptive coefficients, and incorporating dynamic boundary tolerance to handle rotation, irregular shapes, and boundary blur, resulting in a 1.37 percentage-point increase in mAP@0.5 and a 4.29 percentage-point improvement in precision over the same baseline. Third, combining the two components yields further gains: the proposed model achieves 86.48% mAP@0.5, 56.44% mAP@0.95, 80.43% precision, and 84.36% recall on the dataset, outperforming mainstream lightweight models while requiring only 2.60 M parameters and 6.44 GFLOPs and running at 107 FPS.

In future work, we will integrate the detector with temporal tracking and decision-making modules on real river-cleaning platforms (e.g., unmanned boats) to build end-to-end intelligent cleaning systems. In parallel, we will further investigate model lightweighting and hardware-aware optimization to enable deployment on low-power embedded devices and edge AI platforms.

Author Contributions

Conceptualization, Z.Z.; methodology, W.H.; software, T.L.; validation, Z.Z., T.L. and W.H.; formal analysis, W.H.; investigation, Z.Z.; resources, Z.Z.; data curation, Z.Z.; writing—original draft preparation, Z.Z.; writing—review and editing, Z.Z.; visualization, J.Z.; supervision, T.L.; project administration, W.H.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Data Availability Statement

The datasets are available at https://github.com/hwqwlsu/hwq (accessed on 28 December 2025).

Conflicts of Interest

The authors declare that they have no affiliations with or involvement in any organization or entity with any financial interest in the subject matter or materials discussed in this manuscript.

References

1. Yin, H.; Islam, M.S.; Ju, M. Urban river pollution in the densely populated city of Dhaka, Bangladesh: Big picture and rehabilitation experience from other developing countries. J. Clean. Prod. 2025, 321, 129040.
2. UNESCO, UN-Water, World Water Assessment Programme. The United Nations World Water Development Report 2024: Water for Prosperity and Peace. 2024. United Nations Digital Library System. ISBN 9789231006579. Available online: https://digitallibrary.un.org/record/4042870 (accessed on 8 August 2025).
3. Hangzhou Lvzhongyou Intelligent Technology Co., Ltd. Governance of Floating Garbage in Urban Rivers: Dilemmas of Traditional Cleaning Methods and the Breakthrough Path of Waterborne Trash Bins. 2025. Available online: http://m.toutiao.com/group/7542790129312727562/?upstream_biz=doubao (accessed on 15 September 2025).
4. Du, X.; Xie, Z. Water Floating Garbage Detection Algorithm Based on Improved YOLOv7-Tiny. 2025. Available online: https://www.researchgate.net/publication/374791946_Water_Floating_Garbage_Detection_Algorithm_Based_on_Improved_YOLOv7-Tiny/fulltext/652fd4685d51a8012b52c1fb/Water-Floating-Garbage-Detection-Algorithm-Based-on-Improved-YOLOv7-Tiny.pdf (accessed on 20 September 2025).
5. Zhu, J.; Hu, T.; Zheng, L.; Zhou, N.; Ge, H.; Hong, Z. YOLOv8-C2f-Faster-EMA: An improved underwater trash detection model based on YOLOv8. Sensors 2024, 24, 2483.
6. Lu, X.T.; Yang, T.; Jin, W.; Liu, J.; Wen, R. Methods for water fluctuation and underwater turbulence degraded imaging. J. Appl. Opt. 2017, 38, 42–55.
7. Xian, R.; Tang, L.; Liu, S. Development of a lightweight floating object detection algorithm. Water 2024, 16, 1633.
8. Kim, Y.; Kim, S.; Jeon, M. NBBOX: Noisy Bounding Box Improves Remote Sensing Object Detection. arXiv 2025, arXiv:2409.09424v3.
9. Li, M.; Wang, Q.; Zhang, W. Research on water surface garbage detection method based on machine vision. Environ. Sci. Technol. 2020, 43, 189–195.
10. Zhao, C.Z. Research and Implementation of Water Surface Floating Object Detection Method Based on Deep Learning. Master’s Thesis, Hebei University of Engineering, Handan, China, 2025.
11. Yi, Z.R.; Yao, D.Y.; Li, G.J.; Ai, J.Y.; Xie, W. Detection and localization for lake floating objects based on CA-faster R-CNN. Multimed. Tools Appl. 2022, 81, 17263–17281.
12. Li, H.; Yang, S.P.; Liu, J.J.; Yang, Y.; Kadoch, M.; Liu, T.Y. A Framework and Method for Surface Floating Object Detection Based on 6G Networks. Electronics 2022, 11, 2939.
13. Zou, Y.; Wang, X.; Wang, L. A high-quality instance-segmentation network for floating-algae detection using RGB images. Remote Sens. 2022, 14, 6247.
14. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19.
15. Yang, Z.; Yin, Y.; Jing, Q.; Shao, Z. A High-Precision Detection Model of Small Objects in Maritime UAV Perspective Based on Improved YOLOv5. J. Mar. Sci. Eng. 2023, 11, 1680.
16. Wang, G.; Chen, Y.; Zhang, J.; Li, H.; Liu, S. Multi-scale CBAM for small object detection. Pattern Recognit. Lett. 2021, 146, 10–16.
17. Liu, Y.; Li, X.; Wang, Z.; Zhang, H.; Chen, L. Edge-enhanced CBAM for medical image segmentation. Comput. Biol. Chem. 2022, 100, 107568.
18. Zhao, J.; Li, H.; Zhang, L.; Wang, Y.; Chen, S. Reflection-aware attention module for water surface object detection. IEEE Sens. J. 2023, 23, 17452–17461.
19. Yu, J.H.; Jiang, Y.N.; Wang, Z.Y.; Cao, Z.M.; Huang, T.S. UnitBox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia; ACM: New York, NY, USA, 2016; pp. 516–520.
20. Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 658–666.
21. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence 2020, New York, NY, USA, 9–11 February 2020; pp. 12993–13000.
22. Zheng, Z.; Wang, P.; Liu, W.; Ren, D.W.; Ye, R.G.; Hu, Q.H.; Zuo, W.M. Complete IoU loss: Improving object detection bounding box regression. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4897–4912.
23. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740.
24. Li, H.; Kong, M.; Shi, Y. Tea bud detection model in a real picking environment based on an improved YOLOv5. Biomimetics 2024, 9, 692.
25. Zhang, Y.; Li, J.; Wang, H.; Zhang, X.; Zhou, Y.; Zhang, J. Adaptive SIoU loss for marine debris detection. IEEE J. Ocean. Eng. 2023, 48, 890–901.
26. Ultralytics. YOLOv11 Official Documentation. 2024. Available online: https://docs.ultralytics.com/models/yolov11/ (accessed on 21 July 2025).
27. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
28. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
29. Ultralytics. YOLOv8 Official Documentation. 2023. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 20 June 2025).
30. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616.
31. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.

Figure 1. Structure diagram of the YOLOv11 model. This figure depicts the architecture of a YOLOv11 object detection model, where the blue SPPF/SPP (in backbone/neck) captures multi-scale features, the green C2PSA (in backbone) enhances key-region representation via Pyramid Self-Attention, and the red MISA-related components (in head) refine feature interactions—all working with basic layers to enable multi-scale feature processing and detection.

Figure 2. Structure of the enhanced CBAM (eCBAM) attention mechanism. The architecture combines Boundary Feature Extraction using Sobel operators and pooling for edge detection, with Floating Object Spatial Attention enhanced by multi-scale and channel weighting. This design optimizes feature extraction for accurate detection of floating objects in complex environments. This figure outlines a multi-scale attention module, where the light purple component (in the left multi-scale fusion area) corresponds to a feature branch from Pool(2) (one of the multi-scale pooling operations). It is assigned a weight of 0.2 during the multi-scale fusion process, contributing to the integrated feature representation by capturing mid-level scale information from the input.

Figure 3. Structure of the scenario-adapted saSIoU loss function. The diagram outlines a processing framework for water surface floating objects (note: * denotes multiplication). The beige top module handles adaptive preprocessing (aspect ratio, shape/angle adjustments) for floating objects; the cyan upper-left module calculates objects’ center points while extracting geometric features; the green upper-middle module optimizes large objects via angle calculations and outputs refined similarity; the pink upper-right module computes angle costs and performs angle loss calculations; the light blue lower-left module calculates basic IoU metrics; and the purple middle-lower module refines shape costs and boundary tolerance for large objects. Together, these modules process object features, optimize detection metrics, and compute the final adjusted IoU for water surface floating object analysis.

Figure 4. Overall architecture of the improved YOLOv11 model. This diagram shows a YOLO-based detection architecture for water surface floating objects, where the “Floating Object Attention” module (the light blue block connected to the P4 branch) is designed to enhance feature representation for floating objects: it focuses on refining the features of the P4 (16 × 16) scale (targeting medium objects) by capturing key contextual information related to floating targets, thereby improving the model’s detection accuracy for such objects. (Note: * denotes multiplication.) This module works with layers like Conv, C3k2, and Upsample to enable multi-scale feature fusion, supporting separate detection of small (P3, 8 × 8), medium (P4, 16 × 16), and large (P5, 32 × 32) floating objects, with EnSIoU Loss optimizing detection performance.

Figure 5. Detection Effect Comparison of Small Floating Debris in Urban Inland River Nearshore Areas (Row 1: Detection results of the YOLOv11 (Baseline) Model; Row 2: Detection results of the YOLOv11 + eCBAM Model; Row 3: Detection results of the YOLOv11 + saSIoU Model; Row 4: Detection results of the Proposed Model).

Figure 6. Discrimination Performance Between Floating Debris and Natural Objects in Vegetation-Dense Water Areas (Row 1: Detection results of the YOLOv11 (Baseline) Model; Row 2: Detection results of the YOLOv11 + eCBAM Model; Row 3: Detection results of the YOLOv11 + saSIoU Model; Row 4: Detection results of the Proposed Model).

Figure 7. Detection Effect of Large Floating Debris in High-Reflection Open Water Areas (Row 1: Detection results of the YOLOv11 (Baseline) Model; Row 2: Detection results of the YOLOv11 + eCBAM Model; Row 3: Detection results of the YOLOv11 + saSIoU Model; Row 4: Detection results of the Proposed Model).

Table 1. Size distribution of river floating objects in the dataset.

Target Size	Definition (Pixel Area)	Count	Proportion (%)
Large	>10,000	6691	72.93
Medium	1000–10,000	1717	18.62
Small	<1000	778	8.45

Table 2. Configuration of the experimental conditions.

Parameter	Environment Configuration
GPU	NVIDIA RTX 4090
CPU	Intel(R) Xeon(R) Gold 5418Y (10 cores)
VRAM	24 GB
RAM	120 GB
PyTorch	2.2.2
Python	3.10

Table 3. Ablation experiment results.

Model	mAP@0.5 (%)	mAP@0.95 (%)	Precision (%)	Recall (%)	FPS	Parameters	GFLOPs
YOLOv11 (Baseline)	83.83	52.17	79.62	81.94	112	2,590,425	6.44
YOLOv11 + eCBAM	85.52	54.69	77.39	84.90	108	2,598,864	6.44
YOLOv11 + saSIoU	85.20	54.83	83.91	81.70	111	2,590,425	6.44
Proposed Model	86.48	56.44	80.43	84.36	107	2,598,864	6.44

Table 4. Comparative experiment results.

Model	mAP@0.5 (%)	mAP@0.95 (%)	Precision (%)	Recall (%)	FPS	Parameters	GFLOPs
YOLOv5	74.96	46.25	74.26	71.00	98	2,188,409	5.93
YOLOv6	73.88	46.61	73.18	69.24	85	4,160,041	11.57
YOLOv9t	77.67	47.51	71.10	76.47	105	1,765,513	6.70
YOLOv10n	78.61	50.70	73.05	74.69	102	2,708,210	8.40
YOLOv8n	82.94	52.62	79.67	80.25	109	2,690,793	6.94
YOLOv11 (Baseline)	83.83	52.17	79.62	81.94	112	2,590,425	6.44
Proposed Model	86.48	56.44	80.43	84.36	107	2,598,864	6.44

Table 5. Robustness experiment results.

Model	High-Reflection mAP@0.5 (%)	Turbulent Flow mAP@0.5 (%)	Dense Objects mAP@0.5 (%)	Average mAP@0.5 (%)
YOLOv11	78.32	79.56	81.24	79.71
YOLOv8n	77.89	78.91	80.55	79.12
YOLOv10n	76.45	77.82	79.33	77.87
Proposed Model	83.67	84.21	85.19	84.36

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
