1. Introduction
Marine plastic pollution is progressively infiltrating subsurface ecosystems, posing significant ecological and biosafety risks. Recent studies indicate that 68% of seabed plastic debris contributes to over 1 million marine organism mortalities annually, while microplastics (MPs) derived from its degradation have breached the human placental barrier [1]. Global plastic production has grown exponentially, surging from 1.5 million tons in the 1950s to nearly 199 million tons in 2025, with approximately 4.8–12.7 million tons of plastic waste entering the marine environment each year [2]. Conventional plastics gradually fragment in the marine environment into microplastics and nanoplastics (NPs) with particle sizes below 5 mm, which are extremely persistent in the environment and have high bioaccumulation potential [3].
Surface floating debris detection has achieved significant breakthroughs through deep learning. The APM-YOLOv7 algorithm effectively suppresses water surface ripple interference through adaptive Canny edge detection (A-Canny), whose dual thresholds are empirically set to 0.075 and 0.175 (a high-to-low ratio of 7:3) to achieve more accurate edge point calculation [
4]. The model introduces the PConv-ELAN module, inspired by FasterNet, which integrates a lightweight partial convolution technique into the YOLOv7 architecture, replacing the 3 × 3 convolutional layers in the original ELAN module. PConv significantly reduces the parameter count by eliminating redundant computation while maintaining feature extraction capability. Together with the multi-scale gated attention mechanism (MGA), the algorithm achieves an mAP@0.5 of 89.5% on a self-built surface litter dataset, improving the recall of small trash targets by 11.82% [
5]. Lightweight design is another important trend in water surface detection: the improved YOLOv5 developed by a team from Shanghai Ocean University uses EfficientNetV2-S as the backbone network, combined with depthwise separable convolution to compress the model parameter count to 12% of the original structure. The algorithm achieves an mAP@0.5 of 84.9% on the Orca marine life dataset with an inference speed of 5.9 ms/frame, laying the foundation for deployment on mobile monitoring platforms [
6]. For underwater small target detection, an improved YOLOv5 algorithm integrates the CBAM module and the BiFPN architecture to enhance the model’s ability to perceive targets in low-contrast, noisy, and blurred images, significantly improving small target detection and boosting recall to 89.84% on the WSODD dataset [
7].
However, these advanced algorithms optimized for surface environments suffer significantly degraded detection performance in underwater environments. The core challenge stems from the complexity and degradation effects of underwater imaging: absorption and scattering in the water column cause severe attenuation of visible light, with the red band (600–700 nm) failing completely at a depth of 5 m and the attenuation coefficient of the blue band (450–495 nm) reaching 0.2 m−1. This selective attenuation destroys the color balance and the natural symmetry of the spectral response, reducing image contrast and lowering the signal-to-noise ratio by more than 40%. At the same time, suspended particles and dissolved organic matter in the water column cause forward scattering, blurring target edges and erasing details, a clear symmetry-breaking in spatial structures [
8]. This optical distortion makes it difficult for traditional detection algorithms based on RGB information to accurately extract target features, especially in turbid or deep water. Biological attachment and texture confusion further increase the difficulty of identification: the surface of plastic trash that remains on the seafloor for a long time is gradually covered by algae and microbial films, forming a biofilm layer. This process changes the optical properties of the plastic surface, reducing surface reflectivity, masking the geometry, and increasing texture complexity. According to J-EDI (an underwater litter image dataset), the texture similarity between biofilm-covered plastic fragments and seabed corals and seaweeds exceeds 75%, which significantly degrades the accuracy of classifiers based on texture analysis. In addition, bio-attachment accelerates the embrittlement and fragmentation of plastics, producing more small fragments occupying fewer than 50 pixels that are difficult to distinguish in sonar or optical images. The composition of marine litter also varies widely: plastic packaging accounts for as much as 41.7%, while bio-attached litter (e.g., shellfish-encrusted plastic fragments) accounts for only 2.3%. This long-tailed distribution causes the model to over-focus on the major categories while ignoring rare but ecologically risky ones [
9]. This problem is particularly acute in underwater scenarios with limited training data: the model may accurately detect common plastic bottles and bags but miss plastic debris covered in sea squirts or discarded fishing nets, which are precisely the sources of entanglement and microplastic release that pose the greatest threat to marine life. The degradation of imaging quality further exacerbates the difficulty of recognizing rare and complex forms of litter.
Notwithstanding these endeavors, a critical appraisal of the literature reveals that prevailing approaches often address the symptoms of underwater degradation in isolation rather than treating the root cause holistically. This leads to three persistent and interconnected research gaps that fundamentally limit deployment in real-world scenarios: (1) static enhancement strategies struggle to adapt to dynamic underwater degradation, which breaks the natural symmetry assumed by terrestrial vision models; (2) higher-order semantic associations of dense targets are insufficiently modeled, ignoring the structural symmetry present in man-made objects; and (3) it is profoundly challenging to combine lightweight design with high accuracy under complex degradation conditions while retaining symmetry in computational graphs for efficient inference.
Degradation effects in underwater environments, such as insufficient lighting, color distortion, contrast loss, and edge blurring, seriously impair image quality, fundamentally restricting the performance of target detection models that rely on raw image information underwater [10]. These effects break the inherent symmetry assumptions of computer vision models trained on terrestrial imagery. Traditional preprocessing methods (e.g., histogram equalization, color correction) struggle to adapt to complex underwater degradation patterns, while mainstream detection algorithms (including Faster R-CNN [
11,
12], YOLO series [
13,
14,
15,
16,
17]) are significantly less robust under low illumination, blurring, and dense occlusion scenes. In order to break through this technical bottleneck, this paper proposes a high-precision underwater garbage detection method based on the YOLOv13 framework, IFEM-YOLOv13, whose core innovation lies in the construction of an end-to-end degradation-aware learning system that explicitly models and compensates for symmetry-breaking phenomena [
18].
The main contributions of this paper include:
(1) Proposing a feature quality reconstruction module (IFEM) that can be embedded into a deep learning framework, establishing a physically inspired feature recovery pathway that compensates for underwater optical degradation through the co-optimization of a learnable sharpening filter, a pixel-level filter, and a difference enhancement module, all designed with symmetry principles to ensure balanced enhancement across color channels and spatial domains.
(2) Designing a multi-scale attention fusion mechanism and integrating a channel-space dual-attention guided feature optimization strategy to significantly improve the feature discrimination ability of small and biologically attached targets, leveraging scale symmetry in natural and artificial objects.
(3) Developing a degradation-aware Focal loss function that strengthens the model’s learning of low-frequency key categories (e.g., microplastic pollutants) through dynamic gradient remapping, addressing the uneven distribution of underwater samples, and incorporating symmetry-aware regularization to prevent model bias toward majority classes.
The IFEM acts as an intelligent “visual corrector” for the underwater detection model, and constructs physically accurate feature enhancement links at the input side: a learnable sharpening filter to recover the edge structure information, a pixel-level filter to improve the local contrast, and spatially varying gamma enhancement to realize detail enhancement, all designed with symmetry constraints to maintain natural image statistics. At the same time, the improved Focal loss establishes a decision optimization mechanism at the output side, and the two collaborate to form a complete enhancement system from feature extraction to decision inference that respects the underlying symmetry properties of visual scenes.
The rest of the paper is structured as follows:
Section 2 summarizes and analyzes the related work,
Section 3 describes the proposed method in detail,
Section 4 presents the experimental results, and
Section 5 summarizes the experimental results and proposes future research.
2. Related Works
Underwater target detection, as a core technology for marine monitoring and ecological protection, has long faced challenges such as light attenuation, motion blur, dense distribution of small targets, and data scarcity [
20]. Traditional methods rely on sonar or manual visual analysis, which are inefficient and generalize poorly [
20]. In recent years, deep learning has significantly improved the detection accuracy and real-time performance of underwater targets through end-to-end feature learning capability. The single-stage detector represented by the YOLO series has become a mainstream solution due to its excellent speed-accuracy balance, but the complexity of underwater environments requires the model to be further optimized in terms of lightweighting, feature enhancement, and adaptive sensing [
21].
For the underwater trash detection task, the combination of lightweight network design and image enhancement strategies has become a research focus. SLD-Net, proposed by Zhou Huaping et al. [
22] fuses a two-stage detection framework with the lightweight backbone network MobileNetV2, effectively mitigating underwater image degradation through a Multi-Domain Image Enhancement Module (MDIE) combining color conversion and detail denoising submodules. These submodules implicitly leverage principles of symmetry by applying uniform transformation rules across spatial and chromatic domains to recover natural image statistics. Meanwhile, a bidirectional FPN structure is introduced to enhance feature extraction for small targets, achieving 94.5% mAP at a real-time 65 FPS on the J-EDI dataset with a model parameter count of only 5.4 M [
23]. This work demonstrates that the synergistic optimization of preprocessing enhancement and a lightweight backbone network significantly improves the robustness of trash detection. Similarly, Li He et al. [
24] improved YOLOv5s for water surface floating object detection by designing Mosaic-based small target data augmentation (STDA) and a coordinate attention mechanism (MCA), enhancing small target perception by adding shallow feature maps with bilinear interpolated upsampling, and applying channel pruning to compress the model to 4.31 MB, achieving 92.01% accuracy and 33 FPS real-time detection on edge devices. Image enhancement provides preprocessing support for underwater detection. Cheng et al. [
25] design a lightweight generative adversarial infrared enhancement network, which reduces the parameter count by 75% and edge inference time by 32.07% through multilevel feature fusion. Zhao et al. [
26] propose the SwiftWater network, which constructs a U-Net architecture based on the SwiftFormer encoder and introduces an HSV color-space hinting module (LPB), improving PSNR by 4.2 dB on the UIEB dataset with only one-third the parameter count of mainstream models, while adhering to color symmetry principles through consistent transformation across the hue, saturation, and value channels. These works highlight the effectiveness of lightweight design with targeted enhancement for underwater small target detection.
In dense biological detection scenarios, dynamic routing and feature fusion mechanisms show unique value. Jiang Yu et al. [
27] proposed a feature enhancement and adaptive routing framework for fish detection in complex underwater environments: a learnable feature enhancement module mitigates illumination and blurring problems, while an adaptive target routing strategy dynamically switches between simple and complex detection branches (both built on YOLOv7) according to scene complexity. Its multi-scale feature fusion module (MFFR) integrates the features of the two branches, while the routing module assigns detection paths based on an a priori threshold (τ = 0.497), achieving 73.3% mAP on the self-constructed DUFish dataset, a 3% improvement over the baseline model. This work demonstrates that a dynamic adaptation mechanism can effectively balance detection speed and accuracy in complex scenarios, but it does not address higher-order semantic association modeling.
As the YOLO series continues to evolve, higher-order feature modeling becomes the key to breaking through the performance bottleneck. The Nankai University team [
28] proposed YOLO-MS, introducing a hierarchical feature fusion module (MS-Block) based on a study of kernel sizes and enlarging the receptive field through progressive heterogeneous convolutional kernels, improving the AP of YOLOv8 from 37% to 40%. Lei et al. [
29] proposed the Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism in YOLOv13, which dynamically captures many-to-many higher-order correlations across space and scale through a learnable hyperedge building block, replacing traditional local convolution and pairwise attention mechanisms; its Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm injects correlation-enhanced features into the backbone network, neck, and detection heads to achieve global information synergy. The model outperforms YOLOv12 on MS COCO with lower computation (a 1.5–3.0% mAP improvement), but its adaptability to underwater scenarios has not been explored. Summarizing the existing work, current underwater detection methods still have three limitations: (1) static enhancement strategies are difficult to adapt to dynamic underwater degradation that breaks visual symmetry; (2) higher-order semantic associations of dense targets are insufficiently modeled, ignoring the structural symmetry present in man-made objects; and (3) it is difficult to combine lightweight design with high accuracy while maintaining computational symmetry for efficient inference. This paper accordingly improves YOLOv13 by fusing dynamic enhancement and hypergraph perception to fit the unique needs of underwater biology and trash detection.
To directly bridge these gaps, we propose IFEM-YOLOv13. Our framework is designed as an integrated solution: the Intelligent Feature Enhancement Module (IFEM) addresses the first limitation through learnable, adaptive compensation that restores imaging symmetry; the Hyper-ACE engine tackles the second by explicitly modeling cross-layer and cross-scale feature correlations while preserving relational symmetry; and the entire system’s design philosophy, culminating in only 2.5 M parameters and 217 FPS, squarely addresses the third challenge of achieving high accuracy without sacrificing efficiency, maintaining an elegant symmetry between performance and computational complexity.
3. Methods
This study builds upon the YOLOv13 architecture, whose core innovations are the Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism and the Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm, which significantly enhance the model’s high-order semantic perception of multi-scale objects.
The HyperACE mechanism maps pixels from multi-scale feature maps to hypergraph vertices. It dynamically constructs hyperedges via a learnable participation matrix (rather than relying on manual thresholds) to capture many-to-many high-order associations between objects. The module incorporates parallel high-order (C3AH) and low-order (DS-C3k) branches, which aggregate global semantics and local details through hypergraph convolution and depthwise separable convolution, respectively. Finally, a shortcut branch preserves the original information, achieving complementary feature enhancement.
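To make the hyperedge construction concrete, the following PyTorch sketch shows one way a learnable participation matrix can soft-assign pixels (vertices) to hyperedges and redistribute aggregated context back to them; the module name, channel sizes, and softmax-based soft assignment are illustrative assumptions, not the exact YOLOv13 implementation.

```python
import torch
import torch.nn as nn

class SoftHyperedgeConv(nn.Module):
    """Minimal sketch of hypergraph-style correlation enhancement.

    Each of E hyperedges soft-assigns every vertex (pixel) via a learned
    participation matrix, aggregates member features, then scatters the
    aggregated context back to the vertices. Names and sizes are illustrative.
    """
    def __init__(self, channels: int, num_hyperedges: int = 8):
        super().__init__()
        # Predicts per-pixel participation logits for each hyperedge.
        self.participation = nn.Conv2d(channels, num_hyperedges, kernel_size=1)
        self.vertex_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.edge_proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        v = self.vertex_proj(x).flatten(2)                # (B, C, HW) vertex features
        a = self.participation(x).flatten(2).softmax(-1)  # (B, E, HW) soft membership
        # Vertex -> hyperedge aggregation: weighted average of member vertices.
        edges = torch.einsum('beh,bch->bec', a, v)        # (B, E, C)
        edges = self.edge_proj(edges)
        # Hyperedge -> vertex distribution: each pixel gathers edge context.
        out = torch.einsum('beh,bec->bch', a, edges)      # (B, C, HW)
        return x + out.reshape(b, c, h, w)                # residual shortcut
```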
The FullPAD paradigm distributes the correlation-enhanced features from HyperACE through three tunnels to the backbone network, neck network, and detection head: the features are first scaled to each stage’s resolution and aligned in channel count via convolution, then undergo gated fusion with the original features using learnable weights, ensuring that the enhanced features permeate the entire pipeline and promoting gradient propagation and cross-scale information collaboration.
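A minimal sketch of one such distribution tunnel, assuming nearest-neighbor rescaling, a 1×1 alignment convolution, and a per-channel sigmoid gate; the exact gating form and alignment layers in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullPADTunnel(nn.Module):
    """Sketch of one distribution tunnel: resize, align channels, gated fuse."""
    def __init__(self, enh_channels: int, stage_channels: int):
        super().__init__()
        self.align = nn.Conv2d(enh_channels, stage_channels, kernel_size=1)
        # Gate initialized negative so fusion starts mostly closed (near identity).
        self.gate = nn.Parameter(torch.full((1, stage_channels, 1, 1), -4.0))

    def forward(self, enhanced: torch.Tensor, stage_feat: torch.Tensor) -> torch.Tensor:
        # Rescale the correlation-enhanced features to this stage's resolution.
        enhanced = F.interpolate(enhanced, size=stage_feat.shape[-2:], mode='nearest')
        enhanced = self.align(enhanced)
        g = torch.sigmoid(self.gate)          # per-channel fusion weight in (0, 1)
        return stage_feat + g * enhanced      # gated residual fusion
```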
3.1. IFEM-YOLOv13
The overall network architecture of IFEM-YOLOv13 is shown in
Figure 1. IFEM-YOLOv13 reconfigures the traditional serial computation paradigm of YOLO by implementing a closed-loop information flow through the FullPAD (Full-Pipeline Aggregation-and-Distribution) framework, which establishes a symmetrical bidirectional pathway for feature reuse and gradient propagation. The input image first undergoes degradation compensation through the Intelligent Feature Enhancement Module (IFEM), which jointly learns the sharpening intensity λ, the pixel modulation coefficient μ, and the scattering simulation parameter σ to achieve source-level restoration of underwater optical degradation while preserving the intrinsic symmetry of natural color and structural distributions. The corrected high-quality image is then input into the improved backbone network, which extracts hierarchical features through a multi-scale feature pyramid extraction layer. The enhanced features are processed by a high-resolution feature pyramid network (HR-FPN) for cross-scale fusion, using a combination of dilated convolutions and sub-pixel upsampling to preserve both the receptive field and the fragmented texture information of small objects. At this stage, the HyperACE hypergraph engine dynamically constructs a cross-layer feature correlation model that captures higher-order symmetries in spatial and semantic relationships, and the final detection results are output by the multi-scale detection head, which applies symmetry-aware prediction to handle variably shaped underwater objects.
The innovation of our framework lies in three synergistic mechanisms: (1) the IFEM constructs a physically inspired degradation compensation front end that restores visual symmetry disrupted by underwater conditions; (2) the HyperACE engine establishes a cross-layer feature correlation map with built-in relational symmetry to enhance target context modeling capabilities; (3) multi-scale residual connections form feature back-propagation pathways that ensure computational symmetry during information flow. This closed-loop architecture effectively addresses feature degradation issues in traditional serial pipelines by introducing symmetry-preserving learning throughout the network, providing a system-level solution for underwater debris detection that balances enhancement, detection, and reconstruction within a unified symmetrical framework.
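As a high-level illustration of this closed-loop arrangement, the pseudocode-style sketch below shows how the components might compose in a forward pass; all module interfaces are assumptions made for exposition.

```python
import torch.nn as nn

class IFEMYolo13Sketch(nn.Module):
    """Illustrative composition of IFEM-YOLOv13 (interfaces assumed)."""
    def __init__(self, ifem, backbone, hyperace, tunnels, neck, heads):
        super().__init__()
        self.ifem, self.backbone, self.hyperace = ifem, backbone, hyperace
        self.tunnels = nn.ModuleList(tunnels)  # one FullPAD tunnel per stage
        self.neck, self.heads = neck, heads

    def forward(self, image):
        restored = self.ifem(image)            # degradation compensation
        stages = self.backbone(restored)       # multi-scale features
        context = self.hyperace(stages)        # cross-layer correlation context
        fused = [t(context, s) for t, s in zip(self.tunnels, stages)]
        return self.heads(self.neck(fused))    # multi-scale detection
```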
3.2. Feature Quality Reconstruction Module
Underwater images exhibit significant variability in detection difficulty due to environmental dynamics (light attenuation, turbidity changes) and variations in target density [
30]. Existing target detection algorithms face dual challenges: on one hand, they struggle to robustly extract high-level semantic features from degraded images; on the other hand, they fail to effectively model the distribution patterns of targets in complex underwater scenes [
31]. Traditional image processing methods (such as contrast enhancement and Gaussian denoising) can locally improve image quality, but because their parameters are fixed and unlearnable they cannot adaptively balance the mutual constraints among multiple degradation factors, and they often break the natural symmetry of color and structural information. Therefore, we propose the lightweight adaptive feature quality reconstruction module (IFEM), building on the design introduced by Jiang et al. [27]. By jointly learning parameters such as the sharpening intensity λ, the color compensation μ, and the scattering simulation parameter σ through a convolutional attention mechanism, the feature reconstruction process is dynamically optimized to suppress adverse noise while preserving the edge structure of biofouled targets, providing degradation-invariant discriminative features for downstream target detection [
32]. As shown in
Figure 2, the module includes an information collection module, a parameter prediction module, feature processing filters, an aggregation module, and a difference enhancement module, all designed with symmetry-aware learning constraints.
Information Collection Module: The information collection module employs a cascaded design combining convolution and max pooling to progressively transform raw images into high-dimensional features, gradually converting the spatial dependencies of the image into dependencies in the high-dimensional space while preserving spatial symmetry through structured downsampling. After the original image F1 is input, it first passes through a convolution layer that expands the 2D spatial information into a multi-channel space. This is followed by a max pooling layer that compresses redundant background information while retaining key feature responses in a symmetry-preserving manner. This cascaded operation progressively transforms the spatial dependencies of lower layers into semantic dependencies of higher layers, providing feature maps with complete structure and dense information for subsequent feature sampling. The specific cascade operation can be written as

$$F_{i+1} = \mathrm{MaxPool}\big(\mathrm{Conv}(F_i)\big),$$

where Conv(·) and MaxPool(·) denote the convolution and pooling operations performed on the i-th feature map F_i. At the end of the information collection module, a spatial attention mechanism models the importance of spatial locations in the feature map, focusing on key pixel regions and ignoring irrelevant background while maintaining translational symmetry in the attention weighting:

$$F' = W_s \odot F,$$

where W_s denotes the attention weight predicted over the spatial locations of F, and ⊙ denotes pixel-wise multiplication.
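A minimal sketch of this collection stage under the cascade just described, assuming two conv/pool steps and a single-layer spatial attention; depths and kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

class InfoCollector(nn.Module):
    """Conv + max-pool cascade followed by spatial attention (sketch)."""
    def __init__(self, in_ch: int = 3, mid_ch: int = 16, out_ch: int = 32):
        super().__init__()
        self.cascade = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # compress redundant background
            nn.Conv2d(mid_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        # Predicts one attention weight per spatial location.
        self.spatial_attn = nn.Sequential(
            nn.Conv2d(out_ch, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.cascade(x)
        return f * self.spatial_attn(f)           # pixel-wise multiplication
```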
Parameter Prediction Module: As the key component determining image reconstruction quality, the core function of this module is to predict the five parameters used to regulate the filtering operations, {r, g, b, λ, μ}. The main architecture is based on fully connected layers. Specifically, the compressed feature map F′ is first fed into a multi-layer perceptron (MLP) for dimension reduction and feature transformation. Subsequently, a channel attention mechanism (CHANNEL) dynamically recalibrates the importance of each feature channel by modeling inter-channel dependencies, which enhances the discriminative power for degradation-specific parameter prediction while enforcing symmetry in channel-wise adaptation. The refined features are then passed through a fully connected output layer, which produces a 5-dimensional vector corresponding to the five target parameters. This is formally expressed as

$$\theta = \mathrm{FC}\big(\mathrm{CHANNEL}(\mathrm{MLP}(F'))\big),$$

where the output vector θ is directly mapped to [r, g, b, λ, μ]. The channel attention mechanism models the importance weights of different feature channels for parameter prediction, capturing global dependencies between pixels with symmetry in parameter influence. Specifically, r, g, and b adjust the RGB channel information of the image in a symmetry-preserving way, while λ and μ serve as the core parameters of the sharpening and pixel-level filters, respectively.
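A compact sketch of such a parameter head, under our assumption that the five outputs are three RGB gains plus λ and μ; the layer sizes and squeeze-and-excitation-style channel attention are illustrative choices.

```python
import torch
import torch.nn as nn

class ParamHead(nn.Module):
    """Predicts 5 filter parameters (r, g, b, lambda, mu) from pooled features."""
    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_channels, hidden), nn.ReLU(inplace=True))
        # Squeeze-and-excitation style channel attention (illustrative choice).
        self.channel_attn = nn.Sequential(
            nn.Linear(hidden, hidden // 4), nn.ReLU(inplace=True),
            nn.Linear(hidden // 4, hidden), nn.Sigmoid())
        self.fc_out = nn.Linear(hidden, 5)

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        pooled = feat_map.mean(dim=(2, 3))        # global average pool -> (B, C)
        h = self.mlp(pooled)
        h = h * self.channel_attn(h)              # recalibrate channel importance
        return self.fc_out(h)                     # (B, 5): [r, g, b, lambda, mu]
```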
Feature processing filter: The feature processing filter comprises a pixel-level filter and a sharpening filter. The sharpening filter enhances the high-frequency components of an image (edges, textures, and other detailed information) to improve sharpness, which is particularly important for underwater image restoration, because optical attenuation and scattering degrade images, breaking the structural symmetry of objects and making targets difficult to detect. The parameter λ enables the model to adaptively adjust the sharpening intensity according to the degradation level of the input image: for severely degraded images, the model can learn to apply high-intensity sharpening (larger λ values); for mildly degraded images, it applies low-intensity sharpening (smaller λ values). For detection tasks involving different categories such as fish schools and plastic debris, sharpening significantly enhances the contrast between target and background boundaries, effectively mitigating the difficulty of contour recognition caused by target overlap and blurring. The process follows the classical unsharp-masking form

$$F_2 = F_1 + \lambda \,\big(F_1 - \mathrm{Gau}(F_1)\big),$$

where Gau(·) denotes Gaussian blurring of the input image F1.
Pixel-level filter: The core function of the pixel-level filter is to correct image pixel values and improve the brightness distribution in a targeted manner while maintaining illumination symmetry. In underwater imaging, non-uniform lighting is a typical challenge, resulting in an uneven brightness distribution. This filter enhances local contrast through difference-of-Gaussians operations to mitigate lighting unevenness. The parameter μ serves as an adaptive adjustment factor that dynamically scales the intensity of contrast enhancement based on the characteristics of the input image while preserving the natural symmetry of the luminance distribution. Specifically, for low-contrast images the model can learn to use a larger μ value; for images with sufficient contrast, a smaller μ value is used for fine-tuning or suppressing the enhancement.
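Both filters can be realized as differentiable image operations; the sketch below follows the unsharp-masking form given above for sharpening and a difference-of-Gaussians contrast boost for the pixel-level filter, with kernel sizes and sigmas as illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(x: torch.Tensor, ksize: int = 5, sigma: float = 1.5) -> torch.Tensor:
    """Depthwise Gaussian blur (differentiable, applied per channel)."""
    half = ksize // 2
    coords = torch.arange(ksize, dtype=x.dtype, device=x.device) - half
    g1d = torch.exp(-coords**2 / (2 * sigma**2))
    g1d = g1d / g1d.sum()
    kernel = torch.outer(g1d, g1d).expand(x.shape[1], 1, ksize, ksize)
    return F.conv2d(x, kernel, padding=half, groups=x.shape[1])

def sharpen_filter(img: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
    """F2 = F1 + lambda * (F1 - Gau(F1)); lam has shape (B, 1, 1, 1)."""
    return img + lam * (img - gaussian_blur(img))

def pixel_level_filter(img: torch.Tensor, mu: torch.Tensor) -> torch.Tensor:
    """Local-contrast boost via difference of Gaussians, scaled by mu."""
    dog = gaussian_blur(img, 5, 1.0) - gaussian_blur(img, 9, 2.0)
    return img + mu * dog
```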
Aggregation Module: The aggregation module fuses the output features of the sharpening filter (F2) and the pixel-level filter (F3) with the original image (F1) in a symmetry-preserving manner that balances the contributions of the different enhancement pathways. The module first concatenates F2, F3, and F1 in the channel dimension to generate a multi-channel feature tensor, then applies a convolution to the concatenated features to perform a nonlinear transformation and channel normalization, ultimately outputting a fused feature map with the target number of channels while maintaining feature symmetry across spatial dimensions.
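A sketch of the aggregation step, assuming a single 3×3 convolution with batch normalization performs the nonlinear fusion; the actual fusion depth may differ.

```python
import torch
import torch.nn as nn

class Aggregator(nn.Module):
    """Concatenate (F1, F2, F3) along channels, then fuse with a conv block."""
    def __init__(self, img_channels: int = 3, out_channels: int = 3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * img_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),   # channel normalization
            nn.ReLU(inplace=True))

    def forward(self, f1, f2, f3):
        return self.fuse(torch.cat([f1, f2, f3], dim=1))
```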
Difference Enhancement Module: This module (Figure 3) addresses the issue that a traditional fixed Gaussian blur cannot adapt to diverse levels of image degradation. It introduces a learnable Gaussian kernel to enhance high-frequency details and adaptively adjust the intensity of image quality enhancement. The module adopts a dual-branch architecture with symmetrical processing pathways: Branch 1 predicts the parameters of a spatially constant Gaussian blur, while Branch 2 predicts the underwater scattering enhancement coefficient σ through a convolutional neural network (CNN). In the learnable blur implementation, we directly optimize the kernel weight matrix rather than deriving it from a Gaussian parameterization. This avoids the formal constraints of the Gaussian function and gives the model the flexibility to learn non-Gaussian blur kernels while maintaining approximate symmetry in the spatial filtering operations, which is important for accurately simulating complex underwater scattering effects.
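A sketch of such a directly learnable blur kernel; the softmax normalization that keeps the kernel positive and sum-to-one is our assumption for preserving a valid smoothing filter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableBlur(nn.Module):
    """Optimizes kernel weights directly instead of a Gaussian sigma."""
    def __init__(self, ksize: int = 5):
        super().__init__()
        self.ksize = ksize
        # Free-form kernel logits: not constrained to a Gaussian shape.
        self.logits = nn.Parameter(torch.zeros(ksize * ksize))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = x.shape[1]
        # Softmax keeps weights positive and summing to 1 (assumed design).
        k = self.logits.softmax(0).view(1, 1, self.ksize, self.ksize)
        k = k.expand(c, 1, self.ksize, self.ksize)       # shared kernel per channel
        return F.conv2d(x, k, padding=self.ksize // 2, groups=c)
```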
3.3. Loss Function Design
In the marine debris detection task, the category distribution of the J-EDI dataset exhibits a significant long-tail characteristic (
Table 1). Plastic samples account for as much as 55.4%, while biologically significant samples (BIO) account for only 23.0%, and ROV equipment samples account for 21.6% [
33]. The traditional cross-entropy loss function exacerbates category bias during training, particularly for difficult-to-classify samples such as small-sized microplastics and biologically attached debris, resulting in low learning efficiency [
34]. This long-tail distribution causes the model to overemphasize dominant categories while neglecting rare but ecologically high-risk ones, so recognition of low-frequency categories degrades. To enhance the model’s focus on difficult-to-classify samples (e.g., small targets and samples that are similar across categories), this paper introduces an improved Focal loss function to overcome this limitation:
$$\mathcal{L}_{\mathrm{focal}} = -\,\alpha_t \,(1 - p_t)^{\gamma}\,\log(p_t), \qquad \alpha_t = \frac{N}{N_t},$$

where α_t is the category balancing factor, N is the total number of samples, N_t is the number of samples in category t, γ is the focusing modulation factor whose term (1 − p_t)^γ dynamically reduces the loss contribution of easily distinguishable samples, and p_t denotes the model’s predicted probability for the true category t. The inverse-sample-number weighting scheme (α_t = N/N_t) is chosen for its direct proportionality to class imbalance severity, effectively increasing the gradient contribution of tail classes. Compared with alternative strategies, such as inverse class-frequency weighting, which may overcompensate extreme minorities, or effective-number-of-samples weighting, which accounts for data overlap, this approach provides stable rebalancing without over-amplifying noise from very rare categories, which suits the moderate yet significant imbalance in J-EDI. The factor assigns higher weights to low-frequency categories, alleviating the gradient imbalance caused by the long-tail distribution: for the biological category, α_BIO = 1/0.230 ≈ 4.35, significantly higher than the 1.80 obtained for the plastic category.
This design forms an adaptive learning mechanism: when the model predicts a low confidence level for difficult samples (such as small targets or occluded targets), the loss value is significantly amplified, forcing the network to strengthen feature learning for such samples and pushing the decision boundary to expand into sparse sample regions.
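A minimal PyTorch sketch of this class-rebalanced Focal loss, assuming per-class sample counts are available; the default γ = 2 here is the common Focal-loss setting, not necessarily the value used in the paper.

```python
import torch
import torch.nn.functional as F

def rebalanced_focal_loss(logits: torch.Tensor,
                          targets: torch.Tensor,
                          class_counts: torch.Tensor,
                          gamma: float = 2.0) -> torch.Tensor:
    """Focal loss with inverse-sample-number weights alpha_t = N / N_t.

    logits:       (B, C) raw class scores
    targets:      (B,) integer class labels
    class_counts: (C,) number of training samples per class
    """
    alpha = class_counts.sum() / class_counts                   # N / N_t per class
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)   # log p_t
    pt = log_pt.exp()
    loss = -alpha[targets] * (1.0 - pt) ** gamma * log_pt
    return loss.mean()
```

With the J-EDI shares from Table 1 (55.4%, 23.0%, 21.6%), the weights come out to roughly 1.80, 4.35, and 4.63, matching the rebalancing described above.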
6. Conclusions
This paper proposes a new object detection framework to address the unique challenges posed by image quality degradation caused by adverse factors in complex underwater environments, including low visibility and blurring, which disrupt the natural symmetry of visual information. Extensive experimental results on the J-EDI dataset demonstrate that our proposed detection framework achieves outstanding detection performance compared to several mainstream advanced detection algorithms, while effectively restoring perceptual and structural symmetry. This detection framework holds promise for deployment in underwater detection systems, promoting sustainable practices in ecological conservation, resource management, and marine engineering construction.
We believe the outstanding performance of the proposed detection framework can be attributed to several key factors:
(1) Traditional methods rely on fixed, non-learnable parameters, making it difficult to balance the mutual constraints of multiple degradation factors (such as light attenuation and turbidity changes) that break environmental symmetry. The feature quality reconstruction module IFEM achieves adaptive optimization by dynamically learning key parameters (such as the sharpening intensity λ and the color compensation μ) through a convolutional attention mechanism, effectively restoring chromatic and spatial symmetry. IFEM suppresses scattering noise while preserving critical structural information and natural symmetry, providing degradation-robust discriminative features for downstream object detection.
(2) Traditional cross-entropy loss tends to cause the model to overemphasize dominant categories while neglecting rare categories with high ecological risk. The improved Focal loss addresses this issue through a dual-mechanism dynamic adjustment that enhances the model’s ability to learn features of low-frequency key categories while maintaining symmetry in category attention. Its sample reweighting mechanism dynamically adjusts the loss contribution of difficult samples, tackling core challenges in underwater environments such as large differences in target size and imbalanced sample distributions while preserving symmetry in gradient updates.
Although this framework performs well in typical underwater degraded scenarios (insufficient lighting, color distortion, reduced contrast, and blurred edges) by restoring visual symmetry, its current limitations should be acknowledged. On one hand, the modeling of photon scattering effects in extremely turbid water remains incomplete, particularly regarding the complex symmetry-breaking patterns in high-turbidity conditions; on the other hand, target separation in dense fish-school occlusion scenarios needs improvement, primarily constrained by the scarcity of high-quality dense sample data that captures the intricate spatial symmetry of schooling behavior. In addition, the proposed Intelligent Feature Enhancement Module (IFEM) is primarily designed and optimized to compensate for optical degradation effects such as light attenuation, scattering, and color distortion, with a focus on restoring imaging symmetry; its robustness to non-optical distortions, such as motion blur, severe occlusion, or geometric deformations that break different types of symmetry, has not been explicitly validated and remains an open question. This is a key direction for future work aimed at broadening the framework’s applicability to even more diverse and challenging underwater environments. To address these limitations and further advance research in marine engineering, it may be necessary to construct an enhanced dataset that integrates complex degradation patterns such as turbid water, chemically polluted color casts, and bioluminescence interference to establish realistic underwater scenarios with known symmetry properties, and to couple the detection framework with marine pollutant dispersion models that respect physical symmetry principles to enable real-time prediction of plastic debris migration paths. Future work will also include porting and evaluating the model on low-power embedded platforms to comprehensively assess its accuracy and latency while maintaining computational symmetry in real-world streaming applications.