Author Contributions
Conceptualization, Y.C., M.S. and X.G.; Methodology, J.W., Z.L., Z.M., Y.X., X.G. and S.H.; Software, J.W., M.S., Z.L., Z.M., Y.X., Y.W. and X.G.; Validation, Y.C., Y.M., Z.M., Y.X., Y.W. and S.H.; Formal analysis, M.S. and S.H.; Investigation, J.W.; Resources, Z.L. and S.H.; Data curation, Y.C.; Writing—original draft, Y.C. and Y.W.; Writing—review & editing, Y.C., J.W., M.S. and X.G.; Visualization, J.W. and Y.M.; Supervision, Y.C., Y.M., Z.L., Y.X. and S.H.; Project administration, M.S., Y.X., Y.W. and X.G.; Funding acquisition, Y.M. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Overall architecture of DenseFish-v13 for smart aquaculture monitoring.
Figure 1.
Overall architecture of DenseFish-v13 for smart aquaculture monitoring.
Figure 2.
Bio-harmonic frequency gate (B-HFG) for spectral-domain feature refinement with hydrodynamic-informed constraint (HIC). The input feature map is transformed into the frequency domain via FFT, where the magnitude spectrum is adaptively modulated by a learnable harmonic attention map while preserving phase information. The filtered spectrum is reconstructed via IFFT to obtain denoised features for downstream detection. HIC provides additional physical consistency by constraining motion patterns.
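The gating step described in the Figure 2 caption (FFT, magnitude modulation, phase preserved, IFFT) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' B-HFG: a sigmoid over a hypothetical `gate_logits` array stands in for the learnable harmonic attention map, a real-input FFT (`rfft2`) is used so the output stays real, and HIC is not modelled.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def harmonic_frequency_gate(feature: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Spectral gating sketch for a (C, H, W) feature map.

    `gate_logits` is a hypothetical stand-in for the learnable attention map,
    with shape (H, W // 2 + 1) or (C, H, W // 2 + 1) to match the rfft2 output.
    """
    h, w = feature.shape[-2:]
    spectrum = np.fft.rfft2(feature, axes=(-2, -1))   # real FFT over spatial axes
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)                          # phase is left untouched
    gated = sigmoid(gate_logits) * magnitude            # gate in (0, 1) rescales magnitude only
    filtered = gated * np.exp(1j * phase)               # recombine with the original phase
    return np.fft.irfft2(filtered, s=(h, w), axes=(-2, -1))
```

With every gate value near 1 the operation reduces to the identity; driving the gate toward 0 at high frequencies attenuates fine, bubble-like texture, while the unchanged phase keeps edge positions in place.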
Figure 2.
Bio-harmonic frequency gate (B-HFG) for spectral-domain feature refinement with hydrodynamic-informed constraint (HIC). The input feature map is transformed into the frequency domain via FFT, where the magnitude spectrum is adaptively modulated by a learnable harmonic attention map while preserving phase information. The filtered spectrum is reconstructed via IFFT to obtain denoized features for downstream detection. HIC provides additional physical consistency by constraining motion patterns.
Figure 3.
YOLOv13-Mamba backbone with global occlusion-aware modeling.
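The core idea behind a bidirectional state-space scan over flattened feature tokens can be illustrated with the toy recurrence below. It is a non-selective, single-state simplification: `decay` and `gain` are hypothetical fixed parameters, whereas Mamba makes them input-dependent, and the internal design of the Bi-MSW-Mamba block is not reproduced here.

```python
import numpy as np


def linear_scan(tokens: np.ndarray, decay: np.ndarray, gain: float) -> np.ndarray:
    """Recurrence h_t = decay * h_{t-1} + gain * x_t over an (L, C) token sequence."""
    state = np.zeros(tokens.shape[1])
    out = np.empty_like(tokens, dtype=float)
    for t, x_t in enumerate(tokens):
        state = decay * state + gain * x_t
        out[t] = state
    return out


def bidirectional_scan(feature: np.ndarray, decay: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Flatten a (C, H, W) map in raster order, scan both ways, and sum the two passes."""
    c, h, w = feature.shape
    tokens = feature.reshape(c, h * w).T                      # (L, C) with L = H * W
    forward = linear_scan(tokens, decay, gain)
    backward = linear_scan(tokens[::-1], decay, gain)[::-1]   # reverse, scan, restore order
    return (forward + backward).T.reshape(c, h, w)
```

A forward-only scan lets each token see only pixels earlier in raster order; adding the reversed pass gives every token context from both sides (the token itself is counted by both passes), which is the property a global, occlusion-aware backbone relies on.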
Figure 3.
YOLOv13-Mamba backbone with global occlusion-aware modeling.
Figure 4.
Density-aware feature decoupling for NMS-free matching. The asterisk (*) marks the selected or emphasized feature/matching branch. Different colors are used only for visual distinction and do not denote different operations or additional mathematical meanings.
Figure 4.
Density-aware feature decoupling for NMS-free matching. The asterisk (*) marks the selected or emphasized feature/matching branch. Different colors are used only for visual distinction and do not denote different operations or additional mathematical meanings.
Figure 5.
Bio-kinematic behavior head for trajectory-based behavioral analysis. Object trajectories are constructed from consecutive frames, from which kinematic features (velocity and turning angle) and their statistics are computed. A rule-based bio-logic tree then classifies behavioral states into feeding, hypoxia, and normal, thereby enabling trajectory-level behavioral interpretation in aquaculture scenarios. The trajectory lines represent fish movement paths, the green dots indicate trajectory starting points, the red stars mark trajectory endpoints, and the black arrows indicate movement directions.
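The trajectory features and rule tree in Figure 5 can be sketched as follows. The feature definitions and every threshold value here are hypothetical illustrations; Table 1 only states that the behavior thresholds are initialized from percentiles and tuned by validation grid search. Coordinates are assumed to be normalized with y growing downward, so the water surface lies near y = 0.

```python
import numpy as np


def kinematic_features(track: np.ndarray, surface_y: float = 0.1) -> dict:
    """Summarize a (T, 2) track of normalized (x, y) centres, T >= 3."""
    steps = np.diff(track, axis=0)                        # per-frame displacement
    speed = np.linalg.norm(steps, axis=1)                 # distance per frame
    heading = np.arctan2(steps[:, 1], steps[:, 0])        # direction of each step
    turning = np.angle(np.exp(1j * np.diff(heading)))     # turning angle wrapped to (-pi, pi]
    return {
        "mean_speed": float(speed.mean()),
        "turning_var": float(turning.var()),
        "surface_frac": float(np.mean(track[:, 1] <= surface_y)),  # share of points near surface
    }


def classify_behavior(f: dict, v_low=0.005, v_high=0.03,
                      turn_var_high=0.5, surface_min=0.6) -> str:
    # Hypothetical rule tree: slow, smooth, near-surface motion -> hypoxia-related floating;
    # fast, tortuous motion -> feeding; everything else -> normal.
    if (f["mean_speed"] < v_low and f["surface_frac"] >= surface_min
            and f["turning_var"] < turn_var_high):
        return "hypoxia"
    if f["mean_speed"] > v_high and f["turning_var"] > turn_var_high:
        return "feeding"
    return "normal"
```

Ordering the hypoxia check first mirrors the idea that a near-surface, low-speed pattern should not be absorbed into the default "normal" class.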
Figure 5.
Bio-kinematic behavior head for trajectory-based behavioral analysis. Object trajectories are constructed from consecutive frames, from which kinematic features (velocity and turning angle) and their statistics are computed. A rule-based bio-logic tree then classifies behavioral states into feeding, hypoxia, and normal, thereby enabling trajectory-level behavioral interpretation in aquaculture scenarios. The trajectory lines represent fish movement paths, the green dots indicate trajectory starting points, the red stars mark trajectory endpoints, and the black arrows indicate movement directions.
Figure 6.
Representative visual challenge types targeted by DenseFish-v13. (a) Adult-fish aquaculture monitoring scene, illustrating the basic application context of aquatic livestock inspection. (b) Multi-scale and multi-pose fish distribution, where fish appear with different sizes, orientations, and depths, requiring global context modeling. (c) Crowded occlusion and boundary entanglement, where adjacent fish bodies overlap, and instance separation becomes difficult. (d) Strong aeration-bubble interference, showing bubble-induced high-frequency disturbance that motivates the proposed Bio-Harmonic Frequency Gate. (e) Illumination fluctuation and low-contrast underwater scene, where unstable visibility and turbidity degrade fish contours and local textures. (f) Motion blur and trajectory instability, where fast fish movement and underwater disturbance reduce localization stability and motivate trajectory-level behavior analysis. Together, these image types reflect the core methodological motivations of DenseFish-v13, including global occlusion-aware modeling, density-aware instance separation, bubble-noise suppression, underwater visibility robustness, and motion-based behavior interpretation.
Figure 6.
Representative visual challenge types targeted by DenseFish-v13. (a) Adult-fish aquaculture monitoring scene, illustrating the basic application context of aquatic livestock inspection. (b) Multi-scale and multi-pose fish distribution, where fish appear with different sizes, orientations, and depths, requiring global context modeling. (c) Crowded occlusion and boundary entanglement, where adjacent fish bodies overlap, and instance separation becomes difficult. (d) Strong aeration-bubble interference, showing bubble-induced high-frequency disturbance that motivates the proposed Bio-Harmonic Frequency Gate. (e) Illumination fluctuation and low-contrast underwater scene, where unstable visibility and turbidity degrade fish contours and local textures. (f) Motion blur and trajectory instability, where fast fish movement and underwater disturbance reduce localization stability and motivate trajectory-level behavior analysis. Together, these image types reflect the core methodological motivations of DenseFish-v13, including global occlusion-aware modeling, density-aware instance separation, bubble-noise suppression, underwater visibility robustness, and motion-based behavior interpretation.
![Symmetry 18 01084 g006 Symmetry 18 01084 g006]()
Figure 7.
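Sensitivity curves of the repulsion mechanism. (a) Influence of the repulsion loss weight on mAP@50:95, Counting MAE, and occlusion recall R_occ. (b) Influence of the spatial density threshold on mAP@50:95, Counting MAE, and occlusion recall R_occ.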
Figure 7.
Sensitivity curves of the repulsion mechanism. (a) Influence of on mAP@50:95, Counting MAE, and . (b) Influence of on mAP@50:95, Counting MAE, and .
Figure 8.
Qualitative comparison of DenseFish-v13 with representative baseline detectors under challenging aquaculture visual conditions. The four rows show representative scenarios of (a) crowded occlusion, (b) bubble interference, (c) low visibility, and (d) motion blur, respectively. The columns present the ground-truth annotations and the detection results of YOLOv13-m, RT-DETR-l, and DenseFish-v13. Green bounding boxes indicate correctly localized fish instances. In contrast, red bounding boxes highlight typical failure cases, including false positives due to bubble-like noise, missed fish in degraded visibility, merged detections in overlapping regions, and unstable localization under motion blur. Compared with YOLOv13-m and RT-DETR-l, DenseFish-v13 produces more complete fish localization, clearer separation between adjacent individuals, and fewer noise-induced false detections, demonstrating the effectiveness of global occlusion-aware modeling, spectral noise suppression, and density-aware instance separation.
Figure 8.
Qualitative comparison of DenseFish-v13 with representative baseline detectors under challenging aquaculture visual conditions. The four rows show representative scenarios of (a) crowded occlusion, (b) bubble interference, (c) low visibility, and (d) motion blur, respectively. The columns present the ground-truth annotations and the detection results of YOLOv13-m, RT-DETR-l, and DenseFish-v13. Green bounding boxes indicate correctly localized fish instances. In contrast, red bounding boxes highlight typical failure cases, including false positives due to bubble-like noise, missed fish in degraded visibility, merged detections in overlapping regions, and unstable localization under motion blur. Compared with YOLOv13-m and RT-DETR-l, DenseFish-v13 produces more complete fish localization, clearer separation between adjacent individuals, and fewer noise-induced false detections, demonstrating the effectiveness of global occlusion-aware modeling, spectral noise suppression, and density-aware instance separation.
![Symmetry 18 01084 g008 Symmetry 18 01084 g008]()
Figure 9.
Spectral mechanism visualization of B-HFG under strong synthetic bubble perturbation. (a) Representative test image with artificially added bubble artefacts; (b) feature-response heatmap of the baseline model without B-HFG, showing strong activation on bubble-like clutter; (c) refined feature-response heatmap after applying B-HFG, where noise-related responses are suppressed while fish contours remain salient; (d) detection result of the baseline model; (e) detection result of DenseFish-v13. In the heatmaps, warmer colors indicate stronger feature responses, whereas cooler colors indicate weaker responses.
Figure 9.
Spectral mechanism visualization of B-HFG under strong synthetic bubble perturbation. (a) Representative test image with artificially added bubble artefacts; (b) feature-response heatmap of the baseline model without B-HFG, showing strong activation on bubble-like clutter; (c) refined feature-response heatmap after applying B-HFG, where noise-related responses are suppressed while fish contours remain salient; (d) detection result of the baseline model; (e) detection result of DenseFish-v13. In the heatmaps, warmer colors indicate stronger feature responses, whereas cooler colors indicate weaker responses.
Figure 10.
Performance comparison under increasing density levels. (a) Low-to-extreme decline of occlusion recall R_occ. (b) mAP@50:95 comparison across low-, medium-, high-, and extreme-density subsets.
Figure 10.
Performance comparison under increasing density levels. (a) Low-to-extreme decline of occlusion recall . (b) mAP@50:95 comparison across low-, medium-, high-, and extreme-density subsets.
Figure 11.
Qualitative comparison of bounding-box predictions under extreme occlusion.
Figure 11.
Qualitative comparison of bounding-box predictions under extreme occlusion.
Figure 12.
Trajectory-based behavior recognition results generated by the Bio-Kinematic Behavior Head. (a) A smooth trajectory with moderate swimming velocity, low turning variance, and no persistent surface-floating pattern characterizes normal behavior. (b) Feeding behavior shows a more tortuous trajectory with frequent directional changes, corresponding to high swimming velocity and high turning variance. (c) Hypoxia-related floating behavior is characterized by slow movement near the water surface, with low turning variance and high surface proximity. Green circles and red stars denote the start and end points of each trajectory, respectively; black arrows indicate the movement direction, and the dashed horizontal line in (c) represents the water surface. All trajectories are plotted using normalized image coordinates.
Figure 12.
Trajectory-based behavior recognition results generated by the Bio-Kinematic Behavior Head. (a) A smooth trajectory with moderate swimming velocity, low turning variance, and no persistent surface-floating pattern characterizes normal behavior. (b) Feeding behavior shows a more tortuous trajectory with frequent directional changes, corresponding to high swimming velocity and high turning variance. (c) Hypoxia-related floating behavior is characterized by slow movement near the water surface, with low turning variance and high surface proximity. Green circles and red stars denote the start and end points of each trajectory, respectively; black arrows indicate the movement direction, and the dashed horizontal line in (c) represents the water surface. All trajectories are plotted using normalized image coordinates.
Table 1.
Key training and deployment settings.
Table 1.
Key training and deployment settings.
| Parameter | Value | Justification |
|---|---|---|
| Batch Size | 32 (GPU), 1 (Jetson) | Balances training stability and real-time edge inference |
| Learning Rate (initial) | warm-up; cosine decay | Stable starting point for dense underwater detector training |
| Learning Rate Schedule | Cosine decay | Smooth convergence for noisy and crowded scenes |
| Weight Decay (L2 reg) | 0.0005 | Reduces overfitting to scene-specific noise patterns |
| Momentum (SGD) | 0.937 | Standard setting for stable detector optimization |
| Optimizer | SGD with momentum | More stable than Adam for object detection |
| Loss Function | CIoU + BCE + DFL + Repulsion + HIC | Improves localization in crowded fish scenes |
| Warm-up Epochs | 10 | Stabilizes early-stage optimization |
| Total Epochs | 100 | Sufficient for convergence under the current setup |
| Repulsion Loss Weight | 0.2 | Balances dense-instance separation and training stability |
| Repulsion Activation Epoch | 50 | Avoids unstable matching before basic localization is learned |
| Spatial Density Threshold | 0.5 | Activates repulsion mainly in truly crowded regions |
| Behavior Thresholds | Percentile initialization + validation grid search | Distinguishes normal, feeding, and hypoxia-related motion states |
| Precision Mode | FP16 | Improves edge-side inference efficiency on Orin NX |
| HIC Loss Weight | 0.1 | Balances kinematic regularization without destabilizing the localization loss |
| HIC Activation Epoch | 50 | Applied after stable trajectory formation |
| MoE-SG Expert Count | 3 | Covers high-aeration, high-turbidity, and clear-water conditions |
| CPO Prototype Bank Size | 256 per class | Sufficient coverage of fish orientation and scale variation |
| CPO EMA Momentum | 0.999 | Stable prototype updates across training batches |
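The schedule-related rows of Table 1 can be read as the sketch below. The warm-up length, total epochs, loss weights, and activation epochs come from the table; the initial learning rate is not recoverable from it, so `base_lr` and the cosine floor `final_ratio` are placeholders, and the linear warm-up shape is an assumption. The CIoU, BCE, and DFL terms are summed without per-term gains for brevity.

```python
import math

WARMUP_EPOCHS = 10
TOTAL_EPOCHS = 100
LAMBDA_REP, REP_START_EPOCH = 0.2, 50
LAMBDA_HIC, HIC_START_EPOCH = 0.1, 50


def learning_rate(epoch: int, base_lr: float, final_ratio: float = 0.01) -> float:
    """Linear warm-up, then cosine decay from base_lr to base_lr * final_ratio.

    base_lr and final_ratio are placeholders; Table 1 does not give their values.
    """
    if epoch < WARMUP_EPOCHS:
        return base_lr * (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (final_ratio + (1.0 - final_ratio) * cosine)


def total_loss(epoch, l_ciou, l_bce, l_dfl, l_rep, l_hic):
    """Base detection terms plus repulsion and HIC terms switched on at their activation epochs."""
    loss = l_ciou + l_bce + l_dfl
    if epoch >= REP_START_EPOCH:
        loss += LAMBDA_REP * l_rep
    if epoch >= HIC_START_EPOCH:
        loss += LAMBDA_HIC * l_hic
    return loss
```

Delaying both auxiliary terms to epoch 50 means the first half of training optimizes plain localization and classification, matching the "activation epoch" justifications in the table.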
Table 2.
Component-wise ablation (baseline: YOLOv13-m). Values in parentheses are mAP@50:95 gains over the preceding row.
Table 2.
Component-wise ablation (baseline: YOLOv13-m).
| Variant | Precision (%) | Recall (%) | mAP@50 (%) | mAP@50:95 (%) | Counting MAE (↓) |
|---|---|---|---|---|---|
| Baseline (YOLOv13-m) | 82.4 | 78.1 | 88.6 | 56.8 | 10.2 |
| + Bi-MSW-Mamba Block | 84.1 | 80.7 | 90.2 | 59.5 (+2.7) | 8.5 |
| + Bi-MSW-Mamba + B-HFG | 86.8 | 82.4 | 91.7 | 62.1 (+2.6) | 5.8 |
| + Bi-MSW-Mamba + B-HFG + MoE-SG | 87.6 | 83.5 | 92.5 | 63.0 (+0.9) | 5.1 |
| + above + Repulsion Loss + CPO | 88.5 | 84.9 | 93.4 | 64.2 (+1.2) | 4.1 |
| + above + HIC (Full DenseFish-v13) | 88.9 | 85.6 | 93.8 | 64.8 (+0.6) | 3.7 |
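The parenthesized figures in Table 2 are consistent with step-wise gains over the preceding row rather than gains over the baseline, as this arithmetic check on the reported values shows:

```python
# mAP@50:95 (%) and Counting MAE from Table 2, rows top to bottom (baseline first, full model last).
MAP_5095 = [56.8, 59.5, 62.1, 63.0, 64.2, 64.8]
COUNT_MAE = [10.2, 8.5, 5.8, 5.1, 4.1, 3.7]


def stepwise_gains(values):
    """Difference of each row from the row above, rounded to one decimal."""
    return [round(b - a, 1) for a, b in zip(values, values[1:])]


print(stepwise_gains(MAP_5095))                # [2.7, 2.6, 0.9, 1.2, 0.6]
print(round(MAP_5095[-1] - MAP_5095[0], 1))    # 8.0
print(round(COUNT_MAE[0] - COUNT_MAE[-1], 1))  # 6.5
```

The step-wise gains sum to the overall 8.0-point mAP@50:95 improvement of the full model over the baseline.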
Table 3.
Computational overhead analysis of each ablation variant.
Table 3.
Computational overhead analysis of each ablation variant.
| Variant | Params (M) | FLOPs (G) | Peak Memory (GB) | FPS (RTX 4090) | FPS (Orin NX) |
|---|---|---|---|---|---|
| Baseline YOLOv13-m | 20.1 | 68.4 | 1.42 | 356 | 130 |
| + Bi-MSW-Mamba Block | 21.7 | 71.2 | 1.55 | 334 | 126 |
| + Bi-MSW-Mamba + B-HFG | 22.0 | 72.6 | 1.62 | 326 | 125 |
| + above + Repulsion Loss | 22.0 | 72.6 | 1.62 | 326 | 125 |
| + above + Repulsion Loss + CPO | 22.0 | 72.6 | 1.62 | 326 | 125 |
| + above + HIC (Full DenseFish-v13) | 22.0 | 72.6 | 1.62 | 326 | 125 |
Table 4.
Sensitivity analysis of the repulsion loss weight.
Table 4.
Sensitivity analysis of .
| Repulsion Loss Weight | mAP@50:95 (%) | Occlusion Recall R_occ (%) | Counting MAE ↓ |
|---|---|---|---|
| 0.0 | 62.1 | 62.9 | 5.8 |
| 0.1 | 63.5 | 66.1 | 4.7 |
| 0.2 | 64.2 | 68.7 | 4.1 |
| 0.3 | 63.9 | 68.1 | 4.3 |
| 0.4 | 63.1 | 66.4 | 4.9 |
Table 5.
Sensitivity analysis of the spatial density threshold.
Table 5.
Sensitivity analysis of .
| Spatial Density Threshold | mAP@50:95 (%) | Occlusion Recall R_occ (%) | Counting MAE ↓ |
|---|---|---|---|
| 0.3 | 62.9 | 66.2 | 5.0 |
| 0.4 | 63.7 | 67.8 | 4.4 |
| 0.5 | 64.2 | 68.7 | 4.1 |
| 0.6 | 63.8 | 67.5 | 4.6 |
| 0.7 | 63.0 | 66.0 | 5.1 |
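Tables 4 and 5 tune two knobs of the repulsion mechanism: its weight and a density threshold. One hedged way to combine a repulsion penalty with a density gate is sketched below. The crowding measure (maximum IoU between a ground-truth box and its neighbours), the simplified RepGT-style penalty (intersection over ground-truth area with the most-overlapping non-target box, without the smooth-ln used in the original Repulsion Loss formulation), and all function names are assumptions, not the paper's definitions. The returned value would then be scaled by the repulsion loss weight (0.2 in Table 1).

```python
import numpy as np


def _intersection(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise intersection areas of (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    return np.clip(x2 - x1, 0.0, None) * np.clip(y2 - y1, 0.0, None)


def _area(boxes: np.ndarray) -> np.ndarray:
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])


def gt_crowding(gt: np.ndarray) -> np.ndarray:
    """Hypothetical crowding score: max IoU of each GT box with any other GT box."""
    inter = _intersection(gt, gt)
    iou = inter / (_area(gt)[:, None] + _area(gt)[None, :] - inter)
    np.fill_diagonal(iou, 0.0)  # ignore self-overlap
    return iou.max(axis=1)


def density_gated_repulsion(pred, gt, assigned, tau=0.5) -> float:
    """Mean repulsion penalty over predictions whose target GT is crowded (crowding >= tau)."""
    iog = _intersection(pred, gt) / _area(gt)[None, :]   # intersection over each GT's area
    iog[np.arange(len(pred)), assigned] = 0.0             # ignore each prediction's own target
    penalty = iog.max(axis=1)                             # most-overlapping non-target GT
    active = gt_crowding(gt)[assigned] >= tau             # density gate
    return float((penalty * active).sum() / max(int(active.sum()), 1))
```

Raising `tau` switches the penalty off for moderately overlapping pairs, which is the trade-off Table 5 explores: too low a threshold penalizes ordinary neighbours, too high a threshold leaves genuinely crowded fish unseparated.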
Table 6.
Orthogonal ablation study based on the vanilla YOLOv13-m baseline.
Table 6.
Orthogonal ablation study based on the vanilla YOLOv13-m baseline.
| Bi-MSW-Mamba | B-HFG | Repulsion Loss | HIC | Precision (%) | Recall (%) | mAP@50 (%) | mAP@50:95 (%) | Counting MAE ↓ |
|---|---|---|---|---|---|---|---|---|
| × | × | × | × | 83.6 | 79.2 | 89.4 | 58.7 | 8.4 |
| √ | × | × | × | 84.9 | 81.1 | 90.5 | 60.4 | 7.2 |
| × | √ | × | × | 85.7 | 81.8 | 91.1 | 61.0 | 6.5 |
| × | × | √ | × | 85.2 | 82.4 | 90.9 | 60.8 | 5.9 |
| √ | √ | × | × | 86.9 | 83.2 | 92.0 | 62.8 | 5.2 |
| √ | × | √ | × | 87.1 | 83.8 | 92.4 | 63.1 | 4.8 |
| × | √ | √ | × | 87.4 | 84.1 | 92.7 | 63.4 | 4.6 |
| √ | √ | √ | × | 88.5 | 84.9 | 93.4 | 64.2 | 4.1 |
| √ | √ | √ | √ | 88.9 | 85.6 | 93.8 | 64.8 | 3.7 |
Table 7.
Effect of NMS-free matching and density-aware repulsion.
Table 7.
Effect of NMS-free matching and density-aware repulsion.
| Detection Head | mAP@50:95 (%) | Counting MAE ↓ | Occlusion Recall (%) ↑ | Merged Error Rate (%) ↓ |
|---|---|---|---|---|
| YOLOv13 with NMS | 58.7 | 8.4 | 53.6 | 18.2 |
| NMS-free YOLOv13 | 60.5 | 6.6 | 59.1 | 14.7 |
| NMS-free + Repulsion Loss | 63.5 | 4.8 | 66.3 | 9.6 |
| NMS-free YOLOv13 + Repulsion + CPO | 64.2 | 4.1 | 68.7 | 8.1 |
Table 8.
Performance on the extreme-density split (Dense-Aqua dataset).
Table 8.
Performance on the extreme-density split (Dense-Aqua dataset).
| Model | Architecture Paradigm | mAP@50:95 (%) | Counting MAE (↓) | Occlusion Recall R_occ (%) (↑) | FPS (Edge) |
|---|---|---|---|---|---|
| Deep-Fish [45] | Point-supervision | - | 9.2 ± 0.6 | - | 45 |
| CSRNet [46] | Density-map based | - | 7.8 ± 0.5 | - | 32 |
| FR-CNN [9] | Faster R-CNN based | 51.4 ± 1.2 | 14.2 ± 0.7 | 50.2 ± 1.5 | 18 |
| YOLO-FC [8] | YOLO-based (CNN) | 57.2 ± 0.9 | 9.6 ± 0.5 | 56.4 ± 1.3 | 95 |
| YOLOv8-m | CNN w/NMS | 54.2 ± 1.1 | 12.4 ± 0.6 | 45.1 ± 1.4 | 135 |
| YOLOv10-m | CNN (Early NMS-Free) | 55.1 ± 1.0 | 11.8 ± 0.6 | 46.8 ± 1.3 | 142 |
| YOLOv11-m | Advanced CNN | 56.8 ± 1.0 | 10.2 ± 0.5 | 49.3 ± 1.2 | 130 |
| YOLOv13-m | Vanilla YOLOv13 baseline | 58.7 ± 0.9 | 8.4 ± 0.4 | 53.6 ± 1.1 | 128 |
| RT-DETR-l | Transformer | 58.3 ± 0.9 | 7.2 ± 0.4 | 58.4 ± 1.1 | 74 |
| CrowdDet | Multi-Head CNN | 56.5 ± 1.0 | 8.5 ± 0.5 | 55.2 ± 1.2 | 98 |
| DenseFish-v13 | YOLOv13-Mamba | 64.2 ± 0.7 | 4.1 ± 0.2 | 68.7 ± 0.9 | 125 |
Table 9.
Density-level comparison on Dense-Aqua.
Table 9.
Density-level comparison on Dense-Aqua.
| Model | Low mAP@50:95 (%) | Medium mAP@50:95 (%) | Extreme mAP@50:95 (%) | Low MAE | Medium MAE | Extreme MAE | Low R_occ (%) | Medium R_occ (%) | Extreme R_occ (%) |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv8-m | 72.8 ± 0.6 | 63.5 ± 0.9 | 54.2 ± 1.1 | 3.6 ± 0.2 | 7.8 ± 0.4 | 12.4 ± 0.6 | 69.4 ± 0.8 | 55.7 ± 1.2 | 45.1 ± 1.4 |
| YOLOv10-m | 73.6 ± 0.6 | 64.2 ± 0.8 | 55.1 ± 1.0 | 3.4 ± 0.2 | 7.4 ± 0.4 | 11.8 ± 0.6 | 70.8 ± 0.8 | 57.2 ± 1.1 | 46.8 ± 1.3 |
| YOLOv11-m | 75.1 ± 0.5 | 66.7 ± 0.8 | 56.8 ± 1.0 | 3.1 ± 0.2 | 6.8 ± 0.3 | 10.2 ± 0.5 | 72.6 ± 0.7 | 59.5 ± 1.1 | 49.3 ± 1.2 |
| YOLOv13-m | 76.2 ± 0.5 | 68.1 ± 0.7 | 58.7 ± 0.9 | 2.8 ± 0.1 | 5.9 ± 0.3 | 8.4 ± 0.4 | 74.5 ± 0.7 | 62.3 ± 1.0 | 53.6 ± 1.1 |
| RT-DETR-l | 76.4 ± 0.5 | 68.9 ± 0.7 | 58.3 ± 0.9 | 2.9 ± 0.2 | 5.3 ± 0.3 | 7.2 ± 0.4 | 76.3 ± 0.6 | 65.4 ± 0.9 | 58.4 ± 1.1 |
| CrowdDet | 73.2 ± 0.6 | 65.1 ± 0.8 | 56.5 ± 1.0 | 3.0 ± 0.2 | 5.9 ± 0.3 | 8.5 ± 0.5 | 74.1 ± 0.7 | 62.8 ± 1.0 | 55.2 ± 1.2 |
| DenseFish-v13 | 78.3 ± 0.4 | 72.4 ± 0.6 | 64.2 ± 0.7 | 2.4 ± 0.1 | 3.8 ± 0.2 | 4.1 ± 0.2 | 81.5 ± 0.5 | 74.2 ± 0.7 | 68.7 ± 0.9 |
Table 10.
Disaggregated performance analysis on source datasets.
Table 10.
Disaggregated performance analysis on source datasets.
| Source Dataset | Environment | mAP@50:95 (%) | Counting MAE (↓) | Precision (%) | Recall (%) |
|---|---|---|---|---|---|
| Pond-Aqua | Turbid Pond | 62.3 ± 0.8 | 4.2 ± 0.3 | 86.4 | 82.5 |
| Salmon-Aqua | Sea-cage | 72.1 ± 0.6 | 1.8 ± 0.1 | 93.7 | 91.2 |
| Combined (Dense-Aqua) | Full Test Set | 64.8 ± 0.7 | 3.7 ± 0.2 | 88.9 | 85.6 |
Table 11.
Robustness comparison under synthetic bubble perturbation.
Table 11.
Robustness comparison under synthetic bubble perturbation.
| Model | Original mAP@50:95 (%) | Synthetic Bubble mAP@50:95 (%) | Drop (Δ, pp) |
|---|---|---|---|
| YOLOv13-m | 60.1 | 51.2 | −8.9 |
| DenseFish-v13 | 64.8 | 63.5 | −1.3 |
Table 12.
Performance comparison under different levels of synthetic bubble perturbation.
Table 12.
Performance comparison under different levels of synthetic bubble perturbation.
| Model | Clear mAP (%) | Low mAP (%) | Medium mAP (%) | Strong mAP (%) | Original MAE | Strong MAE | Original R_occ (%) | Strong R_occ (%) | ΔmAP | ΔMAE |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv13-m | 60.1 | 57.3 | 54.4 | 51.2 | 6.4 | 11.6 | 55.1 | 43.8 | −8.9 | +5.2 |
| RT-DETR-l | 60.8 | 59.2 | 56.5 | 54.3 | 5.0 | 8.1 | 62.7 | 54.9 | −6.5 | +3.1 |
| CrowdDet | 58.9 | 57.1 | 54.6 | 52.4 | 5.7 | 9.4 | 59.8 | 51.6 | −6.5 | +3.7 |
| DenseFish-v13 | 64.8 | 64.4 | 64.0 | 63.5 | 3.9 | 4.3 | 69.4 | 67.8 | −1.3 | +0.4 |
Table 13.
Comparison of different noise suppression strategies under strong synthetic bubble perturbation.
Table 13.
Comparison of different noise suppression strategies under strong synthetic bubble perturbation.
| Filtering Strategy | mAP@50:95 (%) | Counting MAE ↓ | Precision (%) | Recall (%) |
|---|---|---|---|---|
| No filtering | 55.8 | 8.9 | 80.6 | 76.4 |
| Gaussian filtering | 53.4 | 9.5 | 78.9 | 74.7 |
| Fixed low-pass filtering | 54.1 | 9.1 | 79.5 | 75.2 |
| Fixed band-pass filtering | 57.3 | 7.6 | 82.1 | 78.5 |
| B-HFG | 63.5 | 4.3 | 87.8 | 84.2 |
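For context on the fixed frequency-domain baselines in Table 13, a fixed radial low-pass or band-pass filter applies the same mask to every input, whereas B-HFG learns its gate. A minimal sketch follows; the cutoff values are hypothetical, since the table does not report the settings used.

```python
import numpy as np


def radial_frequency(h: int, w: int) -> np.ndarray:
    """Radial frequency (cycles/pixel) of each bin in an rfft2 spectrum of an (h, w) image."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def apply_fixed_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply the spectrum by a fixed, input-independent mask and transform back."""
    h, w = image.shape[-2:]
    spectrum = np.fft.rfft2(image, axes=(-2, -1))
    return np.fft.irfft2(spectrum * mask, s=(h, w), axes=(-2, -1))


def fixed_low_pass(image: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    r = radial_frequency(*image.shape[-2:])
    return apply_fixed_mask(image, (r <= cutoff).astype(float))


def fixed_band_pass(image: np.ndarray, low: float = 0.02, high: float = 0.25) -> np.ndarray:
    r = radial_frequency(*image.shape[-2:])
    return apply_fixed_mask(image, ((r >= low) & (r <= high)).astype(float))
```

Because the mask never adapts, any fish texture that shares a frequency band with bubble clutter is removed or kept together with it, which is one plausible reading of why the fixed filters in Table 13 trail the learnable gate.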
Table 14.
Performance comparison under different density/occlusion levels.
Table 14.
Performance comparison under different density/occlusion levels.
| Model | Low Density mAP@50:95 (%) | Medium Density mAP@50:95 (%) | High Density mAP@50:95 (%) | Extreme Density mAP@50:95 (%) | ΔmAP (Low → Extreme) | R_occ (%) (Low → Extreme) |
|---|---|---|---|---|---|---|
| YOLOv13-m | 75.1 ± 0.5 | 66.7 ± 0.8 | 60.3 ± 0.9 | 56.8 ± 1.0 | −18.3 | 72.6 → 49.3 |
| RT-DETR-l | 76.4 ± 0.5 | 68.9 ± 0.7 | 62.0 ± 0.8 | 58.3 ± 0.9 | −18.1 | 76.3 → 58.4 |
| CrowdDet | 73.2 ± 0.6 | 65.1 ± 0.8 | 60.8 ± 0.9 | 56.5 ± 1.0 | −16.7 | 74.1 → 55.2 |
| DenseFish-v13 | 78.3 ± 0.4 | 72.4 ± 0.6 | 68.1 ± 0.6 | 64.2 ± 0.7 | −14.1 | 81.5 → 68.7 |
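The ΔmAP column in Table 14 is the extreme-density mAP@50:95 minus the low-density value. The sketch below reproduces it from the reported means and also computes a derived retention ratio (extreme as a percentage of low), which is not part of the table:

```python
TABLE_14 = {  # mean mAP@50:95 (%) at low, medium, high, extreme density
    "YOLOv13-m": (75.1, 66.7, 60.3, 56.8),
    "RT-DETR-l": (76.4, 68.9, 62.0, 58.3),
    "CrowdDet": (73.2, 65.1, 60.8, 56.5),
    "DenseFish-v13": (78.3, 72.4, 68.1, 64.2),
}


def low_to_extreme(row):
    """Return (extreme - low, extreme as % of low), both rounded to one decimal."""
    low, extreme = row[0], row[-1]
    return round(extreme - low, 1), round(100.0 * extreme / low, 1)


for model, row in TABLE_14.items():
    delta, retention = low_to_extreme(row)
    print(f"{model}: ΔmAP = {delta:+.1f}, retention = {retention:.1f}%")
```

On this derived ratio, DenseFish-v13 keeps about 82% of its low-density accuracy at extreme density, versus roughly 76% for YOLOv13-m.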
Table 15.
Quantitative results of behavior classification.
Table 15.
Quantitative results of behavior classification.
| Behavior State | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|
| Normal | 90.8 | 92.4 | 91.6 |
| Feeding | 88.7 | 87.3 | 88.0 |
| Hypoxia-related Floating | 86.5 | 84.9 | 85.7 |
| Macro average (Ours) | 88.7 | 88.2 | 88.4 |
| SlowFast [47] (SOTA) | - | - | 82.1 |
| FishBehavior-Net [36] (SOTA) | - | - | 85.2 |
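The per-class F1 scores and macro averages in Table 15 follow from the usual definitions, F1 = 2PR/(P + R) and an unweighted mean over the three classes, as this check on the reported precision and recall shows:

```python
PER_CLASS = {  # (precision %, recall %) from Table 15
    "Normal": (90.8, 92.4),
    "Feeding": (88.7, 87.3),
    "Hypoxia-related Floating": (86.5, 84.9),
}


def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)


def macro_scores(per_class):
    """Unweighted means of precision, recall, and per-class F1."""
    n = len(per_class)
    macro_p = sum(p for p, _ in per_class.values()) / n
    macro_r = sum(r for _, r in per_class.values()) / n
    macro_f1 = sum(f1(p, r) for p, r in per_class.values()) / n
    return macro_p, macro_r, macro_f1


for name, (p, r) in PER_CLASS.items():
    print(f"{name}: F1 = {f1(p, r):.1f}")
print("Macro P/R/F1:", [round(v, 1) for v in macro_scores(PER_CLASS)])
```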
Table 16.
Edge deployment performance comparison on Jetson Orin NX.
Table 16.
Edge deployment performance comparison on Jetson Orin NX.
| Model | Params (M) | FLOPs (G) | Peak Memory (GB) | FPS (Orin NX) | mAP@50:95 (%) |
|---|---|---|---|---|---|
| YOLOv8-m | 25.9 | 78.9 | 1.56 | 135 | 54.2 |
| YOLOv10-m | 16.5 | 64.0 | 1.32 | 142 | 55.1 |
| YOLOv13-m | 20.1 | 68.4 | 1.42 | 130 | 56.8 |
| RT-DETR-l | 32.0 | 108.3 | 2.85 | 74 | 58.3 |
| CrowdDet | 28.7 | 94.6 | 2.21 | 98 | 56.5 |
| DenseFish-v13 | 22.0 | 72.6 | 1.62 | 125 | 64.2 |
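Throughput in Table 16 can be converted into per-frame latency as 1000/FPS milliseconds, assuming frames are processed one at a time (batch size 1 on the Jetson, per Table 1) without pipelining. These figures are derived from the table, not separately measured:

```python
FPS_ORIN_NX = {  # FPS on Jetson Orin NX, from Table 16
    "YOLOv8-m": 135, "YOLOv10-m": 142, "YOLOv13-m": 130,
    "RT-DETR-l": 74, "CrowdDet": 98, "DenseFish-v13": 125,
}


def latency_ms(fps: float) -> float:
    """Per-frame latency implied by a throughput figure, assuming sequential processing."""
    return 1000.0 / fps


def throughput_drop_pct(fps: float, reference_fps: float) -> float:
    """Relative FPS reduction versus a reference model, in percent."""
    return 100.0 * (reference_fps - fps) / reference_fps


for model, fps in FPS_ORIN_NX.items():
    print(f"{model}: {latency_ms(fps):.2f} ms/frame")
print(f"DenseFish-v13 vs YOLOv13-m: {throughput_drop_pct(125, 130):.1f}% lower FPS")
```

Under this conversion, DenseFish-v13 runs at 8.0 ms per frame, about 0.3 ms slower than the vanilla YOLOv13-m baseline.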