1. Introduction
In recent years, the widespread construction of hydropower stations in river basins, including sluices and dams, has disrupted river continuity [1]. This has significantly impacted the habitats of aquatic species, and some species that rely on upstream migration face the threat of extinction [2]. Fishways are designed to help migratory fish overcome these barriers, thereby maintaining the ecological balance of the river basin. Detecting aquatic life in migration channels is crucial for protecting the natural behavior of fish [3], and fish detection is a critical step in this process [4]. However, deploying models onsite in fishways is highly complex, and images degraded by turbid sediment are difficult to analyze. To overcome these limitations, more efficient and accurate detection methods are needed.
With the development of deep learning technology, significant progress has been made in object detection for vessels on the water. Tang et al. [5] proposed the PEGNet detection model to address missed detections and false positives in dense watercraft scenarios, with experimental results showing its superior performance. Ieracitano et al. [6] introduced HO-ShipNet, a custom convolutional neural network (CNN), and demonstrated its excellent detection performance experimentally. This work provides a solid foundation for fish detection in underwater environments and can help address many challenges associated with underwater images, including variations in lighting, water turbidity, and the diversity of marine species' appearances [7]. Huang et al. [8] enhanced the effectiveness of Faster R-CNN for detecting and recognizing marine organisms by utilizing data augmentation techniques. Zeng et al. [9] integrated Faster R-CNN with an adversarial occlusion network to achieve underwater object detection and overcome the challenges posed by occlusions in underwater environments. Similarly, Song et al. [10] improved underwater object detection accuracy in Boosting R-CNN by reweighting samples based on errors from the Region Proposal Network. To enhance the performance of Faster R-CNN in fish detection, Dulhare et al. [11] applied data augmentation techniques to improve underwater image quality. Focusing on the measurement of fish morphological features, Han et al. [12] designed the Mask_LaC R-CNN model, which offers stronger object segmentation capabilities than Faster R-CNN. Zhao et al. [13] developed Composited FishNet for detecting fish in low-quality underwater videos. Conrady et al. [14] used Mask R-CNN for automated detection and classification of southern African Roman seabreams. Feng et al. [15] developed an enhanced deep-learning framework based on Faster R-CNN for shellfish recognition, demonstrating the flexibility of Faster R-CNN in classifying various underwater organisms. Md et al. [16] and Kislu et al. [17] applied Faster R-CNN for seagrass detection, while Noman et al. incorporated NASNet to further refine detection accuracy. Song et al. [18] integrated MSRCR with Mask R-CNN to recognize underwater creatures from small sample datasets.
In addition to Faster R-CNN and Mask R-CNN, YOLO is also widely used for fish detection tasks. Joseph et al. [19] first proposed YOLO for real-time object detection, which spurred its application to underwater object detection. Li et al. [20] adapted YOLO for real-time detection of underwater fish by incorporating transfer learning. Liu et al. [21] utilized YOLO v4 to study fish locomotion in aquaponic systems. In addition, Eldin et al. [22] developed Msr-YOLO to improve fish detection and tracking in fish farms, addressing challenges specific to aquaculture environments. Abdullah et al. [23] proposed YOLO-Fish for detecting fish in realistic underwater settings, underscoring the ongoing refinement of YOLO models. Zheng et al. [24] applied an improved YOLO v4 model for fish object detection, further enhancing YOLO's performance in aquatic contexts. Li et al. [25] combined YOLOv5s with TensorRT deployment for monitoring fish passage. Qin et al. [26] introduced YOLO8-FASG for fish detection and deployed it on underwater robotic systems.
The above research has greatly advanced fish detection techniques, but several challenges remain. First, traditional fishway fish detection relies heavily on manual labor, so an intelligent approach is needed to reduce costs and improve management efficiency. Second, the complex underwater environment in fishways next to hydropower stations leads to low image contrast and insufficient color information, weakening the texture and contour details of fish; as a result, the accuracy of existing fish detection algorithms does not meet the demands of the fishway environment. Finally, because the computational capacity of onsite equipment in fishways is limited, standard feature fusion modules with many bottleneck blocks increase computational consumption and slow the convergence of model training. Moreover, the numerous sampling operations in standard object detection algorithms add to computational complexity. Since fishway detection devices have limited space, the detection algorithms must be embedded into underwater cameras and therefore need to be more lightweight to meet the demands of fish detection tasks in fishways. To address these issues, we propose a fish detection algorithm for monitoring fish passage in fishways. The key contributions are summarized as follows.
- (1). Proposing a C2f-Faster-EMA module that combines the C2f module, the FasterNet Block, and the efficient multi-scale attention (EMA) mechanism, improving feature extraction and operational efficiency for fish detection. By integrating the RepBlock and BiFusion modules, multi-scale features are captured, providing a more accurate and robust solution for detecting diverse target classes.
- (2). Constructing a freshwater fish dataset and evaluating the performance of existing loss functions on fish detection, demonstrating the satisfactory capability of the proposed model compared to state-of-the-art methods.
- (3). Deploying the proposed method in underwater cameras installed at a designated hydropower station's fishway. The system was trained on data from four fish species passing through the fishway and then performed fish detection over an extended period. The results show that the device detects fish stably and accurately, performing well in frame-by-frame capture and detection.
This paper is organized as follows. Section 2 introduces the proposed fish detection model. Section 3 illustrates the experiments conducted and the results obtained. Section 4 covers the practical application in the field and demonstrates the application results. Section 5 summarizes the paper and outlines future directions.
2. Materials and Methods
Traditional fishway fish detection methods are time-consuming, incur high labor costs, and are poorly suited to fishway environments, where the water is often turbid and unclear, as shown in Figure 1. Deep learning methods can effectively address the fishway fish detection task, enabling automated detection and intelligent management of fishways.
This paper proposes a novel, more intelligent fish detection model to address the various challenges of fish detection in fishways. First, the model's detection performance is comprehensively tested through fish dataset collection and classification in a laboratory setting. Next, the proposed model is compared with other detection models. Finally, the model is applied to real-world fishway environments for field testing. The structure of the model is outlined below. As shown in Figure 2, the backbone extracts low-level and mid-level features of the fish, such as contours, colors, and textures; the neck merges multi-level features by optimizing the feature pyramid; and the detection head uses the multi-scale features to detect fish in low-quality images captured under poor lighting conditions. The processes are detailed in the following subsections.
2.1. C2f-Faster-EMA Feature Extraction in the Backbone
The backbone is engineered for efficient and high-performance feature extraction, in which the Cross Stage Partial Fusion (C2f) module offers strong feature extraction and fusion capabilities by utilizing partial connections and multiple bottleneck layers. However, its performance may be hindered by the challenges of low-contrast imagery, complex backgrounds, and small object detection. In this paper, we propose a C2f-Faster-EMA module by integrating the C2f, EMA [27], and Faster Block [28] modules to enhance multi-scale feature fusion, global attention, and computational efficiency in the detection process, as shown in Figure 3b.
The approach is as follows. Inspired by FasterNet, we integrate Partial Convolution (PConv) with C2f, aiming to achieve a faster network and reduce floating-point computations. Figure 3a illustrates the basic principle of PConv, which applies a conventional convolution to a portion of the input channels for spatial feature extraction while keeping the remaining channels unchanged. For an input feature map of height $h$, width $w$, and $c$ channels, convolved with a $k \times k$ kernel, the computational cost of conventional convolution is $h \times w \times k^2 \times c^2$, while the computational cost of PConv is $h \times w \times k^2 \times c_p^2$, where $c_p$ is the number of channels actually convolved. In the case of a classic offset ratio of $r = c_p/c = 1/4$, the floating-point operations of PConv are only $1/16$ of those of full convolution. This significantly reduces computational overhead while improving computational efficiency. By treating the first or last contiguous channels as a representative of the entire feature map for computation, memory access is further minimized.

The memory access count of conventional convolution is given in Equation (1):

$$h \times w \times 2c + k^2 \times c^2 \approx h \times w \times 2c \tag{1}$$

The memory access count of PConv is given in Equation (2):

$$h \times w \times 2c_p + k^2 \times c_p^2 \approx h \times w \times 2c_p \tag{2}$$
With the offset ratio unchanged at $r = 1/4$, PConv therefore accesses memory only about a quarter as many times as conventional convolution during convolution calculations. By using PConv in C2f-Faster, the advantages of parallel computation can be leveraged to accelerate the training and inference processes of the model, thereby improving its efficiency.
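To make the mechanism concrete, the following is a minimal PyTorch sketch of a PConv layer under the assumptions above (a 3 × 3 kernel and an offset ratio of 1/4); the class name and structure are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial Convolution: convolve only the first c_p channels, pass the rest through."""
    def __init__(self, channels: int, kernel_size: int = 3, ratio: float = 0.25):
        super().__init__()
        self.cp = int(channels * ratio)   # channels that are convolved
        self.rest = channels - self.cp    # channels left unchanged
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.cp, self.rest], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

# Shape is preserved while only (c_p/c)^2 = 1/16 of the conv FLOPs are spent.
x = torch.randn(1, 64, 80, 80)
print(PConv(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```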
In addition, to focus more effectively on important feature information and improve object detection accuracy, we add the multi-scale feature extraction module EMA. EMA utilizes three parallel paths to extract attention-weight descriptors from the grouped feature maps, as shown in Figure 3c. The input tensor is defined as $X \in \mathbb{R}^{C \times H \times W}$, where the first two parallel paths are 1 × 1 branches, and the third parallel path is a 3 × 3 branch. To capture the dependencies among all channels and reduce computational load, one-dimensional global average pooling is applied along the horizontal and vertical dimensions in the 1 × 1 branches, as represented by Equations (3) and (4):

$$z_c^{H}(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i) \tag{3}$$

$$z_c^{W}(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w) \tag{4}$$

where $C$ is the number of channels, and $H$ and $W$ are the spatial dimensions of the input features.
Feature extraction is performed without dimensionality reduction; after a linear convolution, two nonlinear Sigmoid functions are used to fit the 2D binomial distribution. The channel attention maps within each group are aggregated through multiplication, achieving different cross-channel interactions between the two parallel paths. In the other path, a 3 × 3 convolution captures local cross-channel interactions and expands the feature space.
The added EMA attention mechanism is located at the deepest part of the model. By integrating the EMA attention mechanism into the Faster Block structure, we obtain the Faster-EMA Block, which is then added to C2f and named C2f-Faster-EMA.
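As a simplified illustration of the 1 × 1 branches, the following PyTorch sketch applies the directional pooling of Equations (3) and (4) followed by Sigmoid gating; it deliberately omits the grouping and the 3 × 3 cross-spatial branch of the full EMA module, and all names are assumptions.

```python
import torch
import torch.nn as nn

class DirectionalPoolingGate(nn.Module):
    """1x1-branch sketch: pool along each spatial direction, then gate the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> (B, C, 1, W)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Concatenate the two pooled descriptors along the spatial axis,
        # pass them through a shared 1x1 convolution, then split back.
        y = self.conv(torch.cat([self.pool_h(x),
                                 self.pool_w(x).permute(0, 1, 3, 2)], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # Two sigmoid gates reweight the input along each spatial direction.
        return x * y_h.sigmoid() * y_w.permute(0, 1, 3, 2).sigmoid()

x = torch.randn(2, 32, 40, 40)
print(DirectionalPoolingGate(32)(x).shape)  # torch.Size([2, 32, 40, 40])
```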
2.2. Bidirectional Cross-Scale Feature Fusion in the Neck
The Path Aggregation Network-Feature Pyramid Network has certain limitations in underwater fish detection tasks: variations in lighting and noise in underwater environments can affect feature extraction, and its high computational complexity can impact real-time detection. In this paper, as shown in Figure 4a, RepBi-PAN is integrated as the neck component, providing significant advantages by enhancing multi-scale feature fusion through bidirectional connections.

In the top-down path, features from higher layers are passed down and fused with lower-layer features, as in Equation (5):

$$P_l^{td} = \mathrm{Fusion}\left(\mathrm{Up}\left(P_{l+1}^{td}\right), C_l\right) \tag{5}$$

where $\mathrm{Up}(\cdot)$ represents an upsampling operation and $C_l$ is the backbone feature at scale $l$. In the bottom-up path, features from lower layers are passed upward to higher layers, as in Equation (6):

$$P_l^{bu} = \mathrm{Fusion}\left(\mathrm{Down}\left(P_{l-1}^{bu}\right), P_l^{td}\right) \tag{6}$$

where $\mathrm{Down}(\cdot)$ represents a downsampling operation. These connections improve detection accuracy for fish of varying sizes, and the bidirectional transmission of features helps preserve fine details and rich semantic information. After the bidirectional connections, weighted feature fusion is applied. Let $w_l$ denote the fusion weights, which control the importance of each scale. The final fused feature map $F_l$ at scale $l$ is given in Equation (7):

$$F_l = w_l^{td} \cdot P_l^{td} + w_l^{bu} \cdot P_l^{bu} \tag{7}$$

where $w_l$ is a learnable parameter that dynamically adjusts the importance of the top-down and bottom-up features during training. The final output feature map $F$ is the aggregation of the fused feature maps across all scales, as in Equation (8):

$$F = \sum_{l} F_l \tag{8}$$
This output is passed to subsequent layers of the network for further processing, such as detection. Additionally, RepBi-PAN maintains computational efficiency, ensuring that enhanced detection performance is achieved without excessive computational overhead, thus supporting real-time applications in challenging underwater environments.
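For illustration, a minimal sketch of the learnable weighted fusion in Equation (7) is given below, using BiFPN-style normalized non-negative weights; the module name and normalization details are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedBiFusion(nn.Module):
    """Fuse same-scale top-down and bottom-up features with learnable weights (Eq. (7))."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))  # one weight per input path

    def forward(self, p_td: torch.Tensor, p_bu: torch.Tensor) -> torch.Tensor:
        w = F.relu(self.w)            # keep the weights non-negative
        w = w / (w.sum() + 1e-6)      # normalize so they sum to 1
        return w[0] * p_td + w[1] * p_bu

fuse = WeightedBiFusion()
f = fuse(torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40))
print(f.shape)  # torch.Size([1, 128, 40, 40])
```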
2.3. Loss Function
Loss functions are crucial for model detection accuracy. The CIoU [29] loss function is used for bounding box regression, as described in Equation (9):

$$L_{CIoU} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{c^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{9}$$

The Intersection over Union (IoU) measures the ratio of the intersection to the union of the predicted and ground truth boxes. In this formula, $\rho\left(b, b^{gt}\right)$ represents the Euclidean distance between the centers of the predicted and ground truth boxes, $c$ is the diagonal length of the smallest enclosing rectangle that contains both boxes, $h$ and $h^{gt}$ denote the heights of the predicted and ground truth boxes, respectively, while $w$ and $w^{gt}$ represent their widths, and $\alpha$ is a positive trade-off parameter.
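For reference, the following sketch computes the CIoU loss of Equation (9) for a pair of boxes in (x1, y1, x2, y2) format; it is a minimal illustration, not the training code used in this study.

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Intersection over Union.
    inter_w = (torch.min(pred[2], gt[2]) - torch.max(pred[0], gt[0])).clamp(0)
    inter_h = (torch.min(pred[3], gt[3]) - torch.max(pred[1], gt[1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)
    # Squared center distance over squared enclosing-box diagonal.
    rho2 = ((pred[0] + pred[2]) - (gt[0] + gt[2])) ** 2 / 4 + \
           ((pred[1] + pred[3]) - (gt[1] + gt[3])) ** 2 / 4
    cw = torch.max(pred[2], gt[2]) - torch.min(pred[0], gt[0])
    ch = torch.max(pred[3], gt[3]) - torch.min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and trade-off parameter alpha.
    v = (4 / math.pi ** 2) * (
        torch.atan((gt[2] - gt[0]) / (gt[3] - gt[1] + eps)) -
        torch.atan((pred[2] - pred[0]) / (pred[3] - pred[1] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss(torch.tensor([10., 10., 50., 40.]), torch.tensor([12., 12., 52., 44.])))
```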
The CIoU loss function presents several limitations in the context of fishway fish detection. It introduces additional calculations for the diagonal distance and correction factors, increasing computational complexity and potentially prolonging training and inference time. To address these limitations, this study adopts the WIoU [30] loss function as a replacement. The WIoU loss function is given in Equation (10):

$$L_{WIoU} = r \cdot R_{WIoU} \cdot L_{IoU}, \qquad r = \frac{\beta}{\delta \alpha^{\beta - \delta}}, \qquad \beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}} \tag{10}$$

where $\alpha$ and $\delta$ are hyperparameters that can be adjusted for different models, $R_{WIoU}$ represents the monotonic focusing coefficient, $r$ is the non-monotonic focusing factor, and $\beta$ denotes the outlier degree.
The WIoU loss function adds a focusing coefficient to the attention-based predicted box loss. The additional computational cost comes mainly from calculating the focusing coefficient and the mean statistics of the IoU loss. However, it is faster than CIoU because it does not involve aspect-ratio calculations. Furthermore, it performs well in small object detection, enabling more accurate evaluation of small object detection results and reducing missed and false detections, which is particularly important for multi-scale fishway fish detection tasks.
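As an illustration, the following sketch computes the non-monotonic focusing factor $r$ of Equation (10); the default values α = 1.9 and δ = 3 and the use of a running mean of the IoU loss follow common WIoU v3 practice and are assumptions here.

```python
import torch

def wiou_focusing_coefficient(l_iou: torch.Tensor, l_iou_mean: float,
                              alpha: float = 1.9, delta: float = 3.0) -> torch.Tensor:
    """Non-monotonic focusing factor r from Eq. (10)."""
    # Outlier degree: each box's IoU loss relative to the running mean IoU loss.
    beta = l_iou.detach() / l_iou_mean
    # r stays small for both very easy (low beta) and very hard (high beta) boxes,
    # concentrating gradients on ordinary-quality samples.
    return beta / (delta * alpha ** (beta - delta))

# Example: boxes near the average IoU loss receive the largest weight.
l_iou = torch.tensor([0.05, 0.30, 0.90])
print(wiou_focusing_coefficient(l_iou, l_iou_mean=0.30))
```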
3. Results and Analysis
3.1. Experimental Environment
The experiments were conducted on a Windows 10 computer equipped with an Intel® i5-12490F CPU and an NVIDIA GeForce RTX 4060 GPU with 16 GB of memory. The experiments were carried out on the PyCharm software platform using Python 3.11. During training, 100 epochs were set, with a batch size of 16. To balance convergence speed and stability, the initial learning rate was set to 0.01. To reduce computational overhead, SGD was used as the optimizer. The input image size was uniformly scaled to 640 × 640.
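A hypothetical training call mirroring these settings is sketched below using the Ultralytics-style API commonly used with YOLOv8; the dataset configuration file name is an assumption, and this is not the authors' actual script.

```python
from ultralytics import YOLO

# Baseline architecture; the paper's modified modules would replace parts of this.
model = YOLO("yolov8n.yaml")
model.train(
    data="fish26.yaml",   # hypothetical dataset config for Fish26
    epochs=100,
    batch=16,
    optimizer="SGD",
    lr0=0.01,             # initial learning rate
    imgsz=640,            # input images scaled to 640 x 640
)
```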
3.2. Experimental Dataset
The Fish26 dataset is a compilation of the Fish4knowledge fish dataset [31] and images of various fish species collected in a laboratory environment, carefully organized to encompass various underwater scenarios. The dataset includes 11,204 images in .jpg format, covering 26 fish species across different categories, including freshwater, marine, benthic, and pelagic fish. A portion of the Fish26 dataset is shown in
Figure 5.
The Fish4knowledge dataset contains 23 fish species captured from underwater videos along the coast of Taiwan. It is an imbalanced dataset, with the number of images per species ranging from 25 to 12,112. From these 23 species, we selected 21 and collected approximately 400 images per species. Additionally, we included five fish species collected in a laboratory environment: bighead carp, tilapia, yellow-bone fish, crucian carp, and silver carp. For each of these species, we collected 100 images and applied image augmentation techniques, such as rotation and scaling, to increase the count to 400 images per species, forming the Fish26 dataset. This dataset supplements common freshwater fish datasets and covers a variety of species.
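The rotation and scaling augmentation described above could be sketched as follows with torchvision transforms; the parameter ranges and file name are assumptions, and in a full detection pipeline the box coordinates would also need to be transformed.

```python
from PIL import Image
from torchvision import transforms

# Random rotation and scaling of the kind used to expand each species to 400 images.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                 # small random rotations
    transforms.RandomAffine(degrees=0, scale=(0.8, 1.2)),  # random scaling
])

img = Image.open("bighead_carp_001.jpg")  # hypothetical file name
augmented = augment(img)
```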
3.3. Experimental Metrics
This study uses mean Average Precision at IoU = 0.5 (mAP@0.5), Precision (P), and Recall (R) to assess the model's training accuracy. Additionally, model evaluation metrics include the number of parameters and giga floating-point operations (GFLOPs).
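For clarity, precision and recall are computed from true positive (TP), false positive (FP), and false negative (FN) counts as sketched below; mAP@0.5 additionally averages the per-class average precision at an IoU threshold of 0.5.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """P = TP / (TP + FP); R = TP / (TP + FN)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

print(precision_recall(tp=90, fp=10, fn=15))  # (0.9, 0.857...)
```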
3.4. Comparison Experiments
To verify the effectiveness of the WIoU loss function, this study analyzes the performance differences between WIoU and the shape intersection over union (SIoU), efficient intersection over union (EIoU), and generalized intersection over union (GIoU) loss functions. Each loss function was substituted into the YOLOv8n model without any other changes. As shown in the loss function experiment results in Table 1, SIoU and EIoU both lead to a decrease in precision (P) and mean average precision (mAP@0.5) relative to the original YOLOv8n model. GIoU decreases precision (P) but improves mAP@0.5. In contrast, YOLOv8n + WIoU achieves the highest mAP@0.5.
To verify the superior detection performance of the proposed method, a comparative analysis was conducted on the Fish26 dataset. The proposed method was compared with current mainstream object detection algorithms, including the YOLO series, SSD, and Faster R-CNN. The experimental results are presented in Table 2. The results indicate that both the Faster R-CNN and SSD models exhibit lower detection accuracy and speed, making them unsuitable for fish detection tasks in underwater environments. While the YOLOv5n, YOLOv6n, and YOLOv7-Tiny models benefit from fewer parameters and lower computational complexity, their detection accuracy is lower than that of the YOLOv8n model. In contrast, the proposed method outperforms YOLOv8n in both recall and precision, with improved detection accuracy and speed. Specifically, the mAP@0.5 of the proposed method is 1.7% higher than that of YOLOv8n, while the parameter count is reduced by 24% and the computational load is only 6.2 GFLOPs. This balance of detection accuracy, speed, and lightweight design makes the proposed method better suited to complex underwater fish detection tasks. Compared to mainstream object detection models, the proposed model demonstrates superior detection performance, as illustrated in Figure 6.
3.5. Ablation Experiment
To verify the effectiveness of the enhanced modules in the proposed method, ablation experiments were conducted on the Fish26 dataset. Using the YOLOv8n network as the baseline, each improvement module was sequentially added to the network, and its impact on the final detection performance was assessed. The results of these experiments are presented in Table 3. Group 1 represents the baseline YOLOv8n model, while Groups 2 to 4 each incorporate an individual improvement. The results indicate that replacing the C2f module in the backbone network with the C2f-Faster-EMA module increases mAP@0.5 by 1.5%, reduces model parameters by 8.4%, and decreases computational cost. These modifications make the model more lightweight and improve feature extraction efficiency.
After using RepBi-PAN to improve the neck component, the mAP@0.5 showed only a slight increase, but model parameters decreased by 15.9% and computation speed improved, demonstrating the effectiveness of the RepBi-PAN-enhanced neck. The WIoU loss function further boosts detection accuracy by adaptively weighting bounding box samples according to their IoU quality, effectively reducing the negative impact of low-quality samples on model performance.
Groups 5 to 7, which each incorporate two of the modules, provide further evidence of the effectiveness of each module. Group 8 represents the model with all modules applied. In this configuration, the C2f module in the backbone network is replaced by the C2f-Faster-EMA module, the neck is improved with RepBi-PAN, and CIoU is replaced by Wise-IoU (WIoU) as the bounding box regression loss function. This combination not only maintains the model's lightweight nature but also enhances detection accuracy and recall.
Having validated the effectiveness and advancement of the proposed improvement strategy, the model was trained on the prepared dataset, with the results shown in
Figure 7.
3.6. Discussion
The algorithm proposed in this paper focuses on both accuracy and lightweight design. To evaluate it, we compared the chosen loss function with other commonly used loss functions and conducted qualitative and quantitative comparisons with various object detection algorithms. In addition, we performed ablation experiments to verify the contribution of each module. The results clearly show that the algorithm exhibits high accuracy and robustness when detecting different fish species. However, since the algorithm is intended for actual engineering projects, its performance in real fishway environments requires further observation. Therefore, field tests were conducted in a designated fishway in the subsequent experiments.
4. Application
To verify the feasibility of the proposed method, fish detection was conducted in a complex fishway environment. The method ensures that fish ecological habitats are not disrupted while achieving accurate and rapid fish detection with minimal resource usage. We installed and deployed the detection equipment at the designated fishway next to the hydropower station on the Han River in Hubei, ensuring that the normal operation of the fishway was not affected. The detection equipment was installed at the inlet of the fishway, as shown in Figure 8. The underwater detection system consists of a fish passage box culvert and an encapsulated camera. The box culvert constrains fish to pass through the fishway via the box, and the proposed method, deployed in the encapsulated camera, detects the fish passing through the channel. The underwater detection system was slowly lowered along a track on the inner wall of the fishway to the bottom and fixed in position.
Similarly, we installed the underwater detection device at the fishway outlet and used a Hikvision camera (resolution 1920 × 1080 pixels) in the device to collect a dataset of fish species in the fishway. To capture different fish migrations, the collected data were filtered by time segment and target fish species, creating a dataset of fish migration for this season, as shown in Figure 9. This dataset includes four fish species, Schizothorax macropogon, Racoma waltoni, Rhynchocypris lagowskii, and Schizothorax oconnori, with a total of 4233 images. Because some small fish were too short to be classified based on features and silhouettes, they were uniformly labeled as "Small Fish". LabelImg software (version 1.8.6) was used for annotation, and the dataset was randomly divided into training and validation sets in an 8:2 ratio.
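A minimal sketch of such a random 8:2 split is shown below; the directory layout and file paths are assumptions.

```python
import random
import shutil
from pathlib import Path

random.seed(0)  # reproducible split
images = sorted(Path("fishway_dataset/images").glob("*.jpg"))  # hypothetical layout
random.shuffle(images)
cut = int(0.8 * len(images))  # 8:2 train/validation ratio

for split, subset in (("train", images[:cut]), ("val", images[cut:])):
    out = Path("fishway_dataset") / split
    out.mkdir(parents=True, exist_ok=True)
    for img in subset:
        shutil.copy(img, out / img.name)
```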
Through the computer system managed by the hydropower station authorities, our onsite detection equipment allows real-time observation of fish passage and detection performance. Compared to traditional manual detection, this approach significantly reduces manual intervention while improving both accuracy and efficiency. The detection results are shown in Figure 10. The method eliminates the need for intervention by fishway maintenance and management personnel, and analysis of daily fish passage data can provide guiding parameters for the rational operation of the fishway. At the same time, multiple datasets of different fish species can be collected. Detecting fish passing through the fishway supports the scientific construction, reasonable operation, and management of fishway facilities. It also allows real-time observation of fish passage in the system, effectively meeting the requirements for real-time fish detection and improving the efficiency of fishway monitoring.
5. Conclusions and Future Work
This study addresses the high cost, low efficiency, and limited accuracy of fish detection in fishways. By combining a portion of the Fish4knowledge dataset with images of various fish species collected in a laboratory environment, the Fish26 dataset containing 26 fish species was created. To address the challenges of image blurring due to turbid water and image degradation from insufficient lighting in underwater environments, a fish detection algorithm was proposed that effectively performs fish detection tasks in fishway environments.
Comparison experiments with different loss functions showed that the WIoU loss function used in the proposed fishway fish detection algorithm achieved a mAP@0.5 of 91.8%, a significant improvement in accuracy. Compared with state-of-the-art methods, the improved model showed a 1.7% increase in detection accuracy, a 23.4% reduction in computational complexity, and a 24% reduction in model parameters, resulting in enhanced detection accuracy, a lighter model, and fewer missed and false detections. Additionally, ablation experiments verified the algorithm's high accuracy and robustness. Field tests were also carried out in the fishway near a specific hydropower station. The results show that the proposed fishway fish detection algorithm not only performs well in a laboratory environment but also meets the requirements for fishway fish detection in real-world applications. However, because fish species vary across fishways and seasons, the algorithm is currently only applicable to the specific fishway during the given season, and fish detection in other fishways still faces challenges. Future work will focus on expanding fishway fish datasets for different fishways and conducting diversified experiments across multiple fishways in various seasons to enhance the algorithm's robustness, as well as exploring algorithms with higher detection accuracy and real-time performance. The generalizability of the algorithm to other underwater targets also remains an area for further research.