Article

Enhancing Aquarium Fish Tracking with Mirror Reflection Elimination and Enhanced Deep Learning Techniques

1 Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Yunlin County, Douliu City 64002, Taiwan
2 Graduate School of Technological and Vocational Education, National Yunlin University of Science and Technology, Yunlin County, Douliu City 64002, Taiwan
* Author to whom correspondence should be addressed.
Electronics 2025, 14(16), 3187; https://doi.org/10.3390/electronics14163187
Submission received: 1 July 2025 / Revised: 4 August 2025 / Accepted: 5 August 2025 / Published: 11 August 2025
(This article belongs to the Special Issue Computer Vision and AI Algorithms for Diverse Scenarios)

Abstract

Keeping ornamental fish has grown increasingly popular, as their vibrant presence can have a calming influence. Accurately assessing the health of ornamental fish is important but challenging. To this end, researchers have focused on developing fish tracking methods that provide trajectories for health assessment. However, issues such as mirror images, occlusion, and motion prediction errors can significantly reduce the accuracy of existing algorithms. To address these problems, we propose a novel ornamental fish tracking method based on deep learning techniques. We first utilize the You Only Look Once (YOLO) v5 deep convolutional neural network with Distance Intersection over Union–Non-Maximum Suppression (DIoU-NMS) to handle occlusion. We then design an object removal algorithm to eliminate the coordinates of mirrored fish. Finally, we adopt an improved DeepSORT algorithm, replacing the original Kalman Filter with a Noise Scale Adaptive (NSA) Kalman Filter to enhance tracking accuracy. In our experiments, we evaluated our method in three simulated real-world fish tank environments, comparing it with the YOLOv5 and YOLOv7 methods. The results show that our method can increase Multiple Object Tracking Accuracy (MOTA) by up to 13.3%, Higher Order Tracking Accuracy (HOTA) by up to 10.0%, and Identification F1 Score (IDF1) by up to 14.5%. These findings confirm that our object removal algorithm effectively improves multiple object tracking accuracy, which facilitates early disease detection, reduces mortality, and mitigates economic losses, an important consideration given many owners' limited ability to recognize common diseases.

1. Introduction

1.1. Background and Motivation

Ornamental fish keeping has become a popular global hobby, as the fishes' vibrant colors and peaceful movements are known to reduce stress. The ornamental fish industry is a growing global market; the Asia–Pacific region alone accounted for 22.5% of the USD 1.43 billion revenue in 2023 and 65% of exports [1]. A major challenge for this industry is the lack of expertise among fish owners and dealers in diagnosing common fish diseases. As Mecha et al. [2] revealed, respondents could correctly identify only some of the 39 common types of external diseases. This deficiency can lead to delayed preventive measures, high fish mortality, and significant financial losses [3]. Consequently, early detection of illness is crucial. It can be achieved by monitoring abnormal fish behavior, such as swimming at the water surface, swimming rapidly, or ceasing to swim [4]. Unfortunately, traditional methods based on manual observation and visual inspection are impractical and time-consuming for large-scale operations. Therefore, developing automated fish tracking technologies is highly desirable. By accurately monitoring and analyzing fish movements, we can implement timely interventions and preventive measures against disease outbreaks. Such systems enable precise monitoring of fish behavior, allowing early disease detection, reducing mortality, and minimizing financial losses, an essential capability given the limited diagnostic expertise among many fish owners. Furthermore, such technologies can contribute to the growth and prosperity of the ornamental fish industry.

1.2. Fish Anomaly Analysis

Research on fish anomaly analysis can be categorized into two areas: fish identification through appearance symptoms and fish behavior tracking. Appearance-based methods focus on detecting diseases from fish appearance. For instance, Aditya et al. [5] proposed a method to identify wounds and lice on salmon, helping farmers treat infected fish promptly. However, this method was limited to a few specific symptoms. Other researchers have aimed to detect a broader range of diseases. Waleed et al. [6] used convolutional neural network (CNN) architectures to detect white spot, columnaris, and epizootic ulcerative syndrome (EUS) diseases. Yu et al. [7] focused on hemorrhagic septicemia, saprolegniasis, benedeniasis, and scuticociliatosis infections, enhancing their model’s robustness with data augmentation. Sikder et al. [8] identified multiple diseases, including EUS, red spot, argulus, tail rot, fin rot, broken antennae rostrum, and bacterial gill rot, aiming for early detection and treatment. While these methods show high accuracy, they require extensive datasets of abnormal fish, which are challenging and time-consuming to collect due to the diverse nature of fish diseases and species.
Fish behavior tracking focuses on monitoring and analyzing how fish move and act in their environment. Yiqun Hou et al. [9] used ultrasonic tagging technology for fish positioning in complex aquatic environments. However, using ultrasonic tags for positioning requires professional personnel to attach the tags to the fish, which is time-consuming and labor-intensive when tracking multiple fish. Saberioon et al. [10] employed a Kinect sensor with red, green, blue (RGB) and infrared cameras for three-dimensional (3D) automatic multi-fish tracking. Their results showed that even under severe occlusion, they could track the 3D trajectories of multiple fish with high accuracy. Wenkai Xu et al. [11] simulated different ammonia concentrations in an aquaculture environment and revealed that fish movement and activity significantly decreased with higher ammonia concentrations. Zhao Xiaoqiang et al. [12] proposed a multi-fish tracking algorithm, limiting the water depth to 3 cm to prevent vertical movement. They tracked fish in different pH environments and found that more acidic or alkaline conditions, compared to normal water (pH 7), affected the fishes’ movement speed. Hanyu Zhang et al. [13] approached fish fry counting as a multi-object tracking problem. They used You Only Look Once (YOLO) v5-nano with the SORT tracking algorithm to match and recover trajectories. Their experiments showed that this method achieved high accuracy.
In summary, although analyzing fish activity through trajectory tracking has been explored, existing studies overlook critical challenges such as the mirror effect caused by aquarium materials and occlusion issues in complex environments with decorations and water ripples. In addition, most experiments are conducted in controlled, simplified settings. Therefore, we aim to propose solutions to tackle mirror reflections and occlusion problems, enabling accurate multi-fish tracking in realistic aquarium environments.

1.3. Our Methods and Contributions

This paper presents the FITS (Fish Identification and Tracking System), which utilizes YOLOv5 [14] and DeepSORT [15] for fish identification and tracking. The FITS includes a mirror removal algorithm for identification, Distance-IoU NMS (DIoU-NMS) for handling fish overlapping, and integrates the Noise Scale Adaptive (NSA) Kalman Filter to improve tracking accuracy. The main contributions of this research are detailed as follows:
  • The FITS includes three key innovations to enhance multi-fish tracking accuracy in realistic aquarium environments. Firstly, it employs a mirror removal algorithm that considers the coordinate relationships among objects, effectively deleting mirrored objects that cause errors in fish tracking. Secondly, it replaces the traditional NMS with DIoU-NMS, which considers both the overlap ratio and the distance between objects and thus effectively mitigates occlusion issues. Thirdly, the tracking algorithm is improved by incorporating an NSA Kalman Filter, which better predicts a fish's position in the next frame by adapting to its movement speed and sudden turns. These innovations enhance the FITS's ability to accurately track multiple fish in realistic aquarium environments.
  • The FITS was evaluated in three different ornamental fish tank environments: basic, decorated, and water ripple. Across these diverse conditions, the FITS consistently demonstrated remarkable tracking accuracy. It also significantly reduced ID switches, an error where the tracker assigns the same ID to the wrong object. In the basic environment, compared with YOLOv5, the FITS increased Multiple Object Tracking Accuracy (MOTA) by 13.3% and reduced ID switches by 224. In the decorated environment, the FITS increased MOTA by 5.5% and decreased ID switches by 174. In the water ripple environment, the FITS increased MOTA by 3.8% and reduced ID switches by 249. These results highlight the FITS’s robustness and accuracy in diverse conditions.
This paper is organized as follows. Section 2 reviews existing fish identification and tracking methods, outlining their limitations. Section 3 details the design and implementation of the FITS. Section 4 presents the evaluation results of the FITS in three different ornamental fish tank environments. Finally, Section 5 concludes this work and discusses potential future directions.

2. Related Work

Current research in fish anomaly analysis can be categorized into two main approaches: (1) identifying fish based on their appearance and symptoms, and (2) analyzing fish tracking and activity patterns. This section reviews the literature related to fish appearance identification and fish tracking, as follows.

2.1. Fish Appearance Identification Methods

Many studies have utilized image recognition and machine learning techniques to identify fish abnormalities through external symptoms. Jun Hu et al. [16] used deep learning to identify fish suffering from oxygen deficiency, hunger, and those in normal condition. They proposed an improved YOLOv3-Lite for feature extraction and a lighter model. Their experimental results showed real-time responsiveness in behavior recognition and a reduction in model size. Shili Zhao et al. [17] used a deep neural network to detect fish deaths in an aquaculture environment. They assessed a fish's mortality by observing its swimming posture and whether it was lying belly-up on the bottom. Ahmed Waleed et al. [6] utilized CNN methods to detect fish diseases, focusing on white spot disease, columnaris disease, and ulcer disease. Four different CNN architectures were compared across three different color spaces; their results showed that AlexNet performed best in the XYZ color space. Aditya Gupta et al. [5] also utilized a modified CNN to identify wounds and lice on salmon. They reported high accuracy in detecting fish lice and wounds, helping aquaculture farmers identify injured fish. Jia-Ming Liang et al. [18] used a two-stage Faster-RCNN to identify fish abnormalities, distinguishing between healthy, diseased, and dead guppies based on their appearance. Gangyi Yu et al. [7] focused on fish skin disease recognition, targeting four common fish skin diseases: hemorrhagic septicemia, water mold, benedeniasis, and ciliate infection. Their results showed that, compared to the original YOLOv4 model, their approach achieved high accuracy and real-time performance. Yaoyi Cai et al. [19] developed a computer vision framework combining YOLOv7, the Efficient Layer Aggregation Network (ELAN), and the Normalization-based Attention Module (NAM) attention mechanism (N-ELAN) with auto-MSRCR image enhancement. This framework can rapidly detect Spring Viremia of Carp symptoms, such as hemorrhages, slow swimming, and a bloated abdomen, improving accuracy and recall rates.
Recent advances in object detection, such as the enhanced YOLOv8 models by Jiaqi Zhong et al. [20] incorporating modules like Faster-C2f and Sim-SPPF, achieve a balance between accuracy and real-time performance by optimizing model complexity. Although effective on standard datasets (e.g., PASCAL VOC, MSCOCO), their direct application to fish health monitoring is limited due to aquatic-specific challenges such as reflections and occlusions. Lei Deng et al. [21] further highlight that underwater detection suffers from issues like light scattering, motion blur, and color distortion, which complicate small and occluded target detection. To address these, some methods leverage feature fusion and semantic decoupling strategies, demonstrating improved accuracy on datasets like UTDAC2020.
The major limitation of appearance-based methods is the requirement for abnormal fish data for training. Due to the wide variety of fish diseases and species, obtaining this relevant abnormal data is a challenging and time-consuming process. This lack of abnormal data poses a significant obstacle to the effective implementation of these techniques in fish health monitoring and disease detection.

2.2. Fish Tracking Methods

Many research efforts have focused on tracking fish and their activity patterns for anomaly analysis. Zhi-Ming Qian et al. [22] and Shuo Hong Wang et al. [23] utilized fish head recognition for tracking fish. They analyzed the unique features of fish heads, such as shape, size, and coloration patterns, to identify fish and track them. However, utilizing fish head recognition is limited by the high variability and complexity of fish head shapes and orientations, which can reduce the accuracy of tracking. Some researchers have concentrated on the single-fish tracking problem. He Wang et al. [24] proposed a real-time recognition method for tracking the behavior of a single abnormal fish. They designed a modified YOLOv5 for small object detection and incorporated SiamRPN++ for tracking. While this method achieves high accuracy, it requires significantly more powerful computational resources for multi-fish tracking. Additionally, this approach requires adequate training data containing abnormal fish behavior, which are often unavailable. Yupeng Mei et al. [25] introduced a method for tracking a single fish in an aquaculture environment using a Siamese network. They applied Contrast Limited Adaptive Histogram Equalization (CLAHE) to enhance image contrast and reduce noise. Their experimental results demonstrated higher accuracy compared to other methods and achieved real-time speed. However, the effectiveness of CLAHE can vary depending on the image type and the chosen parameters.
There are two commonly used Multiple Object Tracking (MOT) approaches for fish tracking: Joint Detection and Embedding (JDE) and Separate Detection and Embedding (SDE). The JDE approach performs detection and tracking within a single network, providing better real-time performance but with moderate tracking accuracy. In contrast, our work focuses on ornamental fish tracking in aquarium settings, where challenges such as mirror images and dynamic occlusions require specialized mirror removal and adaptive filtering techniques. Unlike general JDE-based trackers, our approach integrates domain-specific heuristics tailored to aquatic environments to maintain identity consistency and reduce tracking errors.
On the other hand, the SDE approach executes the detector and tracker separately, feeding the detector’s results into the tracker. This method offers high tracking accuracy but suffers from poor real-time performance. In short, JDE prioritizes speed, while SDE prioritizes accuracy. In JDE-based research, Weiran Li et al. [26] introduced CMFTNet, a joint method-based multi-fish tracking system for aquaculture. They integrated deformable convolution into the network backbone to enhance contextual feature extraction and distinguish background information. They also addressed challenges of fish overlap and significant changes in fish posture. According to their experimental results on the OptMFT dataset, it delivers good performance in MOTA and IDF1 metrics. However, in scenarios with significant occlusion, there are more frequent changes in identification (ID switches), which leads to reduced tracking accuracy. Given the focus of our research on accuracy and the need to address the issue of frequent occlusions, JDE-based research is not appropriate for our study. To mitigate such issues, Yaoqi Hu et al. [27] enhanced the JDE framework by introducing an Occlusion Prediction Module (OPM) and an Occlusion-Aware Association Module (OAAM). The OPM estimates occlusion states to guide the consistency learning of visual embeddings, while the OAAM produces distinct embeddings for occluded and unoccluded detections. This dual-embedding strategy enhances robustness during occlusions. Experimental evaluations confirm improved performance in both occluded and non-occluded tracking scenarios. In SDE-based research, Xianyi Zhai et al. [28] proposed an improved method combining GN-YOLOv5 and StrongSORT for aquaculture applications. They enhanced the YOLOv5 detector with an attention mechanism module and integrated GhostNetV2 to improve feature extraction capabilities and computational speed. Additionally, they incorporated Generalized Intersection over Union (GIoU) into the StrongSORT tracking algorithm to enhance multi-fish tracking performance.
Despite the attention given to fish tracking, existing studies usually neglect crucial challenges encountered in real-world aquarium environments, such as the mirror effect from transparent materials and occlusion issues. In addition, most experiments were conducted in simplified environments, not fully capturing the complexities of realistic scenarios, as shown in Figure 1. Table 1 summarizes these conventional methods. Therefore, in this work, we aim to address mirror reflections and occlusion problems, enabling accurate and reliable multi-fish tracking. In addition, by conducting experiments in more realistic settings, as detailed in Section 4, we ensure its practical applicability.

3. System Design and Implementation

This section details the FITS, an ornamental fish tracking system designed to overcome mirror image interference. It starts with the design requirement and system architecture, followed by the fish detection algorithm. The detection algorithm covers model selection, a technique for handling bounding box overlap, and a crucial mirror removal step. Finally, it presents a fish tracking algorithm to follow individual fish.

3.1. Overview and System Architecture

Achieving high tracking accuracy is the main design requirement, but several difficulties must be addressed first. Firstly, the reflective material of the tank generates mirror images of the fish, which can be detected and associated by the tracking algorithm, leading to incorrect ID assignments. Secondly, occlusion may occur during detection, causing the recognition algorithm to miss fish and degrading subsequent tracking performance. We therefore focus on eliminating the interference of mirror images and mitigating occlusion to enhance fish recognition accuracy.
Figure 2 and Figure 3 illustrate the FITS architecture, which consists of both hardware and software components. The hardware includes a webcam mounted above a fish tank, capturing video from an overhead perspective. This space-efficient setup is ideal for household environments, providing a clear view of the aquarium. The video is then sent to a computer, where it goes through four processing stages. First, the YOLOv5 version 6.0 model performs object detection to identify the fish in each frame. Then, the detected fish areas are refined using DIoU-NMS (Distance Intersection over Union–Non-Maximum Suppression). Next, a specialized mirror removal algorithm eliminates false detections caused by reflections. Finally, the enhanced DeepSORT algorithm with a Noise Scale Adaptive Kalman Filter tracks the fish across frames and assigns each fish an ID.
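To make this flow concrete, the sketch below strings the four stages together in Python. It is a minimal illustration, not the authors' production code: the stage functions (detect, suppress, remove_mirrors, track) are hypothetical callables standing in for YOLOv5 inference, DIoU-NMS, mirror removal, and NSA-DeepSORT, respectively.

```python
# Illustrative per-frame FITS pipeline (stage functions are injected as
# callables so the sketch stays self-contained; see the later sketches
# for DIoU-NMS and mirror removal).
import cv2

def process_stream(video_path, detect, suppress, remove_mirrors, track):
    """Yield per-frame track lists: detect -> DIoU-NMS -> mirror removal -> track."""
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            candidates = detect(frame)       # YOLOv5: boxes + confidence scores
            boxes = suppress(candidates)     # DIoU-NMS: resolve occlusion
            boxes = remove_mirrors(boxes)    # drop tank-wall reflections
            yield track(boxes, frame)        # NSA-DeepSORT: assign IDs
    finally:
        cap.release()
```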

3.2. Detection Model Selection

YOLO is a widely adopted object detection algorithm that divides the input image into a grid system, predicting bounding boxes and class probabilities for objects within each grid cell. There are currently many versions of the YOLO algorithm. According to Muhammad et al.’s research [29], YOLOv5 is more focused on edge deployment. Starting from YOLOv5, all versions use the PyTorch [30] framework, making deployment easier. Based on their experimental results on the COCO dataset, YOLOv5 and YOLOv7 show higher average precision and have smaller model parameters. Considering the potential need for future deployment on edge devices, we decided to choose between YOLOv5 and YOLOv7. Our dataset includes 756 annotated images of zebrafish. We selected 529 images for the training set and 227 images for the testing set. The training parameters for YOLOv5 and YOLOv7 were kept at their default settings.
The training results are shown in Figure 4 and Table 2. True positives (TP) indicate the model correctly identifying fish, false positives (FP) are incorrect identifications, and false negatives (FN) are missed fish. Precision represents the ratio of correctly predicted fish to all instances predicted to be fish and is calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}.$$

Recall represents the ratio of correctly predicted fish to the total number of actual fish instances and is calculated as follows:

$$\text{Recall} = \frac{TP}{TP + FN}.$$

IoU (Intersection over Union) quantifies the degree of overlap between the predicted bounding box and the actual bounding box of the fish. The mean Average Precision (mAP) averages the precision over different IoU thresholds: mAP@0.5 is measured at an IoU threshold of 0.5, while mAP@0.5:0.95 averages over IoU thresholds from 0.5 to 0.95.
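As a quick worked example of these definitions (the counts below are invented for illustration, not taken from our experiments):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# E.g., 90 fish detected correctly, 10 spurious boxes, 5 fish missed:
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
# p = 0.900, r ≈ 0.947, f1 ≈ 0.923
```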
As Figure 4 shows, YOLOv5 outperforms YOLOv7 in all four metrics when trained on the same dataset. YOLOv5 consistently scores close to 1 in precision, recall, and mAP@0.5, demonstrating its high recognition efficiency on the aquarium fish dataset. In contrast, YOLOv7 exhibits unstable recognition performance over many epochs, with fluctuating results. This performance difference is further quantified in Table 2: YOLOv5 achieves 8.8% higher precision, 9.9% higher recall, 8.6% higher mAP@0.5, and a significant 11.4% higher mAP@0.5:0.95 compared to YOLOv7. This unexpected advantage of YOLOv5 over YOLOv7 may be due to YOLOv5's simpler architecture, which likely generalizes better on smaller, specialized datasets like our aquarium fish collection, whereas YOLOv7's more complex design may be prone to overfitting. Given its superior performance, we selected YOLOv5 for our study. Furthermore, YOLOv5 is available in several variants (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x), each varying in network depth and computational requirements. We chose YOLOv5s because it is fast, efficient, and lightweight, making it ideal for practical deployment in household environments.

3.3. Detection Enhancement

Object detection aims to identify and localize objects in images. In YOLOv5, each predicted candidate box $B_i$ is characterized by its center coordinates $(x_i, y_i)$ and confidence score $S_i$, where $i = 1, 2, \ldots, n$. The Non-Maximum Suppression (NMS) process begins by sorting the candidate boxes in descending order based on their confidence scores. The candidate box $B_\eta$ with the highest confidence score is then compared to the other candidate boxes $B_i$ using the IoU metric, defined as

$$I_i = \frac{|B_\eta \cap B_i|}{|B_\eta \cup B_i|}.$$

A threshold (e.g., 0.5) is set to determine whether a candidate box should be suppressed. If $I_i$ exceeds this threshold, $B_i$'s confidence score $S_i$ is set to 0, indicating that $B_i$ is considered redundant and is removed. This process iterates until all candidate boxes have been processed.

However, NMS may not provide optimal results, as it only considers the IoU between the highest-scoring candidate box and the others. In scenarios with multiple overlapping objects, it may mistakenly remove candidate boxes belonging to different objects, leading to inaccurate detection results. To address this limitation, we adopted Distance-IoU-based NMS (DIoU-NMS) [31], which considers both the IoU and the distance between the center coordinates of candidate boxes.

The DIoU-NMS process begins by identifying the highest-confidence predicted candidate box $B_\eta$, where $\eta$ is determined by

$$\eta = \arg\max\{S_1, S_2, \ldots, S_n\}.$$

It then compares the remaining boxes with $B_\eta$ one by one to determine whether each compared box should be kept or removed. For each box $B_i$, DIoU-NMS first calculates the Euclidean distance $d_i$ between the centers of $B_i$ and $B_\eta$; that is,

$$d_i = \sqrt{(x_i - x_\eta)^2 + (y_i - y_\eta)^2}.$$

Let $L_i$ denote the diagonal length of the smallest box covering $B_i$ and $B_\eta$. The normalized distance indicator $D_i$ is defined as

$$D_i = \frac{d_i^2}{L_i^2}.$$

The Distance-IoU indicator $C_i$ is then calculated as $C_i = I_i - D_i$. Finally, the confidence score $S_i$ is adjusted by

$$S_i = \begin{cases} S_i, & C_i < \psi,\ i \in \{1, 2, \ldots, n\},\ i \neq \eta, \\ 0, & \text{otherwise}. \end{cases}$$
In other words, boxes whose Distance-IoU indicator exceeds the threshold $\psi$ are considered redundant and removed. In our implementation, we set $\psi$ to 0.5, as is common practice in traditional IoU-based NMS. Figure 5 compares NMS and DIoU-NMS in detecting occluded fish: Figure 5a shows the original image with three fish; Figure 5b displays the candidate boxes after YOLO processing; Figure 5c shows the result after NMS, which misidentifies two of the fish; and Figure 5d presents the DIoU-NMS result, correctly identifying all three fish. This approach effectively removes overlapping boxes while preserving detections of nearby but distinct objects, improving accuracy in complex scenarios with occluded objects.
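The NumPy sketch below implements the DIoU-NMS procedure exactly as derived above. It is a minimal reference implementation for illustration, not the code used in our experiments; variable names mirror the notation (B_eta, I_i, D_i, C_i, psi).

```python
import numpy as np

def diou_nms(boxes, scores, psi=0.5):
    """DIoU-NMS sketch following the equations above.
    boxes: (n, 4) array of [x1, y1, x2, y2]; scores: (n,) confidences.
    Returns the indices of the kept boxes."""
    order = scores.argsort()[::-1]          # sort by confidence, descending
    keep = []
    while order.size > 0:
        eta = int(order[0])                 # highest-confidence box B_eta
        keep.append(eta)
        rest = order[1:]
        if rest.size == 0:
            break
        # IoU term I_i between B_eta and each remaining box
        x1 = np.maximum(boxes[eta, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[eta, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[eta, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[eta, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_eta = (boxes[eta, 2] - boxes[eta, 0]) * (boxes[eta, 3] - boxes[eta, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_eta + area_rest - inter)
        # Normalized center-distance term D_i = d_i^2 / L_i^2
        cx, cy = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
        d2 = (cx[rest] - cx[eta]) ** 2 + (cy[rest] - cy[eta]) ** 2
        ex1 = np.minimum(boxes[eta, 0], boxes[rest, 0])   # smallest enclosing box
        ey1 = np.minimum(boxes[eta, 1], boxes[rest, 1])
        ex2 = np.maximum(boxes[eta, 2], boxes[rest, 2])
        ey2 = np.maximum(boxes[eta, 3], boxes[rest, 3])
        l2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
        diou = iou - d2 / l2                # C_i = I_i - D_i
        order = rest[diou < psi]            # suppress boxes with C_i >= psi
    return keep
```

Because the center-distance penalty lowers C_i for boxes whose centers are far from B_eta, two overlapping but distinct fish both survive suppression, which is exactly the occlusion behavior illustrated in Figure 5d.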

3.4. Mirror Object Removal

The reflective nature of fish tank materials can produce mirrored fish images and reduce tracking accuracy. In addition, the real fish and the mirrored fish do not always have a one-to-one correspondence; a one-to-two relationship can also occur. This means that there may be one or two mirrored fish corresponding to the same real fish, depending on the location of the fish in the tank. An illustrated example is shown in Figure 6, in which A, B, and E are real fish. F is the mirrored fish for E, while C and D are mirrored fish for B. We aim to remove mirrored fish by considering the possible mapping between real fish and mirrored fish. An intuitive way to distinguish real fish from mirrored fish is by using the ROI (region of interest). Fish inside the ROI are considered real, while those outside are considered mirrored. However, this method can be inaccurate, especially when depth information is unavailable. As illustrated in Figure 7, the tank contains 10 real fish, each identified by a number. If ROI1, representing the tank bottom, is used, it will miss the fish swimming near the surface of the water (fish numbered 7, 8, 9, 10). On the other hand, if ROI2, representing the entire top view of the tank, is used, the mirrored fish will be included (fish numbered 11 and 12). Therefore, the ROI-based approach is not suitable for distinguishing real fish from mirrored fish.
To overcome this problem, we examine the relative position of each pair of fish to determine whether mirroring occurs. Assuming we have $m$ fish, each is denoted by $Q_i$, where $i = 1, 2, \ldots, m$. The center coordinates of $Q_i$ are $(x_i, y_i)$. In addition, $\Delta x_{i,j} = |x_i - x_j|$ and $\Delta y_{i,j} = |y_i - y_j|$ are the central distances between $Q_i$ and $Q_j$ along the x and y axes, respectively. For each pair of fish $Q_i$ and $Q_j$, mirroring occurs when one axis of their coordinates is nearly the same while the other is not. However, according to our observations, the bounding box of a fish and that of its mirrored image are not always exactly the same size. The most likely reasons are differences in lighting and water distortion: reflections or shadows can vary slightly between the original fish and its mirrored object, while small ripples can distort the reflected image. These factors influence how YOLO determines the edges of an object, resulting in minor variations in bounding box size. Therefore, when mirroring occurs, the center coordinates of $Q_i$ and $Q_j$ may not be exactly aligned. Based on this observation, for each pair $Q_i$ and $Q_j$, mirroring occurs only if

$$\min\{\Delta x_{i,j}, \Delta y_{i,j}\} \le \rho_1 \quad \text{and} \quad \max\{\Delta x_{i,j}, \Delta y_{i,j}\} \le \rho_2. \tag{1}$$

The first part of Equation (1) requires that one of the center coordinates of $Q_i$ and $Q_j$ be nearly the same. Ideally, $\rho_1$ is zero; we determine its value experimentally. The second part bounds the distance between a real fish and its mirrored object, whose maximum possible value is $\rho_2$.
The value of $\rho_2$ depends on the setup of the webcam and fish tank. As Figure 8 shows, the webcam's field of view in the x direction has angle $\theta$, and the webcam is mounted at height $H$. The visible range $g$ in the x direction is therefore

$$g = 2 \tan\!\left(\frac{\theta}{2}\right) \times H.$$

In addition, let $r$ denote the length of the fish tank and $f$ the maximum possible distance between a mirrored object and the side wall. The value of $f$ is determined by

$$f = \frac{g - r}{2}.$$

In other words, a real fish has a mirrored fish if its distance in the x direction to the side wall is less than $f$. Similarly, in the y direction, where the field-of-view angle is $\beta$, the visible range $s$ is

$$s = 2 \tan\!\left(\frac{\beta}{2}\right) \times H,$$

and the maximum possible distance $z$ between a mirrored object and the side wall is

$$z = \frac{s - u}{2},$$

where $u$ denotes the width of the fish tank.
Therefore, if two objects are aligned in the y direction, i.e.,

$$\Delta y_{i,j} = \min\{\Delta x_{i,j}, \Delta y_{i,j}\} \le \rho_1,$$

then $\rho_2$ is set to $2f$ to check whether mirroring occurs in the x direction. On the other hand, if two objects are aligned in the x direction, i.e.,

$$\Delta x_{i,j} = \min\{\Delta x_{i,j}, \Delta y_{i,j}\} \le \rho_1,$$

then $\rho_2$ is set to $2z$ to check whether mirroring occurs in the y direction.

In summary, if

$$\Delta x_{i,j} = \min\{\Delta x_{i,j}, \Delta y_{i,j}\} \le \rho_1 \quad \text{and} \quad \Delta y_{i,j} \le \rho_2 = 2z, \tag{2}$$

then mirroring occurs in the y direction. In addition, if

$$\Delta y_{i,j} = \min\{\Delta x_{i,j}, \Delta y_{i,j}\} \le \rho_1 \quad \text{and} \quad \Delta x_{i,j} \le \rho_2 = 2f, \tag{3}$$

then mirroring occurs in the x direction. Equation (2) describes the condition for mirroring in the y direction, while Equation (3) describes the condition for mirroring in the x direction. Fish overlap is a special case that can also satisfy Equation (2) or (3); this occurs when $Q_i$ and $Q_j$ are very close to each other. In this case, we use the overlap between $Q_i$ and $Q_j$ to distinguish overlap from mirroring. If the bounding boxes of $Q_i$ and $Q_j$ overlap, the pair is classified as an overlap case and both objects are retained. If they do not overlap, the pair is classified as a mirroring case, and the object farther from the center of the fish tank is removed as the mirrored image.
In our implementation, $H$ is 30 cm, $r$ is 35 cm, and $u$ is 24 cm. Thus, $f$ is 9.5 cm and $z$ is 3 cm. The HP w500 webcam is positioned above the fish tank and uses the default settings for brightness and white balance. The video resolution is 1920 × 1080 pixels at 30 frames per second, and $\theta$ is 84 degrees while $\beta$ is 53 degrees. Therefore, $\rho_2$ is 225 pixels (≈18 cm) in the x direction or 77 pixels (≈6 cm) in the y direction. We set $\rho_1$ to 5 pixels based on our experiments.
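The sketch below ties these pieces together. The first helper reproduces the geometric derivation of f and z (the stated setup values yield f ≈ 9.5 cm and z ≈ 3 cm); the second applies the pairwise tests of Equations (1)–(3) and the overlap rule. The detection format and the tank-center default (the image center of a 1920 × 1080 frame) are illustrative assumptions, not part of the original implementation.

```python
import math

def mirror_zone_widths(H, theta_deg, beta_deg, r, u):
    """Maximum mirror offsets f (x direction) and z (y direction), in cm."""
    g = 2 * math.tan(math.radians(theta_deg) / 2) * H   # visible range in x
    s = 2 * math.tan(math.radians(beta_deg) / 2) * H    # visible range in y
    return (g - r) / 2, (s - u) / 2

f, z = mirror_zone_widths(H=30, theta_deg=84, beta_deg=53, r=35, u=24)
# f ≈ 9.5 cm and z ≈ 3.0 cm, matching the implementation values above.

def boxes_overlap(a, b):
    """True if axis-aligned boxes (x1, y1, x2, y2) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def remove_mirrors(dets, rho1=5, rho2_x=225, rho2_y=77, center=(960, 540)):
    """Pairwise mirror test of Equations (1)-(3); all distances in pixels.
    dets: list of (cx, cy, (x1, y1, x2, y2)); rho2_x = 2f, rho2_y = 2z."""
    removed = set()
    for i in range(len(dets)):
        for j in range(i + 1, len(dets)):
            (xi, yi, bi), (xj, yj, bj) = dets[i], dets[j]
            dx, dy = abs(xi - xj), abs(yi - yj)
            aligned_x = dx <= rho1 and dy <= rho2_y   # Eq. (2): mirror in y
            aligned_y = dy <= rho1 and dx <= rho2_x   # Eq. (3): mirror in x
            if (aligned_x or aligned_y) and not boxes_overlap(bi, bj):
                # Keep the detection closer to the tank center; drop the other.
                di = (xi - center[0]) ** 2 + (yi - center[1]) ** 2
                dj = (xj - center[0]) ** 2 + (yj - center[1]) ** 2
                removed.add(j if dj > di else i)
    return [d for k, d in enumerate(dets) if k not in removed]
```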

3.5. Fish Tracking

Given the fish detection results, fish tracking aims to identify fish locations in each frame to construct continuous trajectories across frames. SORT (Simple Online and Realtime Tracking) [32] is a commonly used tracking algorithm that utilizes a Kalman Filter for motion prediction and the Hungarian algorithm for object association. The primary advantage of SORT is its computational efficiency, but its main limitation is that it does not consider object features. This can lead to frequent ID switching when objects overlap, which is particularly problematic in fish tracking scenarios. DeepSORT [15] improved SORT by integrating a convolutional neural network (CNN) to extract appearance and motion features of objects, such as color, texture, shape, bounding box size, and orientation. These features are then used for association to reduce the ID switching problem. However, DeepSORT still relies on a standard Kalman Filter for prediction and updating, which assumes linear motion and constant noise. This assumption is problematic for fish tracking, as fish often exhibit rapid turns, reversals, and sudden accelerations.
We propose NSA-DeepSORT to improve fish tracking by replacing DeepSORT’s standard Kalman Filter with an NSA (Noise Scale Adaptive) Kalman Filter [33]. Figure 9 shows these modifications, with red boxes highlighting key changes. The standard Kalman Filter consists of two main processes: state prediction and state update. In the state prediction step, the filter predicts the current state based on the previous state. This prediction is performed using two equations:
$$\hat{x}_k^- = F_k \hat{x}_{k-1} \tag{4}$$

$$P_k^- = F_k P_{k-1} F_k^T + Q_k \tag{5}$$

In Equation (4), $\hat{x}_k^-$ is the predicted state at time $k$, $F_k$ is the state transition model, and $\hat{x}_{k-1}$ represents the mean of the state at time $k-1$. Equation (5) predicts the covariance $P_k^-$ at time $k$, where $P_{k-1}$ is the covariance at time $k-1$ and $Q_k$ represents the covariance of the process noise. Following the prediction, the state update step refines the prediction using the current measurement. Here, $P_k^-$ is the prior covariance representing uncertainty before the measurement, while $P_k$ is the posterior covariance after the measurement update; the difference indicates the uncertainty reduction gained from the observation. This distinction allows our Noise Scale Adaptive Kalman Filter to adjust its confidence dynamically based on detection quality, improving tracking robustness. The update process involves three equations:

$$K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R_k)^{-1} \tag{6}$$

$$\hat{x}_k = \hat{x}_k^- + K_k (z_k - H_k \hat{x}_k^-) \tag{7}$$

$$P_k = (I - K_k H_k) P_k^- \tag{8}$$

Equation (6) calculates the Kalman gain $K_k$ using the predicted covariance $P_k^-$, the observation model $H_k$, and the measurement noise covariance $R_k$. In Equation (7), the updated state estimate $\hat{x}_k$ is computed by adjusting the predicted state $\hat{x}_k^-$ according to the difference between the actual measurement $z_k$ and the predicted measurement $H_k \hat{x}_k^-$, weighted by the Kalman gain $K_k$. Finally, Equation (8) updates the covariance $P_k$ to reflect the new uncertainty in the state estimate after incorporating the measurement, where $I$ is the identity matrix.
The core innovation of NSA-DeepSORT is to replace the standard measurement noise covariance $R_k$ in Equation (6) with an adaptive noise covariance $\bar{R}_k$. This substitution enables more accurate prediction of fish movement. The adaptive noise covariance at time $k$ (i.e., the $k$-th frame) is calculated by

$$\bar{R}_k = (1 - S_k) R_k \tag{9}$$

In Equation (9), $S_k$ represents the detection confidence score for the $k$-th frame, obtained directly from YOLOv5 and ranging from 0 to 1; it modulates the strength of the update step in the Kalman Filter. $R_k$ denotes the standard measurement noise covariance. By incorporating $S_k$, the filter dynamically adjusts the contribution of the observation: high-confidence detections result in stronger corrections, while low-confidence ones are down-weighted, favoring the prediction. This adaptive behavior improves robustness under occlusion or detection noise. The adaptive $\bar{R}_k$ is then used in place of $R_k$ in Equation (6). The effectiveness of this replacement stems from its ability to dynamically adjust the model's certainty based on detection confidence. When $S_k$ is high, $\bar{R}_k$ decreases, indicating greater certainty in the model's predictions and leading to more precise estimates. Conversely, when $S_k$ is low, $\bar{R}_k$ increases, allowing for more variability in the estimates and promoting caution in the model's predictions. This adaptive approach enables NSA-DeepSORT to better manage rapid and unpredictable fish motions, thereby reducing ID switching errors. Our implementation is based on the open-source DeepSORT [34].
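The substitution can be summarized in a few lines of NumPy. The class below is a minimal sketch of Equations (4)–(9), with the model matrices F, H, Q, and R supplied by the caller (e.g., a constant-velocity model as in DeepSORT); it illustrates the idea rather than reproducing the DeepSORT codebase.

```python
import numpy as np

class NSAKalmanFilter:
    """Kalman Filter with NSA measurement noise: R is scaled by (1 - S_k),
    where S_k is the YOLOv5 detection confidence (Equation (9))."""

    def __init__(self, F, H, Q, R):
        self.F, self.H, self.Q, self.R = F, H, Q, R

    def predict(self, x, P):
        x_pred = self.F @ x                                  # Eq. (4)
        P_pred = self.F @ P @ self.F.T + self.Q              # Eq. (5)
        return x_pred, P_pred

    def update(self, x_pred, P_pred, z, confidence):
        R_bar = (1.0 - confidence) * self.R                  # Eq. (9)
        S = self.H @ P_pred @ self.H.T + R_bar
        K = P_pred @ self.H.T @ np.linalg.inv(S)             # Eq. (6)
        x = x_pred + K @ (z - self.H @ x_pred)               # Eq. (7)
        P = (np.eye(P_pred.shape[0]) - K @ self.H) @ P_pred  # Eq. (8)
        return x, P
```

A confident detection shrinks the effective measurement noise, so the update pulls the track strongly toward the observation; a weak detection inflates it, so the motion prediction dominates.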

4. Experiment

4.1. Datasets and Experiment Design

In our experiments, we established three distinct environments: basic, decorative, and rippling, as illustrated in Figure 10. The basic environment, shown in Figure 10a, features only zebrafish without decorations. This setup presents challenges such as fish occlusion and mirroring. Figure 10b depicts the decorative environment, which includes two decorative items alongside the zebrafish, introducing potential fish occlusion by these objects. Figure 10c presents the rippling environment, where water filters provide oxygen to the fish, creating ripples that increase tracking complexity due to interference. Our experiments involved tracking nine zebrafish, each approximately 2.5 cm in length. We recorded videos using an HP w500 webcam positioned in a top-down perspective, capturing only the aquarium within the frame. The video resolution was 1920 × 1080 pixels at 30 frames per second.

For the model training dataset, we recorded six 30 s videos, each producing approximately 900 frames. From these recordings, we compiled a dataset of 756 frames, split into 529 for training and 227 for testing. The dataset covered the three distinct environments in a 1:1:2 ratio for the basic, decorative, and rippling environments, respectively. In addition, for testing, we recorded a one-minute daytime video of each environment, when fish activity levels were higher.

These test videos contained 1842 frames for the basic environment, 1843 frames for the decorative environment, and 1898 frames for the rippling environment. From these videos, we used the LabelImg annotation tool [35] to draw bounding boxes around each fish in every frame and assigned a unique ID to each fish to track them over time. This set of annotated data was then used to validate our fish tracking algorithm. We did not use open datasets, as they did not match our specific requirements: available datasets show fish from side-view or 3D perspectives, which are unsuitable for our top-down approach. Consequently, we collected our own dataset.
We trained YOLOv5 and YOLOv7 models on the AI learning platform at National Yunlin University of Science and Technology, Taiwan, using an A100 GPU. Both models were trained using default settings. The trained models were tested on a computer with an Intel i7-9700 3.0 GHz CPU, a GTX 1060 6 GB GPU, 16 GB RAM, running Windows 11 OS, Python 3.9.13, and Torch 1.12.1. In the following sections, we will present and discuss the results from each environment, demonstrating how our proposed solution improves tracking performance.

4.2. Evaluation Metrics

We used separate metrics to evaluate detection and tracking performance. These are well-known metrics selected from the MOT (Multi-Object Tracking) challenge benchmark [36]. Table 3 outlines these performance metrics, categorized into detection and tracking, along with their formulas. It also indicates whether a higher (↑) or lower (↓) value is desirable. Table 4 provides definitions for the notations used in these formulas. For detection evaluation, Precision represents the ratio of correctly predicted fish to the total number of fish predicted by the model. Recall represents the ratio of correctly predicted fish to the total number of actual fish. The F1-Score combines precision and recall into a single metric, providing a comprehensive assessment of the model's performance. For these three metrics, higher values indicate better performance. The indicator M represents the total number of misidentified mirrored fish counted as real fish; a lower value of M indicates better performance.

For tracking evaluation, as shown in Table 3, Multiple Object Tracking Accuracy (MOTA) is used to assess accuracy. Detection accuracy ($\mathrm{DetA}_{\alpha}$) measures the alignment of predictions with ground truth. The Association Accuracy Score ($\mathrm{AssA}_{\alpha}$) averages the accuracy of all predicted and actual ID association scores. Higher Order Tracking Accuracy (HOTA) integrates localization, association, and detection to evaluate overall tracker performance; it is obtained by integrating the $\mathrm{DetA}_{\alpha}$ and $\mathrm{AssA}_{\alpha}$ scores over the range of $\alpha$ values (0.05 to 0.95 in 0.05 intervals). The Identification F1 Score (IDF1) evaluates how accurately a model maintains the identity of tracked targets throughout the tracking process. Multiple Object Tracking Precision (MOTP) measures how precisely objects are localized. Multi-Object Detection Accuracy (MODA) combines false positives and missed objects to assess overall detection accuracy. Mostly Tracked (MT) indicates the percentage of ground-truth trajectories covered by the tracker output for more than 80% of their length. For these three metrics, higher values indicate better performance. IDsw denotes the number of object identity switches; its value increases by one each time the fish tracking algorithm cannot find an object to map. A lower value of IDsw indicates better performance.
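For orientation, the two headline tracking metrics reduce to simple ratios. The sketch below uses the standard CLEAR-MOT definition of MOTA and the identity-level F1 definition of IDF1; we assume these match the formulas in Table 3, and the example counts are invented for illustration.

```python
def mota(fn, fp, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDsw) / GT (standard CLEAR-MOT definition)."""
    return 1.0 - (fn + fp + id_switches) / num_gt

def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN): identity-level F1 score."""
    return 2 * idtp / (2 * idtp + idfp + idfn)

# E.g., 9 fish over 1842 frames give 16,578 ground-truth boxes:
print(mota(fn=2000, fp=1500, id_switches=300, num_gt=9 * 1842))  # ≈ 0.771
```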

4.3. Performance in Basic Environment

Table 5 shows the detection performance comparison between our method (YOLOv5s + DIoU-NMS + Mirror) and other algorithms. Our method achieved a Precision of 0.904, a 16.8% improvement over the original YOLOv5s. Reducing the M value by 77.1% decreased the number of false positives (FP), resulting in higher precision. Although YOLOv7 performs better in Recall, its Precision is only 0.811; while YOLOv7 can find fish accurately, it also produces many false positives. The YOLOv5 + DIoU-NMS method shows good recall performance, as it can recognize occluded fish, demonstrating effectiveness in occlusion scenarios; however, there is still room for improvement in its precision. Our proposed method significantly increases precision, making its F1-Score superior to those of the other methods. This indicates that our method has a clear advantage in overall performance.

In the tracking experiments, as shown in Table 6, our detection method combined with NSA-DeepSORT tracking achieves better performance in MOTA (72.0%), HOTA (29.5%), and IDF1 (32.0%). Compared to YOLOv5s (the baseline), our method improves MOTA by 13.3%, HOTA by 10.0%, and IDF1 by 14.5%, and reduces identity switches (IDsw) by 224. Overall, our approach consistently shows superior performance in both accuracy and efficiency for object detection and tracking in the basic environment.

Furthermore, experiments with NSA-DeepSORT show that reducing ID switches (IDsw) significantly enhances tracking metrics. NSA-DeepSORT outperforms DeepSORT in tracking performance, particularly when ID switches are reduced. This is because mirror images have appearances similar to the real fish, leading to incorrect ID assignments, and because fish's nonlinear movements challenge standard Kalman Filter predictions, causing frequent ID switches.

4.4. Performance in Decorative Environment

Table 7 shows the detection performance comparison between our method (YOLOv5s + DIoU-NMS + Mirror) and other algorithms in the decorative environment. Our method achieved a Precision of 0.918 and an F1-Score of 0.909, demonstrating overall superior detection performance. Consistent with the basic environment experiments, we observed that reducing the M value led to an increase in precision. In the last row, marked M, which counts mirrored fish misidentified as real fish, our method reduced the number of such detections from 3086 to 1159 compared with YOLOv5s (the baseline), a decrease of 1927, or 62.4%.

Table 8 shows the tracking performance of the DeepSORT and NSA-DeepSORT tracking algorithms with different detection methods. With DeepSORT, our method achieved the highest MOTA of 57.6%, HOTA of 16.3%, and IDF1 of 16.2%, indicating superior overall tracking performance. With NSA-DeepSORT, our method achieved the highest MOTA of 60.5%, HOTA of 21.0%, and IDF1 of 22.6%, further demonstrating its effectiveness with the NSA Kalman Filter. In terms of the IDsw metric, our method reduces ID switches by 174 compared to YOLOv5s, indicating better identity preservation. Additionally, the NSA-DeepSORT experiments showed that reducing ID switches (IDsw) significantly enhances tracking metrics. Compared to YOLOv5s (the baseline), our method improves MOTA by 5.5%, HOTA by 5.5%, and IDF1 by 8.3%, indicating that our method outperforms the original YOLOv5 approach.

Compared to the basic environment, tracking performance in the decorative environment is lower. The main reason is that decorations can block the fish: when a fish is completely blocked by a decoration for a period of time, its tracking trajectory cannot be matched, and upon reappearance it is assigned a new ID, resulting in ID switching and decreased tracking performance. Experiments based on the DeepSORT tracking algorithm indicate that the main difference in performance across environments lies in the detection method. Our results show that reducing the number of mirrored fish can enhance tracking performance.

4.5. Performance in Rippling Environment

Table 9 shows the detection results of each algorithm in the rippling environment. Our method achieved a Precision of 0.968 and an F1-Score of 0.947, demonstrating excellent detection accuracy. The M value of our method was 517, indicating effective removal of mirrored instances; compared to YOLOv5, our method further reduces mirrored instances by 38.1%. This indicates that our method also detects fish well in this environment.

In the tracking experiments, as shown in Table 10, our method achieved the highest performance with the DeepSORT algorithm, with a MOTA of 64.0%, HOTA of 19.3%, and IDF1 of 19.1%. This indicates that our method improves tracking performance by effectively handling mirror images. Under the NSA-DeepSORT tracking algorithm, our method achieved a MOTA of 66.9%, HOTA of 21.4%, and IDF1 of 21.0%, with a reduction of 249 ID switches, demonstrating the better predictive performance of the NSA Kalman Filter.

The experiments reveal that tracking performance varies across the basic, decorative, and rippling environments. Compared to the basic environment, both the decorative and rippling environments show lower overall tracking performance; however, the rippling environment improves on the decorative environment. In the rippling environment, our method performs better in key metrics such as MOTP, MODA, and DetA, indicating that our trained detector performs well even with the challenges posed by water ripples. The rippling environment nevertheless presents challenges such as an increased number of ID switches and a lower MT metric compared to the basic environment. Although our method demonstrates significant improvements over previous approaches, as reflected in higher MOTA, HOTA, and IDF1 scores, there is still room for improvement in minimizing ID switches and enhancing tracking consistency. Overall, our method outperforms the other methods in the rippling environment: it improved MOTA by 3.8%, HOTA by 4.5%, and IDF1 by 3.2%, demonstrating better performance in tracking tasks.

4.6. Discussion

Figure 11 shows detection results of different methods in the three environments. The yellow block contains two real fish that overlap each other. However, YOLOv5 detects only one fish because it mistakenly removes the candidate box belonging to the second fish. This is because YOLOv5 uses traditional NMS, which tends to eliminate overlapping bounding boxes with high IoU scores. On the other hand, YOLOv5 + DIoU-NMS and YOLOv5 + DIoU-NMS + Mirror methods successfully identify both fish. The DIoU-NMS algorithm improves detection by considering both the overlap and the distance between bounding boxes. This approach allows for better separation between nearby but distinct objects, showing the effectiveness of Distance-IoU in such scenarios. The output of the YOLOv5 + Mirror method is the same as the YOLOv5 method. Due to space limitations, we did not include it in this figure.
Figure 12 shows fish detection results in environments with mirroring. The yellow block contains one real fish and its mirrored image. YOLOv5 cannot distinguish between the real fish and its mirrored image. This failure occurs because YOLOv5 treats the mirrored image as a separate object and cannot recognize reflections. In contrast, YOLOv5 + Mirror and YOLOv5 + DIoU-NMS + Mirror successfully remove the mirrored fish’s bounding box. This is because these methods account for mirror conditions. In particular, they use the mirror object removal algorithm that identifies symmetrical patterns and removes duplicate detections caused by reflections. The output of the YOLOv5 + DIoU-NMS method is the same as the YOLOv5 method. Due to space limitations, it is not included in this figure.
Our experiments demonstrate the effectiveness of DIoU-NMS in mitigating occlusion and enhancing recognition. They also show that the mirror removal algorithm reduces false detections, thereby improving tracking accuracy. Furthermore, the NSA Kalman Filter decreases ID switches and improves MOTA by better handling rapid and unpredictable fish motions. These enhancements collectively create a more accurate aquarium fish tracking system. While our study focused on a single fish species, the proposed method is adaptable to multi-species scenarios. In such cases, we can implement a preliminary classification step to identify different fish types. Following this classification, the DIoU-NMS, mirror removal algorithm, and NSA Kalman Filter can be applied directly without modification.
To enhance the robustness, generalizability, and completeness of our system, future work will include experimental comparisons with more recent multi-object trackers such as ByteTrack [38] and FairMOT [39], as well as compatible methods listed in Table 1. Although many of these rely on different sensing modalities or assume single-object tracking, adapting our framework for fair comparison will improve benchmarking and cross-domain applicability. We also plan to expand our dataset by incorporating publicly available top-down or underwater video data, enabling better generalization and reproducibility across environments. To handle more diverse real-world scenarios, we aim to validate our system on multiple fish species by incorporating a species classification module prior to mirror removal and tracking.
Furthermore, to address the limitations of our current heuristic-based mirror removal algorithm, future directions will include learning-based or hybrid methods that adapt to varied lighting, camera angles, and tank configurations. Depth-assisted or multi-angle-aware techniques will also be explored to scale the system toward more complex 3D environments. Finally, inspired by recent progress in underwater detection, we intend to integrate advanced techniques such as feature fusion, semantic decoupling, and content-aware modules to improve detection accuracy under occlusion, reflection, and other visual challenges common in ornamental fish monitoring.

5. Conclusions

In this paper, we presented the FITS, an innovative fish tracking system that combines YOLOv5 for detection and NSA-DeepSORT for tracking. The major improvements include replacing standard NMS with DIoU-NMS to better handle overlapping fish, introducing a novel mirror removal algorithm to eliminate false detections, and enhancing tracking accuracy with the NSA Kalman Filter (NSA-DeepSORT). We evaluated our system across various aquarium environments, comparing it with the YOLOv5 and YOLOv7 methods. The results show that our method can increase Multiple Object Tracking Accuracy (MOTA) by up to 13.3%, Higher Order Tracking Accuracy (HOTA) by up to 10.0%, and Identification F1 Score (IDF1) by up to 14.5%. These findings confirm that our mirror removal algorithm effectively improves multiple object tracking accuracy, while the NSA-DeepSORT algorithm enhances overall tracking performance by minimizing ID switches. In the future, we aim to focus on advanced applications in aquarium management, such as trajectory analysis and illness prediction. These developments could lead to better monitoring of fish health, early detection of diseases, and improved overall aquarium maintenance.

Author Contributions

Conceptualization: K.-D.Z., E.T.-H.C., C.-R.L., and J.-H.S.; Writing—original draft preparation: K.-D.Z. and J.-H.S.; Writing—review and editing: E.T.-H.C. and C.-R.L.; Supervision: E.T.-H.C. and C.-R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets generated or analyzed during the current study are available in the Zenodo repository at https://zenodo.org/doi/10.5281/zenodo.13208793 (last accessed on 4 August 2025).

Acknowledgments

We would like to thank Hsu Cheng-Huang and Liu Zhe-Zhang for assistance with proofreading. We acknowledge the use of AI language tools (ChatGPT [https://openai.com/zh-Hant/chatgpt/overview/, last accessed on 4 August 2025], Gemini [https://gemini.google.com/app?hl=zh-TW, last accessed on 4 August 2025], and Claude [https://claude.ai/, last accessed on 4 August 2025]) for proofreading assistance. The authors take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Asia Pacific Ornamental Fish Market Size. Available online: https://www.grandviewresearch.com/industry-analysis/asia-pacific-ornamental-fish-market-report (accessed on 20 January 2024).
2. Mecha, N.J.M.F.; Gonzales-Plasus, M.M. The Common Diseases of Freshwater Ornamental Fishes and the Treatments Applied by Local Fish Owners in Puerto Princesa City, Palawan, Philippines. J. Fish. 2023, 11, 111205. Available online: https://journal.bdfish.org/index.php/fisheries/article/view/485 (accessed on 20 January 2024).
3. Anjur, N.; Sabran, S.F.; Daud, H.M.; Othman, N.Z. An Update on the Ornamental Fish Industry in Malaysia: Aeromonas Hydrophila-Associated Disease and Its Treatment Control. Vet. World 2021, 14, 1143–1152. Available online: https://pmc.ncbi.nlm.nih.gov/articles/PMC8243671/ (accessed on 10 August 2024).
4. Shreesha, S.; Manohara Pai, M.M.; Ujjwal, V.; Radhika, M.P. Computer Vision Based Fish Tracking and Behaviour Detection System. In Proceedings of the 2020 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), Udupi, India, 30–31 October 2020; pp. 252–257.
5. Gupta, A.; Bringsdal, E.; Knausgård, K.M.; Goodwin, M. Accurate Wound and Lice Detection in Atlantic Salmon Fish Using a Convolutional Neural Network. Fishes 2022, 7, 345.
6. Waleed, A.; Medhat, H.; Esmail, M.; Osama, K.; Samy, R.; Ghanim, T.M. Automatic Recognition of Fish Diseases in Fish Farms. In Proceedings of the 2019 14th International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 17 December 2019; pp. 201–206.
7. Yu, G.; Zhang, J.; Chen, A.; Wan, R. Detection and Identification of Fish Skin Health Status Referring to Four Common Diseases Based on Improved YOLOv4 Model. Fishes 2023, 8, 186.
8. Sikder, J.; Sarek, K.; Das, U. Fish Disease Detection System: A Case Study of Freshwater Fishes of Bangladesh. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 861–865.
9. Hou, Y.; Zou, X.; Tang, W.; Jiang, W.; Zhu, J.; Deng, C.; Zhang, Y. Precise Capture of Fish Movement Trajectories in Complex Environments via Ultrasonic Signal Tag Tracking. Fish. Res. 2019, 219, 105307.
10. Saberioon, M.M.; Cisar, P. Automated Multiple Fish Tracking in Three-Dimension Using a Structured Light Sensor. Comput. Electron. Agric. 2016, 121, 215–221.
11. Xu, W.; Zhu, Z.; Ge, F.; Han, Z.; Li, J. Analysis of Behavior Trajectory Based on Deep Learning in Ammonia Environment for Fish. Sensors 2020, 20, 4425.
12. Zhao, X.; Yan, S.; Gao, Q. An Algorithm for Tracking Multiple Fish Based on Biological Water Quality Monitoring. IEEE Access 2019, 7, 15018–15026.
13. Zhang, H.; Li, W.; Qi, Y.; Liu, H.; Li, Z. Dynamic Fry Counting Based on Multi-Object Tracking and One-Stage Detection. Comput. Electron. Agric. 2023, 209, 107871.
14. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; NanoCode012; Kwon, Y.; Michael, K.; TaoXie; Fang, J.; Imyhxy; et al. Ultralytics/Yolov5: V7.0—YOLOv5 SOTA Realtime Instance Segmentation. Available online: https://ui.adsabs.harvard.edu/abs/2022zndo...7347926J/abstract (accessed on 10 September 2023).
15. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. arXiv 2017, arXiv:1703.07402.
16. Hu, J.; Zhao, D.; Zhang, Y.; Zhou, C.; Chen, W. Real-Time Nondestructive Fish Behavior Detecting in Mixed Polyculture System Using Deep-Learning and Low-Cost Devices. Expert Syst. Appl. 2021, 178, 115051.
17. Zhao, S.; Zhang, S.; Lu, J.; Wang, H.; Feng, Y.; Shi, C.; Li, D.; Zhao, R. A Lightweight Dead Fish Detection Method Based on Deformable Convolution and YOLOV4. Comput. Electron. Agric. 2022, 198, 107098.
18. Liang, J.-M.; Mishra, S.; Cheng, Y.-L. Applying Image Recognition and Tracking Methods for Fish Physiology Detection Based on a Visual Sensor. Sensors 2022, 22, 5455.
19. Cai, Y.; Yao, Z.; Jiang, H.; Qin, W.; Xiao, J.; Huang, X.; Pan, J.; Feng, H. Rapid Detection of Fish with SVC Symptoms Based on Machine Vision Combined with a NAM-YOLO v7 Hybrid Model. Aquaculture 2024, 582, 740558.
20. Zhong, J.; Qian, H.; Wang, H.; Wang, W.; Zhou, Y. Improved Real-Time Object Detection Method Based on YOLOv8: A Refined Approach. J. Real-Time Image Proc. 2025, 22, 4.
21. Deng, L.; Luo, S.; He, C.; Xiao, H.; Wu, H. Underwater Small and Occlusion Object Detection with Feature Fusion and Global Context Decoupling Head-Based YOLO. Multimed. Syst. 2024, 30, 208.
22. Qian, Z.-M.; Wang, S.H.; Cheng, X.E.; Chen, Y.Q. An Effective and Robust Method for Tracking Multiple Fish in Video Image Based on Fish Head Detection. BMC Bioinform. 2016, 17, 251.
23. Wang, S.H.; Zhao, J.W.; Chen, Y.Q. Robust Tracking of Fish Schools Using CNN for Head Identification. Multimed. Tools Appl. 2017, 76, 23679–23697.
24. Wang, H.; Zhang, S.; Zhao, S.; Wang, Q.; Li, D.; Zhao, R. Real-Time Detection and Tracking of Fish Abnormal Behavior Based on Improved YOLOV5 and SiamRPN++. Comput. Electron. Agric. 2022, 192, 106512.
25. Mei, Y.; Yan, N.; Qin, H.; Yang, T.; Chen, Y. SiamFCA: A New Fish Single Object Tracking Method Based on Siamese Network with Coordinate Attention in Aquaculture. Comput. Electron. Agric. 2024, 216, 108542.
26. Li, W.; Li, F.; Li, Z. CMFTNet: Multiple Fish Tracking Based on Counterpoised JointNet. Comput. Electron. Agric. 2022, 198, 107018.
27. Hu, Y.; Niu, A.; Zhu, Y.; Yan, Q.; Sun, J.; Zhang, Y. Multiple Object Tracking Based on Occlusion-Aware Embedding Consistency Learning. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024.
28. Zhai, X.; Wei, H.; Wu, H.; Zhao, Q.; Huang, M. Multi-Target Tracking Algorithm in Aquaculture Monitoring Based on Deep Learning. Ocean Eng. 2023, 289, 116005.
29. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677.
30. Ansel, J.; Yang, E.; He, H.; Gimelshein, N.; Jain, A.; Voznesensky, M.; Bao, B.; Bell, P.; Berard, D.; Burovski, E.; et al. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’24), San Diego, CA, USA, 27 April–1 May 2024; Volume 2.
31. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv 2019, arXiv:1911.08287.
32. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016.
33. Du, Y.; Wan, J.; Zhao, Y.; Zhang, B.; Tong, Z.; Dong, J. GIAOTracker: A Comprehensive Framework for MCMOT with Global Information and Optimizing Strategies in VisDrone 2021. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 2809–2819.
34. Wojke, N.; Bewley, A.; Paulus, D. DeepSORT. 2017. Available online: https://github.com/nwojke/deep_sort (accessed on 22 May 2024).
35. Tzutalin. LabelImg. Free Software: MIT License. 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 10 September 2023).
36. Dendorfer, P.; Osep, A.; Milan, A.; Schindler, K.; Cremers, D.; Reid, I.; Roth, S.; Leal-Taixé, L. MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking. Int. J. Comput. Vis. 2020, 1–37.
37. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
38. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In Computer Vision—ECCV 2022, Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022.
39. Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087.
Figure 1. Representative methods in prior fish tracking studies. (a) Experimental setup by Cai et al. [19] for detecting Spring Viremia of Carp symptoms using YOLOv7 with enhanced image preprocessing. (b) Fish head-based trajectory tracking by Qian et al. [22]. (c) Multiple fish identification using morphological features by Wang et al. [23].
Figure 2. Fish tracking system setup with PC, top-mounted webcam, aquarium filter, and fish tank.
Figure 3. FITS software architecture.
Figure 4. Comparison of training results of YOLOv5 and YOLOv7.
Figure 5. Comparison of NMS and DIoU-NMS in detecting occluded fish. (a) Original image with three fish, (b) YOLO candidate boxes, (c) NMS result with two fish, and (d) DIoU-NMS result with three fish.
Figure 6. Illustration of mirrored fish detection. Fish A, B, and E are real fish. Fish F is a mirrored reflection of E, while fish C and D are mirrored reflections of B. (a) Before removal. (b) After removal.
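The removal step in Figure 6 hinges on pairing each suspected reflection (C, D, F) with the real fish it mirrors. The paper's object removal algorithm is not reproduced here; purely as a toy illustration of the pairing idea, the sketch below assumes reflections appear mirrored across a known vertical glass line, and the `wall_x` parameter, `tol` tolerance, and lower-confidence tie-break are all invented for this example.

```python
import numpy as np

def remove_mirrored(boxes, scores, wall_x, tol=15.0):
    """Toy mirror-removal heuristic; boxes are (N, 4) [x1, y1, x2, y2].

    Returns indices of detections kept after dropping suspected reflections.
    """
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    keep = np.ones(len(boxes), dtype=bool)
    for i in range(len(boxes)):
        # Reflect detection i's center across the assumed wall line.
        mirrored = np.array([2 * wall_x - centers[i, 0], centers[i, 1]])
        for j in range(len(boxes)):
            if i == j or not keep[i] or not keep[j]:
                continue
            # If detection j sits where i's reflection would be, treat the
            # lower-confidence one of the pair as the mirror image.
            if np.linalg.norm(centers[j] - mirrored) < tol:
                drop = i if scores[i] < scores[j] else j
                keep[drop] = False
    return np.where(keep)[0]
```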
Figure 7. ROI effects on fish counting: (a) 10 real fish (correct); (b) after using ROI1: 6 fish (incorrect); (c) after using ROI2: 12 fish (incorrect).
Figure 8. The field of view of the webcam.
Figure 9. NSA-DeepSORT flow.
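The key change inside NSA-DeepSORT is the Noise Scale Adaptive Kalman Filter from GIAOTracker [33], which replaces the fixed measurement noise covariance $R$ in the correction step with $\tilde{R}_k = (1 - c_k) R_k$, where $c_k$ is the detection confidence, so high-confidence boxes pull the track state more strongly. Below is a minimal sketch of that correction step; the state layout and matrix shapes are illustrative assumptions, not the exact FITS implementation.

```python
import numpy as np

def nsa_kalman_update(x, P, z, conf, H, R):
    """One Kalman correction step with NSA-scaled measurement noise.

    x: state mean, P: state covariance, z: measurement,
    conf: detector confidence in [0, 1], H: measurement matrix,
    R: nominal measurement noise covariance.
    """
    R_nsa = (1.0 - conf) * R               # NSA: shrink noise for confident boxes
    S = H @ P @ H.T + R_nsa                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x + K @ (z - H @ x)            # corrected state mean
    P_new = (np.eye(len(x)) - K @ H) @ P   # corrected state covariance
    return x_new, P_new
```

Because a near-certain detection drives $\tilde{R}_k$ toward zero, the filter follows abrupt but well-detected fish motion closely, while low-confidence (e.g., partially occluded) detections perturb the track less.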
Figure 10. The three fish tank environments. (a) Basic environment. (b) Decorative environment. (c) Rippling environment.
Figure 11. The fish detection results of different methods in three environments with overlap. (a) YOLOv5s, (b) YOLOv5s + DIoU-NMS, (c) YOLOv5s + DIoU-NMS + Mirror.
Figure 12. The fish detection results of different methods in three environments with mirroring. (a) YOLOv5s, (b) YOLOv5s + Mirror, (c) YOLOv5s + DIoU-NMS + Mirror.
Table 1. Comparison of fish tracking methods.

| Reference/Year | Tracking Method | Environment | Tracking Evaluation Indicators |
|---|---|---|---|
| Qian, Z.-M. et al. [22]/2016 | SDE | C | MTT, PTT, TIS |
| Wang, S.-H. et al. [23]/2017 | SDE | C | Precision, Recall, F1, MT, ML, Frag, IDS |
| Li, W. et al. [26]/2022 | JDE | B | IDF1, IDP, IDR, MOTA, Rcll, AP, IDsw |
| Wang, H. et al. [24]/2022 | SDE | B | AUC, Precision |
| Zhai, X. et al. [28]/2023 | SDE | A | MOTA, MOTP, IDS, FN, FP |
| Mei, Y. et al. [25]/2024 | SDE | B | Precision, Success |
| FITS (this work) | SDE | C | MOTA, HOTA, IDF1, MOTP, MODA, DetA, IDsw, MT |

Note: A is for marine environment, B is for aquaculture environment, and C is for fish tank environment.
Table 2. Comparison of the custom YOLOv5 and custom YOLOv7 training results.

| Algorithm | Class | Epochs | Images | Labels | Precision | Recall | mAP@.5 | mAP@.5:.95 |
|---|---|---|---|---|---|---|---|---|
| YOLOv5 | ALL | 100 | 529 | 4761 | 0.979 | 0.947 | 0.981 | 0.672 |
| YOLOv7 | ALL | 300 | 529 | 4761 | 0.891 | 0.848 | 0.895 | 0.558 |
Table 3. Evaluation metrics for fish detection and tracking.

| Metric | Formula | Category |
|---|---|---|
| Precision | $\frac{TP}{TP+FP}$ | Detection |
| Recall | $\frac{TP}{TP+FN}$ | Detection |
| F1-Score | $\frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ | Detection |
| $M$ | $\sum_k m_k$ | Detection |
| MOTA | $1 - \frac{\sum_k \left( FN_k + FP_k + IDSW_k \right)}{\sum_k GT_k}$ | Tracking |
| $\mathrm{DetA}_\alpha$ | $\frac{TP}{TP+FP+FN}$ | Tracking |
| $\mathrm{AssA}_\alpha$ | $\frac{1}{TP} \sum_{c \in \{TP\}} \frac{TPA(c)}{TPA(c)+FNA(c)+FPA(c)}$ | Tracking |
| $\mathrm{HOTA}_\alpha$ | $\sqrt{\mathrm{DetA}_\alpha \times \mathrm{AssA}_\alpha}$ | Tracking |
| HOTA | $\int_0^1 \mathrm{HOTA}_\alpha \, d\alpha$ | Tracking |
| IDF1 | $\frac{2 \times IDTP}{2 \times IDTP + IDFP + IDFN}$ | Tracking |
| MOTP | $\frac{\sum_{i,k} \mu_{i,k}}{\sum_k q_k}$ | Tracking |
| MODA | $1 - \frac{FN + FP}{GT}$ | Tracking |
| MT | $\chi$ | Tracking |
| IDsw | $\sum_k IDSW_k$ | Tracking |
Table 4. Notations for fish detection and tracking.

| Notation | Definition |
|---|---|
| $TP$ | Number of fish correctly identified as fish |
| $FP$ | Number of instances where another object is incorrectly identified as a fish |
| $FN$ | Number of instances where an actual fish is missed by the model |
| $m_k$ | Number of mirrored fish identified as fish in the $k$-th frame |
| $IDSW_k$ | Number of object identity switches in the $k$-th frame |
| $GT_k$ | Total number of actual fish in the $k$-th frame |
| $TPA(c)$ | The set $c$ where the predicted track ID matches the actual track ID |
| $FNA(c)$ | The set $c$ where the predicted track ID does not correspond to any actual ID |
| $FPA(c)$ | The set $c$ where the predicted track ID is incorrect |
| $IDTP$ | Number of correctly assigned identities |
| $IDFP$ | Number of incorrectly assigned identities |
| $IDFN$ | Number of unassigned identities |
| $\mu_{i,k}$ | Overlap between the bounding box of target $i$ and the ground truth in the $k$-th frame |
| $q_k$ | Number of matches in the $k$-th frame |
| $\chi$ | Number of targets for which more than 80% of the trajectory is tracked |
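To make the definitions in Tables 3 and 4 concrete, the snippet below evaluates the MOTA formula on made-up per-frame counts; the numbers are illustrative only and are not taken from our experiments.

```python
# Worked example of the MOTA formula from Table 3 on invented frame counts.
fn = [2, 1, 0]      # FN_k: missed fish per frame
fp = [1, 0, 1]      # FP_k: false detections per frame (e.g., mirrored fish)
idsw = [0, 1, 0]    # IDSW_k: identity switches per frame
gt = [10, 10, 10]   # GT_k: actual fish per frame

mota = 1 - (sum(fn) + sum(fp) + sum(idsw)) / sum(gt)
print(f"MOTA = {mota:.3f}")  # 1 - 6/30 = 0.800
```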
Table 5. Comparison of detection results in the basic environment.

| Algorithms | Precision | Recall | F1-Score | M |
|---|---|---|---|---|
| YOLOv7 [37] | 0.811 | 0.949 | 0.875 | 5927 (+2.2%) |
| YOLOv5s [14] (Baseline) | 0.736 | 0.934 | 0.823 | 6062 |
| YOLOv5s + DIoU-NMS | 0.736 | 0.942 | 0.826 | 6062 (0%) |
| YOLOv5s + Mirror | 0.901 | 0.927 | 0.914 | 1435 (−76.3%) |
| YOLOv5s + DIoU-NMS + Mirror (Ours) | 0.904 | 0.927 | 0.915 | 1388 (−77.1%) |
Table 6. Comparison of tracking results in the basic environment.

| Algorithms | Detector | MOTA (%) | HOTA (%) | IDF1 (%) | MOTP (%) |
|---|---|---|---|---|---|
| DeepSORT [15] | YOLOv7 [37] | 23.7 | 16.8 | 15.1 | 71.2 |
| DeepSORT [15] | YOLOv5s [14] (Baseline) | 58.7 | 19.5 | 17.5 | 72.8 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS | 59.8 | 21.5 | 20.7 | 72.8 |
| DeepSORT [15] | YOLOv5s + Mirror | 65.5 | 19.2 | 18.4 | 72.8 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS + Mirror (Ours) | 67.8 | 22.3 | 23.7 | 73.2 |
| NSA-DeepSORT | YOLOv7 [37] | 28.4 | 19.8 | 17.6 | 71.5 |
| NSA-DeepSORT | YOLOv5s [14] | 63.4 | 26.1 | 25.1 | 73.3 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS | 63.6 | 26.2 | 24.8 | 73.3 |
| NSA-DeepSORT | YOLOv5s + Mirror | 68.2 | 26.2 | 26.8 | 73.3 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS + Mirror (Ours) | 72.0 (+13.3) | 29.5 (+10.0) | 32.0 (+14.5) | 73.9 (+1.1) |

| Algorithms | Detector | MODA (%) | DetA (%) | IDsw | MT |
|---|---|---|---|---|---|
| DeepSORT [15] | YOLOv7 [37] | 26.9 | 41.0 | 533 | 8 |
| DeepSORT [15] | YOLOv5s [14] (Baseline) | 61.7 | 52.5 | 519 | 9 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS | 60.9 | 52.2 | 513 | 8 |
| DeepSORT [15] | YOLOv5s + Mirror | 68.7 | 54.8 | 506 | 8 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS + Mirror (Ours) | 70.4 | 55.8 | 452 | 8 |
| NSA-DeepSORT | YOLOv7 [37] | 30.8 | 42.3 | 411 | 8 |
| NSA-DeepSORT | YOLOv5s [14] | 65.3 | 54.6 | 337 | 9 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS | 65.6 | 54.6 | 330 | 9 |
| NSA-DeepSORT | YOLOv5s + Mirror | 67.0 | 55.2 | 318 | 9 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS + Mirror (Ours) | 73.7 (+12.0) | 57.6 (+5.1) | 295 (−224) | 9 |
Table 7. Comparison of detection results in the decorative environment.

| Algorithms | Precision | Recall | F1-Score | M |
|---|---|---|---|---|
| YOLOv7 [37] | 0.848 | 0.683 | 0.757 | 4012 (+30.0%) |
| YOLOv5s [14] (Baseline) | 0.830 | 0.951 | 0.886 | 3086 |
| YOLOv5s + DIoU-NMS | 0.830 | 0.953 | 0.887 | 3086 (0%) |
| YOLOv5s + Mirror | 0.914 | 0.899 | 0.904 | 1224 (−60.3%) |
| YOLOv5s + DIoU-NMS + Mirror (Ours) | 0.918 | 0.901 | 0.909 | 1159 (−62.4%) |
Table 8. Comparison of tracking results in the decorative environment.

| Algorithms | Detector | MOTA (%) | HOTA (%) | IDF1 (%) | MOTP (%) |
|---|---|---|---|---|---|
| DeepSORT [15] | YOLOv7 [37] | 22.3 | 13.5 | 12.4 | 70.8 |
| DeepSORT [15] | YOLOv5s [14] (Baseline) | 55.0 | 15.5 | 14.3 | 70.4 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS | 55.9 | 15.9 | 15.5 | 70.4 |
| DeepSORT [15] | YOLOv5s + Mirror | 56.4 | 16.2 | 16.1 | 70.4 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS + Mirror (Ours) | 57.6 | 16.3 | 16.2 | 70.7 |
| NSA-DeepSORT | YOLOv7 [37] | 25.7 | 15.7 | 14.1 | 71.0 |
| NSA-DeepSORT | YOLOv5s [14] | 59.8 | 20.7 | 23.0 | 70.8 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS | 60.2 | 19.5 | 20.2 | 70.7 |
| NSA-DeepSORT | YOLOv5s + Mirror | 60.4 | 19.0 | 19.6 | 70.8 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS + Mirror (Ours) | 60.5 (+5.5) | 21.0 (+5.5) | 22.6 (+8.3) | 70.7 (+0.3) |

| Algorithms | Detector | MODA (%) | DetA (%) | IDsw | MT |
|---|---|---|---|---|---|
| DeepSORT [15] | YOLOv7 [37] | 25.7 | 39.0 | 573 | 5 |
| DeepSORT [15] | YOLOv5s [14] (Baseline) | 58.6 | 49.2 | 584 | 6 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS | 58.6 | 49.1 | 562 | 6 |
| DeepSORT [15] | YOLOv5s + Mirror | 60.3 | 49.3 | 590 | 5 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS + Mirror (Ours) | 61.4 | 50.1 | 563 | 5 |
| NSA-DeepSORT | YOLOv7 [37] | 28.5 | 40.0 | 468 | 5 |
| NSA-DeepSORT | YOLOv5s [14] | 62.4 | 51.1 | 441 | 6 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS | 62.1 | 50.9 | 434 | 6 |
| NSA-DeepSORT | YOLOv5s + Mirror | 62.9 | 51.4 | 425 | 6 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS + Mirror (Ours) | 63.1 (+4.5) | 51.2 (+2.0) | 410 (−174) | 6 |
Table 9. Comparison of detection results in the rippling environment.

| Algorithms | Precision | Recall | F1-Score | M |
|---|---|---|---|---|
| YOLOv7 [37] | 0.851 | 0.687 | 0.760 | 3651 (+337.2%) |
| YOLOv5s [14] (Baseline) | 0.948 | 0.945 | 0.946 | 835 |
| YOLOv5s + DIoU-NMS | 0.948 | 0.947 | 0.947 | 835 (0%) |
| YOLOv5s + Mirror | 0.960 | 0.923 | 0.941 | 597 (−28.5%) |
| YOLOv5s + DIoU-NMS + Mirror (Ours) | 0.968 | 0.927 | 0.947 | 517 (−38.1%) |
Table 10. Comparison of tracking results in the rippling environment.

| Algorithms | Detector | MOTA (%) | HOTA (%) | IDF1 (%) | MOTP (%) |
|---|---|---|---|---|---|
| DeepSORT [15] | YOLOv7 [37] | 38.3 | 13.0 | 11.2 | 70.9 |
| DeepSORT [15] | YOLOv5s [14] (Baseline) | 63.1 | 16.9 | 17.8 | 71.2 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS | 63.2 | 16.3 | 17.9 | 71.2 |
| DeepSORT [15] | YOLOv5s + Mirror | 63.3 | 19.2 | 18.4 | 72.2 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS + Mirror (Ours) | 64.0 | 19.3 | 19.1 | 71.2 |
| NSA-DeepSORT | YOLOv7 [37] | 42.9 | 16.4 | 14.7 | 71.1 |
| NSA-DeepSORT | YOLOv5s [14] | 65.3 | 19.9 | 20.0 | 71.6 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS | 65.4 | 19.9 | 20.2 | 71.5 |
| NSA-DeepSORT | YOLOv5s + Mirror | 66.5 | 21.0 | 20.8 | 70.2 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS + Mirror (Ours) | 66.9 (+3.8) | 21.4 (+4.5) | 21.0 (+3.2) | 71.6 (+0.4) |

| Algorithms | Detector | MODA (%) | DetA (%) | IDsw | MT |
|---|---|---|---|---|---|
| DeepSORT [15] | YOLOv7 [37] | 44.4 | 43.8 | 1010 | 5 |
| DeepSORT [15] | YOLOv5s [14] (Baseline) | 67.8 | 53.1 | 787 | 5 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS | 67.8 | 52.5 | 766 | 5 |
| DeepSORT [15] | YOLOv5s + Mirror | 68.7 | 54.8 | 756 | 5 |
| DeepSORT [15] | YOLOv5s + DIoU-NMS + Mirror (Ours) | 68.3 | 54.9 | 717 | 5 |
| NSA-DeepSORT | YOLOv7 [37] | 48.2 | 45.3 | 877 | 6 |
| NSA-DeepSORT | YOLOv5s [14] | 69.2 | 53.8 | 651 | 6 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS | 69.6 | 53.4 | 689 | 6 |
| NSA-DeepSORT | YOLOv5s + Mirror | 70.1 | 54.3 | 598 | 6 |
| NSA-DeepSORT | YOLOv5s + DIoU-NMS + Mirror (Ours) | 71.6 (+3.8) | 54.4 (+1.3) | 538 (−249) | 6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
