Cost-Effective Fish Volume Estimation in Aquaculture Using Infrared Imaging and Multi-Modal Deep Learning
Abstract
1. Introduction
2. Related Work
2.1. Fish Detection and Segmentation in Underwater and Low-Visibility Settings
2.2. Fish Size and Biomass Estimation
2.3. Multi-Modal and Generative Approaches for Missing Depth/Color Cues
2.4. This Work in Context
3. Materials and Methods
3.1. Dataset Collection and Preparation
- Infrared (IR) channel: Capturing thermal signatures, resilient to lighting variations and water clarity typical in ornamental setups.
- RGB channel: Providing color and texture information, critical for preserving the goldfish's vibrant patterns used to supervise the generative model.
- Depth channel: Offering metric depth maps (0.5–2 m range) as ground truth for 3D reconstruction.

- Geometric: Random rotations (±15°), horizontal/vertical flips, and affine shearing (up to 10°).
- Intensity: Brightness/contrast adjustments (±20%), histogram equalization for IR normalization.
- Domain-specific: Synthetic water noise via Gaussian blurring (kernel standard deviation σ) and ripple simulations using a sinusoidal vertical displacement, y′ = y + A·sin(2πf·x + φ), where A is the ripple amplitude (in pixels), f the spatial frequency, and φ the phase.
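The ripple augmentation above can be sketched as a per-column vertical shift of the image by a sinusoid. This is a minimal illustration, not the paper's implementation; the `amplitude`, `frequency`, and `phase` defaults are placeholders, since the paper's exact values are not reproduced here:

```python
import numpy as np

def ripple_augment(img: np.ndarray, amplitude: float = 2.0,
                   frequency: float = 0.05, phase: float = 0.0) -> np.ndarray:
    """Displace each column vertically by A*sin(2*pi*f*x + phi) to mimic surface ripples.

    amplitude is in pixels; frequency is in cycles per pixel. All defaults
    are illustrative assumptions.
    """
    h, w = img.shape[:2]
    xs = np.arange(w)
    # Integer vertical offset for each column x
    offsets = (amplitude * np.sin(2 * np.pi * frequency * xs + phase)).astype(int)
    out = np.empty_like(img)
    for x in range(w):
        # Cyclic shift keeps the frame size unchanged (no padding needed)
        out[:, x] = np.roll(img[:, x], offsets[x], axis=0)
    return out

# Example: augment a synthetic 64x64 grayscale frame
frame = np.random.rand(64, 64).astype(np.float32)
rippled = ripple_augment(frame)
```

A cyclic shift is used here for simplicity; an implementation could equally pad or reflect at the borders.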
3.2. Overall Pipeline
3.3. Depth Estimation: IR-to-Depth Mapping
3.4. Generative Module: IR-to-RGB Translation
3.5. Instance Segmentation
3.6. Fish Detection and Tracking
3.7. Volume Estimation
4. Experiments
4.1. Visualization of the Training Process
4.2. Object Tracking Results
4.3. Feature Visualization Analysis
4.4. Depth Estimation Results
4.5. Instance Segmentation Results
4.6. Heatmap Analysis
4.7. Synergistic Analysis of Modules
4.8. Ablation Experiments
4.9. Comparative Experiments
5. Discussion
5.1. Advantages and Limitations of the Cost-Effective Pipeline
5.2. Generalization to Commercial Species
5.3. Environmental Robustness and Thermal Considerations
5.4. Future Work
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| IR | Infrared |
| RGB | Red Green Blue |
| RGBD | Red Green Blue Depth |
| MAE | Mean Absolute Error |
| RMSE | Root Mean Square Error |
| MAPE | Mean Absolute Percentage Error |
| IoU | Intersection over Union |
| mAP | mean Average Precision |
| FPS | Frames Per Second |
| CNN | Convolutional Neural Network |
| GAN | Generative Adversarial Network |
| YOLO | You Only Look Once |
| FPN | Feature Pyramid Network |
| RPN | Region Proposal Network |
| TDT | Trajectory–Depth Transformer |
| FCR | Feed Conversion Ratio |
| FAO | Food and Agriculture Organization |
References
| Category | Description | Proportion (%) |
|---|---|---|
| Variety | Common goldfish (simple fins) | 30 |
| | Fantail goldfish (double tail) | 30 |
| | Lionhead goldfish (head growth) | 20 |
| | Comet goldfish (long fins) | 20 |
| Age/Size | Juvenile (5–10 cm, 10–50 g) | 50 |
| | Adult (10–15 cm, 50–200 g) | 50 |
| Health/Gender | Healthy males/females (balanced) | 90 |
| | Mild variations (e.g., fin irregularities) | 10 |

| Category | Description | Proportion (%) |
|---|---|---|
| Density | Low (8–10 fish/tank) | 30 |
| | Medium (11–13 fish/tank) | 40 |
| | High (14–16 fish/tank) | 30 |
| Behavior Type | Schooling (group swimming) | 40 |
| | Foraging (feeding movements) | 30 |
| | Resting/Isolated (minimal activity) | 20 |
| | Fin Display (ornamental flaring) | 10 |

| Category | Description | Proportion (%) |
|---|---|---|
| Lighting | Natural daylight | 40 |
| | Artificial LED (500–1000 lux) | 40 |
| | Mixed (daylight + LED) | 20 |
| Water Quality/Interference | Clear water | 60 |
| | Mild turbidity (simulated particles) | 20 |
| | Decorated (plants/rocks) | 20 |
| Other | Water currents (0.1–0.3 m/s) | All |
| | Bubbles/ripples (occasional) | 50 |

| Category | Description | Quantity |
|---|---|---|
| Total Videos | 10 min videos at 30 FPS | 124 |
| Total Frames | Raw frames (RGB-IR-Depth pairs) | ∼2.2 M |
| Annotated Frames | Keyframes with bbox + masks | 40,000 |
| Clips | 3 s segments (90 frames/clip) | ∼160,000 |
| Augmentation Types | Geometric (rotations, flips) | 2× data |
| | Intensity (brightness adjustments) | 2× data |
| | Domain-specific (water ripples) | 3× data |
| Dataset Split | Training/Validation/Test | 70%/15%/15% |

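The 70%/15%/15% split in the table above can be reproduced with a seeded shuffle of clip indices. This is a generic sketch, not the paper's procedure; the `seed` value and the use of the ∼160,000 clip total are illustrative assumptions:

```python
import numpy as np

def split_indices(n: int, ratios=(0.70, 0.15, 0.15), seed: int = 0):
    """Shuffle n clip indices and cut them into train/val/test partitions.

    ratios must sum to 1; the seed makes the split reproducible.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Example: split the dataset's ~160,000 clips
train, val, test = split_indices(160_000)
```

In practice a video-level (rather than clip-level) split may be preferable, so that near-duplicate frames from one video cannot leak between partitions.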
| Model | MAE ↓ | RMSE ↓ | MAPE ↓ | ↑ | PR ↑ | mAP ↑ | IoU ↑ | Pre ↑ | Rec ↑ | F1 ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| Full | 0.85 | 0.38 | 0.55 | 0.961 | 0.977 | 0.924 | 0.873 | 0.955 | 0.934 | 0.948 |
| w/o CA | 0.89 | 0.41 | 0.59 | 0.958 | 0.973 | 0.922 | 0.869 | 0.945 | 0.931 | 0.936 |
| w/o USL | 0.92 | 0.45 | 0.62 | 0.956 | 0.968 | 0.919 | 0.866 | 0.935 | 0.929 | 0.928 |
| w/o TCI | 0.94 | 0.49 | 0.65 | 0.948 | 0.963 | 0.915 | 0.863 | 0.932 | 0.926 | 0.924 |
| w/o CMF | 0.96 | 0.51 | 0.68 | 0.945 | 0.961 | 0.908 | 0.857 | 0.926 | 0.923 | 0.921 |
| w/o DAL | 0.99 | 0.53 | 0.71 | 0.941 | 0.958 | 0.905 | 0.851 | 0.921 | 0.912 | 0.918 |
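The error and overlap metrics reported in the ablation table follow their standard definitions; as a reference, a minimal sketch of MAE, RMSE, MAPE, and mask IoU (generic formulas, not code from the paper):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, MAPE%) for paired predictions and ground truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.abs(err).mean()                      # mean absolute error
    rmse = np.sqrt((err ** 2).mean())             # root mean square error
    mape = (np.abs(err / y_true)).mean() * 100.0  # mean absolute % error
    return mae, rmse, mape

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two boolean segmentation masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 0.0
```

Note that MAPE is undefined when a ground-truth value is zero; volume targets are strictly positive, so that case does not arise here.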

| Model | MAE ↓ | RMSE ↓ | MAPE ↓ | ↑ | PR ↑ | mAP ↑ | IoU ↑ | Pre ↑ | Rec ↑ | F1 ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| YM | 0.85 | 0.38 | 0.55 | 0.961 | 0.977 | 0.924 | 0.873 | 0.955 | 0.934 | 0.948 |
| Y5 | 1.24 | 0.45 | 0.75 | 0.911 | 0.893 | 0.855 | 0.789 | 0.872 | 0.851 | 0.870 |
| MR | 1.58 | 0.63 | 0.86 | 0.885 | 0.873 | 0.804 | 0.758 | 0.856 | 0.831 | 0.849 |
| BY | 1.05 | 0.75 | 0.64 | 0.924 | 0.915 | 0.885 | 0.806 | 0.901 | 0.883 | 0.894 |
| IP | 1.89 | 0.95 | 0.97 | 0.857 | 0.846 | 0.751 | 0.706 | 0.828 | 0.807 | 0.816 |
| VS | 1.41 | 0.72 | 0.75 | 0.890 | 0.883 | 0.826 | 0.768 | 0.864 | 0.841 | 0.855 |
| SV | 1.15 | 0.65 | 0.65 | 0.918 | 0.902 | 0.874 | 0.795 | 0.897 | 0.876 | 0.881 |
| IN | 1.62 | 0.87 | 0.89 | 0.873 | 0.862 | 0.786 | 0.731 | 0.847 | 0.825 | 0.839 |
| CT | 1.35 | 0.73 | 0.77 | 0.902 | 0.897 | 0.836 | 0.774 | 0.871 | 0.858 | 0.862 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite

Zhang, L.; Han, Y.; Song, G.; Wang, J.; Ma, P. Cost-Effective Fish Volume Estimation in Aquaculture Using Infrared Imaging and Multi-Modal Deep Learning. Sensors 2026, 26, 1221. https://doi.org/10.3390/s26041221

