Review

Deep Learning for Sustainable Aquaculture: Opportunities and Challenges

by An-Qi Wu 1, Ke-Lei Li 1, Zi-Yu Song 1, Xiuhua Lou 1, Pingfan Hu 2, Weijun Yang 1,* and Rui-Feng Wang 1,3,*

1 China Agricultural University, Haidian, Beijing 100083, China
2 Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843-3122, USA
3 National Innovation Center for Digital Fishery, China Agricultural University, Beijing 100083, China
* Authors to whom correspondence should be addressed.
Sustainability 2025, 17(11), 5084; https://doi.org/10.3390/su17115084
Submission received: 14 May 2025 / Revised: 23 May 2025 / Accepted: 27 May 2025 / Published: 1 June 2025

Abstract

With the rising global demand for aquatic products, aquaculture has become a cornerstone of food security and sustainability. This review comprehensively analyzes the application of deep learning in sustainable aquaculture, covering key areas such as fish detection and counting, growth prediction and health monitoring, intelligent feeding systems, water quality forecasting, and behavioral and stress analysis. The study discusses the suitability of deep learning architectures, including CNNs, RNNs, GANs, Transformers, and MobileNet, under complex aquatic environments characterized by poor image quality and severe occlusion. It highlights ongoing challenges related to data scarcity, real-time performance, model generalization, and cross-domain adaptability. Looking forward, the paper outlines future research directions including multimodal data fusion, edge computing, lightweight model design, synthetic data generation, and digital twin-based virtual farming platforms. Deep learning is poised to drive aquaculture toward greater intelligence, efficiency, and sustainability.

1. Introduction

Aquaculture has emerged as a cornerstone of global food security and is currently one of the fastest-growing sectors in food production worldwide. It plays a vital role in supplying aquatic products, which serve as a major source of protein for the human population [1]. Over the past seven decades, global aquaculture output has expanded rapidly, establishing aquaculture as an essential economic activity in many countries [2]. According to data from the Food and Agriculture Organization (FAO), global aquaculture production reached 122.6 million tons in 2020. Asia, in particular, made a dominant contribution to this figure, accounting for 91.6% of the world’s production of aquatic animals and algae. Figure 1 illustrates the total aquaculture output value from 2013 to 2023 across the globe, in Asia, Europe, and the Americas, revealing an overall upward trend in the economic value of aquaculture over the past decade.
Despite its rapid global expansion, the aquaculture industry continues to face numerous challenges. In particular, the accelerating trend toward industrial-scale and intensive farming highlights the urgent need for scientifically planned spatial layouts and comprehensive end-to-end regulatory systems to ensure environmental sustainability [3]. The aquatic environment plays a critical role in maintaining the yield and quality of aquaculture products [4], with key parameters including nitrite nitrogen (NO₂⁻), nitrate nitrogen (NO₃⁻), pH, total nitrogen (TN), temperature, dissolved oxygen (DO), chemical oxygen demand (COD), and ammonia nitrogen (NH₃-N) [5]. The nonlinear interdependence among meteorological factors and water quality indicators further complicates accurate parameter prediction, posing a significant challenge to precise modeling [6]. Additionally, large-scale production systems are often threatened by disease outbreaks, which can lead to severe economic losses [7]. Abnormal fish behaviors, frequently caused by disease, are difficult to detect due to high stocking densities, resulting in small, occluded targets in images that often lead to false detections or missed targets [8]. The real-time detection and tracking of such abnormal behaviors, along with rapid response mechanisms, are therefore critical for improving survival rates and economic returns. Furthermore, feed distribution is another key factor influencing farming cost control and efficiency. However, due to the highly dynamic and uncontrollable aquatic environment, achieving intelligent feeding through image recognition remains a major challenge in aquaculture [9]. These complexities highlight that aquaculture systems are essentially multi-dimensional, dynamic ecological systems involving technologies such as water quality monitoring, behavioral prediction, and species identification [10]. Although machine learning methods have been introduced to address the limitations of manual approaches, conventional machine learning still presents several drawbacks. For instance, while traditional SVM algorithms offer strong generalization for fish classification, they suffer from extended training times on large-scale datasets; background subtraction techniques can achieve high accuracy in motion analysis under stable conditions but perform poorly in complex scenes; and rule-based filtering systems effectively replicate fish trajectory activities but tend to be overly complicated and cumbersome [11,12,13,14].
Deep learning has been extensively applied across a wide range of domains owing to its exceptional capabilities in feature extraction and data processing [15]. Notable examples include object detection [16,17,18], protein structure prediction [19], video classification [20,21], and medical image segmentation [22,23]. With the ongoing advancement and broader adoption of deep learning technologies, the agricultural sector has similarly experienced a surge in successful applications, particularly in areas such as crop monitoring, disease diagnosis, and yield estimation [24,25,26]. For instance, Joshi et al. [27] employed BiLSTM as the base model and developed instance-based two-stage TrAdaBoost.R2, parameter-based fine-tuning, and feature-based DANN methods for winter wheat yield prediction. Razavy et al. [28] proposed a rice classification approach based on ResNet50, capable of accurately identifying rice varieties and performing quality assessment. Subeesh et al. [29] introduced a deep learning and UAV image-based method for citrus yield prediction and developed a web-based application named “DeepYield”. Moreover, deep learning is increasingly being applied in aquaculture for tasks such as fish disease prediction, algal detection and counting, fish species identification, and water quality forecasting, achieving promising results in these areas [30,31,32,33].
With the growing body of research on deep learning in aquaculture and its widespread applications in fields such as computer vision, natural language processing, and medical diagnostics, the diversity of application scenarios has driven rapid innovation and iteration within deep learning frameworks. At present, deep learning architectures can generally be categorized into three main types: (1) deep networks for supervised or discriminative learning, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs); (2) deep networks for unsupervised or generative learning, including Generative Adversarial Networks (GANs) and Autoencoders (AEs); and (3) hybrid learning networks that integrate elements of both approaches along with related architectures, such as Deep Transfer Learning and Deep Reinforcement Learning [34]. To further explore the application of deep learning techniques in aquaculture, we conducted a literature search on the Web of Science using “Aquaculture” in combination with the keywords “Deep Learning”, “CNN”, “RNN”, “LSTM”, “Transformer”, “GAN”, and “MobileNet”. The number of publications retrieved over the past five years for each keyword combination is presented in Figure 2.
This review aims to demonstrate the applicability of deep learning in aquaculture and to provide an in-depth analysis of its practical implementations. The remainder of the paper is organized into five sections: Section 2 introduces commonly used deep learning models and explains the core reasons for their suitability in aquaculture scenarios. Section 3 details the specific applications of deep learning in aquaculture, including fish species identification and counting, growth prediction and health monitoring, intelligent feeding systems, water quality monitoring and forecasting, and behavior recognition. Section 4 presents available datasets and open-source resources. Section 5 discusses the current challenges associated with applying deep learning in aquaculture and outlines potential future directions. Finally, Section 6 summarizes the applications of deep learning in aquaculture and concludes the paper.

2. Typical Deep Learning Models and Their Suitability for Aquaculture

Traditional aquaculture practices typically require substantial human labor and time investment, often resulting in low efficiency and high operational costs. To address these limitations, researchers have introduced conventional machine learning techniques into aquaculture systems [35]. However, these methods rely heavily on manually crafted features and domain-specific expertise, which significantly limits their adaptability to varying environments and hampers broader adoption [36]. In contrast, deep learning excels at uncovering complex patterns within high-dimensional data [37,38], thereby overcoming many of the inherent limitations of traditional approaches [39]. Moreover, once trained and optimized, deep learning models demonstrate notable advantages in processing unstructured data such as images, speech, and text, significantly improving both the efficiency and accuracy of data interpretation [40,41,42,43,44,45,46]. This opens new avenues for addressing many of the challenges currently faced by the aquaculture industry.
The Convolutional Neural Network (CNN), one of the most representative deep learning models, evolved from the Neocognitron proposed by Fukushima [47] and has since given rise to numerous variants. A typical CNN architecture consists of an input layer, alternating convolutional and pooling layers, one or more fully connected layers, activation functions, and an output layer [48]. CNNs are capable of recognizing stimulus patterns with robustness to minor variations in position or shape [47], and they have been widely applied in pattern recognition tasks with high accuracy [49,50]. In aquaculture production, the underwater environment is often complex, which significantly hinders the extraction of image features such as texture and shape [51]. The superior performance of CNNs in feature learning and image processing makes them particularly well suited for handling image data in aquaculture without the need for manual feature engineering.
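To make this architecture concrete, the minimal PyTorch sketch below stacks the canonical layer sequence described above (alternating convolutional and pooling layers, fully connected layers, activation functions, and an output layer) for a hypothetical fish-image classifier. The input resolution, channel widths, and ten-class output are illustrative assumptions, not values taken from any cited study.

```python
import torch
import torch.nn as nn

class TinyFishCNN(nn.Module):
    """Minimal CNN: alternating conv/pool blocks followed by a classifier head."""
    def __init__(self, num_classes: int = 10):  # class count is an illustrative assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 224 -> 112
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 112 -> 56
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 128), nn.ReLU(),
            nn.Linear(128, num_classes),          # output layer: class logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = TinyFishCNN()
logits = model(torch.randn(1, 3, 224, 224))  # one dummy 224x224 RGB underwater frame
print(logits.shape)                          # torch.Size([1, 10])
```

Note that no handcrafted texture or shape descriptors appear anywhere: the convolutional filters themselves are learned from the image data, which is the property that makes CNNs attractive in murky underwater scenes.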
Recurrent Neural Networks (RNNs) serve as a fundamental architecture for sequence modeling in deep learning and have demonstrated strong capabilities in natural language processing tasks such as speech translation and recognition [52]. However, standard RNN architectures face significant limitations in capturing long-range dependencies due to their constrained effective context window. Perturbations in the input signal propagate through recurrent connections and accumulate exponentially in the hidden state vector, resulting in either vanishing or exploding gradient norms over distant time steps—phenomena commonly referred to as the “vanishing gradient” and “exploding gradient” problems [53,54]. The introduction of Long Short-Term Memory (LSTM) networks effectively addresses these issues by enabling better handling of long-term dependencies [55,56]. As an improved variant of RNNs, LSTMs offer enhanced capabilities in modeling sequential data and capturing long-range dependencies [57]. Consequently, LSTM-based models exhibit strong predictive performance and accuracy, forming a robust foundation for underwater environmental forecasting and aquatic disease prediction [58].
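As an illustration of why LSTMs suit such forecasting tasks, the sketch below maps a sliding window of multi-parameter sensor readings to a one-step-ahead prediction. The window length of 24 readings and the four hypothetical parameters (dissolved oxygen, pH, temperature, ammonia nitrogen) are assumptions for the example only.

```python
import torch
import torch.nn as nn

class WaterQualityLSTM(nn.Module):
    """One-step-ahead forecaster: a sliding window of sensor readings -> next values."""
    def __init__(self, n_features: int = 4, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_features)  # predict all parameters at t+1

    def forward(self, x):                 # x: (batch, window, n_features)
        out, _ = self.lstm(x)             # gated cells mitigate vanishing gradients
        return self.head(out[:, -1, :])   # use only the final hidden state

# e.g., 24 past readings of [DO, pH, temperature, ammonia] -> next reading
model = WaterQualityLSTM()
window = torch.randn(8, 24, 4)            # dummy batch of 8 normalized sequences
pred = model(window)                      # shape (8, 4)
loss = nn.functional.mse_loss(pred, torch.randn(8, 4))
loss.backward()
```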
Generative Adversarial Networks (GANs) employ a framework of adversarial training between a generator and a discriminator, enabling the extraction of deep implicit representations without the need for extensively labeled data. The learned representations can be extended to a wide range of applications, including image synthesis, semantic image editing, style transfer, image super-resolution, and classification tasks [59]. Aquatic organism image analysis often faces challenges stemming from unstructured environments and biological variability [60]. GANs are capable of generating realistic and diverse images, making them a valuable tool for developing high-performing deep learning models in scenarios, such as aquaculture, where large-scale labeled datasets are difficult to obtain [61]. Specifically, GAN-based models have proven effective in addressing underwater challenges such as color distortion and reduced contrast by enhancing image visibility, thereby improving overall model performance [62].
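The adversarial training loop itself is compact, as the following toy sketch shows: the discriminator is updated to separate real from generated samples, and the generator is updated to fool it. The flattened 32 × 32 patches and the tiny network sizes are placeholders for illustration, not a recipe from the underwater image-enhancement literature.

```python
import torch
import torch.nn as nn

latent_dim = 64
# Toy generator/discriminator over flattened 32x32 grayscale patches
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 32 * 32), nn.Tanh())
D = nn.Sequential(nn.Linear(32 * 32, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(16, 32 * 32) * 2 - 1           # stand-in for real image patches
z = torch.randn(16, latent_dim)

# Discriminator step: real patches labeled 1, generated patches labeled 0
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make D label generated patches as real
loss_g = bce(D(G(z)), torch.ones(16, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Because neither loss requires class labels, the same scheme can be trained on unannotated underwater footage, which is exactly what makes GAN-based augmentation attractive when labeled aquaculture data are scarce.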
Transformer is a purely attention-based sequence transduction model first proposed by Vaswani et al. in 2017 [63], eliminating the need for recurrence and convolution while enabling faster training. Initially applied to natural language processing tasks, Transformer has demonstrated exceptional performance and has since shown great potential in computer vision applications as well [64]. Moreover, it has achieved promising results in agricultural research [65]. For example, Gong et al. [66] proposed a classification method based on Vision Transformer (ViT), which effectively handles images of varying resolutions and exhibits strong generalization capabilities. This approach addresses the limitations of traditional CNN architectures and delivers superior performance in classifying fish images across different resolutions. Mamba, a model introduced by Gu et al. in 2024 [67], offers fast inference and linear scalability with respect to sequence length, achieving high performance on real-world datasets with millions of sequences. The Mamba model represents a significant advancement in state space modeling (SSM) for sequence-processing tasks [68]. Given that current fish detection algorithms are highly susceptible to underwater environmental conditions, leveraging Mamba’s ability to model long-range dependencies allows for improved adaptability to complex fish detection tasks without increasing computational complexity [69,70].
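At the core of every Transformer variant mentioned above is scaled dot-product self-attention, sketched below on dummy data; the projection matrices here are random stand-ins for learned weights.

```python
import math
import torch

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """Scaled dot-product self-attention over a sequence x: (batch, length, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # pairwise similarities
    weights = scores.softmax(dim=-1)    # every position attends to every other one
    return weights @ v

d = 32
x = torch.randn(2, 10, d)               # e.g., 10 image patches or 10 time steps
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # (2, 10, 32)
```

The score matrix grows quadratically with sequence length, which is precisely the cost that linear-time state space models such as Mamba avoid, explaining the appeal of Mamba-style backbones for long underwater video streams.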
MobileNet, a lightweight deep neural network model proposed by Howard et al. [71], was originally designed for mobile and embedded vision applications. It achieves parameter reduction by replacing standard convolutions with depthwise separable convolutions (DSC), significantly decreasing the number of learnable parameters required for feature extraction. Notably, the MobileNetV1 Bottleneck with Expansion architecture, based on the MobileNet framework, has demonstrated superior performance in fish freshness evaluation tasks [72]. Table 1 illustrates the advantages and application scenarios of common deep learning models.
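The parameter savings from depthwise separable convolutions are easy to verify directly, as the following sketch does for one illustrative layer configuration (64 input and 128 output channels with 3 × 3 kernels):

```python
import torch.nn as nn

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

c_in, c_out, k = 64, 128, 3

standard = nn.Conv2d(c_in, c_out, k, padding=1)
# Depthwise separable convolution = per-channel spatial filter + 1x1 pointwise mix
depthwise_separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in),  # depthwise: one filter per channel
    nn.Conv2d(c_in, c_out, kernel_size=1),             # pointwise: recombine channels
)

print(count_params(standard))             # 73,856
print(count_params(depthwise_separable))  # 8,960 -- roughly an 8x reduction
```

This order-of-magnitude reduction in learnable parameters is what makes MobileNet-family backbones viable on the embedded devices used at aquaculture sites.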
Deep learning offers significant advantages in aquaculture applications, particularly in handling complex data, enhancing automation, and improving farming efficiency. The core reasons underlying the suitability of deep learning for aquaculture are as follows:
  • Superior automatic feature extraction capability: In traditional machine learning workflows, classification tasks typically involve a sequence of stages, including preprocessing, feature extraction, feature selection, learning, and classification [73]. Among these, feature selection is particularly critical and often demands substantial domain expertise, which limits the effectiveness of conventional machine learning in handling natural data [37]. Fish species identification serves as a foundational task in aquaculture, fisheries, and aquatic environment monitoring, with its accuracy directly influencing the effectiveness of resource conservation and the scientific basis of water management decisions. Traditional identification methods are often costly and unsuitable for large-scale deployment. In contrast, deep learning networks eliminate the need for manual feature engineering in image processing tasks [48], significantly reducing training costs compared to expert-driven feature design. As a result, deep learning shows broad application potential in fisheries-related tasks, particularly in scenarios involving image, video, and spatial data analysis. For example, Banan et al. [74] proposed a CNN-based identification method that demonstrates strong capabilities in phenotypic feature extraction for fish species recognition.
  • Efficient handling of unstructured data: Aquaculture production involves a substantial amount of unstructured data, particularly in relation to interdependent water quality parameters. Traditional machine learning approaches often lack robustness and long-term modeling capabilities when dealing with such data types [75]. Aquatic organisms are highly sensitive to physical and chemical factors such as dissolved oxygen, pH, and temperature; as a result, fluctuations in water quality directly impact production efficiency. Conventional water quality prediction methods often suffer from poor adaptability, low accuracy, and limited stability, making them unsuitable for long-term forecasting tasks. In contrast, attention-enhanced temporal models such as LSTM and Transformer are capable of effectively capturing the long-range temporal dynamics of water quality parameters. For example, Hu et al. [76] developed a deep LSTM-based model for water quality prediction, achieving 98.56% and 98.97% accuracy in pH and temperature forecasting, respectively. Furthermore, tasks such as fish behavior analysis, species recognition, and counting also involve the processing and interpretation of large volumes of unstructured data.
  • Real-time monitoring and rapid response: Real-time detection and monitoring are critical for advancing the intelligence and sustainability of aquaculture systems. However, achieving real-time performance requires both high responsiveness and minimal feedback latency, while contending with challenges such as imaging difficulties and the complexity of underwater object detection. Deep learning techniques are capable of extracting high-dimensional features and capturing deep information from data, as well as modeling nonlinear relationships. As a result, they have been widely applied in areas such as water quality monitoring and fish behavior tracking [77]. For instance, Hu et al. [78] proposed an improved YOLOv4 network that enables the real-time detection of underwater feed pellets with higher accuracy and lower computational cost.

3. Main Applications of Deep Learning in Aquaculture

To systematically review the applications of deep learning in aquaculture, this paper focuses on the following five key areas:
1. Fish detection, identification, and counting;
2. Growth prediction and health monitoring;
3. Intelligent feeding systems;
4. Water quality monitoring and prediction;
5. Behavioral recognition and stress analysis.
These five categories represent the major application domains of deep learning in aquaculture. An overview of these categories in the context of sustainable aquaculture is illustrated in Figure 3.

3.1. Fish Detection, Identification, and Counting

3.1.1. Recognition Based on Video or Image Data

Over the past few decades, traditional video and image recognition techniques for fish monitoring, identification, and counting have relied heavily on manual feature extraction and invasive, time-consuming sampling methods. However, underwater environments are characterized by low contrast, variable lighting, water turbidity, and high-density occlusion; combined with the high morphological similarity among fish, variation in scale, and deformation due to motion, these conditions mean that conventional object detection algorithms (e.g., Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT)) and classical machine vision methods often perform poorly [36,79].
In contrast, deep learning exhibits powerful nonlinear mapping and adaptive feature extraction capabilities, enabling more accurate object recognition and precise bounding box localization. These advantages make it particularly effective for processing underwater visual information [80]. In early research, Girshick et al. [81] proposed the Region-Based Convolutional Neural Network (R-CNN), which enhanced detection performance by feeding numerous region proposals into a CNN and classifying the extracted features. Girshick [82] later improved this architecture into Fast R-CNN, which achieved higher detection accuracy and faster processing speeds. Furthermore, Li et al. [83] applied Faster R-CNN to underwater image analysis in aquaculture. By combining a Region Proposal Network with a detection network that shares convolutional features, they improved segmentation performance and fish recognition accuracy while approaching real-time responsiveness.
In recent years, the YOLO (You Only Look Once) family of algorithms has made notable progress in underwater object detection. Sung et al. [79] developed a YOLO-based real-time fish detection method capable of recognizing fish even in low-light, noisy underwater images. Redmon et al. [84] demonstrated that YOLOv3 operates three times faster than SSD at the same resolution with comparable accuracy. Tong et al. [85] applied YOLOv5m to detect Pacific saury echo trajectories, achieving high training accuracy and supporting commercial fishing operations. Ouis et al. [86] validated the strong performance of YOLOv7 and YOLOv8 in fish species detection using sonar images, further demonstrating the applicability of deep learning in aquaculture.
As illustrated in Figure 4, deep learning-based fish counting typically employs either density estimation or object detection approaches. Yu et al. [87] proposed a Multi-branch Attention Network (MAN) that integrates multiple modules and attention mechanisms, outperforming traditional MCNN and CNN models in terms of generalization and stability. Zhao et al. [88] designed the LFCNet, which embeds the Ghost module to improve counting accuracy and robustness. Cai et al. [89] combined YOLOv3 with MobileNetV1 to achieve high-precision detection results. Ben Tamou et al. [90] enhanced video-based fish detection by utilizing a dual-branch Faster R-CNN architecture with a shared RPN. Additionally, Patro et al. [91] implemented real-time fish detection in videos using a YOLOv5-CNN model trained on LabelImg-annotated datasets and deployed on Google Colab, outperforming R-CNN. Our own recent studies are also relevant here, including a YOLOv8n-based fish detection and counting model (unpublished) and a Mamba-enhanced architecture for improved fish tracking and detection [70], as illustrated in Figure 5.
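For readers unfamiliar with the density estimation branch of Figure 4, the sketch below shows its core idea: the network regresses a non-negative per-pixel density map, and integrating (summing) that map yields the fish count. The tiny backbone and input size are illustrative only; published counting networks such as MAN or LFCNet use far richer architectures.

```python
import torch
import torch.nn as nn

class DensityCounter(nn.Module):
    """Density-estimation counting: regress a per-pixel density map, then integrate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),             # one density value per pixel
            nn.ReLU(),                       # densities are non-negative
        )

    def forward(self, x):
        density = self.net(x)                # (batch, 1, H, W)
        count = density.sum(dim=(1, 2, 3))   # integrating the map gives the count
        return density, count

model = DensityCounter()
frames = torch.randn(2, 3, 128, 128)         # dummy underwater frames
density, count = model(frames)
# Training would regress `density` against Gaussian-smoothed point annotations,
# so that overlapping, partially occluded fish still contribute fractional mass.
```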
In summary, deep learning demonstrates substantial potential in fish detection and counting using image and video data. It not only improves the accuracy of recognition and counting but also significantly enhances processing efficiency and real-time performance. However, current models still face challenges related to unstable accuracy and limited robustness under complex underwater conditions, such as fish size variability and environmental noise. Future research should focus on improving the stability and detection accuracy of models (e.g., Faster R-CNN) under conditions of low visibility, high fish overlap, and large interspecies size differences.

3.1.2. Multi-Fish Tracking and Behavior Analysis

The application of deep learning in fish tracking and behavior analysis has attracted growing attention in recent years, due to its superior performance in handling complex visual and temporal data. Traditional multi-fish tracking methods, which often rely on manual labeling and visual observation supplemented by computer vision, acoustic, or sensor-based technologies, suffer from high labor costs, low efficiency, poor accuracy, and limited adaptability, rendering them insufficient for modern aquaculture behavior monitoring needs [92]. By contrast, deep learning has been widely applied in fish tracking due to its powerful feature learning capabilities [93].
Integrating deep features with correlation filtering algorithms has proven effective for fish tracking. For example, the Hierarchical Convolutional Features (HCFs) method outperforms the conventional Kernelized Correlation Filter (KCF) tracker in both accuracy and robustness [94], highlighting the potential of deep learning in this domain. Danelljan et al. [95] replaced handcrafted features in SRDCF with CNN features to develop the DeepSRDCF algorithm, significantly improving tracking performance. Lai et al. [96] proposed Fast-CNT and its 3D-extended version, which integrates temporal and spatial information to enhance underwater multi-target tracking. In another study, Wang et al. [8] combined YOLOv5s with SiamRPN++ to perform multi-target tracking of fish, enabling the monitoring of flipping behavior for disease diagnosis with high detection accuracy. The tracking workflow based on deep learning is illustrated in Figure 6.
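One step these trackers share is associating the current frame's detections with existing tracks. The sketch below shows a common baseline for that step, Hungarian assignment over an IoU cost matrix (as popularized by SORT-style trackers); the boxes and threshold are dummy values, and this is not the matching scheme of any specific paper cited above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, min_iou=0.3):
    """Match existing tracks to current-frame detections by maximizing total IoU."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]

tracks = [np.array([10, 10, 50, 50]), np.array([60, 60, 90, 90])]
dets = [np.array([58, 62, 92, 88]), np.array([12, 8, 52, 48])]
print(associate(tracks, dets))   # [(0, 1), (1, 0)] -- each fish keeps its identity
```

Appearance features from a CNN (as in DeepSRDCF or SiamRPN++) typically replace or supplement the plain IoU cost, which is what keeps identities stable when fish cross paths.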
Sun et al. [97] developed a lightweight zebrafish multi-target tracking system based on deep CNNs, employing CNNs and facial recognition principles to identify positions and determine movement trajectories with improved accuracy and speed. Additionally, Liu et al. [98] proposed an underwater fish school tracking method using the FSTA algorithm, which showed superior performance in handling complex underwater conditions (Figure 7a).
In the domain of behavior analysis, deep learning also excels due to its ability to automatically extract features. As shown in Figure 7b, Hu et al. [99] designed a YOLOv3-Lite + MobileNetv2 model to establish a low-cost, non-invasive, and automated behavior monitoring system that outperformed models like Faster R-CNN and YOLOv2. Wang et al. [100] proposed the DSC3D network, which successfully recognized behaviors such as normal swimming, feeding, fear, hypothermia, and hypoxia, achieving an average accuracy of 95.79%, thereby supporting intelligent decision-making in aquaculture. Han et al. [101] further improved behavior classification by combining CNNs with a spatiotemporal attention mechanism.
Overall, deep learning has made significant strides in sustainable aquaculture, greatly enhancing the accuracy and practical potential of fish tracking and behavior analysis. However, several challenges remain:
  • Complex underwater environments: Low visibility, scattering, and absorption degrade image quality and impair detection performance. Future work should focus on underwater image enhancement and restoration, along with targeted preprocessing algorithms to improve model adaptability.
  • Severe occlusion: High fish density results in substantial body overlap, affecting recognition and tracking. Solutions may include scale normalization, optimized camera angles, and structural improvements to YOLO and Faster R-CNN for enhanced robustness and precision.
  • Unstable detection accuracy: Existing models struggle with reduced accuracy and efficiency in scenarios involving large size variations, frequent movement, or occlusion. Leveraging large-scale datasets, future research should explore semi-supervised or unsupervised learning to optimize model architectures.
  • High equipment costs: The high cost of deep learning-related equipment limits adoption in small- and medium-sized farms. Developing lightweight models and low-cost hardware solutions is crucial for promoting widespread application.
In conclusion, deep learning-powered fish tracking and behavior analysis technologies are increasingly becoming indispensable tools in intelligent aquaculture. Continued efforts to enhance their stability, adaptability, and cost-effectiveness will be key to achieving broader implementation.

3.2. Growth Prediction and Health Monitoring

Growth performance in fish is directly linked to the economic efficiency of aquaculture operations. Traditional methods for obtaining fish length and weight typically rely on manual measurements, which are labor intensive, time consuming, and may cause physical harm to the fish [102]. To address these limitations, researchers have explored sensor-based, non-manual monitoring approaches. For instance, Endo et al. [103] developed a wireless enzymatic biosensor system for glucose monitoring, enhancing the timeliness and accuracy of aquaculture health diagnostics. However, such wearable sensing systems are often expensive, operationally complex, and may still inflict irreversible damage on the fish.
As an alternative, computer vision techniques have emerged as a non-invasive, low-cost, and sustainable solution, widely applied in growth monitoring [104]. For example, Yang et al. [105] built regression models based on CNNs, using VGG-11, ResNet-18, and DenseNet-121 to directly predict fish weight from images without the aid of a ruler, achieving high predictive accuracy. Additionally, Yoshida et al. [106] developed a sea cucumber growth monitoring system integrating underwater time-lapse cameras with deep learning image analysis, using the DeepLabV3+ model for semantic segmentation and subsequent individual weight estimation. Chirdchoo et al. [107] applied deep neural networks to estimate the live weight of Pacific white shrimp in natural ponds, converting traditional feed trays into imaging tools and training an artificial neural network (ANN) on the image features, achieving a prediction accuracy of 94.50%.
The integration of computer vision with deep learning has demonstrated strong accuracy and feasibility in weight prediction for fish and other aquatic organisms, offering a viable alternative to traditional approaches and promoting intelligent and non-invasive aquaculture management.
Health monitoring is another critical aspect of aquaculture. Accurate disease prevention and control are essential not only for ensuring production yield and product quality but also for contributing to global food security [108]. Traditional fish disease identification systems rely heavily on expert knowledge [109], resulting in high costs, low efficiency, and poor generalizability. With advances in image processing and deep learning, automated image-based fish disease recognition systems have gained attention due to their non-invasive, low-cost, real-time, and environmentally friendly attributes [110].
For example, Raj et al. [111] proposed an optimized ERCN model for classifying multiple shrimp diseases by integrating spatial and temporal image features and fusing multi-layer information. Their model outperformed conventional CNN, RNN, LSTM, GRU, and VGG16 architectures, achieving classification accuracies exceeding 90%. Wang et al. [112] developed the DCW-YOLO model, introducing an NWD loss function and the C2f-D-LKA module to enhance dense object detection capabilities. In underwater experiments, it achieved mAP50 and precision scores of 96.87% and 95.46%, respectively—significantly outperforming YOLOv10, and demonstrating real-time performance and practical value. Figure 8a illustrates the effects of their fish detection model.
In terms of phenotypic and organ recognition, Chai et al. [113] proposed the CSHT-Net for keypoint detection in zebrafish larvae, addressing the limitations of conventional heatmap methods in capturing continuous anatomical features (Figure 8b). The model achieved an average precision of 83.2% and average recall of 85.8%.
Moreover, single-modality image features are often insufficient to fully characterize disease states. Multimodal data fusion has emerged as an effective approach to improve diagnostic accuracy. Huang et al. [114] developed a disease early-warning system integrating YOLOv8, ByteTrack, LSTM, and a Fuzzy Inference System. By combining water quality parameters, external appearance, and behavioral features, the system predicted largemouth bass disease onset with an accuracy of 94.08%, significantly improving the responsiveness and stability of early warnings. To evaluate the predictive performance of the LSTM model, the study conducted comparative experiments against SVM and Logistic Regression models. The results showed that LSTM reduced training loss by 8.26% and 6.23% compared to SVM and Logistic Regression, respectively. In terms of testing loss, LSTM achieved reductions of 47.98% and 19.93% relative to the two models. Furthermore, the overall test accuracy of the LSTM model improved by 19.79% over SVM and by 6.89% over Logistic Regression. These findings indicate that the LSTM model demonstrates superior predictive and classification performance for the given task, significantly outperforming traditional machine learning models.
In summary, deep learning has significantly advanced fish growth prediction and disease diagnosis, offering improved efficiency and accuracy over traditional approaches. However, challenges remain in deploying these models in complex underwater environments:
  • Limited cross-species generalization: Current models are often trained on data from specific species, limiting their applicability to others due to phenotypic differences and inconsistent data distributions.
  • Sensitivity to imaging quality: Variations in underwater lighting and turbidity affect image quality and, consequently, diagnostic accuracy.
  • High training resource requirements: Deep learning models demand large volumes of high-quality data and stable training conditions, incurring high costs, long development cycles, and sensitivity to environmental factors.
  • Lack of multimodal data fusion: Existing approaches mainly focus on visual data. Future systems should integrate images, water quality, sensor readings, and genomic information to build multi-dimensional decision-making models for more comprehensive and accurate disease diagnosis.

3.3. Intelligent Feeding Systems

Feeding behavior is closely tied to the production efficiency and cost control of aquaculture operations. A well-designed feeding system is widely regarded as a key strategy for improving economic outcomes in fish farming [115]. Traditional feeding approaches typically rely on fixed schedules and farmers’ experience, with feed dispensed in fixed quantities by automated systems [116]. However, such strategies often fail to match the actual feeding needs of fish, resulting in underfeeding or overfeeding, which leads to feed waste, water pollution, or growth inhibition [75], ultimately hindering sustainable development.
To achieve precision feeding, researchers have begun incorporating optical and acoustic sensors alongside computer vision and artificial intelligence techniques to analyze feeding behavior [117,118]. In the field of traditional image processing, Atoum et al. [119] proposed a method that identifies optimal local regions from video frames and applies an SVM classifier for intelligent feeding control. However, this approach depends on manual feature extraction and struggles to detect complex feeding patterns.
Conventional machine learning techniques rely heavily on handcrafted features, making them insufficient for capturing the intricacies of fish feeding behaviors. With the rise of deep learning, recent research has shifted toward automatic feature extraction and behavior modeling. Liu et al. [120] proposed a dual-branch model combining attention-enhanced ResNet and an improved ConvNeXt, named ResNet-MoVIT-ConvNeXt. This model utilizes MobileViT for multi-level feature fusion to accurately recognize varying feeding intensities, and supports a dynamic feeding strategy that incorporates biomass and water quality data. Zhang et al. [121] enhanced MobileNetV3 by integrating the MSIF module to improve the model’s robustness against background noise. Hu et al. [78] designed a modified YOLOv4-based model for detecting uneaten feed, while Zhang et al. [122] developed the FFishNet-YOLOv8 network by incorporating Skip-ASFF and Joint-IoU mechanisms to improve the precision of feeding fish identification in pond environments. Figure 9a–e illustrate instances where the model successfully detects all fish exhibiting feeding behavior. In cases (a) through (d), the detected targets have traditional IoU values below 50%, which would typically result in their misclassification as non-feeding individuals under conventional evaluation metrics. However, with the introduction of the proposed Joint-IoU metric, these individuals are accurately identified as feeding fish, demonstrating the method’s robustness in handling targets with incomplete boundary overlap. In contrast, the targets shown in Figure 9f–h also have IoU values below 50% but do not meet the criteria for feeding behavior under the Joint-IoU metric. These are correctly classified as non-feeding fish, consistent with manual annotations, further validating the effectiveness and discriminative accuracy of the proposed approach.
Beyond image-based methods, acoustic technologies have also shown promise in analyzing feeding behaviors. Smith et al. [123] observed that tiger shrimp emit impact sounds during feeding, which can serve as proxy indicators. Cui et al. [124] constructed the AFFIA3K audio dataset and used Mel spectrograms as CNN inputs to classify feeding intensity. Huang et al. [125] analyzed tilapia feeding behavior by integrating spectrograms with a VGG16 and attention fusion module, achieving 94.37% classification accuracy.
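The common pattern in these acoustic studies, converting raw audio into a Mel spectrogram and treating it as an image for a CNN, can be sketched in a few lines with torchaudio. The sample rate, clip length, and four feeding-intensity classes below are assumptions for illustration, not the settings of AFFIA3K or the cited models.

```python
import torch
import torch.nn as nn
import torchaudio

# Turn a raw hydrophone clip into a Mel spectrogram "image" for a CNN classifier.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=1024,
                                           hop_length=512, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

waveform = torch.randn(1, 32000)                 # dummy 2 s mono clip at 16 kHz
spec = to_db(mel(waveform)).unsqueeze(0)         # (batch=1, channel=1, 64 mels, frames)

classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 4),                            # e.g., none/weak/medium/strong feeding
)
logits = classifier(spec)                        # (1, 4)
```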
To address the vulnerability of unimodal methods to environmental noise, Gu et al. [126] proposed a Multimodal Fusion Interactive Network (MMFINet), which combines audio, video, and dissolved oxygen data. The model employs a lightweight depthwise separable convolution feedforward structure to reduce complexity and achieved an accuracy of 97.6% on a multimodal fish feeding dataset.
In summary, deep learning approaches based on image and acoustic information have shown considerable promise in fish feeding behavior recognition and intelligent feeding system development. However, the current research still faces several limitations:
  • Limited generalization: Most models are trained in specific scenarios and exhibit performance degradation under complex real-world conditions such as poor lighting or cluttered backgrounds.
  • Challenges with multimodal integration: While unimodal methods (image or acoustic) are easily disrupted, multimodal systems introduce increased training difficulty and architectural complexity.
  • Limited robustness: Many models focus solely on feeding behavior and lack mechanisms to respond to abnormal conditions such as disease onset.
  • Insufficient adaptability to individual variability: Feeding behavior varies among individual fish, making current models highly specialized with limited generalizability.
Future research should focus on improving model adaptability and robustness under diverse environmental conditions, across multiple species, and using heterogeneous data sources. These efforts will be critical for advancing intelligent feeding systems toward practical, scalable, and widely applicable solutions in sustainable aquaculture.

3.4. Water Quality Monitoring and Prediction

Water quality management is a critical component of aquaculture systems, directly affecting fish health and growth efficiency [127]. Traditional water quality monitoring relies heavily on manual sampling and laboratory testing, which are time consuming, labor intensive, and lack real-time responsiveness [128]. In recent years, accurate prediction of water quality parameters has become a key research focus. Traditional machine learning methods, such as SVM and Least Squares SVM (LSSVM), have been widely applied to water quality prediction tasks with promising results [129]. However, due to the highly nonlinear and time-varying nature of water quality data—often influenced by factors such as stocking density, feeding schedules, and environmental conditions—these models struggle to capture complex, multi-source feature interactions.
To overcome these limitations, researchers have increasingly adopted deep learning techniques to extract high-level representations from unstructured data. For example, Hu et al. [76] proposed an LSTM-based model that achieved prediction accuracies of 98.56% for pH and 98.97% for temperature in short-term forecasting tasks. Arasu et al. [130] developed the AquaSense framework, which integrates multiple sensors to collect temperature, pH, dissolved oxygen, and salinity data. Using a DiCNN-BiLSTM architecture, the model achieved 96.49% accuracy, with DiCNN offering an enhanced receptive field suitable for abstracting complex features.
Further developments include hybrid deep learning models such as CNN-LSTM and CNN-GRU, proposed by Haq et al. [6], which outperform baseline models in both accuracy and computational efficiency. Gandh et al. [33] incorporated attention mechanisms into LSTM and GRU structures (A-LSTM and A-GRU), improving the focus on critical temporal features. Building upon these advances, Ma et al. [131] introduced the IPSO-CNN-GRU-TAM model, which uses CNN for primary feature extraction, GRU for capturing temporal dependencies, and TAM to emphasize important time-series signals. The model architecture is shown in Figure 10.
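The division of labor in such CNN-GRU hybrids, convolutions for local feature extraction followed by a recurrent layer for temporal dependencies, can be sketched as follows. The layer sizes and the four input parameters are illustrative assumptions, not the IPSO-CNN-GRU-TAM configuration.

```python
import torch
import torch.nn as nn

class CNNGRUForecaster(nn.Module):
    """CNN-GRU hybrid: 1-D convolutions extract local patterns from the sensor
    window, and a GRU models the remaining temporal dependencies."""
    def __init__(self, n_features: int = 4, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(                # operates on (batch, features, time)
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                        # x: (batch, time, features)
        z = self.cnn(x.transpose(1, 2))          # (batch, 32, time)
        out, _ = self.gru(z.transpose(1, 2))     # back to (batch, time, 32)
        return self.head(out[:, -1, :])          # next-step multi-parameter forecast

pred = CNNGRUForecaster()(torch.randn(8, 24, 4))  # 24 past readings -> (8, 4)
```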
In addition, Song et al. [132] incorporated meteorological data by designing a Dual Encoder Cross-source Feedback Network (DECSF-Net), which uses a bidirectional feedback mechanism to enhance the modeling interaction between water quality and weather variables. The model outperformed mainstream approaches such as Transformer in 8-h prediction tasks, although extreme weather events still pose challenges to predictive accuracy. Figure 11a–d compares the predicted values of various models against actual observations for parameters including dissolved oxygen (DO), pH, temperature, and salinity. In the water quality prediction task, the proposed DECSF-Net model outperformed other benchmark models across multiple evaluation metrics, demonstrating well-balanced performance between long-term trend accuracy and short-term fluctuation sensitivity. Notably, DECSF-Net exhibited a clear advantage in handling abrupt changes, effectively capturing anomalous fluctuations. In contrast, the DLinear model performed well in predicting stable trends but lacked responsiveness to sudden variations; the Autoformer model showed considerable volatility in its predictions, resulting in poor overall stability; and while the iTransformer and Peri-midFormer models achieved relatively strong predictive performance across most time steps, they still exhibited localized deviations. These results further validate the superior adaptability and predictive stability of DECSF-Net under complex and dynamic environmental conditions.
With the advancement of Internet of Things (IoT) technologies, real-time monitoring devices can now autonomously collect multi-dimensional data and integrate with deep learning models to enable intelligent water quality management [10]. Arepalli et al. [133] developed a smart monitoring system using an SSA-LSTM model for classifying dissolved oxygen levels, achieving 99.8% accuracy in hypoxia detection. In a subsequent study, Arepalli et al. [134] proposed the DSTCNN architecture, incorporating dilated spatiotemporal convolution layers and a hybrid activation function (ReLU + Sigmoid), which enhanced the model’s ability to capture spatiotemporal patterns and improved generalization while mitigating overfitting.
In summary, deep learning models, particularly LSTM and GRU-based architectures, have demonstrated outstanding performance in modeling the nonlinear and temporal characteristics of water quality parameters. When combined with IoT technologies, they enable efficient, real-time monitoring solutions. Nevertheless, several challenges remain:
  • Difficulty in high-dimensional data fusion: Water quality is influenced by numerous internal and external factors, resulting in high-dimensional, unstable datasets that current models struggle to integrate and interpret efficiently.
  • Complexity in data acquisition: Variability across aquaculture environments makes it difficult to obtain consistent and reusable datasets.
  • High computational demands: Deep learning models require significant training and inference time, limiting their responsiveness to rapid changes in water quality.
  • Limited cross-regional adaptability: Existing models often fail to generalize across different water bodies, climates, and cultured species, necessitating large volumes of locally sourced data for retraining, which increases deployment costs and development cycles.

3.5. Behavioral Recognition and Stress Analysis

Fish behavior serves as a vital indicator of both physiological status and environmental changes, and the acquisition and analysis of behavioral information have become essential tools for assessing ecosystem health and fish welfare. Typical behavioral patterns include feeding, abnormal, and swimming behaviors [135]. Traditional behavior recognition methods rely heavily on manual observation and annotation, which are inefficient and subject to human bias. The development of computer vision has introduced efficient and non-invasive solutions for automated fish behavior detection.
Early studies primarily applied machine learning techniques to identify abnormal behaviors based on trajectory analysis. For instance, Beyan et al. [136] employed clustering methods to classify fish trajectories in underwater videos. However, such approaches depend on handcrafted features and are often ineffective in adapting to complex aquaculture environments or variations in image quality [137].
With the rise of deep learning, researchers have increasingly leveraged its advantages in automatic feature extraction and the recognition of complex behavioral patterns. For example, Zhao et al. [138] proposed a Deep Network for Abnormality model based on enhanced motion influence maps and RNNs, which achieved the high-precision detection of three local abnormal behaviors in intensive aquaculture settings, with accuracies of 98.91%, 91.67%, and 89.89%, respectively. Hu et al. [99] built a low-cost imaging platform for mixed-species systems and implemented behavior recognition using a YOLOv3-Lite network. Du et al. [137] combined ResNet50 with LSTM to accurately identify five reproductive behaviors during fish spawning, achieving a recognition accuracy of 98.52% and demonstrating strong model robustness.
To address the challenge of detecting abnormal behavior in individual fish, Li et al. [104] proposed the BCS-YOLOv5 network (Figure 12), integrating Bi-directional Feature Pyramid Networks (BiFPNs), Coordinate Attention (CA), and Spatial Pyramid Pooling (SPP). This architecture enhances spatial localization and abnormal behavior recognition, making it suitable for real-time monitoring. Wang et al. [8] developed an improved YOLOv5s network that fuses multi-layer features to identify abnormal behaviors in Pagrus major. However, its single-object tracking approach results in high computational costs in multi-object scenarios. Xu et al. [139] introduced a DT-YOLOv5-based sea cucumber behavior analysis framework with coordinate matching and trajectory reconstruction. The model incorporates BiFPN and CA for improved feature representation and supports automatic trajectory generation and motion quantification (Figure 13).
Moreover, monitoring and managing stress responses in fish through behavioral indicators can help minimize health risks associated with stress [140]. To quantify stress through multi-behavioral features, Mei et al. [141] proposed a knowledge-distillation-based stress recognition model. The system employs GhostNet to mimic ResNeXt101 outputs and integrates a hybrid relative loss function to enhance performance. Even under limited computational resources, the model achieves high accuracy, offering a practical solution for intelligent behavior recognition.
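Knowledge distillation of this kind trains the small student network against both the ground-truth labels and the teacher's softened outputs. The sketch below shows the standard temperature-scaled distillation loss as a generic illustration; it is not the hybrid relative loss function proposed by Mei et al. [141], and the five stress classes are a placeholder.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Blend a soft KL term (mimic the teacher) with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2                  # rescale gradients for the softened targets
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Dummy batch: a large teacher's logits guide a small student on 5 stress classes
teacher_logits = torch.randn(16, 5)
student_logits = torch.randn(16, 5, requires_grad=True)
labels = torch.randint(0, 5, (16,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```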
In summary, the application of deep learning in fish behavior analysis has evolved from single-behavior recognition to multi-dimensional anomaly detection and stress state assessment. Nonetheless, several challenges persist in real-world deployment:
  • Lack of data resources: The scarcity of high-quality, annotated datasets for abnormal fish behaviors hinders model generalization.
  • High behavioral complexity: Abnormal behaviors manifest in diverse ways, including movement variations, reduced feeding, and color changes, which are difficult to model comprehensively using a single architecture.
  • Weak modeling of group behavior: In intensive aquaculture systems, severe occlusion among individuals complicates instance segmentation and interaction modeling.
  • Real-time processing burden: High-frame-rate video data impose heavy computational demands, limiting scalability in large-scale deployment scenarios.

4. Datasets, Data Augmentation, and Model Training Strategies

To build robust and efficient deep learning models for aquaculture, researchers increasingly rely on publicly available datasets and open-source toolkits as foundational resources [36]. This section provides a systematic overview of commonly used aquaculture datasets, discusses prevailing data augmentation and synthetic data generation strategies, and outlines typical training pipelines, including the use of transfer learning in aquaculture applications.

4.1. Public Aquaculture and Underwater Vision Datasets

In recent years, research in aquaculture and underwater computer vision has become increasingly reliant on high-quality public datasets to support model development and evaluation. Fish4Knowledge is one of the earliest large-scale underwater video datasets, comprising approximately 700,000 underwater video clips with over 100 h of footage primarily captured in coral reef environments. It has been widely used in studies involving fish image recognition, object detection, and behavior tracking [142]. The LifeCLEF-Fish Task, part of the LifeCLEF challenge series, includes thousands of high-resolution static images covering hundreds of fish species. These images are annotated at the genus and species levels, supporting fine-grained classification and retrieval tasks [143,144,145]. The NCFM dataset, originating from a Kaggle-hosted competition organized by The Nature Conservancy, contains approximately 3777 fish images and is suitable for training and evaluating classification and recognition models [146]. The Fish Recognition Ground Truth (FRGT) dataset focuses on ground truth labeling for fish recognition. It includes around 10,000 high-resolution underwater images across 15 common aquaculture and wild fish species, with pixel-level segmentation masks and detailed species labels. This dataset is widely used for evaluating algorithms in classification, object detection, and semantic segmentation tasks [147]. A comparison of these datasets is presented in Table 2.

4.2. Data Acquisition and Augmentation Methods

Underwater images often suffer from quality degradation issues such as color distortion, blur, and low contrast. Therefore, high-quality data acquisition and effective augmentation strategies are crucial for achieving optimal model performance [148]. In the data collection phase, researchers commonly use remotely operated vehicles (ROVs), stationary underwater cameras, and fish tracking devices to capture raw images and videos. After initial acquisition, datasets are cleaned and annotated using semi-automatic tools such as LabelMe or CVAT, followed by expert review to remove blurred, occluded, or mislabeled samples.
To improve image quality, preprocessing techniques such as histogram equalization and underwater color correction are frequently applied [149,150,151]. When annotated sample sizes are insufficient, GANs are widely employed to augment training samples [61]. Additionally, geometric and photometric augmentation techniques—including random cropping, rotation, flipping, Gaussian noise injection, and color jittering—are used to enhance model generalization across diverse visual conditions [89].
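A typical augmentation pipeline combining the geometric and photometric operations listed above can be assembled with torchvision, as in the sketch below; the specific magnitudes, and the simple Gaussian-noise transform written here by hand, are illustrative choices rather than settings from any cited study.

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Inject sensor-like Gaussian noise into a tensor image in [0, 1]."""
    def __init__(self, std: float = 0.02):
        self.std = std
    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),  # random cropping
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,  # mimics turbidity and
                           saturation=0.3, hue=0.05),     # underwater color shifts
    transforms.ToTensor(),
    AddGaussianNoise(std=0.02),
])
# Pass `train_transforms` as the `transform` argument of an image Dataset,
# e.g., torchvision.datasets.ImageFolder("data/train", transform=train_transforms).
```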

4.3. Typical Training and Transfer Learning Pipelines

Model training for intelligent aquaculture systems typically follows a structured pipeline consisting of pretraining, fine-tuning, domain adaptation, and multimodal fusion [152]. During the pretraining phase, classification tasks often employ backbone networks pretrained on ImageNet, while detection tasks utilize weights initialized from models trained on the COCO dataset [32,153].
Fine-tuning is then conducted on aquaculture-specific datasets, typically by freezing the initial layers of the backbone network and updating only the deeper layers or classification/detection heads with a higher learning rate [75,154].
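A minimal version of this freeze-and-fine-tune recipe looks as follows with torchvision; the choice of ResNet-18, the 15-class head, and which block to unfreeze are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze every layer, then replace the classification head for the new task
for param in model.parameters():
    param.requires_grad = False
num_species = 15                                          # illustrative class count
model.fc = nn.Linear(model.fc.in_features, num_species)   # new head trains from scratch

# Unfreeze only the deepest block, and give the randomly initialized head
# a higher learning rate than the backbone layers.
for param in model.layer4.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
```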
To improve robustness and predictive accuracy in real-world applications, researchers have explored the integration of vision models (e.g., YOLOv8) with temporal sequence models such as LSTM/GRU and Fuzzy Inference Systems. These multimodal architectures have shown promising results in tasks such as disease early warning and intelligent feeding, offering synergistic performance across visual and temporal modalities [114,155].

5. Discussion

5.1. Current Challenges

Despite the significant progress achieved by deep learning in visual perception tasks for sustainable aquaculture, several challenges and limitations continue to hinder its practical deployment.
Difficulty in acquiring high-quality annotated data and data imbalance: The development of high-accuracy deep learning models critically depends on large-scale, precisely annotated datasets [65]. However, in aquaculture environments, data collection is costly, and image annotation is labor intensive, requiring domain-specific expertise. Consequently, annotated samples are often limited, and there is a severe imbalance in the number of samples across different fish species or behavior categories. This not only constrains the learning capacity of models but also introduces class bias, reducing accuracy in recognizing underrepresented classes [156].
Limited model generalization and overfitting risk: Deep neural networks are prone to overfitting to the feature distribution of specific datasets during training, resulting in poor performance when applied to unseen regions or novel scenarios. This issue is particularly pronounced in sustainable aquaculture due to complex environmental backgrounds, species diversity, and significant posture variability. As a result, models often struggle to maintain stable recognition performance in real-world applications [75].
Conflict between real-time requirements and computational constraints: Many aquaculture applications, such as abnormal behavior monitoring and real-time feeding control, demand rapid model inference [157]. However, most high-accuracy models are computationally intensive, lacking lightweight optimization, and exhibit strong dependency on hardware resources such as GPUs, memory, and power. This limits their feasibility for deployment on edge devices or low-power platforms, constraining real-time applications [158].
Image degradation and severe occlusion in underwater environments: Underwater imagery often suffers from quality degradation caused by light scattering, suspended particles, and illumination variability. These factors result in blurred, color-shifted, and low-contrast images. Furthermore, fish often occlude each other during schooling behavior, increasing the difficulty of target detection and behavior recognition [70,159,160]. Such conditions reduce input quality and impose stricter robustness requirements on vision models.
Weak cross-domain and cross-species transferability: Most existing deep learning models are trained on data specific to particular environments or species. When transferred to different water bodies or fish populations, they often exhibit significant performance degradation [161]. This stems from differences in fish appearance, movement patterns, and background environments, revealing a lack of domain generalization and transfer learning capabilities [160]. There is an urgent need for the development of algorithms with stronger domain adaptability.

5.2. Future Directions and Perspectives

As the application of deep learning in aquaculture continues to expand, researchers are becoming increasingly aware of its limitations in generalization, real-time performance, and adaptability to diverse scenarios. To further advance intelligent aquaculture, future research can be directed toward the following key areas.
Multimodal data fusion and cross-modal learning: Single-modality data (e.g., images) are insufficient to comprehensively represent dynamic processes in complex aquaculture environments. Future studies should therefore integrate image, acoustic, water quality (e.g., dissolved oxygen, pH, and ammonia nitrogen concentration), and behavioral data into unified multimodal representation frameworks, enhancing the system's capability for holistic fish health assessment and anomaly detection. Cross-modal transfer learning and alignment mechanisms could further compensate for missing or low-quality modalities, improving model robustness and generalizability [162]. To strengthen generalization, future research should also advance domain adaptation techniques, particularly by aligning data distributions between source and target domains to achieve high-precision recognition in heterogeneous environments. Jointly modeling image data with environmental sensor inputs such as temperature, dissolved oxygen, and pH can sharpen the model's perception and discrimination of complex environmental features, while few-shot and incremental learning approaches hold promise for improving robustness and scalability under limited training data or gradually evolving environmental conditions.
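A minimal fusion sketch under these assumptions is shown below: a lightweight CNN encodes each frame, a small MLP encodes a vector of sensor readings (e.g., dissolved oxygen, pH, ammonia nitrogen, temperature), and the concatenated embedding feeds a health-state classifier. All dimensions, the four-sensor layout, and the class count are illustrative assumptions, not a cited architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultimodalHealthModel(nn.Module):
    """Concatenation-based fusion of an image embedding and a
    water-quality sensor embedding for health-state classification."""

    def __init__(self, n_sensors=4, n_classes=3):
        super().__init__()
        backbone = models.mobilenet_v3_small(weights=None)
        backbone.classifier = nn.Identity()        # yields a 576-dim embedding
        self.image_encoder = backbone
        self.sensor_encoder = nn.Sequential(
            nn.Linear(n_sensors, 32), nn.ReLU(), nn.Linear(32, 32)
        )
        self.classifier = nn.Linear(576 + 32, n_classes)

    def forward(self, image, sensors):
        z = torch.cat(
            [self.image_encoder(image), self.sensor_encoder(sensors)], dim=1
        )
        return self.classifier(z)

# Dummy usage: one 224x224 frame plus a 4-value sensor vector.
model = MultimodalHealthModel()
out = model(torch.randn(1, 3, 224, 224), torch.randn(1, 4))
```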
Integration of edge computing and intelligent IoT platforms: Owing to constraints on computational power and energy consumption, conventional deep learning models are difficult to deploy on edge devices, so edge computing is expected to play a pivotal role in intelligent aquaculture systems. Deploying computationally capable edge devices (e.g., NVIDIA Jetson Nano, Raspberry Pi) directly at aquaculture sites enables localized data processing and real-time response, significantly reducing reliance on remote servers and network bandwidth while enhancing system responsiveness and stability. Greater emphasis should also be placed on the coordinated design of internal system modules, yielding a highly integrated architecture that unifies data acquisition, model inference, decision-making, and remote monitoring. Beyond whole-image analysis, research on image edge detection should be deepened to improve the accuracy of fish contour extraction and behavior recognition; techniques such as Holistically-Nested Edge Detection (HED) and Richer Convolutional Feature (RCF) networks may enhance the perception of fish-related details in complex farming environments. At the system level, visualization management platforms and human-machine interfaces tightly aligned with deep learning models are essential for presenting inference results intuitively, thereby increasing the practicality and user acceptance of such systems in real-world aquaculture management. By combining edge computing, Internet of Things (IoT) technologies, and low-power sensors [163], future systems can perform localized intelligent perception and decision-making, reducing data transmission loads and latency, which is critical for aquaculture scenarios that require low-latency, high-reliability operation. Furthermore, incorporating blockchain technology to construct trusted data traceability mechanisms will enhance data security and production transparency.
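At the deployment level, a typical first step is to export a trained model to a portable format consumable by edge runtimes. The sketch below converts a PyTorch classifier to ONNX, which can then be served by ONNX Runtime or compiled with TensorRT on a Jetson-class device; the model and file names are placeholders rather than artifacts from the cited systems.

```python
import torch
import torchvision.models as models

# Placeholder model standing in for a trained aquaculture classifier.
model = models.mobilenet_v3_small(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX for edge inference; dynamic batch axis for flexibility.
torch.onnx.export(
    model, dummy, "fish_monitor.onnx",
    input_names=["image"], output_names=["logits"],
    dynamic_axes={"image": {0: "batch"}},
    opset_version=17,
)
```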
Lightweight and efficient model design: To improve the deployability of deep learning models on embedded devices, future efforts should prioritize streamlined architectures with low computational requirements, such as the MobileNet series [164,165], lightweight YOLO variants [166], ShuffleNet [167], and MobileViT models [168,169,170]. In addition, model compression techniques such as knowledge distillation, pruning, and quantization should be applied to substantially reduce model complexity while maintaining accuracy, enabling efficient deployment and real-time inference.
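As one concrete example of these techniques, the sketch below implements a standard knowledge-distillation objective (temperature-softened KL divergence between teacher and student logits blended with the ordinary cross-entropy on true labels); the temperature and mixing weight are illustrative defaults, not values from the cited works.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a temperature-softened KL term (teacher -> student) with the
    hard-label cross-entropy; T*T rescales gradients as in standard recipes."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Dummy usage: batch of 8 samples, 5 classes.
loss = distillation_loss(
    torch.randn(8, 5), torch.randn(8, 5), torch.randint(0, 5, (8,))
)
```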
Synthetic data generation and self-supervised learning strategies: Given the scarcity of labeled samples and the high cost of manual annotation in aquaculture, synthetic data generation (e.g., GAN-based image augmentation and simulation) and self-supervised learning (e.g., contrastive learning and masked prediction) offer promising solutions to alleviate data dependency [171,172]. In tasks such as fish pose estimation and underwater behavior monitoring, the high cost of data collection and the complexity of annotation processes result in a severe shortage of high-quality samples for training deep learning models. To address this limitation, future research may focus on several strategies: first, incorporating transfer learning to leverage pretrained models from related domains, thereby enabling knowledge transfer and rapid adaptation; second, applying data augmentation techniques (such as image rotation, flipping, and illumination variation) to expand the training dataset and enhance model generalization; and third, employing generative approaches like GANs to synthesize realistic images, thereby increasing sample diversity and model robustness. Moreover, the further exploration of high-fidelity underwater image simulation engines and domain adaptation techniques is necessary to improve the transferability and practical applicability of synthetic data in real-world scenarios.
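A minimal example of the augmentation strategies mentioned above (rotation, flipping, and illumination variation) using torchvision is sketched below; the parameter ranges are illustrative only.

```python
import torchvision.transforms as T

# Augmentation pipeline expanding scarce labeled underwater imagery.
augment = T.Compose([
    T.RandomRotation(degrees=15),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.3),
    T.ToTensor(),
])
# Applied to a PIL image: tensor = augment(pil_image)
```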
Digital twin and virtual simulation platform development: Digital twin technologies create real-time mappings between physical and digital systems, allowing for real-time modeling, forecasting, and control within virtual environments [173]. By leveraging visual perception systems and multi-source sensor data, future research could develop virtual aquaculture training and testing platforms to support efficient algorithm validation, policy optimization, and remote decision-making, driving aquaculture toward more precise, controllable, and intelligent operations.

6. Conclusions

This paper presents a comprehensive review of the major applications of deep learning in sustainable aquaculture, encompassing fish detection and counting, growth prediction and health monitoring, intelligent feeding, water quality forecasting, and behavior analysis. Leveraging its strengths in feature extraction and nonlinear representation, deep learning has markedly enhanced the automation, accuracy, and responsiveness of aquaculture systems, progressively transforming traditional experience-based practices. However, several practical challenges remain unresolved, including data scarcity, class imbalance, overfitting, computational limitations, and limited cross-regional generalization. These issues are particularly pronounced in complex underwater environments, where image degradation, variable lighting, and fish occlusion frequently impair model performance.
To facilitate the green and low-carbon transformation of aquaculture, future research should emphasize multimodal data fusion, integration with edge computing and blockchain, lightweight model optimization, synthetic data generation, and the implementation of self-supervised learning frameworks. In addition, the development of digital twin platforms for virtual aquaculture simulation will offer promising solutions for intelligent monitoring, remote management, and predictive decision-making. These advancements will not only help overcome the limitations of current models but also establish an intelligent, scalable, and energy-efficient foundation for sustainable aquaculture. Overall, deep learning is catalyzing a paradigm shift in aquaculture, from empirical practice toward data-driven, sustainable intelligence.

Author Contributions

Conceptualization, P.H., W.Y. and R.-F.W.; methodology, X.L., P.H., W.Y. and R.-F.W.; formal analysis, A.-Q.W., K.-L.L. and Z.-Y.S.; investigation, A.-Q.W., K.-L.L., Z.-Y.S. and X.L.; resources, P.H., W.Y. and R.-F.W.; data curation, A.-Q.W. and K.-L.L.; writing—original draft preparation, A.-Q.W., K.-L.L. and Z.-Y.S.; writing—review and editing, P.H., W.Y. and R.-F.W.; visualization, A.-Q.W. and Z.-Y.S.; supervision, P.H. and R.-F.W.; project administration, R.-F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Godfray, H.C.J.; Beddington, J.R.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Pretty, J.; Robinson, S.; Thomas, S.M.; Toulmin, C. Food security: The challenge of feeding 9 billion people. Science 2010, 327, 812–818. [Google Scholar] [CrossRef] [PubMed]
  2. Valenti, W.C.; Barros, H.P.; Moraes-Valenti, P.; Bueno, G.W.; Cavalli, R.O. Aquaculture in Brazil: Past, present and future. Aquac. Rep. 2021, 19, 100611. [Google Scholar] [CrossRef]
  3. Naylor, R.L.; Hardy, R.W.; Buschmann, A.H.; Bush, S.R.; Cao, L.; Klinger, D.H.; Little, D.C.; Lubchenco, J.; Shumway, S.E.; Troell, M. A 20-year retrospective review of global aquaculture. Nature 2021, 591, 551–563. [Google Scholar] [CrossRef] [PubMed]
  4. Li, W.; Wu, H.; Zhu, N.; Jiang, Y.; Tan, J.; Guo, Y. Prediction of dissolved oxygen in a fishery pond based on gated recurrent unit (GRU). Inf. Process. Agric. 2021, 8, 185–193. [Google Scholar] [CrossRef]
  5. Yang, J.; Jia, L.; Guo, Z.; Shen, Y.; Li, X.; Mou, Z.; Yu, K.; Lin, J.C.W. Prediction and control of water quality in Recirculating Aquaculture System based on hybrid neural network. Eng. Appl. Artif. Intell. 2023, 121, 106002. [Google Scholar] [CrossRef]
  6. Haq, K.R.A.; Harigovindan, V. Water quality prediction for smart aquaculture using hybrid deep learning models. IEEE Access 2022, 10, 60078–60098. [Google Scholar]
  7. Yilmaz, S.; Yilmaz, E.; Dawood, M.A.; Ringø, E.; Ahmadifar, E.; Abdel-Latif, H.M. Probiotics, prebiotics, and synbiotics used to control vibriosis in fish: A review. Aquaculture 2022, 547, 737514. [Google Scholar] [CrossRef]
  8. Wang, H.; Zhang, S.; Zhao, S.; Wang, Q.; Li, D.; Zhao, R. Real-time detection and tracking of fish abnormal behavior based on improved YOLOV5 and SiamRPN++. Comput. Electron. Agric. 2022, 192, 106512. [Google Scholar] [CrossRef]
  9. Hu, W.C.; Chen, L.B.; Huang, B.K.; Lin, H.M. A computer vision-based intelligent fish feeding system using deep learning techniques for aquaculture. IEEE Sens. J. 2022, 22, 7185–7194. [Google Scholar] [CrossRef]
  10. Aung, T.; Abdul Razak, R.; Rahiman Bin Md Nor, A. Artificial intelligence methods used in various aquaculture applications: A systematic literature review. J. World Aquac. Soc. 2025, 56, e13107. [Google Scholar] [CrossRef]
  11. Serra-Toro, C.; Montoliu, R.; Traver, V.J.; Hurtado-Melgar, I.M.; Núnez-Redó, M.; Cascales, P. Assessing water quality by video monitoring fish swimming behavior. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 428–431. [Google Scholar]
  12. Beyan, C.; Fisher, R.B. A filtering mechanism for normal fish trajectories. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 2286–2289. [Google Scholar]
  13. Liu, C.; Wang, Z.; Li, Y.; Zhang, Z.; Li, J.; Xu, C.; Du, R.; Li, D.; Duan, Q. Research progress of computer vision technology in abnormal fish detection. Aquac. Eng. 2023, 103, 102350. [Google Scholar] [CrossRef]
  14. Ahmed, M.S.; Aurpa, T.T.; Azad, M.A.K. Fish disease detection using image based machine learning technique in aquaculture. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 5170–5182. [Google Scholar] [CrossRef]
  15. Yang, Z.Y.; Xia, W.K.; Chu, H.Q.; Su, W.H.; Wang, R.F.; Wang, H. A Comprehensive Review of Deep Learning Applications in Cotton Industry: From Field Monitoring to Smart Processing. Plants 2025, 14, 1481. [Google Scholar] [CrossRef] [PubMed]
  16. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef]
  17. Cui, K.; Zhu, R.; Wang, M.; Tang, W.; Larsen, G.D.; Pauca, V.P.; Alqahtani, S.; Yang, F.; Segurado, D.; Lutz, D.; et al. Detection and Geographic Localization of Natural Objects in the Wild: A Case Study on Palms. arXiv 2025, arXiv:2502.13023. [Google Scholar]
  18. Li, Y.; Wang, H.; Li, Z.; Wang, S.; Dev, S.; Zuo, G. DAANet: Dual Attention Aggregating Network for Salient Object Detection. In Proceedings of the 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO), Koh Samui, Thailand, 4–9 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7. [Google Scholar]
  19. Tunyasuvunakool, K.; Adler, J.; Wu, Z.; Green, T.; Zielinski, M.; Žídek, A.; Bridgland, A.; Cowie, A.; Meyer, C.; Laydon, A.; et al. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590–596. [Google Scholar] [CrossRef]
  20. Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. Vivit: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 20–25 June 2021; pp. 6836–6846. [Google Scholar]
  21. Wang, H.; Zhu, B.; Li, Y.; Gong, K.; Wen, Z.; Wang, S.; Dev, S. SYGNet: A SVD-YOLO based GhostNet for real-time driving scene parsing. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2701–2705. [Google Scholar]
  22. Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-net and its variants for medical image segmentation: A review of theory and applications. IEEE Access 2021, 9, 82031–82057. [Google Scholar] [CrossRef]
  23. Tang, W.; Cui, K.; Chan, R.H. Optimized hard exudate detection with supervised contrastive learning. In Proceedings of the 2024 IEEE International Symposium on Biomedical Imaging (ISBI), Athens, Greece, 27–30 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5. [Google Scholar]
  24. Wang, R.F.; Tu, Y.H.; Chen, Z.Q.; Zhao, C.T.; Su, W.H. A Lettpoint-Yolov11l Based Intelligent Robot for Precision Intra-Row Weeds Control in Lettuce. 2025. Available online: https://ssrn.com/abstract=5162748 (accessed on 20 May 2025).
  25. Zhao, C.T.; Wang, R.F.; Tu, Y.H.; Pang, X.X.; Su, W.H. Automatic lettuce weed detection and classification based on optimized convolutional neural networks for robotic weed control. Agronomy 2024, 14, 2838. [Google Scholar] [CrossRef]
  26. Wang, R.F.; Su, W.H. The application of deep learning in the whole potato production Chain: A Comprehensive review. Agriculture 2024, 14, 1225. [Google Scholar] [CrossRef]
  27. Joshi, A.; Pradhan, B.; Chakraborty, S.; Varatharajoo, R.; Gite, S.; Alamri, A. Deep-Transfer-Learning Strategies for Crop Yield Prediction Using Climate Records and Satellite Image Time-Series Data. Remote Sens. 2024, 16, 4804. [Google Scholar] [CrossRef]
  28. Razavi, M.; Mavaddati, S.; Koohi, H. ResNet deep models and transfer learning technique for classification and quality detection of rice cultivars. Expert Syst. Appl. 2024, 247, 123276. [Google Scholar] [CrossRef]
  29. Subeesh, A.; Kumar, S.P.; Chakraborty, S.K.; Upendar, K.; Chandel, N.S.; Jat, D.; Dubey, K.; Modi, R.U.; Khan, M.M. UAV imagery coupled deep learning approach for the development of an adaptive in-house web-based application for yield estimation in citrus orchard. Measurement 2024, 234, 114786. [Google Scholar] [CrossRef]
  30. Tsai, S.M.; Chuang, M.L.; Huang, P.S. Detection and counting of algae based on deep learning. In Proceedings of the 2022 IEEE International Conference on Consumer Electronics-Taiwan, Taipei, Taiwan, 6–8 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 597–598. [Google Scholar]
  31. Kaewta, C.; Pitakaso, R.; Khonjun, S.; Srichok, T.; Luesak, P.; Gonwirat, S.; Enkvetchakul, P.; Jutagate, A.; Jutagate, T. Application of AMIS-optimized vision transformer in identifying disease in Nile Tilapia. Comput. Electron. Agric. 2024, 227, 109676. [Google Scholar] [CrossRef]
  32. Hamzaoui, M.; Ould-Elhassen Aoueileyine, M.; Romdhani, L.; Bouallegue, R. An improved deep learning model for underwater species recognition in aquaculture. Fishes 2023, 8, 514. [Google Scholar] [CrossRef]
  33. Harigovindan, V.P.; Haq, K.P.R.A.; Bhide, A. Attention-driven LSTM and GRU deep learning techniques for precise water quality prediction in smart aquaculture. Aquac. Int. 2024, 32, 8455–8478. [Google Scholar]
  34. Sarker, I.H. Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput. Sci. 2021, 2, 1–20. [Google Scholar] [CrossRef]
  35. Zhao, S.; Zhang, S.; Liu, J.; Wang, H.; Zhu, J.; Li, D.; Zhao, R. Application of machine learning in intelligent fish aquaculture: A review. Aquaculture 2021, 540, 736724. [Google Scholar] [CrossRef]
  36. Liu, H.; Ma, X.; Yu, Y.; Wang, L.; Hao, L. Application of deep learning-based object detection techniques in fish aquaculture: A review. J. Mar. Sci. Eng. 2023, 11, 867. [Google Scholar] [CrossRef]
  37. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  38. Zhou, G.; Wang, R.F.; Cui, K. A Local Perspective-based Model for Overlapping Community Detection. arXiv 2025, arXiv:2503.21558. [Google Scholar]
  39. Zhou, G.; Wang, R.F. The Heterogeneous Network Community Detection Model Based on Self-Attention. Symmetry 2025, 17, 432. [Google Scholar] [CrossRef]
  40. Meng, S.; Shi, Z.; Li, G.; Peng, M.; Liu, L.; Zheng, H.; Zhou, C. A novel deep learning framework for landslide susceptibility assessment using improved deep belief networks with the intelligent optimization algorithm. Comput. Geotech. 2024, 167, 106106. [Google Scholar] [CrossRef]
  41. Wang, Z.; Xu, N.; Bao, X.; Wu, J.; Cui, X. Spatio-temporal deep learning model for accurate streamflow prediction with multi-source data fusion. Environ. Model. Softw. 2024, 178, 106091. [Google Scholar] [CrossRef]
  42. Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
  43. Qin, Y.M.; Tu, Y.H.; Li, T.; Ni, Y.; Wang, R.F.; Wang, H. Deep Learning for Sustainable Agriculture: A Systematic Review on Applications in Lettuce Cultivation. Sustainability 2025, 17, 3190. [Google Scholar] [CrossRef]
  44. Li, Z.; Sun, C.; Wang, H.; Wang, R.F. Hybrid Optimization of Phase Masks: Integrating Non-Iterative Methods with Simulated Annealing and Validation via Tomographic Measurements. Symmetry 2025, 17, 530. [Google Scholar] [CrossRef]
  45. Cui, K.; Tang, W.; Zhu, R.; Wang, M.; Larsen, G.D.; Pauca, V.P.; Alqahtani, S.; Yang, F.; Segurado, D.; Fine, P.; et al. Real-time localization and bimodal point pattern analysis of palms using uav imagery. arXiv 2024, arXiv:2410.11124. [Google Scholar]
  46. Zhang, W.; Ma, M.; Jiang, Y.; Lian, R.; Wu, Z.; Cui, K.; Ma, X. Center-guided Classifier for Semantic Segmentation of Remote Sensing Images. arXiv 2025, arXiv:2503.16963. [Google Scholar]
  47. Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 1980, 36, 193–202. [Google Scholar] [CrossRef]
  48. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
  49. Wang, P.; Fan, E.; Wang, P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit. Lett. 2021, 141, 61–67. [Google Scholar]
  50. Cui, K.; Shao, Z.; Larsen, G.; Pauca, V.; Alqahtani, S.; Segurado, D.; Pinheiro, J.; Wang, M.; Lutz, D.; Plemmons, R.; et al. PalmProbNet: A Probabilistic Approach to Understanding Palm Distributions in Ecuadorian Tropical Forest via Transfer Learning. In Proceedings of the 2024 ACM Southeast Conference, Marietta, GA, USA, 18–20 April 2024; pp. 272–277. [Google Scholar]
  51. Li, J.; Xu, W.; Deng, L.; Xiao, Y.; Han, Z.; Zheng, H. Deep learning for visual recognition and detection of aquatic animals: A review. Rev. Aquac. 2023, 15, 409–433. [Google Scholar] [CrossRef]
  52. Feng, J.; Yang, L.T.; Ren, B.; Zou, D.; Dong, M.; Zhang, S. Tensor recurrent neural network with differential privacy. IEEE Trans. Comput. 2023, 73, 683–693. [Google Scholar] [CrossRef]
  53. Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertainty Fuzziness Knowl.-Based Syst. 1998, 6, 107–116. [Google Scholar]
  54. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer Nature: Dordrecht, The Netherlands, 2012; pp. 37–45. [Google Scholar]
  55. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [PubMed]
  56. Camalan, S.; Cui, K.; Pauca, V.P.; Alqahtani, S.; Silman, M.; Chan, R.; Plemmons, R.J.; Dethier, E.N.; Fernandez, L.E.; Lutz, D.A. Change detection of amazonian alluvial gold mining using deep learning and sentinel-2 imagery. Remote Sens. 2022, 14, 1746. [Google Scholar] [CrossRef]
  57. Ma, M.; Mao, Z. Deep-convolution-based LSTM network for remaining useful life prediction. IEEE Trans. Ind. Inform. 2020, 17, 1658–1667. [Google Scholar] [CrossRef]
  58. Li, W.; Wei, Y.; An, D.; Jiao, Y.; Wei, Q. LSTM-TCN: Dissolved oxygen prediction in aquaculture, based on combined model of long short-term memory network and temporal convolutional network. Environ. Sci. Pollut. Res. 2022, 29, 39545–39556. [Google Scholar] [CrossRef]
  59. Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65. [Google Scholar]
  60. Barth, R.; Hemming, J.; Van Henten, E.J. Optimising realism of synthetic images using cycle generative adversarial networks for improved part segmentation. Comput. Electron. Agric. 2020, 173, 105378. [Google Scholar] [CrossRef]
  61. Lu, Y.; Chen, D.; Olaniyi, E.; Huang, Y. Generative adversarial networks (GANs) for image augmentation in agriculture: A systematic review. Comput. Electron. Agric. 2022, 200, 107208. [Google Scholar] [CrossRef]
  62. Bakht, A.B.; Jia, Z.; Din, M.U.; Akram, W.; Saoud, L.S.; Seneviratne, L.; Lin, D.; He, S.; Hussain, I. Mula-gan: Multi-level attention gan for enhanced underwater visibility. Ecol. Inform. 2024, 81, 102631. [Google Scholar] [CrossRef]
  63. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  64. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef]
  65. Wang, Z.; Wang, R.; Wang, M.; Lai, T.; Zhang, M. Self-supervised transformer-based pre-training method with General Plant Infection dataset. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Urumqi, China, 18–20 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 189–202. [Google Scholar]
  66. Gong, B.; Dai, K.; Shao, J.; Jing, L.; Chen, Y. Fish-TViT: A novel fish species classification method in multi water areas based on transfer learning and vision transformer. Heliyon 2023, 9, e16761. [Google Scholar] [CrossRef] [PubMed]
  67. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
  68. Zhu, X.; Ruan, Q.; Qian, S.; Zhang, M. A hybrid model based on transformer and Mamba for enhanced sequence modeling. Sci. Rep. 2025, 15, 11428. [Google Scholar] [CrossRef]
  69. Yang, C.; Xiang, J.; Li, X.; Xie, Y. FishDet-YOLO: Enhanced Underwater Fish Detection with Richer Gradient Flow and Long-Range Dependency Capture through Mamba-C2f. Electronics 2024, 13, 3780. [Google Scholar] [CrossRef]
  70. Yao, M.; Huo, Y.; Tian, Q.; Zhao, J.; Liu, X.; Wang, R.; Xue, L.; Wang, H. FMRFT: Fusion mamba and DETR for query time sequence intersection fish tracking. arXiv 2024, arXiv:2409.01148. [Google Scholar]
  71. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  72. Prasetyo, E.; Purbaningtyas, R.; Adityo, R.D.; Suciati, N.; Fatichah, C. Combining MobileNetV1 and Depthwise Separable convolution bottleneck with Expansion for classifying the freshness of fish eyes. Inf. Process. Agric. 2022, 9, 485–496. [Google Scholar] [CrossRef]
  73. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 1–74. [Google Scholar] [CrossRef] [PubMed]
  74. Banan, A.; Nasiri, A.; Taheri-Garavand, A. Deep learning-based appearance features extraction for automated carp species identification. Aquac. Eng. 2020, 89, 102053. [Google Scholar] [CrossRef]
  75. Yang, X.; Zhang, S.; Liu, J.; Gao, Q.; Dong, S.; Zhou, C. Deep learning for smart fish farming: Applications, opportunities and challenges. Rev. Aquac. 2021, 13, 66–90. [Google Scholar] [CrossRef]
  76. Hu, Z.; Zhang, Y.; Zhao, Y.; Xie, M.; Zhong, J.; Tu, Z.; Liu, J. A water quality prediction method based on the deep LSTM network considering correlation in smart mariculture. Sensors 2019, 19, 1420. [Google Scholar] [CrossRef]
  77. Wang, H.; Zhang, S.; Zhao, S.; Lu, J.; Wang, Y.; Li, D.; Zhao, R. Fast detection of cannibalism behavior of juvenile fish based on deep learning. Comput. Electron. Agric. 2022, 198, 107033. [Google Scholar] [CrossRef]
  78. Hu, X.; Liu, Y.; Zhao, Z.; Liu, J.; Yang, X.; Sun, C.; Chen, S.; Li, B.; Zhou, C. Real-time detection of uneaten feed pellets in underwater images for aquaculture using an improved YOLO-V4 network. Comput. Electron. Agric. 2021, 185, 106135. [Google Scholar] [CrossRef]
  79. Sung, M.; Yu, S.C.; Girdhar, Y. Vision based real-time fish detection using convolutional neural network. In Proceedings of the OCEANS 2017-Aberdeen, Aberdeen, UK, 19–22 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar]
  80. Li, D.; Du, L. Recent advances of deep learning algorithms for aquacultural machine vision systems with emphasis on fish. Artif. Intell. Rev. 2022, 55, 4077–4116. [Google Scholar] [CrossRef]
  81. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  82. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  83. Li, X.; Shang, M.; Hao, J.; Yang, Z. Accelerating fish detection and recognition by sharing CNNs with objectness learning. In Proceedings of the OCEANS 2016-Shanghai, Shanghai, China, 10–13 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5. [Google Scholar]
  84. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  85. Tong, J.; Wang, W.; Xue, M.; Zhu, Z.; Han, J.; Tian, S. Automatic single fish detection with a commercial echosounder using YOLO v5 and its application for echosounder calibration. Front. Mar. Sci. 2023, 10, 1162064. [Google Scholar] [CrossRef]
  86. Ouis, M.Y.; Akhloufi, M. YOLO-based fish detection in underwater environments. Environ. Sci. Proc. 2023, 29, 44. [Google Scholar]
  87. Yu, X.; Wang, Y.; An, D.; Wei, Y. Counting method for cultured fishes based on multi-modules and attention mechanism. Aquac. Eng. 2022, 96, 102215. [Google Scholar] [CrossRef]
  88. Zhao, Y.; Li, W.; Li, Y.; Qi, Y.; Li, Z.; Yue, J. LFCNet: A lightweight fish counting model based on density map regression. Comput. Electron. Agric. 2022, 203, 107496. [Google Scholar] [CrossRef]
  89. Cai, K.; Miao, X.; Wang, W.; Pang, H.; Liu, Y.; Song, J. A modified YOLOv3 model for fish detection based on MobileNetv1 as backbone. Aquac. Eng. 2020, 91, 102117. [Google Scholar] [CrossRef]
  90. Ben Tamou, A.; Benzinou, A.; Nasreddine, K. Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors. Appl. Intell. 2021, 51, 5809–5821. [Google Scholar] [CrossRef]
  91. Patro, K.S.K.; Yadav, V.K.; Bharti, V.; Sharma, A.; Sharma, A. Fish detection in underwater environments using deep learning. Natl. Acad. Sci. Lett. 2023, 46, 407–412. [Google Scholar] [CrossRef]
  92. Lumauag, R.; Nava, M. Fish tracking and counting using image processing. In Proceedings of the 2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Baguio City, Philippines, 29 November–2 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–4. [Google Scholar]
  93. Álvarez-Ellacuría, A.; Palmer, M.; Catalán, I.A.; Lisani, J.L. Image-based, unsupervised estimation of fish size from commercial landings using deep learning. ICES J. Mar. Sci. 2020, 77, 1330–1339. [Google Scholar] [CrossRef]
  94. Ma, C.; Huang, J.B.; Yang, X.; Yang, M.H. Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3074–3082. [Google Scholar]
  95. Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Learning spatially regularized correlation filters for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4310–4318. [Google Scholar]
  96. Lai, Y.C.; Huang, R.J.; Kuo, Y.P.; Tsao, C.Y.; Wang, J.H.; Chang, C.C. Underwater target tracking via 3D convolutional networks. In Proceedings of the 2019 IEEE 6th International Conference on Industrial Engineering and Applications (ICIEA), Tokyo, Japan, 12–15 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 485–490. [Google Scholar]
  97. Sun, M.; Li, W.; Jiao, Z.; Zhao, X. A multi-target tracking platform for zebrafish based on deep neural network. In Proceedings of the 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Suzhou, China, 29 July–2 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 637–642. [Google Scholar]
  98. Liu, T.; He, S.; Liu, H.; Gu, Y.; Li, P. A robust underwater multiclass fish-school tracking algorithm. Remote Sens. 2022, 14, 4106. [Google Scholar] [CrossRef]
  99. Hu, J.; Zhao, D.; Zhang, Y.; Zhou, C.; Chen, W. Real-time nondestructive fish behavior detecting in mixed polyculture system using deep-learning and low-cost devices. Expert Syst. Appl. 2021, 178, 115051. [Google Scholar] [CrossRef]
  100. Wang, G.; Muhammad, A.; Liu, C.; Du, L.; Li, D. Automatic recognition of fish behavior with a fusion of RGB and optical flow data based on deep learning. Animals 2021, 11, 2774. [Google Scholar] [CrossRef]
  101. Han, F.; Zhu, J.; Liu, B.; Zhang, B.; Xie, F. Fish shoals behavior detection based on convolutional neural network and spatiotemporal information. IEEE Access 2020, 8, 126907–126926. [Google Scholar] [CrossRef]
  102. Sun, X.; Wang, Y. Growth models in aquaculture for hybrid and natural groupers based on early development stage. Aquaculture 2024, 578, 740026. [Google Scholar] [CrossRef]
  103. Endo, H.; Yonemori, Y.; Hibi, K.; Ren, H.; Hayashi, T.; Tsugawa, W.; Sode, K. Wireless enzyme sensor system for real-time monitoring of blood glucose levels in fish. Biosens. Bioelectron. 2009, 24, 1417–1423. [Google Scholar] [CrossRef] [PubMed]
  104. Li, X.; Hao, Y.; Akhter, M.; Li, D. A novel automatic detection method for abnormal behavior of single fish using image fusion. Comput. Electron. Agric. 2022, 203, 107435. [Google Scholar] [CrossRef]
  105. Yang, Y.; Xue, B.; Jesson, L.; Wylie, M.; Zhang, M.; Wellenreuther, M. Deep convolutional neural networks for fish weight prediction from images. In Proceedings of the 2021 36th International Conference on Image and Vision Computing New Zealand (IVCNZ), Tauranga, New Zealand, 9–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar]
  106. Yoshida, T.; Suzuki, K.; Kogo, K. Estimating body weight of caged sea cucumbers (Apostichopus japonicus) using an underwater time-lapse camera and image analysis by semantic segmentation. Smart Agric. Technol. 2024, 8, 100520. [Google Scholar] [CrossRef]
  107. Chirdchoo, N.; Mukviboonchai, S.; Cheunta, W. A deep learning model for estimating body weight of live pacific white shrimp in a clay pond shrimp aquaculture. Intell. Syst. Appl. 2024, 24, 200434. [Google Scholar] [CrossRef]
  108. Li, D.; Li, X.; Wang, Q.; Hao, Y. Advanced techniques for the intelligent diagnosis of fish diseases: A review. Animals 2022, 12, 2938. [Google Scholar] [CrossRef]
  109. Wagner, W.P. Trends in expert system development: A longitudinal content analysis of over thirty years of expert system case studies. Expert Syst. Appl. 2017, 76, 85–96. [Google Scholar] [CrossRef]
  110. Hu, J.; Li, D.; Duan, Q.; Han, Y.; Chen, G.; Si, X. Fish species classification by color, texture and multi-class support vector machine using computer vision. Comput. Electron. Agric. 2012, 88, 133–140. [Google Scholar] [CrossRef]
  111. Raj, A.S.; Senthilkumar, S.; Radha, R.; Muthaiyan, R. Enhanced recurrent capsule network with hybrid optimization model for shrimp disease detection. Sci. Rep. 2025, 15, 10400. [Google Scholar] [CrossRef]
  112. Wang, D.; Wu, M.; Zhu, X.; Qin, Q.; Wang, S.; Ye, H.; Guo, K.; Wu, C.; Shi, Y. Real-time detection and identification of fish skin health in the underwater environment based on improved YOLOv10 model. Aquac. Rep. 2025, 42, 102723. [Google Scholar] [CrossRef]
  113. Chai, X.; Sun, T.; Li, Z.; Zhang, Y.; Sun, Q.; Zhang, N.; Qiu, J.; Chai, X. Cross-Shaped Heat Tensor Network for Morphometric Analysis Using Zebrafish Larvae Feature Keypoints. Sensors 2024, 25, 132. [Google Scholar] [CrossRef] [PubMed]
  114. Huang, Z.; Zhao, H.; Cui, Z.; Wang, L.; Li, H.; Qu, K.; Cui, H. Early warning system for nocardiosis in largemouth bass (Micropterus salmoides) based on multimodal information fusion. Comput. Electron. Agric. 2024, 226, 109393. [Google Scholar] [CrossRef]
  115. Chen, L.; Yang, X.; Sun, C.; Wang, Y.; Xu, D.; Zhou, C. Feed intake prediction model for group fish using the MEA-BP neural network in intensive aquaculture. Inf. Process. Agric. 2020, 7, 261–271. [Google Scholar] [CrossRef]
  116. Zhang, L.; Li, B.; Sun, X.; Hong, Q.; Duan, Q. Intelligent fish feeding based on machine vision: A review. Biosyst. Eng. 2023, 231, 133–164. [Google Scholar] [CrossRef]
  117. Li, D.; Wang, Z.; Wu, S.; Miao, Z.; Du, L.; Duan, Y. Automatic recognition methods of fish feeding behavior in aquaculture: A review. Aquaculture 2020, 528, 735508. [Google Scholar] [CrossRef]
  118. Yao, M.; Huo, Y.; Ran, Y.; Tian, Q.; Wang, R.; Wang, H. Neural radiance field-based visual rendering: A comprehensive review. arXiv 2024, arXiv:2404.00714. [Google Scholar]
  119. Atoum, Y.; Srivastava, S.; Liu, X. Automatic feeding control for dense aquaculture fish tanks. IEEE Signal Process. Lett. 2014, 22, 1089–1093. [Google Scholar] [CrossRef]
  120. Liu, T.; Zhang, B.; Zheng, Q.; Cai, C.; Gao, X.; Xie, C.; Wu, Y.; Gul, H.S.; Liu, S.; Xu, L. A method for fusing attention mechanism-based ResNet and improved ConvNeXt for analyzing fish feeding behavior. Aquac. Int. 2025, 33, 193. [Google Scholar] [CrossRef]
  121. Zhang, Y.; Xu, C.; Du, R.; Kong, Q.; Li, D.; Liu, C. MSIF-MobileNetV3: An improved MobileNetV3 based on multi-scale information fusion for fish feeding behavior analysis. Aquac. Eng. 2023, 102, 102338. [Google Scholar] [CrossRef]
  122. Zhang, L.; Zheng, Y.; Liu, Z.; Zhu, Z.; Wu, Y.; Pan, L. A method for detecting feeding fish in ponds based on FFishNet-YOLOv8. Comput. Electron. Agric. 2025, 230, 109873. [Google Scholar] [CrossRef]
  123. Smith, D.V.; Tabrett, S. The use of passive acoustics to measure feed consumption by Penaeus monodon (giant tiger prawn) in cultured systems. Aquac. Eng. 2013, 57, 38–47. [Google Scholar] [CrossRef]
  124. Cui, M.; Liu, X.; Zhao, J.; Sun, J.; Lian, G.; Chen, T.; Plumbley, M.D.; Li, D.; Wang, W. Fish feeding intensity assessment in aquaculture: A new audio dataset AFFIA3K and a deep learning algorithm. In Proceedings of the 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP), Xi’an, China, 22–25 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  125. Huang, X.; Ma, X.; Jin, J.; Fan, S.; Xie, Y.; Cai, W. Assessment of feeding intensity of Oreochromis niloticus (tilapia) based on improved VGG16 and voice spectrograms. Aquac. Int. 2025, 33, 246. [Google Scholar] [CrossRef]
  126. Gu, X.; Zhao, S.; Duan, Y.; Meng, Y.; Li, D.; Zhao, R. MMFINet: A multimodal fusion network for accurate fish feeding intensity assessment in recirculating aquaculture systems. Comput. Electron. Agric. 2025, 232, 110138. [Google Scholar] [CrossRef]
  127. Jasmin, S.A.; Ramesh, P.; Tanveer, M. An intelligent framework for prediction and forecasting of dissolved oxygen level and biofloc amount in a shrimp culture system using machine learning techniques. Expert Syst. Appl. 2022, 199, 117160. [Google Scholar] [CrossRef]
  128. Ma, Q.; Li, S.; Qi, H.; Yang, X.; Liu, M. Rapid Prediction and Inversion of Pond Aquaculture Water Quality Based on Hyperspectral Imaging by Unmanned Aerial Vehicles. Water 2025, 17, 517. [Google Scholar] [CrossRef]
  129. Li, T.; Lu, J.; Wu, J.; Zhang, Z.; Chen, L. Predicting aquaculture water quality using machine learning approaches. Water 2022, 14, 2836. [Google Scholar] [CrossRef]
  130. Iniyan Arasu, M.; Subha Rani, S.; Thiyagarajan, K.; Ahilan, A. AQUASENSE: Aquaculture water quality monitoring framework using autonomous sensors. Aquac. Int. 2024, 32, 9119–9135. [Google Scholar]
  131. Ma, Y.; Fang, Q.; Xia, S.; Zhou, Y. Prediction of the Dissolved Oxygen Content in Aquaculture Based on the CNN-GRU Hybrid Neural Network. Water 2024, 16, 3547. [Google Scholar] [CrossRef]
  132. Song, L.; Song, Y.; Tian, Y.; Quan, J. DECSF-Net: A multi-variable prediction method for pond aquaculture water quality based on cross-source feedback fusion. Aquac. Int. 2025, 33, 1–25. [Google Scholar] [CrossRef]
  133. Arepalli, P.G.; Naik, K.J. A deep learning-enabled IoT framework for early hypoxia detection in aqua water using light weight spatially shared attention-LSTM network. J. Supercomput. 2024, 80, 2718–2747. [Google Scholar] [CrossRef]
  134. Arepalli, P.G.; Naik, K.J. An IoT based smart water quality assessment framework for aqua-ponds management using Dilated Spatial-temporal Convolution Neural Network (DSTCNN). Aquac. Eng. 2024, 104, 102373. [Google Scholar] [CrossRef]
  135. Yang, L.; Liu, Y.; Yu, H.; Fang, X.; Song, L.; Li, D.; Chen, Y. Computer vision models in intelligent aquaculture with emphasis on fish detection and behavior analysis: A review. Arch. Comput. Methods Eng. 2021, 28, 2785–2816. [Google Scholar] [CrossRef]
  136. Beyan, C.; Fisher, R.B. Detecting abnormal fish trajectories using clustered and labeled data. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, VIC, Australia, 15–18 September 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1476–1480. [Google Scholar]
  137. Du, L.; Lu, Z.; Li, D. Broodstock breeding behaviour recognition based on Resnet50-LSTM with CBAM attention mechanism. Comput. Electron. Agric. 2022, 202, 107404. [Google Scholar] [CrossRef]
  138. Zhao, J.; Bao, W.; Zhang, F.; Zhu, S.; Liu, Y.; Lu, H.; Shen, M.; Ye, Z. Modified motion influence map and recurrent neural network-based monitoring of the local unusual behaviors for fish school in intensive aquaculture. Aquaculture 2018, 493, 165–175. [Google Scholar] [CrossRef]
  139. Xu, W.; Wang, P.; Jiang, L.; Xuan, K.; Li, D.; Li, J. Intelligent recognition and behavior tracking of sea cucumber infected with Vibrio alginolyticus based on machine vision. Aquac. Eng. 2023, 103, 102368. [Google Scholar] [CrossRef]
  140. Li, D.; Wang, G.; Du, L.; Zheng, Y.; Wang, Z. Recent advances in intelligent recognition methods for fish stress behavior. Aquac. Eng. 2022, 96, 102222. [Google Scholar] [CrossRef]
  141. Mei, S.; Chen, Y.; Qin, H.; Yu, H.; Li, D.; Sun, B.; Yang, L.; Liu, Y. A Method Based on Knowledge Distillation for Fish School Stress State Recognition in Intensive Aquaculture. CMES-Comput. Model. Eng. Sci. 2022, 131, 1315–1335. [Google Scholar] [CrossRef]
  142. Fisher, R.B.; Chen-Burger, Y.H.; Giordano, D.; Hardman, L.; Lin, F.P. Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data; Springer: Berlin/Heidelberg, Germany, 2016; Volume 104. [Google Scholar]
  143. Spampinato, C.; Palazzo, S.; Boom, B.; Fisher, R.B. Overview of the LifeCLEF 2014 Fish Task. In Proceedings of the CLEF (Working Notes), Sheffield, UK, 15–18 September 2014; pp. 616–624. [Google Scholar]
  144. Choi, S. Fish Identification in Underwater Video with Deep Convolutional Neural Network: SNUMedinfo at LifeCLEF Fish task 2015. In Proceedings of the CLEF (Working Notes), Toulouse, France, 8–11 September 2015; pp. 1–10. [Google Scholar]
  145. Joly, A.; Goëau, H.; Glotin, H.; Spampinato, C.; Bonnet, P.; Vellinga, W.P.; Champ, J.; Planqué, R.; Palazzo, S.; Müller, H. LifeCLEF 2016: Multimedia life species identification challenges. In Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association (CLEF 2016), Évora, Portugal, 5–8 September 2016; Proceedings 7. Springer: Berlin/Heidelberg, Germany, 2016; pp. 286–310. [Google Scholar]
  146. Ali-Gombe, A.; Elyan, E.; Jayne, C. Fish classification in context of noisy images. In Proceedings of the Engineering Applications of Neural Networks: 18th International Conference (EANN 2017), Athens, Greece, 25–27 August 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 216–226. [Google Scholar]
  147. Sun, M.; Yang, X.; Xie, Y. Deep learning in aquaculture: A review. J. Comput 2020, 31, 294–319. [Google Scholar]
  148. Li, C.; Guo, J.; Guo, C. Emerging from water: Underwater image color correction based on weakly supervised color transfer. IEEE Signal Process. Lett. 2018, 25, 323–327. [Google Scholar] [CrossRef]
  149. Bianco, G.; Muzzupappa, M.; Bruno, F.; Garcia, R.; Neumann, L. A new color correction method for underwater imaging. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 40, 25–32. [Google Scholar] [CrossRef]
  150. Gomes-Pereira, J.N.; Auger, V.; Beisiegel, K.; Benjamin, R.; Bergmann, M.; Bowden, D.; Buhl-Mortensen, P.; De Leo, F.C.; Dionísio, G.; Durden, J.M.; et al. Current and future trends in marine image annotation software. Prog. Oceanogr. 2016, 149, 106–120. [Google Scholar] [CrossRef]
  151. Duarte, A.; Codevilla, F.; Gaya, J.D.O.; Botelho, S.S. A dataset to evaluate underwater image restoration methods. In Proceedings of the OCEANS 2016-Shanghai, Shanghai, China, 10–13 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar]
  152. Wang, Z.; Chen, H.; Qin, H.; Chen, Q. Self-supervised pre-training joint framework: Assisting lightweight detection network for underwater object detection. J. Mar. Sci. Eng. 2023, 11, 604. [Google Scholar] [CrossRef]
  153. Jesus, A.; Zito, C.; Tortorici, C.; Roura, E.; De Masi, G. Underwater object classification and detection: First results and open challenges. In Proceedings of the OCEANS 2022-Chennai, Chennai, India, 21–24 February 2022; pp. 1–6. [Google Scholar]
  154. Li, Z.; Zhang, S.; Cao, P.; Zhang, J.; An, Z. Research on fine-tuning strategies for text classification in the aquaculture domain by combining deep learning and large language models. Aquac. Int. 2025, 33, 295. [Google Scholar] [CrossRef]
  155. Li, W.; Du, Z.; Xu, X.; Bai, Z.; Han, J.; Cui, M.; Li, D. A review of aquaculture: From single modality analysis to multimodality fusion. Comput. Electron. Agric. 2024, 226, 109367. [Google Scholar] [CrossRef]
  156. Cui, S.; Zhou, Y.; Wang, Y.; Zhai, L. Fish detection using deep learning. Appl. Comput. Intell. Soft Comput. 2020, 2020, 3738108. [Google Scholar] [CrossRef]
  157. Yassir, A.; Andaloussi, S.J.; Ouchetto, O.; Mamza, K.; Serghini, M. Acoustic fish species identification using deep learning and machine learning algorithms: A systematic review. Fish. Res. 2023, 266, 106790. [Google Scholar] [CrossRef]
  158. Yu-Hao, T.; Rui-Feng, W.; Wen-Hao, S. Active Disturbance Rejection Control—New Trends in Agricultural Cybernetics in the Future: A Comprehensive Review. Machines 2025, 13, 111. [Google Scholar] [CrossRef]
  159. Jalal, A.; Salman, A.; Mian, A.; Shortis, M.; Shafait, F. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 2020, 57, 101088. [Google Scholar] [CrossRef]
  160. Deep, B.V.; Dash, R. Underwater fish species recognition using deep learning techniques. In Proceedings of the 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 7–8 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 665–669. [Google Scholar]
  161. Salman, A.; Siddiqui, S.A.; Shafait, F.; Mian, A.; Shortis, M.R.; Khurshid, K.; Ulges, A.; Schwanecke, U. Automatic fish detection in underwater videos by a deep neural network-based hybrid motion learning system. ICES J. Mar. Sci. 2020, 77, 1295–1307. [Google Scholar] [CrossRef]
  162. Qian, Q.; Zhang, B.; Li, C.; Mao, Y.; Qin, Y. Federated transfer learning for machinery fault diagnosis: A comprehensive review of technique and application. Mech. Syst. Signal Process. 2025, 223, 111837. [Google Scholar] [CrossRef]
  163. Wan, S.; Zhao, K.; Lu, Z.; Li, J.; Lu, T.; Wang, H. A modularized IoT monitoring system with edge-computing for aquaponics. Sensors 2022, 22, 9260. [Google Scholar] [CrossRef] [PubMed]
  164. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  165. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  166. Kang, S.; Hu, Z.; Liu, L.; Zhang, K.; Cao, Z. Object Detection YOLO Algorithms and Their Industrial Applications: Overview and Comparative Analysis. Electronics 2025, 14, 1104. [Google Scholar] [CrossRef]
  167. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
  168. Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
  169. Mehta, S.; Rastegari, M. Separable self-attention for mobile vision transformers. arXiv 2022, arXiv:2206.02680. [Google Scholar]
  170. Wadekar, S.N.; Chaurasia, A. Mobilevitv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features. arXiv 2022, arXiv:2209.15159. [Google Scholar]
  171. Liu, D.; Li, Z.; Wu, Z.; Li, C. Digital Twin/MARS-CycleGAN: Enhancing Sim-to-Real Crop/Row Detection for MARS Phenotyping Robot Using Synthetic Images. J. Field Robot. 2024, 42, 625–640. [Google Scholar] [CrossRef]
  172. Li, Z.; Xu, R.; Li, C.; Fu, L. Visual navigation and crop mapping of a phenotyping robot MARS-PhenoBot in simulation. Smart Agric. Technol. 2025, 11, 100910. [Google Scholar] [CrossRef]
  173. Ding, H.; Zhao, L.; Yan, J.; Feng, H.Y. Implementation of digital twin in actual production: Intelligent assembly paradigm for large-scale industrial equipment. Machines 2023, 11, 1031. [Google Scholar] [CrossRef]
Figure 1. Total aquaculture production value in the world, in Asia, Europe, and the Americas, from 2013 to 2023 (Source: FAO database, last accessed on 5 May 2025).
Figure 2. Number of journal articles retrieved from the Web of Science.
Figure 3. The applications of deep learning in various processes of sustainable aquaculture.
Figure 4. Examples of deep learning-based fish counting applications: (a) fish counting based on density maps [87]; (b) fish counting based on object detection [89].
Figure 5. Our deep learning-based fish analysis research: (a) detection results using the YOLOv8n-based fish counting model; (b) network architecture of the improved Mamba-based fish tracking framework [70].
Figure 6. Workflow of deep learning-based fish tracking.
Figure 7. Samples of deep learning-based applications in multiple fish tracking and behavior analysis: (a) effect of the FSTA-based underwater fish school tracking method [98]; (b) fish behavior detection results of the YOLOv3-Lite model and other models [99].
Figure 8. (a) Fish disease detection results of the DCW-YOLO network [112]. (b) Visualization of keypoint detection in zebrafish larvae, from top to bottom: ground truth, SimpleBaseline, SENet, HRNet, ConvNeXt, and the proposed CSHT-Net. Key anatomical landmarks are marked in blue, and hard-to-locate points are highlighted in red in the ground truth [113].
Figure 9. Detection results of the model based on Joint-IoU versus IoU on test samples. Blue boxes indicate ground truth; red boxes denote predicted detections [122].
Figure 10. The structure of the IPSO-CNN-GRU-TAM model [131].
Figure 11. Comparisons between predicted and actual values for various water quality indicators [132].
Figure 12. Overall schematic of the BCS-YOLOv5-based fish abnormal behavior detection network [104].
Figure 13. Behavior trajectories of sea cucumbers [139]: (a) trajectory under normal conditions; (b) trajectory under 6 CFU/mL Vibrio alginolyticus exposure; (c) trajectory under 9 CFU/mL exposure.
Table 1. Advantages and application scenarios of common deep learning models.

Models | Advantages | Typical Application Scenarios
CNN | Strong local feature extraction; excels in image processing | Object detection, image classification, image segmentation
RNN | Strong contextual understanding; effective for sequential data | Time series forecasting, speech recognition
LSTM | Mitigates gradient vanishing/exploding; captures long-term dependencies | Machine translation, complex sequence modeling
GAN | Strong in unsupervised learning; capable of generating high-quality data | Image generation, data augmentation, style transfer
Transformer | Capable of modeling global dependencies; faster training speed | Natural language processing, video understanding
Mamba | Excels in long-sequence modeling; high inference efficiency | Long text processing, time series modeling
MobileNet | Lightweight; low computational cost, suitable for mobile deployment | Embedded vision detection, real-time object detection
Table 2. Representative public datasets for fish image and video analysis.

Dataset | Content | URL
Fish4Knowledge | 700,000 videos, over 100 h | https://homepages.inf.ed.ac.uk/rbf/Fish4Knowledge/resources.htm (accessed on 6 May 2025)
LifeCLEF | Thousands of high-resolution static fish images | https://www.imageclef.org (accessed on 6 May 2025)
NCFM | 3777 fish images | https://www.kaggle.com/c/the-nature-conservancy-fisheries-monitoring (accessed on 6 May 2025)
FRGT | 10,000 high-resolution underwater images | https://homepages.inf.ed.ac.uk/rbf/Fish4Knowledge/GROUNDTRUTH/RECOG (accessed on 6 May 2025)