1. Introduction
In recent years, advances in aquaculture technology have steadily increased aquaculture output, giving farmed aquatic products a growing market share. In aquaculture, feed costs account for more than 60% of the total production cost [1,2]. Precision feeding is therefore important not only for reducing costs and improving production efficiency, but also for minimizing feed waste, maintaining water quality and promoting sustainable aquaculture. At present, feeding practices mainly rely on manual, experience-based feeding or on machine feeding with fixed timing and quantities. These approaches cannot dynamically adjust feeding strategies according to fish feeding intensity, which easily leads to underfeeding or overfeeding [3]. When the physiological condition, nutritional requirements and environmental factors of fish change, their feeding demands vary accordingly [4]. A decrease in feeding demand is typically accompanied by a corresponding reduction in feed intake. Consequently, feeding intensity can reflect fish appetite and satiety status, and its accurate assessment is crucial for dynamically adjusting feeding strategies in precision aquaculture systems.
Computer vision has become an important research tool in fisheries production because it is non-invasive, cost-effective and highly efficient [5,6]. Computer vision-based methods for assessing fish feeding intensity generally involve image pre-processing and segmentation to obtain fish targets, followed by feature extraction and the construction of a feeding intensity assessment model. At present, these methods can be categorized into machine learning-based and deep learning-based approaches according to the image features extracted.
Machine learning-based methods typically adopt a paradigm of feature engineering combined with classical classifiers. First, a set of handcrafted features is extracted from fish audio or visual streams using signal processing or statistical methods. Subsequently, machine learning models such as Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT) are employed to classify and discriminate feeding intensity levels. Zhang et al. [7] extracted texture features, including inverse difference moment and correlation, from fish school images using mean background modeling and gray-level co-occurrence matrices. A Back-Propagation (BP) neural network was then used to construct a fish feeding intensity recognition model. Chen et al. [8] plotted fish swimming trajectories based on the centroid of fish schools and extracted color, shape, and texture features from the images. The eXtreme Gradient Boosting (XGBoost) algorithm was applied to select feeding evaluation factors, and optimal weights were determined using a weighted fusion strategy. On this basis, a fish feeding intensity assessment model was constructed using the fused features. Yuan and Zhu [9] employed color channel separation techniques to extract statistical features such as mean, variance, skewness and kurtosis from each channel. These features were subsequently fused using Kernel Principal Component Analysis (KPCA), and a fish feeding behavior detection model was finally established based on SVM. However, most machine learning-based methods rely on indirect evaluation using features such as texture, color, and statistical descriptors. These features are often insufficient to directly describe fish spatial distribution, posture changes, inter-individual distance, and aggregation degree [10,11]. As a result, the accuracy and reliability of the evaluation outcomes remain limited.
Deep learning-based methods for fish feeding intensity assessment typically adopt an end-to-end feature learning paradigm, which enables the automatic extraction of semantic features directly from images, videos, or audio signals to evaluate fish feeding intensity. For instance, Liu et al. [12] conducted a feeding intensity assessment based on fish feeding image data. In this method, the Coordinate Attention mechanism was integrated into a MobileViT backbone network to construct a fish school feeding intensity recognition model. Wang et al. [13] evaluated fish feeding intensity using RGB video and optical flow videos generated by FlowNet2. A dual-stream 3D convolutional neural network was employed to establish a fish behavior recognition model, which can accurately identify fish feeding behaviors. Zhang et al. [14] performed fish feeding intensity assessment based on multimodal features, including video, audio and water surface wave data. They proposed a multilevel enhanced multimodal interaction network (MAINet) to construct a quantitative model for fish feeding intensity, achieving precise classification and quantitative evaluation of feeding intensity levels. More recently, Iqbal et al. [15] proposed LightHybridNet-Transformer-FFIA, a hybrid Transformer-based deep learning model for enhanced fish feeding intensity classification, which further improved the ability of deep models to extract feeding-related features. Wang et al. [16] developed a dual-stream spatiotemporal fusion method for fish school feeding intensity identification, in which spatial and temporal information were jointly utilized to enhance feeding intensity recognition. Deep learning models exhibit strong predictive abilities; however, as black-box models, their decision-making process often lacks transparency and interpretability. This limitation reduces model credibility and restricts their further development and application in practical aquaculture scenarios [17,18].
Fish typically exhibit distinct behavioral features under different feeding intensities [19,20]. However, existing methods for assessing fish feeding intensity mainly rely on the extraction of superficial features, such as texture, color or statistical descriptors, to indirectly assess feeding intensity [21,22]. These approaches struggle to capture the behavioral differences exhibited by fish at varying feeding intensities and fail to reflect the subtle, dynamic changes that accompany satiation. Consequently, the reliability of fish feeding intensity assessment results remains limited. In contrast, spatial features have a significant advantage: they directly describe fish behavior and can capture subtle behavioral changes during feeding. For example, as feeding intensity increases, fish schools tend to exhibit greater overall upward movement, reduced inter-individual distance and a higher degree of aggregation. Accordingly, in this study, spatial features highly correlated with feeding behavior are derived from the underlying mechanism of fish behavioral changes, and an interpretable model is constructed to assess fish feeding intensity directly and accurately.
To address the limitations of insufficient behavioral feature representation and poor model interpretability in existing methods, a method for fish feeding intensity assessment based on spatial features and the TabNet-DFWL model, that is, TabNet with a dynamic feature weighting layer (DFWL), is proposed in this study. First, fish images are acquired from a lateral viewing angle. Subsequently, a series of image processing techniques is applied to obtain clear fish body contours, including image segmentation, image enhancement, image binarization and contour extraction. Then, fish spatial features that are highly correlated with fish feeding behavior are extracted. On this basis, the DFWL is introduced into TabNet to construct an interpretable model, enabling precise assessment of fish feeding intensity. The main contributions of this study are summarized as follows:
(1) A series of spatial features highly correlated with feeding behavior is proposed from the underlying mechanism of fish behavioral changes, such as inter-individual distance, fish posture, and aggregation degree. These features can accurately reflect behavioral changes during the feeding process, enhance the effectiveness of feature representation, and improve the reliability of feeding intensity assessment.
(2) A dynamic feature weighting layer is proposed to automatically strengthen the weights of key features within the TabNet model. This mechanism addresses the degradation in assessment performance caused by the implicit expression of fish satiety and improves the model performance when processing fish feeding data.
(3) A fish feeding intensity assessment method based on spatial features and the TabNet-DFWL network is proposed. This method avoids the black-box risk of conventional deep learning models and enhances model interpretability and credibility, thereby providing a reliable basis for precision feeding in aquaculture.
The rest of this paper is organized as follows. In Section 2, the complete workflow of the proposed method is presented. The experiments and corresponding results are reported in Section 3. Finally, the conclusions are provided in Section 4.
2. Materials and Methods
2.1. Data Acquisition
The experiments were conducted in the Xuexing Building at the Yangzijin Campus of Yangzhou University in Jiangsu Province. Juvenile Asian crucian carp were selected as the experimental subjects. The fish had body lengths of 15 to 25 cm and were raised in a fish tank with a water depth of approximately 40 cm. Before the experiment began, the fish had lived in the tank for more than one month and had fully adapted to the environment.
The experimental data acquisition platform is illustrated in Figure 1. It consists of a raising tank (90 cm × 65 cm × 70 cm), oxygenation equipment (Sensen Group Co., Ltd., Zhoushan, China), a lighting system (Chihiros Aquatic Technology Co., Ltd., Guangzhou, China), a ZED 2i camera (Stereolabs SAS, San Francisco, CA, USA), and a computer (Hasee Computer Co., Ltd., Shenzhen, China). The camera was positioned facing the center of the tank’s side wall, 50 cm from the wall and 20 cm above the ground. It was configured to record videos at a resolution of 2208 × 1242 pixels and a frame rate of 30 frames per second. During data collection, a lateral viewing angle was adopted and the entire feeding process was recorded, to ensure class balance among the images corresponding to different feeding intensity levels in the dataset.
The feeding scheme was implemented using fixed timing, fixed location and fixed quantity. Feeding was conducted twice daily, at 09:00 and 17:00, respectively, with 60 pellets provided at each time from directly above the center of the fish tank. To ensure that the recorded videos contained complete information on fish feeding behavior, each video acquisition was recorded for 120–150 s, including 20 s before feeding, the entire feeding process and 20 s after feeding. The video recordings started on 25 October 2023 and ended on 13 December 2023, covering a total observation period of 50 days. During the data collection process, some data were affected by a sudden drop in temperature, human activities around the experimental area, and noise interference, which led to abnormal feeding behavior of the fish. Therefore, the recordings affected by these factors were excluded. Finally, valid fish feeding data from 39 days were retained for subsequent analysis.
To evaluate the effectiveness of the proposed method, one image was extracted every 15 frames from the recorded videos. Data cleaning was conducted to remove severely blurred or occluded samples, while no data augmentation was applied. Only the original collected images were used to preserve real-world conditions. After this preprocessing, a total of 9503 valid images were obtained to construct the dataset. Some representative fish feeding samples are shown in Figure 2. The dataset was divided into a training set, a validation set and a test set using a random seed (42). The training set and validation set together accounted for 75% of the total dataset, while the test set accounted for 25%. The ratio between the training set and the validation set was set to 4:1. Detailed information regarding dataset partitioning is provided in Table 1.
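The partitioning described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name `split_indices`, the use of a plain (non-stratified) shuffle, and the rounding of the subset sizes are all assumptions, so the resulting counts need not match Table 1 exactly.

```python
import numpy as np


def split_indices(n_samples, test_frac=0.25, train_val_ratio=4, seed=42):
    """Shuffle sample indices once, then cut them into train/val/test.

    Assumption: a simple random shuffle; the paper does not state whether
    the split was stratified by feeding intensity level.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)

    n_test = round(n_samples * test_frac)
    # The remaining samples are divided train:val = train_val_ratio:1.
    n_val = round((n_samples - n_test) / (train_val_ratio + 1))

    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(9503)
print(len(train), len(val), len(test))
```

Fixing the seed makes the shuffle repeatable, which is what allows all baselines to be evaluated on the same partition.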
Fish images were classified into four standard feeding intensity levels [23], with the classification criteria presented in Table 2. Fish images at each feeding intensity are shown in Figure 2. In the none state, fish were observed to aggregate in the lower and middle layers of the water column, with slight overlapping among individuals, and no response was exhibited towards the supplied feed. In the weak state, only nearby food was consumed, and large-scale swimming feeding behavior was not observed. Minor overlapping among individuals was present. In the medium state, fish exhibited relatively large swimming amplitudes, and most individuals presented a distinct inclined posture. In the strong state, all individuals were observed to float to the water surface for feeding. Intense competition for food was evident, accompanied by large-scale overlapping and inclination among fish individuals.
2.2. Overall Process
A method for fish feeding intensity assessment based on spatial features and TabNet-DFWL is proposed in this study, which enhances the interpretability and credibility of feeding intensity assessment and enables precise evaluation of fish feeding intensity. The detailed workflow of the proposed method is illustrated in Figure 3.
(1) Data acquisition. The camera was fixed at a distance of 50 cm from the side of the raising tank to record fish feeding videos. Video frames were extracted to obtain fish images, and a dataset was constructed accordingly.
(2) Image pre-processing. Target fish bodies were extracted from the images through image segmentation. Subsequently, a series of image processing operations, including image enhancement, image binarization and contour extraction, were performed to obtain clear fish body contours.
(3) Fish spatial feature extraction. Based on the extracted fish contours, centroid points of fish bodies were calculated. Then, spatial features were extracted from both individual and group perspectives.
(4) Fish feeding intensity assessment. A TabNet-DFWL model was constructed by introducing the DFWL. The mapping relationship between fish spatial features and feeding intensity was established, thereby enabling the assessment of fish feeding intensity.
2.3. Image Pre-Processing
Image pre-processing is regarded as a critical step in computer vision and image processing, aiming to improve image quality, reduce noise and enhance features, thereby providing high-quality data input for subsequent analysis and processing. However, images acquired in underwater environments are often affected by uneven illumination, distortion, and motion blur caused by water fluctuations. These issues tend to result in inaccurate feature extraction and consequently degrade the performance of feeding intensity assessment. To address these problems and improve the accuracy of image feature extraction, four pre-processing operations, including image segmentation, image enhancement, image binarization, and contour extraction, were performed in this study.
(1) Image segmentation. Image segmentation is defined as the process of dividing a digital image into multiple regions to enable image understanding and target extraction. In this study, an image differencing method was employed to segment the images and extract target fish bodies. Image differencing involves the subtraction of two images. Detection targets are obtained by calculating the differences between images, with the purpose of highlighting the regions with significant changes. The mathematical formulation of this operation is expressed as follows:
where $D(x, y)$ denotes the differencing result at position $(x, y)$, $I(x, y)$ represents the pixel value of the fish image at position $(x, y)$, and $B(x, y)$ indicates the pixel value of the background image at position $(x, y)$.
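As an illustration of image differencing, the sketch below subtracts a background image from a frame pixel by pixel. It assumes the absolute difference $|I(x, y) - B(x, y)|$ is taken and that both images are 8-bit grayscale arrays of equal size; the function name `difference_image` is hypothetical, and the source does not state whether a signed or absolute difference was used.

```python
import numpy as np


def difference_image(frame, background):
    """Absolute per-pixel difference between a frame and a background image."""
    if frame.shape != background.shape:
        raise ValueError("frame and background must have the same shape")
    # Widen to a signed type so the subtraction cannot wrap around in uint8.
    diff = frame.astype(np.int16) - background.astype(np.int16)
    return np.abs(diff).astype(np.uint8)


# Toy example: a uniform background with a darker 2 x 2 "fish" region.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 30
print(difference_image(frame, background))
```

Unchanged pixels map to 0 and the changed region to 70, so the moving target stands out regardless of whether it is darker or brighter than the background.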
(2) Image enhancement. Image enhancement is defined as the process of processing image features, such as edges, contours, and contrast, through image processing techniques to improve image clarity or highlight useful information. Due to the presence of blurred target edges and significant noise caused by feces and feed residues in the differenced images, the effectiveness of image feature extraction can be easily affected. Therefore, the visual quality of images needs to be optimized through image enhancement techniques. In this study, image enhancement was achieved by applying linear transformation and histogram equalization. Linear transformation significantly improves image contrast and brightness, whereas histogram equalization increases the dynamic range of gray-level differences among pixels, thereby enhancing image clarity. After these image processing procedures, target regions and noise can be effectively distinguished in the images. The mathematical formulation of the linear transformation is given as follows:
$$I_{out}(x, y) = \alpha \, I_{in}(x, y) + \beta \tag{2}$$
where $I_{in}(x, y)$ denotes the pixel value of the input image at position $(x, y)$, $I_{out}(x, y)$ represents the pixel value of the output image at position $(x, y)$, and $\alpha$ and $\beta$ are the gain coefficient and the bias term, respectively.
The histogram equalization formula is given in Equation (3).
$$s_k = T(r_k) = (L - 1) \sum_{j=0}^{k} \frac{n_j}{N} \tag{3}$$
In Equation (3), $r_k$ denotes the $k$-th gray level of the input image, $s_k$ represents the corresponding gray level of the output image, $T(r_k)$ is the gray-level transformation function, $n_j$ indicates the number of pixels with gray level $r_j$, $N$ denotes the total number of pixels in the image, and $L$ represents the total number of gray levels in the image.
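The two enhancement operations can be sketched in NumPy as follows, assuming 8-bit grayscale input and the standard cumulative-histogram form of equalization. The gain and bias values are placeholders, since the source does not report the values used, and the function names are hypothetical.

```python
import numpy as np


def linear_transform(img, alpha=1.5, beta=10.0):
    """Apply out = alpha * in + beta and clip to the 8-bit range.

    alpha (gain) and beta (bias) are illustrative values only.
    """
    out = alpha * img.astype(np.float64) + beta
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def equalize_histogram(img, levels=256):
    """Map each gray level r_k to s_k = (L - 1) * sum_{j <= k} n_j / N."""
    hist = np.bincount(img.ravel(), minlength=levels)  # n_j for every level
    cdf = np.cumsum(hist) / img.size                   # running sum of n_j / N
    lut = np.rint((levels - 1) * cdf).astype(np.uint8)
    return lut[img]                                    # look up every pixel


img = np.array([[50, 50], [100, 200]], dtype=np.uint8)
print(linear_transform(img))
print(equalize_histogram(img))
```

Clipping after the linear transformation matters in practice: without it, bright pixels would overflow the 8-bit range instead of saturating at 255.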
(3) Image binarization. Image binarization is defined as the process of converting a greyscale image into a binary image containing only black and white values. This operation facilitates the extraction of specific features and details of targets in the image, making them more prominent and easier to detect. In this study, image binarization was performed using an adaptive thresholding method. The calculation formula for the threshold is expressed as follows:
$$T(x, y) = \mu(x, y) - C \tag{4}$$
where $T(x, y)$ denotes the adaptive threshold at position $(x, y)$, $\mu(x, y)$ represents the mean pixel value within the local neighborhood centered at $(x, y)$, and $C$ is a constant offset. The binarization formula is expressed as follows:
where $P(x, y)$ denotes the value of the binary mask at position $(x, y)$, and $I(x, y)$ represents the pixel value of the input image at position $(x, y)$.
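A minimal sketch of mean-based adaptive thresholding is given below. It assumes the threshold is the local mean minus a constant offset, that pixels brighter than the threshold become foreground (255), and that image borders are handled by edge replication. The neighborhood size, the offset value and the foreground polarity are not stated in the source and are illustrative here.

```python
import numpy as np


def adaptive_mean_threshold(img, block_size=3, offset=2.0):
    """Binarize with T(x, y) = local mean - offset; foreground where I > T."""
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    pad = block_size // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")

    # Summed-area table with a leading row and column of zeros, so each
    # window sum is obtained from four table lookups.
    sat = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    k = block_size
    window_sum = (sat[k:k + h, k:k + w] - sat[:h, k:k + w]
                  - sat[k:k + h, :w] + sat[:h, :w])

    threshold = window_sum / (k * k) - offset
    return np.where(img > threshold, 255, 0).astype(np.uint8)


img = np.full((5, 5), 50, dtype=np.uint8)
img[2, 2] = 200  # one bright pixel raises the local mean around it
print(adaptive_mean_threshold(img))
```

Because the threshold follows the local mean, the same pixel value can be classified differently depending on its surroundings, which is what makes the method tolerant of uneven illumination.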
(4) Contour extraction. After the binary image was obtained, although the target region had been preliminarily segmented, it still lacked structured information describing the target shape. To accurately extract spatial features that can be used for behavior analysis, the contour detection function findContours() was applied to detect fish contours in the image. Subsequently, the drawContours() function was used to visualize the detected contours, thereby achieving the extraction of fish target contours.
2.4. Fish Spatial Feature Extraction
Feature extraction is defined as the process of extracting representative information from an image to describe target objects. According to differences in research objects and objectives, corresponding features are selected and extracted to meet specific research requirements.
However, previous methods have mainly focused on local texture features or global statistical features in images. These features are unable to capture behavioral variations during the fish feeding process and fail to reflect the true characteristics of feeding behavior, which may lead to limited accuracy and reliability of assessment results. To address this issue, the spatial features of fish were extracted in this study to assess feeding intensity. Fish spatial features can be divided into individual features and group features. Individual features indicate the feeding motivation of each fish, whereas group features represent the overall behavioral tendency of the fish school from a global perspective. Such behavioral tendencies reflect the current feeding demand of the fish school. Therefore, spatial features were extracted from both individual and group perspectives to achieve an accurate assessment of fish feeding intensity in this study.
When extracting features that require determining the positional information of individuals, the centroid points of fishes were used to represent individual fish. For features directly related to the area, connected component information was adopted to characterize individuals. Since the target pixel regions are discrete in images, the centroid of the fish can be calculated using first-order moments. For a target region, its moments can be expressed as follows:
where $m_{pq}$ denotes the $(p + q)$-th order moment of the image, and $p$ and $q$ are non-negative integers. Here, $i$ and $j$ represent the pixel indices of the image:
where $m_{00}$ denotes the zeroth-order moment of the image, which represents the area of the target region; and $m_{10}$ and $m_{01}$ are the first-order moments of the image, corresponding to the sum of the products of the horizontal coordinates and their pixel values, and the sum of the products of the vertical coordinates and their pixel values, respectively.
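The centroid computation can be illustrated with raw image moments on a binary mask. The sketch assumes the usual definitions, namely that $m_{pq}$ sums $x^p y^q$ weighted by the pixel value and that the centroid is $(m_{10}/m_{00},\ m_{01}/m_{00})$, with $x$ the column index and $y$ the row index; the function names are hypothetical.

```python
import numpy as np


def raw_moment(mask, p, q):
    """m_pq = sum over pixels of x^p * y^q * value (x = column, y = row)."""
    ys, xs = np.mgrid[:mask.shape[0], :mask.shape[1]]
    return float(np.sum((xs ** p) * (ys ** q) * mask))


def centroid(mask):
    """Return (x_c, y_c) = (m10 / m00, m01 / m00) of a binary region."""
    m00 = raw_moment(mask, 0, 0)  # area of the region for a 0/1 mask
    if m00 == 0:
        raise ValueError("empty region has no centroid")
    return raw_moment(mask, 1, 0) / m00, raw_moment(mask, 0, 1) / m00


mask = np.zeros((5, 6), dtype=np.uint8)
mask[1:4, 2:5] = 1          # a 3 x 3 block: rows 1-3, columns 2-4
print(centroid(mask))       # (3.0, 2.0)
```

For a 0/1 mask the zeroth-order moment equals the pixel count, so the same routine also yields the region area needed by the area-based features.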
2.4.1. Individual Feature Extraction
The individual features adopted in this study mainly include the average distance from individuals to the water surface, the average inter-individual distance, fish tilt, and the relationship between fish and feeding point. The extraction methods and calculation formulas for each feature are described as follows.
(1) Average distance from individuals to the water surface
In aquaculture, the throwing feeding method restricts fish to feeding by surfacing. During the feeding process, frequent upward and downward movements are observed as fish compete for dispersed feed. This phenomenon is closely related to the feeding demand of the fish. Therefore, the average distance from individuals to the water surface is regarded as an important indicator for measuring feeding intensity in aquaculture. However, this feature is difficult to obtain under a traditional top-view perspective. Consequently, a lateral viewing angle was adopted in this experiment to acquire this feature for the quantitative assessment of fish feeding intensity. The calculation formula for this feature is given as follows:
$$ADS = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{8}$$
where $ADS$ represents the average distance from individuals to the water surface, $y_i$ denotes the distance from the $i$-th individual to the water surface, and $N$ represents the number of fish bodies in the image.
(2) Average inter-individual distance
The average inter-individual distance is defined as the mean value of the distances among multiple individuals. When fish feed, the demand for feed intake drives fish schools to swim towards the feeding area, resulting in large-scale aggregation. In general, as feeding intensity increases, the average inter-individual distance tends to decrease. Therefore, the average inter-individual distance was selected in this study as one of the indicators for assessing the feeding intensity. The calculation method for this feature is given in Equation (9).
In Equation (9), $AID$ represents the average inter-individual distance, $d_{ij}$ denotes the distance between the $i$-th individual and the $j$-th individual, and $N$ represents the number of fish bodies in the image.
(3) Fish tilt
The fish tilt is defined as the characteristic body orientation exhibited by fish during swimming. During feeding, fish are observed to raise their heads to feed near the water surface while their bodies remain submerged. As a result, fish tend to present a tilted posture when feeding. This posture differs from the horizontal swimming posture observed in the none state and is considered an important indicator for distinguishing feeding states of fish schools [24,25]. Therefore, this feature can be applied to feeding intensity assessment. In this study, the slope of the straight line corresponding to the fish body was selected to describe fish tilt. The straight line was obtained by connecting pixel points at the fish head and tail, and the slope of this line was calculated to characterize fish tilt. The calculation formula for the fish tilt is given in Equation (10).
In Equation (10), $FT$ represents the fish tilt, $(x_1, y_1)$ represents the coordinates of the fish head, and $(x_2, y_2)$ represents the coordinates of the fish tail.
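The first three individual features can be sketched from centroid and head/tail coordinates as follows. Several details are assumptions: image coordinates are used with $y$ increasing downward, the water surface lies at row `surface_y`, the average inter-individual distance is normalized by the number of unordered pairs, and the tilt is the signed slope. The function names are hypothetical.

```python
import numpy as np


def average_distance_to_surface(centroids, surface_y=0.0):
    """ADS: mean vertical distance from each centroid to the water surface."""
    pts = np.asarray(centroids, dtype=float)
    return float(np.mean(pts[:, 1] - surface_y))


def average_inter_individual_distance(centroids):
    """AID: mean Euclidean distance over all unordered pairs of centroids."""
    pts = np.asarray(centroids, dtype=float)
    n = len(pts)
    if n < 2:
        return 0.0
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return float(dist[np.triu_indices(n, k=1)].mean())


def fish_tilt(head, tail):
    """FT: slope of the straight line through the head and tail points."""
    (x1, y1), (x2, y2) = head, tail
    if x1 == x2:
        # Vertical body axis; the source does not say how this is handled.
        return float("inf")
    return (y2 - y1) / (x2 - x1)


centroids = [(0, 3), (4, 3), (4, 6)]
print(average_distance_to_surface(centroids))        # 4.0
print(average_inter_individual_distance(centroids))  # 4.0
print(fish_tilt(head=(10, 2), tail=(4, 5)))          # -0.5
```

A real pipeline would convert pixel distances to physical units or normalize them by image size; the source does not say which, so the sketch leaves values in pixels.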
(4) Relationship between fish and feeding point
Under feeding conditions, fish schools are observed to raise their heads and rapidly swim towards the feeding point. This movement pattern causes most individuals in the school to approach the feeding point, rise from positions below the feeding point, and then move upward to the water surface for feeding [26]. Based on this behavioral feature, a formula was developed in this study to describe the relationship between the distance of individuals to the feeding point and their inclination, as follows:
where $RFPR$ represents the relationship between fish and feeding point, $dist_i$ represents the distance between the $i$-th individual and the feeding point, $r_i$ denotes the inclination angle, in radians, of the line connecting the $i$-th individual and the feeding point, and $N$ represents the number of fish bodies in the image.
2.4.2. Group Feature Extraction
The fish in the image are divided into two categories. If an individual is found to be attached to or obstructed by other individuals, these individuals are defined as aggregated individuals, and the corresponding region is considered an aggregation area. Otherwise, they are classified as non-aggregated individuals, and the corresponding area is regarded as a non-aggregation area. Based on this, the proportion of the aggregation area and the proportion of aggregated individuals were extracted. Additionally, group dispersity of fish was proposed to describe the dispersal state of the fish school.
(1) Proportion of the aggregation area
The proportion of the aggregation area is defined as the ratio of the area of the aggregation region to the total fish area in the image. When feeding, fish tend to exhibit aggregation and occlusion due to competition for food. Generally, the stronger the feeding intensity, the larger the aggregation area and its proportion. Therefore, the proportion of the aggregation area was extracted to assess the fish feeding intensity [27]. The calculation method for the proportion of the aggregation area is as follows:
$$PAA = \frac{Area_a}{Area}$$
where $PAA$ represents the proportion of the aggregation area, $Area_a$ denotes the area of the aggregation region, and $Area$ represents the total fish area in the image.
(2) Proportion of aggregated individuals
The proportion of aggregated individuals is defined as the proportion of the number of aggregated individuals to the total number of fish in the image. The proportion of aggregated individuals serves as another indicator for measuring the degree of aggregation in fish schools. Unlike the proportion of the aggregation area (PAA), it can reduce assessment errors that may arise when measuring area. By counting the number of aggregated and non-aggregated individuals, the degree of fish aggregation is calculated, and the proportion of aggregated individuals is obtained as one of the indicators for assessing fish feeding intensity. The calculation formula for the proportion of aggregated individuals is as follows:
$$PAI = \frac{m}{N}$$
where $PAI$ represents the proportion of aggregated individuals, $m$ denotes the number of aggregated individuals, and $N$ represents the number of fish bodies in the image.
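The two aggregation ratios can be illustrated as below. The sketch assumes each connected component of the binary image has already been assigned a pixel area and a fish count, and that a component holding more than one fish is treated as an aggregation area; how the authors count fish inside overlapping blobs is not described, so the input format and the function name are hypothetical.

```python
def aggregation_proportions(blobs):
    """Return (PAA, PAI) from a list of (area_px, n_fish) per component.

    Assumption: a component with n_fish > 1 is an aggregation area and all
    of its fish count as aggregated individuals.
    """
    total_area = sum(area for area, _ in blobs)
    total_fish = sum(n for _, n in blobs)
    if total_area == 0 or total_fish == 0:
        raise ValueError("no fish regions in the image")

    agg_area = sum(area for area, n in blobs if n > 1)
    agg_fish = sum(n for _, n in blobs if n > 1)
    return agg_area / total_area, agg_fish / total_fish


# Four components: two single fish, one group of three, one group of two.
blobs = [(300, 1), (900, 3), (200, 1), (600, 2)]
paa, pai = aggregation_proportions(blobs)
print(paa, pai)
```

Here 1500 of 2000 fish pixels and 5 of 7 fish belong to aggregations, so the two ratios differ even though they describe the same scene, which is why both are kept as features.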
(3) Group dispersity
Group dispersity is defined as a group spatial feature used to describe the dispersal state of fish schools. The centroids of the fish in the image are treated as a set of points to construct a Delaunay triangulation. After the triangulation, the average perimeter of the triangles in the Delaunay network is taken as the measure of group dispersity to describe the dispersal state of the fish school. Under none conditions, fish schools tend to swim within the same horizontal range without aggregation. However, when feed is introduced, the fish school exhibits aggregation due to competition for food, resulting in aggregation of varying scales. As a result, the triangles in the Delaunay triangulation become smaller. In the Delaunay triangulation, the size of each triangle reflects the distance between the vertices, which indicates the dispersal degree of the fish school. Therefore, the group dispersity can be used as an evaluation indicator of fish feeding behavior. The calculation formula for it is as follows:
$$GD = \frac{1}{t} \sum_{i=1}^{t} L_i, \qquad L_i = L_{i1} + L_{i2} + L_{i3}$$
where $GD$ represents the group dispersity, $t$ denotes the number of Delaunay triangles, $L_i$ represents the perimeter of the $i$-th triangle, and $L_{i1}$, $L_{i2}$ and $L_{i3}$ denote the lengths of the three sides of the triangle. Generally, the smaller the value of $GD$, the lower the dispersal degree, and the stronger the corresponding feeding intensity. A partial display of the Delaunay triangulation is shown in Figure 4. Figure 4a shows the triangulation results of the fish image under feeding conditions. The fish school exhibits large-scale aggregation and mutual occlusion. In this case, the group dispersity is 461, indicating low dispersity and suggesting that the fish school is in a feeding state with relatively high feeding intensity. Figure 4b shows the triangulation results under non-feeding conditions. The fish school mainly swims near the bottom, with a group dispersity of 943, indicating higher dispersity and a lower feeding desire.
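Group dispersity as described, the mean perimeter of the Delaunay triangles built on the fish centroids, can be sketched with SciPy. The use of `scipy.spatial.Delaunay` is an assumption about tooling, not something the source states, and the function name is hypothetical. At least three non-collinear centroids are required.

```python
import numpy as np
from scipy.spatial import Delaunay


def group_dispersity(centroids):
    """GD: average perimeter of the Delaunay triangles over the centroids."""
    pts = np.asarray(centroids, dtype=float)
    tri = Delaunay(pts)
    perimeters = []
    # tri.simplices holds, for each triangle, the indices of its 3 vertices.
    for a, b, c in pts[tri.simplices]:
        perimeters.append(np.linalg.norm(a - b)
                          + np.linalg.norm(b - c)
                          + np.linalg.norm(c - a))
    return float(np.mean(perimeters))


spread = [(0, 0), (4, 0), (0, 3), (5, 5), (2, 1)]
tight = [(x / 2, y / 2) for x, y in spread]  # same layout, half the spacing
print(group_dispersity(spread), group_dispersity(tight))
```

Halving all spacings halves the value, which matches the interpretation in the text: a tightly aggregated school yields small triangles and therefore a small dispersity.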
2.5. Fish Feeding Intensity Assessment Model Based on TabNet-DFWL
Based on the acquired spatial features of fish, such as distance, posture and aggregation degree, a classification model is constructed to assess fish feeding intensity. TabNet is a deep learning model specifically designed for tabular data, combining the advantages of both tree models and neural networks. It is known for its superior classification performance and strong interpretability [28]. Therefore, TabNet was selected in this study as the base model for assessing fish feeding intensity. However, unlike mammals, fish do not exhibit a clear “stop eating” behavior. As they become satiated, their most significant behavioral change is the subtle attenuation of a series of key behavioral features, such as gradually moving away from the water surface, returning to a horizontal posture, and decreasing aggregation. These changes are subtle and implicit, making it difficult to define their range. As a result, the model’s learning and decision-making may be influenced by these features, leading to potential assessment errors.
To address these issues, TabNet-DFWL was proposed in this study by introducing the DFWL into TabNet. TabNet-DFWL performs adaptive, context-based feature weighting and automatically identifies and strengthens key features with strong discriminative power during training. It mitigates the mis-assessment caused by satiety-related features whose signals are weak and easily submerged in fish feeding intensity scenarios, achieves high sensitivity to subtle feature changes in fish feeding behavior, and improves model performance in the fish feeding intensity assessment task. The network structure of TabNet-DFWL is shown in Figure 5.
The DFWL proposed in this study is a self-learning mechanism that can dynamically adjust feature weights based on feature importance. Its core consists of a learnable parameter vector with the same dimensionality as the input features. This vector is constrained by the sigmoid function to lie within the range (0, 1) and represents the importance weight of each feature. For a given input feature matrix $X \in \mathbb{R}^{B \times D}$ (where B represents the batch size and D represents the feature dimension), the output of the DFWL is calculated by element-wise multiplication of X with the weight vector, $X \otimes w$. In the corresponding formula, ⨂ denotes element-wise multiplication, w represents the weight vector, σ is the Sigmoid function σ(x) = 1/(1 + e^{−x}), W1 and b1 are the weights and biases of the fully connected layer, GAP(X) denotes global average pooling, and δ represents the ReLU activation function.
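The following is a minimal sketch of how such a weighting layer could be computed, written in plain Python for readability. It is not the authors' implementation. The order in which GAP, δ, the fully connected layer and σ are composed, the choice to pool over the batch dimension, and the names `dfwl_forward`, `W1` and `b1` as function arguments are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Sigmoid function: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def dfwl_forward(X, W1, b1):
    """Hypothetical DFWL forward pass on a B x D feature matrix (list of lists).

    Assumed composition (not confirmed by the text): w = sigma(W1 * relu(GAP(X)) + b1).
    """
    B, D = len(X), len(X[0])
    # GAP(X): assumed here to average each feature over the batch, giving D values.
    pooled = [sum(row[j] for row in X) / B for j in range(D)]
    # delta: ReLU activation.
    activated = [max(0.0, p) for p in pooled]
    # Fully connected layer (W1 is D x D, b1 has length D), then sigmoid,
    # so every weight lies strictly inside (0, 1).
    w = [
        sigmoid(sum(W1[i][j] * activated[j] for j in range(D)) + b1[i])
        for i in range(D)
    ]
    # Element-wise multiplication: each feature column is scaled by its weight.
    Y = [[row[j] * w[j] for j in range(D)] for row in X]
    return Y, w


# Toy input: 2 samples, 3 illustrative spatial features.
X = [[0.2, 0.8, 0.5], [0.4, 0.6, 0.1]]
W1 = [[0.0] * 3 for _ in range(3)]   # zero weights, so w depends only on b1
b1 = [0.0, 2.0, -2.0]
Y, w = dfwl_forward(X, W1, b1)
```

With zero `W1`, the weights reduce to σ(b1), which makes it easy to see that a larger bias strengthens a feature and a smaller one suppresses it.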
In the TabNet-DFWL network, fish spatial features are first processed by the DFWL, which assigns an initial weight to each feature based on its importance and outputs the initial weight vector. Next, the initial weight vector and the fish spatial features are passed into the attentive transformer, which generates an attention mask based on feature importance and selects the most useful subset of fish spatial features for the current step. The selected subset is then processed by the feature transformer, which extracts deep abstract features to support decision-making. Finally, the split layer divides the output of the feature transformer into two parts: one part is used to output the fish feeding intensity assessment result, and the other serves as the input for feature selection in the next step. After multiple such steps, the assessment of fish feeding intensity is completed by aggregating the outputs of each step.
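The mask-generation step can be illustrated with sparsemax, the normalization used by the attentive transformer in the original TabNet design. The text above does not state which normalization TabNet-DFWL uses, so this is an assumption, and the scores and feature values below are invented for the example.

```python
def sparsemax(z):
    """Project scores z onto the probability simplex; low scores become exactly 0."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, value in enumerate(z_sorted, start=1):
        cumsum += value
        # The k largest scores stay in the support while this condition holds.
        if 1.0 + k * value > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(value - tau, 0.0) for value in z]


# Hypothetical importance scores for four spatial features at one decision step.
scores = [2.0, 1.5, 0.1, -1.0]
mask = sparsemax(scores)

# Applying the mask keeps only the selected subset of features for this step.
features = [0.9, 0.4, 0.7, 0.3]
selected = [f * m for f, m in zip(features, mask)]
```

Unlike softmax, sparsemax assigns exactly zero to weak features, which is what lets each step work on a small subset of the inputs.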
2.6. Experimental Setup
To ensure fair and reproducible comparisons between the proposed method and the baseline models, this subsection details the training methodology, hyperparameter settings, and implementation specifics for each comparative model. All baselines were evaluated under identical data splits, preprocessing steps, and evaluation metrics as the proposed method.
2.6.1. Hardware and Software Environment
All experiments were conducted on a workstation with an Intel Core i7-12650H CPU (10 cores, 16 threads, 2.3 GHz), 32 GB DDR5 RAM, and an NVIDIA GeForce RTX 4060 GPU (8 GB VRAM). The software environment included Windows 11, Python 3.9, PyTorch 2.6.0 with CUDA 12.4, OpenCV 4.7, and scikit-learn 1.6.1.
2.6.2. Comparison Experiment Setup
The proposed method and all comparison models were implemented using the same software and hardware platform as described in Section 2.6.1.
To ensure a fair comparison, the proposed method and all comparison models were provided with exactly the same input features, namely the normalized fish spatial features, and underwent the same data cleaning procedure to remove missing values before being input to any model. The models compared include XGBoost [29], Light Gradient Boosting Machine (LGBM) [30], Random Forest (RF) [31], Multi-Layer Perceptron (MLP) [32], Decision Tree (DT) [33], and 1D Convolutional Neural Network (1D-CNN) [34]. The main parameter settings used in the experiment are listed in Table 3.
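A shared preprocessing step of this kind might look like the sketch below. The text says only that missing values are removed and the features are normalized. The use of min-max scaling and the function name `clean_and_normalize` are assumptions for illustration.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_and_normalize(rows):
    """Drop samples with missing values, then min-max scale each feature to [0, 1]."""
    clean = [row for row in rows if not any(is_missing(v) for v in row)]
    columns = list(zip(*clean))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column has no range, so it is mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in clean
    ]


# Toy samples with two illustrative features; two rows contain missing values.
raw = [[1.0, 10.0], [None, 20.0], [3.0, 30.0], [2.0, float("nan")]]
prepared = clean_and_normalize(raw)
```

In practice the minimum and maximum would normally be computed on the training split only and reused for the validation and test splits, so that every model sees identically scaled inputs without leakage.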
The parameter settings of the comparison models strictly followed Table 3, and default values were used for parameters not listed in the table. In addition, an early stopping strategy was adopted to prevent overfitting. The patience value was set to 10: if the monitored indicator showed no improvement, that is, fluctuated by less than 0.15%, for 10 consecutive epochs, training was terminated automatically and the best weights obtained so far were saved.
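The early stopping rule can be sketched as follows. The class name `EarlyStopping`, the choice to monitor a validation accuracy expressed in percent, and the reading of the 0.15% threshold as a minimum required improvement are assumptions. The text gives only the patience of 10 and the 0.15% figure.

```python
class EarlyStopping:
    """Stop training when the monitored metric stops improving meaningfully."""

    def __init__(self, patience=10, min_delta=0.15):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest change counted as improvement
        self.best = None
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, metric):
        """Record one epoch; return True when training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            # A real training loop would save the model weights here.
            self.best, self.best_epoch = metric, epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation accuracies (%): two improving epochs, then a plateau.
history = [90.0, 91.0] + [91.05] * 15
stopper = EarlyStopping(patience=10, min_delta=0.15)
stopped_at = None
for epoch, accuracy in enumerate(history):
    if stopper.step(epoch, accuracy):
        stopped_at = epoch
        break
```

Here the plateau values exceed the best score by only 0.05, which is below the threshold, so the counter keeps growing and training stops after ten such epochs while the checkpoint from epoch 1 is retained.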
2.7. Evaluation Metrics
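To evaluate the performance of the proposed TabNet-DFWL, accuracy, precision, recall, specificity and F1-Score are used. The definitions of these evaluation metrics are given as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.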
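As a worked illustration, the sketch below derives these metrics from a confusion matrix using the standard definitions. Treating each class one-vs-rest and macro-averaging across classes is an assumption about how the averaged scores are obtained. The function name `macro_metrics` and the toy matrix are illustrative only.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class has no relevant samples."""
    return numerator / denominator if denominator else 0.0


def macro_metrics(cm):
    """Macro-averaged metrics from a confusion matrix cm[true][predicted]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        per_class.append({
            "accuracy": safe_div(tp + tn, total),
            "precision": precision,
            "recall": recall,
            "specificity": safe_div(tn, tn + fp),
            "f1": safe_div(2 * precision * recall, precision + recall),
        })
    # Macro average: unweighted mean over classes.
    return {name: sum(m[name] for m in per_class) / n for name in per_class[0]}


# Toy two-class confusion matrix (rows: true class, columns: predicted class).
cm = [[8, 2], [1, 9]]
scores = macro_metrics(cm)
```

For this toy matrix, class 0 has TP = 8, FN = 2, FP = 1 and TN = 9, so its precision is 8/9 and its recall is 0.8.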
4. Conclusions and Future Work
To address the issue of unreliable and untrustworthy results in existing fish feeding intensity assessments, a method for fish feeding intensity assessment based on spatial features and the TabNet-DFWL is proposed in this study, enabling the accurate assessment of fish feeding intensity. This method has significant implications for achieving precise feeding in the aquaculture industry, promoting healthy fish growth, reducing farming costs, and improving productivity. A series of spatial features highly correlated with feeding behavior is proposed, which can enhance the effectiveness of feature representation and improve the reliability of feeding intensity assessment. The DFWL is proposed to automatically strengthen the weights of key features within the TabNet model, addressing the degradation in assessment performance caused by the implicit expression of fish satiety and improving the model performance when processing fish feeding data. A fish feeding intensity assessment method based on the TabNet-DFWL network is proposed to avoid the black-box risk of conventional deep learning models, enhance model interpretability and credibility, and provide a reliable basis for precision feeding in aquaculture.
The method proposed in this study was tested on a real fish feeding dataset and achieved promising experimental results. The assessment accuracy was 95.96%, with an average precision of 93.44%, average recall of 93.33%, average specificity of 98.15%, and average F1-score of 93.38%. Compared with the algorithms XGBoost, LGBM, RF, MLP, DT, and 1D-CNN, the assessment accuracy was increased by 19.37%, 13.64%, 11.06%, 7.91%, 5.46%, and 30.45%, respectively, demonstrating that the proposed method can achieve accurate fish feeding intensity assessment.
Although the proposed method has achieved promising results, it still has some limitations. For example, fish actually move in three-dimensional space, but current methods mainly extract two-dimensional information from side-view images. This may limit the completeness of the description of fish spatial behavior, especially in terms of vertical movement and depth-related information. Therefore, in future work, we plan to introduce three-dimensional spatial features to obtain more comprehensive spatial information of fish during feeding. By incorporating depth information, the spatial behavior of fish schools can be described more accurately, which is expected to further improve the accuracy and reliability of fish feeding intensity assessment.