1. Introduction
Aquaculture has become one of the fastest-growing food production sectors worldwide and plays a crucial role in global food security, nutritional sustainability, and economic development [1]. International fisheries reports indicate that aquaculture now supplies more than half of the fish consumed worldwide, making it an important component of modern food production systems. Freshwater aquaculture ponds in particular are widely used because of their relatively low operating cost and high production efficiency [2,3]. However, maintaining appropriate water quality remains one of the most significant challenges affecting fish health, growth performance, feed conversion efficiency, and overall farm productivity.
Water quality directly affects the biological, chemical, and ecological processes in aquaculture systems. Key physicochemical parameters, including temperature, pH, dissolved oxygen, electrical conductivity, turbidity, nutrient concentration, nitrate concentration, and ammonia concentration, strongly influence fish metabolism and survival. Changes in these parameters can cause physiological stress, reduced growth rates, increased disease susceptibility, and ultimately mass mortality events [4,5]. Continuous monitoring and accurate evaluation of water quality are therefore crucial for sustainable aquaculture operation and environmental conservation. Traditionally, water quality assessment has relied on manual sampling, laboratory chemical analysis, and expert-derived threshold systems. These methods can be accurate, but they are often time-consuming, labor-intensive, costly, and unsuitable for continuous monitoring. In addition, threshold-based assessment frameworks often fail to capture the highly nonlinear interactions among multiple water quality variables [6,7]. As aquaculture becomes increasingly intensive and environmentally dynamic, traditional monitoring methods offer limited support for rapid, adaptive decision-making.
Recent innovations in sensing technology, wireless communication, and data analytics have accelerated the adoption of smart aquaculture frameworks. Internet of Things (IoT) devices, low-cost environmental sensors, and cloud-based monitoring platforms enable the acquisition of large volumes of water quality data in real time [8]. At the same time, machine learning methods have become powerful tools for analyzing complex environmental datasets and uncovering hidden relationships among water quality variables. Supervised learning algorithms, including logistic regression (LR), decision trees (DTs), random forests (RFs), support vector machines (SVMs), artificial neural networks (ANNs), and gradient boosting models such as XGBoost, have shown promising performance in water quality prediction and classification tasks. However, applying conventional machine learning techniques in real-world aquaculture raises several practical challenges. Most supervised learning algorithms require substantial amounts of labeled training data to generalize reliably. Large, high-quality, consistently labeled water quality datasets are difficult to obtain because of sensor deployment costs, data collection failures, seasonal variability, environmental heterogeneity, missing observations, and labor-intensive labeling procedures. In many real aquaculture situations, especially new operations, remote locations, and developing regions, only a few representative samples may be available for model development. In such scenarios, conventional machine learning models may suffer from overfitting, performance degradation, and poor adaptability [9,10].
This study does not address the size of the benchmark dataset itself; rather, it addresses the deployment scenario in which only a few labeled samples are available. Although the full dataset contains approximately 4300 labeled instances, the experimental design deliberately simulates a realistic low-data adaptation setting through episodic K-shot evaluation. This design enables a systematic analysis of classification performance when only a small set of labeled examples per class is available, which is more representative of practical field deployment than standard supervised settings with many examples per class.

In artificial intelligence, few-shot learning (FSL) has emerged as an effective approach to data scarcity. Whereas conventional supervised learning infers class boundaries from large amounts of data, FSL aims to learn transferable representations that can be adapted quickly to a new task from only a few labeled examples. Many FSL approaches, including prototypical networks, Matching Networks, and Siamese Networks, build on the framework of meta-learning, sometimes described as “learning to learn.” During training, such models are exposed to many small learning tasks (episodes) so that they acquire knowledge that transfers to tasks they have not seen before. Among FSL approaches, this paper focuses on prototypical networks because of their conceptual simplicity, computational efficiency, and strong empirical performance. Prototypical networks are a form of metric-based meta-learning. Instead of learning decision boundaries directly, they learn an embedding function that maps samples into a latent feature space in which samples of the same class cluster together and samples of different classes are well separated.
The prototype vector of each class is the mean embedding of that class's support samples, and a query sample is classified according to its distance to each class prototype. This mechanism yields an interpretable decision process and tends to be robust when data are limited. Siamese Networks, by contrast, learn a pairwise similarity function through contrastive learning, which allows flexible similarity modeling but is often sensitive in extremely low-shot scenarios. Matching Networks employ an attention mechanism over the support set, providing adaptive nearest-neighbor classification at the cost of higher computational complexity and greater sensitivity to variations in the support set. By comparing these three representative architectures under identical episodic K-shot settings, this study provides a systematic and fair evaluation of metric-based FSL strategies for aquaculture water quality classification [11,12,13].
Prototypical networks and other metric-based meta-learning methods have been applied successfully in computer vision, medical diagnosis, remote sensing, fault detection, and biological classification. Nevertheless, research on their application to aquaculture water quality classification remains limited. Few-shot adaptation strategies for aquaculture monitoring systems have not been extensively explored, and most existing studies still focus on traditional machine learning methods evaluated under a conventional train–test paradigm. To address this research gap, this study proposes a few-shot water quality classification framework for aquaculture environments based on prototypical networks. The framework is evaluated under multiple K-shot settings that simulate realistic limited-label scenarios. It is also systematically compared with common machine learning classifiers, namely LR, RF, SVM, and XGBoost, under the same support-set constraints to ensure a fair comparison.
The main contributions of this study are summarized as follows:
A metric-based FSL framework based on prototypical networks is developed for aquaculture water quality classification under limited-label conditions.
The proposed framework is evaluated across multiple few-shot scenarios (one-shot, five-shot, 10-shot, and 15-shot) to investigate adaptation capability, robustness, and scalability.
A rigorous comparative benchmark is conducted against conventional machine learning models under identical support-set constraints, providing a fair assessment of low-data classification performance.
Comprehensive performance evaluation is performed using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC), together with confusion matrix analysis and feature-space visualization.
The findings demonstrate the potential of metric-based FSL to support intelligent aquaculture monitoring systems capable of operating effectively when labeled data availability is limited, thereby contributing toward more adaptive, scalable, and data-efficient environmental monitoring solutions.
2. Literature Review
Water quality management is a key determinant of success in aquaculture, especially in fish cultivation ponds, where environmental conditions directly affect the health, growth, and survival of the fish. Previous research has shown that parameters such as dissolved oxygen, temperature, pH, turbidity, ammonia, and nutrient concentration play a decisive role in sustaining the pond ecosystem [14]. Fluctuations in major water quality parameters have far-reaching effects on fish physiology and behavior, influencing feeding, social interactions, stress responses, and overall welfare, all of which are essential to maximizing aquaculture production [15]. Conventional water quality monitoring relies on laboratory tests, expert judgment, or predetermined threshold indices. Although these approaches are widely used, they tend to be manual, time-consuming, and unable to support real-time decision-making [16]. To overcome these shortcomings, researchers have increasingly adopted data-driven methods for water quality evaluation. Statistical and regression-based approaches were first used to analyze correlations among physicochemical parameters. However, these models generally assume linear relationships and therefore fail to capture the complex nonlinear interactions present in aquatic ecosystems, which limits their predictive power in dynamically changing aquaculture systems.
Machine learning (ML) is now a well-established approach to water quality prediction and classification in aquaculture systems. Supervised models such as LR, SVM, RF, and k-nearest neighbors (KNN) have outperformed traditional statistical models because they can capture nonlinear relationships among numerous water quality characteristics [17]. RF models are particularly popular because they are robust to noise and handle high-dimensional data well, whereas SVMs generalize well on moderately sized datasets. Recent studies have combined Internet of Things (IoT) technologies with ML to enable real-time monitoring and intelligent water quality management [18]. Similarly, IoT-based ML frameworks have been shown to support temporal analysis of water quality in tilapia ponds, where data-driven and sustainable frameworks can be used to manage aquaculture farming activities in rural settings [19]. Beyond ML, emerging intelligent systems that combine artificial intelligence (AI), blockchain, and fuzzy logic have made aquaculture systems more efficient and automated. IoT-AI-blockchain systems have demonstrated the ability to provide supply-chain visibility and traceability in the aquaculture sector, although cost and technical barriers remain significant challenges [20]. IoT-based fuzzy logic systems have also been shown to be highly precise and capable of autonomously maintaining optimal water quality conditions, enabling fuzzy control under the high unpredictability of aquaculture environments [21]. Despite these advances, most ML-based water quality monitoring studies rely on large, well-labeled datasets [22]. In practical aquaculture deployments, however, sensor data are often limited, incomplete, and seasonal, and expert labeling is scarce.
Recent studies have also explored deep learning models for water quality assessment, such as artificial neural networks and hybrid architectures. Deep learning techniques can automatically learn complex feature representations and have produced encouraging results on large-scale environmental data. Hybrid frameworks combining multi-scale decomposition, graph-based attention, and GRU-MoE models have shown excellent performance in modeling spatiotemporal water quality dynamics, thereby strengthening the resilience of aquaculture systems and supporting data-driven management [23,24]. Such learning paradigms, which can represent high-dimensional associations and complement conventional labeling techniques, remain underexplored for overcoming data scarcity and complexity in inland water quality forecasting [25]. However, these models are typically resource-intensive in terms of both labeling effort and training computation. In data-constrained settings such as aquaculture, datasets are small, which makes deep learning methods impractical and prone to poor generalization. Moreover, deep learning models generally cannot adapt to new or unseen settings without extensive retraining. These shortcomings highlight the need for alternative learning paradigms that maintain high performance under data-scarce conditions.
FSL is a promising approach that enables models to learn from a small number of examples per class and has proven efficient in addressing the scarcity of labeled data. FSL approaches focus on acquiring transferable representations that are especially useful in data-limited situations. ProtoNet represents each class as a prototype in a learned embedding space and classifies samples using distance measures. ProtoNet has shown good performance in low-data regimes, is less prone to overfitting, and is highly interpretable, which makes it particularly suitable for tabular sensor data such as water quality measurements with moderate feature dimensionality and relatively separable class boundaries [11,26,27]. FSL has already been applied effectively across domains such as image recognition, speech processing, fault diagnosis, and biomedical applications. For example, few-shot embedding-distribution models based on metric learning have outperformed conventional prototypical networks in identifying fish species from scarce and visually similar samples in complex marine settings [13,27,28,29]. Likewise, few-shot training methods have substantially improved the accuracy of rare object detection and the robustness of perception in autonomous driving systems when less data are available. FSL pipelines that combine pre-training, meta-learning, and fine-tuning with feature attention have been shown to detect plant diseases accurately and in real time with little training data [30,31]. Moreover, prototype matching-based meta-learning has performed well in zero-shot and few-shot fault diagnosis, and few-shot meta-learning combined with brain activity mapping has facilitated central nervous system drug discovery from limited datasets [32,33].
Table 1 provides a summary of representative FSL approaches across various application domains.
Most current water quality research still relies on traditional supervised learning paradigms, which typically require large, well-labeled datasets. These approaches face challenges in realistic aquaculture settings, where labels are sparse, sensor measurements are noisy, and conditions vary seasonally. As a result, a research gap remains in data-efficient learning paradigms for aquaculture water quality evaluation. To address this gap, the present study proposes a systematic FSL framework for classifying aquaculture water quality in limited-label settings.
3. Materials and Methods
This research adopted a comparative experimental methodology to evaluate the effectiveness of FSL for water quality classification in fish cultivation ponds under limited labeled data conditions. The proposed framework uses a ProtoNet as the primary model and compares its performance with traditional machine learning classifiers trained on the same constrained support sets. The core motivation is that collecting large volumes of labeled water quality data in real aquaculture environments is expensive and time-consuming, so models must generalize effectively from very few samples per class. The overall workflow shown in Figure 1 consists of data acquisition and preprocessing, few-shot episodic task construction, feature embedding using a neural encoder, prototype-based classification, K-shot ablation analysis, and performance comparison with classical ML baselines.
3.1. Dataset Description
The water quality dataset employed in this study was obtained from the Mendeley Data Repository (Dataset DOI: https://doi.org/10.17632/y78ty2g293.1) [34].
Table 2 presents a descriptive statistical summary of the physicochemical and biological parameters in the dataset, which characterize water quality conditions in aquaculture environments. The variables vary substantially, spanning both high and low nutrient concentrations, pH values, and temperatures. These variations are intended to reflect actual aquaculture conditions, including seasonal changes, environmental disruptions, and sensor differences. In addition, the features exhibit nonlinear interrelationships, non-uniform distributions, and overlapping feature distributions. This inherent complexity increases the difficulty of the classification task and makes the dataset appropriate for testing both few-shot and conventional machine learning methods under realistic data-limited conditions.
The main physicochemical and biological water quality parameters used in this study are listed in Table 3, along with their units. These parameters are central to the stability and health of an aquaculture system. Temperature, dissolved oxygen, and pH directly affect metabolic processes, respiratory efficiency, and physiological stress responses in aquatic organisms. Nutrient-related parameters such as ammonia, nitrite, and phosphorus are closely associated with organic pollution and eutrophication risk; high concentrations can cause toxicity and reduce dissolved oxygen availability. Alkalinity, hardness, and calcium buffer the water and maintain its ionic balance, which helps keep water chemistry stable. Furthermore, high concentrations of hydrogen sulfide and carbon dioxide are toxic and can be lethal to fish because they impair oxygen transport and respiratory function. Plankton abundance reflects biological productivity and serves as an indicator of the natural food available in the water, although overgrowth can cause oxygen depletion at night. Together, these parameters give a comprehensive picture of water quality dynamics in aquaculture systems.
3.2. Exploratory Data Analysis
Figure 2 shows histogram-based distribution plots for each water quality indicator. Most variables, such as ammonia, nitrite, phosphorus, plankton, and hydrogen sulfide, displayed non-Gaussian, positively skewed distributions, indicating infrequent but important pollution events. Dissolved oxygen and pH had relatively consistent distributions within acceptable biological limits, whereas turbidity and alkalinity showed high variability. These non-uniform, imbalanced distributions highlight the need for data normalization and for learning approaches that are robust to non-Gaussian, heavy-tailed data.
Figure 3 displays box plots showing the central tendency, spread, and outliers of each water quality parameter. Several parameters, such as ammonia, nitrite, phosphorus, hardness, and alkalinity, showed many outliers, reflecting genuine water quality problems rather than measurement errors. The large number of outliers highlights the variability and unpredictability of aquatic environments. This supports retaining extreme values in the training set, as they improve the model's ability to recognize real pollution events and rare but important water quality conditions.
Figure 4 shows the Pearson correlation heatmap of linear relationships among the water quality parameters and the water quality class. The heatmap reveals that biochemical oxygen demand (BOD), nitrite, alkalinity, and dissolved oxygen (DO) had moderate to high correlations with water quality, suggesting that these parameters play a crucial role in determining it. By contrast, turbidity was negatively correlated with both DO and water quality, implying that suspended sediment lowers oxygen levels and impairs water quality. Temperature and pH had low correlations and therefore little individual linear explanatory power. The mix of positive and negative correlations among parameters confirms the multivariate nature of the problem and supports the use of multivariate learning models rather than single-parameter thresholding.
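As an illustration (not the authors' plotting code), the label-correlation row of such a heatmap can be computed with NumPy by appending the class code as an extra column. The toy data below are hypothetical: one feature is constructed to rise with the class code, one to fall, and one to be unrelated.

```python
import numpy as np

def feature_label_correlation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pearson correlation of each feature column with the integer class code."""
    # Stack the label as the last column and compute the full correlation matrix.
    data = np.column_stack([X, y.astype(float)])
    corr = np.corrcoef(data, rowvar=False)
    # The last row holds corr(label, feature_j); drop the label's self-correlation.
    return corr[-1, :-1]

# Hypothetical toy data (not the Mendeley dataset).
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=500)
X = np.column_stack([
    y + rng.normal(scale=0.5, size=500),   # increases with the class code
    -y + rng.normal(scale=0.5, size=500),  # decreases with the class code
    rng.normal(size=500),                  # unrelated to the class code
])
r = feature_label_correlation(X, y)
print(np.round(r, 2))
```

Because the classes are encoded as 0 = Excellent, 1 = Good, and 2 = Poor, the sign of each coefficient depends on that encoding.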
3.3. Data Preprocessing
3.3.1. Dataset Cleaning and Validation
The dataset contains N = 4300 independent telemetry observations collected from fish cultivation ponds. Each observation consists of D = 14 continuous physicochemical and biological parameters describing localized pond conditions: temperature, turbidity, dissolved oxygen, biochemical oxygen demand, carbon dioxide concentration, pH, alkalinity, hardness, calcium, ammonia, nitrite, phosphorus, hydrogen sulfide, and plankton density. The target variable represents water quality status and takes one of three mutually exclusive class states, $y \in \{0, 1, 2\}$, corresponding to Excellent, Good, and Poor conditions, respectively.
The complete dataset is denoted as $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{D}$ represents the feature vector and $y_i \in \{0, 1, 2\}$ denotes the corresponding class label. The class labels are provided in the original dataset released on the Mendeley repository and were constructed using a predefined Water Quality Index (WQI)-based thresholding scheme that maps continuous physicochemical measurements into discrete water quality categories. No relabeling or modification of the original annotations was performed in this study. The dataset is nearly balanced, with 1400 samples in Class 0, 1400 samples in Class 1, and 1500 samples in Class 2.
3.3.2. Train–Test Partitioning
To ensure rigorous validation, a stratified holdout protocol was adopted. The complete dataset was partitioned by stratified sampling into an 80% meta-training split ($\mathcal{D}_{\text{train}}$) and an independent 20% meta-testing holdout ($\mathcal{D}_{\text{test}}$). The partition sizes are

$$|\mathcal{D}_{\text{train}}| = 3440, \qquad |\mathcal{D}_{\text{test}}| = 860,$$

with

$$\mathcal{D}_{\text{train}} \cap \mathcal{D}_{\text{test}} = \emptyset,$$

ensuring complete separation between the optimization and evaluation phases.
Because stratification preserves the global class proportions, the meta-training split contains 1120 samples of Class 0 (32.56%), 1120 samples of Class 1 (32.56%), and 1200 samples of Class 2 (34.88%). The meta-testing holdout mirrors this balance, with 280 samples of Class 0 (32.56%), 280 samples of Class 1 (32.56%), and 300 samples of Class 2 (34.88%). The training partition was used exclusively for episodic meta-training of the deep FSL models, whereas the testing partition remained unseen until final inference evaluation.
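A minimal scikit-learn sketch of the stratified 80/20 split follows. The feature matrix is random placeholder data (hypothetical); only the class counts mirror the dataset description, so the resulting partition sizes can be checked against the figures above. The `random_state` value is an assumption, since the seed used for partitioning is not stated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 4300 x 14 feature matrix; only the
# class counts (1400 / 1400 / 1500) follow the dataset description.
rng = np.random.default_rng(42)
y = np.repeat([0, 1, 2], [1400, 1400, 1500])
X = rng.normal(size=(y.size, 14))

# 80/20 holdout, stratified on the class label so both partitions keep
# the global class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```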
3.3.3. Feature Scaling and Normalization
Effective data preprocessing is critical to the reliability, stability, and generalizability of machine learning and FSL models. In this study, a structured preprocessing pipeline was applied before model training and testing to prevent data leakage and ensure a fair model comparison. The dataset was checked for missing entries, duplicates, inconsistencies, and invalid measurements; no missing observations or duplicate samples were found. All variables were checked for numerical consistency and suitability for statistical analysis and model input. Because the water quality parameters span very different numerical ranges and physical units, all features were standardized using z-score normalization to stabilize optimization [35]:

$$\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{\sqrt{\sigma_j^{2} + \epsilon}},$$

where the feature mean ($\mu_j$) and standard deviation ($\sigma_j$) of feature $j$ are defined as

$$\mu_j = \frac{1}{N_{\text{train}}} \sum_{i=1}^{N_{\text{train}}} x_{ij}, \qquad \sigma_j = \sqrt{\frac{1}{N_{\text{train}}} \sum_{i=1}^{N_{\text{train}}} \left(x_{ij} - \mu_j\right)^{2}},$$

with the sums running only over the $N_{\text{train}} = 3440$ samples of the training partition. A small stabilization constant $\epsilon$ is included under the square root in the denominator to avoid division by zero. The fitted normalization parameters were then applied unchanged to the testing partition, preventing any information from the test data from leaking into training.
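The fit-on-train, apply-to-test pattern can be sketched in NumPy as follows. The value of `eps` (1e-8) and the three synthetic columns are hypothetical. scikit-learn's `StandardScaler` performs a similar transform but handles zero-variance features by setting their scale to 1 rather than adding an epsilon.

```python
import numpy as np

def fit_zscore(X_train: np.ndarray, eps: float = 1e-8):
    """Estimate per-feature mean and stabilized scale from the training split only."""
    mu = X_train.mean(axis=0)
    var = X_train.var(axis=0)      # population variance (1/N), as in the formula
    scale = np.sqrt(var + eps)     # epsilon sits under the square root
    return mu, scale

def apply_zscore(X: np.ndarray, mu: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Apply frozen training statistics to any split (no refitting)."""
    return (X - mu) / scale

# Hypothetical synthetic columns with very different ranges.
rng = np.random.default_rng(0)
loc, spread = [25.0, 7.5, 150.0], [3.0, 0.5, 40.0]
X_train = rng.normal(loc=loc, scale=spread, size=(3440, 3))
X_test = rng.normal(loc=loc, scale=spread, size=(860, 3))

mu, scale = fit_zscore(X_train)
Z_train = apply_zscore(X_train, mu, scale)
Z_test = apply_zscore(X_test, mu, scale)   # test data never influence mu or scale
```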
3.3.4. Episodic Few-Shot Evaluation Framework
To evaluate generalization to unseen samples under highly restricted labeled-data conditions, an episodic evaluation strategy was executed over four shot horizons:

$$K \in \{1, 5, 10, 15\}.$$

For each evaluation task, a support set ($\mathcal{S}$) and a query set ($\mathcal{Q}$) are sampled from the held-out testing partition $\mathcal{D}_{\text{test}}$. For each target class $c \in \{0, 1, 2\}$, the class support subset is defined as

$$\mathcal{S}_c \subset \{(\mathbf{x}_i, y_i) \in \mathcal{D}_{\text{test}} : y_i = c\}, \qquad |\mathcal{S}_c| = K,$$

so that it contains exactly K support samples. The episodic support set is the union

$$\mathcal{S} = \bigcup_{c=0}^{C-1} \mathcal{S}_c, \qquad |\mathcal{S}| = C \times K,$$

where $C = 3$ is the number of target classes.

To obtain stable per-episode performance estimates, an enlarged query set containing exactly 80 query samples per class is drawn:

$$\mathcal{Q} = \bigcup_{c=0}^{C-1} \mathcal{Q}_c, \qquad |\mathcal{Q}_c| = 80,$$

giving $|\mathcal{Q}| = C \times 80 = 240$ query samples per episode. Support and query indices are sampled without replacement, which guarantees mutual exclusion:

$$\mathcal{S} \cap \mathcal{Q} = \emptyset,$$

and eliminates any sample overlap or leakage within an individual task. To reduce stochastic variability, episodic task generation was repeated with five independent random seeds {42, 52, 62, 72, 82}. For each shot horizon, 100 evaluation episodes were generated per seed, yielding 500 evaluation tasks. Final performance metrics are reported as the mean and standard deviation across all generated episodes.
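The episode construction can be sketched as below. `sample_episode` is a hypothetical NumPy helper, not the authors' sampler: it draws K + 80 indices per class in a single draw without replacement and then splits them, so support and query are disjoint by construction. How the authors map each seed to its 100 episodes is not stated; here one generator is created per seed.

```python
import numpy as np

def sample_episode(y: np.ndarray, k_shot: int, n_query: int, rng: np.random.Generator):
    """Return disjoint support/query index arrays with k_shot and n_query samples per class."""
    support_idx, query_idx = [], []
    for c in np.unique(y):
        pool = np.flatnonzero(y == c)
        # A single draw without replacement, then a split: no index can appear twice.
        chosen = rng.choice(pool, size=k_shot + n_query, replace=False)
        support_idx.append(chosen[:k_shot])
        query_idx.append(chosen[k_shot:])
    return np.concatenate(support_idx), np.concatenate(query_idx)

# Hypothetical test-partition labels with the reported class counts (280 / 280 / 300).
y_test = np.repeat([0, 1, 2], [280, 280, 300])

episodes = []
for seed in (42, 52, 62, 72, 82):
    rng = np.random.default_rng(seed)
    for _ in range(100):  # 100 episodes per seed, 500 in total
        episodes.append(sample_episode(y_test, k_shot=5, n_query=80, rng=rng))
```

Per-episode metrics computed on each query set would then be averaged, with their standard deviation, over the 500 episodes.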
3.4. Deep Few-Shot Learning Models
Three metric-based meta-learning paradigms were investigated: prototypical networks, Siamese Networks, and Matching Networks. Unlike the traditional machine learning classifiers, which are fitted from scratch in each evaluation episode, these models underwent an offline episodic meta-training phase on the 3440 samples of the training partition. During this optimization stage, many simulated few-shot tasks were generated on the fly from the training partition to update the model parameters, teaching the networks to capture generalizable feature relationships rather than memorize fixed class boundaries.
3.4.1. Shared Embedding Network
To ensure a fair structural comparison, all three deep meta-learning architectures share the same encoder backbone, which maps the D = 14 continuous tabular variables into a latent embedding space. The embedding function is defined as $f_{\phi}: \mathbb{R}^{D} \rightarrow \mathbb{R}^{M}$, where $\phi$ represents the learnable weights of the linear layers and $M$ is the embedding dimension. Every hidden layer incorporates batch normalization to stabilize training, a rectified linear unit (ReLU) activation, and a relatively high dropout rate to reduce overfitting. Optimization used the Adam optimizer with a fixed learning rate over 1000 episodic training iterations.
3.4.2. Prototypical Networks
This study addresses the limited availability of labeled data in aquaculture water quality monitoring by proposing an FSL framework based on ProtoNet. ProtoNet was chosen for its effective metric-based classification, its ability to learn from little data, its simple architecture, and its strong generalization. ProtoNet is a metric-based few-shot learning method that learns a discriminative embedding space in which the samples of each class cluster around a representative class prototype; classification is then performed according to the distance between query samples and the class prototypes in this space [13]. The prototype of class $c$ is the mean embedding of its support samples:

$$\mathbf{p}_c = \frac{1}{|\mathcal{S}_c|} \sum_{(\mathbf{x}_i, y_i) \in \mathcal{S}_c} f_{\phi}(\mathbf{x}_i),$$

where $f_{\phi}$ is the shared embedding network and $\mathcal{S}_c$ denotes the subset of support samples with label $c$. These prototypes serve as class representatives in the embedding space and provide an intuitive geometric interpretation of class structure. Increasing the value of K averages more support embeddings into each prototype, which yields more stable prototype estimates and thereby improves classification robustness.
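For a query sample $\mathbf{x}_q$, its embedding $f_{\phi}(\mathbf{x}_q)$ under the shared encoder is compared against all class prototypes $\mathbf{p}_c$ using a distance metric. In this study, Euclidean distance is employed because of its simplicity and effectiveness [28]:

$$d\left(f_{\phi}(\mathbf{x}_q), \mathbf{p}_c\right) = \left\lVert f_{\phi}(\mathbf{x}_q) - \mathbf{p}_c \right\rVert_2.$$

The probability of assigning $\mathbf{x}_q$ to class $c$ is computed using a softmax over negative distances:

$$p_{\phi}(y = c \mid \mathbf{x}_q) = \frac{\exp\left(-d\left(f_{\phi}(\mathbf{x}_q), \mathbf{p}_c\right)\right)}{\sum_{c'=0}^{C-1} \exp\left(-d\left(f_{\phi}(\mathbf{x}_q), \mathbf{p}_{c'}\right)\right)}.$$

This formulation ensures that query samples closer to a prototype in the embedding space receive higher confidence scores for that class.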
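The prototype and softmax steps can be illustrated with a NumPy sketch of inference only. The 2-D support and query embeddings are hypothetical stand-ins for $f_{\phi}$ outputs; encoder training is omitted, and this is not the authors' PyTorch implementation.

```python
import numpy as np

def prototypes(emb_support: np.ndarray, y_support: np.ndarray, n_classes: int) -> np.ndarray:
    """Class prototypes p_c: the mean embedding of each class's support samples."""
    return np.stack([emb_support[y_support == c].mean(axis=0) for c in range(n_classes)])

def proto_probs(emb_query: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Softmax over negative Euclidean distances to each prototype."""
    # Pairwise distances, shape (n_query, n_classes).
    d = np.linalg.norm(emb_query[:, None, :] - protos[None, :, :], axis=-1)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability; softmax is unchanged
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical 2-D embeddings standing in for f_phi outputs (K = 2 shots per class).
emb_support = np.array([[0.0, 0.0], [0.2, 0.0],   # class 0
                        [3.0, 3.0], [3.2, 3.0],   # class 1
                        [0.0, 3.0], [0.0, 3.2]])  # class 2
y_support = np.array([0, 0, 1, 1, 2, 2])
emb_query = np.array([[0.1, 0.1], [3.1, 2.9], [0.1, 3.1]])

P = proto_probs(emb_query, prototypes(emb_support, y_support, n_classes=3))
pred = P.argmax(axis=1)
```

The original prototypical networks formulation uses squared Euclidean distance; this sketch follows the plain Euclidean distance stated above, and switching to the squared form only requires squaring `d`.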
3.4.3. Siamese Networks
Siamese Networks learn an embedding space in which pairwise distances reflect class similarity. For any sample pair $(\mathbf{x}_i, \mathbf{x}_j)$, the feature distance is computed as the $L_2$ norm of the difference between their embeddings under the shared encoder $f_{\phi}$ [11]:

$$d_{ij} = \left\lVert f_{\phi}(\mathbf{x}_i) - f_{\phi}(\mathbf{x}_j) \right\rVert_2.$$

The network optimizes its embedding by minimizing the contrastive loss over batches of $B$ pairs:

$$\mathcal{L}_{\text{con}} = \frac{1}{B} \sum_{(i,j)} \left[ y_{ij}\, d_{ij}^{2} + \left(1 - y_{ij}\right) \max\left(0,\; m - d_{ij}\right)^{2} \right],$$

where $y_{ij} = 1$ if the pair belongs to the same class, $y_{ij} = 0$ if the pair belongs to different classes, and $m$ denotes the margin, set to 1.0.
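A direct NumPy transcription of the contrastive loss, with $y_{ij} = 1$ for same-class pairs and margin $m = 1.0$, is sketched below. The averaging over a batch of pairs and the toy embeddings are assumptions for illustration; in training, the embeddings would come from the shared encoder.

```python
import numpy as np

def contrastive_loss(emb_a: np.ndarray, emb_b: np.ndarray, same: np.ndarray,
                     margin: float = 1.0) -> float:
    """Mean contrastive loss over a batch of embedding pairs."""
    d = np.linalg.norm(emb_a - emb_b, axis=1)               # d_ij for each pair
    pos = same * d**2                                       # pull same-class pairs together
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2     # push different-class pairs past m
    return float(np.mean(pos + neg))

# Hypothetical batch of three pairs:
# same class at distance 0, different class at distance 2, different class at distance 0.
emb_a = np.zeros((3, 2))
emb_b = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]])
same = np.array([1, 0, 0])
loss = contrastive_loss(emb_a, emb_b, same)
```

Only the third pair contributes (different classes but zero distance, giving $m^2 = 1$), so the batch mean is 1/3; different-class pairs already farther apart than the margin incur no penalty.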
3.4.4. Matching Networks
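Matching Networks classify query samples using an attention-weighted nearest-neighbor strategy over the support set $\mathcal{S}$. Support and query embeddings are first $L_2$-normalized so that they lie on the unit hypersphere [12]:

$$\hat{\mathbf{z}} = \frac{f_{\phi}(\mathbf{x})}{\left\lVert f_{\phi}(\mathbf{x}) \right\rVert_2}.$$

Attention weights for each query are then computed as a softmax over its cosine similarities to all support samples:

$$a(\mathbf{x}_q, \mathbf{x}_i) = \frac{\exp\left(\hat{\mathbf{z}}_q^{\top} \hat{\mathbf{z}}_i\right)}{\sum_{(\mathbf{x}_j, y_j) \in \mathcal{S}} \exp\left(\hat{\mathbf{z}}_q^{\top} \hat{\mathbf{z}}_j\right)},$$

where, after normalization, the dot product $\hat{\mathbf{z}}_q^{\top} \hat{\mathbf{z}}_i$ equals the cosine similarity. Final query class probabilities are obtained through a vectorized aggregation step that distributes these attention weights onto the labels of the support samples.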
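The normalization, attention, and aggregation steps can be sketched as follows. The aggregation is written as the attention matrix multiplied by one-hot support labels, which is one common way to realize the step described above; the toy embeddings are hypothetical.

```python
import numpy as np

def l2_normalize(z: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Project each row onto the unit hypersphere."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def matching_probs(emb_query, emb_support, y_support, n_classes):
    """Class probabilities from softmax attention over cosine similarities."""
    zq, zs = l2_normalize(emb_query), l2_normalize(emb_support)
    sim = zq @ zs.T                           # cosine similarities, (n_query, n_support)
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    att = np.exp(sim)
    att /= att.sum(axis=1, keepdims=True)     # attention weights a(x_q, x_i)
    onehot = np.eye(n_classes)[y_support]     # (n_support, n_classes)
    return att @ onehot                       # sum attention per support label

# Hypothetical support embeddings, one per class, and one query.
emb_support = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y_support = np.array([0, 1, 2])
P = matching_probs(np.array([[2.0, 0.1]]), emb_support, y_support, n_classes=3)
```

Because cosine similarities lie in [-1, 1], this unscaled form produces relatively flat attention weights; some Matching Network implementations multiply the similarities by a temperature, which this sketch omits.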
3.5. Traditional Machine Learning Baselines
Four conventional machine learning classifiers, LR, RF, SVM, and XGBoost, were implemented as baselines for comparison with the deep episodic meta-learning frameworks. Unlike the meta-learning architectures, none of these classifiers learned a representation or optimized parameters offline on the full training partition. Each therefore started every evaluation episode with no prior knowledge and used only the few labeled support samples of the current task.
Hyperparameters were fixed across all evaluations for experimental reproducibility [35,36].
LR was implemented with regularization (a fixed regularization strength) and a maximum of 1000 optimization iterations.
RF used 300 decision trees with Gini impurity as the splitting criterion.
SVM used a linear kernel with probability estimates enabled (probability = True).
XGBoost was implemented with XGBClassifier, using the multiclass logarithmic loss as the evaluation metric (eval_metric = 'mlogloss').
Each conventional classifier was used as a task-specific estimator. For each independently run test episode $t$, a new classifier instance was created and trained only on that episode's support set:
$$h_t = \mathrm{fit}\left(X_S^{(t)}, Y_S^{(t)}\right)$$
where $X_S^{(t)}$ and $Y_S^{(t)}$ denote the support features and corresponding class labels.
The total amount of labeled data available to each classifier was therefore restricted to
$$|S| = K \times C$$
where $K$ is the shot horizon and $C = 3$ is the number of water quality classes. Thus, in the 1-shot, 5-shot, 10-shot, and 15-shot settings, only 3, 15, 30, and 45 labeled observations, respectively, were available for training.
After fitting, the trained classifier generated predictions for the unseen query samples of the same evaluation episode. This evaluation protocol is designed to highlight the differences between two contrasting learning paradigms. Conventional machine learning models must learn decision boundaries from the few support samples of each task, whereas meta-learning architectures reuse representations learned offline through episodic training on the entire training partition. The protocol therefore benchmarks representation-based adaptation against task-specific fitting under identical few-shot inference conditions.
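The per-episode protocol for the baselines can be sketched as below. The `NearestCentroid` class is a hypothetical stand-in used only so the example runs with the standard library; in the study, `make_model` would instead build one of the scikit-learn or XGBoost estimators (for example `lambda: RandomForestClassifier(n_estimators=300)`). The point illustrated is that a new model is constructed for every episode and sees only the K × C support samples.

```python
from typing import Callable, Dict, List, Sequence


class NearestCentroid:
    """Hypothetical stand-in for a scikit-learn-style estimator (fit/predict)."""

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X: List[Sequence[float]]) -> List[int]:
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda c: sq_dist(x, self.centroids_[c]))
                for x in X]


def evaluate_episode(make_model: Callable[[], object], support_X, support_y,
                     query_X, query_y) -> float:
    """Fit a brand-new model on the support set only, then score the query set."""
    model = make_model()             # fresh instance: nothing carried over between episodes
    model.fit(support_X, support_y)  # only |S| = K * C labelled samples
    preds = model.predict(query_X)
    return sum(p == t for p, t in zip(preds, query_y)) / len(query_y)


# 1-shot toy episode: K = 1, C = 3, so |S| = 3 (illustrative values)
K, C = 1, 3
support_X = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
support_y = [0, 1, 2]
query_X = [[0.5, 0.2], [4.8, 5.1], [9.0, 1.0], [1.0, 1.0]]
query_y = [0, 1, 2, 0]
assert len(support_X) == K * C
print(evaluate_episode(NearestCentroid, support_X, support_y, query_X, query_y))
```

Passing a factory rather than a fitted model makes it impossible for one episode's fit to leak into the next.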
Episodic training uses an N-way K-shot sampling strategy, in which support and query sets are dynamically constructed from the training data for each episode. A fixed number of training episodes serves as the stopping criterion, and performance is monitored across evaluation episodes to ensure stable convergence in all the shot settings. All the experiments were implemented in Python (Version 3.12.13) using the PyTorch deep learning framework (Version 2.11.0) and executed on Google Colab with GPU acceleration. The computational environment included 16 GB of RAM and standard scientific libraries such as NumPy (Version 2.0.2) and Scikit-learn (Version 1.6.1) for preprocessing and evaluation. These details are provided to support the transparency and reproducibility of the proposed FSL framework.
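A minimal sketch of N-way K-shot episode sampling follows, assuming sampling without replacement so that support and query samples never overlap. The per-class query size `n_query` is a hypothetical parameter because the text does not state it; with N = C = 3, every episode contains all three water quality classes. Seeding a `random.Random` instance (here with 42, one of the study's seeds) makes the sampled episodes reproducible.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def sample_episode(X: List[Sequence[float]], y: List[int], n_way: int, k_shot: int,
                   n_query: int, rng: random.Random) -> Tuple[List[Sample], List[Sample]]:
    """Draw one N-way K-shot episode with disjoint support and query sets."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    classes = rng.sample(sorted(by_class), n_way)         # choose N classes
    support: List[Sample] = []
    query: List[Sample] = []
    for c in classes:
        idxs = rng.sample(by_class[c], k_shot + n_query)  # without replacement
        support += [(X[i], c) for i in idxs[:k_shot]]
        query += [(X[i], c) for i in idxs[k_shot:]]
    return support, query


# Toy training pool: 30 one-feature samples over 3 classes (illustrative only)
X = [(float(i),) for i in range(30)]
y = [i % 3 for i in range(30)]
rng = random.Random(42)
support, query = sample_episode(X, y, n_way=3, k_shot=5, n_query=3, rng=rng)
print(len(support), len(query))
```

Drawing support and query indices in a single `sample` call per class is what guarantees they are disjoint.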
3.6. Evaluation Metrics
In this study, the proposed ProtoNet-based FSL method for fish cultivation pond water quality classification is evaluated and compared with traditional ML approaches under constrained data conditions. Precision, recall, F1-score, accuracy, and AUC are used for quantitative evaluation, while embedding plots and confusion matrices support qualitative interpretation of the results. These metrics are defined in terms of the four possible outcomes of a classification: true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). The equations are as follows [35,36]:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The area under the receiver operating characteristic curve (ROC-AUC) was additionally calculated using macro-averaged one-vs-rest multiclass averaging. To account for the natural class distribution of the dataset, precision, recall, and F1-score were computed with support-weighted averaging across classes. All the final results were averaged over 100 evaluation episodes for each of five random seeds (42, 52, 62, 72, and 82), giving a total of 500 evaluation episodes per model.
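The metric definitions and averaging schemes above can be illustrated with a plain-Python sketch. Support-weighted averaging weights each class by its frequency among the true labels, and macro one-vs-rest AUC averages the per-class AUCs without weights; these are intended to behave like scikit-learn's `average="weighted"` option and `roc_auc_score(..., multi_class="ovr", average="macro")`, although that correspondence is an assumption about how the study computed them. Setting undefined precision or recall to 0 is likewise an assumed convention, and the toy labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def weighted_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float, float]:
    """Accuracy plus support-weighted precision, recall, and F1-score."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    w_prec = w_rec = w_f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0   # assumed: undefined ratios -> 0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[c] / n                       # class frequency in y_true
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return accuracy, w_prec, w_rec, w_f1


def binary_auc(scores: Sequence[float], is_pos: Sequence[bool]) -> float:
    """AUC = P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, p in zip(scores, is_pos) if p]
    neg = [s for s, p in zip(scores, is_pos) if not p]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: List[int], probs: List[Sequence[float]], n_classes: int) -> float:
    """Unweighted mean of the one-vs-rest AUC of every class."""
    aucs = [binary_auc([row[c] for row in probs], [t == c for t in y_true])
            for c in range(n_classes)]
    return sum(aucs) / n_classes


# Toy three-class episode (illustrative values, not study results)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
probs = [[0.8, 0.1, 0.1], [0.4, 0.5, 0.1], [0.1, 0.8, 0.1],
         [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.5, 0.2, 0.3]]
print(weighted_metrics(y_true, y_pred), macro_ovr_auc(y_true, probs, 3))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for episode-sized query sets but not for large evaluation sets.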
4. Results
4.1. Performance Comparison with ML Baselines
Table 4 compares the proposed FSL frameworks (prototypical network, Siamese Network, and Matching Network) with traditional machine learning classifiers (LR, RF, SVM, and XGBoost) under four K-shot learning scenarios. Performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC, with accuracy reported as the mean ± standard deviation (SD) over repeated runs. The deep metric-based FSL models consistently outperformed the traditional machine learning methods at every shot horizon. In the most difficult one-shot scenario, with only one labeled sample per class, the Siamese Network achieved the highest accuracy (86.48%), followed by MatchingNet (82.72%) and ProtoNet (82.69%). The traditional machine learning models, by contrast, performed poorly with so few training samples: RF reached only 57.09% accuracy, and LR and SVM were about 51% accurate. XGBoost had the lowest accuracy, 33.33%, which equals chance level for a three-class problem. Classification performance improved for all the models as the number of support samples increased. ProtoNet showed the greatest improvement, reaching 93.31%, 94.16%, and 94.46% accuracy with 5, 10, and 15 shots, respectively. MatchingNet followed a similar trend, reaching 94.00% accuracy with 15 shots. ProtoNet outperformed all the other models in the higher-shot settings, indicating better prototype estimation and class representation learning when more support samples are available, whereas the Siamese Network performed best in the one-shot setting. Among the traditional machine learning classifiers, RF performed best, reaching 81.10% accuracy and 91.62% ROC-AUC in the 15-shot setting. 
LR, SVM, and XGBoost improved steadily as the support set grew but still performed significantly worse than the FSL models. At the 15-shot level, the best traditional classifier (RF, 81.10%) trailed the prototypical network (94.46%) by just over 13 percentage points. The ROC-AUC results further confirm the effectiveness of the FSL models. ProtoNet's ROC-AUC ranged from 93.12% in the one-shot setting to 98.65% in the 15-shot setting, indicating good class separability and strong probabilistic discrimination. MatchingNet and the Siamese Network also achieved consistently high ROC-AUC values, above 96%, in the higher-shot settings. In contrast, the traditional machine learning models showed significantly lower ROC-AUC scores, indicating poor discriminative power in limited-data environments. In addition, the standard deviations of the prototypical network were low (0.48–1.08%) across all the shot horizons, which indicates good training stability and reproducibility. Overall, metric-based FSL architectures substantially outperform standard machine learning methods under extreme few-shot labeling conditions. These findings support prototypical networks as a robust and scalable approach to water quality classification in aquaculture, where collecting large amounts of labeled data is difficult and expensive. Figure 5 illustrates the performance comparison across shot horizons as radar profiles.
4.2. Multi-Shot Ablation Matrix and Parameter Sensitivity Analysis
To assess how the performance of FSL systems depends on latent feature dimensionality, we performed an ablation study of the embedding dimension for the three metric-based models, prototypical networks (ProtoNet), Siamese Networks, and Matching Networks; the results are presented in Table 5 and Figure 6.
Four embedding dimensions (d = 16, 32, 64, and 128) were evaluated under four FSL scenarios (K = 1, 5, 10, and 15 shots) to identify the best embedding dimension for water quality classification. For all the models, classification accuracy increased with embedding dimensionality. In the one-shot setting, ProtoNet's accuracy rose from 73.15% at d = 16 to 82.69% at d = 64, a gain of 9.54 percentage points. At higher shot counts, accuracy rose from 82.42% to 93.31% in the five-shot setting and from 83.60% to 94.46% in the 15-shot setting. These results suggest that larger embedding spaces allow ProtoNet to learn more discriminative prototypes and to better capture complex relationships among the physicochemical water quality parameters. A similar trend was observed for Siamese Networks: in the one-shot setting, accuracy increased gradually from 79.10% at d = 16 to 89.12% at d = 128, and it reached 92.18% in the 15-shot setting. This progressive improvement indicates that greater representational capacity strengthens pairwise similarity learning, allowing the network to better separate inter-class from intra-class relationships in the embedding space. Matching Networks also benefited substantially from larger embeddings: accuracy rose from 72.98% at d = 16 to 82.72% at d = 64 in the one-shot setting and reached 94.30% at d = 128 in the 15-shot scenario. Higher-dimensional embeddings appear to improve attention-based support–query matching, presumably because they retain more discriminative feature information. Notably, the gain from d = 64 to d = 128 was relatively small for all the architectures. 
For ProtoNet in the 15-shot setting, increasing the embedding dimension from 64 to 128 improved accuracy by only 0.31 percentage points (94.46% to 94.77%). Matching Networks and Siamese Networks showed similarly marginal gains. This saturation effect indicates that most of the discriminative information is already captured by a 64-dimensional latent space, so larger embeddings add computational complexity with little practical gain. The ablation study therefore shows that embedding dimensionality is critical to the performance of few-shot water quality classification and that d = 64 offers a good compromise between model complexity, computational time, and classification accuracy. Accordingly, d = 64 was chosen as the embedding size for the final experimental setup, as its accuracy was close to the best observed while requiring less computation than larger embedding sizes.
4.3. Cross-Paradigm Manifold and Projection Verification
To study the classification mechanisms of deep metric-based FSL and traditional few-shot supervised machine learning approaches, principal component analysis (PCA) was performed on a representative five-shot evaluation episode. The two-dimensional projections are displayed in Figure 7. The visualization of the prototypical network embedding space (Figure 7a) shows that episodic meta-learning builds highly discriminative latent representations. The first two principal components explain 82.02% and 14.38% of the variance, respectively, together accounting for 96.40% of the variance in the learned feature space. Samples from the same water quality category form tight clusters, with distinct separation between the classes. Query samples lie close to the corresponding support samples and prototype centroids, reflecting high intra-class cohesion and large inter-class separation. This structured embedding geometry enables accurate classification through simple Euclidean distance calculations between query embeddings and class prototypes. The clear separation among the Excellent, Good, and Poor water quality categories suggests that ProtoNet learns generalized representations that can distinguish water quality conditions even with limited labeled data. In contrast,
Figure 7b shows the projection of the original feature space, in which a standard SVM classifier was trained under the same five-shot setting. Here, class-related information remains spread across many dimensions, with only 25.62% of the total variance explained by the first two principal components. The three water quality classes overlap substantially, unlike the structured manifold learned by ProtoNet. Consequently, the linear decision boundaries fitted to the limited number of support samples do not closely reflect the data distribution. Many query samples lie close to or across class boundaries, increasing classification ambiguity and reducing generalization. The contrast between Figure 7a and Figure 7b reflects a fundamental difference between meta-learning and traditional supervised learning.
Standard machine learning models attempt to learn decision functions directly from a very small number of support points, which makes them highly sensitive to sampling variability and data sparsity. FSL models, on the other hand, rely on a representation learned during episodic meta-training, so unseen samples can be mapped into a discriminative latent space and classified by distance-based reasoning. We attribute the superior performance of ProtoNet, Siamese Networks, and Matching Networks in the experimental evaluation to this representation transfer mechanism. In conclusion, the qualitative evidence from the PCA visualization, combined with the quantitative results shown in Tables X–Y, suggests that the superior performance of FSL frameworks stems not merely from estimating task-specific decision boundaries but from their capability to learn transferable, well-structured feature manifolds.
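The nearest-prototype rule behind Figure 7a can be sketched as follows, assuming the standard prototypical network formulation with squared Euclidean distance and a softmax over negative distances. The embeddings are hypothetical toy vectors standing in for encoder outputs, and the function names are illustrative rather than taken from the study's code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def compute_prototypes(support_emb: List[Sequence[float]],
                       support_y: List[str]) -> Dict[str, List[float]]:
    """Prototype of each class = mean of its support embeddings."""
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for emb, label in zip(support_emb, support_y):
        groups[label].append(emb)
    return {c: [sum(dim) / len(embs) for dim in zip(*embs)] for c, embs in groups.items()}


def classify(query_emb: Sequence[float],
             prototypes: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Softmax over negative squared Euclidean distances to the prototypes (assumed metric)."""
    classes = sorted(prototypes)
    logits = [-sum((q - p) ** 2 for q, p in zip(query_emb, prototypes[c])) for c in classes]
    top = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    z = sum(exps)
    probs = {c: e / z for c, e in zip(classes, exps)}
    return max(probs, key=probs.get), probs


# Toy 2-D embeddings, K = 2 per class (illustrative; real ones come from the encoder)
support_emb = [[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0], [0.0, 4.0], [0.0, 4.2]]
support_y = ["Excellent", "Excellent", "Good", "Good", "Poor", "Poor"]
protos = compute_prototypes(support_emb, support_y)
label, probs = classify([2.9, 2.8], protos)
print(label)
```

With K = 1, each prototype is simply the single support embedding, so the rule reduces to nearest-neighbor classification; averaging over more support samples is what stabilizes the prototypes at higher shot counts.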
To further investigate class-specific predictive behavior, row-normalized confusion matrices were generated for all the models in the five-shot evaluation setting, as illustrated in Figure 8. These confusion matrices provide detailed insight into classification accuracy and misclassification patterns across the three water quality categories (Excellent, Good, and Poor). For the Excellent category, all the deep metric-learning frameworks achieved near-perfect classification: ProtoNet and MatchingNet achieved a true positive rate of 100.00%, and Siamese Networks achieved 99.99%. These results indicate that the learned embedding representations effectively separate high-quality water samples from the remaining classes. In contrast, the conventional machine learning models exhibited greater classification uncertainty. For example, the SVM classifier misclassified 11.01% of the Excellent samples as Poor, suggesting that decision boundaries estimated from a highly restricted support set may not adequately represent the underlying feature distribution. The FSL models also classified the Good category well. MatchingNet achieved the highest true positive rate (99.86%), followed by ProtoNet (97.42%) and Siamese Networks (96.62%). Notably, ProtoNet's errors for this class were localized to the neighboring Poor category, with no misclassifications into the Excellent category. This behavior suggests that the learned embedding space preserves the ordinal relationship between water quality conditions, keeping samples with similar physicochemical characteristics close in the latent space, so classification errors tend to occur between adjacent categories rather than across distant ones. The Poor category was the most challenging for all the models. Compared with the Excellent and Good samples, the Poor samples exhibited greater variability and overlap with other categories, resulting in reduced classification performance. 
Nevertheless, ProtoNet maintained the highest class-level accuracy, with a true positive rate of 80.46%, outperforming MatchingNet (75.07%) and Siamese Networks (72.44%). These findings indicate that prototype-based representation learning is more robust when handling complex and heterogeneous environmental conditions. The conventional machine learning baselines performed substantially worse on the Poor category. LR achieved only 31.79% class accuracy and XGBoost 31.20%, and in both cases a large proportion of the Poor samples were misclassified as Excellent or Good. RF, despite being the strongest conventional model overall, correctly classified only 29.25% of the Poor samples. These results suggest that traditional classifiers trained exclusively on a small number of support samples struggle to capture the complex nonlinear relationships that characterize degraded water quality conditions. Overall, the confusion matrix analysis provides qualitative evidence supporting the quantitative performance metrics reported in Table 4. The deep metric-learning frameworks consistently produced more structured and interpretable error patterns, whereas the conventional machine learning models exhibited more inter-class confusion under limited-data conditions. Among all the evaluated approaches, ProtoNet demonstrated the most balanced performance across the water quality categories, particularly in the challenging Poor class, highlighting its suitability for water quality assessment in FSL scenarios.
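Row normalization, as used for Figure 8, divides each row of the confusion matrix by the number of samples in that true class, so each diagonal entry is that class's true positive rate. A plain-Python sketch with hypothetical toy labels follows; scikit-learn's `confusion_matrix(..., normalize="true")` applies the same normalization but returns fractions rather than percentages.

```python
from typing import List, Sequence


def row_normalized_confusion(y_true: Sequence[str], y_pred: Sequence[str],
                             labels: List[str]) -> List[List[float]]:
    """Confusion matrix in percent, each row divided by its true-class count.

    Rows are true classes and columns are predicted classes, so the diagonal
    holds the per-class true positive rate (recall).
    """
    index = {c: i for i, c in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return [[100.0 * v / sum(row) if sum(row) else 0.0 for v in row] for row in counts]


# Toy predictions (illustrative, not the study's results)
labels = ["Excellent", "Good", "Poor"]
y_true = ["Excellent"] * 4 + ["Good"] * 4 + ["Poor"] * 4
y_pred = (["Excellent"] * 4
          + ["Good"] * 3 + ["Poor"]
          + ["Poor"] * 2 + ["Good", "Excellent"])
for name, row in zip(labels, row_normalized_confusion(y_true, y_pred, labels)):
    print(name, row)
```

Because each row sums to 100%, per-class rates remain comparable even when the classes contain different numbers of query samples.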
4.4. Meta-Training Convergence Behavior
Figure 9 shows the training loss curves together with the episodic meta-validation accuracy curves. These curves illustrate the optimization behavior and convergence of the metric-based FSL models under episodic training. Prototypical networks (ProtoNet) and Matching Networks show stable and efficient convergence. Their training loss decreases rapidly during the first 150 episodic iterations, after which both models settle into a low-loss regime (around 0.5). This rapid convergence likely reflects the effectiveness of metric-based representation learning, in which class prototypes and attention-based similarity mechanisms efficiently organize the embedding space. Meta-validation accuracy likewise improves smoothly, with both models exceeding 90% after approximately 200 training episodes. Both architectures maintain consistent performance thereafter, with only small fluctuations across subsequent updates. These findings suggest that prototype-based centroid learning (ProtoNet) and similarity-based matching (MatchingNet) provide strong inductive biases for rapid and stable convergence in low-shot settings. The Siamese Network, on the other hand, converges more slowly and less stably in the early phases of training. During the first 100–120 episodic iterations, its loss and accuracy fluctuate, with accuracy in some iterations dropping to around 55% before recovering. This behavior may stem from the sensitivity of contrastive loss optimization to limited support sampling: simultaneously minimizing intra-class distances and maximizing inter-class separation can destabilize the initial formation of the embedding space. During this stage, the embedding space is substantially reorganized and the loss fluctuates more. 
After about 180 training episodes, however, the optimization process stabilizes, and the loss gradually approaches a steady-state region. The Siamese Network's validation accuracy then improves steadily during the later training phases, eventually reaching approximately 93.5%, and the variance in performance decreases, indicating a more stable learned representation. This implies that the contrastive learning approach needs more adaptation time than the prototype-based approach but can still produce a stable and discriminative embedding space if trained sufficiently. Overall, the convergence analysis reveals distinct differences in the optimization dynamics of the investigated architectures: ProtoNet and MatchingNet converge faster and more stably in the early training phases, whereas Siamese Networks need more training to reach competitive results. These results highlight a trade-off between optimization stability and flexibility of representation learning in metric-based few-shot classification of water quality.
Another notable observation is that ProtoNet's errors are localized around cluster boundaries, whereas the traditional models misclassify samples throughout the feature space. This indicates that ProtoNet learns a more organized latent representation, which facilitates better generalization and discrimination between classes. The small number of misclassified samples also supports the hypothesis that metric-based learning enforces intra-class consistency while maximizing inter-class distance. These results support ProtoNet's effectiveness when working with limited labeled water quality data. By learning a semantically meaningful embedding space, the proposed few-shot framework reduces areas of uncertainty and enhances interpretability. This may be especially useful in aquaculture and environmental surveillance, where a dependable assessment of borderline water quality conditions is needed to make operational decisions and avoid risks.
4.5. Statistical Significance Analysis (Paired t-Test Evaluation)
Table 6 reports the performance differences between the prototypical network (ProtoNet) and the baseline models, tested using a paired t-test for each of the four shot settings (1-shot, 5-shot, 10-shot, and 15-shot). A paired design was used because all the models were evaluated on the same episodic test sets with the same support–query configurations, yielding comparable, dependent observations. Performance was measured over multiple evaluation episodes for each shot setting, and the per-episode differences between ProtoNet and each baseline model were computed. A paired t-test was then used to determine whether the mean difference differs significantly from zero, under the null hypothesis of no performance difference. The significance level was set at p < 0.05, with p < 0.01 and p < 0.001 indicating progressively stronger statistical significance.
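The paired test can be sketched with the standard library as below. The t statistic follows the usual paired formula (mean difference divided by its standard error, with n − 1 degrees of freedom); the p-value here uses a normal approximation, an assumption that is reasonable only with many paired episodes, such as the 500 per model used in this study. `scipy.stats.ttest_rel` computes the exact p-value from the t distribution. The episode accuracies are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Paired t statistic on per-episode differences a_i - b_i.

    Returns (t, degrees of freedom, approximate two-sided p-value). The p-value
    uses a standard-normal approximation to the t distribution (assumption),
    which is close only when the number of paired episodes is large.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    t = mean(diffs) / se              # raises ZeroDivisionError if all differences are equal
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, n - 1, p_approx


# Toy per-episode accuracies on identical episodes (illustrative, not study data);
# with only five pairs, p_approx is indicative only.
protonet = [0.90, 0.95, 0.85, 0.92, 0.88]
baseline = [0.80, 0.82, 0.79, 0.85, 0.81]
print(paired_t_test(protonet, baseline))
```

Pairing by episode removes episode-to-episode difficulty from the comparison, which is why the test uses differences rather than two independent samples.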
The results show that ProtoNet outperforms all the traditional machine learning baselines (LR, RF, SVM, and XGBoost) by a statistically significant margin in every shot configuration. Most of these differences were highly significant (p < 0.001), indicating that the improvements are unlikely to be due to chance. These findings demonstrate that metric-based few-shot learning is effective at extracting discriminative feature representations when only a few labeled samples are available. Compared with the other few-shot deep learning methods, ProtoNet showed mixed results in extremely limited data scenarios. In the 1-shot setting, the difference between ProtoNet and Matching Network was not statistically significant (p = 0.977), suggesting that both approaches perform comparably when each class representation is derived from a single support sample. This result is expected: a single example cannot accurately represent the class distribution, and with one support sample per class each prototype is simply that sample's embedding, so ProtoNet's nearest-prototype rule and Matching Network's attention over the same support embeddings make closely related decisions. However, as the number of support samples increases, ProtoNet provides statistically significant improvements over both the Siamese Network and Matching Network. This indicates that additional support samples enable more accurate prototype estimation, which leads to better class representation and better separation between water quality categories. In general, the statistical analysis shows that ProtoNet benefits from larger support sets in episodic learning scenarios and offers more stable and reliable predictions when labeled data are scarce. These conclusions are supported by repeated episodic evaluations under five different random seeds, which strengthens the robustness and reliability of the reported results.
4.6. Discussion
The results show that deep metric-based FSL is an effective approach for classifying aquaculture water quality with limited labeled data. The episodic meta-learning models outperformed the traditional machine learning models in all the K-shot settings, suggesting that representation learning is preferable to direct decision boundary estimation in data-scarce settings. In particular, ProtoNet scaled best, improving steadily as more support samples were provided, perhaps due to the stability of prototype-based class representations in the learned embedding space. Siamese Networks, on the other hand, performed best in the one-shot scenario, indicating that pairwise similarity learning is advantageous when only a very small amount of labeled data is available. Traditional machine learning models, which can be competitive when ample labeled data are available, were not robust under severe data limitation, because they rely on enough training samples to estimate a decision boundary reliably. The Poor water quality class was the most difficult to classify for all the models, possibly because of its higher intra-class variability and overlap with adjacent classes. However, the metric-based models, especially ProtoNet, discriminated this class better, suggesting that learned embedding spaces better capture the nonlinear relationships among the physicochemical parameters. The ablation study also underscores the importance of embedding dimensionality, with 64 dimensions providing a good balance between representational capacity and computational cost. Increasing the dimension beyond this point yielded only marginal gains, indicating diminishing returns for tabular aquaculture data. 
These results are also important from an application standpoint, because labeled data in real-world aquaculture systems are often expensive, noisy, and scarce, and FSL models offer a viable way to learn effectively from only a few labeled samples. The present study, however, was based on a single dataset and a static classification setting, which may limit generalization to different environmental conditions. It also does not model temporal changes in water quality. Future work will focus on cross-domain adaptation, temporal sequence modelling, and more sophisticated meta-learning architectures, such as transformer- and graph-based models, to further improve robustness and real-world applicability.
5. Conclusions
This study comprehensively investigated deep metric-based few-shot learning algorithms for aquaculture water quality classification under scarce labeled data. Three episodic meta-learning architectures were systematically compared with four conventional machine learning classifiers: logistic regression, random forest, support vector machine, and XGBoost. The few-shot learning models consistently outperformed the traditional supervised learning models at all shot horizons. ProtoNet achieved the highest overall classification accuracy (94.46%) and ROC-AUC (98.65%) in the 15-shot setting, whereas Siamese Networks performed best in the most restrictive one-shot setting. Latent space visualization, confusion matrix analysis, paired t-tests, and convergence studies further corroborated the superior performance of the few-shot learning frameworks, demonstrating increased class separability, improved recognition of the difficult Poor water quality category, and stable optimization behavior. The ablation analysis also revealed that embedding dimensionality plays a crucial role in model effectiveness, with a 64-dimensional latent representation offering the best balance between accuracy and computational complexity. Moreover, the fair episodic benchmarking framework showed that representation-based meta-learning techniques are significantly more data-efficient than traditional machine learning techniques when labeled data are limited. These results demonstrate the feasibility and effectiveness of few-shot learning for assessing water quality in aquaculture systems, especially where collecting extensive annotated data is not feasible. 
Because it achieves accurate classification with minimal labeled samples, the proposed framework holds promise for intelligent water quality monitoring, early detection of environmental risks, and responsible aquaculture operation. Future work will involve multi-source sensor integration, temporal water quality forecasting, transformer-based meta-learning architectures, domain adaptation across different aquaculture settings, and real-time deployment in Internet of Things (IoT)-enabled smart aquaculture systems. These advancements could significantly improve the adaptability, versatility, and effectiveness of few-shot learning in environmental monitoring and precision aquaculture. The proposed method also aligns with the United Nations Sustainable Development Goals (SDGs), in particular SDG 6 (Clean Water and Sanitation), by supporting improved monitoring and management of water quality to reduce pollution risks and promote the sustainable use of freshwater resources.