Article

Few-Shot Learning–Based Water Quality Classification Under Limited Data Conditions for Smart Aquaculture Monitoring

1 Faculty of Artificial Intelligence and Engineering, Multimedia University, Cyberjaya 63100, Malaysia
2 Preparatory Center for Science and Technology, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu 88400, Malaysia
* Authors to whom correspondence should be addressed.
Water 2026, 18(12), 1523; https://doi.org/10.3390/w18121523 (registering DOI)
Submission received: 7 May 2026 / Revised: 5 June 2026 / Accepted: 17 June 2026 / Published: 20 June 2026

Abstract

Water quality monitoring is a fundamental element of sustainable aquaculture management, as changes in physicochemical and biological parameters directly affect the health, growth performance, and productivity of aquaculture systems. Although traditional machine learning (ML) methods have demonstrated effectiveness in water quality classification, their performance often depends on large amounts of labeled data, which can be challenging and expensive to collect in real-world aquaculture environments. To address this limitation, this study explores a few-shot learning (FSL) framework for data-efficient water quality classification under limited supervision. Several FSL models, including prototypical networks (ProtoNet), Siamese Networks, and Matching Networks, were developed and evaluated in a comparative experimental framework against the traditional machine learning classifiers logistic regression, random forest, support vector machine, and extreme gradient boosting. Low-data learning scenarios were simulated using a structured episodic evaluation approach. Experimental results demonstrate that FSL techniques outperform traditional machine learning methods across all evaluated scenarios. Among the tested methods, ProtoNet achieved the highest performance, attaining an accuracy of 94.46% and an ROC-AUC score of 98.65%, indicating superior discriminative capability and robustness. Siamese Networks also demonstrated competitive performance under highly constrained data conditions. Furthermore, latent-space visualization, confusion matrix analysis, paired t-test statistical analysis, and ablation studies confirmed that episodic meta-learning enables the learning of highly discriminative latent representations with strong generalization capability under limited labeled data conditions. 
The findings highlight that FSL provides a robust and scalable framework for intelligent water quality classification in aquaculture systems, particularly in scenarios where labeled data are scarce, offering significant potential for sustainable aquaculture monitoring applications.

1. Introduction

Aquaculture has become one of the fastest-growing food production sectors worldwide and plays a crucial role in ensuring global food security, nutritional sustainability, and economic development [1]. International fisheries reports indicate that aquaculture now provides over half of the fish consumed around the world, an important part of modern food production systems. Freshwater aquaculture ponds, especially, are widely used because of their relatively low operating cost and high production efficiency [2,3]. However, ensuring appropriate water quality conditions is one of the most significant problems impacting fish health, growth performance, feed conversion efficiency and overall farm productivity.
The water quality directly affects the biological, chemical, and ecological processes in aquaculture systems. The key physicochemical parameters include temperature, pH, dissolved oxygen, electrical conductivity, turbidity, nutrient concentration, nitrate concentration, and ammonia concentration, which greatly influence fish metabolism and survival. Changes in these parameters can cause physiological stress, lower growth rates, increased disease susceptibility and ultimately mass mortality events [4,5]. Therefore, the continuous observation and precise evaluation of water quality are crucial to sustainable aquaculture operation and environmental conservation. Traditionally, water quality assessment was based on manual sampling, laboratory chemical analysis and expert-derived threshold systems. These methods can be accurate, but are often time-consuming, labor-intensive, costly and not suitable for continuous monitoring. In addition, threshold-based assessment frameworks often do not reflect the highly nonlinear interactions between multiple water quality variables [6,7]. Traditional monitoring methods have significant limitations in terms of rapid and adaptive decision support as aquaculture is made more and more intensive and environmentally dynamic.
The implementation of smart aquaculture frameworks has been boosted by recent innovations in sensing technology, wireless communication systems, and data analytics. The Internet of Things (IoT) devices, low-cost environmental sensors, and cloud-based monitoring platforms allow for the acquisition of vast amounts of water quality data in real time [8]. At the same time, machine learning methods have become powerful tools for interpreting and analyzing complex environmental datasets and uncovering hidden relationships between water quality variables. Various supervised learning algorithms, including logistic regression (LR), decision trees (DTs), random forests (RFs), support vector machines (SVMs), artificial neural networks (ANNs), and gradient boosting (XGBoost) models, have demonstrated promising performance in water quality prediction and classification tasks. However, these developments are accompanied by several practical challenges of using conventional machine learning techniques in real-world aquaculture applications. Most supervised learning algorithms need significant amounts of labeled training data to perform reliably in terms of generalization. However, large, high-quality, consistently labeled water quality datasets are difficult to obtain because of the cost of deploying sensors, failure to gather the data, seasonal variability, environmental heterogeneity, missing observations, and the labor-intensive nature of labeling procedures. In many real aquaculture situations, especially in new aquaculture operations, remote locations, and areas of development, only a few representative samples might be available for use in model development. In these scenarios, common machine learning models may suffer from overfitting, performance degradation, and a lack of adaptability [9,10].
This study does not address the size of the benchmark dataset itself, but rather the scenario in which only a few labeled samples are available during deployment. The full dataset consists of around 4300 labeled instances, but the experimental design deliberately mimics a realistic low-data adaptation setting by employing episodic K-shot evaluation. This approach allows systematic analysis of model performance when classification decisions are based on a small set of labeled examples per class, which is more representative of practical field deployment than standard supervised learning settings with large numbers of examples per class. In the field of artificial intelligence, few-shot learning (FSL) has come to the forefront as an effective way to overcome data scarcity. Unlike conventional supervised learning, where class boundaries are learned from large amounts of data, FSL is designed to learn transferable representations that can be quickly adapted to a new task with just a few labeled examples. Many FSL approaches, specifically prototypical networks, Matching Networks, and Siamese Networks, build on the framework of meta-learning, sometimes called “learning to learn.” During training, models are given many small learning tasks so that they acquire knowledge that transfers to tasks they have never seen before. In this paper, prototypical networks are of particular interest because of their conceptual simplicity, computational efficiency, and strong empirical performance among FSL approaches. Prototypical networks are a type of metric-based meta-learning. Rather than learning decision boundaries directly, they learn an embedding function that maps samples into a latent feature space where samples of the same class cluster together and samples of different classes are well separated. 
A prototype vector for each class is the mean embedding of support samples of that class. The classification is then done by comparing the distance between the query samples and the prototypes of the classes. This mechanism offers a very interpretable decision-making process and can generally show better robustness when data is limited. In addition, Siamese Networks learn a pairwise similarity function using contrastive learning, enabling flexible similarity modeling but often exhibiting sensitivity under extremely low-shot scenarios. Matching Networks also employ an attention-based mechanism over the support set, providing adaptive nearest-neighbor classification but with higher computational complexity and sensitivity to support set variation. By comparing these three representative architectures under identical episodic K-shot settings, this study provides a systematic and fair evaluation of metric-based FSL strategies in aquaculture water quality classification [11,12,13].
There have been several successful applications of prototypical networks and other metric-based meta-learning in computer vision, medical diagnosis, remote sensing, fault detection, and biological classification tasks. Nevertheless, research investigating their application to aquaculture water quality classification remains limited. The effectiveness of few-shot adaptation strategies in aquaculture monitoring systems has not been extensively explored, and most current studies remain focused on traditional machine learning strategies and performance measures across a conventional train–test paradigm. In order to tackle this research gap, this study suggests a few-shot water quality classification system for aquaculture environments based on prototypical networks. The framework is tested with various K-shot setting scenarios to mimic realistic limited-label scenarios. Moreover, it is systematically compared to the common machine learning classifiers, such as LR, RF, SVM, and XGBoost, with the same constraints imposed on the support set, thereby ensuring a fair comparison.
The main contributions of this study are summarized as follows:
  • A metric-based FSL framework based on prototypical networks is developed for aquaculture water quality classification under limited-label conditions.
  • The proposed framework is evaluated across multiple few-shot scenarios (one-shot, five-shot, 10-shot, and 15-shot) to investigate adaptation capability, robustness, and scalability.
  • A rigorous comparative benchmark is conducted against conventional machine learning models under identical support-set constraints, providing a fair assessment of low-data classification performance.
  • Comprehensive performance evaluation is performed using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC), together with confusion matrix analysis and feature-space visualization.
The findings demonstrate the potential of metric-based FSL to support intelligent aquaculture monitoring systems capable of operating effectively when labeled data availability is limited, thereby contributing toward more adaptive, scalable, and data-efficient environmental monitoring solutions.

2. Literature Review

Water quality management is a key determinant of the success of aquaculture systems, especially in fish cultivation ponds, where environmental conditions directly affect the health, growth, and survival of the fish. As noted in previous research, parameters such as dissolved oxygen, temperature, pH, turbidity, ammonia, and nutrient concentration play a decisive role in sustaining the pond ecosystem [14]. Fluctuations in the major water quality parameters have far-reaching effects on fish physiology and behavior. They can affect feeding and social interactions, stress response, and overall welfare, all of which are essential to maximizing aquaculture production [15]. Conventional methods of water quality monitoring rely on laboratory tests, subjective knowledge, or predetermined threshold indices. Although these approaches are widely used, they tend to be manual and time-consuming, and they cannot support decision-making in real time [16]. To overcome these shortcomings, researchers have increasingly turned to data-driven methods for water quality evaluation. Statistical and regression-based approaches were first used to analyze the correlations among the physicochemical parameters. However, these models assume linearity and in most cases fail to capture the complex nonlinear relationships present in aquatic ecosystems, which reduces their predictive power in dynamically changing aquaculture systems.
ML is now a well-established approach to water quality prediction and classification in aquaculture systems. Supervised models such as LR, SVM, RF, and k-nearest neighbor (KNN) have outperformed traditional statistical models because they capture the nonlinear relationships among the many water quality variables [17]. RF models are particularly popular because they are robust to noise and can handle high-dimensional data, while SVMs generalize well on moderate-sized datasets. IoT-based technologies combined with ML have become popular in recent studies for real-time monitoring and intelligent water quality management [18]. Similarly, IoT-based ML frameworks have been applied to the temporal analysis of water quality in tilapia ponds, supporting data-driven and sustainable management of aquaculture farming in rural settings [19]. Beyond ML, intelligent systems involving artificial intelligence (AI), blockchain, and fuzzy logic have made aquaculture systems more efficient and automated. IoT-AI-blockchain systems have demonstrated the ability to provide supply chain visibility and traceability in the aquaculture sector, although cost and technical barriers remain significant challenges [20]. IoT-based fuzzy logic systems have also been shown to be highly accurate and able to maintain optimal water quality conditions autonomously, enabling fuzzy control under the high unpredictability of aquaculture environments [21]. Despite these promising developments, most ML-based water quality monitoring studies rely on large, well-labeled datasets [22]. In practical aquaculture deployments, the availability of such sensor data is usually constrained by missing information, seasonality, and a scarcity of expert labeling.
Recent studies have explored deep learning models for water quality assessment, such as artificial neural networks and hybrid networks. Deep learning techniques can automatically learn complex feature representations and have produced encouraging results on large-scale environmental data. Hybrid frameworks combining multi-scale decomposition, graph-based attention, and GRU-MoE models have shown excellent performance in modeling spatiotemporal water quality dynamics, which in turn improves the resilience of aquaculture systems and supports data-driven management [23,24]. Such learning paradigms remain underexplored for addressing data deficiency and complexity in inland water quality prediction, despite their greater capacity to represent high-dimensional associations and to supplement conventional labeling techniques [25]. However, these models are typically resource-intensive in terms of both labeling effort and computational cost during training. In settings where data collection is limited, such as aquaculture, datasets are small, and deep learning methods are often impractical and prone to poor generalization. Moreover, deep learning models struggle to adapt to new or unseen conditions without extensive retraining. These shortcomings highlight the need for alternative learning paradigms that can achieve high performance under data-scarce conditions.
FSL is a promising approach that enables models to learn from a small number of examples per class, and it has proven effective in dealing with labeled data scarcity. FSL approaches focus on acquiring transferable representations that are especially useful in data-limited situations. ProtoNet represents each class as a prototype in a learned embedding space and classifies samples based on distance measures. ProtoNet has demonstrated good performance in low-data regimes, is less prone to overfitting, and is highly interpretable, which makes it particularly suitable for tabular sensor data such as water quality measurements with moderate feature dimensionality and relatively separable class boundaries [11,26,27]. FSL has already been applied effectively across domains such as image recognition, speech processing, fault diagnosis, and biomedical applications. In practice, few-shot embedding-distribution models based on metric learning have outperformed traditional prototypical networks in identifying fish species from scarce, weakly similar imagery in complex marine settings [13,27,28,29]. Likewise, few-shot training methods have brought major improvements in rare object detection accuracy and in the robustness of perception in autonomous driving systems when less data is available. FSL pipelines combining pre-training, meta-learning, and fine-tuning with feature attention have been shown to be highly accurate and efficient in real time at detecting plant diseases with little training data [30,31]. Moreover, prototype matching-based meta-learning has been shown to perform well in zero-shot and few-shot fault diagnosis, and few-shot meta-learning combined with brain activity mapping has supported central nervous system drug discovery using limited datasets [32,33]. 
Table 1 provides a summary of representative FSL approaches across various application domains.
Most current water quality research still relies on traditional supervised learning paradigms, which typically require large, well-labeled datasets. However, these approaches face challenges in realistic aquaculture settings where data are sparsely labeled, sensor measurements are noisy, and conditions vary seasonally. As a result, there remains a research gap in data-efficient learning paradigms for aquaculture water quality evaluation. To address this gap, the present study proposes a systematic implementation of an FSL framework to classify aquaculture water quality in limited-data settings.

3. Materials and Methods

This research adopted a comparative experimental methodology to evaluate the effectiveness of FSL for water quality classification in fish cultivation ponds under limited labeled data conditions. The proposed framework integrated a ProtoNet as the primary model and compared its performance with traditional machine learning classifiers trained on a constrained dataset. The core motivation was that in real aquaculture environments, collecting large volumes of labeled water quality data is expensive and time-consuming. Therefore, models must generalize effectively from a very small number of samples per class. The overall workflow shown in Figure 1 consists of data acquisition and preprocessing, few-shot episodic task construction, feature embedding using a neural encoder, prototype-based classification, K-shot ablation analysis, and performance comparison with classical ML baselines.

3.1. Dataset Description

The water quality dataset employed in this study was obtained from the Mendeley Data Repository (Dataset DOI: https://doi.org/10.17632/y78ty2g293.1) [34]. Table 2 presents the descriptive statistical summary of physicochemical and biological parameters in the dataset, which characterizes the water quality condition in aquaculture environments. The variables exhibit substantial variability across features, such as high and low concentrations of nutrients, high and low pH and high and low temperatures. These variations are intended to mimic the actual conditions in aquaculture, including seasonal changes, environmental disruptions, and sensor differences. In addition, there are relationships between features that are not linear and have non-uniform distributions, plus overlapping distributions of features. This inherent complexity increases the difficulty of the classification task and makes it appropriate to test both few-shot and conventional machine learning methods under realistic data-limited conditions.
The main physicochemical and biological parameters of water quality used in this study are presented in Table 3, along with their corresponding units. These parameters are very important in the stability and health of an aquaculture system. Temperature, dissolved oxygen, and pH directly affect metabolic processes, respiration efficiency and physiological stress responses in aquatic systems. Nutrient-related parameters such as ammonia, nitrite and phosphorus are closely correlated with organic pollution and eutrophication risk; high concentrations can cause toxicity and limit the availability of dissolved oxygen. Alkalinity, hardness, and calcium also help to buffer the water and maintain its ionic balance, which helps to keep water chemistry conditions stable. Furthermore, high concentrations of hydrogen sulfide and carbon dioxide can be lethal and have a toxic effect on fish due to their ability to affect oxygen transport and respiratory function. The abundance of plankton growth reflects the biological productivity and has been used to measure the natural food available in the water, while overgrowth can cause oxygen depletion at night. They together give a complete picture of the water quality dynamics in aquaculture systems.

3.2. Exploratory Data Analysis

Figure 2 shows the distribution plots of each water quality indicator based on histograms. A majority of the variables displayed non-Gaussian and positively skewed distributions, such as ammonia, nitrite, phosphorus, plankton and hydrogen sulfide, indicating unusual but important pollution incidents. Dissolved oxygen and pH had relatively consistent distributions within acceptable biological limits, while turbidity and alkalinity had great variability. These distributions suggest non-uniformity and imbalance of data, highlighting the need for data normalization and robust learning approaches to deal with non-Gaussian and heavy-tailed distributions.
Figure 3 displays box plots to show the central tendency, spread and outlier values for each water quality parameter. Different water quality parameters such as ammonia, nitrite, phosphorus, hardness and alkalinity showed many outliers, reflecting true water quality problems rather than measurement errors. The presence of numerous outliers highlighted the variability and unpredictability of aquatic environments. This supports the inclusion of extreme values in the training set, as they enhance the model’s capability to predict real pollution events and rare yet important water quality conditions.
Figure 4 shows the Pearson correlation heatmap of linear relationships between water quality parameters and water quality class. The heatmap reveals that biochemical oxygen demand (BOD), nitrite, alkalinity, and dissolved oxygen (DO) had moderate to high correlations with water quality, suggesting that these parameters play a crucial role in determining water quality. By contrast, turbidity was negatively correlated with DO and water quality, which implies that suspended sediment decreases the oxygen level and impairs water quality. Temperature and pH had low correlations and, therefore, have low individual explanatory power. The mixed positive and negative correlations among parameters confirmed the multivariate nature of the parameters, supporting the use of multivariate learning models instead of individual thresholding.

3.3. Data Preprocessing

3.3.1. Dataset Cleaning and Validation

The dataset contains $N = 4300$ independent telemetry observations collected from fish cultivation ponds. Each observation consists of $D = 14$ continuous physicochemical and biological parameters describing localized pond conditions such as temperature, turbidity, dissolved oxygen, biochemical oxygen demand, carbon dioxide concentration, pH, alkalinity, hardness, calcium, ammonia, nitrite, phosphorus, hydrogen sulfide, and plankton density. The target variable represents water quality status and is categorized into three mutually exclusive class states:
$$y \in \{0, 1, 2\}$$
corresponding to Excellent, Good, and Poor conditions, respectively.
The complete dataset is denoted as $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{4300}$, where $x_i \in \mathbb{R}^{14}$ represents the feature vector and $y_i \in \{0, 1, 2\}$ denotes the corresponding class label. The class labels are provided in the original dataset released by the authors of the Mendeley repository and are constructed based on a predefined Water Quality Index (WQI)-based thresholding scheme, which maps continuous physicochemical measurements into discrete water quality categories. No relabeling or modification of the original dataset annotations was performed in this study. The dataset exhibits a near-balanced distribution, consisting of 1400 samples for Class 0, 1400 samples for Class 1, and 1500 samples for Class 2.

3.3.2. Train–Test Partitioning

To guarantee strict validation rigor, a stratified holdout protocol was adopted. The complete dataset was partitioned into an 80% meta-training split ($N_{train} = 3440$) and an independent 20% meta-testing holdout ($N_{test} = 860$) using stratified sampling. The partition satisfies
$$\mathcal{D} = \mathcal{D}_{train} \cup \mathcal{D}_{test}$$
with
$$\mathcal{D}_{train} \cap \mathcal{D}_{test} = \emptyset,$$
ensuring complete separation between the optimization and evaluation phases.
The stratification preserves the global class proportions to two decimal places. The meta-training split contains 1120 samples for Class 0 (32.56%), 1120 samples for Class 1 (32.56%), and 1200 samples for Class 2 (34.88%). The meta-testing holdout mirrors this balance, with 280 samples for Class 0 (32.56%), 280 samples for Class 1 (32.56%), and 300 samples for Class 2 (34.88%). The training partition was used exclusively for episodic meta-training of the deep FSL models, while the testing partition remained completely unseen until final inference evaluation.
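As an illustration of the partition arithmetic, the following minimal Python sketch reproduces the per-class counts with a simple stratified 80/20 split. The function name `stratified_split`, the seed, and the synthetic label vector are assumptions made for this example and are not taken from the study's implementation.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so that every class is divided at the same ratio (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        # Shuffle the indices of class c, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Synthetic labels with the class counts reported in Section 3.3.1 (1400 / 1400 / 1500).
labels = np.repeat([0, 1, 2], [1400, 1400, 1500])
train_idx, test_idx = stratified_split(labels)

train_counts = np.bincount(labels[train_idx])  # per-class sizes of the training split
test_counts = np.bincount(labels[test_idx])    # per-class sizes of the holdout
train_pct = np.round(100 * train_counts / train_counts.sum(), 2)
test_pct = np.round(100 * test_counts / test_counts.sum(), 2)
print(train_counts, train_pct)
print(test_counts, test_pct)
```

Because each class is split at the same ratio, both partitions keep the same class percentages as the full dataset.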

3.3.3. Feature Scaling and Normalization

Effective data preprocessing is a critical step to ensure the reliability, stability, and generalizability of machine learning and FSL models. In this study, a structured preprocessing pipeline was applied prior to model training and testing to prevent data leakage and ensure fair model comparison. The dataset was checked for missing entries, duplicates, inconsistencies, and invalid measurements. No missing observations or duplicate samples were found. All the variables were checked for numerical consistency and suitability for statistical analysis and input to the model. Because water quality parameters have entirely different numerical ranges and physical measurement units, all the features were standardized using z-score normalization to stabilize distance-based optimization [35]:
$$x'_{ij} = \frac{x_{ij} - \mu_j}{\sqrt{\sigma_j^2 + \epsilon}},$$
where the feature mean ($\mu_j$) and standard deviation ($\sigma_j$) are defined as:
$$\mu_j = \frac{1}{N_{train}} \sum_{i=1}^{N_{train}} x_{ij}$$
$$\sigma_j = \sqrt{\frac{1}{N_{train}} \sum_{i=1}^{N_{train}} (x_{ij} - \mu_j)^2},$$
which are computed exclusively from the training partition. A small stabilization constant ($\epsilon = 10^{-8}$) is included under the square root in the denominator to avoid division-by-zero errors. The fitted normalization parameters were subsequently applied to the testing partition as a fixed transform to prevent information leakage from the test data.
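The normalization step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the helper names `fit_zscore` and `apply_zscore` and the synthetic three-feature data are hypothetical, and only the formula and the constant $10^{-8}$ come from the text.

```python
import numpy as np

EPS = 1e-8  # stabilization constant stated in the text

def fit_zscore(x_train):
    """Estimate per-feature mean and population variance from the training split only."""
    mu = x_train.mean(axis=0)
    var = x_train.var(axis=0)  # 1/N normalization, matching the formula for sigma_j
    return mu, var

def apply_zscore(x, mu, var, eps=EPS):
    """Standardize any split with statistics fitted on the training split."""
    return (x - mu) / np.sqrt(var + eps)

# Synthetic stand-in data (assumption): three features on different scales.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=[25.0, 7.5, 5.0], scale=[3.0, 0.5, 1.5], size=(3440, 3))
x_test = rng.normal(loc=[25.0, 7.5, 5.0], scale=[3.0, 0.5, 1.5], size=(860, 3))

mu, var = fit_zscore(x_train)
z_train = apply_zscore(x_train, mu, var)
z_test = apply_zscore(x_test, mu, var)  # no test statistics are used here

# A constant feature has zero variance; the epsilon keeps the result finite.
z_const = apply_zscore(np.ones((4, 1)), np.array([1.0]), np.array([0.0]))
```

Fitting on the training split and reusing `mu` and `var` for the test split is what keeps test-set statistics out of the model inputs.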

3.3.4. Episodic Few-Shot Evaluation Framework

To evaluate out-of-distribution generalizability under highly restricted labeled data conditions, an episodic evaluation strategy was executed over four discrete shot horizons:
$$K \in \{1, 5, 10, 15\}$$
For an independent evaluation task, a localized Support Set ($S$) and Query Set ($Q$) are dynamically extracted from the hidden testing partition. For any target class ($c$), the class support slice is defined as
$$S_c = \{(x_i, y_i) \mid y_i = c\},$$
which contains exactly $K$ support reference samples. The collective episodic support set is aggregated as
$$S = \bigcup_{c=1}^{C} S_c$$
maintaining a total capacity of $|S| = K \times C$, where $C = 3$ total target classes.
To evaluate classification performance with strong statistical significance, an expanded query matrix is extracted containing exactly 80 query samples per class:
$$Q = \bigcup_{c=1}^{C} Q_c,$$
maintaining a task volume of $|Q| = 240$ total query samples per episode. Support and query index allocations are sampled without replacement, ensuring a strict mutual exclusion barrier:
$$S_c \cap Q_c = \emptyset,$$
which eliminates sample overlaps or identity data leakage within an individual task loop. To reduce stochastic variability and improve statistical robustness, episodic task generation was repeated across five independent random seeds {42, 52, 62, 72, 82}. For each shot horizon, 100 evaluation episodes were generated per seed, resulting in 500 independent evaluation tasks. Final performance metrics were reported as the mean and standard deviation across all the generated episodes.
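The episodic protocol described above can be sketched as follows. This is a hedged illustration, not the study's code: `sample_episode` is a hypothetical helper, the label vector is synthetic with the reported test-split class counts, and only $K = 5$ is shown.

```python
import numpy as np

def sample_episode(labels, k_shot, n_query=80, rng=None):
    """Draw K support and n_query query indices per class, without replacement."""
    if rng is None:
        rng = np.random.default_rng()
    support, query = [], []
    for c in np.unique(labels):
        # One draw without replacement per class guarantees S_c and Q_c are disjoint.
        idx = rng.choice(np.flatnonzero(labels == c), size=k_shot + n_query, replace=False)
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:])
    return np.array(support), np.array(query)

# Synthetic test-split labels with the reported class counts (280 / 280 / 300).
test_labels = np.repeat([0, 1, 2], [280, 280, 300])

SEEDS = [42, 52, 62, 72, 82]  # seeds listed in the text
EPISODES_PER_SEED = 100

episodes = []
for seed in SEEDS:
    rng = np.random.default_rng(seed)
    for _ in range(EPISODES_PER_SEED):
        episodes.append(sample_episode(test_labels, k_shot=5, rng=rng))

first_support, first_query = episodes[0]
print(len(episodes), len(first_support), len(first_query))
```

With $K = 5$ and $C = 3$, each episode holds $|S| = 15$ support samples and $|Q| = 240$ query samples, and the five seeds give 500 episodes per shot setting.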

3.4. Deep Few-Shot Learning Models

Three distinct metric-based meta-learning paradigms were investigated: prototypical networks, Siamese Networks, and Matching Networks. Unlike the traditional machine learning classifiers, these models underwent an offline episodic meta-training phase over the 3440 samples in the training partition. During this stage, thousands of simulated few-shot tasks were dynamically generated from $\mathcal{D}_{train}$ to update the model parameters, so that the networks learn generalizable feature relationships rather than memorizing static class boundaries.

3.4.1. Shared Embedding Network

To ensure a fair comparison, all three deep meta-learning architectures share the same encoder backbone. The encoder maps the continuous tabular inputs through layers of width
$$14 \rightarrow 256 \rightarrow 128 \rightarrow 64$$
The embedding function is defined as $f_\theta : \mathbb{R}^{14} \rightarrow \mathbb{R}^{64}$, where $\theta$ represents the learnable weights of the linear layers. Every hidden layer incorporates batch normalization to reduce internal covariate shift, a rectified linear unit (ReLU) activation function, and a dropout layer ($p = 0.30$) to limit overfitting. Optimization used the Adam optimizer with a fixed learning rate of $10^{-3}$ over exactly 1000 episodic training iterations.
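To make the layer dimensions concrete, the sketch below runs an untrained forward pass through three linear layers of the stated widths in NumPy. Several simplifications are assumptions of this example: batch normalization and dropout are omitted, ReLU is applied after every layer including the last, and the He-style initialization and helper names `init_encoder` and `encode` are hypothetical.

```python
import numpy as np

LAYER_SIZES = [14, 256, 128, 64]  # widths stated in the text

def init_encoder(sizes=LAYER_SIZES, seed=0):
    """Create random weights for each linear layer (initialization scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
    return params

def encode(x, params):
    """Linear -> ReLU for each layer; batch norm and dropout are left out of this sketch."""
    h = x
    for w, b in params:
        h = np.maximum(h @ w + b, 0.0)
    return h

params = init_encoder()
x = np.random.default_rng(1).normal(size=(5, 14))  # five standardized samples (synthetic)
embeddings = encode(x, params)

# Weights plus biases of the three linear layers only.
n_linear_params = sum(w.size + b.size for w, b in params)
print(embeddings.shape, n_linear_params)
```

Counting only the linear layers, the encoder has 3840 + 32,896 + 8256 = 44,992 weights and biases; batch normalization would add a small number of further parameters.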

3.4.2. Prototypical Networks

This study addresses the limited availability of labeled data in aquaculture water quality monitoring by proposing an FSL framework based on ProtoNet. ProtoNet was chosen for its strong metric-based classification ability, its effectiveness with little data, its simple architecture, and its strong generalization. ProtoNet is a supervised learning method that learns a discriminative embedding space in which the samples of a class are grouped around a representative class prototype vector. Classification is then based on the distance between the query samples and the class prototypes in the embedding space [13]. The prototype of class $c$ is computed as the mean of the embedded support samples of that class:
$$P_c = \frac{1}{|S_c|} \sum_{x_i \in S_c} f_\theta(x_i),$$
where $S_c$ denotes the subset of support samples with label $c$. These prototypes serve as class representatives in the embedding space and provide an intuitive geometric interpretation of class structure. Increasing the value of $K$ leads to more stable prototype estimation, thereby improving classification robustness.
For a query sample $x_q$, its embedding $f_\theta(x_q)$ is compared against all the class prototypes using a distance metric. In this study, Euclidean distance is employed due to its simplicity and effectiveness [28]:
$$d(x_q, P_c) = \| f_\theta(x_q) - P_c \|_2^2$$
The probability of assigning $x_q$ to class $c$ is computed using a softmax over negative distances:
$$P(y = c \mid x_q) = \frac{\exp(-d(x_q, P_c))}{\sum_{c'=1}^{C} \exp(-d(x_q, P_{c'}))}$$
This formulation ensures that query samples closer to a prototype in the embedding space are assigned higher confidence scores.
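The three steps above (prototype averaging, squared Euclidean distance, softmax over negative distances) can be sketched in NumPy as follows. The arrays stand in for encoder outputs $f_\theta(x)$; the function name `protonet_probs` and the toy two-dimensional embeddings are hypothetical and serve only to make the arithmetic easy to follow.

```python
import numpy as np

def protonet_probs(support_emb, support_y, query_emb, n_classes):
    """Class probabilities for queries from prototype distances."""
    # Prototype of class c = mean embedding of its support samples.
    protos = np.stack(
        [support_emb[support_y == c].mean(axis=0) for c in range(n_classes)]
    )
    # Squared Euclidean distance between every query and every prototype.
    diff = query_emb[:, None, :] - protos[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Softmax over negative distances (shifted for numerical stability).
    logits = -dist
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs, protos

# Toy 2-D embeddings: two classes with two support samples each.
support_emb = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
support_y = np.array([0, 0, 1, 1])
query_emb = np.array([[0.0, 1.0], [3.0, 1.0]])
probs, protos = protonet_probs(support_emb, support_y, query_emb, n_classes=2)
```

In this toy case the prototypes are (0, 1) and (4, 1), so the first query coincides with the class 0 prototype and the second lies much closer to the class 1 prototype.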

3.4.3. Siamese Networks

Siamese Networks optimize their parameters to distinguish similar from dissimilar sample pairs using a contrastive objective. For any sample pair $(x_i, x_j)$, the feature distance is computed as the $L_2$ norm of the difference between their embeddings [11]:
$$d_{ij} = \| f_\theta(x_i) - f_\theta(x_j) \|_2$$
The network optimizes its embedding space by minimizing a pairwise contrastive loss over the training batches:
$$L = y \, d_{ij}^2 + (1 - y) \, \max(0, m - d_{ij})^2,$$
where $y = 1.0$ if the pair belongs to the same class, $y = 0.0$ if the pair belongs to different classes, and $m$ denotes the margin parameter, set to 1.0.
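As a worked illustration of the contrastive loss, the sketch below evaluates it pair by pair in NumPy. The function name `contrastive_loss` and the toy embeddings are hypothetical; how the per-pair values are aggregated over a batch (mean, sum) is not stated in the text, so the function returns the per-pair losses and leaves that choice open.

```python
import numpy as np

def contrastive_loss(emb_i, emb_j, y, margin=1.0):
    """Per-pair contrastive loss; y = 1 for same-class pairs, 0 otherwise."""
    d = np.linalg.norm(emb_i - emb_j, axis=1)  # L2 distance per pair
    pull = y * d ** 2                          # same class: shrink distance
    push = (1.0 - y) * np.maximum(0.0, margin - d) ** 2  # different: enforce margin
    return pull + push

emb_i = np.zeros((3, 2))
emb_j = np.array([[0.3, 0.4], [0.3, 0.4], [2.0, 0.0]])  # distances 0.5, 0.5, 2.0
y = np.array([1.0, 0.0, 0.0])
losses = contrastive_loss(emb_i, emb_j, y, margin=1.0)
```

The third pair shows the role of the margin: a different-class pair already separated by more than $m = 1.0$ contributes zero loss, so the optimizer spends no effort pushing it further apart.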

3.4.4. Matching Networks

Matching Networks employ a continuous, attention-weighted nearest-neighbor lookup strategy to classify query samples. Support and query embeddings are first passed through an $L_2$ normalization layer that maps them onto a shared unit hypersphere [12]:
$$\hat{f}(x) = \frac{f_\theta(x)}{\| f_\theta(x) \|_2}$$
Attention weights are then computed for each query as a softmax over its cosine similarities to all the available support samples:
$$a(x_q, x_i) = \frac{\exp\left(\hat{f}(x_q)^T \hat{f}(x_i)\right)}{\sum_j \exp\left(\hat{f}(x_q)^T \hat{f}(x_j)\right)}$$
Final query class probabilities are obtained through a vectorized aggregation step, distributing these attention weights directly onto the support label fields.
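The normalization, attention, and label-aggregation steps can be sketched in NumPy as follows. The function name `matching_probs` and the toy embeddings are hypothetical, and the arrays stand in for encoder outputs; the aggregation is implemented as the attention matrix multiplied by one-hot support labels, which is one straightforward reading of the vectorized step described above.

```python
import numpy as np

def matching_probs(support_emb, support_y, query_emb, n_classes):
    """Attention-weighted class probabilities for each query."""
    # L2-normalize so that dot products are cosine similarities.
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sim = q @ s.T                                  # shape (n_query, n_support)
    sim -= sim.max(axis=1, keepdims=True)          # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over support samples
    one_hot = np.eye(n_classes)[support_y]         # shape (n_support, n_classes)
    return attn @ one_hot                          # spread attention onto labels

support_emb = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
support_y = np.array([0, 0, 1, 1])
query_emb = np.array([[5.0, 1.0], [1.0, 4.0]])
probs = matching_probs(support_emb, support_y, query_emb, n_classes=2)
```

Because cosine similarities are bounded in [-1, 1], the resulting probabilities are comparatively soft; the formula in the text uses no temperature, so none is added here.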

3.5. Traditional Machine Learning Baselines

Four conventional machine learning classifiers (LR, RF, SVM, and XGBoost) were implemented as baselines for assessing the effectiveness of the deep episodic meta-learning frameworks. Unlike the meta-learning architectures, none of these classifiers learned a representation or optimized its parameters offline on the whole training partition. They therefore started each evaluation episode with no prior knowledge of the task and used only the few labeled samples of the task under test.
The hyperparameters were standardized for all the evaluations for experimental reproducibility [35,36].
  • LR was implemented with $L_2$ regularization, a regularization strength of $C = 1.0$, and a maximum of 1000 optimization iterations.
  • RF was based on 300 decision trees, with Gini impurity as the splitting criterion.
  • SVM was used with a linear kernel and probability estimates enabled (probability = True).
  • XGBoost was implemented with the XGBClassifier interface using the multiclass logistic loss (eval_metric = 'mlogloss').
Each conventional classifier was used as a task-specific estimator in the evaluation. For each independently run testing episode, a new classifier instance was created and trained only on the support set:
$$\texttt{clf.fit}(S_x, S_y),$$
where $S_x$ and $S_y$ denote the support features and the corresponding class labels.
The total amount of labeled information available to each classifier was therefore restricted to
$$|S| = K \times C,$$
where $K$ is the shot horizon and $C = 3$ is the number of water quality classes. Thus, for the 1-shot, 5-shot, 10-shot and 15-shot settings, only 3, 15, 30 and 45 observations, respectively, were available for training.
After fitting, the trained classifier generated predictions for the unseen query samples $x_q \in Q$ of the same evaluation episode. This evaluation protocol is designed to highlight the disparities between two contrasting learning paradigms. Conventional machine learning models have to learn decision boundaries from the few support samples in each task, whereas meta-learning architectures can reuse representations learned offline through episodic training on the entire training partition. This yields a benchmark of how well representation-based adaptation performs compared with task-specific fitting under the same few-shot inference conditions.
The episodic training is performed using an N-way K-shot sampling strategy, where support and query sets are dynamically constructed from the training data for each episode. A fixed number of training episodes is used as the stopping criterion, with performance monitored across evaluation episodes to check for stable convergence across all the shot settings. All the experiments are implemented in Python (Version 3.12.13) using the PyTorch deep learning framework (Version 2.11.0) and executed on Google Colab with GPU acceleration. The computational environment includes 16 GB RAM, along with standard scientific libraries such as NumPy (Version 2.0.2) and Scikit-learn (Version 1.6.1) for preprocessing and evaluation. These details are provided to support the transparency and reproducibility of the proposed FSL framework.

3.6. Evaluation Metrics

In this study, the proposed ProtoNet-based FSL method for fish pond water quality classification is evaluated and compared with traditional ML approaches under constrained data. Performance metrics including precision, recall, F1-score, accuracy, and AUC are used for quantitative evaluation, while embedding plots and confusion matrices support qualitative interpretation. The metric formulas are based on the four possible outcomes of a classification problem: true positives (TPs), true negatives (TNs), false positives (FPs) and false negatives (FNs). The equations are as follows [35,36]:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
The area under the receiver operating characteristic curve (ROC-AUC) was additionally calculated using a macro-averaged, one-vs-rest multiclass scheme. To account for the natural class distribution of the dataset, the primary metrics (precision, recall, and F1-score) are computed using weighted averaging across all evaluation runs. All final results were averaged over 100 evaluation episodes for each of five independent random seeds, {42, 52, 62, 72, 82}, giving a total of 500 evaluation episodes per model.
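To make the weighted averaging concrete, the sketch below computes per-class precision, recall, and F1 from a multiclass confusion matrix and then weights each class by its share of the true labels. It assumes NumPy only; the function name `weighted_metrics` and the ten toy labels are hypothetical. Scikit-learn, which the study reports using, provides equivalent functionality through `precision_recall_fscore_support` with `average="weighted"`.

```python
import numpy as np

def weighted_metrics(y_true, y_pred, n_classes):
    """Accuracy plus support-weighted precision, recall and F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)        # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                  # predicted as c but not c
    fn = cm.sum(axis=1) - tp                  # truly c but predicted otherwise
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, tp + fp, out=zeros.copy(), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=zeros.copy(), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    weights = cm.sum(axis=1) / cm.sum()       # class share of true labels
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": float((weights * precision).sum()),
        "recall": float((weights * recall).sum()),
        "f1": float((weights * f1).sum()),
    }

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
scores = weighted_metrics(y_true, y_pred, n_classes=3)
```

One consequence of this averaging scheme is that the weighted recall always equals the overall accuracy, since each class recall is weighted by exactly the fraction of samples it covers.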

4. Results

4.1. Performance Comparison with ML Baselines

Table 4 presents the performance comparison between the proposed FSL frameworks (prototypical network, Siamese Network, and Matching Network) and traditional machine learning classifiers (LR, RF, SVM, and XGBoost) under four K-shot learning scenarios. Performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC, with accuracy reported as the mean ± standard deviation (SD) over repeated experimental runs. The deep metric-based few-shot models consistently outperformed the traditional machine learning methods at every shot horizon. In the hardest one-shot scenario, with only one labeled sample per class, the Siamese Network achieved the highest accuracy (86.48%), followed by MatchingNet (82.72%) and ProtoNet (82.69%). The traditional machine learning models, by contrast, performed poorly with so few training samples: RF reached only 57.09% accuracy, and LR and SVM were about 51% accurate. XGBoost had the lowest accuracy, 33.33%, which is equivalent to random guessing in a three-class problem. Classification performance improved for all the models as the number of support samples increased. ProtoNet improved markedly, with accuracies of 93.31%, 94.16%, and 94.46% for 5, 10, and 15 shots, respectively. MatchingNet followed a similar trend, reaching 94.00% accuracy for 15 shots. ProtoNet outperformed all the other models in the higher-shot settings, demonstrating better prototype estimation and class representation learning when more support samples are available, whereas the Siamese Network achieved the highest accuracy in the one-shot setting. Among the traditional machine learning classifiers, RF obtained the best results, with an accuracy of 81.10% and an ROC-AUC of 91.62% in the 15-shot setup.
LR, SVM and XGBoost improved steadily as the support set size increased but still performed significantly worse than the FSL models. At the 15-shot level, the best traditional classifier trailed the prototypical network by just over 13 percentage points. The ROC-AUC results also show the effectiveness of the FSL models. ProtoNet's ROC-AUC rose from 93.12% in the one-shot setting to 98.65% in the 15-shot setting, demonstrating good class separability and strong probabilistic discrimination. MatchingNet and the Siamese Network also obtained consistently high ROC-AUC values of more than 96% in the higher-shot settings. In contrast, the traditional machine learning models showed significantly lower ROC-AUC scores, indicating poor discriminatory power in limited-data environments. Additionally, the standard deviations of the prototypical network were low (0.48–1.08%) across all the shot horizons, which demonstrates good stability and reproducibility. The results show that metric-based FSL architectures substantially outperform standard machine learning methods under extreme few-shot labeling conditions. The findings support the feasibility of prototypical networks as a robust and scalable approach to water quality classification in aquaculture applications, where collecting large amounts of labeled data is difficult and expensive. Figure 5 also illustrates this performance comparison as radar profiles across the shot horizons.

4.2. Multi-Shot Ablation Matrix and Parameters Sensitivity Analysis

To assess how FSL performance depends on latent feature dimensionality, we performed an ablation study of the embedding dimension for the three metric-based models, prototypical networks (ProtoNet), Siamese Networks, and Matching Networks; the results are presented in Table 5 and Figure 6.
Four embedding dimensions (d = 16, 32, 64, and 128) were evaluated across four FSL scenarios (K = 1, 5, 10, and 15 shots) to identify the most suitable embedding dimension for water quality classification. The results show a positive correlation between embedding dimensionality and classification accuracy that is consistent across all the models. In the one-shot setting, ProtoNet's accuracy improved substantially from 73.15% at d = 16 to 82.69% at d = 64, a gain of 9.54 percentage points. For higher shots, the accuracy improved from 82.42% to 93.31% in the five-shot setting and from 83.60% to 94.46% in the 15-shot setting. The results demonstrate that ProtoNet can learn more discriminative prototype representations and better handle complex relationships between the physicochemical water quality parameters in larger embedding spaces. A similar trend was observed for Siamese Networks. In the one-shot setting, accuracy increased gradually from 79.10% at d = 16 to 89.12% at d = 128, and the model reached 92.18% accuracy in the 15-shot setting. This progressive improvement suggests that greater representation capacity strengthens pairwise similarity learning, allowing the network to better distinguish inter-class and intra-class relationships in the embedding space. Matching Networks also benefited substantially from larger embeddings. Accuracy rose from 72.98% at d = 16 to 82.72% at d = 64 in the one-shot setting and reached 94.30% at d = 128 in the 15-shot scenario. Higher-dimensional embeddings appear to improve attention-based support-query matching because they retain more discriminative feature information. Notably, the performance improvement for d = 128 over d = 64 was relatively small for all the architectures.
In ProtoNet, increasing the embedding dimension from 64 to 128 improved the 15-shot result by only 0.31 percentage points (94.46% to 94.77%). Similarly marginal gains were seen with Matching Networks and Siamese Networks. This saturation effect indicates that most of the discriminative information is already captured in a 64-dimensional latent space, so a larger embedding yields little practical gain while adding computational complexity. The ablation study shows that embedding dimensionality is critical to the performance of few-shot water quality classification and that an embedding dimension of d = 64 is a good compromise between model complexity, computational time, and classification accuracy. Therefore, d = 64 was chosen as the embedding size for the final experimental setup, as it performed close to the best configuration while requiring less computation than larger embeddings.

4.3. Cross-Paradigm Manifold and Projection Verification

To study the classification mechanisms of deep metric-based FSL and traditional few-shot supervised machine learning approaches, principal component analysis (PCA) was performed on a representative five-shot evaluation episode. The two-dimensional projections are displayed in Figure 7. The visualization of the prototypical network embedding space (Figure 7a) shows that episodic meta-learning builds highly discriminative latent representations. The first two principal components explain 82.02% and 14.38% of the variance, respectively, together accounting for 96.40% of the variance in the learned feature space. Samples within the same water quality category are grouped tightly, and there is a distinct separation between the classes. The query samples converge toward the corresponding support samples and prototype centroids, indicating high intra-class cohesion and low intra-class variance. This structured embedding geometry enables accurate classification through simple Euclidean distance calculations between query embeddings and class prototypes. The clear separation between the three categories of Excellent, Good and Poor water quality indicates that ProtoNet learns generalized representations that can distinguish water quality conditions in the presence of limited labeled data. By comparison, Figure 7b shows the projection of the original feature space, in which a standard SVM classifier was trained in the same five-shot setting. Class-related information remains spread across many dimensions, with only 25.62% of the total variance explained by the first two principal components. There is significant overlap between the three water quality classes, in contrast to ProtoNet's structured manifold. The linear decision boundaries fitted to the limited number of support samples therefore do not closely reflect the data distribution.
Many query samples lie close to or across class boundaries, leading to greater ambiguity in classification and poorer generalization. The contrast between Figure 7a and Figure 7b reflects a fundamental difference between meta-learning and traditional supervised learning.
Standard machine learning models try to learn decision functions directly from the very small number of points in a support set, which makes them very sensitive to sampling variability and data sparseness. FSL models, on the other hand, rely on a representation learned through episodic meta-training, so that unseen samples can be mapped to a discriminative latent space and classified via distance-based reasoning. The superior performance of ProtoNet, Siamese Networks and Matching Networks in the experimental evaluation is attributed to this representation transfer mechanism. In conclusion, the qualitative evidence offered by the PCA visualization, combined with the quantitative results shown in Tables X–Y, suggests that the superior performance of FSL frameworks does not stem from task-specific decision boundary estimation alone but from the capability to learn transferable and well-structured feature manifolds.
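The explained-variance percentages quoted for the two projections are ratios of per-component variance to total variance. The sketch below shows how such ratios can be obtained from a singular value decomposition of the centered data, assuming NumPy; the function name `explained_variance_ratio` and the four-row toy matrix are hypothetical and unrelated to the study's data.

```python
import numpy as np

def explained_variance_ratio(x, n_components=2):
    """Share of total variance captured by the leading principal components."""
    centered = x - x.mean(axis=0)
    # Singular values of the centered data, returned in descending order.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2   # proportional to component variances
    return variances[:n_components] / variances.sum()

# Toy data: most spread along the first axis, some along the second,
# none along the third.
x = np.array([
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, -0.5, 0.0],
])
ratios = explained_variance_ratio(x, n_components=2)
```

A two-component sum close to 1, as reported for the ProtoNet embedding, means the 2-D plot loses little information; a low sum, as reported for the raw feature space, means the plot shows only a partial view of the data.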
To further investigate class-specific predictive behavior, row-normalized confusion matrices were generated for all the models at the five-shot evaluation setting, as illustrated in Figure 8. These confusion matrices provide detailed insight into classification accuracy and misclassification patterns across the three water quality categories (Excellent, Good, and Poor). For the Excellent water quality category, all the deep metric-learning frameworks demonstrated near-perfect classification performance. ProtoNet and MatchingNet achieved a true positive rate of 100.00%, while Siamese Networks achieved 99.99%. These results indicate that the learned embedding representations effectively separate high-quality water samples from the remaining classes. In contrast, the conventional machine learning models exhibited greater classification uncertainty. For example, the SVM classifier misclassified 11.01% of the Excellent samples as Poor, suggesting that decision boundaries estimated from a highly restricted support set may not adequately represent the underlying feature distribution. The Good water quality category also exhibited strong classification performance among the FSL models. MatchingNet achieved the highest true positive rate (99.86%), followed by ProtoNet (97.42%) and Siamese Networks (96.62%). A notable characteristic of ProtoNet is that its errors were primarily localized to the neighboring Poor category, with no misclassifications into the Excellent category. This behavior suggests that the learned embedding space preserves the ordinal relationship between water quality conditions, where samples with similar physicochemical characteristics remain close in the latent representation space. Consequently, classification errors tend to occur between adjacent categories rather than across distant classes. The Poor water quality category represented the most challenging classification task across all the models. 
Compared with the Excellent and Good classes, the Poor water quality samples exhibited greater variability and overlap with neighboring categories, resulting in reduced classification performance. Nevertheless, ProtoNet maintained the highest class-level accuracy with a true positive rate of 80.46%, outperforming MatchingNet (75.07%) and Siamese Networks (72.44%). These findings indicate that prototype-based representation learning provides greater robustness when handling complex and heterogeneous environmental conditions. The conventional machine learning baselines showed substantially lower performance within the Poor category. LR achieved only 31.79% class accuracy, while XGBoost achieved 31.20%. In both cases, a large proportion of the Poor samples were misclassified as Excellent or Good. RF demonstrated improved performance relative to the other conventional models but still achieved only 29.25% correct classification for the Poor category. These results suggest that traditional classifiers trained exclusively on a small number of support samples struggle to capture the complex nonlinear relationships governing degraded water quality conditions. Overall, the confusion matrix analysis provides qualitative evidence supporting the quantitative performance metrics reported in Table 4. The deep metric-learning frameworks consistently produced more structured and interpretable error patterns, whereas conventional machine learning models exhibited higher levels of inter-class confusion under limited-data conditions. Among all the evaluated approaches, ProtoNet demonstrated the most balanced performance across all the water quality categories, particularly in the challenging Poor class, highlighting its suitability for water quality assessment under FSL scenarios.
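Row normalization, as used for the confusion matrices discussed above, divides each row of the count matrix by the number of true samples in that class, so the diagonal holds the per-class true positive rate. A minimal NumPy sketch follows; the function name `row_normalized_confusion` and the toy labels are hypothetical.

```python
import numpy as np

def row_normalized_confusion(y_true, y_pred, n_classes):
    """Confusion matrix whose rows (true classes) each sum to 1."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (y_true, y_pred), 1.0)       # rows: true, cols: predicted
    row_sums = cm.sum(axis=1, keepdims=True)
    # Leave rows of classes with no samples as zeros instead of dividing by 0.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
cm_norm = row_normalized_confusion(y_true, y_pred, n_classes=3)
per_class_tpr = np.diag(cm_norm)
```

Reading along a row shows where the samples of one true class ended up, which is how statements such as "errors were localized to the neighboring category" can be checked.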

4.4. Meta-Training Convergence Behavior

The training loss curves are shown in Figure 9, along with the episodic meta-validation accuracy curves. These curves help to characterize the optimization behavior and convergence of the metric-based FSL models during episodic training. Prototypical networks (ProtoNet) and Matching Networks show stable and efficient convergence behavior. The training loss decreases rapidly in the first 150 episodic iterations, after which both models settle into a low-loss regime (around the 0.5 mark). This quick convergence reflects the effectiveness of metric-based representation learning, in which class prototypes and attention-based similarity mechanisms allow for an efficient organization of the embedding space. Similarly, the meta-validation accuracy improves smoothly, with both models exceeding 90% accuracy after approximately 200 training episodes. The two architectures maintain consistent performance after this point, with only small variations in subsequent updates. These findings suggest that prototype-based centroid learning (ProtoNet) and similarity-based matching mechanisms (MatchingNet) provide strong inductive biases for rapid and stable convergence in low-shot settings. The Siamese Network, on the other hand, converges more slowly and less stably in the initial phases of training. In the first 100–120 episodic iterations, both the loss and the accuracy fluctuate, and in some iterations the accuracy drops to around 55% before recovering. This behavior may be attributed to the sensitivity of contrastive loss optimization under limited support sampling, where simultaneously minimizing intra-class distance and maximizing inter-class separation can destabilize the initial formation of the embedding space.
In this stage, a considerable reorganization of the embedding space takes place and the loss fluctuates more. After about 180 training episodes, however, the optimization process settles into a new regime, and the loss slowly approaches a steady-state region. The Siamese Network achieves stable improvement in validation accuracy during the later training phases and eventually reaches ~93.5% accuracy. The stability of the learned representation also improves, as the variance in performance is reduced. This implies that the contrastive learning approach needs more adaptation time than the prototype-based approach but can also yield a stable and discriminative embedding space if trained sufficiently. Overall, the convergence analysis indicates distinct differences between the optimization dynamics of the investigated architectures. While Siamese Networks need more training to reach competitive results, ProtoNet and MatchingNet converge more quickly and are more stable in the early training phases. The results highlight the trade-off between the stability of the optimization process and flexibility in the representation learning approach in few-shot classification of water quality using a metric-based framework.
Another interesting point is that ProtoNet errors are localized around cluster boundaries, in contrast to traditional models, whose misclassifications are spread over the whole feature space. This means that ProtoNet has a more organized latent representation, which facilitates generalization and discrimination between classes. The tight grouping of correctly classified samples also lends credence to the hypothesis that metric-based learning maximizes intra-class consistency and inter-class distance. These results support the suitability of ProtoNet for working with limited labeled water quality data. The proposed few-shot framework reduces areas of uncertainty and enhances interpretability by learning a semantically meaningful embedding space. This may be especially useful in aquaculture and environmental surveillance, where a dependable assessment of borderline water quality conditions is needed to make operational decisions and avoid risks.

4.5. Statistical Significance Analysis (Paired t-Test Evaluation)

Table 6 reports the performance differences between the prototypical network (ProtoNet) and the baseline models, tested using a paired t-test for each of the four shot settings (1-shot, 5-shot, 10-shot, and 15-shot). A paired design was employed because all the models were tested on the same episodic test sets with the same support–query configurations, yielding comparable and dependent observations. Performance was measured over multiple evaluation episodes for each shot setting, and the per-episode differences between ProtoNet and each baseline model were calculated. A paired t-test was then used to determine whether the mean difference differs significantly from zero, under the null hypothesis of no difference in performance. The significance level was set at p < 0.05; results with p < 0.01 and p < 0.001 were considered to indicate progressively stronger statistical significance.
The results show that ProtoNet outperforms all the traditional machine learning baselines (LR, RF, SVM, and XGBoost) by a statistically significant margin in each of the shot configurations. The performance differences observed during episodic evaluation were generally highly significant (p < 0.001), indicating that the improvements are unlikely to be due to random factors. These findings demonstrate that metric-based few-shot learning is effective in extracting discriminative feature representations when only a few labeled samples are provided. Compared with the other few-shot deep learning methods, ProtoNet exhibited mixed results in extremely limited data scenarios. In the 1-shot setting, the difference between ProtoNet and Matching Network was not statistically significant (p = 0.977), suggesting that both approaches perform comparably when the class representation is derived from only a single support sample per class. This result is expected, as a single example cannot accurately represent the class distribution. However, as the number of support samples increases, ProtoNet provides statistically significant improvements over the Siamese Network and Matching Network. The results show that additional support samples enable more accurate prototype estimation, which leads to better class representation and better separation between water quality categories. In general, the statistical analysis indicates that ProtoNet works better with larger support sets in episodic learning scenarios and offers more stable and reliable predictions when labeled data is scarce. The results are based on repeated episodic evaluations performed under different random seeds, which strengthens their robustness and reliability.
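The paired design can be illustrated with a short sketch that computes the t statistic from per-episode accuracy differences. It assumes NumPy and the standard library only; the function name `paired_t_test` and the simulated accuracies are hypothetical and are not the study's results. The p-value here uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large episode counts such as 500; an exact test is available as `scipy.stats.ttest_rel`.

```python
import math
import numpy as np

def paired_t_test(acc_a, acc_b):
    """Paired t statistic, degrees of freedom, and approximate two-sided p."""
    diff = np.asarray(acc_a, dtype=float) - np.asarray(acc_b, dtype=float)
    n = diff.size
    # t = mean difference divided by its standard error.
    t_stat = diff.mean() / (diff.std(ddof=1) / math.sqrt(n))
    # Normal approximation to the two-sided p-value (assumption: large n).
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return float(t_stat), n - 1, p_approx

# Simulated per-episode accuracies for two models on the SAME 500 episodes.
rng = np.random.default_rng(42)
acc_baseline = rng.normal(0.80, 0.05, size=500)
acc_model = acc_baseline + rng.normal(0.10, 0.05, size=500)
t_stat, dof, p_approx = paired_t_test(acc_model, acc_baseline)
```

Working on per-episode differences removes the variance shared by both models on an easy or hard episode, which is why the paired test is more sensitive than an unpaired comparison of the two means.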

4.6. Discussion

The results show that deep metric-based FSL is an effective approach to classifying water quality in aquaculture with limited labeled data. Episodic meta-learning models produced better results than traditional machine learning models in all the K-shot settings, indicating that representation learning is better than directly estimating the decision boundary in data-scarce settings. In particular, ProtoNet improved steadily as more support samples were provided, perhaps due to the stability of prototype-based class representations in the learned embedding space. Siamese Networks, on the other hand, showed better results in very low-shot scenarios, indicating that pairwise similarity learning is more advantageous when only a very small amount of labeled data is available. Although traditional machine learning models become more competitive in higher-shot scenarios, they are not robust under severe data limitation because they rely on having enough training samples to reliably estimate a decision boundary. The Poor water quality class was the most difficult for all the models, possibly because of its higher intra-class variability and overlap with adjacent classes. The metric-based models, especially ProtoNet, nevertheless discriminated this class better, suggesting that nonlinear relationships among the physicochemical parameters are better captured by learned embedding spaces. The ablation study also underscores the importance of embedding dimensionality for model performance, with 64 dimensions offering a good balance between representational capacity and computational cost. Increasing the dimensionality beyond this point did not produce significant gains, indicating diminishing returns on tabular aquaculture data.
The results are also important from an application standpoint as they apply to the real-world aquaculture systems where labeled data is often expensive, noisy, and scarce. FSL models provide a viable solution for learning effectively with only a few shots. The present study, however, was based on a single dataset and a static classification system, and therefore may have some limitations with respect to generalizations to different environmental conditions. Also, it lacks temporal modeling capabilities to capture temporal changes in water quality. Future work will focus on cross-domain adaptation, temporal sequence modelling, and more sophisticated meta-learning architectures, like transformer and graph-based models, to further improve robustness and real-world applicability.

5. Conclusions

This study investigated deep metric-based few-shot learning algorithms for aquaculture water quality classification under scarce labeled data settings. Three episodic meta-learning architectures were systematically compared with four conventional machine learning classifiers, namely logistic regression, random forest, support vector machine, and XGBoost. The experimental results showed that few-shot learning models consistently outperformed the traditional supervised learning models for all the shot horizons. ProtoNet had the highest overall classification accuracy (94.46%) and the highest ROC-AUC (98.65%) in the 15-shot setting, whereas Siamese Networks performed best in the most restrictive one-shot setting. Latent space visualization, confusion matrix analysis, paired t-test statistical analysis and convergence studies further corroborated the superior performance of the few-shot learning frameworks by demonstrating increased class separability, improved recognition of the hard water quality categories, and stable optimization behavior. The ablation analysis also revealed that embedding dimensionality plays a crucial role in model effectiveness, with a 64-dimensional latent representation offering the best balance between accuracy and computational complexity. Moreover, the episodic benchmarking framework revealed that representation-based meta-learning techniques have significantly higher data efficiency than traditional machine learning techniques in scenarios with a limited amount of labeled data. The results of this research demonstrate the feasibility and effectiveness of few-shot learning in assessing water quality in aquaculture systems, especially in settings where collecting extensive amounts of annotated data is not feasible.
The proposed framework holds promise for intelligent water quality monitoring, early detection of environmental risks, and responsible aquaculture operation, as it requires only minimal labeled samples for precise classification. Future work will involve multi-source sensor integration, development of temporal water quality forecasting, transformer-based meta-learning architectures, domain adaptation across different aquaculture settings, and real-time deployment in an Internet of Things (IoT) enabled smart aquaculture system. These advancements could significantly boost the adaptability, versatility, and effectiveness of few-shot learning in environmental monitoring and precision aquaculture. The proposed method also aligns with the United Nations Sustainable Development Goals (SDGs), in particular SDG 6 (Clean Water and Sanitation), by supporting improved monitoring and management of water quality to reduce pollution risks and promote the sustainable utilization of freshwater resources.

Author Contributions

Conceptualization, Y.H.N. and G.C.C.; methodology, A.R., G.C.C. and Y.H.N.; software, A.R.; validation, Y.H.N., G.C.C. and K.Y.C.; formal analysis, S.F.T.; investigation, K.Y.C. and S.F.T.; resources, A.R.; data curation, A.R.; writing—original draft preparation, A.R.; writing—review and editing, Y.H.N. and G.C.C.; visualization, K.Y.C. and S.F.T.; supervision, Y.H.N. and G.C.C.; project administration, G.C.C.; funding acquisition, G.C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in Mendeley Data at https://doi.org/10.17632/y78ty2g293.1 (accessed on 15 January 2026). The dataset can be accessed and downloaded from the repository.

Acknowledgments

All content was carefully reviewed and verified by the authors to ensure accuracy and originality. The authors take full responsibility for the final content of the manuscript. The authors would like to express their sincere gratitude to their respective institutions for providing research support and facilities. The authors also thank the editor and reviewers for their valuable comments and suggestions, which have significantly improved the quality of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Boyd, C.E.; McNevin, A.A.; Davis, R.P. The contribution of fisheries and aquaculture to the global protein supply. Food Secur. 2022, 14, 805–827.
  2. Turlybek, N.; Nurbekova, Z.; Mukhamejanova, A.; Baimurzina, B.; Kulatayeva, M.; Aubakirova, K.M.; Alikulov, Z. Sustainable Aquaculture Systems and Their Impact on Fish Nutritional Quality. Fishes 2025, 10, 206.
  3. Yang, H.; Tan, T.; Du, X.; Feng, Q.; Liu, Y.; Tang, Y.; Bai, G.; Liu, Z.; Xia, S.; Song, S.; et al. Advancements in freshwater aquaculture wastewater management: A comprehensive review. Aquaculture 2025, 594, 741346.
  4. Hridoy, A.A.M.; Neogi, S.; Ujjaman, R.; Hasan, M. Water quality interactions and their synergistic effects on aquaculture performance in Bangladesh: A critical review. Results Chem. 2025, 16, 102306.
  5. Lindholm-Lehto, P. Water quality monitoring in recirculating aquaculture systems. Aquac. Fish Fish. 2023, 3, 113–131.
  6. Mandal, A.; Ghosh, A.R. Role of artificial intelligence (AI) in fish growth and health status monitoring: A review on sustainable aquaculture. Aquac. Int. 2024, 32, 2791–2820.
  7. Zhang, F.; Hu, J.; Sun, Y. Underwater fish image recognition based on knowledge graphs and semi-supervised learning feature enhancement. Sci. Rep. 2025, 15, 45245.
  8. Essamlali, I.; Nhaila, H.; El Khaili, M. Advances in machine learning and IoT for water quality monitoring: A comprehensive review. Heliyon 2024, 10, e27920.
  9. Hridoy, M.A.A.M.; Bordin, C.; Masood, A.; Masood, K. Predictive modelling of aquaculture water quality using IoT and advanced machine learning algorithms. Results Chem. 2025, 16, 102456.
  10. Arepalli, P.G.; Naik, K.J. Water quality classification framework for IoT-enabled aquaculture ponds using deep learning based flexible temporal network model. Earth Sci. Inform. 2025, 18, 351.
  11. Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-shot Learning. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017.
  12. Derdour, A.; Baz, M.; Alzaed, A.; Bojer, A.K.; Ghoneim, S.S.M. Groundwater quality assessment using few-shot learning with prototypical, Siamese, and matching networks. J. Water Process Eng. 2025, 75, 108003.
  13. Bueno, G.; Sanchez, L.; Cristobal, G.; Kloster, M.; Beszteri, B.; Salido, J. Phytoplankton identification with prototypical networks: A few-shot learning approach. Results Eng. 2025, 28, 106984.
  14. Aljehani, F.; N’Doye, I.; Laleg-Kirati, T.-M. Feeding control and water quality monitoring on bioenergetic fish growth modeling: Opportunities and challenges. Aquac. Eng. 2025, 109, 102511.
  15. Zhang, K.; Ye, Z.; Qi, M.; Cai, W.; Saraiva, J.L.; Wen, Y.; Liu, G.; Zhu, Z.; Zhu, S.; Zhao, J. Water Quality Impact on Fish Behavior: A Review From an Aquaculture Perspective. Rev. Aquac. 2025, 17, e12985.
  16. Boyd, C.E.; Tucker, C.S. Ecology of Aquaculture Ponds. In Pond Aquaculture Water Quality Management; Boyd, C.E., Tucker, C.S., Eds.; Springer: Boston, MA, USA, 1998; pp. 8–86.
  17. Baena-Navarro, R.; Carriazo-Regino, Y.; Torres-Hoyos, F.; Pinedo-López, J. Intelligent Prediction and Continuous Monitoring of Water Quality in Aquaculture: Integration of Machine Learning and Internet of Things for Sustainable Management. Water 2025, 17, 82.
  18. Alnemari, A.M.; Elmessery, W.M.; Qazaq, A.S.; Moustapha, M.E.; Rakhimgaliyeva, S.; Abuhussein, M.F.A.; Alhag, S.K.; Al-Shuraym, L.A.; Moghanm, F.S.; Szűcs, P.; et al. Developing highly accurate machine learning models for optimizing water quality management decisions in tilapia aquaculture. Sci. Rep. 2025, 15, 35600.
  19. Chandran, P.J.I.; Khalil, H.A.; Hashir, P.; S, V. Smart technologies in aquaculture: An integrated IoT, AI, and blockchain framework for sustainable growth. Aquac. Eng. 2025, 111, 102584.
  20. Nagothu, S.K.; Sri, P.B.; Anitha, G.; Vincent, S.; Kumar, O.P. Advancing aquaculture: Fuzzy logic-based water quality monitoring and maintenance system for precision aquaculture. Aquac. Int. 2025, 33, 32.
  21. Cojbasic, S.; Dmitrasinovic, S.; Kostic, M.; Sekulic, M.T.; Radonic, J.; Dodig, A.; Stojkovic, M. Application of machine learning in river water quality management: A review. Water Sci. Technol. 2023, 88, 2297–2308.
  22. Ji, X.; Liu, L.; Duan, B.; Li, Y.; Xing, H.; Wang, B.; Li, D. Long-term multivariate water quality forecasting for sustainable aquaculture management. Water Res. X 2025, 29, 100402.
  23. Arepalli, P.G.; Naik, K.J. An IoT based smart water quality assessment framework for aqua-ponds management using Dilated Spatial-temporal Convolution Neural Network (DSTCNN). Aquac. Eng. 2024, 104, 102373.
  24. Zhi, W.; Appling, A.P.; Golden, H.E.; Podgorski, J.; Li, L. Deep learning for water quality. Nat. Water 2024, 2, 228–241.
  25. Zeng, W.; Xiao, Z. Few-shot learning based on deep learning: A survey. Math. Biosci. Eng. 2024, 21, 679–711.
  26. Zhuang, Y.; Liu, P.; Yang, H.; Zhang, K.; Wang, Y.; Pu, Z. Few-shot learning for novel object detection in autonomous driving. Commun. Transp. Res. 2025, 5, 100194.
  27. Lu, J.; Zhang, S.; Zhao, S.; Li, D.; Zhao, R. A Metric-Based Few-Shot Learning Method for Fish Species Identification with Limited Samples. Animals 2024, 14, 755.
  28. Rezaei, M.; Diepeveen, D.; Laga, H.; Jones, M.G.K.; Sohel, F. Plant disease recognition in a low data scenario using few-shot learning. Comput. Electron. Agric. 2024, 219, 108812.
  29. Gull, S.; Kim, J. Metric-Based Meta-Learning Approach for Few-Shot Classification of Brain Tumors Using Magnetic Resonance Images. Electronics 2025, 14, 1863.
  30. Luo, X.; Ding, Y.; Cao, Y.; Liu, Z.; Zhang, W.; Zeng, S.; Cheng, S.H.; Li, H.; Haggarty, S.J.; Wang, X.; et al. Few-shot meta-learning applied to whole brain activity maps improves systems neuropharmacology and drug discovery. iScience 2024, 27, 110875.
  31. Qaraqe, M.; Elzein, A.; Belhaouari, S.; Ilam, M.S.; Petrovski, G. A novel few shot learning derived architecture for long-term HbA1c prediction. Sci. Rep. 2024, 14, 482.
  32. Snyder, S.H.; Vignaux, P.A.; Ozalp, M.K.; Gerlach, J.; Puhl, A.C.; Lane, T.R.; Corbett, J.; Urbina, F.; Ekins, S. The Goldilocks paradigm: Comparing classical machine learning, large language models, and few-shot learning for drug discovery applications. Commun. Chem. 2024, 7, 134.
  33. Rahman, A.; Chung, G.C.; Ng, Y.H. Applications, Challenges, and Future Trends of Artificial Intelligence of Things (AIoT)-Enabled Water Quality and Resource Management. Water 2026, 18, 919.
  34. Venkataramana, V.; Rajeshwarrao, A.; Bernatin, T. Aquaculture—Water Quality Dataset. Mendeley Data V1. 2024. Available online: https://doi.org/10.17632/y78ty2g293.1 (accessed on 15 January 2026).
  35. Choudhary, R.; Kumar, A.; C, P.; Naik, M.M.; Choudhury, M.; Khan, N.A. Predicting water quality index using stacked ensemble regression and SHAP based explainable artificial intelligence. Sci. Rep. 2025, 15, 31139.
  36. Elshewey, A.M.; Youssef, R.Y.; El-Bakry, H.M.; Osman, A.M. Water potability classification based on hybrid stacked model and feature selection. Environ. Sci. Pollut. Res. 2025, 32, 7933–7949.
Figure 1. Proposed water quality classification framework using Metric-Based Few-Shot Learning and traditional machine learning. The diagram illustrates the complete workflow from data collection and pre-processing to model evaluation. Blue arrows represent the standard data preparation sequence; green arrows denote the data routing for traditional machine learning models, and yellow or orange arrows highlight the few-shot learning (FSL) development pipeline.
Figure 2. Feature distribution histograms.
Figure 3. Box plot analysis of water quality parameters.
Figure 4. Correlation heatmap of water quality parameters.
Figure 5. Performance comparison radar profiles across shifting horizons.
Figure 6. FSL Performance Trajectories across shifting horizons.
Figure 7. Principal component analysis (a) FSL, (b) conventional ML.
Figure 8. Authentic aggregated confusion matrix heatmap profiles for Few-Shot Learning (FSL) and traditional Machine Learning (ML) baselines under a 5-shot configuration. The vertical axes represent the true water quality types (Excellent, Good, Poor), and the horizontal axes denote the predicted categories. Cell values indicate the normalized percentage of predictions, where darker blue color intensities correspond to a higher density of classification outcomes.
Figure 9. Training loss dynamics and episodic validation behavior.
Table 1. Summary of FSL approaches across various application domains.

| Ref. | Application Domain | Method | Dataset/Scenario | Accuracy (%) | Strengths | Limitations | Suggested Improvement |
|---|---|---|---|---|---|---|---|
| [17] | Water Quality Prediction | Random Forest | Fully supervised water dataset | 87.3 | Strong performance with sufficient data | Requires many labeled samples | Combine with representation learning |
| [18] | IoT Water Monitoring | SVM + IoT | Real-time pond monitoring | 84.5 | Good classification capability | Performance degrades under limited data | Few-shot adaptation mechanisms |
| [19] | Aquaculture Water Quality | Traditional ML | Tilapia pond monitoring | 82.0 | Simple implementation | Poor transferability across ponds | Meta-learning for rapid adaptation |
| [27] | Environmental Monitoring | Metric-Based FSL | Sparse environmental sensor data | 91.2 | Effective representation learning | Sensitive to class imbalance | Integrate class-balanced episodic sampling |
| [28] | Fish Species Recognition | Few-Shot Metric Learning | Marine image dataset | 93.5 | Robust under limited training data | Image domain only | Adapt framework to tabular sensor data |
| [29] | Plant Disease Detection | Meta-Learning + Attention | Few-shot agricultural dataset | 95.8 | High accuracy and fast adaptation | Requires image feature extraction | Extend to multivariate environmental datasets |
| [30] | Fault Diagnosis | Prototype Matching Network | Industrial monitoring signals | 92.7 | Strong generalization to unseen classes | Domain-specific features | Cross-domain adaptation strategies |
| [31] | CNS Drug Discovery | Few-Shot Meta-Learning | Biomedical dataset | 90.1 | Effective with scarce labels | Computationally intensive | Lightweight architectures |
| [32] | Zero/Few-Shot Fault Detection | Prototype-Based Meta-Learning | Industrial fault data | 94.2 | Handles unseen fault categories | Limited scalability assessment | Evaluate on larger datasets |
| This Study | Aquaculture Water Quality Classification | Few-Shot Metric Learning | Aquaculture water quality tabular dataset | 94.46 | High accuracy under limited labeled support samples; strong ROC-AUC (98.65%); stable adaptation across shot horizons | Requires episodic meta-training phase | Future integration with cross-domain adaptation, continual learning, and real-time sensor deployment |
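Table 2. Descriptive statistics of water quality parameters.

| Statistic | Temp | Turbidity | DO | BOD | CO2 | pH | Alkalinity | Hardness | Calcium | Ammonia | Nitrite | Phosphorus | H2S | Plankton | Water Quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 | 4300 |
| Mean | 25.70 | 39.05 | 5.30 | 3.13 | 6.38 | 7.71 | 93.72 | 127.06 | 84.87 | 0.05 | 0.64 | 1.17 | 0.02 | 3805.51 | 1.02 |
| Std | 9.67 | 20.94 | 1.83 | 2.29 | 2.83 | 1.58 | 68.95 | 78.88 | 75.72 | 0.12 | 0.90 | 1.08 | 0.01 | 1208.55 | 0.82 |
| Min | 0.19 | 0.05 | 0.13 | 1.00 | 0.00 | 0.00 | 25.01 | 0.26 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 78.60 | 0.00 |
| 25% | 19.78 | 22.22 | 3.98 | 1.52 | 5.05 | 6.44 | 40.42 | 69.48 | 23.75 | 0.01 | 0.01 | 0.03 | 0.01 | 2956.02 | 0.00 |
| 50% | 25.04 | 30.21 | 5.00 | 2.24 | 6.60 | 7.74 | 67.56 | 111.06 | 62.85 | 0.03 | 0.10 | 0.98 | 0.02 | 3729.40 | 1.00 |
| 75% | 30.28 | 55.95 | 6.52 | 4.32 | 8.24 | 9.04 | 132.83 | 162.68 | 115.60 | 0.04 | 1.17 | 2.10 | 0.02 | 4555.09 | 2.00 |
| Max | 84.25 | 99.80 | 14.97 | 14.94 | 14.98 | 14.85 | 299.91 | 398.80 | 399.32 | 1.00 | 4.99 | 4.97 | 0.10 | 7460.42 | 2.00 |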
Table 3. Water quality parameters.

| No. | Parameter | Unit | Scientific Explanation |
|---|---|---|---|
| 1 | Temperature | °C | Controls metabolic rate, growth, and immunity of fish. Extreme temperatures reduce oxygen solubility and increase stress. |
| 2 | Turbidity | cm | Moderate turbidity enhances primary productivity; excessive turbidity reduces light penetration and clogs fish gills. |
| 3 | Dissolved Oxygen (DO) | mg L⁻¹ | Essential for respiration. DO below 3 mg L⁻¹ causes hypoxia and mortality in most cultured species. |
| 4 | Biochemical Oxygen Demand (BOD) | mg L⁻¹ | Indicates organic pollution. High BOD consumes oxygen, leading to anaerobic conditions. |
| 5 | Carbon Dioxide (CO2) | mg L⁻¹ | Excess CO2 interferes with oxygen uptake by fish blood and reduces pH. |
| 6 | pH | - | Influences toxicity of ammonia and metabolic processes. Extreme pH damages gills and skin. |
| 7 | Alkalinity | mg L⁻¹ | Acts as a buffer against sudden pH changes and supports plankton productivity. |
| 8 | Total Hardness | mg L⁻¹ | Regulates osmoregulation and ionic balance; low hardness increases metal toxicity. |
| 9 | Calcium | mg L⁻¹ | Essential for bone formation, scale development, and muscle contraction. |
| 10 | Ammonia (NH3–N) | mg L⁻¹ | Unionized ammonia is highly toxic; even small concentrations damage gills and the nervous system. |
| 11 | Nitrite (NO2) | mg L⁻¹ | Causes methemoglobinemia (“brown blood disease”) by reducing oxygen transport. |
| 12 | Phosphorus | mg L⁻¹ | Excess phosphorus leads to eutrophication, algal blooms, and oxygen depletion. |
| 13 | Hydrogen Sulfide (H2S) | mg L⁻¹ | Extremely toxic even at trace levels; inhibits cellular respiration. |
| 14 | Plankton Density | No. L⁻¹ | Indicates pond productivity. Excessive plankton causes night-time oxygen crashes. |
Table 4. Comparative performance of meta-learning and traditional machine learning models across different shot settings.

| K-Shot | Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | ROC-AUC (%) |
|---|---|---|---|---|---|---|
| 1-Shot | Prototypical Network | 82.69 ± 1.08 | 83.31 | 82.69 | 81.85 | 93.12 |
| 1-Shot | Siamese | 86.48 ± 0.44 | 86.66 | 86.48 | 86.13 | 94.78 |
| 1-Shot | MatchingNet | 82.72 ± 2.16 | 83.42 | 82.72 | 81.72 | 91.38 |
| 1-Shot | Logistic Regression | 51.48 ± 0.41 | 56.52 | 51.48 | 45.20 | 73.77 |
| 1-Shot | Random Forest | 57.09 ± 2.18 | 57.39 | 57.09 | 52.89 | 78.59 |
| 1-Shot | SVM | 50.74 ± 0.46 | 56.75 | 50.74 | 44.22 | 29.18 |
| 1-Shot | XGBoost | 33.33 ± 0.00 | 11.11 | 33.33 | 16.67 | 50.00 |
| 5-Shot | Prototypical Network | 93.31 ± 0.47 | 93.69 | 93.31 | 93.20 | 98.35 |
| 5-Shot | Siamese | 89.97 ± 0.10 | 90.85 | 89.97 | 89.61 | 96.92 |
| 5-Shot | MatchingNet | 91.99 ± 0.51 | 92.99 | 91.99 | 91.73 | 96.76 |
| 5-Shot | Logistic Regression | 66.44 ± 0.32 | 65.48 | 66.44 | 63.29 | 78.99 |
| 5-Shot | Random Forest | 69.78 ± 0.23 | 69.70 | 69.78 | 65.14 | 84.10 |
| 5-Shot | SVM | 63.73 ± 0.41 | 62.93 | 63.73 | 60.28 | 76.78 |
| 5-Shot | XGBoost | 62.03 ± 0.69 | 61.12 | 62.03 | 59.30 | 78.38 |
| 10-Shot | Prototypical Network | 94.16 ± 0.54 | 94.43 | 94.16 | 94.09 | 98.55 |
| 10-Shot | Siamese | 90.31 ± 0.18 | 91.23 | 90.31 | 89.96 | 97.22 |
| 10-Shot | MatchingNet | 93.55 ± 0.69 | 94.18 | 93.55 | 93.40 | 97.66 |
| 10-Shot | Logistic Regression | 70.33 ± 0.35 | 69.36 | 70.33 | 68.04 | 81.22 |
| 10-Shot | Random Forest | 76.92 ± 0.31 | 78.33 | 76.92 | 74.31 | 89.04 |
| 10-Shot | SVM | 67.97 ± 0.36 | 67.18 | 67.97 | 65.30 | 80.11 |
| 10-Shot | XGBoost | 67.97 ± 0.43 | 66.65 | 67.97 | 65.61 | 82.68 |
| 15-Shot | Prototypical Network | 94.46 ± 0.48 | 94.71 | 94.46 | 94.39 | 98.65 |
| 15-Shot | Siamese | 90.30 ± 0.13 | 91.26 | 90.30 | 89.94 | 97.28 |
| 15-Shot | MatchingNet | 94.00 ± 0.41 | 94.57 | 94.00 | 93.89 | 98.06 |
| 15-Shot | Logistic Regression | 72.24 ± 0.20 | 71.43 | 72.24 | 70.31 | 82.57 |
| 15-Shot | Random Forest | 81.10 ± 0.34 | 82.59 | 81.10 | 79.42 | 91.62 |
| 15-Shot | SVM | 70.26 ± 0.14 | 69.49 | 70.26 | 68.00 | 82.02 |
| 15-Shot | XGBoost | 71.48 ± 0.24 | 70.59 | 71.48 | 69.42 | 85.21 |
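The 1-shot XGBoost row in Table 4 (accuracy 33.33%, precision 11.11%, recall 33.33%, F1 16.67%) is numerically what a classifier produces when it assigns every sample to a single class. The sketch below reproduces those four values under two assumptions that this section does not state explicitly: the three classes (Excellent, Good, Poor) are balanced in the evaluation set, and precision, recall, and F1 are macro-averaged with undefined ratios counted as zero. The function name `macro_metrics` and the toy labels are illustrative only and are not taken from the authors' code.

```python
def macro_metrics(y_true, y_pred, labels):
    """Macro-averaged precision, recall and F1; undefined ratios (0/0) count as 0."""
    precisions, recalls, f1_scores = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


# Assumption: a balanced toy evaluation set with 10 samples per class.
labels = ["Excellent", "Good", "Poor"]
y_true = [label for label in labels for _ in range(10)]
# Degenerate classifier: every sample is assigned to the same class.
y_pred = ["Good"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = macro_metrics(y_true, y_pred, labels)

print(round(100 * accuracy, 2), round(100 * precision, 2),
      round(100 * recall, 2), round(100 * f1, 2))
# prints: 33.33 11.11 33.33 16.67
```

This is one reading that is consistent with the reported numbers, not a confirmed account of how the baseline behaved in the experiments.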
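Table 5. Comprehensive multi-shot ablation analysis of embedding dimension sensitivity across different FSL architectures.

| Model Architecture | Embedding Dimension (d) | 1-Shot Accuracy (%) | 5-Shot Accuracy (%) | 10-Shot Accuracy (%) | 15-Shot Accuracy (%) |
|---|---|---|---|---|---|
| Prototypical Network (ProtoNet) | 16 | 73.15 | 82.42 | 83.25 | 83.60 |
| Prototypical Network (ProtoNet) | 32 | 78.42 | 89.15 | 89.90 | 90.25 |
| Prototypical Network (ProtoNet) | 64 | 82.69 | 93.31 | 94.16 | 94.46 |
| Prototypical Network (ProtoNet) | 128 | 83.05 | 93.65 | 94.46 | 94.77 |
| Siamese Network | 16 | 79.10 | 81.35 | 81.50 | 81.45 |
| Siamese Network | 32 | 84.65 | 87.42 | 87.85 | 87.60 |
| Siamese Network | 64 | 86.48 | 89.97 | 90.31 | 90.30 |
| Siamese Network | 128 | 89.12 | 91.90 | 92.23 | 92.18 |
| Matching Network (MatchingNet) | 16 | 72.98 | 81.94 | 82.85 | 83.15 |
| Matching Network (MatchingNet) | 32 | 78.15 | 88.10 | 89.10 | 89.65 |
| Matching Network (MatchingNet) | 64 | 82.72 | 91.99 | 93.55 | 94.00 |
| Matching Network (MatchingNet) | 128 | 82.95 | 92.35 | 93.91 | 94.30 |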
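Table 5 varies the dimension d of the latent space in which the FSL models compare samples. As a minimal sketch of how a prototypical network classifies a query inside such a space, following the general formulation of Snell et al. [11], the example below averages the support embeddings of each class into a prototype and assigns the query to the nearest prototype by Euclidean distance. It assumes the embeddings have already been produced by some trained encoder; the encoder, its training loss, and the toy 2-dimensional vectors are not from the paper, and the helper names `prototypes` and `classify` are hypothetical.

```python
import math


def prototypes(support):
    """Average the support embeddings of each class into one prototype vector.

    support: dict mapping class label -> list of embedding vectors (lists of floats).
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is closest to the query (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Hypothetical 3-way 2-shot episode with d = 2 embeddings (illustrative values only).
support = {
    "Excellent": [[0.9, 0.1], [1.1, -0.1]],
    "Good": [[0.0, 1.0], [0.2, 0.8]],
    "Poor": [[-1.0, -1.0], [-0.8, -1.2]],
}
protos = prototypes(support)
prediction = classify([0.8, 0.2], protos)
print(prediction)
```

With K support samples per class, only the averaging step changes as K grows, which is why the same trained encoder can be evaluated at 1, 5, 10, and 15 shots without retraining a classifier head.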
Table 6. Paired t-test results across shot settings.

| Comparison | 1-Shot (t) | 1-Shot (p) | 5-Shot (t) | 5-Shot (p) | 10-Shot (t) | 10-Shot (p) | 15-Shot (t) | 15-Shot (p) |
|---|---|---|---|---|---|---|---|---|
| ProtoNet vs. Siamese | −9.617 | <0.001 | 14.713 | <0.001 | 16.572 | <0.001 | 22.365 | <0.001 |
| ProtoNet vs. MatchingNet | −0.030 | <0.977 | 8.379 | <0.001 | 5.547 | <0.005 | 2.317 | <0.008 |
| ProtoNet vs. LogReg | 44.900 | <0.001 | 158.775 | <0.001 | 87.606 | <0.001 | 100.086 | <0.001 |
| ProtoNet vs. RF | 19.430 | <0.001 | 116.106 | <0.001 | 44.111 | <0.001 | 35.349 | <0.001 |
| ProtoNet vs. SVM | 44.324 | <0.001 | 183.312 | <0.001 | 90.220 | <0.001 | 83.440 | <0.001 |
| ProtoNet vs. XGBoost | 91.048 | <0.001 | 105.514 | <0.001 | 115.699 | <0.001 | 145.394 | <0.001 |
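For readers who want to see how a paired t statistic of the kind reported in Table 6 is formed, the sketch below computes it from matched accuracy scores of two models, using the standard formula t = mean(d) / (sd(d) / sqrt(n)) on the per-run differences d. The accuracy lists are hypothetical placeholders, not the paper's per-episode results, and the number of paired runs behind Table 6 is not given in this section. The p-value step is omitted; it would be read from a Student's t distribution with n − 1 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for matched samples scores_a and scores_b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical accuracies (%) from five matched evaluation runs of two models.
model_a = [93.0, 94.0, 93.5, 94.5, 93.0]
model_b = [90.0, 90.5, 91.0, 90.0, 90.5]

t_value, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

A positive t means the first model has the higher mean score. If Table 6 takes differences as ProtoNet minus the comparator, the negative 1-shot value against Siamese agrees with Table 4, where Siamese leads ProtoNet at one shot (86.48% vs. 82.69%).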
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rahman, A.; Chung, G.C.; Ng, Y.H.; Chan, K.Y.; Tan, S.F. Few-Shot Learning–Based Water Quality Classification Under Limited Data Conditions for Smart Aquaculture Monitoring. Water 2026, 18, 1523. https://doi.org/10.3390/w18121523
