Article

Edge Node Deployment for Turbidity Estimation in Farm Ponds

by Martin Moreno 1, Iván Trejo-Zúñiga 1,*, Víctor Alejandro González-Huitrón 2, René Francisco Santana-Cruz 3, Raúl García García 4 and Gabriela Pineda Chacón 2,*

1 Laboratory of Energy Innovation and Intelligent and Sustainable Agriculture (LEIISA), Universidad Tecnológica de San Juan del Río, Av. La Palma No. 125 Vista Hermosa, San Juan del Río 76826, Querétaro, Mexico
2 División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Querétaro, Tecnológico Nacional de México, Av. Tecnológico s/n esq. Gral. Mariano Escobedo, Colonia Centro Histórico 76000, Querétaro, Mexico
3 Centro de Investigación en Ciencia Aplicada y Tecnología Avanzada, Unidad Querétaro, Instituto Politécnico Nacional, Santiago de Querétaro 76000, Querétaro, Mexico
4 División de Química Industrial y Energías Renovables, Universidad Tecnológica de San Juan del Río, Av. La Palma No. 125 Vista Hermosa, San Juan del Río 76826, Querétaro, Mexico
* Authors to whom correspondence should be addressed.
Big Data Cogn. Comput. 2026, 10(4), 126; https://doi.org/10.3390/bdcc10040126
Submission received: 25 February 2026 / Revised: 8 April 2026 / Accepted: 14 April 2026 / Published: 18 April 2026

Abstract

Image-based AI offers a low-cost alternative to traditional turbidity sensors in farm ponds, yet the prevailing shift toward Vision Transformers (ViTs) critically overlooks two field realities: the chronic scarcity of annotated data (Small Data) and the strict computational limits of edge hardware. This study presents a frugal computer vision framework that challenges the need for complex architectures in environmental screening. By systematically benchmarking six deep learning models across a calibrated high-turbidity dataset (200–800 NTU, 700 images) under standardized capture conditions, we demonstrate that traditional Convolutional Neural Networks (CNNs) possess a crucial inductive bias for this task. Specifically, ResNet-50 significantly outperformed modern ViTs in both accuracy (96.3% vs. 80.0%) and data efficiency, effectively capturing spatial scattering patterns without the massive data requirements that hindered transformer convergence. Deployed on a resource-constrained Raspberry Pi 4, the CNN-based system achieved an inference latency of 46 ms, demonstrated in an initial hardware-in-the-loop field proof-of-concept (82.4% agreement under baseline, calm-weather conditions, n = 17). This edge-native approach not only provides actionable spatial turbidity maps to guide on-farm filtration and livestock management decisions but also establishes a critical architectural baseline: under controlled capture protocols, mature CNNs consistently outperform ViTs, establishing them as the optimal architecture for frugal, small-data agricultural Internet of Things (IoT) deployments.

1. Introduction

Artificial Intelligence (AI) has emerged as a fundamental enabler of sustainable development, increasingly relying on cognitive computing capabilities—such as machine perception, representation learning, and edge inference—to translate noisy environmental observations into actionable guidance [1]. In the context of cognitive IoT systems, decision-making frameworks must continuously handle heterogeneous and uncertain observations often summarized by the “5 Vs” of big data: volume, velocity, variety, veracity, and value [2]. More importantly, these systems must deliver timely inferences directly at the data-generating site. This requirement is not merely a computational preference; it fundamentally shapes the architectural design of monitoring pipelines. Delayed or centralized cloud-processing can miss short-lived environmental events, introduce latency, and ultimately reduce the operational value of the generated intelligence.
This need for distributed, low-latency intelligence is particularly acute in water management and precision agriculture, where decisions must be executed under tight time constraints and limited instrumentation. Agriculture accounts for approximately 70% of global freshwater withdrawals, confronting a demand that has increased by nearly 1% annually since the 1980s [3]. Despite this, the 2024 Sustainable Development Goal (SDG) Report indicates that progress on more than 30% of SDG targets lags due to persistent monitoring and reporting gaps [4]. The Organization for Economic Co-operation and Development (OECD) similarly highlights that critical environmental indicators rely on outdated or insufficient data [5]. These persistent inequalities in access to reliable water monitoring [6] strongly motivate the development of frugal, field-deployable Machine Learning (ML) pipelines capable of performing edge inference to deliver real-time water-quality signals.
Against this backdrop, small, rain-fed farm ponds—formally recognized in Mexican rural-water guidance as bordos de abrevadero or jagüeyes [7]—represent a critical yet under-instrumented node in agricultural water security. These shallow, wind-exposed detention basins store stormwater runoff for livestock and supplemental irrigation. Within these systems, a persistent operational challenge is turbidity, a parameter that fluctuates rapidly and widely as fine sediments and organic matter are introduced by runoff and re-entrained through wind–wave resuspension. Turbidity in these ponds is highly event-driven, frequently exhibiting rapid, order-of-magnitude excursions over short temporal windows. During storm-runoff events, inflow turbidity can spike from tens to several thousand Nephelometric Turbidity Units (NTU) as surface runoff mobilizes fine sediments [8,9]. Post-storm, outflow turbidity typically declines toward the 10–30 NTU range due to particle settling [10,11]. Between storms, fetch and wind–wave action become the dominant controls, resuspending bottom sediments and producing measurable turbidity increases at wind speeds above 4 m s−1 [12,13,14]. Similar seasonal wave-driven resuspension dynamics are well-documented in Mexican water bodies, such as the Valle de Bravo reservoir [15].
These rapid spatial and temporal variations directly impact routine farm risk management. For irrigation systems, elevated total suspended solids (TSS)—often critical above 50 mg L−1—increase filter head loss, accelerate clogging in drip emitters, and mandate frequent, costly cleaning cycles [16,17,18,19,20]. Furthermore, high turbidity severely compromises on-farm disinfection by increasing chlorine demand and shielding pathogens [21,22,23], while simultaneously degrading livestock drinking water palatability and correlating with higher E. coli persistence [24,25]. Consequently, farm-pond turbidity monitoring is not merely descriptive; it is a vital decision-support function. The operational value lies in detecting actionable shifts early enough to trigger filtration maintenance or adjust chlorination protocols.
Despite clear regulatory frameworks dictating wastewater discharges [26] and drinking water limits [27,28,29], agricultural turbidity monitoring remains constrained by the cost and fragility of traditional nephelometers, which provide single-point measurements and miss rapid storm-driven peaks [30,31]. While recent studies have demonstrated the feasibility of using low-cost cameras and RGB-based computer vision to classify turbidity [32,33,34,35,36], the transition toward image-based monitoring poses a critical methodological crossroads in applied deep learning.
Currently, the computer vision community is experiencing a paradigm shift toward Vision Transformers (ViTs), architectures that achieve state-of-the-art performance in large-scale classification by leveraging self-attention mechanisms to model long-range spatial dependencies. However, the adoption of these models frequently overlooks the harsh realities of agricultural AI deployments: the chronic scarcity of large, annotated datasets (the “Small Data” problem) and the strict computational limits of edge hardware. ViTs inherently lack the inductive biases—specifically, translation equivariance and spatial locality—that Convolutional Neural Networks (CNNs) possess. Consequently, ViTs are notoriously “data-hungry,” requiring massive datasets to learn visual representations that CNNs can easily infer from much smaller samples. In the context of farm-pond turbidity, where compiling exhaustive datasets of high-NTU conditions is labor-intensive and expensive, the uncritical application of transformer-based architectures is inherently flawed.
To address this gap, this study proposes a frugal, image-based framework that maps simple RGB images to operationally relevant turbidity classes (200–800 NTU) directly on edge hardware. Rather than blindly adopting data-hungry models, we critically evaluate the hypothesis that traditional CNNs possess the optimal inductive bias for small-data environmental screening. The present work makes four targeted contributions to the intersection of machine learning, agricultural IoT, and sustainable water management:
  • Methodological Benchmarking in Small-Data Scenarios: We provide a rigorous comparative analysis of modern deep learning architectures—spanning both convolutional (CNN) and transformer (ViT) families—for visual turbidity classification. We reveal that mature CNNs significantly outperform modern ViTs in data-constrained agricultural tasks, establishing crucial architectural design principles for environmental screening.
  • High-Turbidity Dataset Curation: We curate and document a traceable, class-balanced dataset of 700 RGB images with calibrated NTU labels spanning the critical high-turbidity regime (200–800 NTU). This provides a reproducible foundation for training models aligned with specific irrigation and filtration thresholds.
  • Frugal Edge Deployment: We demonstrate the end-to-end deployment of the best-performing model on resource-constrained edge hardware (Raspberry Pi 4) via TensorFlow Lite quantization. Achieving a 46 ms inference time, the system enables near-real-time spatial mapping and is validated under operational farm conditions (82.4% agreement).
  • Sustainability and Operational Impact: By facilitating low-cost, high-frequency spatial turbidity assessment that runs on commodity hardware without cloud dependency, this framework directly supports SDG 6 and SDG 2, enhancing irrigation efficiency and reducing filter maintenance costs in climate-resilient farming systems.
The remainder of the paper is structured as follows. Section 2 details the empirical workflow, including dataset acquisition under standardized illumination, preprocessing, and the architectural training setup for the CNN vs. ViT benchmark. Section 3 presents the core findings, contrasting the data efficiency and accuracy of both model families, followed by the embedded in situ validation. Section 4 critically translates these results into farm practice, comparing the proposed edge-native approach against smartphone-only protocols and in situ probes, while outlining limitations. Finally, Section 5 summarizes the key technical takeaways and outlines future paths for deploying frugal AI in unconstrained farm-pond environments.

2. Materials and Methods

2.1. Dataset Description

The dataset reported in [32] was created to support image-based turbidity analysis using laboratory-prepared water samples. For each sample, turbidity was measured with a Hach 2100P turbidimeter (Hach Company, Loveland, CO, USA). Each captured RGB image is directly paired with its measured NTU value, which serves as the ground-truth label for turbidity classification. To enable consistent training and fair comparison across models, the dataset is organized into five predefined turbidity classes covering the 200–800 NTU operating window. As summarized in Table 1, Class 1 corresponds to low turbidity (200–320 NTU), Class 2 to moderate turbidity (320–440 NTU), Class 3 to intermediate turbidity (440–560 NTU), Class 4 to high turbidity (560–680 NTU), and Class 5 to very high turbidity (680–800 NTU). Water samples were prepared in the laboratory to fall within the predefined NTU intervals. To ensure the ecological validity of the optical properties captured, these samples were turbidified using native topsoil collected directly from the immediate catchment area (banks) of the target farm pond, rather than standardized synthetic sediments. The turbidity of each sample was then measured with a turbidimeter. To prevent sedimentation bias, each sample was gently homogenized immediately prior to measurement and photographed within a controlled time window of less than two minutes after the turbidimeter reading, ensuring that the NTU value and the captured image corresponded to the same suspended-particle state.
The complete dataset comprises 700 RGB images distributed across five classes. By capturing exactly one image per physical sample, each class contains 140 images, yielding a perfectly balanced representation. Image acquisition was performed with a Nikon D3300 camera (Nikon Corporation, Tokyo, Japan) equipped with a 35 mm fixed lens. Photographs were taken inside a photo lightbox with diffused LED illumination to standardize lighting conditions and reduce external variability. Each sample was placed in a transparent cylindrical cell at the center of the scene to minimize shadows and reflections and maintain a consistent viewing geometry across images. Overall, the dataset provides controlled, repeatable RGB images linked to measured NTU labels and grouped into well-defined turbidity classes, making it suitable for benchmarking image-based turbidity classification models under standardized capture conditions.
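The class definitions above can be sketched as a small labeling helper. This is an illustrative sketch only: the name `ntu_to_class` is hypothetical, and because the source does not state how a reading that falls exactly on a shared boundary (e.g., 320 NTU) is assigned, the code assumes half-open bins with the upper edge of the last bin included.

```python
# Class edges taken from the dataset description: five 120-NTU bins.
CLASS_EDGES = [200, 320, 440, 560, 680, 800]


def ntu_to_class(ntu: float) -> int:
    """Return the 1-based turbidity class for a reading in the 200-800 NTU window.

    Assumption (not stated in the source): bins are half-open [lower, upper),
    except the last bin, which also includes 800 NTU.
    """
    if not CLASS_EDGES[0] <= ntu <= CLASS_EDGES[-1]:
        raise ValueError(f"{ntu} NTU is outside the 200-800 NTU operating window")
    for idx in range(len(CLASS_EDGES) - 1):
        if ntu < CLASS_EDGES[idx + 1]:
            return idx + 1
    return len(CLASS_EDGES) - 1  # ntu == 800 falls in Class 5


# With 140 images per class, the five classes account for all 700 images.
IMAGES_PER_CLASS = 140
TOTAL_IMAGES = IMAGES_PER_CLASS * (len(CLASS_EDGES) - 1)
```

Readings outside the operating window raise an error rather than being clamped, since the benchmark does not cover turbidity below 200 NTU or above 800 NTU.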

2.2. Image Preprocessing

A two-stage preprocessing strategy was applied: a fixed normalization pipeline for all images and an on-the-fly augmentation pipeline for training images only [37,38]. For the fixed stage, each image was converted to RGB, center-cropped to remove non-informative borders (e.g., vessel edges and labels), resized to 500 × 500 px using antialiased bilinear interpolation [39], and min–max normalized by casting pixels to float32 and scaling intensities to [0, 1] via division by 255 [40]. This ensured a consistent input size and dynamic range across all evaluated architectures.
For the training set, online data augmentation was used to improve generalization and reduce overfitting [38,41]. For each training image, the following transformations were sampled independently with probability p = 0.5: random horizontal and vertical flips; random in-plane rotation uniformly drawn from [−15°, +15°]; random brightness and contrast jitter within ±20%; and random crop retaining 90–100% of the image area (aspect ratio constrained to 1 ± 0.02), acting as a mild zoom/translation [38]. These magnitudes were selected and empirically verified to mimic plausible acquisition variability without shifting images into adjacent NTU classes. Because CNNs prioritize spatial scattering patterns over absolute global intensity, this jitter magnitude preserves the turbidity-relevant cues, such as color gradients. Augmentation was applied on-the-fly during training only and did not generate additional stored samples, so the reported dataset size of 700 images refers exclusively to the original captures. Validation and test sets always consisted of unaltered original images, so reported metrics reflect performance on standardized inputs.
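The sampling scheme for the training-time augmentations can be sketched as follows. This is a minimal illustration of drawing the parameters only, not the authors' pipeline: the function name `sample_augmentation` and the dictionary keys are hypothetical, brightness and contrast are assumed to be gated separately, and the rotation range is taken to be in degrees.

```python
import random


def sample_augmentation(rng: random.Random, p: float = 0.5) -> dict:
    """Draw one set of augmentation parameters for a single training image.

    Each transform fires independently with probability p; a transform that
    does not fire keeps its identity value (no flip, 0 degrees, factor 1.0).
    """

    def fires() -> bool:
        return rng.random() < p

    params = {
        "hflip": fires(),
        "vflip": fires(),
        # In-plane rotation, uniform in [-15, +15] degrees.
        "rotation_deg": rng.uniform(-15.0, 15.0) if fires() else 0.0,
        # Brightness and contrast jitter within +/-20 %, as multiplicative factors.
        "brightness": rng.uniform(0.8, 1.2) if fires() else 1.0,
        "contrast": rng.uniform(0.8, 1.2) if fires() else 1.0,
        # Random crop: fraction of the image area kept and its aspect ratio.
        "crop_area": 1.0,
        "crop_aspect": 1.0,
    }
    if fires():
        params["crop_area"] = rng.uniform(0.90, 1.00)
        params["crop_aspect"] = rng.uniform(0.98, 1.02)
    return params


# A seeded generator makes the draws repeatable across runs.
example = sample_augmentation(random.Random(0))
```

Because the parameters are redrawn every time an image is loaded, no augmented copies need to be stored, which is consistent with the dataset size staying at 700 images.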

2.3. Classifier Architecture Selection

To probe accuracy–efficiency trade-offs and architectural inductive biases, six off-the-shelf architectures were selected, spanning CNNs and ViTs. Selection criteria included (i) coverage of distinct design families, (ii) availability of robust pretrained weights, (iii) compatibility with the standardized 500 × 500 input, and (iv) suitability for both workstation training and prospective edge deployment.
CNN-based models, such as GoogLeNet, ResNet-50, and MobileNetV3, have consistently demonstrated excellent performance across various image classification benchmarks due to their architectural efficiency and inductive biases well-suited to spatial data. GoogLeNet (Inception V1), with its inception modules, captures features at multiple scales within a single layer, promoting efficient representation learning without significantly increasing computational cost. ResNet-50, a 50-layer deep residual network, addresses the degradation problem in very deep networks by introducing identity shortcut connections that enable smoother gradient flow [42]. This design allows the network to learn highly abstract and discriminative features across multiple layers. MobileNetV3, in contrast, is designed for low-resource environments. It combines depthwise separable convolutions with lightweight attention mechanisms (squeeze-and-excitation modules) and nonlinear activations (h-swish), offering a strong trade-off between accuracy, latency, and model size, making it particularly suitable for edge-based deployment scenarios [43].
On the other hand, the transformer-based models, namely Swin Transformer V2-Base, Vision Transformer Base 16 (ViT B/16), and Vision Transformer Base 32 (ViT B/32), are informed by advances in natural language processing and represent a shift in computer vision toward data-driven inductive biases. These models rely on self-attention mechanisms to establish long-range dependencies within the images. The Swin Transformer V2-Base model builds a hierarchical representation in which self-attention is computed within local windows that are shifted between successive layers, keeping computation efficient while still allowing information to flow across window boundaries [44]. This design effectively captures both fine-grained and global features. In contrast, the ViT B/16 and ViT B/32 models process images as sequences of patches of 16 × 16 and 32 × 32 pixels, respectively. These models incorporate positional encodings to preserve spatial information [45]. Although ViT B/16 provides finer spatial attention resolution, ViT B/32 has computational advantages due to its coarser input granularity. These transformer-based models are particularly effective for scenarios that require modeling long-range spatial relationships within images. As illustrated in Figure 1a, the CNN pipeline naturally exploits spatial locality through hierarchical convolutions, whereas the ViT architecture relies on patch-based sequence tokenization (Figure 1b), which inherently lacks this inductive bias.
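The granularity difference between the two plain ViT variants can be made concrete by counting patch tokens. The sketch below is illustrative only: the helper name `patch_grid` is hypothetical, and since 500 is not a multiple of 16 or 32 and the source does not describe how the leftover border pixels are handled (resizing, padding, or cropping), the code simply counts whole patches.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image.

    Assumption: only whole patches are counted; any border pixels left over
    when image_size is not a multiple of patch_size are ignored.
    """
    per_side = image_size // patch_size
    return per_side, per_side * per_side


for patch in (16, 32):
    per_side, tokens = patch_grid(500, patch)
    # Global self-attention compares every token with every other token,
    # so the number of pairwise interactions grows with tokens squared.
    print(f"patch {patch}x{patch}: {per_side}x{per_side} grid, "
          f"{tokens} tokens, {tokens * tokens} attention pairs")
```

Under these assumptions a 500 × 500 input gives 961 tokens at patch size 16 and 225 tokens at patch size 32, so the coarser variant has roughly 18 times fewer pairwise attention interactions per layer.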
All models were implemented in PyTorch 2.3.1 and initialized with pretrained ImageNet weights to enable transfer learning. ImageNet pretraining was selected because the resulting feature hierarchies (encoding low-level textures, color gradients, and spatial patterns) transfer effectively to turbidity-relevant visual cues such as light scattering and chromatic shifts. This initialization is particularly consequential in the Small Data regime: it enables fine-tuning to start from a rich representational baseline, substantially reducing the number of task-specific examples required for convergence. Importantly, the observed performance gap between CNNs and ViTs reflects not only intrinsic architectural differences but also how efficiently each family leverages this shared prior under data-constrained conditions. The dataset was randomly split (seed set to 42 for reproducibility): 70% for training, 20% for testing, and 10% for validation. Fine-tuning was carried out on the turbidity classification dataset using the categorical cross-entropy loss function, which is appropriate for multi-class classification problems. The Adam optimizer was used with an initial learning rate of 1 × 10−4 (0.0001), providing adaptive learning rate updates based on moment estimates, which enhances convergence stability.
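The split sizes implied by these proportions can be checked with a short sketch. It assumes a plain (non-stratified) shuffle of image indices, because the text only says the split was random, and it uses Python's standard-library generator rather than the authors' actual tooling, so the specific indices will not match theirs. The name `split_indices` is hypothetical.

```python
import random


def split_indices(n_images: int = 700, seed: int = 42):
    """Shuffle image indices once and cut them into train/test/validation parts.

    Proportions follow the text: 70 % train, 20 % test, 10 % validation.
    Assumption: a simple shuffle without per-class stratification.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = n_images * 70 // 100
    n_test = n_images * 20 // 100
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]  # remainder, 10 % of the images
    return train, test, val


train_idx, test_idx, val_idx = split_indices()
print(len(train_idx), len(test_idx), len(val_idx))  # 490 140 70
```

With 700 images this yields 490 training, 140 test, and 70 validation images. Without stratification, the per-class counts inside each part can deviate slightly from an exact 5-way balance.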
To ensure optimal performance and prevent overfitting, a combination of early stopping and learning rate scheduling was employed. Early stopping halted training when the validation loss no longer improved over a specified number of epochs, whereas learning rate reduction on a plateau dynamically adjusted the learning rate to facilitate further fine-tuning in the later stages of training. All models were trained for up to 50 epochs with a mini-batch size of 32, and the final model checkpoints were selected based on the best validation performance.
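The interplay between early stopping and plateau-based learning-rate reduction can be sketched in plain Python. This is a simplified illustration, not the PyTorch scheduler's exact semantics: the patience values, the reduction factor, and the function name `simulate_training` are hypothetical, since the source does not report them. Only the initial learning rate and the 50-epoch cap come from the text.

```python
def simulate_training(val_losses, lr=1e-4, lr_patience=3, stop_patience=6,
                      factor=0.1, max_epochs=50):
    """Replay a sequence of validation losses through both control rules.

    - The best checkpoint is the epoch with the lowest validation loss.
    - After every `lr_patience` consecutive epochs without improvement,
      the learning rate is multiplied by `factor`.
    - After `stop_patience` consecutive epochs without improvement,
      training stops early.
    """
    best_loss = float("inf")
    best_epoch = None
    last_epoch = None
    stale = 0  # consecutive epochs without a new best validation loss
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
            continue
        stale += 1
        if stale % lr_patience == 0:
            lr *= factor
        if stale >= stop_patience:
            break
    return {"best_epoch": best_epoch, "best_loss": best_loss,
            "final_lr": lr, "last_epoch": last_epoch}


# Made-up loss curve for illustration; not data from the study.
demo = simulate_training([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.6])
```

In the made-up curve, the loss stops improving after epoch 2, the learning rate is reduced twice, and training halts at epoch 8. The later value of 0.6 is never seen, and the checkpoint from epoch 2 would be the one retained.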
This systematic, diverse model selection strategy was essential for benchmarking neural network paradigms and assessing their suitability for visual turbidity classification in both controlled and real-world imaging scenarios.

3. Results

3.1. Benchmark Across Deep Learning Architectures

To evaluate the effectiveness of different deep learning models in classifying turbidity classes from RGB images, six architectures were assessed: GoogLeNet, ResNet-50, MobileNetV3, Swin V2-B, Vision Transformer (ViT) B/16, and ViT B/32. The evaluation metrics included precision, recall, F1-score, and support, calculated for each class individually, as well as overall accuracy.
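For reference, these metrics can all be derived from a confusion matrix, as in the sketch below. The function names and the 3-class matrix are hypothetical and purely illustrative (they are not results from this study), and class averaging is assumed to be an unweighted (macro) mean.

```python
def per_class_metrics(cm):
    """Per-class precision, recall, F1-score, and support.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                         # samples whose true class is k
        predicted = sum(cm[i][k] for i in range(n))  # samples predicted as class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"precision": precision, "recall": recall,
                     "f1": f1, "support": support})
    return rows


def summarize(cm):
    """Overall accuracy plus unweighted (macro) class averages."""
    rows = per_class_metrics(cm)
    total = sum(r["support"] for r in rows)
    correct = sum(cm[k][k] for k in range(len(cm)))
    macro = {m: sum(r[m] for r in rows) / len(rows)
             for m in ("precision", "recall", "f1")}
    return {"accuracy": correct / total, **macro}


# Toy 3-class matrix for illustration only; not data from the paper.
TOY_CM = [[9, 1, 0],
          [1, 8, 1],
          [0, 2, 8]]
toy_summary = summarize(TOY_CM)
```

Rows are read as true classes and columns as predictions, so low precision for a class means other classes are being absorbed into it, while low recall means its own samples are leaking into neighboring bins.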
Table 2 reports class-averaged precision, recall, and F1-score together with overall accuracy for six architectures. ResNet-50 attains the highest values across all metrics (precision = 0.96, recall = 0.96, F1 = 0.96, accuracy = 0.96), followed closely by GoogLeNet (0.96, 0.95, 0.95, 0.95). MobileNetV3 ranks next with uniformly high scores (0.92, 0.92, 0.92, 0.92). Among the transformer-based models, Swin V2-Base achieves 0.89 for precision/recall/F1 with accuracy 0.89, whereas ViT-Base/16 and ViT-Base/32 yield lower values (precision/recall/F1 ≈ 0.85/0.84/0.84 and 0.83/0.80/0.80; accuracy = 0.84 and 0.80, respectively).
The close alignment between precision and recall across all models (differences ≤ 0.03) indicates balanced classification without pronounced bias toward either false positives or false negatives. The 16-percentage-point gap between best (ResNet-50: 96%) and worst (ViT-Base/32: 80%) performers highlights substantial architectural differences in suitability for this task. Notably, all three CNN-based models outperform all three transformer variants, suggesting that the inductive biases inherent to convolutional architectures—spatial locality and translation equivariance—provide advantages when learning turbidity-relevant visual features from moderately sized datasets under standardized capture conditions.
Per-class analysis (Table 3) reveals where these performance differences originate. In the lowest turbidity class (200–320 NTU), CNNs achieve perfect scores across all metrics, whereas transformers show progressive degradation: Swin V2-Base maintains high precision (98%) but suffers recall loss (89%), while ViT-Base/32 exhibits the most severe imbalance (95% precision, 68% recall), indicating systematic misclassification of low-turbidity samples into adjacent bins.
The 320–440 NTU class exposes similar patterns. GoogLeNet achieves near-perfect performance (98%, 100%, F1 = 99%), while ViT-Base/32 shows marked precision collapse (57%), suggesting difficulty distinguishing this moderate-turbidity range from neighboring classes. This precision–recall trade-off is characteristic of models that have not fully learned class boundaries.
Mid-range classes (440–560 and 560–680 NTU) pose a greater challenge for all architectures. Given that the dataset is class-balanced (140 images per class), this performance drop is attributable to the subtle visual differences in color gradients and scattering intensity between adjacent turbidity levels, rather than to any imbalance in training samples. ResNet-50 maintains superiority in these regions (F1 = 94–95%), benefiting from residual connections that preserve gradient flow during training. Transformers struggle particularly in the 440–560 range, with ViT-Base/16 dropping to 70% precision, suggesting insufficient attention to the fine-grained color and scattering gradients that distinguish these intermediate levels.
In the highest turbidity class (680–800 NTU), CNNs again demonstrate robust performance (F1 = 93–98%), while Swin V2-Base and ViT-Base/32 show precision deficits (79% and 69%, respectively) despite reasonable recall. This pattern (high recall but low precision) indicates that transformers over-predict the highest-class label, likely because their global self-attention mechanisms capture broad, diffuse patterns but fail to discriminate fine distinctions near class boundaries.
Figure 2 presents the training and validation performance of the six deep learning architectures. The results show that GoogLeNet and ResNet-50 consistently achieve superior performance across all evaluated metrics. In terms of validation accuracy (Figure 2a), both models exceed 90% after approximately 10 epochs and maintain stable performance throughout the remainder of the training period. Similarly, the training accuracy curves (Figure 2c) indicate that GoogLeNet and ResNet-50 converge rapidly and reach the highest accuracy values among all models.
Regarding loss metrics, Figure 2b illustrates that validation loss decreases sharply in the early epochs for all models. However, GoogLeNet and ResNet-50 achieve the lowest final loss values, reflecting better generalization. The training loss trends (Figure 2d) further support these observations, as both models converge faster and achieve significantly lower training losses than the other architectures. These convergence analyses quantitatively confirm that under the present experimental conditions, convolutional architectures (e.g., GoogLeNet, ResNet-50) reliably outperform transformer-based counterparts (Swin V2-Base, ViT-B/16, ViT-B/32) in classification accuracy, as reflected by higher class-averaged precision, recall, F1 Score, and overall accuracy.
It is worth noting that while both the CNNs and the ViT-Base/16 architecture achieve high recall values for the baseline 200–320 NTU class, their underlying behavior differs significantly. Recall is the complement of the false-negative rate (recall = 1 − FNR); in this application, the costly false negatives are highly turbid samples erroneously recorded as safe. While minimizing false negatives is critical in real-world farm applications, the high recall but poor precision of ViTs suggests a tendency to over-predict the baseline class rather than demonstrating true discriminative learning. Ultimately, these results highlight a fundamental limitation of ViT architectures in Frugal AI contexts: their severe data-hungry nature prevents them from establishing robust decision boundaries without massive annotated datasets, reaffirming mature CNNs as the optimal choice for agricultural edge deployments.
The confusion matrices in Figure 3 show strong diagonal dominance across all models, indicating that most samples are correctly categorized into their respective NTU intervals. Misclassifications tend to be confined to adjacent classes, as expected given the continuous nature of turbidity values within the defined bins. Among the convolutional models, GoogLeNet and ResNet-50 show the clearest diagonal patterns. For GoogLeNet, only minor leakage is observed from the 680–800 range to the 560–680 range and from the 440–560 range to the 320–440 range; the remaining cells are essentially empty. ResNet-50 exhibits a similar trend, with small spillovers between the 440–560 and 560–680 ranges and occasional misassignments from 680–800 to 560–680. MobileNetV3 maintains high diagonal counts across all classes but has slightly more off-diagonal misclassifications than the other two CNNs, notably from 440–560 to 560–680 and from 680–800 to 560–680. In contrast, transformer-based models exhibit larger adjacency errors. Swin V2-Base shows clear confusion between the 200–320 and 320–440 ranges, as well as between the 560–680 and 680–800 ranges, while remaining relatively stable in the 440–560 range. ViT-Base/16 exhibits scattered errors that reduce recall in the 320–440 range and allow leakage from the 440–560 range into neighboring bins. ViT-Base/32 is the most diffuse, with notable swaps occurring from 680–800 to 560–680 and from 320–440 to both 200–320 and 440–560, which weakens diagonal dominance.
The stark performance disparity between convolutional architectures and vision transformers in this benchmark highlights a fundamental architectural trade-off in applied agricultural AI. Under the present experimental conditions, mature CNNs (e.g., GoogLeNet, ResNet-50) consistently outperformed their transformer-based counterparts (Swin V2-Base, ViT-B/16, ViT-B/32) across all metrics. This divergence cannot be attributed solely to hyperparameter tuning, but rather to the intrinsic architectural biases of each model family when confronted with the “Small Data” constraints typical of environmental monitoring.
Vision Transformers inherently lack the strict inductive biases (specifically, translation equivariance and spatial locality) that are hardwired into convolutional layers through local receptive fields and weight sharing. While self-attention mechanisms enable ViTs to theoretically capture global, long-range dependencies across an image, this architectural flexibility comes at the cost of very large data requirements. In our 700-image dataset, the ViT models (particularly ViT-B/32) struggled to learn localized visual representations independently, leading to a diffuse confusion matrix and a severe collapse in precision for intermediate turbidity classes (e.g., 57% precision in the 320–440 NTU range). The transformers likely overfit to broad scattering patterns while failing to discriminate the fine-grained color and intensity gradients near class boundaries.
Conversely, the superior performance of ResNet-50 (96.3% overall accuracy) validates the necessity of convolutional inductive biases for frugal environmental datasets. The residual connections in ResNet-50 preserve gradient flow across deep layers, allowing the network to extract highly abstract features from subtle spatial variations in turbidity without requiring tens of thousands of examples to converge. Similarly, the multi-scale feature extraction of GoogLeNet’s inception modules proved highly efficient at isolating relevant scattering artifacts. Ultimately, these results empirically demonstrate that for pond-scale turbidity classification—where large-scale annotated datasets are economically and logistically prohibitive to acquire—traditional CNNs provide a far more robust, data-efficient, and computationally practical foundation than state-of-the-art transformer architectures.

3.2. In Situ Validation with Embedded Inference

To validate edge inference latency, system integration under natural illumination, and pipeline stability, rather than to perform a statistically exhaustive multi-class field evaluation, the trained ResNet-50 model was quantized and deployed on a Raspberry Pi 4 Model B with 4 GB of RAM (Raspberry Pi Foundation, Cambridge, UK) as a hardware-in-the-loop proof-of-concept. The system was installed adjacent to a rainwater collection tank utilized for both supplemental irrigation and livestock watering. When connected to a digital RGB camera, the edge node acquired surface images, performed local inference, and transmitted the predicted NTU class to a local display (Figure 4). To quantify the spatial heterogeneity of turbidity relevant to the intake location and filtration systems, a virtual sampling grid was superimposed on the water surface. Random samples were collected at predefined grid nodes and validated against a physical turbidimeter. The scene illustrated in Figure 5 shows the spatial heterogeneity that motivates the grid-based mapping and the local NTU calibration used in this study, as well as the operational relevance near intake locations for filtration maintenance and post-storm sediment management.
Quantitatively, this initial field deployment comprised 17 physically labeled ground-truth points collected during a calm-weather window (see representative samples in Figure 6). As summarized in Table 4, the embedded system achieved an overall accuracy of 82.4% (14/17). The model exhibited highly reliable performance in Class 1 (200–320 NTU), achieving 100.0% precision, 93.3% recall, and a 96.6% F1 Score (n = 15). This indicates that the system reliably recognizes baseline turbidity directly on the edge device. Due to stable weather conditions and the absence of recent storm runoff, only baseline turbidity (Class 1) was prevalent; Classes 2 through 5 were scarce or entirely absent (n = 2, 0, 0, 0, respectively) during this acquisition window, which naturally depresses macro-averaged statistical scores and reflects the ephemeral nature of high-turbidity events in these systems.
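The reported figures can be reproduced from a small set of counts. The sketch below assumes the counts implied by the text rather than read from Table 4: 14 of the 15 Class 1 samples were recognized, no other sample was predicted as Class 1, and neither of the two Class 2 samples was classified correctly (which follows from 14 correct predictions out of 17). The function name is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Class 1 counts inferred from the reported metrics (assumption, see lead-in).
p1, r1, f1_class1 = precision_recall_f1(tp=14, fp=0, fn=1)

# Class 2: two samples, none classified correctly, so its F1 is zero.
f1_class2 = 0.0

accuracy = 14 / 17                       # overall agreement
macro_f1 = (f1_class1 + f1_class2) / 2   # averaged over the two populated classes only

print(f"precision={p1:.1%} recall={r1:.1%} F1={f1_class1:.1%}")
print(f"accuracy={accuracy:.1%} macro-F1={macro_f1:.3f}")
```

Averaging over only the two populated classes already pulls the macro F1 below 0.5, which shows how a handful of samples in a rare class can dominate macro-averaged scores.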
As noted above, this field trial was designed as a hardware-in-the-loop proof-of-concept rather than a statistically exhaustive, multi-seasonal validation. The system demonstrated an inference latency of approximately 46 ms per image on the Raspberry Pi 4. This latency, well below one second, confirms the viability of executing a ResNet-50-class CNN locally, enabling rapid, grid-based spatial mapping without continuous cloud connectivity or high bandwidth. The limited representation of high-NTU classes underscores the logistical difficulty of capturing ephemeral, storm-driven turbidity peaks in the field; within that constraint, the results on the available data support the edge-deployment architecture. From an operational perspective, these initial findings establish a stable baseline that paves the way for future crowdsourced data-collection strategies and directly supports the design conclusions for the Frugal AI farm-pond workflow discussed next.
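A per-image latency figure of this kind is typically obtained by timing repeated inference calls after a short warm-up. The following is a minimal, framework-agnostic sketch; `measure_latency_ms` and `dummy_infer` are hypothetical names, and the placeholder workload stands in for the actual model call (on the device this would wrap the TensorFlow Lite interpreter's `invoke()` step). The warm-up and run counts are assumptions, not the study's protocol.

```python
import statistics
import time

def measure_latency_ms(infer, warmup=5, runs=50):
    """Time repeated calls to `infer` and summarize per-call latency in milliseconds."""
    for _ in range(warmup):
        infer()  # discard warm-up runs (caches, lazy initialization)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

def dummy_infer():
    """Placeholder workload standing in for a single model inference."""
    sum(i * i for i in range(10_000))

stats = measure_latency_ms(dummy_infer)
print(stats)
```

Reporting the median alongside the mean and maximum helps separate typical latency from occasional spikes caused by background activity on the board.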

4. Discussion

The shift towards data-driven water quality monitoring has predominantly branched into two broad families: (i) instrumented optics that mimic or augment nephelometry using custom hardware, and (ii) image-only methods that infer turbidity directly from RGB imagery under a standardized capture protocol. As summarized in Table 5, recent exemplars cover both ends of this spectrum across laboratory and field settings. However, the emerging paradigm of Frugal AI for agriculture demands that we critically assess not only the sensing modality but also the computational and data efficiency of the underlying architectures.
The proposed system targets the image-only family but differentiates itself by explicitly addressing the severe data and hardware constraints of farm-pond environments. Rather than deploying highly parameterized, data-hungry models (such as Vision Transformers) that struggle to converge on small environmental datasets, our framework is engineered around the inductive biases of mature CNNs. By targeting the high-turbidity ranges typical of wind-stressed and storm-fed farm ponds (200–800 NTU) under standardized illumination, the system explicitly calibrates visual classes to local NTU measurements. This frugal, edge-native approach bridges the gap between expensive in situ probes and lab-restricted imaging, translating raw pixels into operational decisions for filtration, chlorination, and livestock watering with sub-second latency.

4.1. Comparison with Instrumented Smartphone Optics

Relative to instrumented smartphone approaches, Koydemir et al. [55] combine transmittance and scattering via fiber-coupled illumination to achieve a wide dynamic range (0.3–2000 NTU). However, as highlighted in the “Platform and Modality” and “Setting” columns of Table 5, this requires ancillary optical add-ons and a cuvette-centered workflow. This hardware dependency limits spatial mapping and complicates high-frequency, pond-scale scans. In contrast, the present Frugal AI approach forgoes complex optical add-ons on the imaging device and relies instead on commodity imaging in a simple, standardized lightbox. This shifts the computational burden to the edge (Raspberry Pi), emphasizing rapid, spatially distributed classification directly tied to farm operations rather than full-spectrum laboratory quantification.

4.2. Comparison with In Situ Point Sensors

Compared with open-source and low-cost in situ sensors—such as the open-hardware probe validated by Droujko and Molnar [52] and the compact, low-power node for continuous stormwater monitoring by Wang et al. [48]—these point instruments excel at providing uninterrupted time series and enabling unattended deployment. However, they do not provide instantaneous spatial coverage of pond surfaces or human-interpretable visual classes. The proposed system complements such probes by enabling fast, grid-based surveys (e.g., pre-intake checks, post-storm mapping) while binding image classes to NTU bins aligned with farm thresholds, with occasional cross-calibration maintaining traceability (cf. “Platform and Modality” and “Target NTU Range” in Table 5).
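Binding classes to NTU bins can be sketched as a simple lookup. The example assumes five equal-width bins of 120 NTU over 200–800 NTU, which is consistent with the stated Class 1 range (200–320 NTU) but is otherwise an assumption, as is the treatment of bin edges; the function names are illustrative.

```python
NTU_MIN, NTU_MAX, N_CLASSES = 200.0, 800.0, 5
BIN_WIDTH = (NTU_MAX - NTU_MIN) / N_CLASSES  # 120 NTU per class (assumed equal widths)

def ntu_to_class(ntu):
    """Map a turbidimeter reading to a class index 1..5; lower bin edges are inclusive."""
    if not NTU_MIN <= ntu <= NTU_MAX:
        raise ValueError(f"{ntu} NTU is outside the calibrated range")
    idx = int((ntu - NTU_MIN) // BIN_WIDTH)
    return min(idx, N_CLASSES - 1) + 1  # fold the upper bound (800 NTU) into Class 5

def class_to_range(cls):
    """Return the (low, high) NTU bounds of a class index 1..5."""
    low = NTU_MIN + (cls - 1) * BIN_WIDTH
    return low, low + BIN_WIDTH

print(ntu_to_class(250.0), class_to_range(1))
```

The same mapping can be used in both directions: to label turbidimeter readings during cross-calibration and to translate a predicted class back into an NTU interval for the operator.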

4.3. Comparison with Bench and Industrial Optics

Against bench optics, Zhu et al. [54] report high agreement with commercial turbidimetry using dual NIR cameras, while Mullins et al. [56] demonstrate accurate prediction in wastewater using industrial cameras. As Table 5 indicates, these approaches rely on heavy hardware setups and PC-level processing confined to the laboratory or controlled industrial facilities, and they are not engineered for edge inference. The present framework addresses this gap by optimizing for embedded Frugal AI deployment. By standardizing the capture environment with a diffuse lightbox and deploying a quantized CNN (ResNet-50 via TensorFlow Lite), the system achieves a latency of 46 ms on a Raspberry Pi 4, in line with Frugal AI principles for resource-constrained agricultural environments.

4.4. Comparison with Image-Only Smartphone Protocols

The most prominent methodological contrast emerges when comparing our framework against recent image-only smartphone protocols. For instance, Soto et al. [47] and Jantarakasem et al. [49] report near-ceiling accuracy using commodity smartphones. However, as the “Target NTU Range” and “Dataset (n)” columns in Table 5 reveal, these studies operate almost exclusively in the low-to-moderate turbidity regimes (0–180 NTU and 0–40 NTU, respectively) suitable for drinking water, and often rely on massive datasets (e.g., >11,000 images).
Our method shifts the operating window entirely into the high-turbidity regime (200–800 NTU), which dominates farm-pond operations during storms. Crucially, our system converges with only 700 images—a classic “Small Data” scenario. This data efficiency is a direct result of our architectural benchmarking: traditional CNNs possess the necessary inductive biases (translation equivariance) to learn robust scattering features from limited samples, whereas modern, data-hungry architectures such as Vision Transformers can overfit or fail to converge.
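Translation equivariance means that shifting the input and then applying a convolution gives the same result as applying the convolution and then shifting the output. A toy one-dimensional illustration follows; the signal, the kernel, and the use of circular boundaries are illustrative choices and are unrelated to the study's models or data.

```python
def circular_conv1d(signal, kernel):
    """Cross-correlate `signal` with `kernel` using wrap-around (circular) boundaries."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]

def shift(signal, s):
    """Circularly shift `signal` to the right by `s` positions."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

signal = [0, 0, 1, 3, 1, 0, 0, 0]   # a small "bump" standing in for a local pattern
kernel = [1, -2, 1]                 # a simple second-difference filter

shift_then_conv = circular_conv1d(shift(signal, 2), kernel)
conv_then_shift = shift(circular_conv1d(signal, kernel), 2)

print(shift_then_conv == conv_then_shift)
```

Because the filter responds identically wherever the pattern appears, a convolutional layer does not need separate examples of the same pattern at every position, which is one reason such layers tend to need fewer training images.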
The rapid proliferation of mobile imaging for water quality has culminated in recent comprehensive reviews (e.g., Jantarakasem et al. [60]), which highlight both the accessibility of these methods and their inherent vulnerability to variable ambient lighting. While lab-focused studies [46,51,53] and ambient smartphone methods [34] validate the feasibility of RGB inference, our approach directly addresses this environmental vulnerability by strictly enforcing physical capture discipline (the lightbox). This ensures the Frugal AI model remains stable when deployed at the edge, abstracting away the nuisance variables introduced by lighting that plague open-field smartphone captures. Ultimately, this makes the approaches complementary: phone-only massive-data models for potable ranges, and standardized optics with small-data CNNs for high-NTU pond management.

4.5. Comparison with Underwater and Marine Imaging

In underwater scene analysis, Rudy and Wilson [50] estimate turbidity from natural images (≤55 FNU), validating a non-contact, in-scene approach. However, underwater imagery, lighting, and scattering dynamics differ markedly from the top-down pond-surface captures required for routine farm operations. Similarly, early marine-oriented designs by Sampedro and Salgueiro [59] utilized RGB sensors merely to augment traditional turbidimeters. Our framework elevates the RGB image from an auxiliary signal to the primary source of intelligence, mapping pixels directly to agronomic thresholds at the edge.

4.6. Advantages of the Proposed Framework

Synthesizing the comparisons from Table 5, the proposed framework offers a unique set of advantages explicitly tailored for agricultural water security:
  • Targeted High-Turbidity Focus (200–800 NTU): Directly addresses the operational ranges that drive filter headloss, clogging risk, and chlorination failure in farm ponds, filling a gap left by potable-water models.
  • Architectural Superiority for “Small Data”: Demonstrates that for constrained environmental datasets (n = 700), mature CNNs provide the essential inductive biases needed for high accuracy, avoiding the data hunger and convergence failures of Vision Transformers.
  • Frugal Edge Deployment: Replaces cloud-dependent processing and PC/GPU hardware with a quantized TFLite pipeline, achieving near real-time spatial mapping (46 ms/image) on a low-cost Raspberry Pi 4.
  • Standardized Protocol Resilience: Utilizes a low-cost lightbox to enforce optical consistency, ensuring that the Frugal AI’s feature extraction is grounded in physical turbidity changes rather than ambient lighting artifacts.

4.7. Edge–Cloud Cognitive Workflow: Limitations and Future Directions

Figure 7 summarizes the proposed approach as an edge-first cognitive decision workflow that couples perception, inference, and action directly at the point of sensing, while assigning the cloud a supervisory role for traceability, data curation, and continuous model refinement. In this pipeline, pond-side RGB acquisition feeds a quantized ResNet-50 model deployed on a Raspberry Pi via TensorFlow Lite. The optimized model has a memory footprint of approximately 24.5 MB and a peak CPU utilization of approximately 35–40% during execution, enabling low-latency screening at approximately 46 ms/image. Rather than continuously streaming bandwidth-intensive raw imagery, the architecture transmits compact categorical outputs and escalates only uncertain samples. This frugal design is well aligned with the connectivity and power constraints typical of precision-agriculture deployments.
The cognitive value of the system extends beyond classification accuracy alone; it lies in its ability to arbitrate actions when visual evidence becomes unreliable. The edge node computes a predictive confidence score c and applies a gating rule with threshold τ, using confidence as a practical proxy for epistemic uncertainty in downstream decision-making. High-confidence predictions (c ≥ τ) can autonomously trigger routine decision support actions (e.g., filter maintenance scheduling or intake control). In contrast, low-confidence predictions (c < τ) are treated as indicators of possible out-of-distribution conditions. Instead of forcing a potentially erroneous class assignment, the system conservatively escalates these observations by prompting image re-acquisition, notifying human operators, or archiving the frame and metadata for cloud-side inspection. By embedding uncertainty-aware rules directly at the sensor, the architecture reduces the risk of overconfident actuation under changing environmental conditions.
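The gating rule can be sketched in a few lines. The text does not specify how the confidence score c is computed or what value τ takes, so this example assumes c is the maximum softmax probability and uses a hypothetical threshold of 0.8; the function names and the returned dictionary layout are likewise illustrative.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate(logits, tau=0.8):
    """Act on the predicted class if confidence c >= tau; otherwise escalate."""
    probs = softmax(logits)
    c = max(probs)                   # assumed confidence score: top softmax probability
    if c >= tau:
        return {"action": "act", "ntu_class": probs.index(c) + 1, "confidence": c}
    # Low confidence: withhold the class and hand the frame over for review.
    return {"action": "escalate", "ntu_class": None, "confidence": c}

confident = gate([5.0, 0.0, 0.0, 0.0, 0.0])   # one class clearly dominates
uncertain = gate([1.0, 0.9, 0.8, 0.7, 0.6])   # scores are nearly flat
print(confident["action"], uncertain["action"])
```

Maximum softmax probability is only a rough proxy for uncertainty, so in practice the threshold would need to be tuned on held-out data rather than fixed in advance.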
Open-field agricultural environments are inherently stochastic. Direct sunlight, specular glare, wind-driven ripples, and high-NTU storm events can introduce substantial domain shifts, degrading model reliability. Although the initial hardware-in-the-loop field deployment successfully validated edge latency and operational stability, it relied on a diffuse lightbox to establish a controlled optical baseline over a weather-constrained, low-turbidity subset of 17 points. Static deep learning models are known to be brittle under unconstrained ecological variability; therefore, this environmental stochasticity should not be viewed merely as a limitation, but as the primary motivation for the architecture’s built-in cognitive feedback loop (Figure 7).
To improve robustness against concept drift, future iterations will operationalize an active-learning workflow driven by the confidence-gating mechanism. In this setting, low-confidence predictions serve as empirical evidence of emerging domain shifts, allowing the edge node to selectively curate anomalous frames and associated metadata for targeted cloud-side review, augmentation, and periodic retraining. The resulting “monitor–adapt–redeploy” cycle can be further strengthened through crowdsourced data collection across diverse weather and turbidity conditions, enabling the model to progressively assimilate new environmental distributions. In parallel, lightweight photometric hardware adaptations (e.g., cross-polarizers) may be incorporated to suppress glare before inference. Together, these algorithmic and hardware strategies define a scalable, pragmatic path toward long-term turbidity monitoring.
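One way the curation step could look on the edge node is a bounded buffer that keeps only low-confidence frames for later upload. This is a sketch under stated assumptions: the class name `CurationBuffer`, the threshold, the capacity, and the metadata fields are all hypothetical and are not described in the study.

```python
from collections import deque

class CurationBuffer:
    """Keep references to low-confidence frames for later cloud-side review."""

    def __init__(self, tau=0.8, capacity=100):
        self.tau = tau
        # Bounded queue: the oldest entries are dropped once capacity is reached.
        self.frames = deque(maxlen=capacity)

    def observe(self, frame_id, confidence, metadata=None):
        """Store the frame reference if confidence is below tau; return True if stored."""
        if confidence < self.tau:
            self.frames.append(
                {"frame_id": frame_id, "confidence": confidence, "metadata": metadata or {}}
            )
            return True
        return False

    def drain(self):
        """Return all queued frames and empty the buffer (e.g., when a link is available)."""
        batch = list(self.frames)
        self.frames.clear()
        return batch

buffer = CurationBuffer(tau=0.8, capacity=3)
buffer.observe("img_001", 0.95)                       # confident: not stored
buffer.observe("img_002", 0.41, {"note": "glare"})    # uncertain: stored
buffer.observe("img_003", 0.62)                       # uncertain: stored
batch = buffer.drain()
print([item["frame_id"] for item in batch])
```

Bounding the buffer keeps storage predictable on the device, at the cost of discarding the oldest uncertain frames if connectivity is unavailable for long periods.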
Moving forward, several environmental confounders must be systematically addressed to transition this proof-of-concept into a fully unconstrained field deployment. First, because the model was trained using native soil from a specific geographical location, the framework inherently captures local optical scattering signatures. Deploying this pipeline in regions with markedly different geological profiles (e.g., dark silt versus red clay) will require a brief site-specific calibration or fine-tuning phase to maintain accuracy. Second, biological contributors to turbidity (such as suspended algae, macrophyte fragments, and organic particulates) can produce spectral signatures that differ from those of inorganic sediment at equivalent NTU values. Future work must evaluate model robustness across turbidity sources with these distinct optical properties, which are common in rain-fed ponds during the growing season. Finally, the current protocol relies on static, lightbox-acquired images of collected samples rather than direct surface captures. Future validation campaigns must incorporate disturbed-surface scenarios (such as wind-driven ripples and animal-induced agitation) to quantify their effects on classification confidence. These results should inform the design of physically robust capture fixtures for in situ deployment that permanently integrate an enclosed capture zone, rather than relying on unshielded, open-surface imaging.

5. Conclusions

Event-driven and wind-resuspension variability in small rain-fed ponds carries immediate operational consequences for irrigation hardware, on-farm disinfection, and livestock health. This study challenges the prevailing trend of deploying highly parameterized, data-hungry models for environmental screening and instead demonstrates a highly efficient, edge-native monitoring pathway. By leveraging standardized RGB imaging and local calibration, the proposed framework maps visual appearance to operationally relevant NTU classes in accordance with the principles of Frugal AI.
A systematic cross-architecture evaluation using a curated 700-image dataset established that traditional Convolutional Neural Networks (CNNs) possess the optimal inductive bias for this small-data task. Mature CNNs significantly outperformed modern Vision Transformers (ViTs), with ResNet-50 achieving 96.3% test accuracy and an F1-score of 0.96, compared to the 80.0–89.3% range exhibited by the transformer baselines.
An initial hardware-in-the-loop proof-of-concept in a rain-fed pond confirmed edge latency and pipeline stability (46 ms/image), achieving 82.4% agreement under calm-weather, low-turbidity conditions (n = 17). Full multi-class field validation across storm-driven high-turbidity events remains an explicit priority for future work.
The core contributions of this work establish a scalable foundation for agricultural water security:
  • Architectural Benchmarking: Empirical validation that CNNs outperform ViTs in small-data turbidity classification, highlighting the necessity of translation equivariance for data-efficient environmental learning.
  • Standardized Protocol and Curation: A reproducible imaging protocol and a traceable, instrument-calibrated dataset (700 images) anchored to high-turbidity farm operations (200–800 NTU).
  • Frugal Edge Deployment: Near real-time embedded inference (46 ms/image) via TensorFlow Lite, proving that complex spatial pattern recognition can execute locally without cloud reliance.
  • Operational Translation: A workflow, demonstrated in an initial field proof-of-concept, that maps discrete RGB classes to actionable agronomic thresholds, complementing expensive point sensors and bridging the gap between potable-water protocols and high-NTU agricultural realities.
Ultimately, this research provides a practical, scalable, and computationally frugal foundation for continuous pond-scale turbidity surveillance, ready to support precision irrigation and routine farm-water decisions using commodity optics and inexpensive edge computing.

Author Contributions

Methodology, M.M. and I.T.-Z.; Software, V.A.G.-H. and R.F.S.-C.; Validation, M.M., I.T.-Z., V.A.G.-H., R.F.S.-C., R.G.G. and G.P.C.; Formal analysis, M.M.; Investigation, M.M., I.T.-Z. and G.P.C.; Resources, R.G.G.; Data curation, V.A.G.-H.; Writing—original draft, M.M., I.T.-Z. and V.A.G.-H.; Writing—review and editing, I.T.-Z., R.F.S.-C., R.G.G. and G.P.C.; Visualization, I.T.-Z. and V.A.G.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.12631881.

Acknowledgments

During the preparation of this manuscript, the authors used Grammarly (Premium version) to polish the language. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Vinuesa, R.; Azizpour, H.; Leite, I.; Balaam, M.; Dignum, V.; Domisch, S.; Felländer, A.; Langhans, S.D.; Tegmark, M.; Fuso Nerini, F. The role of artificial intelligence in achieving the Sustainable Development Goals. Nat. Commun. 2020, 11, 233.
  2. Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag. 2015, 35, 137–144.
  3. United Nations. The United Nations World Water Development Report 2024: Water for Prosperity and Peace; UNESCO: Paris, France, 2024.
  4. United Nations. The Sustainable Development Goals Report 2024; United Nations: New York, NY, USA, 2024.
  5. OECD. The Short and Winding Road to 2030: Measuring Distance to the SDG Targets; Technical report; OECD: Paris, France, 2022.
  6. World Health Organization; UNICEF. Progress on Household Drinking Water, Sanitation and Hygiene 2000–2024: Special Focus on Inequalities; Technical report; World Health Organization: Geneva, Switzerland, 2025.
  7. Comisión Nacional de Zonas Áridas (CONAZA). Catálogo de Obras y Acciones CONAZA (PIASRE) [Catalog of Works and Actions CONAZA (PIASRE)]. 2023. Available online: https://www.gob.mx/cms/uploads/attachment/file/870640/Catalogo_de_Obras_y_Acciones_CONAZA__PIASRE_.pdf (accessed on 25 June 2025).
  8. Shen, C.; Liao, Q.; Titi, H.H.; Li, J. Turbidity of Stormwater Runoff from Highway Construction Sites. J. Environ. Eng. 2018, 144, 04018061.
  9. Grimm, A.G.; Tirpak, R.A.; Winston, R.J. Monitoring the impacts of rainfall characteristics on sediment loss from road construction sites. Environ. Sci. Pollut. Res. 2024, 31, 32428–32440.
  10. Drake, J.; Young, D.; McIntosh, N. Performance of an Underground Stormwater Detention Chamber and Comparison with Stormwater Management Ponds. Water 2016, 8, 211.
  11. Youn, C.H.; Pandit, A. Estimation of Average Annual Removal Efficiencies of Wet Detention Ponds Using Continuous Simulation. J. Hydrol. Eng. 2012, 17, 1230–1239.
  12. Li, Y.; Tang, C.; Wang, J.; Acharya, K.; Du, W.; Gao, X.; Luo, L.; Li, H.; Dai, S.; Mercy, J.; et al. Effect of wave-current interactions on sediment resuspension in large shallow Lake Taihu, China. Environ. Sci. Pollut. Res. 2016, 24, 4029–4039.
  13. Ding, W.; Zhao, J.; Qin, B.; Wu, T.; Zhu, S.; Li, Y.; Xu, S.; Ruan, S.; Wang, Y. Exploring and quantifying the relationship between instantaneous wind speed and turbidity in a large shallow lake: Case study of Lake Taihu in China. Environ. Sci. Pollut. Res. 2021, 28, 16616–16632.
  14. Yao, X.; Liu, X.; Zhou, Y.; Zhang, L.; Zhou, Z.; Zhang, Y. The influence of wind-induced sediment resuspension and migration on raw water turbidity in Lake Taihu, China. Environ. Sci. Pollut. Res. 2022, 29, 84487–84503.
  15. Arias-Rodriguez, L.F.; Duan, Z.; Sepúlveda, R.; Martinez-Martinez, S.I.; Disse, M. Monitoring Water Quality of Valle de Bravo Reservoir, Mexico, Using Entire Lifespan of MERIS Data and Machine Learning Approaches. Remote Sens. 2020, 12, 1586.
  16. Anyango, G.W.; Bhowmick, G.D.; Sahoo Bhattacharya, N. A critical review of irrigation water quality index and water quality management practices in micro-irrigation for efficient policy making. Desalin. Water Treat. 2024, 318, 100304.
  17. Oliver, M.; Pezzaniti, D.; Hewa, G. Emitter clogging in a reclaimed water irrigation scheme with controlled suspended load. Int. J. Sustain. Dev. Plan. 2014, 9, 847–860.
  18. Duran-Ros, M.; Arbat, G.; Barragán, J.; Ramírez de Cartagena, F.; Puig-Bargués, J. Assessment of head loss equations developed with dimensional analysis for micro irrigation filters using effluents. Biosyst. Eng. 2010, 106, 521–526.
  19. Hu, Y.; Wu, W.; Liu, H.; Huang, Y.; Bi, X.; Liao, R.; Yin, S. Dimensional Analysis Model of Head Loss for Sand Media Filters in a Drip Irrigation System Using Reclaimed Water. Water 2022, 14, 961.
  20. Yurdem, H.; Demir, V.; Degirmencioglu, A. Development of a mathematical model to predict head losses from disc filters in drip irrigation systems using dimensional analysis. Biosyst. Eng. 2008, 100, 14–23.
  21. World Health Organization. Water Quality and Health—Review of Turbidity: Information for Regulators and Water Suppliers; Technical brief WHO/FWC/WSH/17.01; World Health Organization: Geneva, Switzerland, 2017.
  22. LeChevallier, M.W.; Evans, T.M.; Seidler, R.J. Effect of turbidity on chlorination efficiency and bacterial persistence in drinking water. Appl. Environ. Microbiol. 1981, 42, 159–167.
  23. Léziart, T.; Dutheil de la Rochere, P.M.; Cheswick, R.; Jarvis, P.; Nocker, A. Effect of turbidity on water disinfection by chlorination with the emphasis on humic acids and chalk. Environ. Technol. 2019, 40, 1734–1743.
  24. Smith, R.P.; Ashmore, A.; Moore, A.; Pritchard, G.C.; Donn, A.; Paiba, G.A. Turbidity as an Indicator of Escherichia coli Presence in Water Troughs on Cattle Farms. J. Dairy Sci. 2008, 91, 2874–2883.
  25. LeJeune, J.T.; Besser, T.E.; Rice, D.H.; Berg, J.L.; Stilborn, R.P.; Hancock, D.D. Cattle Water Troughs as Reservoirs of Escherichia coli O157. Appl. Environ. Microbiol. 2001, 67, 3053–3057.
  26. Secretaría de Medio Ambiente y Recursos Naturales (SEMARNAT). NORMA Oficial Mexicana NOM-001-SEMARNAT-2021, Que Establece los Límites Permisibles de Contaminantes en las Descargas de Aguas Residuales en Cuerpos Receptores Propiedad de la Nación; Diario Oficial de la Federación: Mexico City, Mexico, 2022.
  27. Secretaría de Salud (SSA). NORMA Oficial Mexicana NOM-127-SSA1-2021, Agua Para Uso y Consumo Humano. Límites Permisibles de la Calidad del Agua; Diario Oficial de la Federación: Mexico City, Mexico, 2022.
  28. Shoushtarian, F.; Negahban-Azar, M. Worldwide Regulations and Guidelines for Agricultural Water Reuse: A Critical Review. Water 2020, 12, 971.
  29. State of California. California Code of Regulations, Title 22, Division 4, Chapter 3: Water Recycling Criteria. California Code of Regulations. See also §60304 (LII). 2018. Available online: https://www.law.cornell.edu/regulations/california/22-CCR-60304 (accessed on 10 May 2025).
  30. Lewis, J.; Eads, R.E. Implementation Guide for Turbidity Threshold Sampling: Principles, Procedures, and Analysis; General Technical Report PSW-GTR-212; U.S. Department of Agriculture, Forest Service, Pacific Southwest Research Station: Albany, CA, USA, 2009.
  31. Jastram, J.D.; Moyer, D.L.; Hyer, K.E. A Comparison of Turbidity-Based and Streamflow-Based Estimates of Suspended-Sediment Concentrations in Three Chesapeake Bay Tributaries; Technical Report Scientific Investigations Report 2009-5165; U.S. Geological Survey: Reston, VA, USA, 2009.
  32. Trejo-Zúñiga, I.; Moreno, M.; Santana-Cruz, R.F.; Meléndez-Vázquez, F. Deep-Learning-Driven Turbidity Level Classification. Big Data Cogn. Comput. 2024, 8, 89.
  33. Parra, L.; Ahmad, A.; Sendra, S.; Lloret, J.; Lorenz, P. Combination of Machine Learning and RGB Sensors to Quantify and Classify Water Turbidity. Chemosensors 2024, 12, 34.
  34. Wilches, L.M.L.; Jantarakasem, C.; Sioné, L.; Templeton, M.; Mikolajczyk, K. Estimating water turbidity from a smartphone camera. In Proceedings of the 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, 21–24 November 2022; BMVA Press: Surrey, UK, 2022.
  35. Miglino, D.; Jomaa, S.; Rode, M.; Saddi, K.C.; Isgrò, F.; Manfreda, S. Technical note: Image processing for continuous river turbidity monitoring—Full-scale tests and potential applications. Hydrol. Earth Syst. Sci. 2025, 29, 4133–4151.
  36. Özsert Yiğit, G.; Baransel, C. Utilizing machine learning techniques for enhanced water quality monitoring. Water Qual. Res. J. 2024, 59, 187–204.
  37. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: http://www.deeplearningbook.org (accessed on 24 February 2026).
  38. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
  39. TensorFlow. tf.image.resize API Documentation. TensorFlow API Docs. Available online: https://www.tensorflow.org/api_docs/python/tf/image/resize (accessed on 26 December 2025).
  40. Keras. Rescaling Layer Documentation. Keras API Docs. Available online: https://keras.io/api/layers/preprocessing_layers/image_preprocessing/rescaling/ (accessed on 26 December 2025).
  41. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
  42. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York City, NY, USA, 2016; pp. 770–778.
  43. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: New York City, NY, USA, 2019; pp. 1314–1324.
  44. Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin Transformer V2: Scaling Up Capacity and Resolution. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York City, NY, USA, 2022; pp. 11999–12009.
  45. Wang, Y.; Deng, Y.; Zheng, Y.; Chattopadhyay, P.; Wang, L. Vision Transformers for Image Classification: A Comparative Survey. Technologies 2025, 13, 32.
  46. Nie, Y.; Chen, Y.; Guo, J.; Li, S.; Xiao, Y.; Gong, W.; Lan, R. An improved CNN model in image classification application on water turbidity. Sci. Rep. 2025, 15, 11264.
  47. Soto, I.L.; Concha-Sánchez, Y.; Raya, A. An Image-Based Water Turbidity Classification Scheme Using a Convolutional Neural Network. Computation 2025, 13, 178.
  48. Wang, M.; Shi, B.; Catsamas, S.; Kolotelo, P.; McCarthy, D. A Compact, Low-Cost, and Low-Power Turbidity Sensor for Continuous In Situ Stormwater Monitoring. Sensors 2024, 24, 3926.
  49. Jantarakasem, C.; Sioné, L.; Templeton, M.R. Estimating drinking water turbidity using images collected by a smartphone camera. AQUA—Water Infrastruct. Ecosyst. Soc. 2024, 73, 1277–1284.
  50. Rudy, I.M.; Wilson, M.J. Turbidivision: A machine vision application for estimating turbidity from underwater images. PeerJ 2024, 12, e18254.
  51. Feizi, H.; Sattari, M.T.; Mosaferi, M.; Apaydin, H. An image-based deep learning model for water turbidity estimation in laboratory conditions. Int. J. Environ. Sci. Technol. 2022, 20, 149–160.
  52. Droujko, J.; Molnar, P. Open-source, low-cost, in-situ turbidity sensor for river network monitoring. Sci. Rep. 2022, 12, 10341.
  53. Lopez-Betancur, D.; Moreno, I.; Guerrero-Mendez, C.; Saucedo-Anaya, T.; González, E.; Bautista-Capetillo, C.; González-Trinidad, J. Convolutional Neural Network for Measurement of Suspended Solids and Turbidity. Appl. Sci. 2022, 12, 6079.
  54. Zhu, Y.; Cao, P.; Liu, S.; Zheng, Y.; Huang, C. Development of a New Method for Turbidity Measurement Using Two NIR Digital Cameras. ACS Omega 2020, 5, 5421–5428.
  55. Koydemir, H.C.; Rajpal, S.; Gumustekin, E.; Karinca, D.; Liang, K.; Göröcs, Z.; Tseng, D.; Ozcan, A. Smartphone-based turbidity reader. Sci. Rep. 2019, 9, 19901.
  56. Mullins, D.; Coburn, D.; Hannon, L.; Jones, E.; Clifford, E.; Glavin, M. A novel image processing-based system for turbidity measurement in domestic and industrial wastewater. Water Sci. Technol. 2018, 77, 1469–1482.
  57. Chai, M.M.E.; Ng, S.M.; Chua, H.S. An alternative cost-effective image processing-based sensor for continuous turbidity monitoring. In Proceedings of the AIP Conference Proceedings; AIP Publishing: New York, NY, USA, 2017.
  58. Hussain, I.; Ahamad, K.; Nath, P. Water turbidity sensing using a smartphone. RSC Adv. 2016, 6, 22374–22382.
  59. Sampedro, Ó.; Salgueiro, J.R. Turbidimeter and RGB sensor for remote measurements in an aquatic medium. Measurement 2015, 68, 128–134.
  60. Jantarakasem, C.; Sioné, L.; Templeton, M.R. A critical review of the use of smartphone cameras in water quality analysis. Environ. Technol. Rev. 2025, 15, 11–28.
Figure 1. Overview of the model architectures evaluated for turbidity classification from RGB images of water samples. (a) CNN-based pipeline: The input image is processed through a hierarchical series of convolutional layers for spatial feature extraction, followed by a Multilayer Perceptron (MLP) head that predicts the final Nephelometric Turbidity Unit (NTU) class. (b) Vision Transformer (ViT) pipeline: The input image is divided into a grid of non-overlapping patches, which are linearly projected into sequence tokens. A learnable classification token (CLS) and positional embeddings (+Pos) are prepended to the sequence before processing by the Transformer encoder. The encoded representation of the CLS token is then mapped to the NTU class via an MLP head.
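As a rough illustration of the tokenization step described in panel (b), the sketch below counts the tokens a ViT encoder receives: one per non-overlapping patch plus the CLS token. The helper name `vit_sequence_length` is hypothetical, and the 224 × 224 input resolution is an assumption (it is the common default for ViT-B models, not a value stated in this caption). The patch sizes 16 and 32 follow from the model names ViT-B/16 and ViT-B/32.

```python
def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Number of tokens fed to the Transformer encoder for a square image.

    The image is split into a grid of non-overlapping square patches; each
    patch becomes one token, and one CLS token is prepended.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1  # +1 for the CLS token


# Assumed 224 x 224 input (not stated in the caption).
tokens_b16 = vit_sequence_length(224, 16)  # 14 x 14 patches + CLS
tokens_b32 = vit_sequence_length(224, 32)  # 7 x 7 patches + CLS
print(tokens_b16, tokens_b32)
```

Under this assumption, ViT-B/32 sees roughly a quarter of the tokens that ViT-B/16 does, so each token summarizes a coarser image region.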
Figure 2. Performance metrics of deep learning models during training and validation: (a) validation accuracy across epochs; (b) validation loss; (c) training accuracy trend indicating learning progression; and (d) training loss convergence over time.
Figure 3. Confusion matrices for the six evaluated deep learning architectures on the independent test set. The axes map the predicted turbidity class to the ground-truth NTU intervals, highlighting the superior diagonal dominance of CNN models relative to transformers.
Figure 4. Experimental setup for in situ edge inference, featuring the portable photobox, the RGB camera, the Raspberry Pi 4 edge node, and the local display interface.
Figure 5. Rain-fed farm pond during the wet season, exhibiting surface macrophytes and patchy algal mats following storm inflows.
Figure 6. Representative in situ water samples captured using the portable photobox system, categorized by their corresponding ground-truth NTU class.
Figure 7. Edge–cloud cognitive IoT workflow for pond turbidity monitoring. A quantized ResNet-50 model running on a Raspberry Pi performs edge inference (NTU class and confidence score c). High-confidence predictions (c ≥ τ) trigger local routine decision support, whereas low-confidence outputs (c < τ) selectively escalate frames and metadata to the cloud for inspection, curation, and periodic model redeployment, thereby enabling a monitor–adapt–redeploy cycle without continuous raw-image streaming.
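The confidence-gated routing in this workflow can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `route_prediction`, the action labels, and the threshold value τ = 0.8 are all hypothetical, and taking the confidence score c as the largest class probability is an assumption.

```python
from typing import List, Tuple


def route_prediction(probabilities: List[float], tau: float = 0.8) -> Tuple[int, float, str]:
    """Pick the most probable NTU class and decide where the result goes.

    probabilities: per-class scores for the five NTU classes (class 1 first).
    tau: hypothetical confidence threshold; the caption does not give a value.
    """
    confidence = max(probabilities)  # assumed definition of c
    ntu_class = probabilities.index(confidence) + 1  # classes are 1-indexed
    if confidence >= tau:
        action = "local_decision_support"  # handled on the edge node
    else:
        action = "escalate_to_cloud"  # frame and metadata sent for inspection
    return ntu_class, confidence, action


print(route_prediction([0.05, 0.90, 0.03, 0.01, 0.01]))
print(route_prediction([0.40, 0.35, 0.15, 0.05, 0.05]))
```

Only the second call escalates, which is the point of the design: raw frames leave the device only when the edge model is unsure.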
Table 1. Turbidity classes used in the dataset and their corresponding NTU intervals.
| Class Number | NTU Range | Turbidity Level | Number of Images |
|---|---|---|---|
| 1 | 200–320 | Low | 140 |
| 2 | 320–440 | Moderate | 140 |
| 3 | 440–560 | Intermediate | 140 |
| 4 | 560–680 | High | 140 |
| 5 | 680–800 | Very High | 140 |
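For readers who want to label a reference turbidimeter reading with the classes of Table 1, a minimal sketch follows. The function name `ntu_to_class` is hypothetical. Because adjacent intervals in the table share their endpoints, the sketch assumes each interval includes its lower bound and excludes its upper bound, except the last class, which also includes 800 NTU; the table itself does not say how boundary values are assigned.

```python
import bisect

# Interval edges and level names taken from Table 1.
CLASS_EDGES = [200, 320, 440, 560, 680, 800]
LEVELS = ["Low", "Moderate", "Intermediate", "High", "Very High"]


def ntu_to_class(ntu: float):
    """Map an NTU reading to (class number, turbidity level).

    Assumption: intervals are half-open [lower, upper), with 800 NTU
    assigned to class 5.
    """
    if not CLASS_EDGES[0] <= ntu <= CLASS_EDGES[-1]:
        raise ValueError("reading is outside the 200-800 NTU dataset range")
    index = bisect.bisect_right(CLASS_EDGES, ntu) - 1
    index = min(index, len(LEVELS) - 1)  # fold the 800 NTU endpoint into class 5
    return index + 1, LEVELS[index]


print(ntu_to_class(250))  # falls in 200-320
print(ntu_to_class(320))  # shared boundary, assigned to the upper class here
```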
Table 2. Class-averaged performance metrics (precision, recall, and F1-score) and overall accuracy for the evaluated deep learning architectures. Shaded cells indicate the best value in each column (ties are jointly highlighted).
| Model | Precision (avg) | Recall (avg) | F1-Score (avg) | Accuracy |
|---|---|---|---|---|
| GoogLeNet | 0.96 | 0.95 | 0.95 | 0.95 |
| ResNet-50 | 0.96 | 0.96 | 0.96 | 0.96 |
| MobileNetV3 | 0.92 | 0.92 | 0.92 | 0.92 |
| Swin Transformer V2-Base | 0.89 | 0.89 | 0.89 | 0.89 |
| ViT-B/16 | 0.85 | 0.84 | 0.84 | 0.84 |
| ViT-B/32 | 0.83 | 0.80 | 0.80 | 0.80 |
Note: “avg” denotes class-averaged metrics across the five NTU classes.
Table 3. Per-class precision, recall, and F1-score of the evaluated deep learning models across NTU intervals. Shaded cells indicate the best result in each row (ties are jointly highlighted).
(a) Precision

| NTU class | GN | RN50 | MNV3 | SV2B | VB16 | VB32 |
|---|---|---|---|---|---|---|
| 200–320 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 | 0.95 |
| 320–440 | 0.98 | 0.93 | 0.91 | 0.86 | 0.93 | 0.57 |
| 440–560 | 0.86 | 1.00 | 0.88 | 0.98 | 0.70 | 0.93 |
| 560–680 | 0.95 | 0.91 | 0.91 | 0.86 | 0.77 | 0.86 |
| 680–800 | 1.00 | 0.98 | 0.93 | 0.79 | 0.86 | 0.69 |

(b) Recall

| NTU class | GN | RN50 | MNV3 | SV2B | VB16 | VB32 |
|---|---|---|---|---|---|---|
| 200–320 | 1.00 | 1.00 | 1.00 | 0.89 | 1.00 | 0.68 |
| 320–440 | 1.00 | 1.00 | 0.95 | 0.97 | 0.76 | 0.86 |
| 440–560 | 0.97 | 0.90 | 0.88 | 0.93 | 0.81 | 0.93 |
| 560–680 | 0.88 | 0.98 | 0.87 | 0.79 | 0.83 | 0.73 |
| 680–800 | 0.95 | 0.95 | 0.93 | 0.89 | 0.86 | 0.91 |

(c) F1-score

| NTU class | GN | RN50 | MNV3 | SV2B | VB16 | VB32 |
|---|---|---|---|---|---|---|
| 200–320 | 1.00 | 1.00 | 1.00 | 0.93 | 0.99 | 0.79 |
| 320–440 | 0.99 | 0.96 | 0.93 | 0.92 | 0.84 | 0.68 |
| 440–560 | 0.91 | 0.95 | 0.88 | 0.95 | 0.75 | 0.93 |
| 560–680 | 0.91 | 0.94 | 0.89 | 0.83 | 0.80 | 0.79 |
| 680–800 | 0.98 | 0.96 | 0.93 | 0.84 | 0.86 | 0.78 |
Abbreviations: GN = GoogLeNet; RN50 = ResNet-50; MNV3 = MobileNetV3; SV2B = Swin Transformer V2-Base; VB16 = ViT-B/16; VB32 = ViT-B/32.
Table 4. Quantitative performance of the embedded ResNet-50 model during the in situ hardware-in-the-loop field validation.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Class 1 | 1.00 | 0.93 | 0.96 | 15 |
| Class 2 | 0.00 | 0.00 | 0.00 | 2 |
| Class 3 a | 0.00 | 0.00 | 0.00 | 0 |
| Macro avg | 0.33 | 0.31 | 0.32 | 17 |
| Weighted avg | 0.88 | 0.82 | 0.85 | 17 |
Overall accuracy: 0.82 (17 field samples). a Class 3 had zero support in this field-validation run; per-class metrics are reported as 0.00 by convention.
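The gap between the macro and weighted averages in Table 4 follows directly from the class supports, as the short check below shows. The helper names `macro_average` and `weighted_average` are hypothetical. The per-class values and supports are copied from the table, and the zero-support Class 3 is counted as 0.00 in the macro average, matching the convention stated in the footnote.

```python
def macro_average(values):
    """Unweighted mean over classes; every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes weighted by the number of samples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Per-class values from Table 4 (Class 1, Class 2, Class 3).
support = [15, 2, 0]
metrics = {
    "precision": [1.00, 0.00, 0.00],
    "recall": [0.93, 0.00, 0.00],
    "f1": [0.96, 0.00, 0.00],
}

for name, values in metrics.items():
    macro = round(macro_average(values), 2)
    weighted = round(weighted_average(values, support), 2)
    print(f"{name}: macro={macro:.2f}, weighted={weighted:.2f}")
```

The macro average divides by three classes even though only Class 1 carries substantial support, so two zero rows pull it down to about one third. The weighted average is dominated by the 15 Class 1 samples and therefore tracks the overall accuracy much more closely.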
Table 5. Comparative summary of recent image- and optics-based approaches for turbidity estimation. The table summarizes the methodological landscape relevant to the proposed framework, particularly regarding platform, computational strategy, target turbidity range, data volume, and deployment setting.
| Ref. | Platform & Modality | Algorithm/Model | Target Range (Reported Units) | Dataset/Data Volume | Setting |
|---|---|---|---|---|---|
| This Study | Edge (Raspberry Pi 4) + RGB | CNN (ResNet-50, TFLite) | High (200–800 NTU) | 700 Images (Small Data) | Lab & Field |
| [46] | Industrial camera + controlled imaging (RGB) | Improved CNN (noise-robust variants) | 69.1–878 NTU (5 classes) | 250 images | Lab |
| [47] | Smartphone + standardized RGB imaging | CNN (EfficientNet-B0) | 0–180 NTU | 11,518 images | Lab |
| [35] | Off-the-shelf riverbank camera | Classical CV + calibration regression | Continuous river turbidity | Not reported (time-series monitoring) | Field |
| [48] | Compact optical node (LEDs + phototransistors) | Embedded optical sensing/calibration | 0–250 NTU (calibration) | 2 deployed nodes (>6 months) | Field |
| [49] | Smartphone (no accessory) + RGB | Bayesian CNN (classification/regression) | 0–40 NTU | 15,401 images | Lab (field-mimicking) |
| [32] | Standard camera + RGB | Deep Learning (CNNs) | 200–800 NTU (4 classes) | 700 images | Lab & Field (preliminary) |
| [33] | Low-cost RGB sensor + LED array | ML (LR/ANN/SVM/k-NN/RF) | 0.02–60 NTU | 21 samples × 64 combinations | Lab |
| [50] | Consumer underwater cameras | YOLOv8 + regression | 0–55 FNU | 675 images | Field & Lab |
| [51] | Digital camera (Canon 1300D) + grayscale images | CNN classifier | 0 to >250 NTU (5 classes) | 200 images | Lab |
| [52] | Open-source in-situ optical sensor | Multi-range calibration/regression | 0.5–4000 NTU (multi-range) | N/A (calibration + deployments) | Lab & Field |
| [53] | Smartphone + controlled RGB LED | CNN (AlexNet) + MLR | 0–306 NTU | 88,000 train + 1100 val + 1100 test | Lab |
| [54] | Lab-built dual NIR cameras | Optical modeling/image-based estimation | 0–1000 NTU | 20 samples | Lab |
| [55] | Smartphone + optical add-on | Physics-based scattering (R/G ratio) | 0.3–2000 NTU | Not reported | Lab & Field |
| [56] | Industrial camera + enclosure | Classical CV + regression | 30–250 FAU (effective camera range) | 31 samples (12 images/sample) | Lab |
| [57] | Video camera + LED enclosure | Classical image processing | 0.86–500 NTU | 9 samples × 7 depths × 4 reps (≈252 captures) | Prototype/Lab |
| [58] | Smartphone sensors + IR LED (ambient/proximity) | Sensor-signal calibration (not camera CV) | 0–400 NTU | Not reported | Lab/Portable |
| [59] | Remote embedded turbidimeter + RGB + comms | Signal modeling + remote monitoring | Drinking-water monitoring proxy | 930 measurements | Lab & Field |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Moreno, M.; Trejo-Zúñiga, I.; González-Huitrón, V.A.; Santana-Cruz, R.F.; García García, R.; Pineda Chacón, G. Edge Node Deployment for Turbidity Estimation in Farm Ponds. Big Data Cogn. Comput. 2026, 10, 126. https://doi.org/10.3390/bdcc10040126