Cross-System Short-Term Dissolved Oxygen Prediction in Aquaponic Systems Using Multivariate Neural Network Models

Alanis, Arnulfo; Gutierrez, Karime; Marquez, Bogart Yail; Guarda, Teresa; Dueñas, Felix

doi:10.3390/app16136298


by Arnulfo Alanis ¹,*, Karime Gutierrez ¹, Bogart Yail Marquez ¹, Teresa Guarda ² and Felix Dueñas ³

¹ Systems and Computer Department, National Technology of México, Campus Tijuana, Calzada del Tecnológico S/N, Fraccionamiento Tomas Aquino, Tijuana C.P. 22414, Baja California, Mexico

² Faculty of Systems and Telecommunications, Universidad Estatal Península Santa Elena, Santa Elena 240204, Ecuador

³ Everest Internet Solutions S de R.L. de C.V., Tijuana C.P. 22106, Baja California, Mexico

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(13), 6298; https://doi.org/10.3390/app16136298 (registering DOI)

Submission received: 4 April 2026 / Revised: 7 June 2026 / Accepted: 15 June 2026 / Published: 23 June 2026

(This article belongs to the Section Environmental Sciences)


Abstract

Aquaponic systems show complex multivariate dynamics in water quality parameters, with dissolved oxygen (DO) being a key indicator of biological stability. This study presents a dynamic multivariate predictive framework for short-term dissolved oxygen forecasting using IoT data gathered from heterogeneous aquaponic ponds. The problem is reformulated as a regression task that forecasts future DO values over a short horizon (~5 min), enabling early warning rather than rule-based classification. To ensure structural robustness across systems, we applied intra-pond percentile trimming and normalization procedures to mitigate the differences in scale between ponds. Using a Leave-One-Pond-Out (LOPO) validation scheme, we tested model performance and cross-system generalization. A feedforward multilayer perceptron (MLP) with lagged temporal variables achieved an average RMSE of 0.83 on a normalized scale. Regime-based error analysis showed that the RMSE increased from 0.80 under stable conditions to 1.43 under high-volatility regimes. A comparative LSTM model did not produce substantial performance enhancements. Sensitivity analysis revealed lagged impacts of pH and turbidity on subsequent DO dynamics, indicating the need for operational measures such as aeration modification and suspended solids management.

Keywords:

aquaponic systems; dissolved oxygen prediction; Leave-One-Pond-Out validation; cross-system generalization; neural networks

1. Introduction

Aquaponics is an emerging sustainable approach to integrated food production with considerable environmental benefits. Its closed-loop recirculating design supports the joint cultivation of plants and fish, which improves resource efficiency and reduces water consumption. Bacterial communities in these systems transform fish waste into nutrients that plants can use, establishing a biogeochemical cycle in which the components depend on one another. This circular integration not only optimizes resource use but also reduces pollution in nearby ecosystems, in line with the principles of a circular economy and sustainable intensification [1].

Aquaponic systems must carefully balance the chemical, physical, and biological processes that keep them stable and running smoothly. Even small perturbations in the quality of the water can adversely affect fish, plants, and microbes and change the whole system. For this reason, monitoring water quality all the time is not only a management tool, but also a structural need for long-term operational sustainability and economic viability.

An aquaponic environment is a multicomponent, time-evolving system. pH, temperature, ammonia concentration, nitrate levels, turbidity, and dissolved oxygen (DO) can all affect biological processes simultaneously, and their effects often appear with a delay. These time lags may stem from nitrification kinetics, microbial adaptation, fish responses to environmental change, or the accumulation of organic matter. For this reason, the system cannot be fully described by static relationships or instantaneous thresholds; it requires dynamic modeling methods that can handle feedback loops and time-varying dependencies [2].

Dissolved oxygen (DO) responds to shifts in microbial respiration, organic load, and fish activity faster than most other parameters, which is why it has been consistently identified as a sensitive indicator of environmental change in aquaponic systems [3].

DO directly affects fish respiration, the rates of aerobic biochemical reactions, and the performance of microbial biofilters. Fish can experience stress, slower growth, or mortality when oxygen levels drop quickly. DO levels also serve as an indicator of biological load, nitrification efficiency, and the breakdown of organic matter. Nitrifying bacterial communities in biofilters play a fundamental role in sustaining these processes and maintaining system stability.

Recirculating systems depend on microbial activity as much as on water chemistry.

Nitrifying bacteria oxidize ammonia continuously. When that process slows down, nitrogen compounds accumulate. Standard water quality parameters often do not reflect this until the damage is already measurable [4].

Conventional approaches have relied on rule-based classification with fixed threshold ranges to rate the state of aquaponic systems. This method assumes that the system remains stable as long as the parameters stay within certain ranges, and anything outside the thresholds is treated as a critical event. Although it is simple to understand and use, it has several limitations. First, it considers only the current observation and disregards prior history. Second, it cannot detect transitions toward unstable regimes, because it evaluates each reading in isolation. Third, supervised classification algorithms can easily replicate rule-based labels, which masks the underlying dynamic complexity and can inflate performance estimates.

In contrast, multivariate time-series predictive modeling offers a more appropriate temporal characterization of system evolution.

Threshold-based systems detect when a parameter exceeds a fixed limit. They do not estimate what will happen next. Short-term forecasting works differently—it projects future DO values from current and lagged multivariate inputs. Operational responses—aeration adjustments, feeding reductions—become viable only when deterioration is anticipated rather than detected after the fact [5].

Instead of characterizing current states, predictive modeling estimates how the most important variables will evolve. This supports planning and early warning systems that trigger preventive actions, for example changing how often the fish are fed or how much air is introduced [6].

Applying machine learning to aquaponic systems, however, is complicated by differences in how the ponds are built.

pH, turbidity, temperature, and DO move together in time. A change in one variable tends to precede changes in others. In our dataset, pH and turbidity showed detectable lagged effects on subsequent DO concentrations. A model that tracks only DO cannot capture that [7].

Each installation can have very different stocking densities, hydraulic setups, total volumes, biofiltration efficiencies, feeding schedules, and sensor characteristics.

Machine learning methods have been applied to aquaculture monitoring across a range of tasks, from anomaly detection to feeding optimization. Their practical utility, however, depends on whether validation is conducted against structurally independent systems rather than data subsets from the same installation [8].

Because of these differences, the measured parameters have different statistical distributions. In such situations, using standard random data partitioning could make the model look better than it really is by mixing data from the same structural environment during training and testing. This hides the differences between systems.

Algorithm performance in water quality prediction is not consistent across datasets. Results vary depending on temporal resolution, pond configuration, and sensor characteristics. Testing multiple architectures on the same dataset—rather than assuming one method generalizes—is the only way to identify which approach fits the data structure [9].

Inter-pond generalization is therefore a significant methodological concern. More rigorous validation frameworks that mimic real-world deployment are required to test how well a model works on structurally independent systems. The Leave-One-Pond-Out (LOPO) strategy meets this need by training the model on all ponds but one and then testing it on the pond that was held out. This framework measures how robust a model is to structural differences between systems.

Aquaponic systems exhibit two behavioral regimes: long periods of stability and episodes of high volatility. During stable periods, changes are smooth and predictable. Abrupt changes can result from operational disturbances, environmental shifts, a sudden rise in organic load, or sensor noise. This dual behavior suggests that predictive error should not be assessed exclusively through global metrics but should also be examined within each dynamic regime. Evaluating performance under both stable and unstable conditions provides a deeper understanding of the strengths and weaknesses of predictive architectures.
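
Regime-wise error assessment of this kind can be sketched as follows. The rule used to label a sample as stable or volatile is not given at this point in the text, so the example assumes a hypothetical one: a time step counts as volatile when the rolling standard deviation of the observed DO exceeds a chosen quantile. The function name `rmse_by_regime` and the parameters `window` and `quantile` are illustrative assumptions, not the study's settings.

```python
import numpy as np
import pandas as pd


def rmse_by_regime(y_true, y_pred, window=15, quantile=0.9):
    """Report RMSE globally and separately for stable and volatile samples.

    Assumption (not from the study): a time step is 'volatile' when the
    rolling standard deviation of the observed DO over `window` samples
    exceeds the `quantile` of all rolling standard deviations.
    """
    y_true = pd.Series(np.asarray(y_true, dtype=float))
    y_pred = pd.Series(np.asarray(y_pred, dtype=float))

    rolling_std = y_true.rolling(window, min_periods=2).std()
    threshold = rolling_std.quantile(quantile)
    # NaN at the first position compares as False, so it counts as stable.
    volatile = (rolling_std > threshold).to_numpy()

    sq_err = ((y_true - y_pred) ** 2).to_numpy()
    out = {"global": float(np.sqrt(sq_err.mean()))}
    for name, mask in (("stable", ~volatile), ("volatile", volatile)):
        out[name] = float(np.sqrt(sq_err[mask].mean())) if mask.any() else float("nan")
    return out


# Synthetic series: a flat stretch followed by an oscillating burst.
y_obs = np.concatenate([np.zeros(100), np.tile([0.0, 5.0], 10)])
y_hat = np.zeros_like(y_obs)  # a predictor that never reacts to the burst
report = rmse_by_regime(y_obs, y_hat)
```

On this synthetic series the global RMSE sits between the two regime values, which is the masking effect that motivates reporting errors per regime.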

In this context, the present study presents a dynamic multivariate modeling framework for the short-term prediction of dissolved oxygen levels in multi-pond aquaponic systems. The problem is framed as a temporal regression task, utilizing multivariate sliding windows and lagged variables to identify short-range dependencies. We use a Leave-One-Pond-Out validation scheme to see how well the model can generalize structurally. Dynamic regime error decomposition is applied to identify model behavior in different situations. A comparative analysis of a feedforward neural network and a recurrent LSTM architecture is conducted to determine if enhanced temporal memory enhances cross-system predictive robustness.

This work advances intelligent monitoring systems by moving from static rule-based methods to dynamic predictive modeling, which supports real-time operational decisions. More broadly, such systems can make aquaponic installations more durable, adaptable, and resilient, and they link data-driven modeling to the practical management of aquaculture.

Unlike previous aquaponic water-quality prediction studies primarily focused on single-system forecasting or random data partitioning, the present work emphasizes cross-system generalization under structurally independent aquaponic environments using a Leave-One-Pond-Out (LOPO) validation framework. Additionally, the proposed methodology incorporates regime-based error decomposition to evaluate predictive robustness under both stable and volatile dissolved oxygen dynamics. These contributions enable a more realistic assessment of model transferability across heterogeneous aquaponic systems and provide operational insight into predictive limitations under dynamic perturbations.

2. Materials

2.1. Dataset Origin (Kaggle + HiPIC Research Group)

The present study employs the publicly accessible dataset titled “Sensor-Based Aquaponics Fish Pond Datasets,” obtainable through the Kaggle platform (Kaggle Inc., San Francisco, CA, USA; https://www.kaggle.com, accessed on 15 May 2026) and originally created by the HiPIC Research Group, Department of Computer Science, University of Nigeria, Nsukka [10]. The dataset was collected using Internet of Things (IoT) monitoring infrastructure deployed in a real-world environment.

The IoT sensor infrastructure was deployed across multiple real-world aquaponic ponds and enabled continuous acquisition of physicochemical parameters at a sampling interval of roughly 20 s, a temporal granularity sufficient to capture rapid DO fluctuations that coarser sampling would miss [11].

This real-world acquisition context is particularly significant as it illustrates how operations can evolve organically, influenced by sensor noise and environmental disturbances that are frequently absent from controlled experiments.

The dataset contains information from several aquaponic ponds, each equipped with a continuous physical and chemical monitoring system.

Distributed sensor networks operating across heterogeneous pond configurations generate the kind of multivariate, high-frequency data that dynamic predictive models require. Without this infrastructure, cross-system validation approaches such as Leave-One-Pond-Out would not be feasible [12].

On average, measurements were recorded every 20 s, which is frequent enough to capture both rapid and short-lived changes. This sampling rate is critical for tracking quick changes in dissolved oxygen and related variables, especially in systems where operational changes or shifts in biological activity are frequent. Initially, data from eleven aquaponic ponds were examined (IoTPond1, IoTPond2, IoTPond3, IoTPond4, IoTPond6, IoTPond7, IoTPond8, IoTPond9, IoTPond10, IoTPond11, and IoTPond12). Each pond is an independent production unit with its own structural and operational characteristics, including stocking density, hydraulic flow, recirculation rate, feeding regime, and biofiltration performance. This multiplicity of installations introduces substantial inter-system variability, which makes the dataset well suited to testing how well predictive models transfer across systems.

Each aquaponic installation differs in stocking density, hydraulic configuration, and sensor setup. Testing a model across structurally independent ponds—rather than splitting data from the same system—is the appropriate method for assessing cross-system transferability and generalization [13].

The following variables were measured:

Water temperature (°C);
Turbidity (NTU);
Dissolved oxygen (DO);
pH;
Ammonia concentration;
Fish population;
Nitrate concentration;
Average fish weight (g);
Average fish length (cm).

These parameters describe both water quality and, through the biometric data, the condition of the aquaculture component. Combining physicochemical and biological measurements allows the system's evolution to be described along several dimensions and reflects how fish metabolism, microbial activity, and plant nutrient uptake influence one another.

After the data were merged, structural inconsistencies became apparent that required further harmonization (see Section 2.2). These included discrepancies in column naming conventions, timestamp formatting, and the declared units of measurement across ponds. Pronounced outliers and sensor inconsistencies were also found.

After initial preprocessing and the construction of multivariate temporal windows, it became clear that Pond 9 and Pond 11 contained too few records for the Leave-One-Pond-Out (LOPO) validation scheme to be statistically sound. The final cross-system generalization analysis used four ponds (Ponds 1–4) that had enough data, because LOPO needs a large number of records from each held-out system to give reliable performance estimates. This choice kept the experiment stable, made the validation folds comparable, and allowed predictive robustness to be measured reliably.

In particular, Ponds 9 and 11 exhibited prolonged sensor interruptions, incomplete multivariate sequences, and inconsistent temporal continuity that prevented reliable sliding-window construction and reduced statistical reliability during cross-system validation.

Evaluating dynamic regression models on real-world, heterogeneous, multi-pond data is challenging. Rather than serving as a benchmark under controlled conditions, the dataset was used to assess model adaptability across diverse environments.

2.2. Structural Heterogeneity Across Ponds

After the individual pond data were consolidated, substantial structural differences were apparent at both the syntactic and statistical levels. This variability reflects not only differences in file format but also the fact that aquaponic systems do not operate identically in practice.

The primary sources of heterogeneity identified are:

Different variable naming conventions, such as “Temperature (C)” and “TEMPERATURE (C)”;
Different timestamp formats and time-zone settings;
Mismatches in declared measurement units (mg/L vs. g/mL) for chemical variables such as ammonia and dissolved oxygen;
Malfunctioning sensors producing anomalous values, including extreme and missing observations;
Large differences in the ranges, distributions, and scales of the measured variables from one pond to the next.

To ensure numerical consistency across ponds, all physicochemical variables were harmonized into a unified measurement representation during preprocessing prior to model training.

If left unaddressed, these discrepancies can introduce systematic bias into predictive modeling. The model could learn about certain ponds instead of the dynamic relationships that are found in all systems because of differences in size and distribution. In terms of machine learning, this situation shows a shift in the distribution, with each pond representing a different data domain. If this kind of variability is not dealt with, it could make it harder for models to generalize and lead to inflated performance estimates when training and testing data come from similar structural conditions.

Extreme outliers and sensor inconsistencies pose significant challenges, as neural networks can be influenced by anomalous values that disrupt parameter estimation during training. In time-series situations, these artifacts may move through sliding windows, which makes their effect on learned representations even stronger.

To mitigate these effects, a structured preprocessing pipeline was set up. The pipeline consisted of column name standardization, enforcement of numeric data types, and resolution of temporal inconsistencies. Additionally, percentile-based trimming (winsorization) and intra-pond standardization were applied to reduce the influence of extreme values and normalize the distribution of each system (see Section 3).

Intra-pond normalization was preferred over global normalization because global scaling can obscure structural differences or transfer distributional characteristics between ponds. By standardizing each system on its own, the preprocessing stage removes spurious scale differences while preserving relative dynamic patterns.

Addressing structural heterogeneity is a prerequisite for cross-system generalization.

The proposed framework reduces the likelihood of overfitting to specific pond characteristics and enhances the reliability of the Leave-One-Pond-Out validation method by directly addressing variations in syntax, statistics, and scale prior to model training.

2.3. Computational Environment

The computational experiments were conducted in Google Colaboratory (Google LLC, Mountain View, CA, USA) with Python 3.12.13 (Python Software Foundation, Wilmington, DE, USA) as the primary programming language. Python was used for all data processing, cleaning, and prediction tasks, and was chosen for its broad ecosystem of tools for reproducible research, machine learning, and scientific computing.

NumPy 2.0.2 and Pandas 2.2.2 were used for data preparation and preprocessing. Pandas handled data sorting and the treatment of missing values, while NumPy provided fast vectorized calculations and matrix transformations.

Scikit-learn 1.6.1 was used to evaluate model performance with metrics such as the mean absolute error (MAE) and the root mean square error (RMSE). This library also made it easier to compute the metrics consistently across all validation folds of the Leave-One-Pond-Out (LOPO) method, so that results from different ponds could be compared.

We used TensorFlow 2.20.0 and its high-level Keras 3.13.2 API to build the neural networks. These frameworks made it straightforward to choose optimization methods, configure feedforward and recurrent neural networks, and apply regularization methods such as dropout. We trained with the Adam optimizer because it adapts the learning rate during training, which helps nonlinear regression tasks converge more stably.

The models were trained on multivariate inputs built from the cleaned time-series data. The mean squared error (MSE) loss was minimized with mini-batch, iterative gradient-based optimization. Under the LOPO scheme, the training and testing ponds were kept strictly separate to prevent information leakage.
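
The fold separation and per-fold metrics described above can be illustrated with a minimal NumPy sketch. It is not the study's code: the helper names (`lopo_folds`, `evaluate_lopo`, `mean_baseline`), the synthetic data, and the placeholder model are assumptions made for illustration only.

```python
import numpy as np


def lopo_folds(pond_ids):
    """Yield (held_out_pond, train_idx, test_idx), one fold per pond."""
    pond_ids = np.asarray(pond_ids)
    for pond in np.unique(pond_ids):
        test_mask = pond_ids == pond
        yield str(pond), np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def evaluate_lopo(X, y, pond_ids, fit_predict):
    """Score `fit_predict(X_train, y_train, X_test)` on each held-out pond."""
    scores = {}
    for pond, train_idx, test_idx in lopo_folds(pond_ids):
        y_hat = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores[pond] = {"rmse": rmse(y[test_idx], y_hat),
                        "mae": mae(y[test_idx], y_hat)}
    return scores


def mean_baseline(X_train, y_train, X_test):
    # Placeholder model: predicts the training-fold mean for every test row.
    return np.full(len(X_test), y_train.mean())


# Synthetic stand-in for four ponds with 50 samples each.
rng = np.random.default_rng(0)
pond_ids = np.repeat(["pond1", "pond2", "pond3", "pond4"], 50)
X = rng.normal(size=(200, 3))
y = rng.normal(size=200)
scores = evaluate_lopo(X, y, pond_ids, mean_baseline)
```

scikit-learn's `LeaveOneGroupOut` splitter produces equivalent folds when the pond identifiers are passed as `groups`; the explicit generator is used here only to make the train/test separation visible.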

3. Methods

3.1. Preprocessing and Intra-Pond Standardization

Given that the dataset originated from multiple independent aquaponic ponds, each operating under distinct structural and environmental conditions, an intra-pond standardization procedure was implemented to mitigate structural heterogeneity and prevent systematic bias during model training. Because each pond may exhibit different operating ranges, stocking densities, and sensor calibration characteristics, direct aggregation of raw data could lead to scale-induced distortions in learned representations.

In the initial preprocessing stage, redundant or inconsistent columns derived from variations in the original file nomenclature were removed. Variable names were standardized across ponds to ensure semantic alignment, and all measurements were explicitly converted to numeric format. Records containing non-convertible values, corrupted entries, or incomplete observations were discarded. This step was essential to guarantee structural consistency and numerical integrity prior to time-series window construction.
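
As a hedged illustration of this cleaning stage, the sketch below harmonizes header variants, coerces values to numeric, and drops unusable rows with pandas. The function `clean_pond_frame`, the `rename_map` entries, and the sample values are hypothetical; the dataset's actual headers may differ, and timestamp handling is omitted.

```python
import pandas as pd


def clean_pond_frame(df, rename_map):
    """Harmonize column names, coerce to numeric, and drop unusable rows.

    `rename_map` maps normalized raw headers to canonical names; the entries
    used in the example are illustrative, not the dataset's actual headers.
    """
    out = df.copy()
    # Normalize case and whitespace so header variants collapse together.
    out.columns = [str(c).strip().lower() for c in out.columns]
    out = out.rename(columns=rename_map)
    # Keep the first copy of any column duplicated by header variants.
    out = out.loc[:, ~out.columns.duplicated()]
    keep = [c for c in dict.fromkeys(rename_map.values()) if c in out.columns]
    # Non-convertible entries become NaN and are discarded with the row.
    out = out[keep].apply(pd.to_numeric, errors="coerce")
    return out.dropna().reset_index(drop=True)


raw = pd.DataFrame({
    "TEMPERATURE (C)": ["24.5", "bad", "25.1"],
    " pH ": [7.1, 7.0, None],
    "Dissolved Oxygen": ["5.2", "5.0", "4.9"],
})
rename_map = {
    "temperature (c)": "temperature",
    "ph": "ph",
    "dissolved oxygen": "do",
}
clean = clean_pond_frame(raw, rename_map)
```

In this toy frame only the first row survives: the second has a non-numeric temperature and the third has a missing pH.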

Subsequently, percentile-based trimming (winsorization) was applied independently within each pond. For each monitored variable x, values were constrained to the interval defined by the 1st and 99th percentiles of its empirical distribution:

x^{*} = \begin{cases} Q_{0.01}, & \text{if } x < Q_{0.01} \\ x, & \text{if } Q_{0.01} \leq x \leq Q_{0.99} \\ Q_{0.99}, & \text{if } x > Q_{0.99} \end{cases}

Because percentile-based trimming was performed independently within each pond, the proportion of adjusted observations varied according to the statistical distribution and sensor quality characteristics of each system. This adaptive strategy reduced the influence of extreme outliers while preserving the dominant temporal dynamics required for cross-system learning.

The trimmed values were then standardized within each pond:

x^{'} = \frac{x^{*} - μ_{p}}{σ_{p}}

In the above equation, μ_{p} and σ_{p} represent the mean and standard deviation computed exclusively within pond p. To prevent information leakage during Leave-One-Pond-Out validation, preprocessing statistics and normalization parameters were computed independently within each training fold. No statistical information from the excluded test pond was incorporated during model fitting or preprocessing operations. Intra-pond normalization ensures that each system is scaled relative to its own statistical properties, preventing cross-pond scale dominance. This is especially important in multi-system learning settings, where the optimization process could be biased toward systems with larger variance if the ponds are not on a common scale. By standardizing independently, the model is encouraged to learn dynamic relationships rather than absolute magnitudes tied to specific installations.
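
A minimal NumPy sketch of the two steps above, applied to one variable of one pond at a time. The function name `winsorize_and_standardize` and the synthetic pond data are illustrative assumptions, as is the choice to compute the mean and standard deviation on the clipped values.

```python
import numpy as np


def winsorize_and_standardize(x, lower=1.0, upper=99.0):
    """Clip one pond's variable to its own percentiles, then z-score it."""
    x = np.asarray(x, dtype=float)
    q_lo, q_hi = np.percentile(x, [lower, upper])
    clipped = np.clip(x, q_lo, q_hi)          # winsorization
    # Assumption: pond statistics are taken after clipping.
    mu, sigma = clipped.mean(), clipped.std()
    if sigma == 0:
        return np.zeros_like(clipped)         # constant signal: nothing to scale
    return (clipped - mu) / sigma


# Two synthetic ponds on different scales, each with one sensor spike.
rng = np.random.default_rng(0)
ponds = {
    "pond_a": np.append(rng.normal(5.0, 0.5, 500), 40.0),
    "pond_b": np.append(rng.normal(12.0, 2.0, 500), -30.0),
}
scaled = {name: winsorize_and_standardize(v) for name, v in ponds.items()}
```

After this step both ponds have zero mean and unit variance, and the spikes no longer dominate the scale of either series.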

This preprocessing technique makes it possible to separate the structural variability that is unique to each pond from the temporal patterns that the predictive model is trying to find. As a result, the training process emphasizes acquiring generalized dynamic behavior instead of memorizing system-specific distributions.

3.2. Formulation of the Problem as Dynamic Regression

The predictive task addressed in this study is formulated as a nonlinear mapping problem over multivariate time-series data. Rather than treating water quality assessment as a classification task based on threshold violations, the objective is to estimate the future value of dissolved oxygen (DO) as a continuous variable over a short forecasting horizon.

Formally, the model seeks to approximate a nonlinear function:

f_{θ} : \mathbb{R}^{W \times d} \to \mathbb{R}

where W denotes the temporal window length, d represents the number of observed variables, and θ corresponds to the set of learnable model parameters.

The function f_{θ} maps a multivariate input sequence X_{t} \in \mathbb{R}^{W \times d} to a scalar prediction representing the future dissolved oxygen concentration:

DO_{t + Δ} = f_{θ}(X_{t})

where DO_{t + Δ} denotes the dissolved oxygen value at a future time t + Δ, Δ corresponds to a short prediction horizon equivalent to approximately five minutes (ten sampling intervals given the 20 s acquisition frequency), and f_{θ} is parameterized through weight matrices and bias vectors across successive layers.

The input X_{t} contains multivariate observations spanning from t − W + 1 to t.

This formulation enables the model to encapsulate short-term temporal dependencies and dynamic interactions among physicochemical and biometric variables.

From the perspective of dynamic systems, this regression formulation can be interpreted as a multivariate nonlinear autoregressive model with exogenous inputs (NARX). In this framework, the system’s future state depends not only on its recent trajectory (autoregressive component) but also on external or auxiliary variables (exogenous component), such as pH, turbidity, ammonia concentration, and fish biomass indicators. Neural networks implement the nonlinear function f_{θ}, which makes it possible to approximate complex interactions that linear autoregressive structures cannot capture well.

This regression formulation preserves the continuous nature of the DO variable and makes it possible to estimate both the magnitude and the direction of its change, which rule-based classification methods cannot provide. This is especially important for early warning systems, where detecting gradual trends may be more useful than knowing when thresholds are crossed.

The short prediction horizon was also chosen on purpose to strike a balance between how easy it is to make predictions and how useful they are for operations. In biological systems that are very nonlinear, longer time horizons tend to create uncertainty quickly because of random changes and disturbances that are not modeled. The model uses recent temporal data to focus on short-term forecasting, making it useful for real-time management decisions like changing the aeration control or feeding rate.

From an operational perspective, a short-term forecasting horizon is particularly relevant for early warning applications, as it provides sufficient response time for preventive monitoring actions while preserving predictive stability in highly dynamic aquaponic environments.

This dynamic regression framework provides a practical approach for short-term dissolved oxygen prediction in multi-pond aquaponic systems.

3.3. Construction of Multivariate Temporal Windows

To capture temporal dependencies in the aquaponic system, fixed-length multivariate sequences were constructed using a sliding-window approach. Each input sample was defined over a temporal window of length W = 15, corresponding to approximately five minutes of historical data given the 20 s sampling interval.

Formally, each input vector is represented as:

X_{t} = [x_{t - 14}, x_{t - 13}, \dots, x_{t}]

where X_{t} \in \mathbb{R}^{W \times d} contains multivariate observations from time t − W + 1 to t, and d denotes the number of monitored variables.

Each temporal window contains all monitored variables listed in Section 2.1.

This multivariate construction enables the model to integrate both physicochemical dynamics and biological influences within a cohesive temporal framework.
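
The window construction can be sketched as below, applied to one pond at a time so that no window spans two systems. The function `make_windows` and its argument names are hypothetical; the horizon is expressed in sampling steps and left as a parameter, with the example using the ten intervals mentioned in Section 3.2.

```python
import numpy as np


def make_windows(data, target_col, window, horizon):
    """Build (X, y) pairs from one pond's array of shape (T, d).

    X[i] holds rows t-window+1 .. t, and y[i] is the target column
    `horizon` sampling steps after t.
    """
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0] - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([data[i:i + window] for i in range(n_samples)])
    first_target = window - 1 + horizon
    y = data[first_target:first_target + n_samples, target_col]
    return X, y


# Toy series: 40 time steps, 2 variables, column 0 plays the role of DO.
series = np.arange(80, dtype=float).reshape(40, 2)
X, y = make_windows(series, target_col=0, window=15, horizon=10)
```

Each window ends at time t and its target lies strictly after t, so no future observation enters the inputs.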

We chose W = 15 for both practical and dynamic reasons. A five-minute historical window gives enough information to track short-term changes in the system while keeping the computational cost low. From the standpoint of dynamic systems, aquaponic processes including oxygen diffusion, microbial nitrification, and metabolic activity demonstrate short-range temporal dependencies in stable conditions. It is therefore likely that recent observations contain most of the information needed to make short-term predictions.

Temporal memory can be encoded directly through sliding windows instead of relying on internal recurrent states. In a feedforward architecture, temporal dependencies are explicitly encoded in the input structure instead of being embedded in hidden recurrent dynamics. This method has several strengths:

The architecture is simpler than that of recurrent networks;
Training is more stable under cross-system validation;
The length of history used for each prediction is explicit and controllable.

In addition, window-based representations make it easier to align ponds under the Leave-One-Pond-Out scheme because each sample is self-contained and does not depend on a hidden state carried over from earlier observations. Because every window ends at time t, the construction maintains strict temporal causality and prevents future information from leaking into the inputs.

From a machine learning point of view, the sliding-window method turns time-series prediction into a supervised regression task in a high-dimensional space. Each training sample is a snapshot of how the system has evolved over the past few minutes, and the model learns to map this temporal embedding to the future dissolved oxygen value.

This model works best for making short-term predictions in biological systems that are not linear, where short-term patterns are often more important than long-term ones. The framework balances predictive capability with computational efficiency by explicitly modeling recent history. This may facilitate future integration into IoT-based monitoring systems.

3.4. Predictive Models

3.4.1. Feedforward Neural Network (MLP)

The main predictive model was a feedforward multilayer perceptron (MLP) architecture. The network had two hidden layers, one with 64 neurons and the other with 32 neurons. The activation function used was the Rectified Linear Unit (ReLU). Before being input into the network, multivariate temporal sequences were flattened into one-dimensional vectors. This made it possible for the model to process temporal information that was explicitly encoded in the sliding-window structure.

The MLP can be seen as a universal nonlinear function approximator that can model complicated interactions between many variables. Based on the formulation presented in Section 3.2, the network approximates the nonlinear mapping f_{θ}(X_{t}) between the multivariate input sequence and the future dissolved oxygen concentration.

We picked the ReLU activation function because it is cheap to compute and propagates gradients well. Unlike sigmoid or hyperbolic tangent functions, ReLU mitigates vanishing gradients in deeper architectures and encourages sparse activation patterns, which can improve generalization.

We used the Adam optimizer to train the model. Adam is an adaptive gradient-based optimization algorithm that adjusts the effective step size for each parameter using running estimates of the gradient moments. It is a good choice for nonlinear regression problems with noisy gradients, which are common with time-series data from real-world IoT systems. The training objective was to minimize the mean squared error (MSE), defined as:

MSE = \frac{1}{N} \sum_{i = 1}^{N} \left( DO_{i}^{pred} - DO_{i}^{true} \right)^{2}

Training was conducted using mini-batch gradient optimization with a batch size of 32 samples and a maximum of 200 training epochs. Early stopping criteria were applied during preliminary experimentation to reduce overfitting and stabilize convergence behavior under the LOPO validation framework.

Minimizing MSE drives the model toward the conditional mean of the target distribution. This produces smooth predictions and penalizes large deviations more heavily than small ones.
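Both properties can be checked on a small example. The sketch below uses made-up DO values and a hypothetical helper `mse`. It compares two constant predictions, the mean and the median, and then two error patterns with the same total absolute error.

```python
from statistics import mean, median


def mse(y_pred, y_true):
    """Mean squared error between two equal-length sequences."""
    if len(y_pred) != len(y_true):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)


# Made-up DO targets with one sudden drop (illustrative values only).
targets = [6.0, 6.2, 5.8, 3.0]

# Among constant predictions, the mean gives the lowest MSE.
mse_at_mean = mse([mean(targets)] * len(targets), targets)
mse_at_median = mse([median(targets)] * len(targets), targets)

# Same total absolute error (2.0), but one concentrated error costs more.
mse_one_large_error = mse([0.0, 0.0], [2.0, 0.0])
mse_two_small_errors = mse([0.0, 0.0], [1.0, 1.0])
```

In this toy case the single large error yields twice the MSE of the two small ones, which is why an MSE-trained model tends to hedge toward smooth predictions when abrupt drops are rare.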

We deliberately chose a relatively shallow architecture (two hidden layers). Deeper networks have greater representational capacity, but they are also more prone to overfitting, especially under cross-system validation with structural differences between ponds. The chosen configuration balances expressive power against structural robustness under the Leave-One-Pond-Out scheme.

3.4.2. Recurrent Neural Network (LSTM)

Long Short-Term Memory (LSTM) architectures have been extensively utilized to model temporal dependencies in multivariate time-series [14]. This study assessed an LSTM network comprising 32 recurrent units, succeeded by a dense intermediate layer and a linear output layer.

The LSTM’s internal gating mechanisms control how information flows through input, forget, and output gates, which lets it model higher-order temporal dependencies. We used dropout regularization (rate = 0.3) to reduce overfitting and improve generalization to unseen data.

However, as stated in Section 4, the recurrent architecture did not yield substantial enhancements in cross-system generalization relative to the feedforward model employing explicit temporal windows.

3.5. Leave-One-Pond-Out (LOPO) Validation

A Leave-One-Pond-Out (LOPO) validation scheme was used to test how well cross-system generalization works. In every iteration:

The model was trained with data from all but one pond;
The performance of the model was only tested on the pond that was left out.

The overall methodological framework is illustrated in Figure 1.

This procedure enables assessment of the structural robustness of the model against variability across independent systems. Cross-system validation is essential for evaluating true generalization capacity, particularly in heterogeneous environments where structural differences may exist between data domains [15].
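The splitting logic behind LOPO can be sketched in a few lines. The helper name `lopo_splits` and the pond labels are hypothetical. The sketch only produces index sets and leaves model training and evaluation to the caller.

```python
def lopo_splits(pond_ids):
    """Yield (held_out_pond, train_indices, test_indices) for each pond."""
    for held_out in sorted(set(pond_ids)):
        train_idx = [i for i, p in enumerate(pond_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(pond_ids) if p == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical sample-to-pond assignment for six samples.
pond_ids = ["A", "A", "B", "C", "C", "C"]
splits = list(lopo_splits(pond_ids))
```

Each pond appears as the test set exactly once, and its samples never appear in the corresponding training set, so the reported error always refers to a system the model has not seen.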

3.6. Error Decomposition by Dynamic Regime

To analyze model behavior under different operational conditions, an error decomposition strategy based on local dissolved oxygen (DO) dynamics was implemented. Rather than relying solely on global evaluation metrics, temporal intervals were differentiated according to the instantaneous variability level of the system.

Instantaneous volatility was defined as:

v_{t} = \left| DO_{t} - DO_{t-1} \right|

where

v_{t}

represents the absolute magnitude of the consecutive change in dissolved oxygen concentration.

Based on this definition, temporal intervals were sorted into:

Stable regime (little change in DO);
Unstable regime (high DO variation).

Instead of a fixed absolute value, a percentile-based threshold was used. This made it possible to segment each pond adaptively, based on its own natural dynamics, and avoided arbitrary assumptions about what magnitude of variation counts as “critical”.

The 75th percentile threshold was selected to balance sensitivity to abrupt dissolved oxygen fluctuations while preserving sufficient sample representation within both operational regimes. Preliminary exploratory inspection indicated that lower thresholds excessively fragmented stable dynamics, whereas higher thresholds reduced the number of volatile samples available for meaningful regime-specific evaluation.

After that, prediction error metrics (RMSE and MAE) were calculated separately for each regime. This made it possible to see how sensitive the model was to sudden changes.
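The decomposition can be sketched for a single pond as follows. The volatility definition and the 75th percentile threshold follow the text above. The function name `regime_errors`, the strict `>` comparison, and the toy data are assumptions, and the paper does not specify how ties at the threshold were handled.

```python
import numpy as np


def regime_errors(do_true, do_pred, percentile=75.0):
    """Split one pond's errors into stable and volatile regimes."""
    do_true = np.asarray(do_true, dtype=float)
    do_pred = np.asarray(do_pred, dtype=float)
    # v_t = |DO_t - DO_{t-1}|, defined from the second sample onward.
    volatility = np.abs(np.diff(do_true))
    threshold = np.percentile(volatility, percentile)
    is_volatile = volatility > threshold
    errors = (do_pred - do_true)[1:]  # aligned with `volatility`

    report = {}
    for name, mask in (("stable", ~is_volatile), ("volatile", is_volatile)):
        e = errors[mask]
        report[name] = {
            "n": int(e.size),
            "rmse": float(np.sqrt(np.mean(e ** 2))) if e.size else float("nan"),
            "mae": float(np.mean(np.abs(e))) if e.size else float("nan"),
        }
    return float(threshold), report


# Toy pond: flat DO followed by one sudden jump that a flat forecast misses.
threshold, report = regime_errors(
    do_true=[5.0, 5.0, 5.0, 5.0, 5.0, 7.0],
    do_pred=[5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
)
```

Because the threshold is computed from the pond's own volatility distribution, the same call can be applied to each held-out pond without choosing an absolute cutoff.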

In nonlinear dynamic systems, behavior can alternate between quasi-stationary states and transient high-variability phases. Aggregated metrics alone can hide important differences in how the model handles:

Low-frequency components (trend behavior);
Rapid changes (sudden events).

Regime-based error decomposition thus offers a more refined evaluation of robustness in the face of dynamic perturbations, especially pertinent in interconnected biological systems like aquaponics.

4. Results

4.1. Cross-System Generalization (Leave-One-Pond-Out)

The Leave-One-Pond-Out (LOPO) validation scheme was used to see how well the model could predict. This meant that it was trained on all but one pond and tested only on the one that was left out. This process is like a real-life deployment where a model that has been trained on many installations has to use what it has learned on a pond it has never seen before.

The predictive performance of the MLP model under the LOPO validation framework is summarized in Table 1.

The average error (RMSE ≈ 0.83 on the normalized scale) indicates consistent predictive capability across systems. However, noticeable dispersion is observed among ponds. In particular, Pond 2 exhibits the lowest error (RMSE = 0.63), suggesting relatively stable dynamics well represented by the model trained on other systems. In contrast, Pond 3 presents the highest error (RMSE = 1.06), representing an approximate 68% increase relative to the best-performing case.

This variability supports the hypothesis of structural heterogeneity across ponds, as discussed in Section 2. Factors such as stocking density, biofiltration efficiency, feeding regime, or hydraulic configuration may induce differentiated dynamic distributions that challenge direct cross-system generalization.

Despite these differences, no extreme performance degradation was observed in any evaluated pond. This suggests that:

Intra-pond standardization effectively mitigated scale disparities;
The model captured shared dynamic patterns governing DO behavior;
Explicit temporal window representation was sufficient to describe dominant cross-system dynamics.

The absence of disproportionate errors in the worst-case scenario indicates that the model maintains structural stability under moderate inter-system variability.

From a deployment perspective, the LOPO results suggest that the model can serve as an initial predictive framework in a new aquaponic installation without immediate system-specific retraining. However, localized fine-tuning may further reduce prediction error in ponds exhibiting higher structural volatility, such as Pond 3.

Figure 2 shows the RMSE obtained in each pond under LOPO validation.

4.2. Dynamic Regime Analysis

To evaluate model sensitivity across different operational states of the aquaponic system, error decomposition was performed by differentiating between the two dynamic regimes outlined in Section 3.6: stable and unstable regimes.

Volatility was defined as the difference between two dissolved oxygen readings taken one after the other:

v_{t} = \left| DO_{t} - DO_{t-1} \right|

We used the 75th percentile as a threshold to tell the difference between stable and high-variability intervals.

The findings indicate that in a stable regime, RMSE is about 0.80, and in a volatile regime, it is about 1.43.

Under highly volatile operational conditions, this means that the predictive error goes up by about 78%.

The observed difference indicates that the model captures the system’s dominant low-frequency dynamics, which correspond to slow and smooth changes in dissolved oxygen. In stable conditions, the temporal structure encoded through sliding windows is sufficient to model the system’s evolution with considerable consistency.

Prediction error rises markedly under unstable conditions, for example when DO levels change suddenly. Possible causes include unmodeled operational interventions (such as changes in aeration or feeding), sudden spikes in organic load, hydraulic disturbances, sensor noise, and measurement limitations.

From the perspective of nonlinear dynamical systems, high-volatility events may produce higher-frequency components or transient nonlinear effects that are insufficiently captured by a feedforward architecture reliant on short-range temporal memory.

Future improvements for highly volatile operational regimes may include incorporation of external control variables such as aeration intensity, hydraulic recirculation rate, feeding schedules, and hybrid physics-informed learning architectures capable of representing abrupt nonlinear transitions more effectively.

Figure 3 compares the measured and predicted DO values for Pond 3. The model tracks the general trend of the system but does not reproduce sudden changes.

The predictive model accurately replicates the baseline oxygen dynamics; however, it exhibits smoothing behavior and reduced sensitivity to extreme transient fluctuations, particularly during periods of high volatility.

This phenomenon suggests that the model primarily captures the main signal component associated with gradual, low-frequency variations, while attenuating short-duration abrupt changes. In practice, the neural network captures the structural evolution of the system over time, but it models rapid changes and sudden shocks poorly.

From a time-series analysis point of view, this is the expected behavior of models trained to minimize mean squared error (MSE). Because MSE penalizes large deviations heavily, optimization tends to favor solutions that smooth out extreme oscillations. As a result, the model learns an averaged representation of the system’s behavior and prioritizes global stability over local sensitivity.

High-frequency events could also be related to other nonlinear processes, such as:

Abrupt changes in aeration;
Sharp increases in metabolic activity;
Rapid changes in organic load;
Changes in environmental or hydraulic conditions.

These events introduce dynamic components that short time windows may not fully capture, especially when they are rare in the training data.

The results show that the model mostly captures the dominant low-frequency dynamic structure of an aquaponic system, whereas high-frequency events introduce additional nonlinear complexity that increases predictive uncertainty.

4.3. Comparison Between MLP and LSTM Architectures

We evaluated a recurrent LSTM architecture to determine whether modeling higher-order temporal dependencies would improve cross-system predictive performance. The training results showed that the validation error kept getting worse and that the results did not differ substantially from those of the feedforward model.

Figure 4 shows how the MLP regression model’s training loss changed over time. The curve converges quickly in the first few epochs and then stabilizes gradually, which indicates that the model learns the short-term dynamics early in training.

The absence of substantial improvement in inter-pond RMSE when employing the LSTM architecture indicates that the temporal representations produced through explicit sliding windows and lagged variables sufficiently capture the existing dissolved oxygen dynamics within the studied systems. In other words, it seems that the relevant temporal memory for short-term DO prediction is limited in range and can be accurately represented by an explicit multivariate autoregressive formulation, eliminating the need for internal recurrent state mechanisms.

This finding suggests that the principal dynamic structure of the system is largely governed by local temporal dependencies, rather than long-range relationships that require substantial memory storage, as exemplified by LSTM units. Most of the predictive information comes from recent observations because the prediction horizon is only about five minutes and the dataset has a high temporal resolution. The feedforward architecture can therefore represent the relevant temporal context effectively.

Moreover, the lack of improvement in cross-system generalization shows that the LSTM’s added parametric complexity does not improve structural robustness under the LOPO validation scheme. This finding suggests that performance constraints are more closely linked to the inherently nonlinear and partially unpredictable nature of volatile events than to insufficient temporal memory capacity.

Feedforward models offer several practical advantages:

Lower computational cost during training and inference;
Lower memory and processing requirements;
Greater numerical stability during optimization;
Easier deployment on embedded systems, edge devices, or low-power IoT platforms.

The MLP architecture therefore offers a good balance between computational cost and predictive accuracy, which makes it a suitable choice for multi-pond aquaponic systems that require continuous monitoring.

4.4. Variable Importance Analysis

We performed a sensitivity analysis on the trained MLP network to better understand how the model works and find the variables that have the biggest impact on future dissolved oxygen (DO) predictions. We determined the contribution of each variable by analyzing the variations in predictive error resulting from controlled alterations in input features and by examining the magnitude of the weights associated with lagged temporal components.
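One common way to measure how predictive error responds to controlled input alterations is permutation-based sensitivity, sketched below. This is a stand-in chosen for illustration. The paper does not state which perturbation scheme was used, and the names `permutation_sensitivity`, `predict`, and the toy model are hypothetical. All lags of a variable are shuffled together across samples, so the score reflects aggregated influence over the window rather than a single lag.

```python
import numpy as np


def rmse(y_pred, y_true):
    """Root mean squared error."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def permutation_sensitivity(predict, X, y, feature_names, seed=0):
    """RMSE increase when one variable is shuffled across samples.

    X has shape (n_samples, window, n_features).
    """
    rng = np.random.default_rng(seed)
    baseline = rmse(predict(X), y)
    scores = {}
    for j, name in enumerate(feature_names):
        X_perm = X.copy()
        order = rng.permutation(len(X))
        # Shuffle every lag of variable j together, keeping its window intact.
        X_perm[:, :, j] = X[order][:, :, j]
        scores[name] = rmse(predict(X_perm), y) - baseline
    return baseline, scores


# Toy check: the "model" only reads the most recent value of variable 0.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(200, 3, 2))
y_toy = X_toy[:, -1, 0]
baseline, scores = permutation_sensitivity(
    lambda X: X[:, -1, 0], X_toy, y_toy, ["var_0", "var_1"]
)
```

A variable the model ignores gets a score of zero, while a variable the model relies on produces a positive increase in RMSE.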

The sensitivity analysis revealed that the variables exerting the most significant influence on forecasting future DO were:

pH (with temporal lags);
Turbidity (with temporal lags);
Past DO values (autoregressive component);
Average fish weight.

Because the predictive framework operated over multivariate temporal windows rather than isolated autoregressive coefficients, the sensitivity analysis reflects aggregated short-range temporal influence across recent lagged observations within the selected forecasting horizon, rather than attribution to a single discrete lag position.

Autoregressive Component. The importance of past DO values confirms that the system has an autoregressive structure: the recent state of oxygen strongly affects its future concentration. This is consistent with diffusion, gas exchange, and equilibrium processes in aquatic systems. This result validates the temporal formulation employed in the modeling framework and demonstrates that recent DO history carries substantial predictive information.

The existence of lagged variables further indicates the presence of delayed effects in system dynamics. Dissolved oxygen reacts to current conditions, but it also reflects the accumulated effect of recent biological, chemical, and physical processes.

Delayed Influence of pH. The importance of lagged pH values points to delayed effects linked to biological and chemical processes. pH directly affects:

Nitrification efficiency;
Microbial activity;
The chemical balance of the system.

Changes in pH can gradually alter biological oxygen demand and microbial respiration, so their effect on dissolved oxygen appears with a delay as the underlying biochemical reactions proceed.

Turbidity and Organic Load. Turbidity emerges as a relevant predictor, especially in lagged form. This relationship might be connected to:

Higher concentrations of suspended solids;
Higher organic matter content;
Greater microbial activity.

These factors raise the biological oxygen demand (BOD), which indirectly affects DO levels in subsequent time steps. The delayed effect of turbidity suggests that its influence builds over time as microbes consume the available oxygen.

Biomass and Metabolic Demand. Average fish weight serves as a proxy for the total biomass of the system. A higher average weight implies a greater metabolic oxygen demand, which changes subsequent DO behavior. This finding shows that environmental and biometric variables in aquaponic systems are closely interconnected, and it highlights the link between biological load and water quality parameters.

Dynamic Implications. The existence of lagged predictors indicates short-term memory effects in aquaponic dynamics. Dissolved oxygen does not merely respond to immediate conditions; it embodies the temporal amalgamation of interconnected biological and physicochemical processes.

These findings support the hypothesis that aquaponic systems operate as nonlinear dynamic systems with short-range memory, in which interactions among variables generate unique temporal dependencies. The sensitivity results provide both an understanding of the predictive model and an understanding of the dynamic structure of multi-pond aquaponic environments.

5. Discussion

5.1. Structural Heterogeneity Across Systems

The Leave-One-Pond-Out validation scheme indicates that prediction error differs considerably between ponds. Pond 3 had the highest RMSE, which suggests that it was less stable or more volatile than the other systems examined.

This variability between ponds should not be treated as random variation but as evidence that each aquaponic system has its own dynamic configuration.

Aquaponic installations are not uniform. Differences in feeding schedules, pond volume, biofiltration capacity, and sensor characteristics produce datasets with different statistical distributions. Modeling frameworks that do not account for inter-pond variability risk producing performance estimates that do not transfer to structurally independent systems [16].

Even though all ponds follow the same biophysical rules, variable operational conditions cause different statistical distributions and dynamic structures.

This finding confirms that aquaponic systems cannot be considered homogeneous from either a statistical or a dynamic standpoint. Relevant factors include:

Tank biomass and fish count;
Nitrification rate and biofilter efficiency;
Feeding frequency and quantity;
Hydraulic configuration and recirculation rate;
External environmental factors such as temperature, radiation, and ventilation, which can strongly affect dissolved oxygen dynamics by shifting both its average value and its variability over time.

From a machine learning point of view, this means that each pond may represent a different data distribution in the input space. Cross-system generalization depends not only on predictive capacity but also on how well the model copes with shifts in the structure of the data.

Differences in fish biomass, stocking density, growth stage, hydraulic configuration, and operational conditions may contribute to structural heterogeneity among ponds and influence cross-system predictive performance. Consequently, predictive error may reflect not only model limitations but also differences in the underlying characteristics of the evaluated systems.

This view is supported by the fact that intra-pond standardization was needed before the model could be trained. Standardization not only addresses scale differences but also helps align the structure of the data across systems. It indicates that inter-system variability is not only biological but also structural and operational, which directly affects the statistical properties of the analyzed variables.

For this purpose, the LOPO validation approach is stricter than standard random cross-validation. By holding out entire systems during testing, LOPO assesses how well the model adapts to dynamic structures that differ partially from those seen in training. The results show that overall predictive stability is maintained, even though some ponds exhibit a modest decline in performance. This suggests that the model learns dynamic patterns that are shared across systems.
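Intra-pond standardization of a single variable can be sketched as follows. The function name, the 1st and 99th percentile bounds, and the use of clipping in place of trimming (so that the output keeps the same length as the input) are all assumptions. The paper does not report the exact percentiles or whether trimmed samples were removed.

```python
import numpy as np


def standardize_per_pond(values, pond_ids, lower_pct=1.0, upper_pct=99.0):
    """Clip each pond's values to its own percentiles, then z-score them."""
    values = np.asarray(values, dtype=float)
    pond_ids = np.asarray(pond_ids)
    out = np.empty_like(values)
    for pond in np.unique(pond_ids):
        mask = pond_ids == pond
        x = values[mask]
        lo, hi = np.percentile(x, [lower_pct, upper_pct])
        x = np.clip(x, lo, hi)  # clipping stands in for trimming (assumption)
        std = x.std()
        out[mask] = (x - x.mean()) / std if std > 0 else 0.0
    return out


# Two hypothetical ponds with the same shape but very different scales.
z = standardize_per_pond(
    [1.0, 2.0, 3.0, 100.0, 200.0, 300.0],
    ["A", "A", "A", "B", "B", "B"],
    lower_pct=0.0,
    upper_pct=100.0,
)
```

After the transformation the two ponds produce identical standardized values, which illustrates how per-pond scaling removes scale disparities while preserving each pond's internal dynamics.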

5.2. Behavior Under Stable and Volatile Regimes

Error decomposition showed that performance was considerably better under stable conditions (RMSE ≈ 0.80) than under highly volatile ones (RMSE ≈ 1.43). This difference shows that predictive capability depends strongly on the dynamic regime in which the system is operating.

Rapid fluctuation, organic load spikes, and sensor noise all push DO prediction error upward. This was also observed in our regime-based error analysis, where RMSE increased from 0.80 under stable conditions to 1.43 under high-volatility regimes [17].

Under stable conditions, the dynamics of dissolved oxygen predominantly consist of low-frequency components and gradual variations, which can be effectively depicted through explicit temporal window representations. When dealing with sophisticated nonlinear systems, it is common to split them into stable and unstable regimes [2].

This behavior shows that, under normal operation, the aquaponic system evolves within a quasi-stationary regime characterized by temporal continuity and structural coherence in its biophysical interactions. In this setting, the model tracks the current trend of the system and forecasts its near-future state consistently.

When DO varies quickly, however, additional nonlinear effects appear that increase predictive uncertainty. These perturbations may be associated with:

Unmodeled operational interventions, such as changes in aeration or feeding;
Rapid changes in the organic load;
Hydraulic disturbances;
Sensor noise or measurement limitations.

From the perspective of nonlinear dynamic systems, such events may be regarded as transient deviations from the primary regime, instigating more complex or less predictable dynamics.

These results matter for operational management and control. Under normal operation, the model appears to be a dependable tool for anticipating problems, since it can forecast sustained DO declines. Under highly unstable conditions, however, the predictive system should be complemented with fast anomaly-detection mechanisms or safety rules that can respond to major changes almost immediately.

The proposed model could therefore serve as one component of a hybrid monitoring system that combines short-term forecasts with event detection methods to improve robustness.

5.3. Architectural Comparison

The comparative investigation of a feedforward MLP architecture and a recurrent LSTM model yielded no substantial improvement in cross-system generalization when employing the recurrent technique. This research shows that adding internal memory through recurrent gating mechanisms did not significantly improve the ability to make predictions within the given time frame.

From a dynamic perspective, this suggests that the relevant temporal memory for DO prediction in the analyzed systems is predominantly short-range. Explicit sliding windows already encode recent multivariate history; thus, the feedforward design can capture the dominant autoregressive dependencies without maintaining an internal state over long periods.

The short prediction horizon (about 5 min) and the dataset’s high temporal resolution also favor a modeling approach that draws its predictive information from short-term lags. In this case, the long-range modeling capabilities of LSTM networks contribute little to cross-system performance.

Another issue to consider is model complexity. Recurrent architectures have more parameters, which makes training harder and could make them more sensitive to structural differences between ponds under LOPO validation. The absence of substantial RMSE improvement suggests that the observed performance ceiling is more closely linked to the inherently nonlinear and partially unpredictable nature of volatile events than to insufficient temporal memory capacity.

In terms of practical use, feedforward models offer significant advantages:

Lower cost for training and inference;
Lower memory and processing requirements;
Greater numerical stability during optimization;
Easier deployment on low-power IoT platforms or embedded devices.

Therefore, the MLP design is a suitable choice for real-time monitoring in multi-pond aquaponic systems because it finds a fair balance between speed and accuracy.

5.4. Operational Implications and Predictive Monitoring Potential

Predictive models have been widely studied as components of decision-support systems for smart agriculture and digital agro–aquaculture [18]. In this context, the sensitivity analysis conducted in this work showed that pH and turbidity, particularly in their lagged forms, substantially influence the future trajectory of dissolved oxygen.

This finding suggests the presence of delayed effects associated with integrated biological and physicochemical processes, in which alterations in chemical equilibrium or organic load affect dissolved oxygen dynamics with a temporal delay.

From a systemic perspective, our findings enhance the comprehension of aquaponic systems as interconnected dynamic ecosystems, wherein seemingly minor factors can forecast substantial variations in oxygen availability. The model’s ability to capture these delayed correlations supports proactive monitoring and operational decision-support strategies.

Integrating predictive analytics with real-time monitoring has shown practical value in aquaculture decision-support applications. When forecast outputs inform operational adjustments directly, response times to water quality deterioration are reduced and preventive interventions become feasible [19].

The results suggest potential operational actions that could be explored in future implementations:

Adjusting aeration in advance when a sustained decline in DO is forecast;
Managing suspended particles and improving filtration when turbidity is excessive;
Adjusting chemical buffering and biofilter settings to address persistent pH variations;
Strategic management of biomass and stocking density, taking into account how fish weight affects oxygen demand.

These actions are not just based on what we see happening right now; they are also dependent on what we think will happen in the future. This indicates a transition from reactive to predictive system management.

For practical deployment, predicted dissolved oxygen values may be integrated with simple threshold-based decision rules to support early-warning functionality. For example, predicted DO values approaching predefined operational limits may generate warning notifications, whereas predictions below critical safety thresholds may trigger high-priority alerts requiring immediate intervention. This approach allows continuous regression outputs to be translated into actionable monitoring signals without modifying the underlying predictive framework.
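Such a rule layer can be sketched in a few lines. The function name `classify_alert` and both limits (5.0 and 3.0, assumed to be in mg/L) are hypothetical placeholders, not values reported in this study. Predictions made on the normalized scale would first need to be mapped back to physical units.

```python
def classify_alert(predicted_do, warning_limit=5.0, critical_limit=3.0):
    """Map a predicted DO value (physical units) to an alert level.

    Both limits are illustrative assumptions, not values from the paper.
    """
    if critical_limit >= warning_limit:
        raise ValueError("critical_limit must be below warning_limit")
    if predicted_do < critical_limit:
        return "critical"  # high-priority alert, immediate intervention
    if predicted_do < warning_limit:
        return "warning"   # approaching the operational limit
    return "ok"


# Three hypothetical forecasts for the next time step.
levels = [classify_alert(v) for v in (6.0, 4.5, 2.0)]
```

Keeping the rules outside the model means the limits can be tuned per installation without retraining the regression network.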

In this context, the proposed model should not be regarded only as a standalone statistical predictor. It is better seen as a possible component of a larger decision-support system aimed at making aquaponic systems more stable, robust, and sustainable. When integrated into IoT-based infrastructures, such predictive models may support intelligent monitoring and decision-support functionalities for water quality management, facilitating timely interventions, reducing risk, and optimizing energy use.

5.5. Study Limitations

The results from the Leave-One-Pond-Out validation approach were consistent, but there are certain methodological and structural issues that need to be considered when looking at the data.

First, the prediction model does not include operational control variables such as active aeration intensity, hydraulic flow rate, feeding schedule, or recirculation frequency. These variables are external factors that can directly influence dissolved oxygen dynamics. Without them, the model infers variations in DO only from observational data, with no direct access to the control actions that could explain abrupt swings or regime shifts.

The absence of these operational control variables likely contributed to reduced predictive sensitivity during highly volatile dissolved oxygen transitions, particularly when abrupt interventions or nonlinear hydraulic disturbances occurred outside the information represented within the temporal observation windows.

Second, unmodeled sensor noise adds another layer of uncertainty. In real IoT-based monitoring systems, sensors can exhibit calibration drift, intermittent failures, or systematic measurement bias. Although preprocessing and percentile trimming removed severe outliers, no explicit noise modeling or advanced filtering was applied. Under highly volatile conditions, this limitation can amplify prediction errors.

Third, certain ponds were excluded from the final analysis because too few valid records remained after preprocessing. This choice was necessary to keep the LOPO scheme statistically sound. However, it also limits the applicability of the results to systems with sparse data and reduced the number of structures that could be studied.

The proposed model is solely derived from historical data and has not been integrated into a real-time feedback control system. Consequently, its performance was not evaluated within a closed-loop predictive control system where decisions derived from model outputs could influence subsequent system states. This work is limited to passive predictive modeling and does not encompass the application of adaptive control.

Although the Leave-One-Pond-Out framework provided a rigorous evaluation of cross-system generalization, additional within-pond temporal validation strategies such as walk-forward validation were not systematically implemented in the present study. Future work should incorporate complementary time-series validation schemes to further assess temporal consistency and forecasting ability within individual aquaponic environments.

These limitations do not invalidate the reported results; rather, they define the scope of the study. They also point future research toward hybrid physics–machine learning models, the inclusion of external control variables, and adaptive per-pond systems that can adjust parameters in real time in operational aquaculture settings [20].

6. Conclusions

The present study recast the usual water quality classification problem in aquaponic systems as a dynamic regression task that forecasts future dissolved oxygen (DO) levels. This formulation overcomes the limitations of deterministic rule-based schemes and enables a more realistic multivariate temporal analysis.

By combining data from several structurally independent ponds and applying intra-pond normalization, we directly addressed structural differences between systems. The Leave-One-Pond-Out (LOPO) validation scheme allowed a rigorous test of cross-system generalization. The average RMSE was 0.83 (on a normalized scale), which indicates consistent predictive capability across heterogeneous systems.

Dynamic regime analysis showed that prediction error was much higher when the conditions were very volatile (RMSE ≈ 1.43) than when they were stable (RMSE ≈ 0.80). This confirmed that there are high-frequency nonlinear components in DO dynamics. Architectural comparisons also demonstrated that feedforward models with clear temporal representation were enough to describe the main behavior of the system, and that recurrent LSTM architectures did not make a big difference.

Sensitivity analysis found that pH and turbidity had lagged effects on future DO evolution, which points to operational measures such as aeration adjustment and suspended solids management. Overall, the results show that dynamic multivariate modeling is a viable approach to predictive monitoring and decision support in multi-pond aquaponic systems.

7. Future Work

This study lays the groundwork for dynamic predictive modeling in multi-pond aquaponic systems. However, future research could extend and improve the proposed framework in several ways.

Future research directions should specifically address the methodological limitations identified in the present study, particularly those associated with volatile operational regimes, missing control variables, and cross-system adaptation under heterogeneous aquaponic environments.

First, hybrid physics–machine learning models should be explored. Adding simplified oxygen balance equations and nitrification kinetics to neural network architectures could make the models more interpretable and more stable during sudden environmental changes. Such hybrid strategies learn from data and mechanistic knowledge at the same time.
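One possible shape for such a hybrid model is sketched below: a simplified oxygen balance supplies a baseline forecast and a learned component adds a correction. The balance dDO/dt = k_aeration * (do_sat - DO) - consumption is a generic textbook form assumed for illustration, with nitrification demand lumped into the consumption term; the function names, parameter values, and the placeholder residual model are all hypothetical.

```python
from typing import Callable, Sequence


def oxygen_balance_step(
    do_now: float, do_sat: float, k_aeration: float, consumption: float, dt: float
) -> float:
    """One explicit Euler step of dDO/dt = k_aeration * (do_sat - DO) - consumption."""
    return do_now + dt * (k_aeration * (do_sat - do_now) - consumption)


def hybrid_forecast(
    do_now: float,
    do_sat: float,
    k_aeration: float,
    consumption: float,
    dt: float,
    residual_model: Callable[[Sequence[float]], float],
    features: Sequence[float],
) -> float:
    """Physics baseline plus a data-driven correction from residual_model."""
    physics = oxygen_balance_step(do_now, do_sat, k_aeration, consumption, dt)
    return physics + residual_model(features)


if __name__ == "__main__":
    # Illustrative numbers only: mg/L, per-minute rates, 5 min step.
    def zero_residual(feats: Sequence[float]) -> float:
        return 0.0  # stand-in for a trained neural network

    print(hybrid_forecast(6.0, 8.0, 0.1, 0.05, 5.0, zero_residual, [7.2, 3.1]))
```

In this arrangement the network only has to learn what the mechanistic term misses, which is one way the interpretability and stability mentioned above might be pursued.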

Second, a pond-specific adaptive learning mechanism should be implemented. With approaches such as incremental learning or transfer learning, a general predictive model could be fine-tuned to the conditions of an individual pond. This strategy could support better decisions in ponds with less robust designs.
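As a deliberately lightweight stand-in for incremental or transfer learning, the sketch below keeps a global model frozen and learns only a per-pond affine correction online, one observation at a time. The class name PondAdapter, the learning rate, and the synthetic data stream are assumptions for illustration; fine-tuning the network weights themselves would be the fuller version of the idea.

```python
class PondAdapter:
    """Per-pond correction y = scale * global_pred + bias, updated online."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.scale = 1.0  # start as the identity, i.e. trust the global model
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, global_pred: float) -> float:
        return self.scale * global_pred + self.bias

    def update(self, global_pred: float, observed: float) -> float:
        """One SGD step on squared error; returns the error before the update."""
        error = self.predict(global_pred) - observed
        self.scale -= self.learning_rate * error * global_pred
        self.bias -= self.learning_rate * error
        return error


if __name__ == "__main__":
    # Synthetic pond whose true DO sits 0.5 above the global prediction.
    adapter = PondAdapter()
    for step in range(3000):
        global_pred = float(step % 3 - 1)  # cycles through -1, 0, 1
        adapter.update(global_pred, observed=global_pred + 0.5)
    print(adapter.scale, adapter.bias)
```

Only two parameters are learned per pond, so the adapter can be refreshed from a small number of new readings without retraining the shared model.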

Third, incorporating external control inputs such as aeration intensity, feeding rate, hydraulic flow, and weather conditions would allow predictive models to simulate different operating scenarios and identify the best operating strategy.

Additionally, an API that integrates the prediction model with database management could be built. This would enable real-time monitoring, automatic model updates, and deployment in the cloud or at the edge.

Lastly, more advanced sequential models, such as attention-based architectures (e.g., Transformers for time series), could be examined to capture more complex temporal relationships in highly variable systems.

Author Contributions

Conceptualization, A.A., K.G., and B.Y.M.; methodology, A.A. and K.G.; software, A.A. and B.Y.M.; validation, T.G. and F.D.; formal analysis, A.A. and B.Y.M.; investigation, A.A., K.G., and B.Y.M.; resources, A.A. and K.G.; data curation, B.Y.M.; writing—original draft preparation, A.A. and K.G.; writing—review and editing, A.A., K.G., T.G., and F.D.; visualization, T.G. and F.D.; supervision, A.A., K.G., B.Y.M., T.G., and F.D.; project administration, A.A., K.G., and B.Y.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study is publicly available through the Kaggle platform and corresponds to the “Sensor-Based Aquaponics Fish Pond Datasets” provided by the HiPIC Research Group.

Conflicts of Interest

The authors declare no conflicts of interest. Author Felix Dueñas is employed by Everest Internet Solutions S. de R.L. de C.V. This affiliation reflects only an institutional employment relationship and does not represent financial support, sponsorship, or involvement of the company in the present study. Everest Internet Solutions S. de R.L. de C.V. provided no funding for this research and had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. The remaining authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ANN	Artificial Neural Network
DO	Dissolved Oxygen
IoT	Internet of Things
ML	Machine Learning
NH₃	Ammonia
NO₃⁻	Nitrate
pH	Potential of Hydrogen
RF	Random Forest

References

1. Rakocy, J.E.; Masser, M.P.; Losordo, T.M. Recirculating Aquaculture Tank Production Systems: Aquaponics—Integrating Fish and Plant Culture; SRAC Publication: Stoneville, MS, USA, 2006; p. 15.
2. Kantz, H.; Schreiber, T. Nonlinear Time Series Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004.
3. Chen, L.; Wang, Y.; Lai, Y.; Wang, T.; Lin, Y.; Chiang, C.; Wang, H. A Method for Predicting Dissolved Oxygen in Aquaculture Water in an Aquaponics System. Comput. Electron. Agric. 2018, 151, 384–391.
4. Zhao, Z.; Zhang, Y.; Liu, Y.; Xu, W.; Wang, J.; Li, D. Overview and Future Perspectives of Nitrifying Bacteria on Biofilters for Recirculating Aquaculture Systems. Rev. Aquac. 2020, 12, 2101–2120.
5. Ncube, M.; Dalu, T.; Chari, L.D.; Dalu, M.T. A Review of Water Quality Forecasting Models for Freshwater Lentic Ecosystems. Water 2025, 17, 2312.
6. Zhang, Y.; Thorburn, P.J.; Xiang, W.; Fitch, P. Machine Learning Approaches for Water Quality Prediction in Aquaculture Systems. Aquac. Eng. 2020, 89, 102061.
7. Eze, E.; Kirby, S.; Attridge, J.; Ajmal, T. Aquaculture 4.0: Hybrid Neural Network Multivariate Water Quality Parameters Forecasting Model. Sci. Rep. 2023, 13, 16129.
8. Yang, H.; Feng, Q.; Xia, S.; Wu, Z.; Zhang, Y. AI-Driven Aquaculture: A Review of Technological Innovations and Their Sustainable Impacts. Artif. Intell. Agric. 2025, 15, 508–525.
9. Shams, M.Y.; Elshewey, A.M.; El-Kenawy, E.-S.M.; Ibrahim, A.; Talaat, F.M.; Tarek, Z. Water Quality Prediction Using Machine Learning Models Based on Grid Search Method. Multimed. Tools Appl. 2024, 83, 35307–35334.
10. Udanor, C.N.; Ossai, N.I.; Nweke, E.O.; Ogbuokiri, B.O.; Eneh, A.H.; Ugwuishiwu, C.H.; Aneke, S.O.; Ezugwu, A.O. An Internet of Things Labelled Dataset for Aquaponics Fish Pond Water Quality Monitoring System. Data Brief 2022, 43, 108400.
11. Zamnuri, M.A.H.; Qiu, S.; Rizalmy, M.A.A.; He, W.; Yusoff, S.; Roeroe, K.A.; Du, J.; Loh, K.-H. Integration of IoT in Small-Scale Aquaponics to Enhance Efficiency and Profitability: A Systematic Review. Animals 2024, 14, 2555.
12. Liu, T.; Liu, J.; Wang, J.; Xu, J. Optimization of the Intelligent Sensing Model for Environmental Information in Aquaculture Waters Based on the 5G Smart Sensor Network. J. Sens. 2022, 2022, 6409046.
13. Cheng, W.K.; Khor, J.C.; Liew, W.Z.; Bea, K.T.; Chen, Y.L. Integration of Federated Learning and Edge-Cloud Platform for Precision Aquaculture. IEEE Access 2024, 12, 124974–124989.
14. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
15. Varma, S.; Simon, R. Bias in Error Estimation When Using Cross-Validation for Model Selection. BMC Bioinform. 2006, 7, 91.
16. Rahman, M.T.; Nielsen, R.; Khan, M.A.; Asmild, M. Efficiency and Production Environmental Heterogeneity in Aquaculture: A Meta-Frontier DEA Approach. Aquaculture 2019, 509, 22–32.
17. Shi, P.; Kuang, L.; Yuan, L.; Wang, Q.; Li, G.; Yuan, Y.; Zhang, Y.; Huang, G. Dissolved Oxygen Prediction Using Regularized Extreme Learning Machine with Clustering Mechanism in a Black Bass Aquaculture Pond. Aquac. Eng. 2024, 105, 102408.
18. Kanwal, S.; Abdullah, M.; Kumar, S.; Arshad, S.; Shahroz, M.; Zhang, D.; Kumar, D. An Optimal Internet of Things-Driven Intelligent Decision-Making System for Real-Time Fishpond Water Quality Monitoring and Species Survival. Sensors 2024, 24, 7842.
19. Abba, S.I.; Hadi, S.J.; Abdullahi, J. Emerging Artificial Intelligence Methods in Aquaculture: A Review. Comput. Electron. Agric. 2021, 189, 106315.
20. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine Learning in Agriculture: A Review. Sensors 2018, 18, 2674.

Figure 1. Methodological architecture of the proposed predictive framework.

Figure 2. Cross-system LOPO performance of the MLP model (normalized RMSE).

Figure 3. Observed versus predicted dissolved oxygen (DO) values for Pond 3 under dynamic conditions.

Figure 4. Training loss evolution of the MLP regression model.

Table 1. LOPO performance of the MLP model for future DO prediction.

Pond	RMSE	MAE
Pond 1	0.68	0.51
Pond 2	0.63	0.41
Pond 3	1.06	0.72
Pond 4	0.93	0.56
Average	0.83	—

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
