Article

Short-Term Dissolved Oxygen Forecasting in Aquaculture Systems Using a Process-Based Mass-Balance Model

1
Department of Biosystems and Agricultural Engineering, University of Kentucky, 128 C.E. Barnhart Building, Lexington, KY 40546-0276, USA
2
Aquaculture Research Center, Kentucky State University, 103 Athletic Rd., Frankfort, KY 40601-2355, USA
*
Author to whom correspondence should be addressed.
Water 2026, 18(13), 1618; https://doi.org/10.3390/w18131618
Submission received: 21 May 2026 / Revised: 24 June 2026 / Accepted: 1 July 2026 / Published: 3 July 2026
(This article belongs to the Section Water, Agriculture and Aquaculture)

Abstract

Dissolved oxygen (DO) is a critical water quality parameter in aquaculture systems. Low DO events can stress aquatic life, limit its growth, or even cause mortality, and they require rapid management decisions. This study presents a process-based approach for short-term DO forecasting that is intended to support rapid deployment and transferability across various aquaculture systems. Future DO is computed using a mass-balance equation driven by daily stream metabolism and reaeration coefficients estimated from the previous 24 h of weather and water observations. These coefficients are combined with the next day’s observed water temperature, atmospheric pressure, photosynthetically active radiation, and salinity to predict DO 24 h ahead at ten-minute resolution under idealized measured-input conditions. Model performance was evaluated across multiple aquaculture ponds with varying aeration techniques by assessing prediction accuracy of daily DO minimums using a safety-based metric and of full-day DO trajectories using root mean square error. The model successfully predicted 91.77% of DO drops below 6 mg/L within 1 mg/L in a consistently aerated artificial pond and achieved high success in a natural watershed system. Performance was reduced in systems with highly variable aeration. Prediction accuracy was highest at surface locations away from aerators. These results indicate that a minimal-history process-based framework can identify low DO risk under idealized measured-input conditions, particularly at surface locations away from aerators and in systems with constant or natural aeration.

1. Introduction

The American aquaculture industry is growing: from 2018 to 2023 it added over 500 aquaculture farms and almost $400 million in sales [1]. Small-scale aquaculture in locally owned ponds provides an opportunity for farmers to produce food as a source of locally grown protein or income [2]. When managing aquaculture systems, dissolved oxygen (DO) is one of the most important factors to control [3] because aquatic life responds negatively to hypoxia when DO levels are too low [4,5]. While thresholds differ by fish species, DO levels below 6 mg/L may slow growth and stress fish populations, while 3 mg/L or lower can cause death [6]. It is important to maintain DO levels above 6 mg/L for high-yield aquaculture production.
The main providers of DO into aquatic environments are photosynthesis and diffusion of oxygen from the air to the water. Conversely, biological activity from all organisms in the water depletes DO. In an aquaculture system, DO is consumed at a higher rate due to increased respiration from increased numbers of aquatic organisms [7]. Nighttime is often the most dangerous time for DO drops to occur due to respiration and decomposition of unconsumed food or feces [8]. Since photosynthesis does not occur during the night, diffusion becomes the only provider of additional DO. Unfortunately, this is often not enough to keep DO from dropping by a large amount [7]. Therefore, the most critical period for DO shortages is often in the very early morning just before photosynthesis begins again, and DO levels start to increase. DO drops may also be caused by weather conditions, feeding rates [9], stocking rates, and pond management.
A common solution for dealing with low DO levels in aquaculture is the use of aerators to help maintain higher levels of DO [7,10]. Aerators often consume significant amounts of energy [11], and it may not be fiscally viable for smaller farmers to run them constantly. Even in situations of constant aeration, DO drops can occur and require extra emergency aeration to keep DO levels high. Unfortunately, with the critical low DO period often occurring shortly before sunrise, effective management requires automated emergency aeration activation systems or continuous on-call personnel. An effective DO management system for farmers is one that operates during daylight hours and predicts upcoming low DO levels for the night. Emergency aeration could then be deployed the day before DO drops happen and eliminate the risk of losing stock to hypoxia. This would allow farmers to effectively manage their system without spending time and resources on complex control systems or constant physical monitoring.
DO concentration is affected by a range of complex biological, chemical, and physical processes [7]. There exist multiple mass-balance equations that describe these complex processes [12,13,14]. While each approach differs slightly from the others in the terms used or their complexity, all equations have coefficients representing metabolism parameters such as respiration and photosynthesis, along with a coefficient for reaeration.
There are multiple models that estimate metabolism and reaeration coefficients, such as BaMM [15], Model Maker implementation [16], and BASEmetab [12], from weather data and aquatic measurements. These models can estimate DO mass-balance coefficients from 24 h of data. Compared to BaMM and the Model Maker implementation, BASEmetab provides a robust platform that can process large amounts of data in a shorter amount of time [12]. BASEmetab has been applied to estimate metabolism parameters in a diverse range of settings, including dryland lowland rivers [17] and headwater streams [18], demonstrating its usefulness across various aquatic environments.
Recent advances in machine learning (ML) have driven growing interest in DO prediction models [19,20]. ML models such as those of Gachloo et al. [21] and Yu, X. et al. [22] utilize a combination of historical and “would-be” forecasted data to forecast DO for extended periods. In Gachloo et al. [21], the model is given observed meteorology and river discharge data for the same two weeks that it predicts. Similarly, in Yu, X. et al. [22], stratification data is provided for the same month for which the model predicts DO. Process-based models can also be used to predict DO, and various such models have been applied in aquaculture ponds [13], rivers [23], and lakes [14]. The process-based DO prediction model of Xu and Xu [14] computes DO using a mass-balance equation but requires chlorophyll-α concentration and sediment oxygen demand to do so. While these models produce accurate results, their authors recognize that accuracy declines when the predictors are forecast inaccurately. Such inaccuracy becomes increasingly likely when forecasting complex predictors such as river discharge, stratification, chlorophyll-α concentration [24], sediment oxygen demand, and turbidity [25].
Whether a DO prediction model that relies on more than historical data can be used operationally depends on whether its inputs are available, or can be derived, from commonly available weather forecast information. The NOAA Global Forecast System (GFS) provides forecasts of meteorological variables such as sea level (atmospheric) pressure, temperature, and humidity [26]. The Copernicus Atmosphere Monitoring Service (CAMS) provides forecasts of atmospheric composition including aerosols and other gases [27]. Forecasts of meteorological variables and atmospheric composition can be utilized with the REST2 model to predict PAR [28]. Similarly, the General Lake Model (GLM) is a process-based one-dimensional model that uses forecast meteorological inputs to predict water temperature [29]. Compared to chlorophyll-α concentration, river discharge, and other complex predictors, atmospheric pressure, water temperature, and PAR can be forecasted with relative ease from readily available weather and atmospheric composition forecasts. The availability of forecast meteorological variables and atmospheric composition provides an opportunity for the development of a DO prediction model that is quickly deployable in any aquaculture production system.
Other ML models, such as those of Pan et al. [30], Granata et al. [31], and Pant et al. [32], make short-term DO predictions entirely from historical weather and prediction site data. While these models may not introduce errors due to forecasting of predictors, they require longer datasets for training. In Pan et al. [30], DO is predicted 12 h ahead with a testing MSE of 0.5092 mg/L using 80% of four months of data to train the model. While the model has a high R², it fails to predict the one major drop in DO, from around 8 mg/L to 4 mg/L, that occurs in the dataset. In Granata et al. [31], DO is predicted 1 day ahead with a testing RMSE of 0.13 mg/L using 80% of 8 years of data to train the model. However, the dataset used only has a DO range from around 6 mg/L to 12 mg/L across the entire 8 years. In Pant et al. [32], DO is predicted 3 h ahead with an RMSE as low as 0.145 mg/L using 75% of four and a half years of data to train the model. Being data-driven, ML models require significant periods of data collection before they can be implemented in a real-world system. Data-driven models also have trouble generalizing across larger regions and aquaculture environments [33,34], meaning new periods of data collection are required every time the model is to be implemented in a new system. Finally, new data collection may also be required every time there are significant changes to production practices, such as altered stocking patterns, variations in feeding rates, or the deployment of new equipment like aerators or water circulators.
While the results of process-based approaches to DO prediction may not be as accurate as the results of ML models, they do not share the drawback of being site-specific, as DO mass-balance equations can be applied to any aquatic environment. Xu and Xu [14] took a particularly interesting approach to DO prediction by using local water quality measurements and weather data to simulate hourly DO using a mass-balance DO equation. The most difficult part of this approach is determining the metabolism and reaeration coefficients, which was done by splitting the data by season and then dividing each season into calibration and validation data. A single value for each of the metabolism and reaeration coefficients was determined for each season using the calibration data and was used to simulate DO against the validation data. Xu and Xu [14] report good results; however, their approach to parameter estimation requires larger datasets and cannot be used for quick deployment to new sites.
The goal of this project is to create a minimal-history process-based DO prediction framework that utilizes the generalizability of a DO mass-balance equation and easily forecastable model inputs. The approach will use a single day of observations of water quality (DO, water temperature, and salinity) and meteorological (photosynthetically active radiation (PAR) and atmospheric pressure) measurements to estimate metabolism and reaeration coefficients using BASEmetab. Using those coefficients and forecastable meteorological variables, the model will then predict DO for the next 24 h in ten-minute intervals using a mass-balance equation to recursively compute DO. This framework is designed for minimal-history, rapid DO prediction across sites, without site-specific training or large historical datasets. The model is tested in various aquaculture ponds that contain aerators and diffusers. DO concentration varies throughout ponds, especially at different depths or near aerators. Therefore, an additional key consideration is whether model performance changes when predictions are made at different depths or different proximities to the in-pond aeration system.
The objectives of this project were:
  • Using the previous day’s metabolism and reaeration coefficients, test a process-based model’s ability to predict
    • when the daily DO minimum reaches below 6 mg/L in aquaculture systems, and
    • DO concentration for 24 h periods.
  • Evaluate the effect of prediction location and depth on model performance in various ponds with different aeration techniques.
  • Analyze how the use of the previous day’s metabolism and reaeration coefficients causes inaccurate prediction results.
  • Perform additional analyses on parameter uncertainty, seasonal performance, and sensitivity to forecast-input uncertainty.

2. Materials and Methods

The long-term goal for this framework is to support farmer decision-making for supplemental aeration to combat overnight DO drops. For this project, this decision is assumed to occur at 18:00. Therefore, the 24 h prediction period runs from 18:00 on a given day to 18:00 on the following day, and the system estimates the DO model parameters using historical data from the previous 24 h (18:00 on the day before to 18:00 on the current day). This results in a prediction of the overnight DO drop that occurs due to lack of sunlight and the rise that occurs during the following daytime. In this paper, these 24 h periods (18:00 to 18:00) are referred to as “days,” and the parameters estimated in every 24 h period are referred to as “daily” because they span one day, even though they do not directly correspond to calendar days. A diagram of the overall architecture used in this paper is shown in Figure 1.
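As a minimal sketch of the 18:00-to-18:00 windowing described above, the following Python snippet derives the estimation and prediction windows from a decision time. The function name `prediction_windows` and the example date are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

def prediction_windows(decision_time):
    """Return ((est_start, est_end), (pred_start, pred_end)) around a decision time."""
    day = timedelta(hours=24)
    estimation = (decision_time - day, decision_time)   # data used to estimate coefficients
    prediction = (decision_time, decision_time + day)   # period for which DO is predicted
    return estimation, prediction

# Illustrative decision at 18:00 on an arbitrary date.
est, pred = prediction_windows(datetime(2024, 6, 1, 18, 0))
print("estimate from", est[0], "to", est[1])
print("predict from", pred[0], "to", pred[1])
```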

2.1. Computation of Dissolved Oxygen

In this project, DO is directly computed using the following regression model described in Grace et al. [12]:
$$DO_{mod,t+1} = DO_{mod,t} + A I_t^{p} - R\,\theta^{(T_t - \bar{T})} + K \cdot 1.0241^{(T_t - \bar{T})}\left(DO_{sat,t} - DO_{mod,t}\right) \tag{1}$$
Here, $DO_{mod,t}$ and $DO_{mod,t+1}$ represent the modeled DO in mg/L at the timesteps $t$ and $t+1$, respectively. $I_t$ is the incident light intensity in μmol/(m²·s), $T_t$ is the water temperature in °C, and $\bar{T}$ is the water temperature averaged across the 24 h period in °C. $A$, $p$, $R$, and $\theta$ are the daily metabolism coefficients, and $K$ is the daily reaeration coefficient.
Metabolism coefficients:
  • $A$: gross photosynthesis rate
  • $p$: light use by primary producers
  • $R$: respiration coefficient
  • $\theta$: respiration dependence on temperature
$DO_{sat,t}$ is the DO at saturation in mg/L and is described by:
$$DO_{sat} = F_{sal}\, P_a\, \gamma \tag{2}$$
where $F_{sal}$ is the salinity correction for solubility of DO at a given salinity $S_m$ (ppt) [35] and is described by:
$$F_{sal} = \exp\left(-139.34411 + S_1 + S_2 + S_3 + S_4 + S_5\right)$$
$$S_1 = 157570.1 / t_k$$
$$S_2 = -6.6423080 \times 10^{7} / t_k^{2}$$
$$S_3 = 1.2438 \times 10^{10} / t_k^{3}$$
$$S_4 = -8.621949 \times 10^{11} / t_k^{4}$$
$$S_5 = -1.0 \times S_m \left(0.017674 - \frac{10.754}{t_k} + \frac{2140.7}{t_k^{2}}\right)$$
where $t_k$ is the water temperature in Kelvin. $P_a$, as seen in Equation (2), is the atmospheric pressure in atm. $\gamma$, as seen in Equation (2), is the correction to 100% DO saturation from changes in atmospheric pressure [36]:
$$\gamma = \frac{1 - \beta / P_a}{1 - \beta} \times \frac{1 - \alpha P_a}{1 - \alpha}$$
where:
$$\beta = \exp\left(11.8571 - \frac{3840.7}{t_k} - \frac{216961}{t_k^{2}}\right)$$
$$\alpha = 0.000975 - 0.00001426\, t_k + 0.00000006436\, t_k^{2}$$
For this project, incident light intensity is assumed to be equivalent to PAR. This regression model allows for future DO to be calculated using the last available measured DO value, forecasted water temperature, atmospheric pressure, PAR, salinity, and estimated metabolism and reaeration coefficients $A$, $p$, $R$, $\theta$, and $K$.
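The saturation calculation in Equation (2) can be sketched in a few lines of Python. This is an illustrative transcription of the expressions in this section, not the BASEmetab source code; the function name `do_saturation` is hypothetical. In particular, α is evaluated with the Kelvin temperature exactly as written in the text, and whether Kelvin or °C is intended for that term is not something this sketch settles.

```python
import math

def do_saturation(temp_c, pressure_atm, salinity_ppt):
    """DO at saturation (mg/L) as F_sal * P_a * gamma, following Equation (2)."""
    tk = temp_c + 273.15  # water temperature in Kelvin
    s1 = 157570.1 / tk
    s2 = -6.6423080e7 / tk**2
    s3 = 1.2438e10 / tk**3
    s4 = -8.621949e11 / tk**4
    s5 = -1.0 * salinity_ppt * (0.017674 - 10.754 / tk + 2140.7 / tk**2)
    f_sal = math.exp(-139.34411 + s1 + s2 + s3 + s4 + s5)

    beta = math.exp(11.8571 - 3840.7 / tk - 216961.0 / tk**2)
    # Assumption: alpha uses t_k as printed in the text; check units before reuse.
    alpha = 0.000975 - 0.00001426 * tk + 0.00000006436 * tk**2
    gamma = ((1 - beta / pressure_atm) / (1 - beta)) * \
            ((1 - alpha * pressure_atm) / (1 - alpha))
    return f_sal * pressure_atm * gamma

# Fresh water at 20 degC and 1 atm; gamma equals 1 at exactly 1 atm.
print(round(do_saturation(20.0, 1.0, 0.0), 2))
```

At exactly 1 atm the pressure correction reduces to 1, so the result depends only on temperature and salinity, which makes that case a convenient sanity check.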

2.2. Metabolism and Reaeration Coefficients Estimation

BASEmetab is an R package written by Giling et al. [36]. Its main function implements the regression model described in Grace et al. [12] for DO time series. BASEmetab estimates the metabolism and reaeration coefficients $A$, $p$, $R$, $\theta$, and $K$ for a 24 h period using measured DO (mg/L), water temperature (°C), incident light intensity (μmol/(m²·s)), atmospheric pressure (atm), and salinity (ppt). BASEmetab works by drawing values from a distribution of the metabolism and reaeration coefficients and running a DO time-series trajectory for the day using Equation (1) and the measured parameters. Three chains are run 1000 times, and a cloud of coefficient values and DO trajectories for the day is collected. Bayesian inference via Markov chain Monte Carlo then accepts or rejects the metabolism and reaeration coefficient values based on the fit of the DO trajectory to the measured DO. BASEmetab then outputs the mean and standard deviation of the metabolism and reaeration parameters accepted by the model.
Successful convergence of the chains for each parameter is evaluated using the Gelman–Rubin statistic ($\hat{R}$). An $\hat{R}$ of less than 1.1 indicates successful convergence of the Monte Carlo chains. Successful overall model convergence occurs when the $\hat{R}$ value for each of the metabolism and reaeration coefficients is below 1.1. By default, BASEmetab only estimates three of the five coefficients and holds $p$ and $\theta$ constant. While a five-parameter estimation may provide more accurate values of $p$ and $\theta$, we found a much lower rate of convergence compared to the three-parameter estimation in preliminary runs. The three-parameter estimation converged much more consistently at every location and therefore was chosen as the estimation technique for this project. Finally, the analysis within this project requires greater detail than the standard output format from BASEmetab, so the program was edited to provide more detailed output files.
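For readers unfamiliar with the convergence check, the snippet below sketches the textbook form of the Gelman–Rubin statistic for equal-length chains, together with the "all coefficients below 1.1" rule described above. It is an assumption-laden illustration: the diagnostic reported by BASEmetab's underlying sampler may be computed with a refined variant, and the names `gelman_rubin` and `converged` are hypothetical.

```python
import statistics

def gelman_rubin(chains):
    """Textbook potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                       # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

def converged(r_hats, threshold=1.1):
    """Overall convergence: every coefficient's R-hat must be below the threshold."""
    return all(r < threshold for r in r_hats.values())

# Three chains that overlap well versus three chains stuck in different regions.
mixed = [[1, 2, 3, 4], [2, 3, 4, 1], [3, 4, 1, 2]]
stuck = [[0, 1, 0, 1], [10, 11, 10, 11], [20, 21, 20, 21]]
print(gelman_rubin(mixed) < 1.1, gelman_rubin(stuck) < 1.1)
```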
Figure 2 provides an example output of BASEmetab with successful convergence and a good fit of the DO trajectory to the measured DO for a day. DO trajectories calculated using the DO regression model (Equation (1)) are much smoother than the measured DO trend and do not account for every short rise and fall in DO. These trajectories tend to capture the overall trend in DO for a day and therefore may disregard noise in data collection that causes a sudden change in DO. These trajectories may also disregard sharp rises in DO due to quick wind gusts or rain showers.

2.3. Prediction Modeling

The prediction model uses metabolism and reaeration coefficients output from BASEmetab, along with forecasted water temperature, salinity, atmospheric pressure, and PAR to calculate future values of DO (Equation (1)). The output means and standard deviations from BASEmetab for day $n$ are used to drive the DO calculation for day $n+1$. Each coefficient is assumed to follow a normal distribution. Random samples of $A$, $R$, and $K$ are then taken, and along with forecasted values for water temperature, atmospheric pressure, and PAR in ten-minute intervals for day $n+1$, are used as model inputs. Equation (1) is then used to compute a predicted DO value every ten minutes for day $n+1$. The first predicted DO value for day $n+1$ uses the last measured DO value from day $n$ as the initial condition. For every subsequent timestep, the previously predicted DO is used recursively as the initial condition ($DO_{mod,t}$). This process generates a full predicted DO trajectory for day $n+1$ driven by the sampled metabolism and reaeration coefficients. A fully simulated DO trajectory is calculated using 144 timesteps, or ten-minute intervals, to complete a full day. Using a Monte Carlo approach, the process is repeated 1000 times for each day, with a new random sampling of $A$, $R$, and $K$ for each iteration. The median of all the iterations is then taken as the final predicted DO trajectory for day $n+1$.
For all calculations, the light-use exponent $p$ and the temperature-dependence coefficient $\theta$ were held constant at 1.0 and 1.07177, respectively, the same values used by BASEmetab when running the three-parameter estimation mode. Salinity was also held at a constant value for each pond. The reaeration coefficient ($K$) output by BASEmetab is expressed on a per-day basis and, after sampling, is converted to a ten-minute timestep scale to match the model’s temporal resolution by dividing by the number of timesteps in a day (144).
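The recursive Monte Carlo procedure in this section can be sketched as follows. This is a simplified illustration rather than the project's code: the function name `predict_do`, the dictionary layout of the coefficient means and standard deviations, and the fixed random seed are all hypothetical. It also assumes that $A$ and $R$ are already expressed per timestep, that the saturation series is computed beforehand, and that $\bar{T}$ is the mean of the supplied temperature series.

```python
import random
import statistics

STEPS_PER_DAY = 144  # ten-minute timesteps in a 24 h period

def predict_do(do_start, coef_mean, coef_sd, temp_c, par, do_sat,
               p=1.0, theta=1.07177, n_iter=1000, seed=0):
    """Median DO trajectory from Monte Carlo sampling of A, R, and K."""
    rng = random.Random(seed)
    t_mean = statistics.fmean(temp_c)
    runs = []
    for _ in range(n_iter):
        a = rng.gauss(coef_mean["A"], coef_sd["A"])
        r = rng.gauss(coef_mean["R"], coef_sd["R"])
        # K is per day; convert to the ten-minute timestep scale.
        k = rng.gauss(coef_mean["K"], coef_sd["K"]) / STEPS_PER_DAY
        do, path = do_start, []
        for t in range(len(temp_c)):
            dt = temp_c[t] - t_mean
            # Equation (1): each step starts from the previously predicted DO.
            do = (do + a * par[t] ** p - r * theta ** dt
                  + k * 1.0241 ** dt * (do_sat[t] - do))
            path.append(do)
        runs.append(path)
    # Median across iterations at each timestep.
    return [statistics.median(step) for step in zip(*runs)]

# Toy night-time case: no light, no respiration, reaeration only.
mean = {"A": 0.0, "R": 0.0, "K": 14.4}
sd = {"A": 0.0, "R": 0.0, "K": 0.0}
traj = predict_do(5.0, mean, sd, [20.0] * 144, [0.0] * 144, [9.0] * 144, n_iter=5)
print(round(traj[0], 3), round(traj[-1], 3))
```

Taking the median per timestep, rather than the median of whole trajectories, is one reading of "the median of all the iterations"; the source does not specify which is used.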

2.4. Model Testing

Model performance was evaluated using the same prediction framework described in the previous section, except that observed water temperature, atmospheric pressure, and PAR were used in place of forecasted data for day n + 1 . This measured-input evaluation was used as an intentional first-stage validation of the process-based framework. By using observed next-day inputs, errors in model performance can be attributed to model structure, parameter estimation, and the use of the previous day’s metabolism and reaeration coefficients. This avoids errors introduced through external forecasts. The approach provides an upper-bound estimate of model performance under idealized input conditions. This also acts as a baseline to which future forecast-driven implementations can be compared.

2.5. Prediction Locations

The model was tested across four separate ponds located in Frankfort, KY, and operated by Kentucky State University’s School of Aquaculture and Aquatic Sciences. While all four ponds were man-made for the purpose of aquaculture research, the Farm pond operates as a natural watershed pond, while Ponds 2, 4, and 7 are cemented pools that have less interaction with their local environment. The Farm pond is larger, with a surface area of approximately 8000 m², while Ponds 2, 4, and 7 are all the same size, with a surface area of approximately 400 m². Because all these ponds are utilized for intensive aquaculture production, they already include aerators of various types to support oxygen levels for improved fish health.
The two main aerators at these facilities are surface aerators, which push high volumes of water into the air, and bubble diffusers, which pump air into a diffuser grid under the surface. The surface aerators were deployed in Pond 7 and the Farm pond. They ran continuously and provided a constant source of additional oxygen. The bubble diffusers were in the Farm pond, Pond 2, and Pond 4. The diffuser in Pond 2 was powered directly from solar panels and provided aeration anytime there was sufficient solar energy to power the compressor. Therefore, its operation was highly correlated to the PAR levels that powered algae oxygen production. When dissolved oxygen was below saturation levels, it increased available oxygen, but if dissolved oxygen levels reached supersaturation, diffusion could cause oxygen removal. The bubbler in Pond 4 was also solar powered, but it contained a battery pack which stored all the energy produced during the day. When the sun set, it calculated how much energy it had stored during the day. It would begin nighttime aeration such that the batteries would be depleted around sunrise. This provided targeted aeration during the most critical low-oxygen period shortly before sunrise. With this approach, the total amount of overnight aeration would be highly correlated with total PAR from the day before.
The aeration system in the Farm pond was much more complex given its size and the different aquaculture activities occurring in the pond. It was a watershed pond that trapped runoff from surrounding farmland. At least one surface aerator operated continuously in the Farm pond. It also had a solar-powered grid diffuser with batteries to operate similarly to the one in Pond 4. This diffuser included small wind turbines to augment the solar production. The Farm pond also included in-pond raceways, some of which used grid diffusion airlifts to push water through the raceway. These airlifts also provided additional aeration to the pond as the water flowed from the raceway back into the general pond area. Fish stocking levels in these raceways varied throughout the year depending on fish growth and harvest patterns. Raceway airlifts and other raceway-specific aerators operated constantly and consistently once fish were stocked in the raceway, but they would be shut off when the raceway was not used. This provided a highly variable aeration environment in the Farm pond. While this variability is a major test for a prediction model, these conditions reflect the impacts of aquaculture production, and a good predictive model should be able to handle them.

2.6. Data Acquisition and Preprocessing

From 14 March 2024 to 6 June 2025, DO (mg/L) and water temperature (°C) were measured using PME miniDOT Loggers in multiple locations across all four ponds. Data was taken at ten-minute intervals for the duration of this period. In the Farm pond, there were four miniDOTs: one at the surface near the aerators, one submerged near the aerators, one at the surface away from the aerators, and one submerged away from the aerators. In Pond 2, there were two miniDOTs, both at the surface, with one near and one away from the diffuser. In Pond 4, there were two miniDOTs, with one placed at the surface near the diffuser. The other miniDOT in Pond 4 was located at the surface away from the diffuser until June 2024, when it was moved to a submerged position near the diffuser. Pond 7 had only one miniDOT, placed at the surface of the pond. For simplicity, reference codes are used to denote the various miniDOT placements and their corresponding data range (Table 1).
Predicting future DO relies on accurate measurements of current DO. Sensor failures that result in sudden DO shifts cause errors when estimating metabolism and reaeration coefficients. We applied filtering to correct for short-term sensor deviations. Preprocessing of DO and water temperature measurements consisted of smoothing with a smoothing factor of 0.001 using MATLAB R2024b’s Data Cleaner app [37]. This was done to reduce major spikes caused by mechanical failure, not environmental change. While this preprocessing can handle a few outliers, the model cannot run effectively when there are large spikes, multiple spikes, or longer-term shifts caused by mechanical failure. To account for this in our dataset, any day containing a single large input spike was discarded. Multiple data spikes affected metabolism and reaeration coefficient estimation and therefore had a direct effect on the next day’s prediction of DO. When a day had multiple large data spikes, both that day and the following day of results were discarded. A spike is defined as a change of more than 2 mg/L between data points taken 10 min apart. In Table 1, “Days used” refers to the number of days that were retained and used in the results of this project.
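The discard rule above can be sketched as a small filter. This is one possible reading of the rule rather than the authors' implementation: each jump of more than 2 mg/L between consecutive ten-minute points is counted as one spike, a day with at least one spike is dropped, and a day with two or more spikes also removes the following day. The function names are hypothetical.

```python
def count_spikes(do_series, threshold=2.0):
    """Count consecutive ten-minute changes larger than the threshold (mg/L)."""
    return sum(1 for a, b in zip(do_series, do_series[1:]) if abs(b - a) > threshold)

def days_to_discard(daily_series, threshold=2.0):
    """Indices of days to drop under the assumed reading of the spike rule."""
    drop = set()
    for i, series in enumerate(daily_series):
        n = count_spikes(series, threshold)
        if n >= 1:
            drop.add(i)                       # any spike removes the day itself
        if n >= 2 and i + 1 < len(daily_series):
            drop.add(i + 1)                   # multiple spikes also remove the next day
    return sorted(drop)

# Five toy "days" with a handful of readings each.
days = [
    [7.0, 7.1, 7.2],          # clean
    [7.0, 10.0, 10.5],        # one spike
    [7.0, 7.0, 7.0],          # clean
    [7.0, 10.0, 7.0, 10.0],   # several spikes
    [7.0, 7.0, 7.0],          # clean, but follows a multi-spike day
]
print(days_to_discard(days))
```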
Historical atmospheric pressure (atm) and solar radiation (μmol/(m²·s)) measurements in five-minute intervals were pulled from the Kentucky Mesonet’s online database [38]. The data was taken at the Kentucky Mesonet’s Franklin County weather station, which is around 250 m from the Farm pond and around 9 km from Ponds 2, 4, and 7. Atmospheric pressure was converted from mb to atm. Solar radiation in W/m² was converted to PAR in μmol/(m²·s).
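The two unit conversions mentioned above can be sketched as below. The pressure conversion uses the standard atmosphere (1 atm = 1013.25 mb). The source does not state the factor used to convert broadband solar radiation to PAR, so the sketch takes it as a required argument; the value of 2.0 μmol per joule in the usage line is purely illustrative and is an assumption, not the value used in this study.

```python
MB_PER_ATM = 1013.25  # one standard atmosphere expressed in millibars

def mb_to_atm(pressure_mb):
    """Convert atmospheric pressure from millibars to atmospheres."""
    return pressure_mb / MB_PER_ATM

def solar_to_par(solar_w_m2, umol_per_joule):
    """Approximate PAR (umol/(m^2 s)) from solar radiation (W/m^2).

    umol_per_joule is supplied by the caller; the study's factor is not given.
    """
    return solar_w_m2 * umol_per_joule

print(round(mb_to_atm(990.0), 4))
print(solar_to_par(500.0, 2.0))  # 2.0 is an illustrative placeholder factor
```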
Salinity was assumed to be constant across the duration of the data range. Salinity samples were taken on 21 June 2025 for the Farm pond and Pond 4. These values were used across the entire data range in the corresponding ponds, and the measured value in Pond 4 was also used for the Pond 2 and Pond 7 data.
There were instances of missing data points up to an hour in the DO, water temperature, atmospheric pressure, and PAR data. These gaps were filled in using linear interpolation. The atmospheric pressure and PAR data always began exactly at 18:00:00 for a day, while the water temperature and DO measurements had variable start times. The atmospheric pressure and PAR data were mapped to the nearest timestamp on the DO and water temperature dataset so that time of measurement between the miniDOT and weather station measurements was as close as possible. Timestamps were then rescaled into the nearest ten-minute value in a day so that there were 144 evenly spaced timesteps for a day.
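The final regridding and gap-filling step can be illustrated with a short sketch. It covers only the snapping of samples to ten-minute slots and linear interpolation across interior gaps; the nearest-timestamp matching between the weather station and miniDOT records is not shown. The function name `regrid_day` and the representation of time as minutes since 18:00 are assumptions made for the example.

```python
def regrid_day(minutes_since_start, values, step=10, n_steps=144):
    """Snap samples to the nearest ten-minute slot, then linearly fill interior gaps."""
    grid = [None] * n_steps
    for m, v in zip(minutes_since_start, values):
        slot = round(m / step)
        if 0 <= slot < n_steps:
            grid[slot] = v
    known = [i for i, v in enumerate(grid) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            grid[i] = grid[left] + frac * (grid[right] - grid[left])
    return grid  # slots before the first or after the last sample stay None

# Samples at 1, 11, and 41 minutes leave slots 2 and 3 empty before filling.
filled = regrid_day([1, 11, 41], [5.0, 6.0, 9.0], n_steps=6)
print(filled)
```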

2.7. Model Performance

The goal of the model is to predict minimum DO during periods when the DO falls below 6 mg/L, referred to as critical levels in this paper. The model’s performance at predicting the minimum DO level on days when the measured DO reached critical levels was evaluated using a designated % Safe metric. For each location, the total number of days with a minimum DO value below 6 mg/L was counted. Then, of those days, the number of days where the model predicted the minimum DO value within 1 mg/L was counted, as well as the number of days where the model predicted a DO value less than the measured DO by more than 1 mg/L. While predicting a DO value less than the measured value by more than 1 mg/L is not a goal of the model, this is still considered a safe result: although the prediction overstates how far DO will fall, triggering preemptive aeration in response has no negative effect on the fish. The % Safe metric was calculated as the number of days where the predicted minimum DO was within 1 mg/L of the minimum measured DO, or less than the minimum measured DO by more than 1 mg/L, divided by the total number of days with a minimum measured DO of less than 6 mg/L.
Key Prediction Levels:
  • Critical Levels: Minimum measured DO reached 6 mg/L or lower.
  • Safe Prediction: During Critical Levels, the predicted minimum DO was within 1 mg/L of the measured minimum DO, or below it by more than 1 mg/L.
  • Unsafe Prediction: During Critical Levels, the predicted minimum DO was greater than the measured minimum DO by more than 1 mg/L.
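The % Safe metric defined above can be written compactly, because "within 1 mg/L" and "more than 1 mg/L below" together are equivalent to the predicted minimum not exceeding the measured minimum by more than 1 mg/L. The sketch below is illustrative; the function name `percent_safe` is hypothetical, and it treats a critical day as one with a measured minimum strictly below 6 mg/L.

```python
def percent_safe(measured_min, predicted_min, critical=6.0, tolerance=1.0):
    """% Safe over days whose measured daily minimum DO is below the critical level."""
    critical_days = [(m, p) for m, p in zip(measured_min, predicted_min) if m < critical]
    if not critical_days:
        return None  # metric is undefined when no day reached critical levels
    # Safe: prediction within tolerance, or lower than measured by any amount.
    safe = sum(1 for m, p in critical_days if p <= m + tolerance)
    return 100.0 * safe / len(critical_days)

measured = [5.0, 4.0, 7.0, 5.5]
predicted = [5.5, 6.5, 3.0, 3.0]
print(percent_safe(measured, predicted))
```

In this toy example the third day is ignored because its measured minimum stays above 6 mg/L, and the second day is unsafe because the prediction is 2.5 mg/L too high.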
Evaluation of the model’s performance over the course of a full prediction day was performed using root mean square error (RMSE). RMSE measures how closely the model’s predictions match the actual values, where a smaller error indicates better model performance [39]. RMSE was calculated using the following equation:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}$$
Here, $N$ is the total number of samples, $y_i$ is the real value of sample $i$, and $\hat{y}_i$ is the predicted value of sample $i$.
For all tables in Section 3, n denotes the number of valid prediction days included in the calculation for that specific location, season, or evaluation category.

2.8. Statistical Analysis

BASEmetab’s parameter estimation uncertainty was evaluated using the coefficient of variation (CV). Because $K$, $A$, and $R$ are estimated on different scales, the CV provides a relative uncertainty for each parameter that is easier to compare. CV (%) was calculated daily for each parameter using the following equation:
$$CV = \frac{\sigma}{\mu} \times 100$$
Here, σ is the daily parameter standard deviation, and μ is the daily parameter mean.
Since the model used metabolism and reaeration coefficients from the previous day in its prediction, multiple linear regression (MLR) was used to determine whether there was a statistically significant relationship between K Mismatch, A Mismatch, and R Mismatch and the model performance metrics. The predictors K Mismatch, A Mismatch, and R Mismatch were calculated as the difference between the coefficient value estimated for day $n$ (and used in the prediction of day $n+1$) and the coefficient value estimated for day $n+1$. MLR was then performed twice for each location to test the significance of the predictors: first on days where the model was off by more than 1 mg/L in predicting the minimum DO, and second on days where RMSE was greater than 1 mg/L.
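The mismatch predictors can be computed from consecutive daily estimates as sketched below. The sign convention (day $n$ value minus day $n+1$ value) is an assumption, since the text only describes a difference, and the function name `coefficient_mismatch` and the dictionary layout are hypothetical. The regression step itself is not shown.

```python
def coefficient_mismatch(daily_estimates):
    """Mismatch per predicted day: coefficient used (day n) minus coefficient estimated (day n+1)."""
    rows = []
    for used, actual in zip(daily_estimates, daily_estimates[1:]):
        rows.append({name: used[name] - actual[name] for name in ("K", "A", "R")})
    return rows

# Illustrative coefficient means for three consecutive 18:00-to-18:00 days.
estimates = [
    {"K": 2.0, "A": 0.010, "R": 0.50},
    {"K": 1.5, "A": 0.020, "R": 0.40},
    {"K": 1.5, "A": 0.015, "R": 0.45},
]
for row in coefficient_mismatch(estimates):
    print({k: round(v, 3) for k, v in row.items()})
```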

2.9. Additional Analysis

Two additional analyses were conducted to further evaluate model performance. First, daily RMSE and % Safe results were split by meteorological season to determine whether model performance varied consistently throughout the year. Second, a perturbation test was applied to the model’s forecast inputs.
Evaluating model performance first on the observed next-day input data provides a baseline for evaluating model performance without the introduction of outside error from forecasts. The accuracy of site-specific forecast-driven inputs relies heavily on local data availability, forecast resolution, and resources available to producers. The perturbation test acts as a controlled sensitivity test that represents possible forecast-input uncertainty.
For the perturbation test, the DO prediction model was run 10 additional times beyond the nominal case. Water temperature was perturbed by ±2 °C, atmospheric pressure by ±0.025 atm, and PAR and salinity by ±10%. Each input was perturbed individually in both directions (eight runs), then all the inputs were perturbed together in the positive and negative directions (two runs). Results were then compared with the nominal case. Location 7S was chosen for the perturbation test due to its high performance in the nominal case, as seen in Section 3.
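The perturbation design can be sketched as a small scenario generator. The magnitudes come from the description above; the dictionary keys, function names, and the representation of each input as a list of values are assumptions made for illustration.

```python
def build_scenarios():
    """Return 10 perturbation scenarios: 8 single-input runs and 2 combined runs."""
    scenarios = {}
    for sign, label in ((1, "plus"), (-1, "minus")):
        single = {
            "water_temp_c": ("add", sign * 2.0),       # +/- 2 degrees C
            "pressure_atm": ("add", sign * 0.025),     # +/- 0.025 atm
            "par": ("scale", 1.0 + sign * 0.10),       # +/- 10%
            "salinity": ("scale", 1.0 + sign * 0.10),  # +/- 10%
        }
        for name, change in single.items():
            scenarios[f"{name}_{label}"] = {name: change}
        # All four inputs shifted together in the same direction
        scenarios[f"all_{label}"] = dict(single)
    return scenarios

def apply_scenario(inputs, scenario):
    """Return a perturbed copy of the input series without modifying the originals."""
    perturbed = {name: list(values) for name, values in inputs.items()}
    for name, (kind, amount) in scenario.items():
        if kind == "add":
            perturbed[name] = [v + amount for v in inputs[name]]
        else:
            perturbed[name] = [v * amount for v in inputs[name]]
    return perturbed

# Hypothetical single-time-step inputs
nominal = {"water_temp_c": [20.0], "pressure_atm": [1.0],
           "par": [1000.0], "salinity": [2.0]}
for name, scenario in build_scenarios().items():
    print(name, apply_scenario(nominal, scenario))
```

Each perturbed input set would then be passed to the same prediction routine as the nominal case so that the resulting RMSE values can be compared directly.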

3. Results

3.1. Prediction Curve Analysis

Figure 3 compares the same daily DO prediction using two different bands. The first plot shows the 95% Monte Carlo interval, while the second shows the ±1 mg/L operational tolerance used in the low DO safety analysis. The predicted DO in both plots is the median of the results from the Monte Carlo simulations using the distributions for the metabolism and reaeration parameters from the previous day output by BASEmetab. The 95% Monte Carlo interval is relatively narrow because it only propagates uncertainty from the estimated metabolism and reaeration coefficients. For this reason, many measured DO values fall outside of the Monte Carlo interval, as seen in the first plot, despite the predicted trajectory following the overall daily trend.
In contrast, in the second plot, measured DO consistently falls within ±1 mg/L of the predicted DO trajectory. This comparison supports the use of the ±1 mg/L threshold as the primary safety-based evaluation criterion. If the framework is used in the future to support decisions about additional aeration, it is most important that the model produces a sufficiently conservative estimate of the minimum daily DO during low DO events; it is less important that every ten-minute prediction falls within the Monte Carlo interval. The ±1 mg/L threshold therefore provides a better operational tolerance for low DO warnings than the uncertainty produced by the Monte Carlo simulations. Figure 3 also highlights that the model captures larger daily trends rather than short-term fluctuations or spikes.
Figure 4 shows a sequence of rolling one-day DO predictions at location 4SN. Each prediction is initialized with the measured DO at 18:00 and uses the previous day’s BASEmetab coefficients; therefore, discontinuities between daily DO predictions can occur. Figure 4 is not intended to represent a single continuous 72 h recursive DO prediction. The model predicts DO well through the first day, successfully capturing the magnitude of the overnight DO drop. During the first day, measured DO drops by around 10 mg/L, from 15 to 5 mg/L, and then rises by 15 mg/L to 20 mg/L. On the second day, measured DO drops by around 15 mg/L, from 20 to 5 mg/L, but the model predicts a drop consistent with day one’s 10 mg/L drop. This gives a predicted minimum DO of 10 mg/L on day two instead of the measured minimum of 5 mg/L. The day-two prediction follows a trend similar to the measured DO from day one because it uses the means and standard deviations of the metabolism and reaeration coefficients that BASEmetab output for day one. On day three, measured DO again drops by around 15 mg/L, from 18 to 3 mg/L. Because this trend matches that of day two, the model accurately predicts the day-three DO drop using the coefficient means and standard deviations that BASEmetab output for day two. This three-day sequence highlights that when the shape of the DO curve changes from one day to the next, the model requires a day to adjust because it relies on the previous day’s metabolism and reaeration coefficients.
BASEmetab provides means and standard deviations of K, A, and R for day n, from which the prediction model constructs normal distributions and draws samples to predict DO for day n + 1. Across the three days predicted in Figure 4, the means and standard deviations of the coefficients change abruptly (Table 2). On all three days, there is large variation between the values of K used for predicting a day and the BASEmetab output for that same day. Because only one of the three predictions is inaccurate, variation in K does not appear to be the leading cause of inaccurate prediction in this case. For R, however, there is large variation between the mean value used in the prediction and the value output by BASEmetab for day two. Since day two is the only one of the three days that is inaccurate, this suggests that R is the main driver of the inaccurate prediction. Note that Figure 3 and Table 2 only contain data relating to location 4SN. The effect on model performance of the difference between the coefficient values used in predicting DO for a day and the values output by BASEmetab for the same day is explored for all locations later in this analysis.
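The sampling step can be illustrated with a short Monte Carlo sketch. Several parts are assumptions made only for illustration: the mass balance used here is a deliberately simplified form (photosynthesis proportional to PAR, constant respiration, reaeration proportional to the saturation deficit) and is not necessarily Equation (1); the DO saturation series is assumed to be computed beforehand from temperature, pressure, and salinity; coefficient units are assumed to be per day; and all numerical values are hypothetical. Draws are not truncated, so a negative coefficient can be sampled when the standard deviation is large relative to the mean.

```python
import math
import random
import statistics

def simulate_do(do_start, par, do_sat, K, A, R, dt_days=1.0 / 144.0):
    """Explicit Euler integration of a simplified DO mass balance (ten-minute steps)."""
    do = do_start
    trajectory = []
    for light, sat in zip(par, do_sat):
        rate = A * light - R + K * (sat - do)  # assumed units: mg/L per day
        do = max(do + rate * dt_days, 0.0)     # guard: DO cannot be negative
        trajectory.append(do)
    return trajectory

def monte_carlo_forecast(do_start, par, do_sat, coeff_stats, n_draws=500, seed=0):
    """coeff_stats maps 'K', 'A', 'R' to (mean, sd) estimated for the previous day."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_draws):
        draw = {name: rng.gauss(mu, sd) for name, (mu, sd) in coeff_stats.items()}
        runs.append(simulate_do(do_start, par, do_sat,
                                draw["K"], draw["A"], draw["R"]))
    median, lower, upper = [], [], []
    for step_values in zip(*runs):
        ordered = sorted(step_values)
        median.append(statistics.median(ordered))
        # Approximate 2.5th and 97.5th percentiles by rank
        lower.append(ordered[int(0.025 * (n_draws - 1))])
        upper.append(ordered[int(0.975 * (n_draws - 1))])
    return median, lower, upper

# Hypothetical 24 h of inputs starting at 18:00: dark for 12 h, then a half-sine of light
par = [0.0] * 72 + [1500.0 * math.sin(math.pi * i / 72) for i in range(72)]
do_sat = [9.0] * 144
stats = {"K": (2.0, 0.2), "A": (0.02, 0.002), "R": (8.0, 0.8)}
median, lower, upper = monte_carlo_forecast(8.0, par, do_sat, stats)
print(min(median))
```

In this sketch the reported prediction is the per-step median of the simulated trajectories, and the band between `lower` and `upper` reflects only the spread of the sampled coefficients, which is why such an interval can be narrow relative to the measured variability.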

3.2. Critical DO Evaluation

Table 3 summarizes the results of using the model to predict the minimum DO across all testing locations on days where the measured DO reached critical levels. Model success, designated by % Safe, varied across ponds and in-pond location. The FSA location had the highest % Safe but had the fewest days where DO reached critical levels. Location 7S had the next highest % Safe while also experiencing the third most days where DO reached critical levels. Pond 4 showed the third best results with a high % Safe in two of the three locations, and Pond 2 had the worst results.
The best results occur in the Farm pond and Pond 7, both of which include constant aeration, while Ponds 2 and 4 include weather-dependent, variable aeration. The mass-balance equation used to model DO in BASEmetab and the prediction model is used mainly for natural environments such as rivers, streams, and estuaries. The reaeration coefficient in Equation (1) is designed to represent natural aeration, not variable aeration from aerators. The constant aeration in the Farm pond and Pond 7 may act as a constant baseline, much as natural aeration systems have an aeration baseline of zero. The Farm pond also has variable aeration and even more variable inputs from watershed events. However, BASEmetab appears well suited to describing pond dynamics in these more natural systems.
Ponds 2 and 4 include weather-dependent variable aeration. Pond 2 runs aeration during the day, the same time that both respiration and photosynthetic activity are occurring, while during the night only respiration is occurring. In Pond 4, the aeration runs overnight so that during the day only respiration and photosynthetic activity are present, meaning that K and A have little to no overlap during a day. Higher performance in Pond 4 than Pond 2 suggests that BASEmetab provides more accurate coefficient estimations in a variable overnight aeration system than a variable daytime aeration system, possibly due to the reduced overlap between K and A .
Comparing locations FSA to FSN, FDA to FDN, and 2SA to 2SN, model performance increases when the prediction occurs further away from the aerators. Comparing locations FSA to FDA, FSN to FDN, and 4SN2 to 4DN, model performance increases when the prediction is at the surface of the pond instead of lower in the water column. These comparisons suggest BASEmetab provides more consistent coefficient estimations further away from the aerator at the surface, leading to better prediction results. Location 7S also supports these statements, as it is away from the aerator and displays high model performance. Prediction performance was reduced further from the aerator when comparing locations 4SN1 to 4SA; however, less than three months of data was available for location 4SA.
The column “Underpredicted DO by more than 1 mg/L” can also be read as the number of days on which a false alarm occurred. On these days, the model would have flagged that extra aeration was necessary when it was not. The most false alarms occurred at the Pond 4 locations, which supports the finding that the model does not perform as well in variable aeration environments. In the Farm pond, 3 of the 14 days on which the model predicted DO to be below 6 mg/L would have been false alarms, and in Pond 7, only 5 of the 145 days would have been.
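The day-level categories discussed in this section can be sketched as a simple classifier on the daily minimum. The ±1 mg/L tolerance and the 6 mg/L level come from the text; defining a critical day by the measured minimum, the label strings, and the function names are assumptions made for illustration, and the sketch only counts days per category without asserting how % Safe is derived from those counts.

```python
from collections import Counter

CRITICAL_DO = 6.0  # mg/L, level below which a day is treated as critical (assumed on measured DO)
TOLERANCE = 1.0    # mg/L, operational tolerance on the daily minimum

def classify_day(predicted_min, measured_min):
    """Label one prediction day from its predicted and measured minimum DO (mg/L)."""
    if measured_min >= CRITICAL_DO:
        return "not critical"
    error = predicted_min - measured_min
    if error > TOLERANCE:
        return "overpredicted"   # unsafe: a low DO event would be missed
    if error < -TOLERANCE:
        return "underpredicted"  # would be flagged as a false alarm
    return "within tolerance"

def summarize(days):
    """Count days per category; days is an iterable of (predicted_min, measured_min)."""
    return Counter(classify_day(p, m) for p, m in days)

# Hypothetical daily minima (predicted, measured)
days = [(5.5, 5.0), (7.0, 5.0), (3.0, 5.0), (7.0, 6.5)]
print(summarize(days))
```

Keeping overprediction and underprediction as separate labels preserves the distinction between missed low DO events and false alarms.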
Overall, when predicting when DO reaches critical levels, the model produces the best results when predicting away from the aerator at the surface. It also has greater success when predicting in natural watershed environments like the Farm pond, or constant aeration environments like Pond 7.

3.3. Impact of Coefficient Estimate Errors

Figure 5 shows the relationship between K Mismatch, A Mismatch, and R Mismatch and minimum DO difference for location FSA. Coefficient mismatch refers to the difference between the mean coefficient value estimated for day n by BASEmetab used in estimating day n + 1 , and the mean coefficient value estimated for day n + 1 by BASEmetab. Minimum DO difference refers to the difference between predicted and measured minimum DO values for a day.
The trendlines displayed in Figure 5 suggest that all three coefficient mismatches were related to the model’s ability to predict daily minimum DO at location FSA. To test this formally, MLR was performed at every location to determine which coefficient mismatches were significant to daily minimum DO predictions at each location, as seen in Table 4.
Table 4 highlights the significance results of performing MLR with the predictors K Mismatch, A Mismatch, and R Mismatch on minimum DO difference when overprediction or underprediction by more than 1 mg/L occurred. In every location except for 4DN, R Mismatch was found to be significant in affecting the minimum DO difference. In every Farm pond location and 2SN, K Mismatch was also found to be significant in predicting minimum DO difference. A Mismatch was also found to be significant in locations FDA, FSA, and FDN.
This significance of multiple predictors in the Farm pond locations is likely due to the overall complexity of the water dynamics in this pond. This highlights that large differences between estimated day n and day n + 1 coefficient values are likely to cause overprediction or underprediction of the daily DO minimum in the Farm pond. In complex systems, the use of the previous day’s metabolism and reaeration coefficients leads to a model that is highly sensitive to the results of BASEmetab when predicting minimum daily DO.
In Ponds 2, 4, and 7, R Mismatch is either the only or the most significant predictor at all locations. This highlights that in less complex aquaculture systems, R Mismatch is the most important factor in accurately predicting daily minimum DO. This suggests that predictions in Ponds 2, 4, and 7 are sensitive to BASEmetab’s estimation of R, but can still produce accurate results with large K and A Mismatch.

3.4. RMSE Evaluation

Because a goal is to extend the model to multi-day prediction, it is important to evaluate the model over the course of the full day, as done through the RMSE values in Table 5. Predictions in the Farm pond and Pond 7 had the lowest average daily RMSE, followed by Pond 4 and Pond 2. Predictions in the Farm pond and Pond 7 were within an RMSE of 1 mg/L on roughly half or more of the days predicted, highlighting the model’s ability to follow daily trends better in natural aquaculture systems or systems with consistent aeration. The improved results in Pond 4 compared to Pond 2 indicate that the model performs better when predicting daily trends in systems with variable overnight aeration as opposed to variable daytime aeration.
Comparing locations FSA to FSN, FDA to FDN, and 2SN to 2SA, the percentage of days predicted with an average RMSE less than 1 mg/L increases when the prediction location is away from the aerator. Comparing locations FSA to FDA and FSN to FDN, the percentage of days predicted with an average RMSE less than 1 mg/L increases when the prediction location is at the surface. There is little difference in results when comparing 4SN1 to 4SA and 4SN2 to 4DN.
These trends seen in Table 5 are the same as those found in Table 3. From this it can be concluded that the model produces the best performance when predicting at the surface away from the aerator. It also performs best in natural aquaculture systems such as the Farm pond or constant aeration systems such as Pond 7. Performance is reduced in variable aeration systems such as Pond 4 and Pond 2 but is higher in variable overnight aeration systems as opposed to variable daytime aeration systems.
Table 6 displays the results of MLR on days where the RMSE was greater than 1 mg/L. In every location except for 4DN, A Mismatch was found to have the most significant effect on the value of the RMSE compared with the other predictors. This emphasizes that for correctly predicting DO across an entire 24 h period, it is important to have a correct estimation for the value of A . In all locations, larger A Mismatch is likely to result in worse model performance when predicting the daily trend for day n + 1 .

3.5. Uncertainty Evaluation

Table 7 summarizes BASEmetab convergence and coefficient uncertainty by location. All locations besides 4SN and 4DN had convergence rates of greater than 97%, while 4DN had the lowest convergence rate of 80.72%. This suggests that predictions occurring near the diffuser produce less stable parameter estimations. Overall, convergence was successful for most days in all locations.
Coefficient uncertainty varied more between locations than convergence. Location 7S produced the lowest coefficient uncertainty for all three parameters. A generally had the lowest coefficient uncertainty, with median daily CV values below 4%. R uncertainty was also relatively low, but increased in the submerged locations of FDA, FDN, and 4DN. K uncertainty had the greatest variability, and was very high, with locations FSA, FDA, and FDN having median daily CV values above 85%. This indicates that the reaeration coefficient was more difficult for BASEmetab to estimate in Farm pond locations, possibly due to the complex mixing of natural aeration, watershed inputs, and variable aeration.
Despite high K uncertainty at Farm pond locations, model performance did not decline in these locations. Location FSA had the highest K uncertainty but still produced strong low-DO warning performance. This suggests that high uncertainty in one parameter does not directly translate to poor model performance.

3.6. Seasonal Evaluation

Results were separated by season for the RMSE and % Safe analyses. Seasons with fewer than five available days of data were excluded from the RMSE analysis, and seasons with no days where the minimum measured DO fell below 6 mg/L were excluded from the % Safe analysis. The number of days used for evaluation is included in both Table 8 and Table 9, which summarize model performance by season using daily RMSE and % Safe, respectively.
In Table 8, Farm pond locations showed relatively stable seasonal performance. Locations FDA and FDN showed slightly higher RMSE in spring, while locations FSA and FSN remained near or below 1 mg/L during all seasons. Locations 4SN and 4DN showed greater variability across seasons, with low RMSE in winter and high RMSE in summer. Location 7S had the largest variability between seasons, with summer and fall both having an RMSE around 0.6 mg/L and spring and winter having an RMSE greater than 2 mg/L.
In Table 9, % Safe was relatively stable across all seasons at location 4SN. At location 7S, performance was relatively high across all seasons, with the lowest % Safe being above 78% in spring. At location 4DN, % Safe was lower in summer and fall than in spring. The Farm pond location had limited critical DO days outside of spring, making seasonal comparisons difficult. Taken together, the seasonal results in Table 8 and Table 9 do not indicate a consistent seasonal pattern in model performance across locations. Season may still influence DO dynamics, but the results of the study indicate that aeration type and prediction location have a stronger effect on model performance.
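The seasonal grouping and the minimum-day rule for the RMSE analysis can be sketched as follows. The month-to-season mapping assumes Northern Hemisphere meteorological seasons; the function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumption: Northern Hemisphere meteorological seasons
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

def seasonal_rmse(daily_rmse, min_days=5):
    """Map season -> (n days, mean daily RMSE), dropping seasons with fewer than min_days."""
    grouped = defaultdict(list)
    for day, value in daily_rmse.items():
        grouped[SEASON_BY_MONTH[day.month]].append(value)
    return {season: (len(values), mean(values))
            for season, values in grouped.items()
            if len(values) >= min_days}

# Hypothetical daily RMSE values (mg/L)
daily = {date(2025, 7, d): 1.0 for d in range(1, 6)}
daily.update({date(2025, 1, 1): 2.0, date(2025, 1, 2): 3.0})
print(seasonal_rmse(daily))
```

In this toy example, winter is dropped because it has only two days of data, and the returned count plays the role of n in the seasonal tables.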

3.7. Perturbation Test

Model sensitivity to forecast-input uncertainty was evaluated using perturbations of water temperature, PAR, atmospheric pressure, and salinity. Water temperature was perturbed by ±2 °C, atmospheric pressure by ±0.025 atm, and PAR and salinity by ±10%. Perturbations were applied individually as well as all together to simulate individual and combined-input error propagation. Location 7S was chosen for this analysis due to its high performance in the nominal case; results are summarized in Table 10.
Across all perturbation scenarios, RMSE remained within the range of 1.237 to 1.363 mg/L, indicating that model performance at location 7S was not very sensitive to the tested input perturbations. The largest increase in RMSE occurred when PAR was increased by 10%, which raised average daily RMSE from 1.279 to 1.363 mg/L. The combined positive perturbation of all inputs produced a similar RMSE of 1.362 mg/L. Decreasing PAR by 10% reduced RMSE slightly to 1.244 mg/L, and the combined negative perturbation produced the lowest RMSE at 1.237 mg/L. This suggests that the model may have been more sensitive to PAR than to the other tested inputs. This is expected, as PAR directly affects the gross primary production term in the DO mass-balance equation.
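To put these values in relative terms, the change in RMSE can be expressed as a percentage of the nominal case. The short calculation below uses only the RMSE values reported in this section; the function name and scenario labels are illustrative.

```python
def relative_change(nominal, perturbed):
    """Percentage change of a perturbed RMSE relative to the nominal RMSE."""
    return (perturbed - nominal) / nominal * 100.0

NOMINAL_RMSE = 1.279  # mg/L, nominal case at location 7S
reported = {
    "PAR +10%": 1.363,
    "all inputs, positive": 1.362,
    "PAR -10%": 1.244,
    "all inputs, negative": 1.237,
}
for name, rmse in reported.items():
    print(f"{name}: {relative_change(NOMINAL_RMSE, rmse):+.1f}%")
```

With these reported values, the largest shift is an increase of roughly 6.6% of the nominal RMSE, and the largest decrease is roughly 3.3%.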
Temperature perturbations of ±2 °C produced only small increases in RMSE, while atmospheric pressure perturbations of ±0.025 atm and salinity perturbations of ±10% had minimal effects. Overall, the perturbation test suggests that the model’s RMSE at 7S was relatively robust to realistic uncertainty in forecast-dependent inputs. However, this analysis should still be interpreted as a sensitivity analysis, and future work should evaluate the framework using real forecasted data.

4. Discussion

As seen in Table 11, the process-based DO prediction approach adopted in this paper yields higher RMSE than site-trained ML models in their respective contexts. However, the approach provides a framework that is intended to support rapid deployment and transferability. Unlike data-driven approaches that require larger sets of historical data, this model predicts DO directly from a mass-balance equation using minimal input information and only 24 h of historical data. As a result, the model captures daily DO trends, but not short-term fluctuations caused by aeration or other mixing events. The trade-off between precision and operational simplicity makes the model framework most valuable for local, small-scale, new aquaculture systems with limited resources and monitoring availability.
The use of observed next-day inputs should be seen as a controlled evaluation of the DO prediction framework. It separates errors caused by model structure and daily coefficient selection from errors caused by input forecasts. Such separation is useful because forecast accuracy will vary between locations based on forecast resolution, microclimate effects, and availability of resources. The perturbation test then provides an additional sensitivity analysis that sits between the measured-input baseline and future operational testing with forecasted data. At location 7S, model performance did not vary greatly across the perturbation scenarios. This suggests that the model is not highly sensitive to realistic deviations in the forecast-dependent inputs at consistent aeration locations. However, the perturbation test does not replace forecast validation; it only indicates that forecast uncertainty within the tested ranges may not be a dominant source of error in consistently aerated pond and sensor locations.
Further work to improve model performance should focus on a better method of determining the metabolism parameters for a day than simply choosing the previous day’s values. This approach works well in consistently aerated systems but is not an accurate approach for systems with high variation. Therefore, with the goal of making a more transferable framework across pond and aeration types, a different approach to parameter selection should be explored to yield accurate results in all aquaculture systems. Implementation of a machine learning model may provide a better approach to parameter selection and could be trained and tested on the same data used in the DO calculation. Physical solutions may also be explored, such as estimating A from measurements of photosynthetic biomass, or estimating K using aerator capacity, wind, and rainfall data.
Future work should evaluate model performance on forecasted data. Currently, the model is only tested using observed and perturbed observed data, which will not be available if the model is implemented in a field setting. Atmospheric pressure, PAR, and water temperature should be taken directly from forecasting sources or calculated from available forecasted weather data. Salinity can be measured at regular intervals and assumed constant between measurements. Atmospheric pressure, water temperature, and PAR can be forecasted using the GFS and CAMS for meteorological and atmospheric conditions, combined with the REST2 and GLM models, and then used as inputs for true testing of the DO prediction model. With forecasted data, model performance will vary substantially with the accuracy of those forecasts, so implementations should use the most accurate measurements and forecasting models available.

5. Conclusions

Overall, the process-based DO prediction model discussed in this paper produces RMSE values much higher than those of the ML models reported in the literature. However, the model does show success in predicting the minimum daily DO values in constant aeration and natural watershed aquaculture systems. Within ponds, model performance increases when the prediction location is further away from the aerator and at the surface, so implementing the model at surface locations away from the aerator will likely yield the most accurate results. When the model fails to predict the minimum DO for a day within 1 mg/L, it is often due to a difference in the respiration coefficient (R) between the value used and the actual value for the day. In systems with more complex and variable aeration, a difference in the reaeration coefficient (K) also has a significant effect on the model’s accuracy in predicting daily minimum DO. When the model fails to predict the daily DO trend within an RMSE of 1 mg/L, it is largely due to a difference in the gross photosynthesis rate (A) between the value used and the actual value for the day. Results of the study indicate that while seasons may affect DO dynamics, they do not have a consistent effect on model performance across locations. Results of the perturbation test indicate that the model is relatively robust to realistic uncertainty in forecast-dependent inputs. This DO prediction framework provides an approach to DO prediction in aquaculture systems that is intended to support rapid deployment with only 24 h of data required and to be transferable across pond types and aeration regimes. This framework may provide the basis for a future decision-support tool that helps farmers improve their aquaculture systems and keep them running.
Future work should focus on improving daily coefficient selection, validation of the framework with forecasted inputs, and testing robustness to missing data, sensor error, and abrupt management changes before operational deployment. Future work should also evaluate broader spatial heterogeneity within ponds and whether spatial gradients in water temperature and salinity affect coefficient estimation and prediction accuracy.

Author Contributions

Conceptualization, J.D., K.S. and B.F.; methodology, S.M., J.D. and K.S.; software, S.M.; validation, S.M.; formal analysis, S.M.; investigation, S.M.; resources, J.D., K.S. and B.F.; data curation, S.M. and K.S.; writing—original draft preparation, S.M. and J.D.; writing—review and editing, S.M. and J.D.; visualization, S.M.; supervision, J.D.; project administration, J.D.; funding acquisition, J.D. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for this project was provided by the REU Site: Multidisciplinary Approaches for Overcoming Water Resources and Sustainable Engineering Challenges in Appalachian Regions funded by the U.S. National Science Foundation under award number 2348814. This work is also funded by the 1890 Capacity Building Grants Program, project award no. 2022-38821-37340, from the U.S. Department of Agriculture’s National Institute of Food and Agriculture. This material is based upon work funded by the U.S. Department of Energy, Office of Science, Office of SBIR/STTR Programs, under Award Number(s) DE-SC0021762 to Hawaii Fish Company through a subaward to the University of Kentucky.

Data Availability Statement

The edited BASEmetab R script and Python 3.11 (64-bit) source code are available on GitHub (https://github.com/sjmartin1313/do-forecast-mass-balance, accessed on 19 May 2026). The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Acknowledgments

Local weather data were provided by the Kentucky Mesonet.

Conflicts of Interest

The authors declare that this study received funding from Hawaii Fish Company. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Abbreviations

The following abbreviations are used in this manuscript:
DO      Dissolved Oxygen
ML      Machine Learning
PAR     Photosynthetically Active Radiation
RMSE    Root Mean Square Error
CV      Coefficient of Variation
GFS     Global Forecast System
CAMS    Copernicus Atmosphere Monitoring Service
GLM     General Lake Model

References

  1. U.S. Department of Agriculture; National Agricultural Statistics Service. 2023 Census of Aquaculture. 2024. Available online: https://www.nass.usda.gov/Publications/AgCensus/2022/Online_Resources/Aquaculture/index.php (accessed on 25 November 2025).
  2. Fritsch, A.; Gilmore, P. Healing Appalachia: Sustainable Living Through Appropriate Technology; The University Press of Kentucky: Lexington, KY, USA, 2007.
  3. Boyd, C.E.; Romaire, R.P.; Johnston, E. Predicting Early Morning Dissolved Oxygen Concentrations in Channel Catfish Ponds. Trans. Am. Fish. Soc. 1978, 107, 484–492.
  4. Farrell, A.P.; Richards, J.G. Chapter 11 Defining Hypoxia. In Fish Physiology; Elsevier: Amsterdam, The Netherlands, 2009; Volume 27, pp. 487–503.
  5. Vaquer-Sunyer, R.; Duarte, C.M. Thresholds of hypoxia for marine biodiversity. Proc. Natl. Acad. Sci. USA 2008, 105, 15452–15457.
  6. Ali, B.; Anushka; Mishra, A. Effects of dissolved oxygen concentration on freshwater fish: A review. Int. J. Fish. Aquat. Stud. 2022, 10, 113–127.
  7. Boyd, C.E.; Torrans, E.L.; Tucker, C.S. Dissolved Oxygen and Aeration in Ictalurid Catfish Aquaculture. J. World Aquac. Soc. 2018, 49, 7–70.
  8. Cheng, W.; Liu, C.-H.; Kuo, C.-M. Effects of dissolved oxygen on hemolymph parameters of freshwater giant prawn, Macrobrachium rosenbergii (de Man). Aquaculture 2003, 220, 843–856.
  9. Sivarajaboopathy, R.P.; Krishnakumar, S. An efficient Corona ring based data collection scheme using wireless sensor networks with Internet of Things for aeration control in smart shrimp aquaculture. Ad Hoc Netw. 2026, 182, 104091.
  10. Kumar, A.; Moulick, S.; Mal, B.C. Selection of aerators for intensive aquacultural pond. Aquac. Eng. 2013, 56, 71–78.
  11. Yu, G.; Zhang, S.; Chen, X.; Li, D.; Wang, Y. Investigation on aeration efficiency and energy efficiency optimization in recirculating aquaculture coupling CFD with Euler-Euler and species transport model. J. Environ. Chem. Eng. 2024, 12, 113927.
  12. Grace, M.R.; Giling, D.P.; Hladyz, S.; Caron, V.; Thompson, R.M.; Mac Nally, R. Fast processing of diel oxygen curves: Estimating stream metabolism with BASE (BAyesian Single-station Estimation). Limnol. Oceanogr. Methods 2015, 13, 103–114.
  13. Culberson, S.D.; Piedrahita, R.H. Aquaculture pond ecosystem model: Temperature and dissolved oxygen prediction—Mechanism and application. Ecol. Model. 1996, 89, 231–258.
  14. Xu, Z.; Xu, Y. A Deterministic Model for Predicting Hourly Dissolved Oxygen Change: Development and Application to a Shallow Eutrophic Lake. Water 2016, 8, 41.
  15. Holtgrieve, G.W.; Schindler, D.E.; Branch, T.A.; A’mar, Z.T. Simultaneous quantification of aquatic ecosystem metabolism and reaeration using a Bayesian statistical model of oxygen dynamics. Limnol. Oceanogr. 2010, 55, 1047–1063.
  16. Atkinson, B.L.; Grace, M.R.; Hart, B.T.; Vanderkruk, K.E.N. Sediment instability affects the rate and location of primary production and respiration in a sand-bed stream. J. N. Am. Benthol. Soc. 2008, 27, 581–592.
  17. Holland, A.; McInerney, P.J.; Shackleton, M.E.; Rees, G.N.; Bond, N.R.; Silvester, E. Dissolved organic matter and metabolic dynamics in dryland lowland rivers. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 229, 117871.
  18. Chen, M.; Ayers, J.C. Environmental drivers of stream metabolism in a middle TN headwater stream. PLoS ONE 2024, 19, e0315978.
  19. Liu, L.; Zhao, X.; Zhou, L.; Liu, J.; Li, W.; Zhang, B.; Ling, J.; Wu, F. Comparative analysis of machine learning based dissolved oxygen predictions in the Yellow River Basin: The role of diverse environmental predictors. J. Environ. Manag. 2025, 393, 127138.
  20. Zhao, Y.; Chen, M. Prediction of river dissolved oxygen (DO) based on multi-source data and various machine learning coupling models. PLoS ONE 2025, 20, e0319256.
  21. Gachloo, M.; Liu, Q.; Song, Y.; Wang, G.; Zhang, S.; Hall, N. Using Machine Learning Models for Short-Term Prediction of Dissolved Oxygen in a Microtidal Estuary. Water 2024, 16, 1998.
  22. Yu, X.; Shen, J.; Du, J. A Machine-Learning-Based Model for Water Quality in Coastal Waters, Taking Dissolved Oxygen and Hypoxia in Chesapeake Bay as an Example. Water Resour. Res. 2020, 56, e2020WR027227.
  23. Haider, H.; Ali, W.; Haydar, S. Evaluation of various relationships of reaeration rate coefficient for modeling dissolved oxygen in a river with extreme flow variations in Pakistan. Hydrol. Process. 2013, 27, 3949–3963.
  24. Zhang, L.; Ma, C.; Chen, X.; Zhang, C.; Li, Q.; Ye, X.; Tian, L. An integrated algorithm to estimate chlorophyll-a concentration in various optical waters using HY-3A CZI. ISPRS J. Photogramm. Remote Sens. 2025, 225, 402–422.
  25. Santos, V.O.; Rocha, P.A.C.; Thé, J.V.G.; Gharabaghi, B. Evaluation of machine learning methods for forecasting turbidity in river networks using Sentinel-2 remote sensing data. Ecol. Inform. 2025, 90, 103313.
  26. Campbell, P.C.; Jiang, W.; Moon, Z.; Zinn, S.; Tang, Y. NOAA’s Global Forecast System Data in the Cloud for Community Air Quality Modeling. Atmosphere 2023, 14, 1110.
  27. European Centre for Medium-Range Weather Forecasts. Copernicus Atmosphere Monitoring Service (CAMS): Global Atmospheric Composition Forecasts; Copernicus Atmosphere Monitoring Service, European Union: Bonn, Germany, 2022; Available online: https://ads.atmosphere.copernicus.eu/datasets/cams-global-atmospheric-composition-forecasts?tab=overview (accessed on 24 November 2025).
  28. Gueymard, C.A. REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation—Validation with a benchmark dataset. Sol. Energy 2008, 82, 272–285.
  29. Hipsey, M.R.; Bruce, L.C.; Boon, C.; Busch, B.; Carey, C.C.; Hamilton, D.P.; Hanson, P.C.; Read, J.S.; de Sousa, E.; Weber, M.; et al. A General Lake Model (GLM 3.0) for linking with high-frequency sensor data from the Global Lake Ecological Observatory Network (GLEON). Geosci. Model Dev. 2019, 12, 473–523.
  30. Pan, D.; Zhang, Y.; Deng, Y.; Van Griensven Thé, J.; Yang, S.X.; Gharabaghi, B. Dissolved Oxygen Forecasting for Lake Erie’s Central Basin Using Hybrid Long Short-Term Memory and Gated Recurrent Unit Networks. Water 2024, 16, 707.
  31. Granata, F.; Zhu, S.; Di Nunno, F. Dissolved oxygen forecasting in the Mississippi River: Advanced ensemble machine learning models. Environ. Sci. Adv. 2024, 3, 1537–1551.
  32. Pant, N.; Toshniwal, D.; Gurjar, B.R. Multi-step forecasting of dissolved oxygen in River Ganga based on CEEMDAN-AdaBoost-BiLSTM-LSTM model. Sci. Rep. 2024, 14, 11199.
  33. Lokman, A.; Ismail, W.Z.W.; Aziz, N.A.A. A Review of Water Quality Forecasting and Classification Using Machine Learning Models and Statistical Analysis. Water 2025, 17, 2243.
  34. Alluhaidan, A.S.; Prabu, P.; Aziz, R.; Basheer, S. Enhanced LSTM-based AI model for accurate dissolved oxygen prediction in aquaculture systems. Smart Agric. Technol. 2025, 12, 101140.
  35. Benson, B.B.; Krause, D. The concentration and isotopic fractionation of oxygen dissolved in freshwater and seawater in equilibrium with the atmosphere. Limnol. Oceanogr. 1984, 29, 620–632.
  36. Giling, D.; Mac Nally, R.; Bond, N.; Grace, M. User Guide for Package ‘BASEmetab’; GitHub: San Francisco, CA, USA, 2018; Available online: https://github.com/dgiling/BASEmetab/blob/master/vignettes/BASEmetab.pdf (accessed on 6 June 2025).
  37. MathWorks. Statistics and Machine Learning Toolbox: Data Cleaner App, R2024b; MathWorks: Natick, MA, USA, 2024; Available online: https://www.mathworks.com/help/stats/data-cleaner.html (accessed on 12 February 2026).
  38. Kentucky Mesonet. Meteorological Data Provided Upon Request; Western Kentucky University: Bowling Green, KY, USA, 2025; Available online: https://www.kymesonet.org/ (accessed on 27 August 2025).
  39. Rainio, O.; Teuho, J.; Klén, R. Evaluation metrics and statistical tests for machine learning. Sci. Rep. 2024, 14, 6086, Correction in Sci. Rep. 2024, 14, 15724. https://doi.org/10.1038/s41598-024-66611-y.
Figure 1. Schematic of the overall architecture of the project methods.
Figure 2. Mean DO trajectory output by BASEmetab compared with measured DO for a single day.
Figure 3. Two comparisons of the DO prediction and measurements for a successful prediction of a DO drop at location 7S highlighting the difference between a 95% Monte Carlo interval and a ±1 mg/L operational tolerance.
Figure 4. A sequence of rolling one-day DO predictions at location 4SN displaying poor model performance on day two. The three legend markers (inline images in the original) indicate, in order, model performance from 4 September 2024 18:00 to 5 September 18:00, from 5 September 18:00 to 6 September 18:00, and from 6 September 18:00 to 7 September 18:00.
Figure 5. Correlation plots between metabolism and reaeration coefficient mismatch and minimum DO difference (mg/L) at location FSA. Each dot represents one 24 h prediction period, and the dashed lines show the least-squares trendline for each coefficient relationship.
Table 1. A summary of miniDOT placement, data collection periods, days of valid data in each pond, and a reference code for each location.

| Pond | MiniDOT Placement | Data Range | Days Used | Reference Code |
|---|---|---|---|---|
| Farm | Surface away from aerators | 15 March 2024–20 March 2025 | 370 | FSA |
| Farm | Surface near aerators | 15 March 2024–5 June 2025 | 371 | FSN |
| Farm | Submerged away from aerators | 15 March 2024–5 June 2025 | 307 | FDA |
| Farm | Submerged near aerators | 15 March 2024–5 June 2025 | 303 | FDN |
| 2 | Surface near diffuser | 16 March 2024–3 June 2024 | 58 | 2SN |
| 2 | Surface away from diffuser | 16 March 2024–3 June 2024 | 62 | 2SA |
| 4 | Surface near diffuser | 16 March 2024–1 June 2024 | 76 | 4SN1 |
| 4 | Surface away from diffuser | 16 March 2024–1 June 2024 | 77 | 4SA |
| 4 | Surface near diffuser | 8 June 2024–5 June 2025 | 330 | 4SN2 |
| 4 | Submerged near diffuser | 8 June 2024–5 June 2025 | 305 | 4DN |
| 7 | Surface away from aerator | 7 June 2024–21 April 2025 | 306 | 7S |
Table 2. Means and standard deviations of the metabolism and reaeration coefficients output by BASEmetab, used to define the normal distributions of these coefficients for the DO forecasts shown in Figure 3.

| Day Output by BASEmetab | Day Predicted | K Mean | K Std. Dev. | A Mean | A Std. Dev. | R Mean | R Std. Dev. |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.148 | 0.130 | 3.20 × 10⁻⁴ | 1.55 × 10⁻⁵ | 0.137 | 4.78 × 10⁻³ |
| 1 | 2 | 0.046 | 0.046 | 3.31 × 10⁻⁴ | 8.73 × 10⁻⁶ | 0.128 | 3.34 × 10⁻³ |
| 2 | 3 | 0.162 | 0.093 | 3.78 × 10⁻⁴ | 6.24 × 10⁻⁶ | 0.201 | 3.73 × 10⁻³ |
| 3 | n/a | 3.570 | 0.145 | 4.24 × 10⁻⁴ | 6.10 × 10⁻⁶ | 0.231 | 3.70 × 10⁻³ |
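The means and standard deviations in Table 2 parameterize one normal distribution per coefficient per day. The sketch below shows one way such distributions could be sampled to build a 95% Monte Carlo interval of the kind shown in Figure 3. The function names, the independent sampling of K, A, and R, the number of draws, the fixed seed, and the `toy_forecast` placeholder are all assumptions made for illustration; in particular, the placeholder is not the paper's mass-balance equation.

```python
import random
import statistics

# Day-0 coefficient statistics from Table 2: (mean, std. dev.)
DAY0_STATS = {"K": (0.148, 0.130), "A": (3.20e-4, 1.55e-5), "R": (0.137, 4.78e-3)}


def sample_coefficients(stats, rng):
    """Draw one (K, A, R) set, assuming independent normal distributions."""
    return {name: rng.gauss(mean, sd) for name, (mean, sd) in stats.items()}


def monte_carlo_interval(forecast_fn, stats, n_draws=1000, seed=0):
    """Return (lower, median, upper) trajectories for a 95% interval.

    forecast_fn is any callable mapping a coefficient dict to a list of DO
    values; it stands in for the mass-balance model, which is not shown here.
    """
    rng = random.Random(seed)
    runs = [forecast_fn(sample_coefficients(stats, rng)) for _ in range(n_draws)]
    lower, median, upper = [], [], []
    for values in zip(*runs):  # one tuple of draws per time step
        cuts = statistics.quantiles(values, n=40)  # 39 cut points, 2.5% apart
        lower.append(cuts[0])    # 2.5th percentile
        median.append(cuts[19])  # 50th percentile
        upper.append(cuts[38])   # 97.5th percentile
    return lower, median, upper


def toy_forecast(coeffs, do_start=7.0, do_sat=8.5, par=500.0, steps=6):
    """Hypothetical placeholder model, NOT the paper's equation."""
    do, trajectory = do_start, []
    for _ in range(steps):
        do += coeffs["A"] * par - coeffs["R"] + coeffs["K"] * (do_sat - do)
        trajectory.append(do)
    return trajectory


lower, median, upper = monte_carlo_interval(toy_forecast, DAY0_STATS)
```

Passing the model in as a callable keeps the sampling step separate from the forecast itself, so the same routine could wrap whichever mass-balance implementation is actually in use.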
Table 3. Model performance on days with minimum measured DO less than 6 mg/L.

| Location | Total Days with DO Less Than 6 mg/L | Minimum DO Prediction Within 1 mg/L | Underpredicted DO by More Than 1 mg/L * | % Safe ** |
|---|---|---|---|---|
| FSA | 14 | 11 | 3 | 100.00 |
| FSN | 40 | 26 | 2 | 70.00 |
| FDA | 43 | 28 | 4 | 74.42 |
| FDN | 46 | 26 | 3 | 63.04 |
| 2SN | 46 | 14 | 2 | 34.78 |
| 2SA | 49 | 32 | 1 | 67.35 |
| 4SN1 | 46 | 35 | 2 | 80.43 |
| 4SA | 49 | 32 | 3 | 71.43 |
| 4SN2 | 220 | 138 | 20 | 71.82 |
| 4DN | 233 | 117 | 10 | 54.51 |
| 7S | 158 | 140 | 5 | 91.77 |
Notes: * Days on which the predicted daily minimum DO was more than 1 mg/L below the measured minimum. ** Percentage of days with a measured minimum DO below 6 mg/L on which the predicted minimum DO was either within 1 mg/L of the measured minimum or lower than it.
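The safety-based metric defined in the notes to Table 3 can be sketched as follows. The function names and the treatment of a difference of exactly 1 mg/L as "within" are assumptions; the 6 mg/L threshold and the 1 mg/L tolerance come from the table.

```python
def classify_day(pred_min, meas_min, tol=1.0):
    """Label one day's minimum-DO prediction relative to the measured minimum."""
    if abs(pred_min - meas_min) <= tol:
        return "within"
    if pred_min < meas_min - tol:
        return "underpredicted"  # conservative miss: predicted DO was too low
    return "overpredicted"       # unsafe miss: predicted DO was too high


def percent_safe(days, threshold=6.0, tol=1.0):
    """Percent of low-DO days predicted within tol or underpredicted.

    days is an iterable of (predicted_min, measured_min) pairs in mg/L.
    Returns None when no day has a measured minimum below the threshold.
    """
    low_days = [(p, m) for p, m in days if m < threshold]
    if not low_days:
        return None
    safe = sum(classify_day(p, m, tol) != "overpredicted" for p, m in low_days)
    return 100.0 * safe / len(low_days)


# Check against the FSN row of Table 3: 40 low-DO days, 26 within, 2 underpredicted
print(round(100.0 * (26 + 2) / 40, 2))  # 70.0
```

Counting underpredictions as safe reflects the asymmetry of the metric: a forecast that is too low still warns the operator, whereas a forecast that is too high misses the event.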
Table 4. Results of multiple linear regressions on days with greater than 1 mg/L difference between predicted and measured minimum DO. Values are predictor p-values.

| Location | K Mismatch | A Mismatch | R Mismatch |
|---|---|---|---|
| FSA | 5.317 × 10⁻⁵ | 0.0332 | 6.415 × 10⁻¹² |
| FSN | 1.484 × 10⁻⁴ | 0.5303 | 1.096 × 10⁻⁷ |
| FDA | 2.904 × 10⁻⁵ | 6.898 × 10⁻⁶ | 1.645 × 10⁻⁴ |
| FDN | 4.144 × 10⁻⁴ | 0.0060 | 0.0011 |
| 2SN | 9.527 × 10⁻⁵ | 0.5614 | 8.037 × 10⁻⁸ |
| 2SA | 0.1212 | 0.9312 | 3.179 × 10⁻⁵ |
| 4SN | 0.1845 | 0.5323 | 0.0232 |
| 4SA | 0.5811 | 0.4673 | 1.953 × 10⁻⁴ |
| 4DN | 0.1290 | 0.1445 | 0.1231 |
| 7S | 0.1634 | 0.8787 | 0.0180 |
Table 5. Model performance by RMSE.

| Location | Average Daily RMSE (mg/L), Average ± Std. Dev. | % of Days with RMSE Less Than 1 mg/L |
|---|---|---|
| FSA | 0.758 ± 0.561 | 80.27 |
| FSN | 0.921 ± 0.624 | 56.33 |
| FDA | 1.225 ± 1.058 | 56.03 |
| FDN | 1.014 ± 0.819 | 49.50 |
| 2SN | 2.744 ± 2.099 | 18.97 |
| 2SA | 1.809 ± 1.421 | 41.19 |
| 4SN1 | 1.648 ± 1.357 | 31.58 |
| 4SA | 1.699 ± 1.377 | 29.87 |
| 4SN2 | 1.353 ± 1.114 | 46.97 |
| 4DN | 1.355 ± 1.050 | 44.59 |
| 7S | 1.279 ± 1.773 | 63.73 |
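The three quantities reported in Table 5 can be computed from per-day prediction and measurement series as in the minimal sketch below. The function names and the sample values are illustrative, and the use of the sample (n − 1) standard deviation is an assumption, since the table does not say which form was used.

```python
import math
import statistics


def rmse(predicted, measured):
    """Root mean square error between two equal-length DO series (mg/L)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def summarize_daily_rmse(daily_rmse, limit=1.0):
    """Return (mean, sample std. dev., % of days with RMSE below limit)."""
    below = sum(value < limit for value in daily_rmse)
    return (
        statistics.mean(daily_rmse),
        statistics.stdev(daily_rmse),  # assumed sample form; needs >= 2 days
        100.0 * below / len(daily_rmse),
    )


# Two short made-up days; a real day at ten-minute resolution has 144 points
day_a = rmse([7.0, 6.5, 6.0], [7.2, 6.4, 6.1])
day_b = rmse([5.0, 4.0, 3.5], [6.5, 6.0, 5.5])
mean_rmse, sd_rmse, pct_below = summarize_daily_rmse([day_a, day_b])
```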
Table 6. Results of multiple linear regressions on days with RMSE greater than 1 mg/L. Values are predictor p-values.

| Location | K Mismatch | A Mismatch | R Mismatch |
|---|---|---|---|
| FSA | 0.1433 | 8.2849 × 10⁻⁴ | 0.1642 |
| FSN | 0.1246 | 0.0279 | 0.7956 |
| FDA | 0.6505 | 6.3400 × 10⁻⁶ | 0.4079 |
| FDN | 0.2821 | 5.9875 × 10⁻¹³ | 0.0178 |
| 2SN | 0.0039 | 1.2816 × 10⁻⁷ | 8.2046 × 10⁻⁵ |
| 2SA | 0.6738 | 3.4777 × 10⁻⁴ | 0.3776 |
| 4SN | 0.0762 | 3.5293 × 10⁻⁵ | 0.0088 |
| 4SA | 0.5541 | 3.0881 × 10⁻⁶ | 0.2221 |
| 4DN | 0.9082 | 0.5395 | 0.3040 |
| 7S | 0.1277 | 9.1012 × 10⁻⁷ | 0.0234 |
Table 7. BASEmetab convergence and coefficient uncertainty by location.

| Location | Convergence (%) | K CV (%) | A CV (%) | R CV (%) |
|---|---|---|---|---|
| FSA | 98.88 | 94.38 | 2.92 | 3.94 |
| FSN | 98.66 | 12.22 | 2.74 | 8.08 |
| FDA | 98.66 | 96.01 | 9.81 | 9.14 |
| FDN | 98.44 | 85.02 | 8.37 | 11.35 |
| 2SN | 97.30 | 11.94 | 3.03 | 4.83 |
| 2SA | 100.00 | 7.29 | 1.90 | 2.33 |
| 4SN | 92.17 | 11.06 | 3.13 | 4.60 |
| 4SA | 98.75 | 7.86 | 1.89 | 2.13 |
| 4DN | 80.72 | 33.86 | 16.35 | 19.29 |
| 7S | 97.73 | 4.94 | 1.94 | 2.50 |
Note: Coefficient uncertainty is reported as the median daily coefficient of variation across valid estimation days.
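The uncertainty measure described in the note, a median of daily coefficients of variation, could be computed as in the sketch below. The function names are assumptions, and the four K rows of Table 2 are used only as a small worked illustration, not to reproduce any value in Table 7.

```python
import statistics


def coefficient_of_variation(mean, std_dev):
    """CV in percent; undefined when the mean is zero."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * std_dev / abs(mean)


def median_daily_cv(daily_stats):
    """Median of daily CVs from (mean, std. dev.) pairs, one per valid day."""
    return statistics.median(
        coefficient_of_variation(mean, sd) for mean, sd in daily_stats
    )


# K rows of Table 2 as (mean, std. dev.), a four-day illustration only
k_days = [(0.148, 0.130), (0.046, 0.046), (0.162, 0.093), (3.570, 0.145)]
k_cv = median_daily_cv(k_days)
```

Using the median rather than the mean keeps a few days with near-zero coefficient means, and therefore very large CVs, from dominating the summary.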
Table 8. Average ± std. dev. of daily RMSE (mg/L) by season; n is the number of days.

| Location | Spring | Summer | Fall | Winter |
|---|---|---|---|---|
| FSA | 0.739 ± 0.616, n = 97 | 0.703 ± 0.325, n = 92 | 0.654 ± 0.461, n = 91 | 0.941 ± 0.726, n = 90 |
| FSN | 1.092 ± 0.787, n = 169 | 0.948 ± 0.518, n = 66 | 0.966 ± 0.344, n = 46 | 1.074 ± 0.780, n = 90 |
| FDA | 1.374 ± 1.079, n = 129 | 1.195 ± 0.558, n = 36 | 0.807 ± 0.523, n = 52 | 1.265 ± 1.328, n = 90 |
| FDN | 1.387 ± 1.123, n = 163 | 1.323 ± 0.399, n = 27 | 1.034 ± 0.634, n = 24 | 0.907 ± 0.688, n = 89 |
| 2SN | 1.812 ± 1.456, n = 59 | - | - | - |
| 2SA | 2.717 ± 2.107, n = 57 | - | - | - |
| 4SN | 1.465 ± 1.242, n = 163 | 2.048 ± 0.915, n = 82 | 1.592 ± 1.312, n = 72 | 0.590 ± 0.390, n = 89 |
| 4SA | 1.699 ± 1.377, n = 77 | - | - | - |
| 4DN | 1.021 ± 1.316, n = 73 | 2.068 ± 0.821, n = 84 | 1.326 ± 0.839, n = 89 | 0.797 ± 0.636, n = 59 |
| 7S | 2.119 ± 2.044, n = 51 | 0.612 ± 0.355, n = 85 | 0.581 ± 0.473, n = 80 | 2.055 ± 2.497, n = 90 |
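A seasonal breakdown like the one in Table 8 amounts to grouping daily RMSE values by season before averaging. The sketch below assumes Northern Hemisphere meteorological seasons (March to May as spring, and so on); the season definition is not stated in this section, so that mapping, the function name, and the sample values are all assumptions.

```python
from collections import defaultdict
from datetime import date
import statistics

# Assumed meteorological seasons; the paper's definition is not given here
SEASON_BY_MONTH = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
    12: "Winter", 1: "Winter", 2: "Winter",
}


def seasonal_summary(daily_rmse):
    """Group daily RMSE values by season.

    daily_rmse maps a datetime.date to that day's RMSE in mg/L. Returns a dict
    of season -> (mean, sample std. dev., n); std. dev. is None when n < 2.
    """
    groups = defaultdict(list)
    for day, value in daily_rmse.items():
        groups[SEASON_BY_MONTH[day.month]].append(value)
    summary = {}
    for season, values in groups.items():
        spread = statistics.stdev(values) if len(values) > 1 else None
        summary[season] = (statistics.mean(values), spread, len(values))
    return summary


# Made-up values for three days
example = {
    date(2024, 7, 1): 0.6,
    date(2024, 7, 2): 0.8,
    date(2025, 1, 15): 2.0,
}
summary = seasonal_summary(example)
```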
Table 9. % Safe by season.

| Location | Spring | Summer | Fall | Winter |
|---|---|---|---|---|
| FSA | 100.00, n = 6 | 100.00, n = 6 | 100.00, n = 2 | - |
| FSN | 75.00, n = 28 | 83.33, n = 6 | 33.33, n = 6 | - |
| FDA | 71.87, n = 32 | 87.50, n = 8 | 66.67, n = 3 | - |
| FDN | 62.50, n = 40 | 66.67, n = 6 | - | - |
| 2SN | 67.39, n = 46 | - | - | - |
| 2SA | 35.56, n = 45 | - | - | - |
| 4SN | 73.04, n = 115 | 71.83, n = 71 | 76.67, n = 60 | 70.00, n = 20 |
| 4SA | 71.43, n = 49 | - | - | - |
| 4DN | 74.54, n = 55 | 40.54, n = 74 | 53.09, n = 81 | 56.52, n = 20 |
| 7S | 78.57, n = 28 | 97.53, n = 81 | 95.45, n = 22 | 85.18, n = 27 |
Table 10. Model sensitivity to forecast-input uncertainty at location 7S.

| Perturbation Scenario | Daily RMSE (mg/L), Average ± Std. Dev. |
|---|---|
| Nominal case | 1.279 ± 1.773 |
| Temperature + 2 °C | 1.329 ± 1.755 |
| Temperature − 2 °C | 1.320 ± 1.793 |
| PAR + 10% | 1.363 ± 1.872 |
| PAR − 10% | 1.244 ± 1.665 |
| Pressure + 0.025 atm | 1.298 ± 1.785 |
| Pressure − 0.025 atm | 1.306 ± 1.763 |
| Salinity + 10% | 1.287 ± 1.777 |
| Salinity − 10% | 1.287 ± 1.777 |
| All inputs + | 1.362 ± 1.866 |
| All inputs − | 1.237 ± 1.673 |
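The perturbation scenarios in Table 10 can be generated by shifting each forecast input by its listed magnitude before rerunning the model. The sketch below covers only that input-perturbation step. The `ForecastInputs` container, its field names, and the `perturb` function are hypothetical; the magnitudes (2 °C, 10% PAR, 0.025 atm, 10% salinity) are those listed in the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ForecastInputs:
    """One time step of forecast inputs (hypothetical container)."""
    temperature_c: float  # water temperature, degrees C
    par: float            # photosynthetically active radiation
    pressure_atm: float   # atmospheric pressure, atm
    salinity: float       # salinity, in the units measured


def perturb(inputs, sign, which="all"):
    """Shift the chosen input(s) by the Table 10 magnitudes; sign is +1 or -1."""
    changes = {
        "temperature": {"temperature_c": inputs.temperature_c + sign * 2.0},
        "par": {"par": inputs.par * (1 + sign * 0.10)},
        "pressure": {"pressure_atm": inputs.pressure_atm + sign * 0.025},
        "salinity": {"salinity": inputs.salinity * (1 + sign * 0.10)},
    }
    selected = changes if which == "all" else {which: changes[which]}
    merged = {}
    for update in selected.values():
        merged.update(update)
    return replace(inputs, **merged)  # returns a new, modified copy


base = ForecastInputs(temperature_c=20.0, par=500.0, pressure_atm=1.0, salinity=0.2)
warm = perturb(base, +1, "temperature")  # "Temperature + 2 °C" scenario
all_minus = perturb(base, -1)            # "All inputs −" scenario
```

In practice each scenario would be applied to every time step of the next day's input series, and the daily RMSE recomputed from the resulting forecast.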
Table 11. Comparison of ML models against this paper’s model.

| Model | Time Predicted | Required Dataset Length | RMSE (mg/L) |
|---|---|---|---|
| Pan et al. [30] | 12 h | 4 months | 0.5092 |
| Granata et al. [31] | 24 h | 8 years | 0.13 |
| Pant et al. [32] | 3 h | 4.5 years | 0.145 |
| This paper (7S Location) | 24 h | 1 day | 1.279 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Martin, S.; Dvorak, J.; Semmens, K.; Ford, B. Short-Term Dissolved Oxygen Forecasting in Aquaculture Systems Using a Process-Based Mass-Balance Model. Water 2026, 18, 1618. https://doi.org/10.3390/w18131618

