A New Approach for Estimating Dissolved Oxygen Based on a High-Accuracy Surface Modeling Method

Dissolved oxygen (DO) is a direct indicator of water pollution and an important water quality parameter that affects aquatic life. Based on the fundamental theorem of surfaces in differential geometry, the present study proposes a new modeling approach to estimate DO concentrations with high accuracy by assessing the spatial correlation and heterogeneity of DO with respect to explanatory variables. Specifically, a regularization penalty term is integrated into the high-accuracy surface modeling (HASM) method by applying geographically weighted regression (GWR) with selected covariates. The modified version of HASM, namely HASM_MOD, is evaluated in a case study of Poyang Lake, China, against HASM, a support vector machine (SVM), and cokriging. The results indicate that HASM_MOD yields the best performance, with a cross-validated mean absolute error (MAE) that is 38%, 45%, and 42% lower than those of HASM, the SVM, and cokriging, respectively. The introduction of a regularization penalty term by using GWR with respect to covariates can effectively improve the quality of the DO estimates. The results also suggest that HASM_MOD can effectively estimate nonlinear and nonstationary time series and outperforms the three other methods under cross-validation, with a root-mean-square error (RMSE) of 0.20 mg/L and an R² of 0.93 for the two study sites (Sanshan and Outlet_A stations). The proposed method, HASM_MOD, provides a new way to estimate the DO concentration with high accuracy.


Introduction
As a health indicator for water bodies, the dissolved oxygen (DO) concentration plays important roles in maintaining microbial diversity and various ecosystem and biogeochemical processes in lake ecosystems [1][2][3]. Sufficient levels of DO in water are essential for the survival of various aquatic organisms, such as algae, zooplankton, and aquatic plants. Generally, the DO concentration in a healthy water body ranges from 8 to 12 mg/L, and concentrations below 8 mg/L can adversely affect the survival of aquatic species [4,5]. Studies have shown that global climate change and extensive human activities can rapidly reduce DO concentrations, leading to increased hypoxia, especially in coastal and estuarine environments [6,7]. According to the National Oceanic and Atmospheric Administration (NOAA), any persistent DO levels below 5.0 mg/L are considered unhealthy, and levels below 2 mg/L are extremely hazardous to marine ecosystems [6]. Reliable estimates of DO concentrations enable us to identify future contaminant problems and provide a basis for taking effective countermeasures to prevent water pollution.
Although in situ observations with sensors provide highly accurate measurements of DO concentrations, this approach is labor-intensive and only gives point measurements. Studies have shown that no single identified sensor can be applied with high confidence to optimally measure DO concentrations [8]. An improved understanding of water quality can be obtained by integrating remote sensing technology, site observations, and numerical models. Recently, various approaches have been proposed for the estimation of DO concentrations. The main prediction methods include deterministic hydrological models, statistical models, and machine learning methods [9][10][11][12][13]. Hydrological models can provide physically based DO concentration estimates but require large computational capacity and carry high uncertainty associated with the determination of parameters and relevant physical processes [14]. Statistical methods offer a simple way to generate spatial DO concentration fields. Harvey et al. [15] established a regression model to predict the monthly water temperature and DO level. Stajkowski used an autoregressive integrated moving average (ARIMA) model to estimate water quality parameters and demonstrated its capability in DO concentration prediction [16]. Among statistical methods, the geostatistical method known as kriging has been widely applied, and cokriging is an extension of kriging used when estimating one variable from other variables [17,18]. The literature shows that cokriging has been successfully used for the prediction and estimation of groundwater quality parameters [19]. However, statistical methods generally depend on datasets with linear relationships and fail to describe nonlinear characteristics. Machine learning methods can be applied to address nonlinear problems.
Support vector machines (SVMs) and artificial neural networks (ANNs) are typical models that are commonly adopted by researchers in the prediction of water quality and have displayed good performance in modeling DO concentrations [20][21][22][23][24]. ANNs are well suited to nonlinear simulation because of their self-adaptability, self-organization, and error tolerance, but they are limited by complex structures that require a great amount of training data. SVMs are machine learning methods based on statistical learning theory and derived from structural risk minimization, which can enhance the generalization ability and minimize the upper limit of the generalization error [25]. However, as indicated in other studies [26,27], machine learning methods are usually applied to establish global numeric relationships among datasets and ignore geographical relations; notably, environmental parameters are often spatially correlated, and the relationships among variables varying in space are important.
In recent years, a partial differential equation (PDE)-based approach, the high-accuracy surface modeling (HASM) method, was proposed based on the fundamental theorem of surfaces in differential geometry by using in situ measurements [28]. This method is effective in simulating elevations, climate variables, soil properties, ecological diversity, and other ecological variables. Researchers have found that HASM generally performs better than other classic interpolation methods, such as kriging, inverse distance weighting (IDW), and the spline method [29][30][31][32][33][34]. The good performance of HASM is due to its complete theoretical basis and the constraint conditions established using sampling information. One of the advantages of HASM is its extensibility, which allows it to simulate different environmental variables based on the characteristics of the variables and the corresponding a priori knowledge. The performance of HASM in estimating the DO concentration, however, has not been investigated until now. This approach is expected to provide an effective way to improve the accuracy of the spatial estimation of DO levels.
The objective of this study is to propose a new approach for estimating the spatial distribution of DO concentrations in Poyang Lake, China. The new method, termed HASM_MOD, was developed based on HASM and the characteristics of DO by modifying the main HASM equation. The performance of HASM_MOD was investigated by comparing it with HASM, cokriging, and SVM using a cross-validation method. This is the first study in the literature that attempts to apply HASM and HASM_MOD to simulate DO concentrations. The rest of this study is organized as follows: In Section 2, the materials are explained, including the details of the study area and datasets used in the analysis. The methods are presented in Section 3, and the results are shown in Section 4. Discussions and conclusions are given in Sections 5 and 6, respectively.

Study Area and Data
Poyang Lake, the largest freshwater lake in China, is located downstream of the Yangtze River in Jiangxi Province and freely connected to the longest river in China, the Yangtze River (Figure 1). The lake has a southeast monsoonal climate with an annual mean air temperature of 16-18 °C and an average annual precipitation of 1340-1780 mm. The elevation of the Poyang Lake area gradually increases from north to south and west to east. Poyang Lake is considered a vital resource not only for the local population but also for the Yangtze valley and global ecology. The lake routinely fluctuates in volume between the winter and summer seasons. The surface area of Poyang Lake changes greatly with season, covering more than 4000 km² in the wet season (June, July, August, and September) and a much smaller area in the comparatively dry season (lasting from October to March in the following year) [35]. The lake water level is mainly regulated by the Yangtze River and the "five rivers", namely the Gan River, Fu River, Xin River, Rao River, and Xiu River, with a range of water levels spanning more than 10 m. The water level rises first due to the inflows from the five rivers from April to June and then due to the backflow of the Yangtze River from July to September; the water levels then gradually fall after October, with the decreasing trend lasting for approximately half a year [36]. Influenced by the topographic characteristics of Poyang Lake and the variations in the runoff of the five rivers, the surface area and water level of Poyang Lake vary greatly with the season. In addition, it has been reported that polluted river sections are mainly located in the Gan River, Fu River, Rao River, and Xiu River. This study performed an analysis based on the observations obtained at stations located in Poyang Lake and the surrounding inflowing rivers.
The data used in this work consist of daily measurements of DO (mg/L), pH, water temperature (TE, °C), and electrical conductivity (EC, µS/cm) from 30 monitoring stations equipped with different sensors. Fourteen stations were located in the five main inflowing rivers, and the corresponding data were included in the calculation process to improve the estimations. DO concentrations were measured using a Hatch luminescent DO sensor and were transferred via SODA (SODA open data autonomy) telemetry to a central database. The water quality parameters, which were collected by monitoring centers, were quality controlled and then sent to the server through GPRS.
Previous studies predicted DO levels using several environmental and meteorological variables as inputs [16,24,37,38]. However, the conclusions were inconsistent due to variations in the study area and spatiotemporal scales. For example, Heddam [39] found that high accuracy can be obtained for DO estimations by using only water temperature (WT) as the input variable. Researchers have also used other variables, such as pH, chlorophyll-a (Chl-a), and humidity. Rankovic et al. [20] indicated that WT and pH have the greatest effects on the DO concentration. Najah et al. [40] demonstrated that water pH, as an input variable, had a very limited influence on the performance of DO prediction models in Johor, Malaysia. Li et al. [41] showed that changes in DO are mainly affected by meteorological variables, such as atmospheric temperature (AT), atmospheric pressure (AP), and precipitation (Pre). Researchers also found that among the considered variables, WT has the highest correlation with DO [8]. In this study, the correlations among environmental factors and DO were estimated by using Spearman's rank method. The candidate factors included WT, pH, Chl-a, EC, AT, AP, and Pre. The Chl-a concentration was derived from Landsat-8 OLI data using the formula given in [42]. The surface water temperature can be retrieved with a satellite-based remote sensor that detects thermal radiation (3-5 and 8-14 µm wavebands) emitted from the upper 0.1 mm of the water surface [43,44]. Since the water volume of Poyang Lake fluctuates seasonally, thermal remote sensing of WTs can be applied to provide the boundary of the water body in different months. The satellite-derived lake water surface temperatures used in this study were provided by the European Space Agency (ESA) Lakes Climate Change Initiative (Lakes-cci) project [45]. Figure 2 displays the water body boundary in the summer months (taking July and August as two examples).

Figure 2.
The Poyang Lake boundary obtained using satellite-derived lake water surface temperatures in July and August.
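The Spearman rank screening of candidate factors described above can be illustrated with a minimal sketch. The water temperature and DO values below are hypothetical and are not taken from the Poyang Lake dataset; the helper names are illustrative, and the significance test applied in the study is not reproduced here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical station values: water temperature (deg C) and DO (mg/L).
wt = [24.1, 25.3, 26.0, 27.2, 28.4]
do = [8.9, 8.4, 8.6, 7.9, 7.5]
print("Spearman rho (WT vs. DO):", spearman_rho(wt, do))
```

A negative rho for these toy values mirrors the sign of the WT and DO relationship reported in the Results, but the magnitude has no bearing on the study's figures.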

HASM
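According to the theory of the differential geometry of surfaces, a surface can be uniquely determined by its first and second fundamental coefficients [46,47]. The first fundamental coefficients of a surface, $z = f(x, y)$, are expressed as

$$E = 1 + f_x^2, \quad F = f_x f_y, \quad G = 1 + f_y^2,$$

and they reflect geodesic curvature, curve length, and other intrinsic geometric information.
The second fundamental coefficients characterize the local structures of the surface and are expressed as [48,49]

$$L = \frac{f_{xx}}{\sqrt{1 + f_x^2 + f_y^2}}, \quad M = \frac{f_{xy}}{\sqrt{1 + f_x^2 + f_y^2}}, \quad \text{and} \quad N = \frac{f_{yy}}{\sqrt{1 + f_x^2 + f_y^2}}.$$

Based on the theorem of surfaces, a surface can be obtained by solving the Gaussian equation set based on the condition that the first and second fundamental coefficients satisfy this equation set [47]. The assumption of HASM is that the spatial distribution of the predictor is deemed a surface that can be obtained by solving Gaussian equations. Therefore, the main equations of HASM are the following Gaussian equations:

$$
\begin{cases}
f_{xx} = \Gamma_{11}^{1} f_x + \Gamma_{11}^{2} f_y + \dfrac{L}{\sqrt{E + G - 1}} \\
f_{yy} = \Gamma_{22}^{1} f_x + \Gamma_{22}^{2} f_y + \dfrac{N}{\sqrt{E + G - 1}} \\
f_{xy} = \Gamma_{12}^{1} f_x + \Gamma_{12}^{2} f_y + \dfrac{M}{\sqrt{E + G - 1}}
\end{cases}
\tag{1}
$$

where $f_x$, $f_y$, $f_{xx}$, and $f_{yy}$ are the first and second partial derivatives of the graph $z = f(x, y)$ with respect to the x and y directions, respectively. Let $\{(x_i, y_j) \mid 0 \le i \le I + 1,\ 0 \le j \le J + 1\}$ be the calculation grids and $h$ be the grid spacing. At interior grid points, the finite discrete schemes of $f_x$, $f_{xx}$, $f_y$, and $f_{yy}$ can be given as

$$
(f_x)_{i,j} \approx \frac{f_{i+1,j} - f_{i-1,j}}{2h}, \quad
(f_{xx})_{i,j} \approx \frac{f_{i+1,j} - 2f_{i,j} + f_{i-1,j}}{h^2}, \quad
(f_y)_{i,j} \approx \frac{f_{i,j+1} - f_{i,j-1}}{2h}, \quad
(f_{yy})_{i,j} \approx \frac{f_{i,j+1} - 2f_{i,j} + f_{i,j-1}}{h^2},
$$

and, similarly, $(f_{xy})_{i,j} \approx (f_{i+1,j+1} - f_{i+1,j-1} - f_{i-1,j+1} + f_{i-1,j-1})/(4h^2)$. The first partial derivatives $f_x$ and $f_y$ represent the variations in the predictor in the x and y directions, respectively, and the second partial derivatives $f_{xx}$ and $f_{yy}$ denote the slope and direction of the variations in the predictor in the x and y directions, respectively. $f_{xy}$ is the mixed partial derivative and represents the cross-slope of the change in the x and y directions.
The Christoffel symbols ($\Gamma_{11}^{1}$, $\Gamma_{11}^{2}$, $\Gamma_{12}^{1}$, $\Gamma_{12}^{2}$, $\Gamma_{22}^{1}$, $\Gamma_{22}^{2}$) are shorthand notations for various functions associated with second derivatives [49] and here depend only on the first fundamental coefficients and their derivatives:

$$
\Gamma_{11}^{1} = \frac{G E_x - 2F F_x + F E_y}{2(EG - F^2)}, \quad
\Gamma_{11}^{2} = \frac{2E F_x - E E_y - F E_x}{2(EG - F^2)}, \quad
\Gamma_{12}^{1} = \frac{G E_y - F G_x}{2(EG - F^2)},
$$

$$
\Gamma_{12}^{2} = \frac{E G_x - F E_y}{2(EG - F^2)}, \quad
\Gamma_{22}^{1} = \frac{2G F_y - G G_x - F G_y}{2(EG - F^2)}, \quad
\Gamma_{22}^{2} = \frac{E G_y - 2F F_y + F G_x}{2(EG - F^2)}.
$$

The differential Equation (1) can be converted into the following finite difference equations by applying Taylor expansions with finite difference schemes:

$$
\begin{cases}
\dfrac{f_{i+1,j} - 2f_{i,j} + f_{i-1,j}}{h^2} = (\Gamma_{11}^{1})_{i,j} \dfrac{f_{i+1,j} - f_{i-1,j}}{2h} + (\Gamma_{11}^{2})_{i,j} \dfrac{f_{i,j+1} - f_{i,j-1}}{2h} + \dfrac{L_{i,j}}{\sqrt{E_{i,j} + G_{i,j} - 1}} \\
\dfrac{f_{i,j+1} - 2f_{i,j} + f_{i,j-1}}{h^2} = (\Gamma_{22}^{1})_{i,j} \dfrac{f_{i+1,j} - f_{i-1,j}}{2h} + (\Gamma_{22}^{2})_{i,j} \dfrac{f_{i,j+1} - f_{i,j-1}}{2h} + \dfrac{N_{i,j}}{\sqrt{E_{i,j} + G_{i,j} - 1}} \\
\dfrac{f_{i+1,j+1} - f_{i+1,j-1} - f_{i-1,j+1} + f_{i-1,j-1}}{4h^2} = (\Gamma_{12}^{1})_{i,j} \dfrac{f_{i+1,j} - f_{i-1,j}}{2h} + (\Gamma_{12}^{2})_{i,j} \dfrac{f_{i,j+1} - f_{i,j-1}}{2h} + \dfrac{M_{i,j}}{\sqrt{E_{i,j} + G_{i,j} - 1}}
\end{cases}
\tag{2}
$$

In matrix form, the three sets of equations in (2) can be written as $Az = d$, $Bz = q$, and $Cz = p$, where $A$, $B$, and $C$ are the coefficient matrices of the left-hand sides and $d$, $q$, and $p$ are the right-hand-side vectors, which are evaluated from the current estimate of the surface. Constraint conditions are added to Gaussian Equation (2) to guarantee that the simulated value at the lth sampled location $(x_i, y_j)$ in the calculation domain is equal to or approximates the observation $f_{i,j}$ in the corresponding grid. Therefore, the mathematical formula of HASM is given by

$$
\begin{cases}
\min \left\| \begin{bmatrix} A \\ B \\ C \end{bmatrix} z - \begin{bmatrix} d \\ q \\ p \end{bmatrix} \right\|_2 \\
\text{s.t. } Sz = k
\end{cases}
\tag{3}
$$

where each element of the vector $z$ denotes the estimated value of the grid; the constraint equation $Sz = k$ indicates that the predictor estimation is equal to the observation at each station location. If there are $m$ stations in the computational region, $S$ is an $m \times (M \cdot N)$ matrix, where $M = I + 2$ and $N = J + 2$ are the grid numbers in the x and y directions, respectively. The number of rows in the matrix $S$ is equal to the station number $m$. If the lth station has a DO concentration of $f_{i,j}$ and is located in the ith row and jth column in the computational grid, then the entry of the lth row of $S$ that corresponds to grid cell $(i, j)$ is 1, all other entries of that row are 0, and the lth element of $k$ is $f_{i,j}$. By applying the Lagrange multiplier method, the HASM Equation (3) can be written as

$$Az = b \tag{4}$$

where the coefficient matrix on the left-hand side, $A = A^{T}A + B^{T}B + C^{T}C + \lambda^2 S^{T}S$ (with $A$, $B$, and $C$ on the right-hand side denoting the matrices of Equation (2)), is a symmetric and positive definite matrix and $b = A^{T}d + B^{T}q + C^{T}p + \lambda^2 S^{T}k$. The Lagrange parameter $\lambda$ is the weight of the observations. A small value of $\lambda$ is given in areas with large DO variations. By setting the initial value $z^{(0)}$ and constructing the constraint equation $Sz = k$ using station observations, the surface modeling of the target variable (here, DO) can be performed by HASM.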
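The role of the finite difference schemes and the fundamental coefficients can be checked with a small numerical sketch. It assumes standard central differences at an interior grid node and the usual expressions of the first and second fundamental coefficients of a graph z = f(x, y); the function name `fundamental_coefficients` and the test surface are hypothetical and are not part of the HASM implementation used in the study.

```python
import math


def fundamental_coefficients(f, i, j, h):
    """Central-difference derivatives and fundamental coefficients at node (i, j).

    f is a 2-D list with f[i][j] sampled at x = i * h, y = j * h;
    (i, j) must be an interior node so that all neighbours exist.
    """
    fx = (f[i + 1][j] - f[i - 1][j]) / (2 * h)
    fy = (f[i][j + 1] - f[i][j - 1]) / (2 * h)
    fxx = (f[i + 1][j] - 2 * f[i][j] + f[i - 1][j]) / h ** 2
    fyy = (f[i][j + 1] - 2 * f[i][j] + f[i][j - 1]) / h ** 2
    fxy = (f[i + 1][j + 1] - f[i + 1][j - 1]
           - f[i - 1][j + 1] + f[i - 1][j - 1]) / (4 * h ** 2)
    # First fundamental coefficients of the graph z = f(x, y).
    E, F, G = 1 + fx ** 2, fx * fy, 1 + fy ** 2
    # Second fundamental coefficients share the normalising factor w.
    w = math.sqrt(1 + fx ** 2 + fy ** 2)
    return {"fx": fx, "fy": fy, "fxx": fxx, "fyy": fyy, "fxy": fxy,
            "E": E, "F": F, "G": G, "L": fxx / w, "M": fxy / w, "N": fyy / w}


# Hypothetical test surface f(x, y) = x^2 + x*y on a 5 x 5 grid with h = 0.5.
h = 0.5
grid = [[(i * h) ** 2 + (i * h) * (j * h) for j in range(5)] for i in range(5)]
coeffs = fundamental_coefficients(grid, 2, 2, h)  # node at x = 1, y = 1
print(coeffs)
```

Because central differences are exact for quadratic surfaces, the sketch recovers the analytical derivatives of the test surface at the chosen node, which makes it a convenient sanity check before assembling the full equation system.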

HASM_MOD
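A disadvantage of HASM is that it considers only the station observations and their spatial autocorrelation and neglects the background information associated with predictors. In this section, a HASM-based method was developed to estimate the spatial distribution of DO by introducing a drift term, which integrates the explanatory variables by using geographically weighted regression (GWR) in the model. The main equation of HASM_MOD is as follows:

$$\min_{z} \ \|Az - b\|_2^2 + \beta \|z - \bar{z}\|_2^2 \tag{5}$$

From a mathematics perspective, the term $\|z - \bar{z}\|_2^2$ can be seen as a regularization penalty term for $\|Az - b\|_2^2$ and can be referred to as L2 regularization. However, L1 penalty methods often outperform L2 penalty methods, especially when irrelevant features are present in $z$ [50]. Therefore, we use L1 regularization in this study. Replacing the drift term in (5) with the L1 norm yields the following expression:

$$\min_{z} \ \|Az - b\|_2^2 + \beta \|z - \bar{z}\|_1 \tag{6}$$

where $\bar{z} = \mathrm{GWR}(DO, x_1, x_2, \cdots, x_n)$; $x_1, x_2, \cdots, x_n$ represent the auxiliary variables that are correlated with DO levels, such as water temperature and pH; $z$ is an estimate of DO; and $\beta$ is a regularization parameter. The split Bregman iteration was used to solve the optimization problem (6) [51,52]. The solution process is given as follows: For $H(z) = \|Az - b\|_2^2$ and $\Phi(z) = z - \bar{z}$, formula (6) can be rewritten as

$$\min_{z} \ \beta \|\Phi(z)\|_1 + H(z) \tag{7}$$

Additionally, with $d = \Phi(z)$, problem (7) can be converted to

$$\min_{z, d} \ \beta \|d\|_1 + H(z) \quad \text{s.t. } d = \Phi(z) \tag{8}$$

Then, this problem can be modified to obtain

$$\min_{z, d} \ \beta \|d\|_1 + H(z) + \frac{\alpha}{2} \|d - \Phi(z)\|_2^2 \tag{9}$$

The optimization Equation (9) can be solved by using the following alternating iterative method, in which $c^{k}$ denotes the auxiliary Bregman variable:

$$
\begin{cases}
z^{k+1} = \arg\min_{z} \ H(z) + \dfrac{\alpha}{2} \|d^{k} - \Phi(z) - c^{k}\|_2^2 \\
d^{k+1} = \arg\min_{d} \ \beta \|d\|_1 + \dfrac{\alpha}{2} \|d - \Phi(z^{k+1}) - c^{k}\|_2^2 \\
c^{k+1} = c^{k} + \Phi(z^{k+1}) - d^{k+1}
\end{cases}
\tag{10}
$$

where $z$ can be resolved by using the Gauss-Seidel algorithm and $d$ can be resolved by

$$d^{k+1} = \mathrm{shrink}\left(\Phi(z^{k+1}) + c^{k}, \ \frac{\beta}{\alpha}\right),$$

where $\mathrm{shrink}(x, y) = \dfrac{x}{|x|} \cdot \max(|x| - y, 0)$.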
The optimal value of α can be obtained through sensitivity experiments, and β can be optimized by using the L-curve method [53]. The z result is the final estimate of the DO concentration.
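The split Bregman procedure can be sketched on a toy problem as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: β is taken to weight the L1 term, α to weight the quadratic coupling term, the Bregman variable is named `c`, the matrix is small and dense, and all numerical values are hypothetical.

```python
def shrink(x, gamma):
    """Soft thresholding: sign(x) * max(|x| - gamma, 0)."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0


def gauss_seidel(M, r, z, sweeps=50):
    """Approximately solve M z = r in place for a symmetric positive definite M."""
    n = len(r)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(M[i][j] * z[j] for j in range(n) if j != i)
            z[i] = (r[i] - s) / M[i][i]
    return z


def split_bregman(A, b, z_bar, beta, alpha=1.0, iters=200):
    """Minimise ||A z - b||_2^2 + beta * ||z - z_bar||_1 on a small dense system."""
    m, n = len(A), len(z_bar)
    # z-subproblem: (2 A^T A + alpha I) z = 2 A^T b + alpha (d + z_bar - c)
    M = [[2.0 * sum(A[k][i] * A[k][j] for k in range(m)) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Atb2 = [2.0 * sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    z = list(z_bar)   # start from the GWR-type background estimate
    d = [0.0] * n     # splitting variable standing in for z - z_bar
    c = [0.0] * n     # Bregman variable (symbol chosen for this sketch)
    for _ in range(iters):
        r = [Atb2[i] + alpha * (d[i] + z_bar[i] - c[i]) for i in range(n)]
        gauss_seidel(M, r, z)
        phi = [z[i] - z_bar[i] for i in range(n)]
        d = [shrink(phi[i] + c[i], beta / alpha) for i in range(n)]
        c = [c[i] + phi[i] - d[i] for i in range(n)]
    return z


# Toy case with A = I, where the minimiser is z_bar + shrink(b - z_bar, beta / 2).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 1.2]
z_bar = [1.0, 1.0]
z_hat = split_bregman(A, b, z_bar, beta=1.0)
print(z_hat)
```

With an identity matrix the problem has a closed-form solution, which makes it easy to confirm that the iteration pulls the second component exactly onto the background value while the first component is only shifted toward it.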

Support Vector Machine
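The SVM method, which is developed based on statistical learning theory, has been successfully applied to classification and regression problems. The basic concept behind SVM is to map the original datasets to a higher-dimensional feature space and construct an optimal separating plane (SP), from which the distance to all the data points is minimal [54]. For the training data set $\{(x_i, y_i),\ i = 1, \cdots, n\}$, $x \in R^m$, $y \in R$, where $n$ is the total number of data patterns, $x$ is the input vector of $m$ components, and $y$ is the corresponding output value, the SVM regression function can be expressed as follows:

$$f(x) = \theta^{T} \varphi(x) + e \tag{11}$$

where $\theta$ is the weight vector, $e$ is the bias, and $\varphi(x)$ indicates the nonlinear transfer function. The parameters $\theta$ and $e$, which define the location of SP, can be determined by minimizing the following regularized risk function:

$$
\min_{\theta, e, \xi, \xi^{*}} \ \frac{1}{2} \|\theta\|^2 + \beta \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
\quad \text{s.t.} \quad
\begin{cases}
y_i - \theta^{T} \varphi(x_i) - e \le \varepsilon + \xi_i \\
\theta^{T} \varphi(x_i) + e - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i, \xi_i^{*} \ge 0
\end{cases}
\tag{12}
$$

where $\beta$ is the regularization parameter, $\xi_i$ and $\xi_i^{*}$ are slack variables, $\varepsilon$ is the tolerance of the insensitive loss, and problem (12) is solved in a dual form using the Lagrangian multipliers $\alpha_i$ and $\alpha_i^{*}$:

$$
\max_{\alpha, \alpha^{*}} \ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^{*})
\quad \text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0, \quad 0 \le \alpha_i, \alpha_i^{*} \le \beta,
$$

where $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$ is the kernel function. By imposing the Karush-Kuhn-Tucker (KKT) optimality condition, $\theta^{*}$ is obtained, that is,

$$\theta^{*} = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) \varphi(x_i).$$

Finally, the SVM is expressed as follows:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + e.$$

The radial basis function (RBF) was adopted as the kernel function of SVM in this study [55].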
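How a fitted SVM with an RBF kernel produces an estimate can be shown with a short sketch. It assumes the common parameterization K(x, y) = exp(-gamma * ||x - y||^2), which the text does not specify, and it evaluates the kernel expansion for given coefficients rather than training a model; the support vectors, coefficients, and bias below are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel, assumed here as exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + bias for a fitted model.

    dual_coefs plays the role of the multiplier differences in the dual form.
    """
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + bias


# Hypothetical fitted model with two support vectors in a 2-D input space.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
bias = 8.0
print(svr_predict([0.5, 0.5], support_vectors, dual_coefs, bias, gamma=0.5))
```

In practice the coefficients and bias come from solving the dual problem, and the kernel width and regularization parameter are tuned, which is the step the Discussion identifies as difficult.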

Cokriging Method
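Cokriging is the multivariate equivalent to kriging. The general form of the kriging equation is

$$z^{*}(x_p) = \sum_{i=1}^{n} \lambda_i z(x_i).$$

In order to achieve unbiased estimations in kriging, the following set of equations should be solved:

$$
\begin{cases}
\sum_{i=1}^{n} \lambda_i \gamma(x_i, x_j) + \mu = \gamma(x_j, x_p), \quad j = 1, \cdots, n \\
\sum_{i=1}^{n} \lambda_i = 1
\end{cases}
$$

where $z^{*}(x_p)$ is the estimated value at location $x_p$, $z(x_i)$ is the known value at location $x_i$, $\lambda_i$ is the weight associated with the data, $\mu$ is the Lagrange coefficient, and $\gamma(x_i, x_j)$ is the value of the variogram corresponding to a vector with origin in $x_i$ and extremity in $x_j$.
By using multiple datasets, cokriging is a very flexible interpolation method, allowing the user to investigate graphs of cross-correlation and autocorrelation [17]. The general equations of the cokriging estimator are [18]

$$
u^{*}(x_p) = \sum_{i=1}^{n} \lambda_i u(x_i) + \sum_{j=1}^{m} \omega_j v(x_j), \quad \sum_{i=1}^{n} \lambda_i = 1, \quad \sum_{j=1}^{m} \omega_j = 0,
$$

where $u$ and $v$ are the primary and covariate variables, respectively, and $\lambda_i$ and $\omega_j$ are the weights assigned to them. In the cokriging method, the $u$ and $v$ variables are cross-correlated, and the covariate contributes to the estimation of the primary variable. For cokriging analysis, the cross variogram should be determined beforehand. Provided that there are points where both $u$ and $v$ have been measured, the cross-semivariogram can be estimated by [18]

$$\gamma_{uv}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[u(x_i) - u(x_i + h)\right] \left[v(x_i) - v(x_i + h)\right],$$

where $N(h)$ is the number of data pairs separated by the lag $h$. The cokriging was implemented by using the "Geostatistical Analyst" tool in ArcGIS software, which can automatically optimize the parameters based on the input datasets.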
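The kriging system can be made concrete with a small ordinary kriging sketch for a single variable. It is only an illustration under explicit assumptions: a linear variogram model gamma(h) = h, a sign convention in which the Lagrange coefficient is added on the left-hand side, a hand-written Gaussian elimination, and hypothetical station coordinates and DO values. It does not reproduce the cokriging workflow of the ArcGIS tool used in the study.

```python
import math


def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ordinary_kriging(points, values, target, variogram):
    """Return (estimate, weights, mu) from the ordinary kriging system."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Rows 0..n-1: sum_j lambda_j * gamma(x_i, x_j) + mu = gamma(x_i, x_p)
    M = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    # Last row enforces the unbiasedness condition sum_j lambda_j = 1.
    M.append([1.0] * n + [0.0])
    rhs = [variogram(dist(p, target)) for p in points] + [1.0]
    solution = solve_linear(M, rhs)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * v for w, v in zip(weights, values))
    return estimate, weights, mu


def linear_variogram(h):
    """Assumed variogram model for this sketch: gamma(h) = h."""
    return h


# Hypothetical stations on a unit square with DO values in mg/L.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [7.8, 8.1, 8.4, 8.0]
estimate, weights, mu = ordinary_kriging(points, values, (0.5, 0.5), linear_variogram)
print(estimate, weights)
```

The weights sum to one by construction, and predicting at a station location returns the observed value, which is the exact-interpolation property that the cross-validation in this study relies on being removed by holding stations out.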

Performance Assessment of the Methods
In this study, the performance of HASM_MOD was assessed by using a cross-validation method, namely the leave-one-out cross-validation, in which each observation in turn was used as the validation set and the remaining observations were used as the training set [56]. Three commonly used evaluation indicators, namely the mean absolute error (MAE), root-mean-square error (RMSE), and coefficient of determination (R²), were used to compare the performance of different methods. MAE can accurately reflect the actual simulation error, as it can measure the difference between simulations and observations. The RMSE indicates the fit of the method to the observed data. A high value indicates that large deviations exist between the fitted values and the observations. R² reflects how well the model fits the observed data and typically gives the percentage of the variation in a variable that can be explained by the model. The MAE and RMSE are computed as

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |o_i - s_i|, \quad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (o_i - s_i)^2},$$

where $o_i$ and $s_i$ are the observed and simulated values at the ith test point, respectively, and $N$ is the number of test points.
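The leave-one-out procedure and the two error measures can be sketched as follows. Inverse distance weighting is used purely as a stand-in interpolator so that the example is self-contained; it is not one of the four methods compared in this study, and the station coordinates and DO values are hypothetical.

```python
import math


def mae(obs, sim):
    """Mean absolute error between observations and simulations."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root-mean-square error between observations and simulations."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def leave_one_out(samples, predict):
    """Hold out each sample in turn and predict it from the remaining ones.

    samples: list of ((x, y), value); predict(train, location) -> estimate.
    """
    estimates = []
    for i, (location, _) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        estimates.append(predict(train, location))
    return estimates


def idw(train, location, power=2.0):
    """Inverse distance weighting, a placeholder for the real interpolator."""
    num = den = 0.0
    for (x, y), value in train:
        dist = math.hypot(x - location[0], y - location[1])
        if dist == 0.0:
            return value
        w = dist ** -power
        num += w * value
        den += w
    return num / den


# Hypothetical stations: ((x, y), DO in mg/L).
stations = [((0.0, 0.0), 7.9), ((1.0, 0.0), 8.3), ((0.0, 1.0), 8.1), ((1.0, 1.0), 8.6)]
observed = [value for _, value in stations]
estimated = leave_one_out(stations, idw)
print("MAE:", mae(observed, estimated), "RMSE:", rmse(observed, estimated))
```

Any of the interpolators discussed in this paper could be substituted for `idw` as long as it accepts the training samples and a target location and returns a single estimate.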

Results
The relationships between the candidate factors and the DO level were investigated by using Spearman's rank method with a significance level of 1%. In this study, we first took July and August in the wet season as examples. In July, the candidate inputs, namely WT, pH, Chl-a, and Pre, were significantly correlated with DO, and the corresponding Spearman rho values were −0.58, 0.45, 0.52, and 0.83, respectively. In August, the candidate factors WT, Chl-a, AP, and Pre were strongly correlated with DO, and the corresponding Spearman rho values were −0.64, 0.47, 0.57, and 0.52, respectively. Accordingly, the explanatory variables selected for the DO concentration were WT, pH, Chl-a, and Pre in July and WT, Chl-a, AP, and Pre in August.
A comparison of the simulated and observed values based on cross-validation is shown in Table 1; notably, HASM_MOD performs the best in July and August. The MAEs of the prediction results of HASM_MOD, HASM, the SVM, and cokriging are 0.14, 0.22, 0.27 and 0.39, respectively, in July, and 0.22, 0.30, 0.41, and 0.30, respectively, in August. Thus, according to the MAE, HASM_MOD performs best, followed by HASM in July; however, in August, cokriging performs better than HASM according to the MAE and RMSE. The RMSE values for HASM_MOD were 28%, 46%, and 42% lower than those of HASM, the SVM, and cokriging in July and August on average. In terms of goodness of fit (R²), HASM_MOD gives the best results, followed by HASM, while the performance of cokriging and the SVM varies in different months. Figure 3 displays the distribution of DO in Poyang Lake in July based on the HASM_MOD, HASM, cokriging, and SVM methods. The HASM results display localized patterns with large DO concentrations surrounded by low values in the lake. The HASM_MOD, cokriging, and SVM results exhibit similar patterns, with large values found in the north and low values in the southeast. Although HASM_MOD, cokriging, and the SVM yield similar patterns, large differences exist in some local areas, such as in the middle of the lake. Compared with the observations obtained from 18 stations located in the lake and along the surrounding boundary, the results of HASM_MOD were better than those of cokriging and the SVM (Figure 3a,b,d); cokriging and the SVM tended to underestimate the DO concentrations, especially in the southern part of the lake. The HASM results seem to generally fit the actual observations, but there are some oscillations, especially near the station locations; these variations may be due to the equation systems used in HASM. Furthermore, with data from 11 stations located in the lake, the results of different methods were validated through cross-validation.
The scatter correlation plots of the observed and simulated values in July (Figure 4) also suggest that HASM_MOD estimated DO concentrations reliably, with an R² of 0.97; this value was 7%, 13%, and 17% higher than those for HASM, the SVM, and cokriging, respectively.
The performance of the methods in August is displayed in Figure 5. Obvious differences existed among the spatial patterns of DO concentrations resulting from different methods. Compared with the 18 station observations, the results of HASM_MOD were generally best. Cokriging tended to underestimate the actual values in some local areas, and the SVM exhibited overestimations in most parts of the lake (Figure 5a,b). Compared with HASM, HASM_MOD performed better at station locations (Figure 5c,d). Figure 6 illustrates the relationship between the observations of 10 stations located in the lake and the corresponding estimated values. The SVM performed the worst, with an R² of 0.43, and the HASM_MOD results displayed the best agreement with site observations (Figure 6d), followed by cokriging, with an R² of 0.83.
We also compared the accuracy of the estimated values and real observations at two stations from January 2015 to December 2017. The sites selected were the Sanshan station located in the middle of the lake and the Outlet_A station located at the outlet of Poyang Lake. Figure 7 shows the simulated DO concentrations at Sanshan station by using the cross-validation method. The predictions obtained from HASM_MOD are closer to the observations than the estimates of the other three methods and have lower prediction errors (Table 2). Large biases between the simulated and observed values were found for HASM and the SVM (Figure 7), with RMSE values of 0.37 and 0.36, respectively. HASM_MOD performed better than cokriging, the SVM, and HASM, with MAE reductions of 25%, 42%, and 38%, respectively. The SVM produced the worst results according to the MAE, and HASM yielded the worst results in terms of the RMSE and R², indicating that HASM tends to produce outliers. At the Outlet_A station, HASM_MOD likewise yielded the lowest errors (Table 2). According to the MAE, the accuracy of HASM_MOD was 37%, 48%, and 59% higher than the accuracies of HASM, the SVM, and cokriging, respectively.

Discussion
To obtain highly accurate estimates of DO in the lake, a model based on HASM was proposed in this study. Previous studies have indicated that a PDE-based method could be effective in reducing the uncertainty in environmental variable simulations [28,57]. A high-accuracy surface modeling (HASM) method in terms of the differential geometry of surfaces was recently proposed, and studies have demonstrated that HASM yields good performance in simulating some environmental variables [29][30][31][32][33]. To obtain improved estimates of DO concentrations, HASM was extended in this study.
The results show that HASM produced lower MAE values and higher RMSE and R² values than SVM and cokriging, indicating that some outliers may exist in the HASM results. The constraint equations in HASM based on station observations result in low MAE values; therefore, the HASM-simulated values are approximately equal to the real measurements at station locations. Furthermore, due to the restrictions of the constraint equations and Equation (2), which consider the spatial autocorrelation of DO by using a finite difference scheme, the spatial patterns of DO produced by HASM oscillate, with generally accurate simulated values near station locations. These biases of HASM mainly stem from the input datasets and the method of solving the differential Equation (1). Although HASM takes into account the spatial autocorrelation of DO by using finite difference schemes, it ignores the correlation between the DO and the related explanatory variables. To eliminate the oscillation phenomenon and obtain high-accuracy DO estimates, a regularization penalty term obtained by using the local regression method together with some auxiliary variables related to DO was introduced in HASM; the resulting model was called HASM_MOD. The local regression method was implemented by using GWR, which enables the relationship between the DO concentration and the corresponding explanatory variables to vary by region. HASM_MOD integrates station observations by using HASM and the explanatory variables using GWR and can produce improved results. The results indicate that HASM_MOD is a reliable method for DO concentration prediction. The accuracy of the HASM_MOD results was not influenced by the characteristics of the time series. For the nonstationary time series of monthly mean DO concentrations at the Sanshan and Outlet_A stations, HASM_MOD yields satisfactory results, whereas HASM, the SVM, and cokriging generate unreliable results, as demonstrated by the large RMSE values.
The errors of HASM_MOD were reduced due to the regularization penalty term, which takes into account the spatial heterogeneity by integrating the explanatory variables using GWR. Although some researchers showed that SVMs can well predict the dissolved oxygen in aquaculture [58], the performance of SVMs was unsatisfactory in estimating the DO concentration in Poyang Lake. This may be due to the non-optimum parameters of SVMs and the kernel function used in this study, and studies have indicated that there is no effective method to obtain the optimal parameter combination of SVMs accurately [38]. The accuracy of cokriging was also not satisfactory according to the cross-validation process, which may be due to its linear nature and the failure to take into account the spatial nonstationarity of DO concentration. Compared with that of the other three methods, the performance of HASM_MOD was not influenced by the spatial distribution of the DO concentration. The good performance of HASM_MOD may be due to the combined effect of the constraint equations in HASM and the use of auxiliary variables in GWR, which integrates the spatial autocorrelation of DO concentrations; the characteristics of spatial variation of the DO concentrations, such as the direction and slope of the variation; and the local correlation between the DO concentrations and the related explanatory variables. The results indicate that the accuracy of HASM_MOD was improved by 38%, 45%, and 42% compared with the accuracies of HASM, the SVM, and cokriging, respectively.
With the correct selection of explanatory variables, it is possible to accurately estimate the DO concentration in Poyang Lake by using HASM_MOD. The explanatory variables vary with time scale, and in this study, we used Spearman's rank correlation test to select the auxiliary variables related to DO at a significance level of 1%. One of the obstacles is the difficulty in finding the appropriate spatial and temporal scientific datasets for target areas. Further studies can be conducted by introducing other explanatory variables and applying remote sensing techniques. HASM_MOD is expected to yield good performance in other areas at hourly scales, which will be investigated further in the future.

Conclusions
In this research, a model was built to yield highly accurate DO estimates by integrating station observations and explanatory variables using HASM and GWR. The developed model comprehensively considers the spatial autocorrelation of DO and correlations with other environmental variables based on station observation constraints in terms of the fundamental theorem of surfaces. By applying a cross-validation method, HASM_MOD improved the accuracy of prediction, with MAE reductions of 38%, 45%, and 42% compared with HASM, SVM, and cokriging, respectively. The proposed method provides a new way to estimate DO concentrations with high accuracy, and this method could be successfully applied to estimate other water quality parameters. Further work will consider additional explanatory variables and remote sensing techniques and focus on improving predictions of DO levels at the hourly scale.