1. Introduction
Groundwater is an important water resource that supports lives and sustains agricultural and economic activities in semi-arid and arid regions and thus is vital for socioecological sustainability [
1]. However, the groundwater storage in these regions is experiencing an evident depletion due to climate change and intense human activities, further threatening the stability and security of the ecosystem [
2]. Thus, an accurate and reliable groundwater resource assessment is urgently needed. The groundwater level (GWL) is an essential parameter used to quantify groundwater resources [
3]. Accurate GWL prediction helps to provide policy makers with a scientific insight into efficient water resource planning and management [
4], which is particularly significant for arid and semi-arid regions with deficient water resources.
Physical models (e.g., Visual MODFLOW, FEFLOW, and TOUGH) have long been used in GWL simulation and prediction [
5]. The merit of these models is that they provide a robust and detailed understanding of the complex groundwater system [
6]. However, obtaining the diverse hydrogeological parameters that these models require, primarily as the initial and boundary conditions for the partial and ordinary differential equations, is difficult in areas with complicated underlying conditions or scarce hydrogeological information [
7]. Furthermore, these models are difficult to apply on regional scales due to the scarcity of datasets required for implementation as well as the amount of money and time required to obtain them, and their results are occasionally unreliable [
8,
9]. Compared with the physical models, machine learning models have the ability to explore the complex mathematical relationship between the GWL and the predictors without specific hydrogeological parameters [
10]. Moreover, machine learning models are capable of dealing with uncertainty and reducing complexity [
11]. In recent decades, machine learning models such as the artificial neural network [
12], support vector machine (SVM) [
13], random forest (RF) [
14], extreme learning machine (ELM) [
15], and adaptive neuro-fuzzy inference system [
16] have been widely used to predict GWLs. A comprehensive and detailed review of the application of machine learning models in GWL prediction can be found in Rajaee et al. [
2] and Hai et al. [
17].
In real-world scenarios, GWLs are affected by the interaction among numerous factors such as meteorological conditions (e.g., precipitation and temperature), the underlying surface (e.g., land use), and hydrogeological conditions (e.g., the aquifer) [
18]. These factors thus can be regarded as inputs for machine-learning-based GWL prediction [
2,
17]. However, incomplete in situ measurements of climate and hydrogeological input data hinder reliable prediction results by machine learning models due to the limited spatiotemporal availability of the in situ data [
19]. Remote sensing satellites (e.g., the Gravity Recovery and Climate Experiment, GRACE) and satellite-based assimilation technology (e.g., the Global Land Evaporation Amsterdam Model, GLEAM; and the Global Land Data Assimilation System, GLDAS) provide innovative insights for hydrological study [
3]. For example, Sun et al. [
20] reported that the GRACE CSR data were effective in evaluating drought features in the Yangtze River Basin. Ding et al. [
21] assessed the performance of the evapotranspiration data derived from GLDAS, MOD16, and GLEAM in a streamflow simulation; the results showed that these data had great capability in the streamflow simulation and showed higher NSE values than the threshold. Jing et al. [
22] applied GRACE and GLDAS data to invert the terrestrial water storage. Akhtar et al. [
23] also combined GRACE and GLDAS data to invert the groundwater storage change. Nevertheless, to the best of our knowledge, related studies remain limited, which warrants investigating the potential of satellite and satellite-based data in GWL prediction, especially over multiple prediction horizons.
In addition, the highly stochastic, nonlinear, and nonstationary features of groundwater make GWL predictions challenging, particularly when a single machine learning model is used [
1]. This is because the optimal model is usually selected without accounting for the uncertainty caused by the parameters and structure of the model [
24]. When considering this, the ensemble learning strategy can be an appropriate approach to reduce the prediction uncertainty [
25]. By integrating multiple skilled machine learning models, such an ensemble method is more likely to contain unknown true predictions [
26]. Bayesian model averaging (BMA), which is an averaging ensemble learning technique, can evaluate model implementation and build prediction distribution by using probabilistic techniques [
27,
28]. Compared with other ensemble learning methods such as the generalized likelihood uncertainty estimation [
29] or the simple average method [
30], BMA not only provides a deterministic weighted average of the models of interest, but also produces a predictive distribution to analyze the uncertainty related to the deterministic prediction [
31]. It has been reported that BMA can provide more accurate and reliable predictions than the single models [
32,
33]. Although the capability of the BMA model has been investigated for various hydrological applications, it has not yet been investigated in GWL prediction, especially by incorporating satellite data and satellite-based data.
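The two outputs of BMA described above, a weighted deterministic average and a predictive distribution, can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes Gaussian member densities with a single shared variance and estimates the weights with a common expectation-maximization scheme. The function names (`fit_bma`, `bma_predict`), the toy series, and the Gaussian approximation of the 95% interval are all illustrative assumptions.

```python
import math


def normal_pdf(y, mean, sigma):
    """Density of a normal distribution N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_bma(member_preds, obs, n_iter=200):
    """Estimate BMA weights and a shared variance by EM (illustrative).

    member_preds: list of K prediction series, each as long as obs.
    Assumption: Gaussian member densities with one common variance.
    """
    k, t = len(member_preds), len(obs)
    weights = [1.0 / k] * k
    sigma2 = sum((m[i] - obs[i]) ** 2 for m in member_preds for i in range(t)) / (k * t)
    for _ in range(n_iter):
        sigma = math.sqrt(max(sigma2, 1e-12))  # floor avoids division by zero
        resp = []
        for i in range(t):
            # E-step: how plausible each member is for observation i
            dens = [weights[j] * normal_pdf(obs[i], member_preds[j][i], sigma)
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: update the weights and the shared variance
        weights = [sum(r[j] for r in resp) / t for j in range(k)]
        sigma2 = sum(resp[i][j] * (obs[i] - member_preds[j][i]) ** 2
                     for i in range(t) for j in range(k)) / t
    return weights, sigma2


def bma_predict(weights, sigma2, member_values):
    """Return the BMA mean and an approximate 95% interval for one time step."""
    mean = sum(w * f for w, f in zip(weights, member_values))
    between = sum(w * (f - mean) ** 2 for w, f in zip(weights, member_values))
    # Gaussian approximation of the mixture; exact intervals need mixture quantiles
    spread = 1.96 * math.sqrt(between + sigma2)
    return mean, mean - spread, mean + spread


# Toy GWL series (m) and three hypothetical member predictions
obs = [10.0, 10.4, 10.9, 11.2, 10.8, 10.3, 9.9, 10.1]
members = [
    [10.1, 10.3, 11.0, 11.1, 10.9, 10.2, 10.0, 10.0],  # errors of 0.1 m
    [10.3, 10.1, 11.2, 10.9, 11.1, 10.0, 10.2, 9.8],   # errors of 0.3 m
    [9.4, 11.0, 10.3, 11.8, 10.2, 10.9, 9.3, 10.7],    # errors of 0.6 m
]
weights, sigma2 = fit_bma(members, obs)
mean, lower, upper = bma_predict(weights, sigma2, [10.4, 10.6, 9.9])
print(weights, mean, lower, upper)
```

In this toy case the member with the smallest errors receives almost all of the weight, so the ensemble mean stays close to that member while the interval still reflects the residual variance.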
The primary goal of this study was to explore the potential of the GRACE, GLEAM, GLDAS, and publicly available data in multi-time-ahead GWL prediction and to validate the BMA model in improving the prediction accuracy and reducing uncertainty. To achieve this, 1-, 2-, and 3-month-ahead GWL predictions for three observation wells in the Zhangye Basin in Northwest China were performed. The objectives were twofold:
To evaluate the performance of the GRACE, GLEAM, and GLDAS data in multi-time-ahead GWL prediction;
To evaluate the robustness of the ensemble BMA model against the standalone ELM, SVR, and RF models and to assess the ability of BMA in reducing modeling uncertainty.
3. Results
Table 2 demonstrates the performance of the single ELM, SVR, and RF models for 1-, 2-, and 3-month-ahead GWL predictions during the testing periods for Well I, Well II, and Well III. The performance of the single models is demonstrated in
Appendix A (
Table A2). The performance of the ensemble BMA model in multi-time-ahead GWL predictions for the three wells during the training and testing periods is shown in
Table 3.
3.1. Investigating the Capability of Forcing Data in GWL Prediction
We employed the GRACE satellite data, the GLEAM and GLDAS model data, and the public meteorological data to predict the 1-, 2-, and 3-month-ahead GWL to evaluate the potential of these data in GWL prediction. According to the performance metrics in
Table 2,
Table 3 and
Table 4, all of the ELM, SVR, and RF models achieved a satisfactory performance in the 1-, 2- and 3-month-ahead GWL predictions for Wells I, II, and III. In particular, the ELM, SVR, and RF models achieved an
NS greater than 0.57 at the three timescales, which was slightly higher than the threshold for a satisfactory model (
NS > 0.50) [
60]. In terms of the
R and
RMSE values, the
R values of all the models exceeded 0.79, which demonstrated a high correlation between the predictions and the in situ data; while the
RMSE values were almost all lower than 1 m, which demonstrated the low error of the predictions. The results suggested that although the hydrogeological conditions of the three selected GWL wells were different, all of the single models, including ELM, SVR and RF, yielded satisfactory prediction results with the GRACE, GLEAM, and GLDAS data as inputs. Nevertheless, we noted that the predictions for Well III had a sharp drop followed by a rise at the 30th month. This was likely because data-driven models operate as black boxes without prior assumptions about physical processes. Therefore, they could not reliably predict the GWL in complex environments such as areas with intensive irrigation activities [
65].
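For reference, the three performance metrics discussed above can be computed as in the following sketch. It uses the standard definitions of the Pearson correlation coefficient (R), the Nash-Sutcliffe efficiency (NS), and the root mean square error (RMSE); the function names and the toy series are illustrative and are not taken from the study.

```python
import math


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def nash_sutcliffe(obs, pred):
    """NS = 1 - SSE / variance of observations; 1 is a perfect fit."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, pred):
    """Root mean square error, in the units of the data (m for GWL)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Toy example (values are illustrative only)
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 5.1]
print(pearson_r(obs, pred), nash_sutcliffe(obs, pred), rmse(obs, pred))
```

Under the threshold quoted above, a model would count as satisfactory when `nash_sutcliffe` returns a value above 0.50.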
It is noteworthy that although the three single models achieved a good performance, none of them consistently outperformed the others. For Wells I and II, the RF achieved the best performance in the 1-, 2-, and 3-month-ahead GWL predictions. In contrast, SVR was considered to be the best model for Well III. This was mainly because the differences in model parameters and structures introduced a great deal of uncertainty to the modeling process [
24,
26]. Generally, the RF model showed a superior performance with a higher
R value of 0.931, 0.899, and 0.870 for the 1-, 2-, and 3-month-ahead GWL prediction on average, respectively; followed by the SVR and ELM models. The outperformance of the RF over the other machine learning approaches (i.e., SVR and ELM) in the GWL predictions was expected because it is an ensemble-based method that often performed better than other machine learning methods in previous studies [
13,
66].
3.2. Predicting Performance of BMA
For 1-month-ahead GWL predictions, the BMA model achieved a good performance; the BMA gained high values of
R (>0.84) and
NS (>0.69) but relatively small
RMSE values (<0.87 m) for Wells I, II and III [
59] (
Table 3). Specifically, the BMA model obtained
R,
NS, and
RMSE values of 0.938, 0.845, and 0.264 m for Well I; and 0.954, 0.909, and 0.111 m for Well II, respectively. For Well III, the values of
R,
NS,
RMSE were 0.871, 0.745, and 0.737 m, respectively. Thus, the BMA was able to provide good results in the 1-month-ahead GWL predictions.
For the 2- and 3-month-ahead GWL predictions, the BMA performed slightly worse than in the 1-month-ahead predictions (
Table 3). Taking Well III as an example, the
RMSE value of the BMA model increased by 6.38% for 2-month-ahead predictions and 17.5% for 3-month-ahead predictions; while the
R and
NS decreased by 0.93% and 1.64% for the 2-month-ahead predictions and 3.44% and 6.89% for the 3-month-ahead predictions, respectively. This meant that the accuracy of the BMA deteriorated with an increase in the prediction time. This finding was consistent with similar machine-learning-based hydrological predictions at multiple time scales [
67]. This was probably attributable to the decrease in the data characteristics for longer time steps [
68]. Although the prediction performance deteriorated, the results of the BMA for the 2- and 3-month-ahead GWL predictions met the threshold of acceptable prediction requirements [
59]. Overall, the BMA model can serve as an effective model for 1-, 2-, and 3-month-ahead GWL prediction.
3.3. Comparative Analysis of BMA and the Single Models
Comparatively speaking, the performance of the BMA far exceeded that of the single ELM, SVR, and RF models not only for 1-month-ahead GWL predictions, but also for the 2- and 3-month prediction horizons. For the 1-, 2-, and 3-month-ahead GWL predictions for the three selected monitoring wells, the BMA increased the R by 2.11%, 4.90%, and 3.97%; increased the NS by 8.32%, 16.18%, and 13.66%; and decreased the RMSE by 13.75%, 24.01%, and 16.75% on average, respectively. Specifically, for the 2-month-ahead GWL predictions, the BMA increased the R by 7.23%, 5.38%, and 2.34%; increased the NS by 19.00%, 15.53%, and 14.42%; and decreased the RMSE by 27.70%, 20.84%, 23.48%, respectively, when comparing the average GWL predicted by the ELM, SVR, and RF models. Clear improvements by the BMA were observed compared with the three single models.
In addition, the outstanding performance of the BMA over the others was consistent among the three wells. Taking Well II (the well with the best predicting performance) as an example, the best-performing SVR model obtained an R of 0.900, 0.908, and 0.876 for the 1-, 2-, and 3-month-ahead predictions; while the R value of the BMA increased to 0.954, 0.956, and 0.906, respectively. The BMA yielded an NS and RMSE of 0.909, 0.912, and 0.810 and 0.111 m, 0.100 m, and 0.141 m for the 1-, 2-, and 3-month-ahead predictions, respectively; in contrast to the corresponding values of 0.805, 0.810, and 0.749 and 0.162 m, 0.147 m, and 0.161 m for the SVR, respectively. That is, the ensemble BMA model provided more accurate GWL predictions.
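The relative changes quoted in this section follow the usual percent-change formula. As a worked example, the sketch below applies it to the 1-month-ahead values reported above for Well II, taking the SVR as the reference. Note that the averages reported earlier in this section are computed against the mean of the three single models, so the figures obtained here are not expected to match them; the function name is illustrative.

```python
def percent_change(reference, value):
    """Relative change of value with respect to reference, in percent."""
    return (value - reference) / reference * 100.0


# Well II, 1-month-ahead: SVR (reference) versus BMA, values from the text
r_gain = percent_change(0.900, 0.954)      # about +6.0 %
ns_gain = percent_change(0.805, 0.909)     # about +12.9 %
rmse_change = percent_change(0.162, 0.111) # about -31.5 %
print(round(r_gain, 2), round(ns_gain, 2), round(rmse_change, 2))
```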
The hydrographs and scatter plots helped to visually assess the relationship between the observed and predicted GWL (
Figure 2 and
Figure 3). It can be seen in
Figure 2 that the predictions of all the models followed the same trend as the observed GWL, which meant that all of the proposed models could capture the change pattern of the GWL. The least-squares equation (i.e., y = ax + b) and the coefficient of determination (i.e.,
R²) were applied for further interpretation. Comparatively, the scatter points of the BMA were much tighter than those of the single models in most cases (
Figure 3). Meanwhile, the BMA yielded an a closer to 1 and a b closer to 0, which demonstrated the stronger correlation between the BMA’s predicted and observed values, meaning that the BMA achieved the highest GWL prediction accuracy.
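The scatter-plot diagnostics can be reproduced with an ordinary least-squares fit of the predicted values (y) against the observed values (x). The sketch below is illustrative only: the function names and the four sample points are assumptions, not data from the study. A slope a near 1 and an intercept b near 0 indicate that the predictions lie close to the 1:1 line.

```python
def fit_line(x, y):
    """Ordinary least-squares slope a and intercept b for y = a*x + b."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    return a, my - a * mx


def r_squared(x, y, a, b):
    """Coefficient of determination of the fitted line."""
    my = sum(y) / len(y)
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


# Toy example: observed (x) versus predicted (y) GWL
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.1, 3.8]
a, b = fit_line(x, y)
r2 = r_squared(x, y, a, b)
print(a, b, r2)
```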
The accurate prediction of the GWL’s low values can aid in the decision making for timely groundwater warnings and efficient water resource management. The absolute error, which represented the difference between the observed and predicted GWL, is introduced and demonstrated in
Figure 4. As can be seen, all of the models over-predicted the lowest GWL values, while the BMA derived the smallest absolute error in most cases. These findings showed the deficiency of the machine learning models in predicting extreme values, which also was pointed out by other researchers [
69]. Nevertheless, the BMA did not always maintain the minimum error in the lowest GWL prediction cases. In fact, the absolute error of the BMA was the median of the three models. This particularly emphasized the ability of the BMA to yield more reliable results by weighting the average of the individual predictions [
26].
The error box–whisker plots were further developed to present the error characteristics (
Figure 5). Overall, the errors of the BMA and the SVR were relatively small for the 1- to 3-month-ahead prediction horizons. However, the error median of the BMA model was much closer to 0 than that of the SVR model, which indicated the more concentrated error distribution of the BMA model. The results indicated the superiority of the BMA model over the single models in multi-time-ahead GWL prediction.
3.4. Uncertainty Analysis
The BMA was applied as an ensemble learning strategy to provide deterministic prediction of the GWL, so the uncertainty associated with the BMA approach was investigated as well.
Figure 6 describes the 95% confidence interval of the 1-, 2-, and 3-month-ahead GWL predictions derived by the BMA model for the selected GWL wells. In the 1- to 3-month-ahead GWL predictions, the BMA yielded
CR values of 83.33% to 100%,
B values of 0.43 to 1.31, and
D values of 0.09 to 0.31. These results highlighted the reliability of the BMA model in yielding credible GWL predictions because most of the observations were within the 95% confidence interval. However, some low values were beyond the interval, which reflected the limitation of the BMA in predicting low GWL values.
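The three uncertainty metrics can be sketched as follows. Their definitions are not restated in this section, so the code assumes the forms commonly used for prediction intervals: CR as the percentage of observations that fall inside the interval, B as the mean interval width, and D as the mean absolute distance between the interval midpoint and the observation. The function names and sample values are illustrative.

```python
def containing_ratio(obs, lower, upper):
    """CR: percentage of observations inside [lower, upper] (assumed definition)."""
    inside = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return 100.0 * inside / len(obs)


def average_bandwidth(lower, upper):
    """B: mean width of the interval (assumed definition)."""
    return sum(up - lo for lo, up in zip(lower, upper)) / len(lower)


def average_deviation(obs, lower, upper):
    """D: mean absolute gap between interval midpoint and observation (assumed)."""
    return sum(abs(0.5 * (lo + up) - o)
               for o, lo, up in zip(obs, lower, upper)) / len(obs)


# Toy example: four observations with their 95% interval bounds (m)
obs = [10.0, 10.5, 11.0, 9.0]
lower = [9.5, 10.0, 10.6, 9.4]
upper = [10.5, 11.0, 11.6, 10.2]
cr = containing_ratio(obs, lower, upper)
bw = average_bandwidth(lower, upper)
dev = average_deviation(obs, lower, upper)
print(cr, bw, dev)
```

In the toy case the last observation falls below its lower bound, which lowers CR to 75% and raises D, mirroring the behaviour described above for low GWL values.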
Furthermore,
Table 4 shows the uncertainty metrics of the BMA model for the three selected wells in both the training and testing periods. Recall that the model would have perfect reliability if the
CR equals the confidence level; if the
CR values are similar, then the one with a lower
B has a better reliability. What could be derived from
Table 4 was that the uncertainty analysis results of the 1-, 2-, and 3-month-ahead GWL predictions for the three wells were not always identical according to the
CR and
B values. It seemed very difficult to obtain both a high
CR and a low
B, a trade-off that was also encountered by other researchers [
70,
71]. Regardless of the
CR, the
B and
D values increased with the increase in the lead time in most circumstances, which was consistent with results of the statistical metrics. The results indicated that the prediction uncertainty of the BMA model accumulated with the increase in the lead time.
Even though there was inconsistency in the CR and B values, the results for the CR clearly showed that the 95% confidence interval encompassed the GWL observations very well. Taking the results of Well I as an example, 91.67%, 97.22%, and 88.89% of the observations fell within the 95% confidence interval for the 1-, 2-, and 3-month-ahead GWL predictions, respectively. This implied that the BMA was able to provide GWL predictions within a satisfactory uncertainty domain. As for B and D, the B values were mostly less than 1 and D was smaller than 0.30 in most cases. This further demonstrated the reliability of the BMA model in the multi-time-ahead monthly GWL predictions.
Additionally, it was noteworthy that the uncertainty analysis results obtained by the BMA of the three GWL wells (i.e., Wells I, II, and III) differed greatly. Overall, the reliability predicted by the BMA of the three wells could be ranked as: Well II > Well I > Well III. The ranking was consistent with that of the performance evaluation. This phenomenon directly reflected the inevitable aleatoric uncertainty of the original GWL data. This can be explained by the differences in the hydrogeological conditions of the three GWL observation wells.
4. Discussion
According to the above analysis, it can be said that by using the GRACE satellite data, the GLDAS and GLEAM model data, and the public meteorological data as inputs, all of the models achieved satisfactory results in the 1-, 2-, and 3-month-ahead GWL predictions. Thus, the input combination can be considered as effective in multi-time-ahead GWL prediction. The reason for the efficiency of the inputs may lie in several aspects. Firstly, the GRACE satellite data contained the variations in the terrestrial water storage (including the groundwater storage) by observing the time change of the Earth’s gravitational potential [
72]. Thus, the change in the groundwater resource storage could be reflected. Secondly, the CAN and SW of the GLDAS model can serve as good responders to groundwater storage [
44,
45]. Thus, the time series of the GRACE and GLDAS data may have potential in GWL prediction. Thirdly, evapotranspiration, precipitation, and temperature are the main factors that affect the GWL in arid regions [
2], which reflects their indispensable role in GWL prediction.
In fact, it is commonly believed that the spatial resolution of the satellite data and satellite-based data is too coarse to meet the requirements of the hydrological-related study of a region [
73]. However, the current study confirmed the great potential of these data in local-scale GWL prediction. This finding was consistent with those of similar hydrological studies; for example, Yi et al. [
74] proved the validity of the GRACE data in monitoring water-storage changes for a small reservoir (Longyangxia Reservoir, China). They pointed out that a small signal size (400 km² area) was not a restricting factor when using GRACE data. Liesch et al. [
75] also verified the possibility of using the GRACE data in groundwater depletion estimations for an area ranging from 1500 to 18,000 km² in Jordan. Liu et al. [
3] incorporated GRACE data with P, T, solar energy, and the infrared surface temperature to predict the GWLs for 46 observation wells in the northeast US; the results indicated that the prediction accuracy for most of the stations was significantly improved by incorporating the GRACE data as inputs. Therefore, the satellite data and satellite-based data with a coarse resolution may have great potential in relevant hydrological studies.
To evaluate the importance of the inputs (AET, T, CAN, P, SW, and TWSA), the RF model was further applied to calculate the residual sum of squares (RSS) of the variables. The larger the RSS, the higher the importance of the variable. The average importance of the inputs is illustrated in
Figure 7, and the detailed importance of each case is demonstrated in
Appendix A (
Table A2). In general, the RSS of the selected variables was greater than 0.1, which indicated the efficiency of the variables in the GWL predictions. The importance of the variables could be ranked as: AET > T > CAN > P > SW > TWSA. The input with the highest influence on the GWL was the AET data of the GLEAM. This was reasonable because the strong evapotranspiration could be the main meteorological factor that affects the GWL in the Zhangye Basin [
76]. The temperature, which served as a proxy of evapotranspiration, also presented a very high significance. As for the CAN, it could be treated as an important responder to the GWL because groundwater is the main supply for irrigation in this region [
44]. Precipitation could be regarded as a driver of groundwater recharge. However, the impact of precipitation on the GWL was weak due to its rare occurrence [
77]. The impact of the soil water on the GWL maintained consistency with the precipitation. This was because precipitation happened to be the main source of soil water recharge in such an arid region [
78]. The TWSA data from the GRACE satellite demonstrated the least impact on GWL prediction. This was possibly due to the disturbance in the surface water storage.
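The RSS-based importance can be illustrated at the level of a single split. In a regression tree, each split reduces the residual sum of squares of the node it divides, and a node-purity importance of the kind described above accumulates these reductions for each variable over the trees of the forest. The sketch below is a simplified, single-split illustration under that assumption, not the RF procedure used in the study; the function names and toy data are hypothetical.

```python
def rss(values):
    """Residual sum of squares of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_rss_decrease(x, y):
    """Largest RSS reduction achievable by one threshold split on x."""
    pairs = sorted(zip(x, y))
    parent = rss([p[1] for p in pairs])
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        best = max(best, parent - rss(left) - rss(right))
    return best


# Toy target with one informative and one weakly related predictor
target = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
informative = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weak = [1.0, 6.0, 2.0, 5.0, 3.0, 4.0]
gain_informative = best_split_rss_decrease(informative, target)
gain_weak = best_split_rss_decrease(weak, target)
print(gain_informative, gain_weak)
```

The informative predictor separates the two target levels perfectly and removes the entire RSS, whereas the weak predictor removes only part of it, which is the sense in which a larger RSS reduction indicates a more important variable.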
When emphasizing the validity of the inputs, a comparison with similar studies is essential. For example, in the study by Zhang et al. [
79], only groundwater recharge and actual evapotranspiration data were applied in GWL prediction. Compared with the
R² values in this study (
R² > 0.67), both the back propagation and radial basis function models in their study yielded a much lower
R² (<0.65). This indicated the great potential of the application of the satellite data and the land-surface model data in GWL prediction. At present, the application of satellite data and satellite-based data to predict the GWL in arid or semi-arid areas is limited, but they have been used in GWL prediction in coastal areas. For example, Yin et al. [
65] used the GRACE and the Tropical Rainfall Measuring Mission (TRMM) satellite data in GWL prediction in Victoria, Australia; the average
R values of the RF, ANN, and LSTM were 0.905, 0.973, and 0.788, respectively. Kalu et al. [
80] predicted the GWL over the next 5 months by employing GRACE, TRMM, and ERA interim data combined with ENSO and the North Atlantic Oscillation index in South Africa based on the deep belief network. The four representative wells all obtained a high accuracy (
NS > 0.52) in five timesteps. Therefore, satellite data and satellite-based data can be valid input data in GWL prediction research, and it is worth further exploring which data are more applicable in different regions.
Selecting an appropriate lag time of the input is a key aspect of machine learning modeling because it provides valuable knowledge about the dynamics of aquifers in GWL time series [
17]. However, there are currently no standard guidelines on how to determine the lag time; the commonly used methods include trial and error, statistical methods, and various optimization techniques [
1]. For example, Chang et al. [
81] employed the autocorrelation function (ACF) and partial autocorrelation function (PACF) to determine the lag time of the input data in GWL prediction. Nevertheless, researchers pointed out that because the ACF and PACF are purely linear measures, they failed to capture the nonlinear relationships between the targets and probe variables [
70]. More researchers attempted to explore the relationships between variables to determine the input parameters. For example, Samani et al. [
50] and Vadiati et al. [
82] explored the relationships between variables, applied a cross-correlation method to determine the maximum lag time of the model input to be 3, and predicted the 1-, 2-, and 3-month-ahead GWLs. Yadav [
1] used a correlation analysis to determine the maximum lag time to be 3 for 1- and 2-month-ahead GWL predictions. Therefore, the three time lags (t − 1, t − 2, and t − 3) were applied in this study for the 1-, 2-, and 3-month-ahead GWL predictions.
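How the three lags translate into training samples can be sketched as follows. The exact alignment between inputs and targets is not specified in this section, so the code makes an explicit assumption: for a forecast issued at index t, the inputs are the predictor values at t − 1, t − 2, and t − 3, and the target for an h-month-ahead prediction is the GWL at index t + h − 1. The function name and toy series are illustrative.

```python
def make_lagged_samples(predictor, target, lags=(1, 2, 3), lead=1):
    """Pair lagged predictor values with a target `lead` months ahead.

    Assumed alignment: features are predictor[t - lag] for each lag,
    and the label is target[t + lead - 1].
    """
    max_lag = max(lags)
    features, labels = [], []
    for t in range(max_lag, len(target) - lead + 1):
        features.append([predictor[t - lag] for lag in lags])
        labels.append(target[t + lead - 1])
    return features, labels


# Toy monthly series of equal length (values are placeholders)
predictor = [float(i) for i in range(10)]
gwl = [100.0 + i for i in range(10)]

x1, y1 = make_lagged_samples(predictor, gwl, lead=1)
x3, y3 = make_lagged_samples(predictor, gwl, lead=3)
print(len(x1), x1[0], y1[0])
print(len(x3), x3[0], y3[0])
```

Longer lead times leave fewer usable samples at the end of the record, which is one practical reason why longer horizons carry less information for training.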
As for the BMA model, the robust and reliable GWL prediction results were achieved in view of the performance metrics and the uncertainty criterion. Moreover, the capability of the BMA model was proved for multiple prediction horizons. The plausible reasons for the good performance of the ensemble BMA model were as follows. Firstly, the BMA was able to avoid the uncertainty introduced by the parameters and the structure of the single models by extracting the effective information from an existing set of models [
24]. Secondly, the BMA determined the prior probability according to the performance of the member models [
26]. Consequently, the BMA could improve the prediction accuracy and reliability by taking advantage of the best-performing model. However, the results in this study also showed that the capability of the BMA was not always better than that of the member models. This situation stood out particularly in the lowest GWL predictions, for which the BMA occasionally obtained a similar or inferior accuracy relative to the single models. This was mainly because the performance of the base learners largely affected the degree of improvement achieved by the BMA model [
83].
Although the proposed forcing data show great potential in GWL prediction and the proposed ensemble BMA model achieved an excellent GWL prediction performance, the predictive accuracy in low GWL prediction was relatively poor. There may be two possible reasons: first, the resolution of the satellite data (0.25° × 0.25°) may have caused the loss of some information, thereby rendering it unable to capture the change in the groundwater storage accurately; second, human factors (e.g., groundwater pumping) that may have affected the variations in the GWL were not considered in the model [
84]. Therefore, to further improve the performance of the satellite data in GWL prediction, downscaling the resolution of the satellite data is suggested first [
85]. For example, Chen et al. [
43] downscaled the resolution of GWSA estimates from 1° to 0.25° when forecasting the groundwater storage, which yielded an improvement over the forecasts obtained with the raw GRACE data. Meanwhile, relevant human factors can be appropriately added as additional variables to improve the predicting accuracy. For example, Sharafati et al. [
86] proved that the GWL prediction maintained an excellent consistency with the pumping rate. Additionally, the capability of the ensemble BMA model was merely explored for 1-, 2-, and 3-month-ahead timescales, so the potential of the technique in both short- and long-term GWL prediction requires further exploration.
5. Conclusions
The accurate and reliable prediction of the GWL is extremely crucial for the sustainable management of the groundwater resources in the Zhangye Basin in Northwest China. In this study, the GRACE satellite data, the GLDAS and GLEAM model data, and the publicly accessible meteorological data were used as inputs for the BMA method in 1-, 2-, and 3-month-ahead GWL prediction. The validity of the proposed input combination and the capability of the ensemble BMA model were evaluated for three monitoring wells. According to the results of the performance evaluation and uncertainty analysis, the following conclusions were drawn:
The GRACE satellite data, as well as the GLDAS and the GLEAM model data, could be used as effective inputs for the machine learning models in 1-, 2-, and 3-month-ahead GWL prediction. This highlighted the significance of these satellite data and land-surface model data as effective alternative inputs in GWL prediction, which is particularly valuable in areas with insufficient or missing data because these datasets are easily and conveniently obtained. The BMA had the ability to yield more accurate and reliable GWL predictions than the single machine learning models, and the BMA also provided a means of quantifying uncertainty.
The implementation of the BMA model proved the excellent value of the ensemble learning strategy and indicated an ensemble approach that, when implemented in practice in arid regions, could improve the GWL prediction accuracy. When considering the extensive range of the machine learning models, any other models can be explored as its alternative members if necessary in further studies.
The evapotranspiration and temperature also showed great potential in the multi-time-ahead GWL prediction. Thus, the evapotranspiration data and temperature data from the alternative satellite (e.g., Landsat; and the Ecosystem Spaceborne Thermal Radiometer Experiment on Space Station, ECOSTRESS) and the relevant satellite-based products (e.g., the Moderate-Resolution Imaging Spectroradiometer, MODIS; the Atmospheric Infrared Sounder, AIRS; the North American Land Data Assimilation System, NLDAS; and GLEAM) can be valid forcing data in GWL prediction for arid regions with scarce data.
Although the proposed input variables and BMA model achieved a strong performance in the GWL predictions, improvements are still needed. Firstly, there has been no set of data products so far that exhibits perfect performance in all regions of the world. In this study, the forcing data were applied to a specific arid region of Northwest China, so it is worth exploring in further research whether the same high accuracy can be obtained by using these forcing data for other regions and, if not, whether other alternative data are suitable for GWL prediction in such regions. Secondly, the performance of the proposed models for the three observation wells varied greatly. This was probably caused by certain natural and anthropogenic factors in the specific areas, such as changes in water users, excessive groundwater extraction, and the irrigation area in this agricultural area supported by groundwater. Therefore, there is a high possibility of improving the models’ accuracy if these general disturbances and the controlling factors of different wells are taken into consideration. Thirdly, it was possible to accurately predict future GWLs from previous related data. Therefore, it is necessary to develop a standard and effective selection method to determine the lag time. Finally, this paper mainly explored the capability of the BMA model for 1-, 2-, and 3-month-ahead GWL prediction, so the potential of the technique in both short- and long-term GWL prediction requires further exploration.