Forecasting the Propagation from Meteorological to Hydrological and Agricultural Drought in the Huaihe River Basin with Machine Learning Methods

: Revealing the mechanism of hydrological and agricultural drought has been challenging and vital in the environment under extreme weather and water resource shortages. To explore the evolution process from meteorological to hydrological and agricultural drought further, multi-source remote sensing data, including the Gravity Recovery and Climate Experiment (GRACE) product, were collected in the Huaihe River basin of China during 2002–2020. Three machine learning methods, including long short-term memory neural network (LSTM), convolutional neural network (CNN), and categorical boosting (CatBoost), were constructed for hydrological and agricultural drought forecasting. The propagation time from meteorological drought to surface water storage and terrestrial water storage drought, evaluated by the standardized precipitation evapotranspiration index, was 8 and 11 months with Pearson correlation coefﬁcients (R) of 0.68 and 0.48, respectively. Groundwater storage drought was correlated with evapotranspiration and vegetation growth with a 12-month lag time, respectively. In addition, vegetation growth was affected by the drought of soil moisture at depths ranging from 100 to 200 cm with an 8-month lag time with an R of − 0.39. Although the forecasting performances of terrestrial water storage drought were better than those of groundwater storage drought and agricultural drought, CNN always performed better than LSTM and CatBoost models, with Nash–Sutclife efﬁciency values during testing ranging from 0.28 to 0.70, 0.26 to 0.33, and − 0.10 to − 0.40 for terrestrial water storage drought, groundwater storage drought, and agricultural drought at lead times of 0–3 months, respectively. Furthermore, splitting training and testing data at random signiﬁcantly improved the performances of CNN and CatBoost methods for drought forecasting rather than in chronological order splitting for non-stationary data.


Introduction
With the intensification of climate change and human activities, natural disasters occur more frequently causing a large number of casualties and economic losses.According to the worldwide disaster records from the Emergency Event Database, drought was the number one disaster type based on the total number of humans affected (106.9 million) in 2022, followed by flood (57.1 million) [1].Particularly, economic losses caused by droughts exceeded those caused by floods in China (https://www.cred.be/,accessed on 8 November 2023).
Droughts are periods with a deficit of water, which can last from a few weeks to years, and span from hundreds of km 2 to hundreds of thousands of km 2 .Owing to the slow onset characteristics of drought disasters compared to rapid-onset floods, wildfires, and earthquakes, etc., droughts have always been developed into destructive natural disaster events when they begin to be realized.They can lead to water shortage, crop yields reduction, economic loss, ecological environment deterioration, and social unrest [2][3][4].Generally, there are four types of drought, namely meteorological, hydrological, agricultural, and socio-economic drought [5][6][7].For more detailed and accurate drought research, the classifications of drought have been extended to groundwater drought [7], urban drought [8], ecological drought [9], and environmental drought [10].
Different types of drought have different causes, and thus have different drought indices which can accurately describe the intensity, extent, and start and end times of droughts.In terms of meteorological drought, it has always been assessed by standardized precipitation index (SPI) [11], and standardized precipitation evapotranspiration index (SPEI) [12].Standardized streamflow index (SSI) [13] is commonly used for evaluating hydrological drought.Soil wetness index (SWI) [14] and normalized difference vegetation index (NDVI) [15] can be used for evaluating agricultural drought.In addition, the drought severity index (DSI) is commonly used based on different related variables [16,17].In addition, drought events with onset, duration, extent, severity, and intensity can be identified according to different time scales for calculation, different thresholds of those indexes, and the run theory [7].The severity, duration, and frequency vary significantly in different time scales.Longer time scales always indicate less drought frequency, longer duration, and greater magnitude [18,19].
Although drought characteristics can be demonstrated with various methods, understanding the drought process and its propagation has been a great challenge.Meteorological drought occurs owing to a persistent lack of precipitation.As meteorological droughts intensify, other types of droughts may follow.When precipitation becomes less and temperature is higher, soil moisture begins to decrease below normal conditions with less recharge and more evaporation leading to soil drought.Similarly, water resources including river runoff, lake, and groundwater will decrease further, which can result in hydrological drought.Then, reduced soil moisture may take 10 to 20 days to stunt the growth of plants, triggering an agricultural drought [20].In addition, human activities, such as dam constructions [21,22] and urbanization [23], profoundly alter hydrologic processes and the ecological environment.Understanding the human-water relationship is a key in drought study for the coordinated development between the social economy and the ecological environment [24].
The propagation processes from meteorological drought to other types of drought has been analyzed in various methods [25,26].For example, methods of correlation analysis [27], temporal shift [28], convergent cross mapping [29], complex network [30], etc., have been performed for exploring spatiotemporal characteristics of drought propagation.Factors influencing drought propagation include conditions of meteorological drought [27,31], climate zones [25], watershed area [32], altitude [33], aquifer thickness [34], and human activities [35,36].However, influencing factors of drought propagation and their mechanisms have not been fully explored yet, and further study of drought propagation and its prediction is still needed [35].
Recently, the application of machine learning (ML) methods in drought forecasting has increased rapidly owing to their superior learning and computing ability when dealing with multimodal datasets [37].ML algorithms can be classified into several categories, including decision trees and ensemble methods, artificial neural networks, support vectors machines, etc. [38].The long short-term memory (LSTM) model has shown satisfactory forecasting performances in drought forecasting owing to its the long-term dependency problem [39,40].With the development of deep learning, many state-of-the-art algorithms have been designed, such as convolutional neural network (CNN), which has a great adaptability in learning local patterns [41].However, the applications of different types of ML methods in drought forecasting are limited.The accuracy of drought prediction, especially for long lead times, and the integration technique of models with big data, needs to be further improved [37,39].
The importance of using remote sensing data, such as precipitation from GPM [42], NDVI derived from MODIS [43], and the terrestrial water storage anomaly (TWS) estimated from the Gravity Recovery and Climate Experiment (GRACE) mission [44], in drought research has been emphasized.Particularly, TWS data inferred from the changes in gravity field not only provide useful information for water storage variations at large spatial scales, they are sensitive to water stresses in different seasons [45].Thus, drought events can be directly identified with TWS data [46].In addition, it can be integrated in land surface models or observations for the study of other components of the hydrological cycle.This is especially true for groundwater, although the lack of observations hinders the ability to effectively measure groundwater resources, it can be monitored by GRACE data assimilation [47].However, the integration of multiple remote sensing data for various drought monitoring and forecasting is limited [37].
Consequently, the aim of this study is to facilitate the understanding of drought propagation from meteorological drought to hydrological drought and agricultural drought and their forecasting with multiple remote sensing data.To achieve this objective, meteorological, hydrological, and agricultural droughts were assessed by various drought indexes with different variables.Meteorological drought was quantified by precipitation and evapotranspiration; hydrological drought was calculated with runoff, terrestrial water storage anomaly, and groundwater storage anomaly; and agricultural drought was presented with soil moisture and NDVI.nN addition, three machine learning algorithms, including two artificial neural networks, LSTM and CNN, and one decision trees and ensemble method, namely categorical boosting (CatBoost), were selected to explore their forecasting skills in hydrological drought and agricultural drought.

Study Area
The Huaihe River basin (HRB, Figure 1), located in the north-south climate transition zone, has an area of 270,000 km 2 .The north and the south parts of HRB belong to the warm temperate semi-humid monsoon and the subtropical humid monsoon climate zones, respectively.The elevation ranges from −9 m to 2122 m according to the SRTM 90 m digital elevation model (DEM) database (https://www.gscloud.cn/,accessed on 1 January 2020).The western, southwestern, and northeastern parts are mountainous and hilly areas, and the rest are vast plains.The Huai River originates from Tongbai Mountain in Henan province.It flows from west to east through Henan, Hubei, Anhui, and Jiangsu provinces.The total length and total drop of the river are approximately 1000 km and 200 m, respectively.The strata of HRB span two stratigraphic regions of North China and South China, and they are divided into three first-order tectonic units, namely the Sino-Korean quasi-platform, the Yangtze quasi-platform, and the Qinling fold area.The groundwater in HRB mainly consists of loose rock pore water, carbonate rock fissure karst water, and bedrock fissure water.In addition, loose rock pore water is the most widely distributed in the plain area.The lithology of water-bearing rock formations is mainly clayey soil, sand, and gravel, with a multi-layer structure.
The average annual rainfall is 878 mm, with large spatiotemporal differences.The average annual temperature ranges from 11 to 16 • C, increasing from the north to the south, from the coastal to the inland.The HRB is dry and rainless in winter and spring, and hot and rainy in autumn and summer.Thus, there is a sharp change between cold and warm, drought and flood.The average annual runoff depth is 230 mm.The discharge of upper reaches of Huaihe River is fast with high terrain and large river drop.However, the discharge is slow in the middle reach with low terrain and small river drop.In addition, the average annual evaporation of water surface is 900-1500 mm.The HRB has a large population, and it is an important grain production base in China.As a result, the water demand in the basin is large.However, drought disasters occur frequently in this basin, leading to an average of 2,698,000 hm 2 of crops affected each year.

Data
For analyzing meteorological drought, hydrological drought, and agricultural drought, monthly precipitation (P), evapotranspiration (ET), potential evapotranspiration (PET), runoff (Q), soil moisture (SM), terrestrial water storage anomaly (TWS), and NDVI, were collected and averaged in HRB from 2002 to 2020.Particularly, monthly streamflow data at Bengbu hydrological station (Figure 1), a representative station of the main stream, was provided by the Huaihe River Commission of the Ministry of Water Resources.Other data were obtained from the following remote sensing data.

MOD16A2
The ET and PET data were obtained from the Moderate-resolution Imaging Spectroradiometer (MODIS) Global Evapotranspiration Project MOD16A2 product.Their spatial and temporal resolutions were 500 m and 8 days, respectively.They were preprocessed by HDF-EOS to GeoTIFF (HEG) tool with extraction, projection, and spatial subset.

Data
For analyzing meteorological drought, hydrological drought, and agricultural drought, monthly precipitation (P), evapotranspiration (ET), potential evapotranspiration (PET), runoff (Q), soil moisture (SM), terrestrial water storage anomaly (TWS), and NDVI, were collected and averaged in HRB from 2002 to 2020.Particularly, monthly streamflow data at Bengbu hydrological station (Figure 1), a representative station of the main stream, was provided by the Huaihe River Commission of the Ministry of Water Resources.Other data were obtained from the following remote sensing data.

MOD16A2
The ET and PET data were obtained from the Moderate-resolution Imaging Spectroradiometer (MODIS) Global Evapotranspiration Project MOD16A2 product.Their spatial and temporal resolutions were 500 m and 8 days, respectively.They were preprocessed by HDF-EOS to GeoTIFF (HEG) tool with extraction, projection, and spatial subset.

GRACE/GRACE-FO Mascon
The monthly terrestrial water storage anomaly (TWS) used in this study was estimated by Center for Space Research (CSR) GRACE/GRACE-FO RL06 mass concentration (mascon) solutions (version 02).The Mascon solutions are the post-processing products based on spherical harmonic solutions, which are suitable for basin-scale (≥200,000 km 2 ) analysis [48].There was no need for any post-processing to obtain TWS information, which is convenient for non-geodesy and non-geophysics professionals [49].The spatial resolution was 0.25 • × 0.25 • .Data beginning from April 2002 are available.The gap between data of GRACE and GRACE-FO mission, from July 2017 to May 2018, were reconstructed by linear regression using a dataset of reconstructed terrestrial water storage in China [50].Other missing data were linearly interpolated [51][52][53].
The sources of TWS were assumed to be groundwater storage anomaly (GWS), surface water storage anomaly (SWS) including canopy water storage anomaly (CWS), soil moisture storage anomaly (SMS), and snow water equivalent anomaly.Thus, SWS and GWS can be determined as follows [54][55][56]: where CMS, SWS, and SWE can be acquired from GLDAS Noah, and they were derived by subtraction of average values from 2004 to 2009 as TWS.

Trend Analysis
The Mann-Kendall (M-K) test was used as a non-parametric test for assessing monotonous trends in hydrometeorology [58,59].For time series x i (i = 1, 2,. .., n), the statistic S can be calculated using: where sgn() is the sign function of (x i − x j ).It equals 1, 0, −1, when (x i − x j ) is greater than, equal, and less than 0, respectively.The test statistic Z can be calculated as follows: where VAR(S) is the variance of S. The increasing or decreasing trend can be detected if Z is greater or smaller than 0, respectively.In addition, trends can be identified at the significance level of 0.10, 0.05, and 0.01, when the absolute value of Z is greater than 1.65, 1.96, and 2.58, respectively.
To estimate the trend magnitude of time series X further, the Kendall slope is defined [60]: where n is the length of X, and a positive/negative β value represents a rising/decreasing trend.

Drought Indicators
Various drought indicators were calculated, including standardized precipitation index (SPI), standardized evapotranspiration index (SEI), standardized precipitation evapotranspiration index (SPEI) for meteorological drought, standardized streamflow index (SSI) for hydrological drought, standardized soil moisture (SSMI), and standardized NDVI index (SNI) for agricultural drought.These indicators were used for analyzing the drought propagation further because they can be calculated for different accumulation times.In addition, the drought severity index (DSI) was used for assessing hydrological and agricultural drought.
SPI is a commonly used meteorological drought indicator [11].The precipitation dataset was fitted by a theoretical probability distribution, for example, a Gamma distribution was used in this study.Then, the cumulative probability was converted to a standard normal distribution function.The level of drought was classified by the standardized cumulative frequency.SPI_n was calculated by the same procedure for different accumulation times n (ranging from 1 to 15 months in this study).

Correlation Analysis
The influencing factors of hydrological and agricultural drought were by crosscorrelation coefficients among different variables at different times.The maximum crosscorrelation coefficient between time series X(t) and Y(t + n) represents that Y is correlated most with X at lag time n, namely the response time can be assessed [68].
The drought propagation relationship was determined by Pearson correlation coefficients R between different drought indicators [25,27].The correlation analysis between different types of drought was implemented as follows: where T indicates the propagation time i when the correlation coefficient R i is the maximum value.In addition, the correlation coefficients between DSI-SM and SSMI, and between DSI-NDVI and SNI were not included here.The long short-term memory neural network (LSTM) is an improvement of traditional recurrent neural networks (RNN) introduced by Hochreiter and Schmidhuber [69].The LSTM model consists of the recurrent LSTM memory cells, which are designed with input, forget, and output gates.The design can address the vanishing or exploding gradient problems during calculations of error back-propagation in traditional RNN [70].Thus, it makes long-term temporal relationships between inputs and outputs easier to learn, and has been widely used in forecasting sequential data [71][72][73].

Convolutional Neural Network (CNN)
The convolutional neural network (CNN) is a modified version of feed-forward neural networks, which was developed by LeCun et al. [74].Generally, the structure of CNN is composed of the convolution, pooling, and fully connected layers.The inputs of CNN are typically a two-dimensional matrix, such as images and sequence data with previous time steps.In the convolution layers, convolutional kernels with small size are used to encode information of inputs by tiled convolution.Pooling layers are down-sampled for extracting salient features from convolution layers by pooling filters.Finally, features obtained from the last pooling layer are flattened to feed to the fully connected layers.The processes of convolution and pooling improve the robustness of feature extraction and learning efficiency.Consequently, CNN has been successfully used for complex tasks in computer vision, natural language processing, and time series domains [75][76][77].

Categorical Boosting (CatBoost)
Categorical boosting (CatBoost), a machine learning library, was created by Russian search giant Yandex in 2017.CatBoost is an improved gradient-boosting decision tree algorithm (GBDT).In addition, it can handle classification and regression issues efficiently because of the following characteristics [78].(1) Categorical features and numerical features can be trained together in the model by target statistics; (2) ordered boost is applied to reducing overfitting and prediction shift resulted from gradient bias; and (3) base predictors of CatBoost are oblivious trees with fewer parameters, which are balanced and less prone to overfitting.The superior performance, robustness, and fast prediction of CatBoost make it promising in hydrology [79,80].

Evaluation Criteria
The performances of drought forecasting models were evaluated by Nash-Sutclife efficiency (NSE) [81], root-mean-square error (RMSE), Pearson correlation coefficient (C), and threat score (TS) [82]: where y i and ŷi are the observed and forecasting values at time i, respectively; n is the length of observed or forecasting time series; and y and ŷ are the mean of the observations and forecasting values.Although it is sensitive to peak values, the values of NSE can effectively evaluate the degree of fit of the model [83].The C value presents the linear correlation between predictions and observations.The value becomes 1, if they are a perfectly positive correlation.The RMSE values measure the deviations of the observed values from the predicted values, and its unit is consistent with the observations.The TS values range from 0 to 1, which can evaluate whether the model can correctly predict the occurrence of drought.Figure 3 presents the maximum cross-correlation coefficients (R) among all variables at different lag time ranging from −15 to 15 months.Horizontal and vertical axes stand for the variable at time t and the variable at time t + n corresponding to the maximum coefficients, respectively.The response time can be indicated by the lag time of the maximum correlation coefficient.Generally, precipitation is a key factor affecting runoff and soil moisture.However, the relationships between precipitation and terrestrial water storage and groundwater storage were not obvious.Figure 3 presents the maximum cross-correlation coefficients (R) among all variables at different lag time ranging from −15 to 15 months.Horizontal and vertical axes stand for the variable at time t and the variable at time t + n corresponding to the maximum coefficients, respectively.The response time can be indicated by the lag time of the maximum correlation coefficient.Generally, precipitation is a key factor affecting runoff and soil moisture.However, the relationships between precipitation and terrestrial water storage and groundwater storage were not obvious.For runoff, the lag time between precipitation and runoff was 1 month with an R of 0.69.The runoff can affect SM4 with 1 month lag time with an R of 0.71.The changes of Q and SM1-2, ET, and TWS, were synchronized.For SM, the response time of both of SM1- For runoff, the lag time between precipitation and runoff was 1 month with an R of 0.69.The runoff can affect SM4 with 1 month lag time with an R of 0.71.The changes of Q and SM1-2, ET, and TWS, were synchronized.For SM, the response time of both of SM1-2 to precipitation was 1 month.Synchronous changes were present among SM1-3.Compared with SM1-3, the changes of SM4 always lagged by 1 month.

Trend and Correlation Analysis of Different Variables
In addition, water storage interacts with vegetation growth.All of the response times of NDVI to SM2-4 and SWS were 3 months, with negative correlation coefficients of −0.52, −0.52, −0.48, and −0.52, respectively.It shows the impact of plant uptake on water resources.The change of ET and NDVI were synchronous, and they were highly correlated with an R of 0.93.

Correlation and Propagation time of Meteorological, Hydrological, and Agricultural Drought
Figure 4a,b represent the correlation and propagation times of meteorological, hydrological, and agricultural drought, respectively.The propagation time can be inferred from the maximum correlation coefficients between the DSI of different variables (described in the Y-axis), and SPI_n, SEI_n, SPEI_n, SSMI1/2/3/4_n, SSI_n, and SNI_n (described in the X-axis).For runoff drought, the relationship between DSI-Q and SPEI had the largest correlation coefficient with an R of 0.21.The propagation time from meteorological drought (SPEI) to hydrological drought (DSI-Q) may be 7 months.The soil drought was related to SPEI, SSI, and SPI.As the soil depth increases, the propagation time from meteorological and hydrological drought indicated by SPI, SEI, and SSI, to soil drought increases.SPEI had the greatest impact on soil drought with R of 0.60, 0.64, 0.67, and 0.67 for SM1-4, respectively.The propagation time from meteorological drought (SPEI) to soil drought at the depths of 0-10 cm, 10-40 cm, 40-100 cm, and 100-200 cm were about 3, 6, 8, and 11 months, respectively.logical drought (SPEI) to hydrological drought (DSI-Q) may be 7 months.The soil drought was related to SPEI, SSI, and SPI.As the soil depth increases, the propagation time from meteorological and hydrological drought indicated by SPI, SEI, and SSI, to soil drought increases.SPEI had the greatest impact on soil drought with R of 0.60, 0.64, 0.67, and 0.67 for SM1-4, respectively.The propagation time from meteorological drought (SPEI) to soil drought at the depths of 0-10 cm, 10-40 cm, 40-100 cm, and 100-200 cm were about 3, 6, 8, and 11 months, respectively.The correlation analysis of surface water storage drought was similar to soil drought, which was also affected by SPEI, SPI, and SSI.Compared with the correlation coefficients (greater than 0.60) between droughts of surface water storage and other types of drought (SPI, SPEI, SSMI, and SSI), the drought of groundwater storage had relatively low correlation coefficients with them (less than 0.44), indicating its complex influencing mechanism or the impacts of human activities [56,84].Except for the soil drought indicators, which were directly related to the calculation of groundwater, only SEI_12 (R = 0.39) and SNI_12 (R = −0.31)were most relevant to the drought of groundwater storage.The correlation analysis of surface water storage drought was similar to soil drought, which was also affected by SPEI, SPI, and SSI.Compared with the correlation coefficients (greater than 0.60) between droughts of surface water storage and other types of drought (SPI, SPEI, SSMI, and SSI), the drought of groundwater storage had relatively low correlation coefficients with them (less than 0.44), indicating its complex influencing mechanism or the impacts of human activities [56,84].Except for the soil drought indicators, which were directly related to the calculation of groundwater, only SEI_12 (R = 0.39) and SNI_12 (R = −0.31)were most relevant to the drought of groundwater storage.
In addition to SPEI and SPI, SNI was also related to the drought of terrestrial water storage.The impact of SNI on drought of terrestrial water storage was higher than the drought of soil, runoff, surface water storage, and groundwater storage.The propagation time from vegetation growth to the drought of terrestrial water storage may be 12 months.On the other hand, the soil moisture at depths of 100-200 cm affect the vegetation growth with an R of −0.36.

Performances of Drought Forecasting by LSTM, CNN, and CatBoost
According to the maximum Pearson correlation coefficients, predictors for hydrological drought and agricultural drought evaluated by DSI-TWS, DSI-GWS, and DSI-NDVI, were selected, which are shown in Table 1.These variables were normalized for constructing different machine learning methods at lead times n ranging from 0 to 3 months.The training and testing periods were 2002-2014, and 2015-2020, respectively.The performances of terrestrial water storage drought forecasting by LSTM, CNN, and CatBoost models were assessed by NSE, C, RMSE, and TS (Table 2).With the extension of lead times, all of the models deteriorated for terrestrial water storage drought forecasting with NSE values ranging from 0.30 to 0.52, from 0.28 to 0.70, and from −0.66 to −0.13 for LSTM, CNN, and CatBoost during testing periods, respectively.The CNN model performed significantly better than LSTM and CatBoost in terms of NSE, C, and RMSE for lead times up to 2 months.For TS, the performance of LSTM was slightly better than that of CNN for lead times from 1 to 2 months.Despite the poor performance of LSTM and CNN for a three-month lead time, LSTM produced slightly higher NSE and C values, and lower RMSE value than CNN.That may have resulted from long-term dependencies of LSTM architecture for longer lead time forecasting.Regarding the CatBoost model, it was overfitting, which obtained excellent forecasting results during the training period and extremely poor performance with NSE less than 0 during the testing period.The performances of models for groundwater and NDVI drought forecasting were worse than those for terrestrial water storage forecasting.That indicates more complicated influencing mechanisms for agricultural drought assessed by DSI-NDVI and groundwater drought than terrestrial water storage drought.The performances of LSTM and CNN for groundwater drought forecasting were similar, with NSE values at different lead times during testing periods ranging from 0.24 to 0.38, and from 0.25 to 0.33, respectively (Tables 3 and 4).In addition, neither of them had the predictive ability for NDVI drought forecasting with NSE values less than 0 during testing periods.In addition, the CatBoost model was not shown because of its overfitting problem. Figure 5 presents the NSE, C, RMSE, and TS values of CNN and CatBoost models trained with random splitting datasets for terrestrial water storage drought forecasting at different lead times.The LSTM model was not shown here because only the sequential inputs were used for its recurrent architecture.To avoid the impact of data amount on models, the ratio of training data to testing data for random splitting was same as those divided in chronological order.With the random splitting for training and testing datasets, both of CNN and CatBoost models produced higher accuracy than those with training data divided in chronological order, respectively.Taking the performances at a 2-month lead time as an example, the improvements of NSE, C, RMSE, and TS values for CNN during testing were 103.45%, 27.09%, 35.32%, and 34.53%, respectively, and the improvements of NSE, C, RMSE, and TS values for CatBoost during testing were 230.03%, 24.93%, 57.31%, and 131.63%, respectively.This indicates the tendency may deteriorate the forecasting performances of CNN and CatBoost models.In other words, some of the trends' influencing factors may be not selected as the model inputs; however, the random splitting for training and testing datasets eliminated the tendency as much as possible.Generally, the CNN model obtained better forecasting performances than the CatBoost model for terrestrial water storage drought forecasting.The CatBoost produced satisfactory results for drought forecasting during testing periods compared with the trained with data divided in chronological order.This suggests the promise of the CatBoost model if the input-output relationship is stationary.Figure 6 illustrates the comparison of the CatBoost model trained with different datasets for terrestrial water storage drought forecasting at 1-month lead time as an example.The forecasting results for the CatBoost trained with data divided in chronological order matched with the observations perfectly during training periods.However, the model always overestimated during testing periods.Figures 7 and 8 illustrate the performances of CNN and CatBoost models trained with random data splitting for groundwater and agricultural drought forecasting, respectively.Similarly, both the CNN and CatBoost models with random data splitting for training and testing performed better.In addition, the CNN model always obtained higher accuracy than the CatBoost model.Taking the forecasting results at 1-month lead time as an example, Figure 9 presents the comparison of CNN models trained with different datasets.Although the forecasting results were unsatisfactory for extreme values, the trends of forecasting results were basically consistent with the observations.For the CatBoost models, the random splitting method also improved the model significantly.However, the obvious differences between performances during training and testing periods still existed.Figures 7 and 8 illustrate the performances of CNN and CatBoost models trained with random data splitting for groundwater and agricultural drought forecasting, respectively.Similarly, both the CNN and CatBoost models with random data splitting for training and testing performed better.In addition, the CNN model always obtained higher accuracy than the CatBoost model.Taking the forecasting results at 1-month lead time as an example, Figure 9 presents the comparison of CNN models trained with different datasets.Although the forecasting results were unsatisfactory for extreme values, the trends of forecasting results were basically consistent with the observations.For the CatBoost models, the random splitting method also improved the model significantly.However, the obvious differences between performances during training and testing periods still existed.

Discussion
From the correlation analysis of raw variables and drought indicators, soil moisture (drought) always had a higher correlation than TWS (drought) with precipitation (SPI) and potential evapotranspiration (SPEI).This agrees with Forootan et al. [85] where the variations of soil moisture are dominated by precipitation, while terrestrial water storage involves complicated surface and subsurface processes.The significantly negative correlation coefficient between NDVI and deep soil moisture indicates the consumption of soil moisture for vegetation growth [86].However, the controlling factors affecting terrestrial water storage and groundwater storage were unclear from the correlation analysis, which makes it difficult to reveal their mechanism [87,88].
The correlation relationships between various factors indicate droughts can be different with various drought indicators and calculation methods.Similarly, the propagation time from meteorological drought to hydrological and agricultural drought can vary with different approaches, such as correlation analysis by different correlation calculation methods.The uncertainties should be further assessed when they are used to unravel drought mechanisms.
In general, the drought of terrestrial water storage has higher correlations with other impact factors considered in this study than groundwater drought, followed by NDVI.Similarly, the forecasting performances of different machine learning methods were ranked by the drought of total terrestrial water storage, groundwater drought, and agricultural drought indicated by NDVI.It demonstrated better prediction of machine learning methods for stronger correlation between input and output.Kim et al. [89] discussed the better forecasting performances for streamflow forecasting in the high flow (higher autocorrelation coefficients) regime than those in the low flow regime.The data with no autocorrelation, deemed as white noise, is difficult to forecast [90].
The gridded remote sensing data used in this study were processed as an average for analyzing drought propagation and prediction in the HRB.Although temporal propagation from meteorological drought to hydrological and agricultural was analyzed by correlation analysis, the spatial characteristics of meteorological, hydrological, and agricultural droughts were not shown in this study.However, spatial-based assessment, including characteristics, propagation, and forecasting, is of great importance for the interpretation of drought mechanisms [91][92][93].Thus, the integration of temporal and spatial characteristics and propagation will be the focus of future study.In addition, the drought evolution of onset, duration, extent, severity, and intensity should be analyzed further in different time scales and seasons.
According to the relatively low forecasting performances for hydrological and agricultural drought shown in Tables 2-4 some improvements need to be made, especially for the groundwater and NDVI drought forecasting.In addition, the change of groundwater storage derived from GRACE and GLDAS products was not validated.More data, such as the groundwater level, and hydrologeologic properties [47], the atmospheric circulation indices [94], spatiotemporal related variables, and human activities, etc., should be collected for comprehensive analysis of drought propagation and forecasting.
Although the impacts of other factors, such as the amount of training samples, on the prediction performances were controlled as much as possible, the uncertainty due to random splitting of training and testing data can still remain.Thus, the random splitting methods were utilized for the drought forecasting of terrestrial water storage, groundwater, and NDVI drought at lead times of 0-3 months to obtain more convincing results.For the comparison of different machine learning methods, the same training and testing data were used for drought forecasting after random splitting.However, all of the comparisons between machine learning methods trained with random splitting methods and those trained with data divided in chronological order indicated that the inconsistency between training and testing datasets resulted from the tendency can damage the generalization performances of machine learning models.Ovadia et al. [95] also discussed the predictive uncertainty under dataset shift between training data and testing data.In addition to the random splitting of training and testing data, more methods should be designed to deal with the nonstationary characteristics of the data [96,97].

Conclusions
The propagation from meteorological drought to hydrological drought and agricultural drought was analyzed based on correlation analysis in this study.However, the correlation analysis of original data and drought indicators are obviously different.For the correlation analysis of original data, precipitation is the key factor affecting soil water, runoff, and surfacewater storage with R values of 0.65, 0.69, and 0.53, respectively.Except for SM3, the changes of all of them responded to precipitation with a 1-month lag time.Evapotranspiration is also critical for the changes of soil water, runoff, and surfacewater storage.Except for shallow soil moisture at the depth of 0-10 cm, deep soil moisture was negatively correlated with evapotranspiration.It is worth noting the impacts of vegetation growth on consuming surface water storage and terrestrial water storage significantly with R values of −0.52 and −0.33, respectively.Thus, the interaction between vegetation growth and water resources should be explored further.
According to the correlation analysis of drought indicators, the propagation time from meteorological drought (SPEI) to runoff drought, surface water drought, and terrestrial water storage were around 7, 8, and 11 months with R values of 0.21, 0.68, and 0.48, respectively.In addition to the effects of meteorological drought on hydrological drought, the vegetation growth can lead to hydrological drought with 12-month a lag time.Furthermore, the agricultural drought indicated by vegetation growth can be also affected by the drought of soil moisture at depths ranging from 100 cm to 200 cm with an 8-month lag time, and an R value of −0.36.
For hydrological and agricultural drought forecasting, CNN always performed better than LSTM and CatBoost models, and different splitting methods of training and testing data showed a significant influence on the prediction of non-stationary data.The random data-splitting method improved the CNN and CatBoost models trained with data divided in chronological order obviously.For example, the maximum improvements of NSE, C, RMSE, and T values for terrestrial water storage drought forecasting based on CNN models were 103.45%, 27.09%, 35.32%, and 34.53%, respectively.While randomizing the training and test sets is a parsimonious approach for the improvement of non-stationary data forecasting, it is not suitable for recurrent models such as LSTM.New approaches should be introduced to explore the drought transmission mechanism under a changing environment.
The groundwater drought forecasting and agricultural drought forecasting indicated by NDVI were worse than the drought forecasting of terrestrial water storage even after the random splitting methods, suggesting the complicated influencing mechanisms of groundwater drought and vegetation growth.The data used were limited to the averaging time series in HRB.Data concerning variant spatiotemporal factors and invariant spatial properties should be integrated for a more comprehensive study of hydrological and agricultural drought mechanisms.nasa.gov/missions-and-measurements/products/MOD16A2,accessed on 15 August 2021.Monthly streamflow can be accessed at http://www.hrc.gov.cn,accessed on 10 April 2023.GLDAS and GRACE data can be downloaded from https://disc.gsfc.nasa.gov/,accessed on 16 August 2021 and http: //www2.csr.utexas.edu/grace/RL06_mascons.html, respectively.The NDVI dataset can be obtained from https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/MOD13C2,accessed on 18 March 2023.

3. 4 .
Machine Learning Methods for Drought Forecasting 3.4.1.Long Short-Term Memory Neural Network (LSTM) N h , N f , and N m are numbers of hits (both observations and forecasting values indicate the occurrence of drought), false alarms (forecasting values indicate drought but observations record no drought), and misses (observations record drought but forecasting values indicate no drought), respectively.The ranges of NSE, C, and RMSE are (−∞, 1], [−1, +1], [0, +∞), respectively.The NSE value becomes 1, when the forecasting values are exactly the same as the observed values.When predictions are worse than the simple average value, it becomes less than 0.

Figure 2 25 Figure 2 .
Figure 2 displays the trend analysis of P, ET, PET, SM1-4, Q, water storage anomaly, and NDVI values by the M-K test and Kendall slope methods.Increasing trends were found for the precipitation, evapotranspiration, potential evapotranspiration, and NDVI.In particular, the increasing trend of NDVI was significant at a confidence level of 95%.Although the precipitation increased, the water resource represented by soil moisture, runoff, and water storage in the Huaihe River basin decreased obviously.This may result from the evaporation loss, vegetation growth, and other human activities.Furthermore, the surface water storage, groundwater storage, and terrestrial water storage decreased significantly between June 2012, May 2015, and May 2013, respectively.Remote Sens. 2023, 15, x FOR PEER REVIEW 9 of 25

Figure 2 .
Figure 2. M-K test and Kendall slope of monthly time series.

Figure 3
Figure3presents the maximum cross-correlation coefficients (R) among all variables at different lag time ranging from −15 to 15 months.Horizontal and vertical axes stand for the variable at time t and the variable at time t + n corresponding to the maximum coefficients, respectively.The response time can be indicated by the lag time of the maxi-

Figure 2 .
Figure 2. M-K test and Kendall slope of monthly time series.

Figure 4 .
Figure 4. Pearson correlation analysis among meteorological drought, hydrological drought, and agricultural drought.(a) Maximum correlation coefficients; (b) propagation time.No significant correlations were shown in dotted line.

Figure 4 .
Figure 4. Pearson correlation analysis among meteorological drought, hydrological drought, and agricultural drought.(a) Maximum correlation coefficients; (b) propagation time.No significant correlations were shown in dotted line.

25 Figure 6 .
Figure 6.Comparison of CatBoost models trained with different datasets for terrestrial water storage drought forecasting.

Figure 6 .
Figure 6.Comparison of CatBoost models trained with different datasets for terrestrial water storage drought forecasting.

Figure 7 .
Figure 7. Performances of CNN and CatBoost for groundwater drought forecasting during training (left column) and testing (right column) periods by random splitting at different lead times.(a,b) NSE; (c,d) C; (e,f) RMSE; (g,h) TS.

Figure 7 .
Figure 7. Performances of CNN and CatBoost for groundwater drought forecasting during training (left column) and testing (right column) periods by random splitting at different lead times.(a,b) NSE; (c,d) C; (e,f) RMSE; (g,h) TS.

Figure 8 .
Figure 8. Performances of CNN and CatBoost for agricultural drought forecasting during training (left column) and testing (right column) periods by random splitting at different lead times.(a,b NSE; (c,d) C; (e,f) RMSE; (g,h) TS.

Figure 8 .Figure 9 .
Figure 8. Performances of CNN and CatBoost for agricultural drought forecasting during training (left column) and testing (right column) periods by random splitting at different lead times.(a,b) NSE; (c,d) C; (e,f) RMSE; (g,h) TS.

Figure 9 .
Figure 9.Comparison of CNN models trained with different datasets for drought forecasting: (a) terrestrial water storage; (b) groundwater; (c) NDVI.

Table 1 .
Inputs for drought forecasting models.

Table 2 .
Comparison of terrestrial water storage drought forecasting with different models for lead times up to 3 months.

Table 3 .
Comparison of LSTM and CNN for groundwater drought forecasting.

Table 4 .
Comparison of LSTM and CNN for agricultural drought forecasting.