1. Introduction
Global warming makes extreme events more likely to occur [1,2,3], especially extreme precipitation events [4,5,6]. The increasing frequency of precipitation extremes has been linked to more frequent drought events [7,8]. Among all natural disasters worldwide, drought is one of the most destructive, and its main manifestation is water shortage [9]. The impacts of drought are both cumulative and persistent: their effects remain in the ecosystem long after the drought ends, and this effect is more pronounced in arid and semi-arid regions [10,11]. Therefore, a systematic study of long-term precipitation changes and their associations with droughts and extreme precipitation events helps reveal the characteristics and evolution mechanisms of abrupt climate changes and provides a scientific basis for disaster risk assessment and for enhancing disaster resilience.
At global and regional scales, the coupled ocean–atmosphere system creates a complex and variable climate background. This system significantly alters water vapor transport and water availability and is therefore a key factor controlling the magnitude, frequency, and duration of extreme precipitation events [12,13,14]. Large-scale climate modes such as the Atlantic Multidecadal Oscillation and the Pacific Decadal Oscillation have been identified as dominant factors influencing the spatiotemporal distribution of extreme precipitation and drought in China’s monsoon region and the Yangtze River Basin [15,16,17]. These climate modes produce typical regional spatial patterns of droughts and floods by modulating atmospheric circulation and air–sea interactions [18], and they can trigger extensive precipitation anomalies and severe drought during El Niño events [19,20]. Research on the variation of such hydrological extremes in non-stationary environments has focused mainly on three aspects: the treatment of non-stationarity in hydrological series [21], the identification of abrupt change points [22], and the analysis of periodic characteristics [23]. For the spatiotemporal variation of extreme precipitation events, the Mann–Kendall test, wavelet analysis, and Copula functions are commonly used techniques [24,25]. In non-stationary frequency analysis, Generalized Additive Models for Location, Scale and Shape (GAMLSS) have become a core tool because they can flexibly represent the relationships between distribution parameters and explanatory covariates such as climate indices and reservoir regulation indices [26,27]. This methodology has been widely adopted for investigating non-stationary drought patterns by incorporating climate indices as covariates [28].
The Yellow River Basin (YRB), a critical agricultural region in China, has long faced severe drought, which is closely associated with its variable topography, complex climatic conditions, and uneven distribution of water resources [29,30]. Wang et al. [31] applied Copula functions to analyze drought duration and severity in the YRB using the Standardized Precipitation Evapotranspiration Index (SPEI), while Guo et al. [32] assessed hydrological drought risk and its spatial transmission within a three-dimensional Copula framework. Zhang et al. [33] used Copula functions to analyze the evolution of meteorological drought under future climate change in the middle reaches of the YRB. Yang et al. [34] used the SPEI as the drought indicator to analyze trends and abrupt change points in temperature and precipitation, and explored the characteristics of meteorological drought in the YRB and changes in the drought recovery process at the annual scale. Facing extreme drought and a growing imbalance between water supply and demand, Feng et al. [35] adopted a dual-index framework of SPI and SPEI to quantify historical drought and flood conditions and to project future trends under various scenarios. However, all of these studies relied on the assumption of stationarity. Cui et al. [36] constructed a non-stationary standardized runoff index (NSRI) within the GAMLSS framework using four local drivers (precipitation, temperature, water withdrawal, and a reservoir index), providing a drought assessment scheme for the YRB. Li et al. [37] applied GAMLSS to quantify the effects of precipitation and changes in agricultural planting on seasonal runoff at five hydrological stations, demonstrating the applicability of the model to hydrological analysis in the region. Yu et al. [38] further developed a covariate-based standardized runoff index (SRI_cov) for the seven sub-basins of the YRB, revealing that drought events identified by the traditional SRI often deviated from the actual distribution and that abnormal water supply conditions were poorly captured. Despite these advances, existing non-stationary assessments in the YRB have concentrated on runoff-based hydrological drought, leaving precipitation-driven meteorological drought under large-scale climate oscillations largely unexplored. The delayed teleconnection effects of climate indices over a 0–12 month window have not been systematically screened, and Copula-based joint risk analysis coupled with machine learning prediction has yet to be incorporated into a non-stationary framework. Most of the YRB lies in an area of relatively low precipitation and high evaporation, yet the non-stationarity of its precipitation series has often been overlooked. The SPI effectively reflects drought intensity and duration and enables consistent drought assessment across multiple time scales and regions, which accounts for its widespread adoption [39]; however, its stationarity assumption limits its applicability in a changing climate. This study addresses these gaps by constructing a non-stationary standardized precipitation index (NSPI) driven by optimally lagged climate factors, coupling it with Copula-based joint distribution analysis for bivariate return period estimation, and employing an RF-LSTM model for dynamic drought projection across the YRB.
This study follows the sequence of mechanism identification, model construction, evaluation and verification, and prediction application. Key climate factors such as the Arctic Oscillation Index (AOI) and the Pacific Decadal Oscillation (PDO) are screened, and differences in their roles and lag times in the spatiotemporal distribution of precipitation are analyzed. A non-stationary model incorporating multiple climate factors as covariates is established to fit the monthly cumulative precipitation series, and its superiority and applicability in a changing environment are verified by comparison with traditional stationary models. In the drought assessment stage, drought characteristic variables are extracted, and the historical drought events identified by the traditional SPI and by the NSPI are compared. Copula functions are used to construct the bivariate joint distribution of drought duration and severity, testing whether the non-stationary framework can effectively describe extreme drought processes. Finally, a hybrid machine learning model combining random forest (RF) and a long short-term memory (LSTM) network is developed; it extracts temporal autocorrelation patterns from the NSPI series to capture short-term drought characteristics, providing a basis for targeted drought risk prevention and control in the YRB.
3. Methods
3.1. Standardized Precipitation Index (SPI)
The calculation of the conventional SPI can be summarized in three steps. First, for a time scale of $k$ months, the cumulative precipitation series is fitted with a Gamma distribution (Formula (1)). Second, the cumulative distribution function (CDF) of the precipitation series is derived from the fitted Gamma (GA) distribution (Formula (2)). Finally, the CDF is transformed to a standard normal distribution with zero mean and unit variance, and the SPI value is obtained accordingly (Formula (3)) [42]. The cumulative precipitation and its Gamma density are:

$$X_m^{k} = \sum_{i=0}^{k-1} P_{m-i}, \qquad f\left(X_m^{k} \mid \mu, \sigma\right) = \frac{\left(X_m^{k}\right)^{1/\sigma^{2}-1} \exp\left[-X_m^{k} / \left(\sigma^{2}\mu\right)\right]}{\left(\sigma^{2}\mu\right)^{1/\sigma^{2}}\,\Gamma\left(1/\sigma^{2}\right)} \tag{1}$$

where $k$ is the time scale, $m$ is the month, $X_m^{k}$ is the $k$-month cumulative precipitation in month $m$, and $P_{m-i}$ is the precipitation in month $m-i$. $f(\cdot)$ is the probability density function of the Gamma distribution. For SPI, the GAMLSS GA family parametrization was adopted, in which $\mu$ and $\sigma$ represent the mean and the dispersion, respectively; both are estimated from the historical precipitation series and treated as fixed constants.

The cumulative probability at a given time scale is:

$$F\left(X_m^{k}\right) = \int_{0}^{X_m^{k}} f(x \mid \mu, \sigma)\, dx \tag{2}$$

The SPI is obtained by standardizing the cumulative probability:

$$\mathrm{SPI} = \Phi^{-1}\left[F\left(X_m^{k}\right)\right] \tag{3}$$

where $\Phi^{-1}$ is the inverse of the CDF of the standard normal distribution.
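The three SPI steps can be illustrated with a short standard-library Python sketch. It is not the authors' R/gamlss code: it assumes the GAMLSS GA parametrization (shape $1/\sigma^2$, scale $\sigma^2\mu$), replaces maximum-likelihood fitting with method-of-moments estimates of $\mu$ and $\sigma$, and evaluates the Gamma CDF with a textbook series/continued-fraction approximation of the regularized incomplete gamma function. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pvariance


def cumulative_precip(precip, k):
    """k-month running totals; None until k months are available."""
    return [None if m < k - 1 else sum(precip[m - k + 1:m + 1])
            for m in range(len(precip))]


def gamma_p(a, x, eps=1e-15, max_iter=10_000):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series; converges quickly for x < a + 1.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * math.exp(log_prefix)
    # Continued fraction for Q(a, x) = 1 - P(a, x), modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return 1.0 - math.exp(log_prefix) * h


def ga_cdf(y, mu, sigma):
    """CDF of GA(mu, sigma): shape 1/sigma^2, scale sigma^2 * mu."""
    return gamma_p(1.0 / sigma ** 2, y / (sigma ** 2 * mu))


def fit_ga_moments(sample):
    """Method of moments: mean = mu, variance = sigma^2 * mu^2."""
    mu = fmean(sample)
    return mu, math.sqrt(pvariance(sample)) / mu


def spi(totals, p_clip=1e-12):
    """Stationary SPI: one GA fit for the whole record, then Phi^-1(F)."""
    mu, sigma = fit_ga_moments([v for v in totals if v is not None])
    normal = NormalDist()
    out = []
    for v in totals:
        if v is None:
            out.append(None)
            continue
        p = min(max(ga_cdf(v, mu, sigma), p_clip), 1.0 - p_clip)
        out.append(normal.inv_cdf(p))
    return out
```

For example, `spi(cumulative_precip(monthly_precip, 12))` would give a 12-month SPI series with `None` for the first 11 months. A zero total maps to the clipped lower bound here; operational SPI codes usually handle zero precipitation with a mixed distribution instead.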
3.2. Non-Stationary Standardized Precipitation Index (NSPI)
The NSPI is computed in two steps: first, a set of climate factors is selected to form the candidate covariate combination; second, a GAMLSS model is constructed in R and its performance is evaluated.
3.2.1. Screening Climate Factors
Research indicates that large-scale climate indices such as the North Atlantic Oscillation (NAO), Pacific Decadal Oscillation (PDO), and Atlantic Multidecadal Oscillation (AMO) are associated with droughts in various regions of the world [43]. Common methods for evaluating teleconnections between hydrological variables and climate indices include Kendall and Spearman correlation analyses [44]. For the six selected climate factors, the Kendall correlation test at a significance level of 0.05 was applied over lags of 0 to 12 months to identify the most relevant large-scale climate oscillations and their optimal lags for the cumulative precipitation series. The specific steps are as follows.
For the cumulative precipitation of a given month, moving-average climate indices over the same time scale were computed for different lead times of L months. For each month and each climate index, 13 series were constructed: one with no lead time (L = 0) and 12 with lead times L = 1 to 12, and each was tested for correlation with the cumulative precipitation series of that month. For each month, the climate indices and lead times with the maximum correlation were selected as the main covariates for the NSPI. Because this produces a large candidate pool, a two-stage screening strategy was adopted. In the first stage, the Kendall rank correlation was computed between the 12-month cumulative precipitation and each of the 78 lagged climate variables (6 indices × 13 lags), and the 10 variables with the strongest absolute correlations were retained as the candidate set for each month. This pre-screening reduced the dimensionality of the predictor space and ensured numerical stability. In the second stage, the stepGAIC function of the gamlss package performed AIC-based stepwise selection: backward elimination started from an initial model containing the top 5 candidates, and the remaining first-stage candidates were also tested for inclusion. Only variables that improved model fit sufficiently (ΔAIC > 2) were retained in the final model. This approach shifted the selection criterion from the significance of individual correlations to overall model improvement, while the stepwise procedure helped guard against overfitting by excluding redundant predictors.
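The first screening stage can be sketched as follows. Assumptions that are ours, not the paper's: Kendall's tau-a with a normal-approximation p-value (no tie correction), a k-month moving average of each index ending at the lagged month, and pairing of precipitation at month t with the index at month t − L. The function names are hypothetical, and the second (stepGAIC) stage is not reproduced.

```python
import math
from statistics import NormalDist


def kendall_tau(x, y):
    """Kendall tau-a and a two-sided normal-approximation p-value (no tie correction)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)
    tau = 2.0 * s / (n * (n - 1))
    z = s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    return tau, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def moving_average(series, k):
    """k-month average of a monthly climate index ending at each month."""
    return [None if t < k - 1 else sum(series[t - k + 1:t + 1]) / k
            for t in range(len(series))]


def screen_lags(precip, indices, k=12, max_lag=12, alpha=0.05,
                months=None, target_month=None, top_n=10):
    """Rank (index, lag) pairs by |tau|, keeping those with p < alpha.

    precip: monthly k-month cumulative precipitation (None where undefined).
    indices: {name: monthly index series aligned with precip}.
    months/target_month: optionally restrict pairs to one calendar month.
    """
    hits = []
    for name, series in indices.items():
        avg = moving_average(series, k)
        for lag in range(max_lag + 1):
            pairs = [(precip[t], avg[t - lag]) for t in range(lag, len(precip))
                     if precip[t] is not None and avg[t - lag] is not None
                     and (months is None or months[t] == target_month)]
            if len(pairs) < 3:
                continue
            xs, ys = zip(*pairs)
            tau, p = kendall_tau(xs, ys)
            if p < alpha:
                hits.append((name, lag, tau, p))
    hits.sort(key=lambda h: abs(h[2]), reverse=True)
    return hits[:top_n]
```

Passing `months` and `target_month=1` restricts the correlation to January values, mirroring the month-by-month screening; `top_n=10` matches the ten candidates retained per month in the first stage.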
3.2.2. GAMLSS
Generalized Additive Models for Location, Scale and Shape (GAMLSS) are a semi-parametric regression framework widely used to assess non-stationarity in hydrometeorological time series. The approach models the parameters of the response distribution as linear or nonlinear functions of explanatory variables; further details are given in [26]. In a GAMLSS model, each observation $y_i$ is assumed to follow a probability density function $f(y_i \mid \boldsymbol{\theta}^{i})$, where $\boldsymbol{\theta}^{i} = (\theta_{1i}, \theta_{2i}, \dots, \theta_{pi})$ is the vector of the $p$ parameters of the density. The $k$th parameter $\theta_k$ is a location, scale, or shape parameter and is linked to the covariates (explanatory variables) through a monotonic link function $g_k(\cdot)$.
The GAMLSS method was used to estimate the location and scale parameters of the non-stationary Gamma distribution, each of which is related linearly, on the link scale, to the large-scale climate indices used as covariates. Using the stepwise selection procedure of the gamlss package in R (version 4.5.1), the candidate climate-factor covariates were screened to obtain the optimal combination. The location ($\mu_t$) and scale ($\sigma_t$) parameters of the two-parameter Gamma distribution were modelled as:

$$\log \mu_t = \alpha_0 + \sum_{j=1}^{n} \alpha_j x_{j,t}, \qquad \log \sigma_t = \beta_0 + \sum_{j=1}^{n} \beta_j x_{j,t}$$

where $x_{j,t}$ is the $j$-th climate factor at time $t$, and $\alpha_j$ and $\beta_j$ are the regression coefficients.
After the non-stationary parameters $\mu_t$ and $\sigma_t$ were estimated with the GAMLSS model, the cumulative precipitation $X_t$ was modelled as Gamma($\mu_t$, $\sigma_t$). The Gamma family was used with the default log link for both $\mu$ and $\sigma$, ensuring positive estimates. Stepwise selection followed two stages: first, covariates for $\mu$ were selected by backward elimination using the AIC criterion, with a scope ranging from the intercept-only model to the full set of candidate climate indices at their optimal lags; second, covariates for $\sigma$ were selected starting from the intercept-only model with the same upper scope. The entry and retention threshold was ΔAIC > 2. In none of the 12 months did any covariate meet this threshold for $\sigma$, indicating that the climate indices explained the location of the precipitation distribution but not its scale. The cumulative probabilities were then transformed to the standard normal distribution with the same inverse-normal transformation used for the SPI, $\mathrm{NSPI} = \Phi^{-1}[F(X_t \mid \mu_t, \sigma_t)]$, where $\Phi^{-1}$ is the inverse standard normal CDF. To mitigate overfitting, the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Global Deviance (GD) were used for model selection, and worm plots and Q-Q plots were used for visual residual diagnostics and for comparing model performance. Based on this analysis, a 12-month time scale was selected. Each candidate model was fitted independently to the cumulative precipitation series of every 10 km grid cell across the basin. The drought classification criteria are given in Table 1 [45].
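To illustrate how the fitted log-link equation turns into an index, the sketch below evaluates $\mu_t = \exp(\alpha_0 + \sum_j \alpha_j x_{j,t})$ (coefficient names are ours) with a constant $\sigma$, since no $\sigma$ covariates were retained, and maps each total through the GA CDF and $\Phi^{-1}$. The coefficient values in the test are hypothetical; in practice they would come from the fitted gamlss object in R. The Gamma CDF uses a textbook series/continued-fraction routine so the block runs on its own.

```python
import math
from statistics import NormalDist


def gamma_p(a, x, eps=1e-15, max_iter=10_000):
    """Regularized lower incomplete gamma P(a, x): series or continued fraction."""
    if x <= 0.0:
        return 0.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * math.exp(log_prefix)
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return 1.0 - math.exp(log_prefix) * h


def ga_cdf(y, mu, sigma):
    """CDF of GA(mu, sigma): shape 1/sigma^2, scale sigma^2 * mu."""
    return gamma_p(1.0 / sigma ** 2, y / (sigma ** 2 * mu))


def location(alpha0, alphas, covariates):
    """Log link: mu_t = exp(alpha0 + sum_j alpha_j * x_jt)."""
    return math.exp(alpha0 + sum(a * x for a, x in zip(alphas, covariates)))


def nspi(totals, covariate_rows, alpha0, alphas, sigma, p_clip=1e-12):
    """NSPI_t = Phi^-1(F_GA(X_t | mu_t, sigma)) with a time-varying mu_t."""
    normal = NormalDist()
    out = []
    for x_t, row in zip(totals, covariate_rows):
        mu_t = location(alpha0, alphas, row)
        p = min(max(ga_cdf(x_t, mu_t, sigma), p_clip), 1.0 - p_clip)
        out.append(normal.inv_cdf(p))
    return out
```

Holding the total fixed, a covariate state that raises $\mu_t$ lowers the NSPI, which is how the same precipitation amount can be graded differently by a stationary and a non-stationary index.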
3.3. Run-Length Theory
Run theory was applied to the SPI12 and NSPI12 series of each 10 km grid cell to identify drought events and extract their duration, severity, and peak values. The procedure was performed independently for each grid cell to capture spatial variations in drought characteristics, and basin-averaged results were then derived from the grid-cell outputs to represent the overall drought regime of the YRB. Previous research has confirmed the feasibility of this method [46]. A drought event is defined as a continuous period during which the drought index remains below a predetermined threshold. The duration (D) of a drought event is the time from its onset to its end; its severity (S) is the cumulative value of the drought index over the event; and its peak (I) is the lowest index value reached during the event [47].
Following Yevjevich (1967) [48], three threshold levels were set: X0 = 0, X1 = −0.5, and X2 = −1.0. A drought is considered to occur when the drought index falls below X1 (−0.5). An event lasting only one month is confirmed only if its index value is also below X2 (−1.0); otherwise it is not recognized. Two drought events separated by a single month are merged into one event if the index value in the intervening month is below X0 (0). The duration of the merged event equals the sum of the two durations plus one month, and its severity equals the sum of the two severities.
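The event rules above translate into a three-pass routine. Two details are our assumptions rather than statements from the text: severity is accumulated as the sum of −index over event months so that it is positive, and single-month events are screened before the merging rule is applied. The function name is hypothetical.

```python
def identify_droughts(index, x0=0.0, x1=-0.5, x2=-1.0):
    """Run-theory drought events from a monthly drought-index series."""
    # Pass 1: maximal runs of consecutive months with index < x1.
    runs, start = [], None
    for t, v in enumerate(index):
        if v < x1 and start is None:
            start = t
        elif v >= x1 and start is not None:
            runs.append((start, t - 1))
            start = None
    if start is not None:
        runs.append((start, len(index) - 1))
    # Pass 2: a one-month run counts only if it also falls below x2.
    runs = [(s, e) for s, e in runs if e > s or index[s] < x2]
    # Pass 3: merge events separated by one month whose index is below x0.
    events = []
    for s, e in runs:
        severity = sum(-index[t] for t in range(s, e + 1))
        peak = min(index[s:e + 1])
        prev = events[-1] if events else None
        if prev and s - prev["end"] == 2 and index[s - 1] < x0:
            prev["end"] = e
            prev["duration"] += (e - s + 1) + 1  # both events plus the gap month
            prev["severity"] += severity         # gap month not added to severity
            prev["peak"] = min(prev["peak"], peak)
        else:
            events.append({"start": s, "end": e, "duration": e - s + 1,
                           "severity": severity, "peak": peak})
    return events
```

Each returned dictionary gives the onset and end month indices together with D, S, and the peak value I used in the comparisons of SPI12 and NSPI12 events.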
3.4. Trend and Turning Point Test
Climate change, natural geographical conditions, and human activities can affect hydrological series within a given period, leading to significant differences before and after a certain point in time [38]. In this study, after the NSPI was obtained from the GAMLSS model, basin-averaged SPI12 and NSPI12 series were constructed, and trend analysis and change point detection were performed on them.
For trend testing, the Trend-Free Pre-Whitening Mann–Kendall (TFPW-MK) method was adopted. The standard M-K test requires independent observations, but hydrological time series usually contain serial autocorrelation, which raises the chance of detecting spurious trends. The TFPW-MK method addresses this problem in three steps [49]. First, a trend is estimated and removed from the original series to obtain a detrended residual series. Second, the lag-1 autoregressive component is removed from the detrended residuals by pre-whitening to eliminate serial correlation. Third, the removed trend is added back to the pre-whitened residuals, and the standard M-K test is performed on this reconstructed series. The Z statistic and the Sen slope were then computed to assess the significance and magnitude of the trend.
The TFPW-MK procedure produces three standard outputs. The Z value is the standardized test statistic derived from the Kendall score; under the null hypothesis of no trend, it asymptotically follows the standard normal distribution [49]. The p-value is the probability of observing a test statistic at least as extreme as the computed Z value if the null hypothesis were true. The Sen slope is a nonparametric estimator of trend magnitude, computed as the median of all pairwise slopes between observations. By design, the TFPW-MK method reports the Sen slope as a point estimate together with the Z value and the p-value, but does not produce an analytical confidence interval for the slope [50]. This is because the Sen slope is based on the median of pairwise slopes, and its exact sampling distribution under the pre-whitening transformation does not admit a simple closed-form variance estimator. Consequently, inference is conventionally drawn from the joint consideration of the highly significant p-value, the direction of the Sen slope, and the large sample size [51].
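The three TFPW-MK steps can be sketched as below. We assume Sen's slope as the trend estimator in the first step and use the M-K variance without tie correction; the slope is per time step (for a monthly series, multiply by 12 to express it per year). Function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, median


def sen_slope(x):
    """Median of all pairwise slopes (x_j - x_i) / (j - i)."""
    n = len(x)
    return median([(x[j] - x[i]) / (j - i)
                   for i in range(n - 1) for j in range(i + 1, n)])


def lag1_autocorr(y):
    """Lag-1 autocorrelation coefficient (0 for a constant series)."""
    m = fmean(y)
    den = sum((v - m) ** 2 for v in y)
    if den == 0.0:
        return 0.0
    return sum((y[t] - m) * (y[t + 1] - m) for t in range(len(y) - 1)) / den


def mk_z(x):
    """Mann-Kendall Z with continuity correction (no tie correction)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    sd = math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    if s > 0:
        return (s - 1) / sd
    if s < 0:
        return (s + 1) / sd
    return 0.0


def tfpw_mk(x):
    """Trend-free pre-whitening M-K test: returns (Z, p, Sen slope per step)."""
    beta = sen_slope(x)
    # Step 1: remove the estimated trend.
    detrended = [v - beta * t for t, v in enumerate(x)]
    # Step 2: remove the lag-1 autoregressive component.
    r1 = lag1_autocorr(detrended)
    whitened = [detrended[t] - r1 * detrended[t - 1] for t in range(1, len(x))]
    # Step 3: add the trend back and run the standard M-K test.
    blended = [w + beta * t for t, w in zip(range(1, len(x)), whitened)]
    z = mk_z(blended)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p, beta
```

The O(n²) pairwise loops are adequate for a 768-month record; longer series would call for a faster Sen slope routine.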
For change point detection, the Pettitt test was applied. This test identifies a single change point as the time that maximizes the absolute value of a Mann–Whitney-type statistic comparing the observations before and after it [52]. The mean values before and after the identified change point were calculated to quantify the magnitude of the shift.
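A compact version of the Pettitt test uses the recursion $U_t = U_{t-1} + \sum_{j=1}^{n}\operatorname{sgn}(x_t - x_j)$, takes $K = \max_t |U_t|$, and approximates the significance as $p \approx 2\exp[-6K^2/(n^3+n^2)]$. The returned index is the last observation before the shift; this convention and the function name are our choices, not the paper's.

```python
import math
from statistics import fmean


def pettitt(x):
    """Pettitt change-point test with the usual approximate p-value.

    Returns the 0-based index of the last observation before the change.
    """
    n = len(x)
    u, best_k, best_t = 0, -1, 0
    for t in range(n - 1):
        # U_t = U_{t-1} + sum_j sgn(x_t - x_j)
        u += sum((x[t] > xj) - (x[t] < xj) for xj in x)
        if abs(u) > best_k:
            best_k, best_t = abs(u), t
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return {"change_index": best_t, "K": best_k, "p": p,
            "mean_before": fmean(x[:best_t + 1]),
            "mean_after": fmean(x[best_t + 1:])}
```

With a list of years aligned to the series, `years[result["change_index"]]` gives the last year of the earlier regime, and comparing `mean_before` with `mean_after` gives the direction of the shift.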
3.5. Copula Function
Copula functions join two or more marginal probability distributions into a multivariate distribution. A key difference from traditional multivariate distributions is that the marginal distributions need not belong to the same family [53]. For continuous random variables X and Y with joint distribution function F(x, y) and marginal distributions $F_X(x)$ and $F_Y(y)$, Sklar’s theorem states that there exists a unique copula function $C(u, v)$ such that:

$$F(x, y) = C\left(F_X(x), F_Y(y)\right)$$
This study first identified drought events in the SPI12 and NSPI12 series using run theory and extracted the drought duration D and drought severity S of each event. Gamma, Weibull, Normal, and Lognormal marginal distributions were then fitted to D and S, and the D–S joint distribution was constructed using five Copula functions: Gaussian, t, Clayton, Frank, and Gumbel. The marginal distributions and Copula functions were selected by the minimum AIC value, with RMSE as an auxiliary goodness-of-fit indicator.
The joint return period is defined as the AND-type joint exceedance return period, that is, the return period of events in which both drought duration and drought severity reach or exceed specified thresholds, rather than being calculated directly from the joint cumulative probability. For a given scenario $(d_0, s_0)$, the joint exceedance probability and joint return period are [54]:

$$P(D \ge d_0, S \ge s_0) = 1 - F_D(d_0) - F_S(s_0) + C\left(F_D(d_0), F_S(s_0)\right), \qquad T_{DS}^{\wedge} = \frac{E(L)}{P(D \ge d_0, S \ge s_0)}$$

where $F_D$ and $F_S$ are the marginal CDFs of drought duration D and drought severity S, respectively; $C$ is the copula (joint distribution function); and E(L) is the mean interval between drought events. Joint return periods were calculated for moderate drought (D = 3, S = 3) and severe drought (D = 6, S = 9), and inverse distance weighting was used to interpolate the grid-scale return periods spatially.
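A minimal sketch of the AND-type return period follows, assuming Weibull marginals and a Gumbel copula (two of the candidate families listed above) with made-up parameter values; in the study, the families and parameters were chosen by AIC. E(L) is passed in years, so the result is in years. Function names are hypothetical.

```python
import math


def weibull_cdf(x, scale, shape):
    """Two-parameter Weibull CDF."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-((x / scale) ** shape))


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v); theta = 1 is independence, theta > 1 upper-tail dependence."""
    if theta < 1.0:
        raise ValueError("Gumbel copula requires theta >= 1")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def and_return_period(d0, s0, cdf_d, cdf_s, copula, mean_interval):
    """T = E(L) / P(D >= d0, S >= s0), with P = 1 - u - v + C(u, v)."""
    u, v = cdf_d(d0), cdf_s(s0)
    p_and = 1.0 - u - v + copula(u, v)
    return mean_interval / p_and
```

Under positive dependence (theta > 1) long and severe droughts tend to co-occur, so the joint exceedance probability rises and the AND-type return period shortens relative to independence.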
3.6. Machine Learning—Random Forest (RF) + LSTM
In long-term prediction in the Earth sciences, the ability of machine learning algorithms to handle nonlinear processes has gained empirical support [55]. Random Forest (RF), an ensemble learning model, can handle high-dimensional features, evaluate variable importance, and reduce the risk of overfitting through its ensemble mechanism, providing robust initial feature selection and nonlinear fitting capabilities [56]. The LSTM model, a variant of the recurrent neural network first proposed by Hochreiter and Schmidhuber (1997) [57], can learn long-term dependencies in time series and capture the dynamics of hydrometeorological data. This paper adopted a cascaded RF-LSTM architecture for NSPI12 prediction, in which the RF corrects the residuals of an LSTM baseline trained on 12-month lagged NSPI values and seasonal encodings. Specifically, the LSTM first produced a baseline prediction; the RF was then fitted to the residuals between the observed NSPI and the LSTM output, using the same lagged NSPI features plus the LSTM prediction as an additional covariate; and the final prediction was obtained by adding the RF residual estimate to the LSTM baseline. The model was trained on 1961–1988, validated on 1989–2016, and tested on 2017–2024, and it was applied to the NSPI12 series of each 10 km grid cell across the basin.
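The cascade can be outlined as follows. This is a structural sketch only: `baseline` and `residual_model` are hypothetical callables standing in for a trained LSTM and a trained random forest (for example, a Keras model and a scikit-learn `RandomForestRegressor` wrapped in functions), and the sine/cosine month encoding is our assumption about the "seasonal encodings". The year-based split mirrors the 1961–1988 / 1989–2016 / 2017–2024 periods.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def make_features(nspi: Sequence[float], months: Sequence[int],
                  n_lags: int = 12) -> Tuple[List[Row], List[float], List[int]]:
    """Rows of [NSPI(t-1), ..., NSPI(t-n_lags), sin, cos] and targets NSPI(t)."""
    rows, targets, times = [], [], []
    for t in range(n_lags, len(nspi)):
        angle = 2.0 * math.pi * (months[t] - 1) / 12.0
        lags = [nspi[t - lag] for lag in range(1, n_lags + 1)]
        rows.append(lags + [math.sin(angle), math.cos(angle)])
        targets.append(nspi[t])
        times.append(t)
    return rows, targets, times


def split_by_year(row_years: Sequence[int]):
    """Row indices in the training, validation, and test periods."""
    periods = {"train": (1961, 1988), "val": (1989, 2016), "test": (2017, 2024)}
    return {name: [i for i, y in enumerate(row_years) if lo <= y <= hi]
            for name, (lo, hi) in periods.items()}


def residual_targets(targets: Sequence[float],
                     baseline_preds: Sequence[float]) -> List[float]:
    """What the residual model is trained on: observed NSPI minus baseline."""
    return [y - b for y, b in zip(targets, baseline_preds)]


def cascade_predict(baseline: Callable[[Row], float],
                    residual_model: Callable[[Row], float],
                    rows: Sequence[Row]) -> List[float]:
    """Final prediction = baseline + residual estimate (baseline also fed as a feature)."""
    preds = []
    for row in rows:
        base = baseline(row)
        preds.append(base + residual_model(list(row) + [base]))
    return preds
```

Keeping the split strictly chronological avoids leaking future NSPI values into training, which matters because the lagged features overlap in time.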
4. Results
4.1. The Construction of Non-Stationary Models
4.1.1. Screening of Climatic Factors
To explore the monthly influence of climate factors, six climate indices were selected: the Arctic Oscillation Index (AOI), the North Atlantic Oscillation Index (NAO), the Pacific Decadal Oscillation Index (PDO), the Atlantic Multidecadal Oscillation Index (AMO), the Southern Oscillation Index (SOI), and the North Pacific Index (NPI). The Kendall correlation test was applied to the basin-averaged cumulative precipitation series to screen each climate index at a significance level of 0.05. The results of the significance tests at the 0.01 and 0.05 levels (Table 2) show that the optimal lag differs among climate factors within the same month, and that even a single factor can be significant at several lags in the same month. For example, in January, the p-value of AOI10 (AOI at a 10-month lag) is 0.007 and that of AOI6 is 0.018. Because the p-value of AOI10 is the smallest among all factors, AOI10 was identified as the optimal climate driver for January, with an optimal lag of 10 months. Most months show significant correlations with AOI, PDO, and SOI. NPI is also well correlated with precipitation changes, mainly in April to June and in September. NAO is significantly correlated with cumulative precipitation in May, July, and August. AMO did not pass the significance test at the 0.01 or 0.05 level in any of the 12 monthly cumulative precipitation series. Nevertheless, AMO was retained in the initial candidate set because of documented teleconnections with decadal precipitation variability in the YRB; the stepwise GAMLSS procedure subsequently excluded it from all final models (Table 3), indicating that it contributed no additional explanatory power beyond the other retained covariates.
4.1.2. GAMLSS
Three types of models were set up. Model0 is stationary: its distribution parameters are fixed and do not change with external factors. Model1 introduces time as a covariate, with the distribution parameters linearly related to time t. Model2 uses the optimal climate factors as covariates, with the distribution parameters linearly related to them. GD, AIC, and SBC (the Schwarz Bayesian criterion, equivalent to BIC) were used to guard against overfitting; the results are shown in Figure 2. For the YRB at the 12-month scale, Model2, with climate factors as covariates, had the smallest median GD, AIC, and SBC of all models. The ΔAIC values between Model2 and Model0 ranged from −15.2 to −5.4 across the 12 months, and those between Model2 and Model1 from −16.8 to −7.3. All differences exceeded the threshold of 2 in magnitude, providing strong evidence that Model2 was the superior specification. Under the minimization criterion, introducing climate factors as covariates therefore improved the fit to the 12-month cumulative precipitation in the YRB relative to both the stationary and the time-varying models, indicating that the selected non-stationary model is more applicable to the cumulative precipitation series of the study area.
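For reference, the three criteria used above are related through the global deviance. With $\hat{L}$ the maximized likelihood, $df$ the effective degrees of freedom, and $n$ the sample size (the standard definitions, which the generalized AIC of the gamlss package reproduces with penalties $k = 2$ and $k = \ln n$):

$$\mathrm{GD} = -2\ln\hat{L}, \qquad \mathrm{AIC} = \mathrm{GD} + 2\,df, \qquad \mathrm{SBC} = \mathrm{GD} + df\,\ln n$$

Hence a ΔAIC of −5.4 means that the deviance reduction gained by adding climate covariates exceeds their penalty of 2 per added parameter by 5.4 units.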
Table 3 presents the covariates of the distribution parameters μ and σ in the optimal non-stationary model for each month’s cumulative precipitation series. The optimal covariate combinations differ among months, and the lags of the same climate factor also differ. For μ, SOI is selected most frequently, and several of its lags are retained as covariates within the same month. The simultaneous inclusion of multiple SOI lags reflects the persistent influence of the Southern Oscillation on precipitation at different time scales; each retained lag provided additional explanatory power under the AIC improvement criterion, and the stepwise procedure excluded redundant lags. Across the 12 months, AOI is selected 7 times, NPI is selected in May and June, and PDO and NAO are each selected once. Although no covariates were retained for σ, the estimates for μ indicate that the non-stationarity of the precipitation series is expressed mainly in its mean level, with SOI and AOI as the dominant factors. This pattern indicates that large-scale climate indices primarily influence the location of the precipitation distribution rather than its spread.
To evaluate the reliability of the optimal non-stationary model, its residual series and goodness of fit were analyzed. For March, the residual series of the optimal model has a mean of 0, a variance of 1.02, a skewness coefficient of 0.05, a kurtosis coefficient of 3.22, and a Filliben coefficient of 0.987; for June, the corresponding values for Model2 are 0, 1.02, −0.57, 2.62, and 0.980; and for December, 0, 1.02, 0.22, 3.59, and 0.985. The remaining months also meet the evaluation requirements: the mean is close to 0, the variance close to 1, the skewness coefficient close to 0, the kurtosis coefficient close to 3, and the Filliben coefficient is not less than 0.978. Overall, at the 12-month scale, Model2 is adequate for every month in the YRB. In Figure 3, the points in the normal Q-Q plots lie close to the 45° reference line, and most residual points in the worm plots fall within the 95% confidence bands, indicating a satisfactory fit of the optimal non-stationary model.
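The residual checks above can be reproduced with a short routine. This sketch is an illustration, not the authors' script: it uses moment-based skewness and non-excess kurtosis (so a normal sample gives values near 0 and 3), and computes the Filliben coefficient as the correlation between the sorted residuals and the standard normal quantiles of Filliben's (1975) approximate order-statistic medians. Function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def filliben_medians(n):
    """Filliben's approximations of the uniform order-statistic medians."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def residual_summary(res):
    """Mean, variance, skewness, kurtosis, and Filliben coefficient of residuals."""
    n = len(res)
    mean = fmean(res)
    dev = [r - mean for r in res]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    scores = [NormalDist().inv_cdf(p) for p in filliben_medians(n)]
    return {
        "mean": mean,
        "variance": variance(res),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "filliben": pearson(sorted(res), scores),
    }
```

Applied to each month's normalized residuals, the returned values correspond to the five statistics reported above, and the 0.978 lower bound can be checked against `filliben`.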
4.2. Non-Stationary Meteorological Drought Assessment
4.2.1. Analysis of the Applicability of NSPI
Based on the output of the optimal non-stationary model, cumulative probabilities were calculated and standardized to obtain the NSPI, which was compared with the traditional SPI (Figure 4). At the 12-month time scale, the temporal patterns of SPI and NSPI are broadly consistent but differ in places, which may be attributed to the different precipitation estimates produced by the stationary and non-stationary models. The TFPW-MK test results (Figure 5, left column) showed a statistically significant upward trend in both series. The SPI12 series had a Z value of 6.47 (p < 0.0001) and a Sen slope of 0.0044/a; the NSPI12 series had a Z value of 7.53 (p < 0.0001) and a Sen slope of 0.0044/a. Both Z values exceeded the critical value of 1.96 (α = 0.05). The higher Z value of NSPI12 (7.53 versus 6.47) indicated that the non-stationary framework detected a more pronounced trend once climate covariates were included. Both results showed that conditions in the YRB became gradually wetter over the study period. These Z values were higher than those from the standard M-K test, indicating that the upward trend remained significant after the effect of positive serial autocorrelation was removed. The Pettitt test (Figure 5, right column) identified a change point around 2004 for SPI12 and around 2012 for NSPI12. In both cases, the mean after the change point was higher than before it, indicating a shift toward wetter conditions. The difference in the timing of the change points may reflect the different responses of the stationary and non-stationary frameworks to climate regime shifts. Both series showed clear interannual and interdecadal fluctuations. The mean of SPI12 was 0.90, with values ranging from −1.5 to over 2.5. This positive offset reflects the regional parameterization strategy: the Gamma distribution for SPI was fitted to the basin-averaged precipitation series and applied uniformly to all grid cells rather than fitted cell by cell, so spatial heterogeneity in precipitation regimes across the YRB prevented the basin-mean SPI from being standardized exactly to zero. A relatively wet period lasted from the early 1960s to the late 1970s, while pronounced drought peaks occurred in 1986–1987, 1992, and 1999–2001. NSPI12 fluctuated around −0.14: values were generally high in the 1960s and 1970s, decreased markedly in the 1980s and 1990s, and trended upward after the change point around 2012.
Based on run theory, the drought characteristic variables of SPI and NSPI were extracted; the results are shown in Figure 6. In 1965–1966, SPI identified an extreme drought lasting 13 months, whereas NSPI identified a moderate drought lasting 16 months. Between 1979 and 1982, SPI identified a mild drought from February to April 1980 and a moderate drought from July 1980 to July 1981, whereas NSPI identified a mild drought lasting 26 months, from October 1979 to November 1981. Both SPI and NSPI thus captured the evolution of drought in the YRB from 1960 to 2024, but they differed in the details of individual events, including start and end times, duration, and drought grade.
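The run-theory extraction of drought events can be illustrated with the sketch below. It assumes a single truncation threshold of −0.5, treats each maximal run of consecutive months below it as one event, and defines severity as the cumulative absolute index value, intensity as severity divided by duration, and peak as the minimum index value. The paper's exact conventions (for example, pooling of short interruptions or sign conventions) are not stated in this section, so these definitions and the names `DroughtEvent` and `extract_droughts` are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DroughtEvent:
    start: int        # index of the first month below the threshold
    duration: int     # number of consecutive months below the threshold
    severity: float   # cumulative absolute index value over the event (assumed definition)
    intensity: float  # severity / duration
    peak: float       # minimum (most negative) index value within the event


def extract_droughts(index, threshold=-0.5):
    """Return one DroughtEvent per maximal run of values below `threshold`."""
    index = np.asarray(index, dtype=float)
    events = []
    start = None
    # A +inf sentinel guarantees that a run reaching the end of the series is closed.
    for t, value in enumerate(np.append(index, np.inf)):
        if value < threshold and start is None:
            start = t
        elif value >= threshold and start is not None:
            run = index[start:t]
            severity = float(-run.sum())  # all values are negative, so this is sum(|x|)
            events.append(DroughtEvent(start, t - start, severity,
                                       severity / (t - start), float(run.min())))
            start = None
    return events
```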
To compare drought evolution under SPI and NSPI more directly, the drought characteristic variables and drought grades of events with the same onset time were extracted, as shown in Table 4. In 1960, 1969, 1995, and 2008, both SPI and NSPI classified the droughts as mild, but the severity assessed by NSPI was often greater. For instance, in 1960 both indices classified the event as a mild drought, yet the NSPI drought intensity (5.800) and drought severity (−0.967) were both larger in magnitude than the corresponding SPI values (4.359 and −0.726). In the other years, NSPI assigned higher drought grades than SPI. For the drought that began in October 1962 and lasted 7 months, during which precipitation in the YRB totaled 106.6 mm, NSPI classified the event as a moderate drought, whereas SPI classified it as a mild drought. In 1974 and 1986, SPI classified the droughts as moderate, but NSPI classified them as severe, and NSPI also yielded considerably larger drought intensity and severity values. For 1969, by contrast, SPI indicated a drought lasting 8 months, whereas NSPI identified only 2 months, although the NSPI drought intensity (−0.713) was lower than that of SPI (−0.651). This reflects the different sensitivities of the two indicators in identifying drought characteristic variables and drought grades. Overall, in most typical drought years NSPI assigned higher drought grades and greater severity than SPI, but lower intensity. These differences arose because the two indices standardized precipitation against different reference distributions: NSPI conditioned on time-varying climate covariates through the GAMLSS model, whereas SPI used a stationary Gamma distribution fitted to the full historical record. Consequently, they quantified drought relative to different baselines and answered distinct scientific questions.
Neither index was intrinsically more accurate; the choice between them depended on whether the analysis goal was to assess drought against a climate-conditioned baseline (NSPI) or an unconditional historical baseline (SPI).
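The difference between the two baselines can be made concrete with a simplified sketch. The stationary version fits one Gamma distribution to the whole record with `scipy.stats.gamma.fit`; the non-stationary version, a hypothetical stand-in for the GAMLSS model, lets the logarithm of the Gamma scale parameter vary linearly with a single climate covariate and is fitted by maximum likelihood. Both map the fitted cumulative probabilities to standard normal quantiles. Zero-precipitation handling and the covariates actually used for NSPI are omitted, and the function names are illustrative.

```python
import numpy as np
from scipy import optimize, stats


def spi_stationary(precip):
    """SPI-style index: one Gamma distribution fitted to the full record."""
    precip = np.asarray(precip, dtype=float)
    shape, _, scale = stats.gamma.fit(precip, floc=0)
    return stats.norm.ppf(stats.gamma.cdf(precip, shape, scale=scale))


def spi_nonstationary(precip, covariate):
    """Hypothetical NSPI-style index: Gamma whose log-scale is linear in a covariate.

    A simplified stand-in for a GAMLSS model, fitted by maximum likelihood.
    Returns the standardized index and the fitted (log_shape, b0, b1).
    """
    precip = np.asarray(precip, dtype=float)
    covariate = np.asarray(covariate, dtype=float)

    def neg_log_lik(params):
        log_shape, b0, b1 = params
        scale = np.exp(b0 + b1 * covariate)
        return -np.sum(stats.gamma.logpdf(precip, np.exp(log_shape), scale=scale))

    # Method-of-moments starting values with no covariate effect
    m, var = precip.mean(), precip.var()
    start = [np.log(m ** 2 / var), np.log(var / m), 0.0]
    res = optimize.minimize(neg_log_lik, start, method="Nelder-Mead",
                            options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10})
    log_shape, b0, b1 = res.x
    scale = np.exp(b0 + b1 * covariate)
    z = stats.norm.ppf(stats.gamma.cdf(precip, np.exp(log_shape), scale=scale))
    return z, res.x
```

A full GAMLSS fit (for instance with the R `gamlss` package) can also model the shape parameter and use smooth covariate terms; the single linear log-link here only illustrates the idea of a climate-conditioned reference distribution.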
4.2.2. Comparative Analysis of Drought Grades in Different Decades
The SPI and NSPI thresholds can be used to identify meteorological drought events in the YRB. To compare the two indices, the index series were divided into seven periods by decade: P1 (1960–1969), P2 (1970–1979), P3 (1980–1989), P4 (1990–1999), P5 (2000–2009), P6 (2010–2019), and P7 (2020–2024). Because P7 covered only 5 years, the drought frequencies in Figure 7 were expressed as percentages (drought months divided by total months in each period), which normalized for the differing period lengths. The occurrence frequencies of the four drought grades, defined according to the drought grade classification table, are shown in Figure 7.
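The normalization by period length can be sketched as follows. The grade boundaries used here (mild −1.0 < x ≤ −0.5, moderate −1.5 < x ≤ −1.0, severe −2.0 < x ≤ −1.5, extreme x ≤ −2.0) follow a common SPI convention and are an assumption, since the drought grade classification table is not reproduced in this section; `grade_frequencies` is a hypothetical helper that takes one year label per monthly index value.

```python
import numpy as np

# Assumed grade intervals (lower, upper], a common SPI convention
GRADES = {
    "mild": (-1.0, -0.5),
    "moderate": (-1.5, -1.0),
    "severe": (-2.0, -1.5),
    "extreme": (-np.inf, -2.0),
}
# Decadal periods P1-P7 as defined in the text (inclusive year ranges)
PERIODS = {
    "P1": (1960, 1969), "P2": (1970, 1979), "P3": (1980, 1989), "P4": (1990, 1999),
    "P5": (2000, 2009), "P6": (2010, 2019), "P7": (2020, 2024),
}


def grade_frequencies(years, index):
    """Percentage of months in each period falling into each drought grade."""
    years = np.asarray(years)
    index = np.asarray(index, dtype=float)
    table = {}
    for period, (y0, y1) in PERIODS.items():
        in_period = (years >= y0) & (years <= y1)
        n_months = int(in_period.sum())  # the denominator differs between periods
        table[period] = {
            grade: (100.0 * np.sum(in_period & (index > lo) & (index <= hi)) / n_months
                    if n_months else float("nan"))
            for grade, (lo, hi) in GRADES.items()
        }
    return table
```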
From 1960 to 2024, SPI and NSPI exhibited distinct patterns in the frequency of droughts of different severity levels. Because NSPI incorporates the non-stationary characteristics of the climate system (such as long-term trends and abrupt turning points) into its modeling, it may differ systematically from SPI in the drought frequencies it reflects. As shown in Figure 7, SPI yielded higher mild drought frequencies than NSPI in all periods except P2, P3, and P7. For moderate droughts, NSPI identified higher frequencies than SPI in all periods except P3. For severe droughts, NSPI showed higher frequencies than SPI during P1 and P2, and neither index detected severe droughts in P6 or P7; in the remaining periods, NSPI produced lower frequencies than SPI, but the differences were minimal. For extreme droughts, neither index identified any occurrences during P2, P6, and P7; in all other periods, SPI yielded higher frequencies than NSPI. Overall, frequencies generally decreased from mild to extreme drought, reflecting the relative rarity of extreme events. Under the non-stationary framework, mild and moderate droughts occurred more frequently, indicating that the stationary assumption underlying SPI led to systematic deviations in drought frequency estimates relative to the climate-conditioned NSPI framework.
4.3. The Drought Characteristics of NSPI
Figure 8 presents the interannual variations in drought duration, intensity, severity, and peak in the YRB. The trends of all four characteristics could be approximated by linear fits, suggesting gradual long-term evolution rather than short-term random fluctuation. For 1960–2024, at the 12-month scale across the YRB, the Mann–Kendall test with trend-free pre-whitening (TFPW-MK) was applied to the four drought characteristics. Drought duration exhibited a decreasing but statistically non-significant trend (Z = −1.35, p = 0.177), with a Sen slope of 0.0000 per year. In contrast, drought intensity declined significantly (Z = −2.20, p = 0.028; Sen slope −0.0096 per year), as did drought severity (Z = −2.57, p = 0.010; Sen slope −0.0035 per year). The drought peak, however, showed a significant upward trend (Z = 2.66, p = 0.0078; Sen slope 0.0046 per year). The significant decline in drought severity indicated an alleviation of overall drought stress, whereas the significant rise in the peak suggested that extreme water deficits during individual drought events may have intensified. Overall, drought events in the basin showed a mitigating trend, characterized by significantly reduced intensity and severity and a non-significant shortening of duration, albeit with an increasing peak.
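The TFPW-MK procedure and the Sen slope can be sketched as below, assuming the usual sequence of trend-free pre-whitening: estimate the Sen slope, remove the trend, remove lag-1 autocorrelation from the residuals, add the trend back, and run the Mann–Kendall test on the result. This version always pre-whitens, whereas some implementations do so only when the lag-1 autocorrelation is significant; the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def sen_slope(x):
    """Median of all pairwise slopes (x_j - x_i) / (j - i), i < j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    return float(np.median((x[j] - x[i]) / (j - i)))


def mann_kendall(x):
    """Mann-Kendall Z (with continuity and tie corrections) and two-sided p-value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)
    s = np.sign(x[j] - x[i]).sum()
    _, counts = np.unique(x, return_counts=True)
    var_s = (n * (n - 1) * (2 * n + 5)
             - np.sum(counts * (counts - 1) * (2 * counts + 5))) / 18.0
    z = 0.0 if s == 0 else (s - np.sign(s)) / np.sqrt(var_s)
    return float(z), float(2.0 * stats.norm.sf(abs(z)))


def tfpw_mk(x):
    """Trend-free pre-whitened Mann-Kendall test; returns (Z, p, Sen slope)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    beta = sen_slope(x)
    y = x - beta * t                          # remove the monotonic trend
    r1 = np.corrcoef(y[:-1], y[1:])[0, 1]     # lag-1 autocorrelation of residuals
    y_pw = y[1:] - r1 * y[:-1]                # pre-whiten the residuals
    x_pw = y_pw + beta * t[1:]                # restore the trend
    z, p = mann_kendall(x_pw)
    return z, p, beta
```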
Figure 9 presents the spatial distribution of the average intensity, average duration, peak intensity, and occurrence frequency of drought events in the YRB at the monthly scale during 1960–2024. In Figure 9a, high drought intensity values are concentrated in the Loess Plateau, especially its northwestern part, whereas low values are concentrated in the upper plateau region of the YRB (such as Qinghai and Gansu). The distribution of average drought duration (Figure 9b) is spatially highly consistent with that of drought intensity. In the middle and lower reaches of the YRB, high-intensity areas are often accompanied by longer average drought durations, so drought events there frequently exhibit a compound “high intensity and long duration” character, which increases the likelihood of disasters. The Hetao Irrigation District and parts of the Loess Plateau also show longer drought durations, which may be related to the weak drought mitigation capacity of local ecosystems and the unstable seasonal precipitation supply. Drought events in the upper reaches tend to be abrupt and short-lived, so average drought durations there are relatively short. The peak intensity (Figure 9c) characterizes the most extreme drought intensity reached during the observation period. Its high-value areas are concentrated in the grain-producing and densely populated regions of the middle and lower reaches, such as eastern Henan, western Shandong, and northern Anhui. This indicates that these core areas suffer extremely severe, acute water stress in extreme drought years, with particularly serious consequences for agricultural production and water supply security. Peak intensity in the upper reaches is generally low, and some areas even show negative values. In Figure 9d, the spatial distribution of drought frequency agrees well with those of drought intensity and duration, further demonstrating the spatial differences in drought risk within the basin. Frequent droughts are mainly concentrated in the northwest, especially the Loess Plateau. Drought frequency does not correspond strictly linearly to drought intensity; however, the high-frequency areas generally overlap with the high-intensity and long-duration areas, jointly forming the “high-risk core area of drought” in the YRB.
Overall, the spatial pattern of drought risk in the YRB is clear and pronounced: the northwestern part of the basin, especially the Loess Plateau, faces the quadruple threat of “high intensity, long duration, frequent occurrence, and strong peak”, making it the top priority for drought prevention and mitigation and for adaptive water resource management. Although the upstream region experiences lower drought intensity and frequency, its ecosystem is more vulnerable and less resilient in recovering from drought.
4.4. The Recurrence Period of Drought in NSPI
Drought duration and severity are mutually dependent, so univariate return periods alone may yield biased assessments. A bivariate distribution method was therefore introduced to describe drought characteristics more completely.
Five common Copula functions (Gaussian, t, Clayton, Frank, and Gumbel) were fitted to the duration–severity (D-S) pairs extracted from both the SPI12 and NSPI12 series. The goodness-of-fit results are presented in Table 5. All five Copulas converged successfully for both indices. For SPI, the Clayton Copula yielded the lowest AIC (−58.028), followed closely by the Gaussian Copula (−54.857). For NSPI, the Frank Copula achieved the lowest AIC (−84.071), followed by the Gaussian Copula (−83.715).
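As an illustration of how a copula AIC such as those in Table 5 can be obtained, the sketch below fits a Clayton Copula to rank-based pseudo-observations of D and S by maximum likelihood and computes AIC = 2k − 2 ln L with k = 1. Whether the paper used pseudo-observations or fitted parametric margins is not stated here, so that choice, and the name `fit_clayton_aic`, are assumptions; the other four families would be fitted analogously with their own densities.

```python
import numpy as np
from scipy import optimize, stats


def pseudo_observations(x):
    """Rank-based margins in (0, 1): rank / (n + 1)."""
    x = np.asarray(x, dtype=float)
    return stats.rankdata(x) / (x.size + 1.0)


def clayton_logpdf(u, v, theta):
    """Log-density of the Clayton copula for theta > 0."""
    return (np.log1p(theta) - (1.0 + theta) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(u ** -theta + v ** -theta - 1.0))


def fit_clayton_aic(duration, severity, theta_max=50.0):
    """Maximum-likelihood Clayton fit on pseudo-observations; returns (theta, AIC)."""
    u, v = pseudo_observations(duration), pseudo_observations(severity)
    res = optimize.minimize_scalar(lambda th: -np.sum(clayton_logpdf(u, v, th)),
                                   bounds=(1e-6, theta_max), method="bounded")
    n_params = 1
    aic = 2.0 * n_params + 2.0 * res.fun  # AIC = 2k - 2 ln L, and res.fun = -ln L
    return float(res.x), float(aic)
```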
In preliminary exploratory fitting under the earlier return period formulation based on cumulative probability, the Frank Copula failed to converge for NSPI. This occurred because the very strong dependence between the non-stationary drought characteristics drove the Frank parameter toward its numerical limit (as Kendall’s tau approached unity), which triggered overflow in the optimization algorithm. After the return period was reformulated in terms of the AND-type joint exceedance probability and the marginal distributions were re-optimized, numerical stability improved substantially and the Frank Copula converged normally. Consequently, the Clayton Copula was retained for SPI and the Frank Copula for NSPI in all subsequent joint distribution analyses.
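The numerical limit described above follows from the Frank Copula's relation between Kendall's tau and its parameter, τ = 1 − (4/θ)[1 − D₁(θ)], where D₁ is the first-order Debye function. The sketch below inverts this relation with `scipy.optimize.brentq` and shows θ growing roughly like 4/(1 − τ) as τ approaches 1. The function names and the truncation of the Debye integral at t = 50 are choices made for this illustration.

```python
import numpy as np
from scipy import integrate, optimize


def debye1(theta):
    """First-order Debye function D1(theta) = (1/theta) * int_0^theta t / (e^t - 1) dt."""
    # Beyond t = 50 the integrand is below 1e-19, so the integral is truncated there;
    # this keeps quad from missing the mass near zero on very wide intervals
    # and avoids evaluating exp() at huge arguments.
    upper = min(theta, 50.0)
    val, _ = integrate.quad(lambda t: t / np.expm1(t) if t > 0 else 1.0, 0.0, upper)
    return val / theta


def frank_tau(theta):
    """Kendall's tau of the Frank copula for theta > 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))


def frank_theta_from_tau(tau, lower=1e-3, upper=1e4):
    """Invert tau(theta) numerically; theta diverges as tau approaches 1."""
    return optimize.brentq(lambda th: frank_tau(th) - tau, lower, upper)
```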
Figure 10 shows the joint probability contours (A, C) and the AND-type joint exceedance return period contours (B, D) for SPI-Clayton (A, B) and NSPI-Frank (C, D). Several differences were observed between the two indices. For joint probability (panels A and C), the NSPI-Frank probabilities were generally higher than the SPI-Clayton probabilities at comparable drought durations and severities. For example, for a drought duration of approximately 10–15 months and a severity of 10–20, the SPI-Clayton joint probability was between 0.2 and 0.4, whereas the NSPI-Frank probability exceeded 0.6. As duration and severity increased further, the NSPI-Frank probability contours shifted toward higher values, reflecting the stronger dependence between duration and severity under the non-stationary framework. Because the Frank Copula has no tail dependence, this difference reflects dependence in the body of the joint distribution rather than lower-tail dependence.
For the joint return period (panels B and D), the SPI-Clayton return period contours were relatively evenly spaced, whereas the NSPI-Frank contours were more compressed in the region of long duration and high severity. This indicated that, under the non-stationary framework, the co-occurrence of long duration and high severity was assigned a shorter return period than under the stationary framework, implying greater drought risk when climate covariates were considered. These differences suggest that NSPI captured non-stationary variations in the dependence structure between drought duration and severity.
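The AND-type joint exceedance return period can be written as T = E(L)/P(D > d, S > s) = E(L)/[1 − u − v + C(u, v)], with u = F_D(d), v = F_S(s), and E(L) the mean interarrival time between drought events. This is the common textbook definition and is assumed here; the fitted marginals, copula parameters, and E(L) are not reported in this section, so they enter the sketch as inputs, and the helper names are hypothetical.

```python
import numpy as np


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def frank_cdf(u, v, theta):
    """Frank copula C(u, v) for theta != 0, using expm1/log1p for accuracy at small theta."""
    return -np.log1p(np.expm1(-theta * u) * np.expm1(-theta * v) / np.expm1(-theta)) / theta


def and_return_period(u, v, copula_cdf, theta, mean_interarrival):
    """AND-type joint return period T = E(L) / P(D > d and S > s).

    u = F_D(d) and v = F_S(s) are marginal non-exceedance probabilities,
    and mean_interarrival is E(L), the mean time between drought onsets in years.
    """
    joint_exceedance = 1.0 - u - v + copula_cdf(u, v, theta)
    return mean_interarrival / joint_exceedance
```

With u and v taken from the fitted marginals at, for example, D = 3 and S = 3, `and_return_period` would give a moderate-drought return period of the kind mapped in Figure 11.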
Table 6 further compares the sensitivity of the joint return period results to different Copula selection criteria. Under the AIC criterion, the optimal Copulas were Clayton for SPI12 and Frank for NSPI12; under the RMSE criterion, the optimal Copula for SPI12 was Gaussian and for NSPI12 was Clayton. For SPI12, the moderate drought return period varied only slightly between the two criteria (2.87 years under AIC versus 2.88 years under RMSE), and the severe drought return period ranged from 5.74 to 5.92 years. For NSPI12, the moderate drought return period ranged from 3.41 to 3.81 years, and the severe drought return period ranged from 8.40 to 9.54 years. These results indicated that the joint return period estimates were relatively robust to the choice of Copula selection criterion, particularly for SPI12. The differences were larger for NSPI12, reflecting the stronger influence of the non-stationary dependence structure on the Copula selection. Based on the primary AIC criterion, the Clayton Copula was retained for SPI12 and the Frank Copula for NSPI12 in the spatial mapping of drought return periods.
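The RMSE criterion compares a fitted copula with the empirical copula of the pseudo-observations. The sketch below assumes the usual empirical copula C_n(u_i, v_i) = (1/n)·#{j : u_j ≤ u_i, v_j ≤ v_i} evaluated at the sample points; the exact evaluation grid used for Table 6 is not stated, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def empirical_copula(u, v):
    """C_n at each sample point: share of points with u_j <= u_i and v_j <= v_i."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    below = (u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None])
    return below.mean(axis=1)


def copula_rmse(duration, severity, copula_cdf, theta):
    """RMSE between a fitted copula and the empirical copula of the D-S pairs."""
    n = len(duration)
    u = stats.rankdata(duration) / (n + 1.0)
    v = stats.rankdata(severity) / (n + 1.0)
    diff = empirical_copula(u, v) - copula_cdf(u, v, theta)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Under this criterion, the family (with its fitted parameter) giving the smallest `copula_rmse` would be selected, which need not coincide with the AIC choice.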
Figure 11 shows the spatial distribution of the AND-type joint exceedance return period for moderate drought (D = 3, S = 3, panel a) and severe drought (D = 6, S = 9, panel b) across the YRB, derived from the Clayton Copula for SPI12 and the Frank Copula for NSPI12. For moderate drought, the return period ranged from 2.46 years to 5.83 years. Shorter return periods (2.46–3.5 years), indicating higher drought risk, were concentrated in the middle and lower reaches, particularly the North China Plain. Longer return periods (up to 5.83 years) were found in the upper reaches. For severe drought, the return period ranged from 3.77 years to 9.15 years. The spatial gradient was similar: the middle and lower reaches had shorter return periods (3.77–5 years), while the upper reaches had longer return periods (up to 9.15 years). Both maps revealed a distinct spatial pattern of drought risk in the YRB. The middle and lower reaches, where population and agriculture are concentrated, were the core high-risk areas. The transition from the lower to upper reaches showed a clear increase in return period, reflecting reduced drought risk upstream.
4.5. Random Forest (RF) + LSTM Machine Learning Prediction Evaluation
To assess the short-term autoregressive predictability of NSPI, the cascaded RF-LSTM model was trained on data from 1961–1988, validated on 1989–2016, and applied to the prediction period of 2017–2024.
Figure 12 presents the observed NSPI during 1989–2016 together with the model output for 2017–2024; the LSTM baseline is shown in blue and the RF-corrected fused prediction in orange. During the 2020–2021 drought, when the observed NSPI fell below the drought threshold (−0.5), the predicted values captured the negative anomaly, indicating that the model extracted the drought signal from the temporal structure of the series. The overestimation in 2023 suggests that the autoregressive framework has limited capacity to anticipate abrupt positive anomalies that deviate from the recent historical pattern.
The agreement between the observed and fused predicted NSPI series during the validation period (1989–2016) is shown in Figure 13. Overall, the predicted and observed sequences showed broadly consistent fluctuations. During the 2020–2021 drought, when the observed NSPI was below the drought threshold (−0.5), the predicted values also decreased, indicating that the model can capture typical drought events to some extent. In 2023, however, the predicted value was slightly overestimated, possibly owing to a systematic bias in the model’s response to positive anomaly signals.
The consistency and error distribution between the observed and predicted values are further shown in a scatter plot (Figure 14). The error metrics were R² = 0.429, RMSE = 0.370, and MAE = 0.279, indicating moderate explanatory power. The scatter points were distributed around the 1:1 line, suggesting that the model reproduced the general variation of NSPI across its dynamic range. For low (NSPI < −0.5) and high (NSPI > 1.0) values, the predicted values still changed in the same direction as the observed values, capturing the onset and dissipation of these key climate anomaly signals. Although the predicted and observed values did not fully match in these complex, nonlinear extreme cases, the overall trend consistency and signal-capturing ability suggest that the fusion model provides a reasonable basis for predicting extreme hydrological and climatic events in the YRB, with room for further optimization.
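The three error metrics can be computed as below. R² is taken here as the coefficient of determination, 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation, and which definition was used is not stated, so this is an assumption. The helper name `regression_metrics` is illustrative.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """R2 (coefficient of determination), RMSE, and MAE of predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = obs - pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2))
    return {"R2": r2, "RMSE": rmse, "MAE": mae}
```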
The prediction errors of the NSPI index sequence and of the duration, intensity, and peak of drought events were calculated; the values are shown in Figure 15. The model had zero error in duration, indicating that the RF-LSTM fusion model reproduced the observed drought duration exactly. The errors in peak and intensity were 0.159 and 0.230, respectively, which were relatively small, whereas the error in the NSPI index sequence was the highest (0.370). In summary, the RF-LSTM fusion model showed some applicability for NSPI prediction and reasonably reflected drought intensity and its evolution.