Drought Forecasting for Decision Makers Using Water Balance Analysis and Deep Neural Network

Abstract: Reliable damage forecasting from droughts, which mainly stem from a spatiotemporal imbalance in rainfall, is critical for decision makers to formulate adaptive measures. The requirements of drought forecasting for decision makers are as follows: (1) the forecast should be useful for identifying both the afflicted areas and their severity, (2) the severity should be expressed quantitatively rather than statistically, and (3) the forecast should be conducted within a short time and with limited information. To satisfy these requirements, this study developed a drought forecasting method that sequentially involves a water balance model and a deep neural network (DNN). The annual water shortage in the study area was estimated with the former, and meteorological data and the annual water shortage data were used as independent and dependent variables, respectively, for training the latter. The results from the water balance analysis were more reliable for identifying severely impacted areas: they singled out four sub-basins based on the amount of water shortage, whereas the meteorological drought index indicated that 20 sub-basins were severely affected in the worst year of the drought. For the DNN model's training, representative concentration pathway (RCP) scenarios were adopted as future events to extend the data available for training. Compared to the model trained with a limited number of past observations (correlation coefficient = 0.52–0.63), the model trained with the RCP scenarios exhibited a significantly higher correlation coefficient of 0.82–0.83. Additionally, the trained model afforded reliable drought damage forecasting under various meteorological conditions for the next several months. The trained short-term forecasting model can help decision makers promptly and reliably estimate the damage from droughts and commence relief measures well before their onset.
The differences in the afflicted areas and their severity between the two methods show that the improved method of this study simulates the drought damage more accurately than the National Plan. The improved method still requires calibration through comparison with recorded drought damage. However, calibration was not performed herein because the drought damage investigation reports published by the Korean government [36–38] lack sufficiently detailed data on the water shortage in the affected areas. When sufficient data become available, additional studies need to be performed for model calibration. Annual water shortages were estimated for both the past observation data and the RCP scenarios. In the observation data for the past 49 years, the worst drought was recorded in 2015 with a water shortage of 143 million m³, whereas the maximum water shortage reached 310–458 million m³ across the RCP scenarios. In each RCP scenario, the water shortage exceeded the 2015 value in 7–10 years, so that 32 extreme drought cases were additionally included in the training of the DNN model for drought forecasting. This can contribute to the reliable prediction of the expected damage in drought events that exceed the recorded range of past observation data. The approach is similar to improving the reliability of peak discharge estimates for extreme flood events by adding extreme values from a regional frequency analysis.


Introduction
Droughts result from a decrease or deficiency in precipitation on a certain scale of the area during a short-term period, while aridity is a long-term (climatic) phenomenon [1]. Compared to other natural disasters (such as floods, storms, and earthquakes), detecting the occurrence of droughts is more difficult owing to their features (such as slow onset and nonstructural impacts), the absence of a universally accepted definition, and the difficulty in determining their start and end [2,3]. Therefore, previous studies have specifically defined the types of drought (i.e., meteorological, hydrological, agricultural, and socioeconomic) and evaluated their features (duration, severity, frequency, spatial extent, etc.). To assess the features of droughts, a drought index, which is a single value combined with several observation data such as precipitation and evaporation, has been typically employed in previous studies; about 150 drought indices have been developed and applied worldwide for different purposes [4]. Among them, Standardized Precipitation Index (SPI) [5], Palmer Drought Severity Index [6], Standardized Precipitation Evapotranspiration Index (SPEI) [7], and Reconnaissance Drought Index (RDI) [8] are widely adopted. While drought indices can

$$\text{Risk} = \text{Hazard} \times (\text{Exposure} + \text{Vulnerability}) \tag{1}$$

Advantageously, in a risk map, relatively vulnerable areas can be easily identified and long-term countermeasures can be formulated for the potential damage. On the other hand, a place with high population density or high water demand is likely to be identified as an area with high drought risk. Particularly, estimating the amount of risk to which each region or district is exposed is difficult, as the values of individual components are normalized between the maximum and minimum values in the study area.
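The effect of normalization on Equation (1) can be illustrated with a minimal sketch. The district labels and component values below are hypothetical and are not taken from any risk map; the sketch only shows that min-max-normalized components yield a relative ranking rather than an absolute amount of damage.

```python
def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw component scores for three districts (illustrative only).
districts = ["A", "B", "C"]
hazard = min_max([0.8, 1.5, 1.1])
exposure = min_max([120_000, 900_000, 40_000])  # e.g., population
vulnerability = min_max([0.3, 0.2, 0.6])

# Equation (1): Risk = Hazard x (Exposure + Vulnerability)
risk = [h * (e + v) for h, e, v in zip(hazard, exposure, vulnerability)]

# Districts sorted from highest to lowest relative risk.
ranking = sorted(zip(districts, risk), key=lambda item: item[1], reverse=True)
```

In this toy case the most populated district ranks first and the district with the lowest hazard receives a risk of exactly zero, regardless of how much water it would actually lack.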
Risk-based drought management, proposed for decision makers, comprises three stages: monitoring and issuance of early warning, risk assessment, and mitigation and response [16]. However, it has a limitation: the response level is determined by assessing the existing drought conditions. Therefore, an improved process of decision support was developed for drought management, which comprises four major steps: drought monitoring and evaluation, drought risk prediction considering weather forecasting information (weeks or months), development of countermeasures for the expected damage, and drought record management [17]. The most essential step for drought management is predicting the severity of future droughts, mainly using physical/conceptual models or data-driven models. In previous studies on physical-/conceptual-model-based drought severity prediction, a correlation between crop yield models, such as the Environmental Policy Integrated Climate model, and the drought risk index has been established to simulate the expected decrease in crop production because of future climate change [18,19]. Furthermore, a conceptual model for predicting the decrease in crop production and power generation in the event of insufficient precipitation has been developed through an analysis across 21 European countries to assess the vulnerability to drought [20]. A physical/conceptual model is advantageously capable of examining the direct damage spatially and temporally caused by drought (including water shortage and reduction in agricultural yield and power generation) throughout the study area, but considerable data and time are required to develop models for each simulation case. In another study predicting drought severity using data-driven models, a groundwater level was simulated via long short-term memory (LSTM) to predict the abrupt changes in the groundwater conditions [21]. Additionally, Blauhut et al. 
analyzed the vulnerability of agricultural and energy production, public water supply, and water quality in each region of Europe under drought conditions by performing a correlation analysis between previously reported drought impacts and SPEI [22]. Beneficially, data-driven models require fewer data than physical/conceptual models and enable faster analysis. However, unlike those of a physical model, their prediction results are mainly expressed as drought indices, which makes it difficult for decision makers to estimate the drought severity.
This study mainly aims to propose a reliable drought forecasting method for decision makers. Based on the strengths and limitations of previous studies, the requirements for the drought forecasting method are summarized as follows: forecasts should be (1) supportive for identifying both afflicted areas and their severity, (2) expressed as nonstatistical values (e.g., amount of water shortage) rather than statistical values (e.g., duration and frequency), and (3) performed frequently with the available information, which might be limited, and provided to decision makers well before the drought's onset. To satisfy these requirements, this study develops a drought forecasting method by sequentially linking a conceptual model (water balance model) and a data-driven model (deep neural network). As previously described, each model has its advantages and disadvantages. Although conceptual models can spatially and temporally examine the direct damage caused by drought, the considerable data and time required can hinder the realization of prompt results. In contrast, data-driven models can generate results faster and with fewer data than conceptual models, but the result is not expressed as the direct damage from droughts. Herein, the advantages of both models are combined. First, the existing method of water balance analysis is improved to make it applicable to drought forecasting in the sub-basin areas. Second, because the past 50 years of recorded data are insufficient for training the deep neural network (DNN) model, the RCP scenarios, regarded as potential future events, are used to address this shortcoming. Finally, the DNN model trained with a sufficient number of data is adopted to reliably forecast the amount of water shortage under various meteorological conditions for the next several months.

Water 2022, 14, 1922

Study Area and Available Data
The study area is the Geumgang river (Figure 1), which covers the region 35.5–37.125° N and 126.0–128.0° E. The basin area is 17,925 km², and the total river length is 36,142 km [23]. The study area comprises 21 sub-basins, and the basin areas of individual sub-basins range from 127.7 to 1843.7 km². Additionally, the study area includes four multipurpose dams (Yongdam dam, Daecheong dam, Boryeong dam, and Buan dam) and 27 large-scale reservoirs with an effective reservoir capacity exceeding 5 million m³ for supplying water for agriculture, as well as 2300 small reservoirs. In large cities with a population exceeding 500,000, such as Daejeon, Cheongju, Jeonju, and Cheonan, the household water demands are high, and several water-demanding sites are present in national industrial complexes such as Gunsan and Asan. Furthermore, agricultural water demand stems from paddy fields and farm fields, which account for about a quarter of the total basin area. Since the water demand for large cities, industrial complexes, and agricultural areas can be simultaneously simulated in the water balance model with various water supply facilities, the Geumgang river was selected as the study area.

The available data for this study are summarized in Table 1. The meteorological data recorded by weather stations from 1967 to 2015 were collected through the Water Resources Management Information System. RCP scenario data projected from 2011 to 2100 were collected from the Korea Meteorological Administration. The acquired data were obtained from HadGEM3-RA, which is a regional climate model based on the atmospheric component of the Earth system model developed by the Met Office Hadley Centre, i.e., HadGEM3. To simulate the data with a spatial resolution of 0.125° in the Korean Peninsula area, the dynamic downscaling method was adopted for the four RCP scenarios (RCP 2.6, 4.5, 6.0, and 8.5), which are labeled according to their radiative forcing values in the year 2100 (2.6, 4.5, 6.0, and 8.5 W/m², respectively).

Conceptual Model (Water Balance Analysis)
To assess water shortages in the study area, the water system, including all sub-systems in the basin, needs to be analyzed. A water system can be defined as an entity extending over a geographical area, including all watersheds and groundwater recharge areas together with all water consumption centers and ecosystems associated with the processes occurring in natural (abiotic or biotic) and human sub-systems [27]. In the water system analysis, details of each sub-basin, e.g., rainfall-runoff characteristics, reservoir operation, and the lag time required for groundwater recharge after rainfall, can be included. Consequently, the analysis results can predict the affected areas and their amount of water shortage in each sub-basin. Herein, the water balance analysis conducted on the National Water Resources Plan (2001–2020) in Korea (hereafter referred to as the "National Plan") [28] is first briefly explained and then the improvements for water supply and demand analysis are suggested for drought forecasting.
A National Plan is frequently prepared for developing detailed plans for water supply facilities. Figure 2 displays the water balance analysis process in the National Plan. For the 117 sub-basins in Korea, existing supply water sources (dams, reservoirs, groundwater, etc.), including natural flows, are compared with future water demand (household, industrial, agricultural, and ecological) to predict the probable water shortage. The amount of supply from natural flow is simulated using the Tank model with four serially connected tanks. The future water demand is estimated for each municipal-level district for low, baseline, and high demand cases for the year 2025, and the baseline demand is adopted herein. Since the National Plan aims to roughly determine the water shortage that can occur in the event of the worst drought for each major river basin, several assumptions are made in the analysis for convenience. However, these assumptions should be improved for drought forecasting in each sub-basin.

Improvement of Water Demand Analysis
In the National Plan, the water demand in each sub-basin is calculated using the ratio of the area of each district included in the sub-basin to the total area of that district, as given by Equation (2), under the assumption that the water demands for household, industry, and agriculture are evenly distributed within an administrative district. However, mountainous, agricultural, industrial, and residential areas tend to be densely located in specific areas because of suitable natural conditions for agricultural and urban developments. Thus, the calculated demand for each sub-basin may differ from the actual amount of water demand.

$$D_i = \sum_{j=1}^{n} D_{\mathrm{city}_j} \times \frac{A^{i}_{\mathrm{city}_j}}{A_{\mathrm{city}_j}} \tag{2}$$

Here, $D_i$ represents the total water demand in each sub-basin $i$, $D_{\mathrm{city}_j}$ is the total water demand (for household, industrial, and agricultural) in the administrative district $j$, $A_{\mathrm{city}_j}$ is the total area of $j$, $A^{i}_{\mathrm{city}_j}$ is the area of $j$ included in sub-basin $i$, and the sum runs over the $n$ administrative districts that overlap sub-basin $i$. To improve the water demand estimation in each sub-basin, this study adopts the 2015 Population Census data (Eup/Myeon/Dong level, which is the lowest level of administrative divisions in Korea) and the GIS format of land use data (1:25,000) shown in Table 1. The population of the administrative districts and the areas of agricultural and industrial land in each sub-basin were used as weights to calculate the water demand for each sub-basin as follows.

$$D_i = \sum_{j=1}^{n} D_{\mathrm{city}_j} \times \frac{W^{i}_{\mathrm{city}_j}}{W_{\mathrm{city}_j}} \tag{3}$$

Here, $W_{\mathrm{city}_j}$ is the total area of a specific region (agricultural or industrial) or the total population in the administrative district $j$, and $W^{i}_{\mathrm{city}_j}$ is the part of that area (agricultural or industrial) or of that population that falls within sub-basin $i$.
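The difference between weighting by district area and weighting by population or land use can be made concrete with a small sketch. All district names and numbers below are hypothetical, and a single demand value per district is used for brevity, whereas the study applies separate weights to household, industrial, and agricultural demand.

```python
def allocate_demand(district_demand, weight_in_basin, weight_total):
    """Sub-basin demand as the weighted sum over overlapping districts.

    Each district's demand is multiplied by the share of its weight
    (area, population, or land-use area) that lies inside the sub-basin.
    """
    return sum(
        district_demand[j] * weight_in_basin[j] / weight_total[j]
        for j in district_demand
    )

# Hypothetical districts overlapping one sub-basin (illustrative values only).
demand = {"district_1": 100.0, "district_2": 50.0}          # million m3 per year
area_total = {"district_1": 500.0, "district_2": 200.0}     # km2
area_in_basin = {"district_1": 250.0, "district_2": 200.0}  # km2
pop_total = {"district_1": 800_000, "district_2": 50_000}
pop_in_basin = {"district_1": 80_000, "district_2": 50_000}

by_area = allocate_demand(demand, area_in_basin, area_total)
by_population = allocate_demand(demand, pop_in_basin, pop_total)
```

Here half of district_1 lies inside the sub-basin by area, but only a tenth of its population lives there, so the area-based estimate (100 million m³) is well above the population-based one (60 million m³). This mirrors the kind of overestimation that can arise when a large city's district overlaps a mostly mountainous sub-basin.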
In the water demand calculation for all 21 sub-basins via both the existing method and the improved method, the estimates differ significantly in five sub-basins (the estimation in the improved method is more than 20% higher or lower than that in the existing method) and moderately in six sub-basins (the estimation in the improved method is 10–20% higher or lower than that in the existing method). Figure 3 describes the case of sub-basin 3008, which exhibits the maximum difference in demand estimation. A large portion of this watershed is located upstream of the Daecheong dam and comprises mountainous and inundated areas. However, because metropolitan cities such as Daejeon and Cheongju cover a large part of the watershed area, the area-ratio method of the National Plan estimated an annual water demand of 153 million m³. In contrast, the improved method estimated a sharply reduced value of 80 million m³.
Furthermore, the improved method separately simulates the water demand of household and industrial areas for each administrative district in the sub-basin to overcome the difficulty in calculating the water shortage for each district in the National Plan. Moreover, it includes the water demand in rainfed areas in the water shortage estimation with all water supply sources and the demand in the actual conditions, which is not included in the analysis of the National Plan.

Improvement of Water Supply Analysis
As shown in Figure 2, the National Plan does not include the regional water supplies from agricultural reservoirs and groundwater in the water balance analysis model, which can cause an underestimation of the water shortage amount [29,30]. Therefore, herein, groundwater for different consumers (household, agricultural, and industrial) is separately included in the analysis model. Furthermore, an integrated agricultural reservoir whose effective capacity equals the combined capacity of all reservoirs in each sub-basin is included in the analysis. Additionally, the improved method considers the location and capacity of intakes and sewage treatment plants in the water balance analysis. Therefore, the water intake and return within a sub-basin boundary and the water conveyance to neighboring sub-basins can be simulated using the improved model, whereas such processes cannot be simulated using the method in the National Plan. Table 2 summarizes the improvements in water demand and supply analysis, and Figure 4 displays the flow networks in a watershed for the National Plan and the improved method.

MODSIM-DSS Model
Herein, the MODified SIMyld-Decision Support System (MODSIM-DSS) model is used for water balance analysis. This model was developed by Prof. Labadie of the Colorado State University by modifying SIMYLD, a network model developed by the Department of Water Resources Development (in 1972) in Texas, USA [31]. The model can reproduce the actual hydrological characteristics of the water system of the study area through the links and arcs connecting nodes (storage node, nonstorage node, demand node, and flowthru node) provided by the MODSIM-DSS model. The MODSIM-DSS model optimizes the flow rate in the links such that the cost incurred at all links during the calculation time ($t = 1, 2, \cdots, T$) is minimized, which can be expressed as

$$\text{minimize} \sum_{k \in A} c_k q_k \tag{4}$$

Here, $c_k$ represents the cost, weight, or priority in link $k$, $q_k$ represents the flow rate in link $k$, and $A$ represents all links contained in the network.

The calculation constraints in each node can be expressed as follows:

$$\sum_{k \in O_i} q_k - \sum_{l \in I_i} q_l = b_{it} \tag{5}$$

$$l_{lt} \le q_l \le u_{lt} \tag{6}$$

Here, $O_i$ is the set of outgoing links from node $i$, $I_i$ is the set of inflow links to node $i$, $b_{it}$ is the demand of node $i$ at time $t$, and $l_{lt}$ and $u_{lt}$ are the lower and upper limits, respectively, in link $l$ at time $t$. Equation (5) is identical to the water balance equation (the total amounts of inflow and outflow at any node are the same), and Equation (6) is a physical condition that constrains the upper and lower limits of the flow in all the links.

In the optimization process involving Equations (4)–(6), the initial value of the flow vector $q$ is assumed and the dependent variables $b_{it}$, $l_{lt}$, and $u_{lt}$ are calculated based on the initial values. Then, the Lagrangian relaxation algorithm, which exhibits very fast convergence compared to existing linear programming, is used to iteratively calculate the priority or cost in each node and link until the values of the dependent variables converge. Subsequently, the optimal distribution of the water resources can be determined.
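The idea of allocating water by minimizing link costs can be sketched on a toy network. The example below is not MODSIM-DSS and does not use its Lagrangian relaxation algorithm; it swaps in a plain successive-shortest-path minimum-cost flow solver for a single time step. The node layout, capacities, costs, and the penalty arcs that represent unmet demand are all assumptions made for illustration.

```python
from collections import deque

def min_cost_flow(n_nodes, arcs, source, sink, amount):
    """Send `amount` units from source to sink at minimum total cost.

    arcs: list of (tail, head, capacity, cost). Returns the flow on each arc.
    """
    # Residual graph; each entry is [head, residual capacity, cost, reverse index].
    graph = [[] for _ in range(n_nodes)]
    handles = []
    for tail, head, cap, cost in arcs:
        graph[tail].append([head, cap, cost, len(graph[head])])
        graph[head].append([tail, 0, -cost, len(graph[tail]) - 1])
        handles.append((tail, len(graph[tail]) - 1))
    sent = 0
    while sent < amount:
        # Queue-based Bellman-Ford search for the cheapest augmenting path.
        dist = [float("inf")] * n_nodes
        prev = [None] * n_nodes
        queued = [False] * n_nodes
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for idx, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, idx)
                    if not queued[v]:
                        queue.append(v)
                        queued[v] = True
        if prev[sink] is None:
            break  # no path with spare capacity remains
        # Find the bottleneck capacity along the path, then push that much flow.
        push, v = amount - sent, sink
        while v != source:
            u, idx = prev[v]
            push = min(push, graph[u][idx][1])
            v = u
        v = sink
        while v != source:
            u, idx = prev[v]
            graph[u][idx][1] -= push
            graph[v][graph[u][idx][3]][1] += push
            v = u
        sent += push
    return [cap - graph[tail][idx][1]
            for (tail, idx), (_, _, cap, _) in zip(handles, arcs)]

# Hypothetical one-step network: 100 units of river inflow, two demands.
S, RIVER, CITY, FARM, T = range(5)
arcs = [
    (S, RIVER, 100, 0),     # available natural flow
    (RIVER, CITY, 60, 1),   # household supply (lower cost = higher priority)
    (RIVER, FARM, 80, 2),   # agricultural supply
    (S, CITY, 60, 1000),    # penalty arc: unmet household demand
    (S, FARM, 80, 1000),    # penalty arc: unmet agricultural demand
    (CITY, T, 60, 0),       # household demand of 60
    (FARM, T, 80, 0),       # agricultural demand of 80
]
flows = min_cost_flow(5, arcs, S, T, amount=140)
shortage = {"city": flows[3], "farm": flows[4]}
```

With these assumed priorities the household demand is met in full and the whole deficit of 40 units falls on the agricultural demand, which is the kind of location-specific shortage figure the water balance analysis is meant to provide.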

Data-Driven Model (Deep Neural Networks)
The first part of the methodology highlights the improvements in the water balance analysis, which is suitable for drought forecasting in the target watershed, and a method to quantitatively assess the risk of drought based on the water shortage amount. Advantageously, this method can spatially and temporally determine the drought risk, but considerable input data are required for each sub-basin for the assessment. Particularly, the water supply varies depending on the annual hydrologic and meteorological conditions. The amount of water available in the future can be estimated when the daily rainfall and evapotranspiration data over the next 1-3 months are obtained from the weather forecasting model. Considering the estimation uncertainty in the weather forecasting model, the water shortage with the natural flow needs to be determined from various scenarios (wet-normal-dry). This process can hinder prompt decision-making since it is complex and computationally intensive.
To promptly provide drought forecasts under various weather conditions, a DNN model based on the quantitative drought risk assessment is adopted herein. Training and validating the data-driven model requires at least several hundred data points. According to Lee and Kim [32], when droughts that occurred in Korea from 1976 to 2010 were analyzed by SPI, the duration of each drought ranged from 2 to 10 months, and in very few cases, two droughts occurred in a year. Because only 49 years of data from 1967 to 2015 are available, the observed record is insufficient for training the drought forecasting model to derive a reliable outcome. Previous studies have developed alternative methods to extend the number of available data for the drought assessment. One study analyzed a tree-ring chronology network and converted it into the Palmer Drought Severity Index [33]. Another study used rainfall data officially recorded by government agencies from the 18th to the 20th century in Korea to compare the drought risks in the past and present [34]. Herein, RCP scenarios are regarded as events that may occur in the future, and a total of 360 years of data (4 scenarios × 90 years (2011–2100)) are additionally applied to extend the range of severe drought cases in the model training. Thus, the reliability issue stemming from the limited number of data in the past observation series is mitigated.
The procedure for constructing an optimal DNN model for each sub-basin with past observation data and RCP scenarios (RCP 2.6, 4.5, 6.0, and 8.5) is as follows. First, the observation data and RCP scenario data are used in the previously constructed water balance analysis model, and the annual water shortage in the study area is calculated for each sub-basin. Second, to include the estimated water shortage as a dependent variable in the deep learning model, the water shortage data must be rescaled. Rescaling is commonly performed either by normalization between the maximum and minimum values of the available data, which yields a range between zero and one, or by standardization with the average and standard deviation. This rescaling process limits the influence of large-scale variables in the training process of the DNN model and prevents the model from falling to a local minimum. However, the annual water shortage calculated in the water balance analysis tends to follow a right-skewed distribution, indicating that moderate droughts occur frequently and extreme droughts occur rarely. Therefore, the calculated water shortage data need to be appropriately rescaled; otherwise, the rescaled data cannot be evenly distributed in the range between zero and one. Herein, the generalized extreme value distribution in Equation (7) is adopted to rescale the amount of water shortage ($x$) to a non-exceedance probability ($F_X(x)$, 0–1).

$$F_X(x) = \exp\left\{-\left[1 - \frac{k(x - u)}{\alpha}\right]^{1/k}\right\} \tag{7}$$

Here, $x$ is the annual water shortage (in thousand cubic meters), $k$ is the shape parameter, $\alpha$ is the scale parameter, and $u$ is the location parameter.
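The rescaling step can be sketched as follows. The sketch assumes the parameterization of the generalized extreme value distribution that is common in hydrology, F(x) = exp(-(1 - k(x - u)/α)^(1/k)), with the Gumbel form as the limit for k = 0; the sign convention of the shape parameter and all parameter values below are assumptions, not values fitted in the study.

```python
import math

def gev_cdf(x, k, alpha, u):
    """Non-exceedance probability of x under a GEV distribution.

    Assumed form: F(x) = exp(-(1 - k*(x - u)/alpha)**(1/k)),
    with the Gumbel distribution as the limiting case k = 0.
    """
    if alpha <= 0:
        raise ValueError("scale parameter alpha must be positive")
    if k == 0:
        return math.exp(-math.exp(-(x - u) / alpha))
    z = 1.0 - k * (x - u) / alpha
    if z <= 0:
        # Outside the support: above the upper bound (k > 0)
        # or below the lower bound (k < 0).
        return 1.0 if k > 0 else 0.0
    return math.exp(-(z ** (1.0 / k)))

# Hypothetical parameters and annual shortages in thousand m3 (illustrative only).
k, alpha, u = -0.2, 8_000.0, 10_000.0
shortages = [2_000.0, 10_000.0, 30_000.0, 143_000.0]
rescaled = [gev_cdf(x, k, alpha, u) for x in shortages]
```

Because the mapping is a cumulative distribution function, it is monotone and bounded by zero and one, and a well-fitted distribution spreads a right-skewed sample more evenly over that interval than min-max normalization would.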
The independent variables of the DNN model are the 12 values of SPEI 1 (from October of the previous year to September of the present year), as shown in Figure 5. The input variable (SPEI 1) is rescaled to a range between zero and one by the non-exceedance probability of the standard normal distribution to maintain the same scale in the input and output of the DNN model. A total of 45 models are trained by varying the number of hidden layers (three, four, and five layers; three cases), the number of nodes (30, 40, and 50 nodes; three cases), and the number of epochs (30, 60, 90, 120, and 150 epochs; five cases). The training process involves a backpropagation algorithm in which the weights are adjusted according to Equations (8) and (9) to minimize the cost function of the predicted value ($\hat{y}$) and observed value ($y$) in the output layer using the activation function [35].

$$E = \frac{1}{2} \sum_{k} \left(y_k - \hat{y}_k\right)^2 \tag{8}$$

$$w^{t+1}_{ij} = w^{t}_{ij} - \eta \frac{\partial E}{\partial w^{t}_{ij}} \tag{9}$$

Here, $E$ is the cost function, $y_k$ and $\hat{y}_k$ represent the observed and predicted values at the $k$-th node of the output layer, $w^{t}_{ij}$ denotes the weight between the $i$-th node of the previous layer and the $j$-th node of the next layer in the $t$-th model learning, and $\eta$ denotes the learning rate.
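The input rescaling and a single weight update can be sketched with a deliberately small stand-in for the DNN. The example assumes a squared-error cost and plain gradient descent, uses one sigmoid output unit with no hidden layers, and relies on hypothetical SPEI 1 values, target, and learning rate; it illustrates the update rule only and is not the network architecture used in the study.

```python
import math

def normal_cdf(z):
    """Non-exceedance probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def predict(weights, bias, inputs):
    """Output of a single sigmoid unit."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def squared_error(y, y_hat):
    return 0.5 * (y - y_hat) ** 2

# Hypothetical monthly SPEI 1 values (October to September), illustrative only.
spei_1 = [-0.4, -1.2, -0.8, 0.1, -1.6, -2.0, -1.1, -0.3, 0.5, -0.9, -1.4, -0.6]
inputs = [normal_cdf(z) for z in spei_1]  # rescaled to (0, 1)
target = 0.9  # assumed rescaled water shortage (non-exceedance probability)
eta = 0.5     # assumed learning rate

weights, bias = [0.0] * 12, 0.0
y_hat = predict(weights, bias, inputs)
loss_before = squared_error(target, y_hat)

# Gradient of the squared error for a sigmoid output unit:
# dE/dw_i = -(y - y_hat) * y_hat * (1 - y_hat) * x_i
delta = -(target - y_hat) * y_hat * (1.0 - y_hat)
weights = [w - eta * delta * x for w, x in zip(weights, inputs)]
bias = bias - eta * delta
loss_after = squared_error(target, predict(weights, bias, inputs))
```

After one update the prediction moves from 0.5 toward the target, so the cost decreases; backpropagation applies the same rule layer by layer through the hidden layers.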
Finally, past observation data (1967–2015, 49 years), which are not used in the training process, are input into the 45 trained DNN models. Between the predicted water shortage through the trained model and the estimated water shortage through water balance analysis, the model displaying the best performance is selected as the optimal DNN for drought forecasting in each sub-basin. The mean squared error (MSE) and correlation coefficient are used as indicators to determine the model performances. Figure 6 describes the entire process to develop an optimal DNN model for drought forecasting in each sub-basin.
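The model selection step can be sketched as a grid enumeration followed by a comparison of the two indicators. The grid values come from the text; the selection rule (lowest MSE, with the correlation coefficient as a tie-breaker) and the example scores are assumptions, since the text does not state how the two indicators are combined.

```python
import math
from itertools import product

hidden_layers = [3, 4, 5]
nodes_per_layer = [30, 40, 50]
epochs = [30, 60, 90, 120, 150]
# 3 x 3 x 5 = 45 candidate configurations.
configurations = list(product(hidden_layers, nodes_per_layer, epochs))

def mse(observed, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def pearson_r(observed, predicted):
    """Pearson correlation coefficient."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def select_best(scores):
    """Assumed rule: lowest MSE first, higher correlation breaks ties."""
    return min(scores, key=lambda cfg: (scores[cfg][0], -scores[cfg][1]))

# Hypothetical validation targets and predictions for two configurations.
observed = [0.1, 0.5, 0.9, 0.3]
predictions = {
    (3, 30, 30): [0.2, 0.4, 0.7, 0.5],
    (4, 40, 90): [0.1, 0.6, 0.8, 0.3],
}
scores = {cfg: (mse(observed, p), pearson_r(observed, p))
          for cfg, p in predictions.items()}
best = select_best(scores)
```

In practice the scores dictionary would hold one entry for each of the 45 trained models, evaluated on the 49 years of observation data that were withheld from training.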

Drought Assessment
In the drought assessment with observation data for 49 years (1967–2015), the worst water shortage was estimated in the year 2015. For this worst year, the annual water shortages determined from the National Plan and from the improved method proposed herein were compared with a meteorological drought index (SPEI 6), as shown in Figure 7. With the meteorological drought index (SPEI 6) for September 2015, it was difficult to ascertain which area should be considered first; apart from sub-basin 3003, the entire study area exhibited severe or extreme levels of drought. In contrast, the hydrological drought cases from the water balance analysis model provide information about the affected location and the expected water shortage amount. Table 3 shows the criteria for classifying drought severity based on water shortage; the criteria were determined by applying the severity of an event in SPI to the water shortage calculated with the observation data for 49 years for each sub-basin.
Comparison of the drought assessment results between the National Plan and the improved model showed that a severe or extreme level of water shortage was predicted in four areas by the improved model and in only three areas by the National Plan. In particular, in sub-basins 3101 and 3202, no shortage was expected according to the National Plan, while the improved analysis indicated a severe or extreme level of drought. This difference in the impacted areas stemmed from the water demand of rainfed farm fields, which was not considered in the National Plan. Furthermore, in sub-basin 3203, the National Plan predicted a water shortage of 54 million m³, while the improved method predicted a significantly smaller water shortage of 31 million m³. This reduction resulted from the improvement in the demand estimation method; the agricultural water demand in sub-basin 3203 decreased by 35 million m³.
The differences in the afflicted areas and their severity between the two methods show that the improved method of this study more accurately simulates the drought damage than the National Plan. The improved method requires calibration through comparison with the recorded drought damage. However, the calibration was not performed herein owing to insufficient detailed data on the water shortage in the affected areas in the drought damage investigation reports published by the Korean government [36][37][38]. When sufficient data are available, additional studies need to be performed for model calibration.
The annual water shortages for the past observation data and the RCP scenarios are shown in Figure 8. In the 49 years of observation data, the worst drought was recorded in 2015, with a water shortage of 143 million m³, whereas the maximum water shortage reached 310-458 million m³ across the scenarios. In each of the RCP scenarios, the water shortage exceeded the 2015 value in 7-10 years, so that 32 extreme drought cases were additionally included in the training of the DNN model for drought forecasting. This can contribute to the reliable prediction of the expected damage in drought events that exceed the recorded range of the past observation data. The approach is similar to improving the reliability of the peak discharge estimate for extreme flood events by adding extreme values from a regional frequency analysis.
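The count of additional extreme cases described above amounts to tallying, per scenario, the years whose shortage exceeds the worst observed year. The sketch below illustrates that bookkeeping only; the scenario labels and series are made up for the example, and the only value taken from the text is the 2015 shortage of 143 million m³.

```python
OBSERVED_MAX = 143.0  # worst observed annual shortage (2015), million m^3, from the text

def count_exceedances(scenario_shortages, observed_max):
    """Count, per scenario, the years whose shortage exceeds the observed maximum."""
    return {
        name: sum(1 for value in series if value > observed_max)
        for name, series in scenario_shortages.items()
    }

# Hypothetical annual shortages (million m^3); not the study's RCP results.
scenarios = {
    "scenario_a": [20.0, 150.0, 310.0, 90.0, 143.0],
    "scenario_b": [458.0, 10.0, 144.0, 200.0, 0.0],
}

counts = count_exceedances(scenarios, OBSERVED_MAX)
total_extreme = sum(counts.values())
print(counts, total_extreme)
```

A strict inequality is used, so a scenario year that merely equals the observed maximum is not counted as an additional extreme case; that choice is an assumption.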

Drought Forecasting
As previously described, the rescaled values of SPEI 1 and the annual water shortage from the water balance analysis were used as the independent and dependent variables, respectively, in the DNN model's training. For each sub-basin, models with 45 hyperparameter combinations were constructed, and training and validation were performed sequentially for each model to minimize the cost function between the target and predicted values. As the number of epochs was increased from 30 to 150, nine models were trained and validated at each epoch setting with different numbers of hidden layers and nodes per hidden layer. Figure 9 presents the cases with the best validation results for each epoch setting in sub-basins 3101 and 3202. The figure displays the performance of the trained models along with the MSE and the correlation coefficient.
Then, the optimal DNN for each sub-basin was determined using the meteorological data from 1967 to 2015 (49 years), which were not employed in the previous stage. Of the 45 models, the model exhibiting the minimum MSE between the inferred and target values (the annual water shortage from the MODSIM-DSS model) was selected as the optimal model. In sub-basins 3101 and 3202, the model with 150 epochs, four hidden layers, and 40 nodes per hidden layer and the model with 60 epochs, five hidden layers, and 50 nodes per hidden layer were selected as the optimal models, respectively. Figure 10 shows the water shortage estimated by the optimal model from the past observation data, displayed as a time series together with the values predicted by the water balance analysis.
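The search over 45 hyperparameter combinations can be pictured as a small grid. The sketch below is illustrative only: the text gives the epoch range (30 to 150) and nine architecture variants per epoch setting, so the step of 30 epochs and the specific layer and node values are assumptions chosen to give 5 x 9 = 45 combinations that include the two reported optima. The scoring function is a placeholder with no relation to real training; its minimum is placed, purely for illustration, at the configuration reported for sub-basin 3101.

```python
from itertools import product

EPOCHS = [30, 60, 90, 120, 150]   # assumed step of 30 between the stated 30 and 150
HIDDEN_LAYERS = [3, 4, 5]         # assumed; 4 and 5 appear in the text
NODES_PER_LAYER = [30, 40, 50]    # assumed; 40 and 50 appear in the text

def build_grid():
    """Enumerate every (epochs, hidden layers, nodes) combination."""
    return [
        {"epochs": e, "hidden_layers": h, "nodes": n}
        for e, h, n in product(EPOCHS, HIDDEN_LAYERS, NODES_PER_LAYER)
    ]

def select_optimal(grid, evaluate):
    """Return the configuration with the minimum held-out score from evaluate(config)."""
    return min(grid, key=evaluate)

def placeholder_mse(config):
    """Hypothetical stand-in for 'train the model, then compute MSE on held-out data'."""
    return (abs(config["epochs"] - 150)
            + abs(config["hidden_layers"] - 4)
            + abs(config["nodes"] - 40))

grid = build_grid()
best = select_optimal(grid, placeholder_mse)
print(len(grid), best)
```

In practice `placeholder_mse` would be replaced by a routine that trains a network with the given configuration and returns its MSE on data withheld from training.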
The estimated water shortage agrees with the target value in the two sub-basins, and the occurrences of severe water shortages were correctly predicted (e.g., in 1988, 1995, and 2015).
Table 4 displays the contribution of the RCP scenarios to the DNN model reliability in comparison with Case 1, where the past data from 1967 to 2005 were repeatedly used for training and validation of the DNN models. For Case 2, the optimal DNN model previously selected was adopted. Apart from the data available for model training and validation, the model training with several hyperparameter cases and the selection of the optimal DNN model among the trained models were identical to those for the RCP scenario case. Furthermore, to compare the training performance on the same basis, the past observation data from 2006 to 2015 were used in the two models, and the inference results of both models were compared with the water shortage amounts derived from the water balance analysis. Table 4 shows that the results of the training and validation process did not significantly differ between Case 1 and Case 2. However, in the inference results with data from 2006 to 2015, the MSE significantly increased and the correlation coefficient decreased for the model trained with a limited number of past observation data, which was not observed for the model trained with the RCP scenarios. This comparison shows that constructing a model with considerably more data can afford more reliable drought forecasting results.
Figure 10. Inference results of the optimal DNN models for predicting the water shortage amount. The target indicates the water shortage predicted from the water balance analysis model, and the inferred indicates the value estimated from the optimal DNN model. The performance of the optimal DNN model is presented in (a) sub-basin 3101 and (b) sub-basin 3202.
Through the aforementioned process, an optimal DNN model was constructed for each sub-basin. Thus, if the weather data observed from October of the previous year and the weather forecasting data for the near future are prepared in the form of monthly SPEI indices, water shortages for the following period can be predicted. Currently, the National Drought Information Portal of Korea reports the drought index of the past six months and forecasts future droughts in units of a month (1 month) or a season (3 months) [39]. Therefore, the past observation and three-month weather forecast data can be used for drought forecasting. The results of applying various meteorological conditions to the optimal DNN model for sub-basins 3101 and 3202 in the year 2015 are shown in Figure 11. In the drought that developed from October 2014 to May 2015, the lack of rainfall in March and May, associated with seasonal rainfall fluctuations, is notable. The monthly rainfall in March and May was just 42% and 32%, respectively, of the average monthly rainfall in the observation data. Because the high-demand period for agricultural water usually starts in May with rice seeding in paddy fields, water needs to be supplied from agricultural reservoirs to meet the demand. However, the precipitation during these periods was not sufficient to refill the reservoirs. This was the main reason for the most severe drought from June to August (the crop growth period) among the past observation cases.
In the annual water shortage forecasting in sub-basins 3101 and 3202, various SPEI indices in the range from −1 to 1 were used in the optimal DNN model for the three months from June to August. The optimal DNN model showed that water shortages may occur in the range of 3-16 and 8-27 million m³ in sub-basins 3101 and 3202, respectively. In sub-basin 3101, an SPEI index between −0.6 and −0.9 was recorded during the period, and the water balance analysis model predicted a water shortage of 13 million m³. In sub-basin 3202, an SPEI index between −0.9 and −1.4 was recorded, and the water balance model predicted a water shortage of 30 million m³. These results are in good agreement: the water balance estimate for sub-basin 3101 lies within the forecast range, and that for sub-basin 3202 lies slightly above it, consistent with the recorded SPEI (down to −1.4) extending below the tested range. This further supports the reliability of the optimal DNN model in forecasting drought damage. The model is therefore a useful tool for a decision maker to determine the water shortage and the amount of rainfall required to reduce the damage. It can be used to proactively formulate water-saving measures and water supply measures through diversion from nearby basins before the onset of extreme droughts.
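The scenario analysis described above can be sketched as a sweep: the observed monthly SPEI values are held fixed, candidate SPEI values are substituted for the forecast months, and the predicted shortage is collected for each candidate. Everything specific in this sketch is an assumption: `toy_predict` is a hypothetical stand-in for a trained DNN, the observed values are invented, and applying the same SPEI value to all three forecast months is a simplification.

```python
def sweep_forecast(predict, observed_spei, scenario_values, n_future=3):
    """Predict the shortage for each scenario SPEI applied to the n_future forecast months."""
    results = {}
    for spei in scenario_values:
        features = list(observed_spei) + [spei] * n_future
        results[spei] = predict(features)
    return results

def toy_predict(features):
    """Hypothetical stand-in for a trained model: shortage grows as mean SPEI falls."""
    mean_spei = sum(features) / len(features)
    return max(0.0, -20.0 * mean_spei)

# Invented monthly SPEI values for October through May (8 months).
observed = [-0.5, -0.2, -0.8, -0.4, -0.6, -1.2, -0.3, -1.4]
scenarios = [-1.0, -0.5, 0.0, 0.5, 1.0]  # candidate SPEI for June to August

forecast = sweep_forecast(toy_predict, observed, scenarios)
low, high = min(forecast.values()), max(forecast.values())
print(forecast, low, high)
```

The pair `(low, high)` plays the role of the shortage range reported per sub-basin; with a real model, `toy_predict` would be replaced by the trained network's inference call together with the same input rescaling used in training.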

Discussion and Conclusions
Herein, a risk-based drought management system was presented for decision makers to formulate countermeasures for upcoming drought damage [14]. The following requirements should be satisfied to support decision makers in forecasting drought damage. First, the spatial extent considered for forecasting should include all water systems (all water consumption and all sources of water supply) in the area. Assessing water shortages across the whole water system helps identify both the afflicted areas and their severity. Second, the assessment results, rather than being expressed as complex statistical figures, should be expressed as quantitative values, such as the amount of water shortage directly resulting from an existing drought and the amount of precipitation required for damage mitigation. Finally, rather than putting effort into recovery after a disaster occurs, forecasting should be performed frequently with limited information and numerous weather forecasting scenarios, and the results should be delivered to decision makers well before drought onset. To satisfy these requirements, this study proposes a methodology for a drought forecasting system that couples the water balance analysis model, which is a physical/conceptual model, with the DNN model, which is a data-driven model.
In the drought assessment, several improvements to the analysis method currently used in the National Plan were proposed to include all the water systems and reflect the actual situation. The demand estimation method in the sub-basin was improved by applying land use data in GIS format and the population census data, and the agricultural demand in the rainfed area was included to accurately estimate the actual water demand. In terms of water supply, agricultural reservoirs and confined aquifers as well as intake and sewage treatment facilities were included to simulate water supply and return within the sub-basin boundary. Owing to these improvements, the affected areas and their water shortages were simulated in the model; thus, the model is suitable for drought forecasting at the sub-basin level. Previous studies have suggested similar approaches to improve the water balance analysis model in the National Plan, such as including agricultural reservoirs and agricultural demand in the rainfed area in the analysis [30,40,41]. However, to our knowledge, no previous study has adopted land use data in GIS format and the population census data to estimate the water demand in each sub-basin area. Of the 21 sub-basins in the study area, the water demand estimated by the current method and by the improved method differed significantly or moderately in 11 sub-basins. Additionally, the water balance analysis model is generally adopted to calculate the required volume of a water management facility to prevent water shortage under future water demand conditions and to simulate the variation of the water shortage due to climate change [42][43][44]. Herein, however, the water balance analysis was used for drought damage forecasting to simultaneously identify the afflicted areas and their water shortage.
For the drought case of the year 2015, the water balance analysis reliably indicated the four severely impacted areas based on the amount of water shortage, while the meteorological drought index indicated that 20 sub-basins were severe drought areas.
In the drought forecasting with the DNN model, the 49 years of past observation data were not sufficient to train the model. As an alternative measure, RCP scenarios were assumed to represent possible weather events in the future. Of the 360 years of water shortage resulting from the four RCP scenarios, 32 cases exceeded the value of the worst drought year obtained from the past observation data. When sufficient data from extreme drought cases were considered, the performance of the model training and validation was within an acceptable range, and optimal DNN models were derived for each sub-basin. In previous studies, the tree-ring chronology network and the recorded rainfall data from the 18th to 19th century were adopted for reliability in the extreme drought analysis [31,32,45]. Herein, the RCP scenarios were included in the training process of the DNN model, which increased the correlation coefficient to 0.82-0.83, compared with 0.52-0.63 for the model trained with a limited number of observed data.
Herein, for the drought event in 2015, the optimal DNN model exhibited reliable drought damage forecasting with various meteorological conditions for the next several months. Data-driven stochastic methods, such as artificial neural networks and autoregressive integrated moving average models, are commonly employed for drought forecasting [21,46,47,48]. Additionally, meteorological and hydrological models are employed together for hydrological drought forecasting [49][50][51]. However, none of these previous studies has considered coupling the water balance model and the DNN model. Since the coupled model proposed herein can deliver reliable and prompt predictions of drought damage, it can contribute to the continuous analysis of the expected water shortage and the required amount of rainfall for the following period, which can assist in formulating proactive measures for forecasted droughts.
In a follow-up study, the reliability of the analysis results needs to be improved by calibrating the water balance analysis model, which was not performed in this study because of the lack of drought damage survey data. Moreover, to improve the consistency between the values predicted through the optimal DNN model and the values calculated through the water balance analysis, additional climate change scenarios from other organizations need to be adopted in the training and validation process of the DNN model. Additionally, other variables need to be considered as independent variables for the DNN model, such as the number of days of rainfall in each month and the average rainfall intensity. Finally, because the cause of a result predicted by the DNN model is difficult to recognize, we propose a preliminary analysis of the thresholds or weather patterns that might cause an extreme drought event, which would improve the understanding of the cause.
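As an illustration of the additional predictors mentioned above, the number of rain days and the average rainfall intensity for a month could be derived from daily rainfall as follows. This is a hedged sketch: the wet-day threshold of 0.1 mm and the definition of intensity as mean rainfall per wet day are assumptions, and the daily values are invented.

```python
def monthly_rain_features(daily_rain_mm, wet_day_threshold_mm=0.1):
    """Return (number of wet days, mean rainfall per wet day in mm) for one month."""
    wet_days = [r for r in daily_rain_mm if r >= wet_day_threshold_mm]
    n_wet = len(wet_days)
    intensity = sum(wet_days) / n_wet if n_wet else 0.0
    return n_wet, intensity

# Invented daily rainfall (mm) for part of a month.
daily = [0.0, 0.0, 12.0, 0.0, 3.0, 0.0, 0.05, 0.0, 0.0, 15.0]
n_wet, intensity = monthly_rain_features(daily)
print(n_wet, intensity)
```

The 0.05 mm day falls below the assumed threshold and is treated as dry, which leaves three wet days with a mean of 10 mm each.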