Comparison of Entropy Methods for an Optimal Rain Gauge Network: A Case Study of Daegu and Gyeongbuk Area in South Korea

To reduce hydrological disasters, it is necessary to operate rain gauge stations at locations where the spatio-temporal characteristics of rainfall can be reflected. Entropy has been widely used to evaluate the designs and uncertainties associated with rain gauge networks. In this study, the optimal rain gauge network in the Daegu and Gyeongbuk area, which requires the efficient use of water resources due to low annual precipitation and severe drought damage, was determined using conditional and joint entropy, and the selected network was quantitatively evaluated using the root mean square error (RMSE). To consider spatial distribution, prediction errors were generated using kriging. Four estimators used in entropy calculations were compared, and weighted entropy was calculated by weighting the precipitation. The optimal number of rain gauge stations was determined by calculating the RMSE reduction and the reduction ratio according to the number of selected rain gauge stations. Our findings show that the results of conditional entropy were better than those of joint entropy. The optimal rain gauge stations showed a tendency wherein peripheral rain gauge stations were selected first, with central stations being added afterward.


Introduction
Recently, damage caused by natural disasters has increased significantly worldwide because climate change has brought more extreme weather events. Some of the major causes of socioeconomic damage on the Korean Peninsula are natural disasters related to drought, heavy rain, flooding, and flooding of dams [1]. Droughts, in particular, can cause long-term damage. Due to limited annual precipitation, the Daegu and Gyeongbuk area is susceptible to severe drought damage [2]. To respond to the risk of such damage, it is important to utilize and manage water resources efficiently through hydrological analysis, which can also counteract rain-related natural disasters. Water resources can be managed efficiently by identifying the rainfall characteristics of each basin and the spatial distribution of rainfall. In other words, water resources can be managed using rainfall information.
Rainfall information with high observational accuracy is therefore important, and it requires an accurate rain gauge network. The more rain gauge stations there are, the higher the accuracy of the observations can be; however, the cost also rises with the number of stations. Rain gauge stations should therefore be positioned properly, and the resulting network evaluated through the differences between the observed and predicted values. In the case of continuous data, the root mean square error (RMSE) is mainly used as a model selection criterion. In this study, several estimators of entropy and weighted entropy were compared. A quantitative comparison of the rain gauge networks selected by various entropy methods allows appropriate rain gauge networks to be designed. An entropy-based rain gauge network was quantitatively evaluated by predicting the precipitation at unselected rain gauge stations from the stations selected as the optimal network using entropy. In addition, the performance of the network selected by entropy was evaluated against the RMSEs of all rain gauge network combinations. A method of determining the number of rain gauge stations using the RMSE was also proposed: the number of stations that yields the maximum efficiency was determined by calculating the RMSE reduction and the reduction ratio according to the number of stations.
The structure of this paper is as follows. Section 2 presents the entropy theory, including several estimators and weighted entropy, and the kriging theory used to consider spatial distribution, and discusses the methods of selecting the optimal rain gauge network and determining the optimal number of rain gauge stations using the RMSE. In Section 3, the optimal rain gauge network for 11 rain gauge stations in the Daegu and Gyeongbuk area is evaluated based on entropy, and the determination of the number of rain gauge stations for efficient operation is described. Conclusions are presented in Section 4.

Methodology
This section discusses the method of selecting the optimal rain gauge network using entropy. The optimal rain gauge network was selected through the conditional and joint entropies of the long-term precipitation observed at rain gauge stations. Entropy alone cannot consider the spatial distribution of rain gauge stations because the uncertainty is calculated from the probability distribution of precipitation. To consider the spatial distribution, prediction errors were generated using ordinary kriging (OK) and leave-one-out cross validation (LOOCV). Rain gauge stations with small prediction errors were the ones with low entropy. The optimal rain gauge network was then also selected through the conditional and joint entropies of the prediction errors generated by OK. In addition, the precipitation at unselected stations was predicted using OK, and the optimal number of stations was determined using the RMSE. The computer used in this study ran Windows 10 with an i5-7500 central processing unit (CPU) and 32 GB of RAM. The software used for data processing was R version 3.5.2 (Daegu University, Korea).
Figure 1 shows the methods used to select the optimal rain gauge stations considering conditional and joint entropies. Methods A and B calculated conditional entropy from the empirical probability distribution of rain gauge data, and method C calculated joint entropy. In addition, the prediction error of each rain gauge station was generated using kriging and LOOCV. As entropy decreases with an increase in the kriging interpolation performance, kriging and entropy were used together to consider both the spatiality of rain gauge stations and the uncertainty of rain gauge data. Methods D and E calculated conditional entropy from the probability distribution of the prediction error, and method F calculated joint entropy.
Methods A and D were used for obtaining the entropy of unselected rain gauge stations using the stations selected through entropy. Methods B and E, on the other hand, were used for obtaining the entropy of selected rain gauge stations using unselected stations. These methods were calculated and compared for each of the entropy estimators. In addition, weighted entropy was calculated and compared. We applied the monthly average precipitation, regional precipitation, and regional monthly precipitation as weights. Further, in order to focus on places with little rainfall in each region, 1 − regional precipitation was applied as a weight, as opposed to regional precipitation. Conditional and joint entropies based on the number of stations were calculated for each method, and the number that maximized the entropies was determined. The mean absolute error (MAE), RMSE, bias, and correlation coefficient (Corr) were calculated using the predicted precipitation at the rain gauge stations that were not selected in the combination of stations with high entropies. Finally, the optimal number of rain gauge stations was determined using the degree of the RMSE decrease with an increase in the number of stations.

Research Flowchart
Appl. Sci. 2020, 10, 5620

Entropy
Entropy is a method of quantitatively evaluating the degree of randomness using the probability distribution of data [25]. Entropy includes marginal entropy, conditional entropy, and joint entropy. The marginal entropy proposed by [25] is the amount of uncertainty of a discrete random variable X, as shown in the following equation.

$$H(X) = -\sum_{n=1}^{N} p(x_n) \ln p(x_n)$$

where $p(x_n)$ is the empirical probability of $x_n$, $H(X)$ is the marginal entropy (uncertainty) of X, and N is the number of class intervals of X. If there are two random variables X and Y, conditional and joint entropies are calculated through conditional and joint probabilities. When variable Y is given, the entropy of variable X is the conditional entropy $H(X|Y)$, as shown below.

$$H(X|Y) = -\sum_{n=1}^{N} \sum_{m=1}^{M} p(x_n, y_m) \ln p(x_n | y_m)$$

where N and M are the numbers of class intervals of X and Y, $p(x_n, y_m)$ is the joint probability, and $p(x_n | y_m)$ is the conditional probability. Joint entropy $H(X, Y)$ is calculated from the joint probability of random variables X and Y.

$$H(X, Y) = -\sum_{n=1}^{N} \sum_{m=1}^{M} p(x_n, y_m) \ln p(x_n, y_m)$$

Weights can be applied according to the purpose and study characteristics determined by the experimenter. The weight may be independent of or dependent on $p(x_n)$. Weighted entropy is given by the following equation [26].

$$H_w(X) = -\sum_{n=1}^{N} w_n \, p(x_n) \ln p(x_n)$$

where $w_n$ is the weight of event $x_n$. Weighted entropy is also applicable to conditional and joint entropy. Rain gauge stations with high entropy are important as they have high uncertainties in rainfall observation.
The most common entropy calculation method estimates uncertainty from the empirical probability distribution of data. Several approaches have been developed to compensate for the shortcomings of this method. In this study, three more methods were considered in addition to the "emp" method using empirical probability.
"mm" is the bias-corrected empirical entropy estimator proposed by [27]. "shrink" is a shrinkage estimate of the entropy of a Dirichlet probability distribution [28]. "sg" is the entropy of a Dirichlet probability distribution proposed by [29]. In this study, entropy was calculated using the entropy, condentropy, and multiinformation functions of the infotheo package [30].

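The entropy definitions in this subsection can be illustrated with a minimal Python sketch. It is not the infotheo implementation used in the study: the function names are hypothetical, only the plug-in ("emp") estimator and a Miller-Madow style bias correction (the estimator that infotheo labels "mm") are shown, and the "shrink" and "sg" estimators are omitted. The input is assumed to be data already converted to discrete class labels.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Plug-in ("emp") Shannon entropy in nats of a sequence of class labels."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(symbols):
    """Plug-in entropy plus the Miller-Madow bias correction (K - 1) / (2N)."""
    k = len(set(symbols))  # number of non-empty classes
    return empirical_entropy(symbols) + (k - 1) / (2 * len(symbols))


def joint_entropy(xs, ys):
    """H(X, Y): plug-in entropy of the paired class labels."""
    return empirical_entropy(list(zip(xs, ys)))


def conditional_entropy(xs, ys):
    """H(X | Y) computed through the identity H(X, Y) - H(Y)."""
    return joint_entropy(xs, ys) - empirical_entropy(ys)


def weighted_entropy(symbols, weights):
    """Weighted entropy -sum_n w_n p(x_n) ln p(x_n); weights maps class -> w_n."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(weights[s] * (c / n) * math.log(c / n) for s, c in counts.items())


# Made-up class labels for two stations (not data from the study).
x = [0, 0, 1, 1, 2, 2, 2, 2]
y = [0, 1, 0, 1, 0, 0, 1, 1]
print("H(X)    =", empirical_entropy(x))
print("H(X|Y)  =", conditional_entropy(x, y))
print("H(X,Y)  =", joint_entropy(x, y))
print("H_mm(X) =", miller_madow_entropy(x))
```

In this toy example X and Y are independent, so H(X|Y) equals H(X); for two stations that record nearly the same rainfall, H(X|Y) would approach zero.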
Ordinary Kriging (OK)
Kriging is a technique for predicting the values at unobserved points using a weighted combination of known surrounding values. Kriging was developed by [31] and mathematically established by [32]. Kriging defines the spatial interrelationship through a variogram calculated from the given data. The variogram $2\gamma(h)$ is a measure of spatial autocorrelation that represents the similarity of data separated by a constant distance h.
OK predicts the unknown value $z_0$ at a point using the n known data values $z_i$. OK resolves the bias problem of simple kriging by constraining the sum of the weights $\lambda_i$ to 1.

$$\hat{z}_0 = \sum_{i=1}^{n} \lambda_i z_i, \qquad \sum_{i=1}^{n} \lambda_i = 1$$

Under this condition, the weights are obtained so that the error variance is minimized. The problem of finding an extremum under constraints is solved by the Lagrange multiplier method, using the Lagrangian objective function $L(\lambda_1, \lambda_2, \cdots, \lambda_n; w)$, where w is the Lagrange multiplier. This study used a variogram based on the exponential model. OK was performed using the krige function of the gstat package [33].
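The OK procedure and the LOOCV prediction errors described above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the gstat code used in the study: the function names are hypothetical, the exponential variogram parameters (sill, range, nugget) are fixed by hand rather than fitted, and distances are plain Euclidean distances. The weights and the Lagrange multiplier are obtained by solving the standard augmented linear system whose last row enforces that the weights sum to 1.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=1.0, nugget=0.0):
    """Exponential semivariogram gamma(h); gamma(0) is set to exactly 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / range_))
    return np.where(h > 0, gamma, 0.0)


def ok_weights(coords, target, variogram=exponential_variogram):
    """Solve the ordinary kriging system for the weights and the multiplier."""
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Augmented matrix: variogram block plus a row/column of ones that
    # enforces the unbiasedness constraint sum(lambda) = 1.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(dists)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    solution = np.linalg.solve(a, b)
    return solution[:n], solution[n]


def ok_predict(coords, values, target, variogram=exponential_variogram):
    """Predict the value at target as the weighted sum of the known values."""
    weights, _ = ok_weights(coords, target, variogram)
    return float(weights @ np.asarray(values, dtype=float))


def loocv_errors(coords, values, variogram=exponential_variogram):
    """Leave-one-out errors: predict each station from all the others."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        pred = ok_predict(coords[keep], values[keep], coords[i], variogram)
        errors.append(pred - values[i])
    return np.array(errors)


# Made-up station coordinates and values (not data from the study).
stations = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rain = [1.0, 2.0, 3.0, 4.0]
print(ok_predict(stations, rain, [0.3, 0.6]))
print(loocv_errors(stations, rain))
```

With a zero nugget the predictor reproduces the observed value at a station location, which is why LOOCV, rather than prediction at the stations themselves, is needed to obtain non-trivial prediction errors.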

Method for Selecting the Optimal Rain Gauge Network
The combination of two or more rain gauge stations that maximizes conditional entropy or joint entropy is considered an optimal combination. In this study, the observations of unselected rain gauge stations were predicted using the selected rain gauge network and OK to evaluate the suitability of the selected combination, because this quantifies how well the precipitation at unselected stations is recovered when only the selected network is operated. The MAE, RMSE, bias, and Corr calculated from the differences between the observed and predicted values at unselected rain gauge stations were used as evaluation measures.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{z}(x_i) - z(x_i) \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{z}(x_i) - z(x_i) \right)^2}$$

$$\mathrm{bias} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{z}(x_i) - z(x_i) \right)$$

$$\mathrm{Corr} = \frac{\sum_{i=1}^{n} \left( \hat{z}(x_i) - \bar{\hat{z}} \right) \left( z(x_i) - \bar{z} \right)}{\sqrt{\sum_{i=1}^{n} \left( \hat{z}(x_i) - \bar{\hat{z}} \right)^2} \sqrt{\sum_{i=1}^{n} \left( z(x_i) - \bar{z} \right)^2}}$$

where n is the number of data, $\hat{z}(x_i)$ is the predicted value at an arbitrary position $x_i$, and $z(x_i)$ is the observed value. In addition, $\bar{\hat{z}}$ is the average of the predicted values and $\bar{z}$ is the average of the observed values.
The optimal combination for a given number of stations can be obtained using the maximum value of entropy. A previous study presented a method of obtaining the optimal number of stations using the maximum informational content [34]. That study selected the combination with the maximum informational content among the various combinations of rain gauge stations based on joint entropy. The prediction performance of the selected combination, however, could not be reflected. This study proposes a method of determining the optimal number of stations through the RMSE calculated using OK. As the number of rain gauge stations increases, the RMSE decreases due to the improvement in the spatial interpolation performance. In other words, the appropriate number of rain gauge stations can be determined by evaluating the degree of the RMSE reduction due to the increase in the number of stations.
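As a small illustration of the four evaluation measures, the following Python sketch computes them from paired observed and predicted values. The function name is hypothetical, and the sign convention for the bias (predicted minus observed, so that a negative bias means underprediction) is an assumption rather than something stated in the text.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return MAE, RMSE, bias, and Pearson correlation for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted - observed
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_o * var_p)
    return {"MAE": mae, "RMSE": rmse, "bias": bias, "Corr": corr}


# Made-up values: every prediction is 1 unit above the observation.
print(evaluation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

A constant offset such as the one in this example leaves Corr at 1 while MAE, RMSE, and bias all report the offset, which is why the measures are read together rather than individually.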
First, we considered the difference in the RMSE between the optimal combination of i rain gauge stations and that of i − 1 stations. The minimum RMSE ($RMSE_{min}$) was obtained with n − 1 rain gauge stations, and the maximum RMSE ($RMSE_{max}$) with one rain gauge station. The average RMSE reduction per added station was obtained from the range of the RMSE ($RMSE_{max} - RMSE_{min}$).
The reduction ratio $S_i$ for i rain gauge stations compares the RMSE reduction obtained by moving from i − 1 to i stations with this average reduction.
When $S_i$ is larger than 1, i can be considered an efficient number of stations because the RMSE reduction for i rain gauge stations is larger than the average RMSE reduction. In addition, we calculated the RMSE reduction ratio $R_i$ for i rain gauge stations. $R_i$ represents the RMSE reduction achieved when i rain gauge stations are operated, compared to the total RMSE reduction.
With the number of stations on the x-axis and $R_i$ on the y-axis, y = 0 holds for x = 1 and y = 1 for x = n − 1; the point farthest from the straight line joining these two points gives the optimal number of stations.
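The procedure for the number of stations can be sketched in Python as follows. The exact formulas are assumptions inferred from the description: the average reduction is taken as the RMSE range divided by the number of one-station increments, $S_i$ as the reduction from i − 1 to i stations divided by that average, and $R_i$ as $(RMSE_{max} - RMSE_i) / (RMSE_{max} - RMSE_{min})$. The function name and the RMSE values are made up for illustration.

```python
def rmse_reduction_summary(rmse_by_count):
    """Summarize how the RMSE falls as stations are added.

    rmse_by_count[j] is the RMSE of the best network with j + 1 stations,
    so the list runs from 1 station up to the largest network evaluated.
    Returns (S, R, knee), where S and R are dicts keyed by station count.
    """
    k = len(rmse_by_count)
    rmse_max = rmse_by_count[0]    # one station (assumed to give the largest RMSE)
    rmse_min = rmse_by_count[-1]   # largest network (assumed to give the smallest)
    total = rmse_max - rmse_min
    average_reduction = total / (k - 1)  # assumption: range split over k - 1 steps

    # S_i: reduction from i - 1 to i stations relative to the average reduction.
    s = {i: (rmse_by_count[i - 2] - rmse_by_count[i - 1]) / average_reduction
         for i in range(2, k + 1)}
    # R_i: share of the total reduction already achieved with i stations.
    r = {i: (rmse_max - rmse_by_count[i - 1]) / total for i in range(1, k + 1)}

    # Straight line from (1, 0) to (k, 1). The perpendicular distance to it is
    # proportional to the vertical gap, so the vertical gap identifies the
    # farthest point.
    gap = {i: abs(r[i] - (i - 1) / (k - 1)) for i in r}
    knee = max(gap, key=gap.get)
    return s, r, knee


rmse_demo = [10.0, 7.0, 5.0, 4.6, 4.3, 4.0]  # made-up values, not from the paper
S, R, knee = rmse_reduction_summary(rmse_demo)
print("S:", S)
print("R:", R)
print("farthest point from the line at", knee, "stations")
```

Under these assumptions the $S_i$ values average to 1 by construction, so values above 1 mark additions that reduce the RMSE more than an even split of the total reduction would.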

Study Area and Data
The Daegu and Gyeongbuk area is a large basin that occupies the southeast of the Korean Peninsula. Topographically, the Daegu and Gyeongbuk area appears to be affected by the mountains of the nearby Taebaek mountain range. The limited annual precipitation results in a scarcity of water in the area. Furthermore, climate change has led to an increase in the number of days with heavy rainfall, leading to an increase in disaster risk awareness in the area [35]. Hence, identification of the spatial distribution of rain gauge stations installed in the area and the evaluation of their efficiency are important.
The study used 11 rain gauge stations installed by the Korea Meteorological Administration (KMA) in the Automated Surface Observing System (ASOS) of the Daegu and Gyeongbuk area, as shown in Figure 2. The daily precipitation data from 1988 to 2016 obtained from the meteorological data open portal site (https://data.kma.go.kr) were used as research data. The daily maximum precipitation was 516.4 mm, and the average annual precipitation ranged from 1018.2 to 1328.7 mm. The northern part of the study area had higher precipitation than the southern part (Table 1).

Rain Gauge Network Using Entropy
The precipitation data were converted to discrete spaces to calculate entropy. Figure 3 compares the entropies for class intervals of 100, 200, 300, and 400. Figure 3a shows the entropy of the observed precipitation, and Figure 3b shows the entropy of the prediction error. For both cases, the amount of entropy information increased as the class interval increased; however, the pattern of the maximum information amount according to the number of stations was the same. This study was conducted for a fixed class interval of 200. To find the optimal rain gauge stations of the study area using entropy, the first optimal rain gauge station was selected using marginal entropy. First, entropy was calculated using the basic "emp" method. The rain gauge station installed in Mungyeong was selected as the first station, as its marginal entropy (2.441) was higher than those of other regions. Daegu (2.766) was selected based on the marginal entropy that used the prediction error.
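The effect of the number of class intervals on the entropy can be reproduced with a short Python sketch. It assumes equal-width class intervals and uses synthetic gamma-distributed values in place of the KMA precipitation data; the function names are hypothetical, and the discretization actually used in the study may differ.

```python
import numpy as np


def discretize_equal_width(values, n_bins):
    """Map values to class labels 0 .. n_bins - 1 using equal-width intervals."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Only the interior edges are passed, so the maximum falls in the last class.
    return np.digitize(values, edges[1:-1])


def entropy_nats(codes):
    """Plug-in Shannon entropy (nats) of an array of class labels."""
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


rng = np.random.default_rng(0)
synthetic_rain = rng.gamma(shape=0.5, scale=20.0, size=10_000)  # not KMA data
for n_bins in (100, 200, 300, 400):
    codes = discretize_equal_width(synthetic_rain, n_bins)
    print(n_bins, entropy_nats(codes))
```

Splitting each class interval into finer ones cannot lower the plug-in entropy, which is consistent with the entropy rising with the number of class intervals while the ranking of stations stays the same.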
Two or more optimal rain gauge stations can be determined using conditional and joint entropies. As conditional entropy (methods A, B, D, and E) determines the optimal rain gauge stations sequentially, it takes a relatively short time compared with the joint entropy methods. Joint entropy (methods C and F) calculates the entropies of all combinations and selects those with the maximum values; it therefore requires more computing time than conditional entropy. In addition, the joint entropy of the observed data (method C) selected the combinations of rain gauge stations irregularly. In contrast, the joint entropy of the prediction error (method F), which considers the spatiality of stations, selected the optimal rain gauge stations sequentially in a manner similar to conditional entropy (Table 2).
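The difference in computing cost between sequential and exhaustive selection can be seen in a generic Python sketch. It is not a reproduction of methods A to F, whose exact selection rules are not spelled out here: the greedy rule (add the station with the largest conditional entropy given the stations already selected), the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def joint_entropy(columns):
    """Plug-in joint entropy (nats) of one or more equally long label sequences."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log(c / n) for c in Counter(rows).values())


def greedy_selection(data, k):
    """Sequential selection: start from the largest marginal entropy, then add
    the station with the largest conditional entropy given the selected set."""
    selected = []
    remaining = list(data)
    while len(selected) < k:
        base = joint_entropy([data[s] for s in selected]) if selected else 0.0
        # H(candidate | selected) = H(selected, candidate) - H(selected)
        best = max(
            remaining,
            key=lambda s: joint_entropy([data[t] for t in selected + [s]]) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def exhaustive_selection(data, k):
    """Evaluate the joint entropy of every k-station combination."""
    return max(
        combinations(data, k),
        key=lambda combo: joint_entropy([data[s] for s in combo]),
    )


# Toy class labels: station "b" duplicates "a", station "c" is independent of both.
toy = {
    "a": [0, 0, 1, 1, 2, 2, 3, 3],
    "b": [0, 0, 1, 1, 2, 2, 3, 3],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_selection(toy, 2))
print(exhaustive_selection(toy, 2))
```

The greedy loop evaluates at most one candidate set per remaining station at each step, whereas the exhaustive search must score every combination; for 11 stations there are 2^11 − 1 = 2047 non-empty subsets in total.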
The "mm" and "shrink" methods gave the same results as the "emp" method in all cases (Tables A1 and A2). The calculated entropy values were different, but the selected rain gauge network was the same. There was a slight difference only in the case of the "sg" method (Table A3): when two rain gauge stations were selected with conditional entropy using the prediction error (method E), a different rain gauge network was selected. Thus, although the estimators differ in their estimation method, no significant difference between them was noted in this study.

Rain Gauge Network Using Weighted Entropy
We also considered weighted entropy, as precipitation varies seasonally and the performance of entropy varies depending on the amount of precipitation. On heavy precipitation days, entropy was better at selecting the optimal rain gauge network. We applied monthly average precipitation, regional precipitation, 1 − regional precipitation, and regional monthly precipitation as weights. In the following, + denotes a better rank and − a worse rank.
Table A4 shows the result of calculating the entropy using the monthly average precipitation as a weight. Methods A and F showed one +, and method C showed two + and one −. Method D showed one + and one −, and method E showed three − only. Method B did not show any change. A better rain gauge network was therefore selected when entropy was calculated from the observation data, whereas the entropy calculated from the prediction error was slightly worse.
Table A5 shows the result of calculating the entropy using regional precipitation as a weight. Method A had one + and four − changes, method B had only two −, and method C had only four −. Methods D and E had two + and seven −, and method F had seven + and two −. Most methods showed worse results; method F improved, but its result was still worse than that of method B.
Table A6 shows the results of calculating the entropy using 1 − regional precipitation as a weight. Method B had one −, and method C had one +. Method D had four −, method E had two − and three +, and method F had five −. Method A was unchanged. There were some improvements in methods C and E, but most methods showed poor results.
Table A7 shows the results of calculating the entropy using regional monthly precipitation as a weight. Method A had one +, and method B had two −. Method C had one + and two −, method D had one + and one −, and method E had two + and two −. Method F remained unchanged.
Except for method F, the results improved slightly in some cases but also deteriorated in others. Although the overall outcome looks similar, weighting worsened the result of method B, which had the best rank. These results are summarized in Table 3.

Table 3. Results of increase and decrease of rank after application of weighted entropy. (Columns: Method; Weight: monthly average precipitation; Weight: precipitation by station; Weight: precipitation by 1-station; Weight: monthly average precipitation by station.)

Table 4 shows the top five station combinations obtained by calculating the RMSEs for all combinations, and Figure 4 shows the RMSEs of the rain gauge network combinations by the number of rain gauge stations. When a single rain gauge station was selected, the stations located in the center of the Daegu and Gyeongbuk region showed excellent prediction performance. However, when two or more rain gauge stations were selected, stations located on the outskirts of the region were mainly selected. In other words, the station chosen for a single-station network was seldom included in the larger combinations. Pohang (3) was frequently selected in the combinations of two stations, and Bonghwa (5) was also frequently selected. From the combinations of three rain gauge stations, Yeongju (6) was added, and from the combinations of five rain gauge stations, Uljin (1) was included in all combinations. Pohang, Bonghwa, Yeongju, and Uljin are located along the north and the coast of the Daegu and Gyeongbuk region. In several combinations, Uiseong (9), Gumi (10), and Yeongcheon (11) were added, and the prediction performance of the rain gauge network improved.
In the case of a rain gauge network using conditional entropy, Mungyeong (7) and Daegu (4) were selected first, which appears to have degraded the performance. In the case of method B, which was judged to be the best method for selecting the rain gauge network using entropy, the performance improved with the addition of Bonghwa and Yeongju in the north. In other words, the rain gauge stations in the north are considered important locations for rain gauge networks in Daegu and Gyeongbuk.

Optimal Number of Rain Gauge Stations
For comparing the six methods, the errors generated by predicting the average daily precipitation of the unselected stations based on the selected stations were evaluated through MAE, RMSE, bias, and Corr (Figure 5). Conditional entropy methods (methods A, B, D, and E) generally exhibited similar results. For joint entropy methods (methods C and F), the results were similar until the number of stations was three; different results were observed for a higher number of stations. The conditional entropy methods exhibited better results than joint entropy because their MAE and RMSE were lower and their Corr was higher. With regard to the bias, methods A and B tended to predict lower values than the observed values, while methods C, D, E, and F predicted higher values. For the RMSE, methods A and D exhibited similar results until the number of stations reached nine, where they started to differ. The RMSE of method A became lower and exhibited excellent performance, with Pohang being the location of the added station. For method D, Pohang was the last station added. This was because Pohang exhibited the highest daily maximum precipitation of 516.4 mm in the research data mentioned above. In method B, the rain gauge stations installed in coastal areas were selected first. As for the RMSE values of methods A and B, method A exhibited better results when the number of stations was less than six, and method B when the number of stations was six or more. In other words, methods A and B exhibited relatively good performance.
Table 5 and Figure 6 show the RMSE results for determining the optimal number of stations. S compares the actual RMSE decrease with the decrease expected if the RMSE fell by equal amounts as the number of rain gauge stations increases. It is therefore meaningful to find the numbers of stations for which S is higher than one, that is, higher than the average. For method A, the numbers of stations where S was higher than one were 2, 3, 9, and 10. The number of stations that exhibited the highest value was two (4.436), followed by nine (1.334), three (1.173), and 10 (1.162). For method B, the numbers of stations where S was higher than one were 2, 4, 6, and 9. The number of stations that showed the highest value was two (4.119), followed by six (1.912), four (1.300), and nine (1.106).
R expresses the predictive power of i rain gauge stations as the RMSE reduction obtained with i stations relative to the total RMSE reduction. For method A, R was 83.45% and represented more than 80% when the number of stations was nine or higher. For method B, R was 81.26% and represented more than 80% when the number of stations was six or higher. Subsequently, R did not significantly increase even with an increase in the number of stations. Hence, six stations were judged to be appropriate. In addition, when the R values farthest from the average curve were compared, six and seven stations had the best predictive power.
The optimal number of stations was determined to increase the accuracy of observation while minimizing costs. If the minimum number of stations is operated considering the accuracy of rainfall observation, then the optimal number of stations is six. This number was selected based on the stations of method B, because method B exhibited excellent performance for up to six stations. The optimal rain gauge stations installed in the Daegu and Gyeongbuk area were Mungyeong, Pohang, Uljin, Daegu, Yeongdeok, and Bonghwa. Figure 7 shows the optimal rain gauge stations in the order of selection. The selected stations were located on the outskirts of the Daegu and Gyeongbuk area.

Conclusions
In this study, the uncertainties were calculated and compared according to the various estimators of entropy. There was no significant difference between the entropy estimators "emp," "mm," "shrink," and "sg." In addition, weighted entropy was calculated to determine the uncertainty by applying monthly average precipitation, regional precipitation, and regional monthly precipitation as weights. In order to focus on regions with little rainfall, 1 − regional precipitation was applied as a weight as opposed to regional precipitation. With weighting, some results improved and others became worse. There was no significant difference noted when using method B, which showed the best results.
The RMSEs of all combinations of rain gauge stations were calculated to determine the patterns of the top five combinations. The main locations for rain gauge stations were Pohang, Bonghwa, Yeongju, and Uljin, which are located along the north and the coast of Daegu and Gyeongbuk regions. Uiseong, Gumi, and Yeongcheon in the central region can also be considered as locations for rain gauge stations.
The study also proposed a method for determining the optimal number of stations by calculating the RMSE reduction (S) and the reduction ratio (R). The optimal rain gauge network was selected using entropy, and the optimal number of rain gauge stations was determined through the RMSE for the efficient operation of rain gauge stations. The RMSE was obtained by predicting the unselected rain gauge stations using the observation data of the stations selected according to the optimal number of stations. Among the numbers of rain gauge stations for which S was higher than one, the case with R higher than 80% was selected to determine the optimal number of rain gauge stations. When the optimal number of rain gauge stations was determined for the 11 stations in the Daegu and Gyeongbuk area, six stations achieved more than 80% of the total RMSE reduction, giving performance similar to that of all stations. The peripheral stations were selected first, followed by the central stations. The uncertainties of the peripheral stations were found to be higher because of large fluctuations.
It was confirmed that the conditional entropy of observation data was excellent for selecting the optimal rain gauge network and that the RMSE reduction ratio was useful for selecting the optimal number of rain gauge stations. These results can contribute to improving the accuracy of hydrological research and the efficient operation of rain gauge stations. In future research, the use of radar data may allow more efficient rain gauge networks to be designed.