Spatiotemporal Scaling Effect on Rainfall Network Design Using Entropy

Because of high variation in mountainous areas, rainfall data at different spatiotemporal scales may yield potential uncertainty for network design. However, few studies focus on the scaling effect on both the spatial and the temporal scale. By calculating the maximum joint entropy of hourly typhoon events, monthly, six dry and wet months and annual rainfall between 1992 and 2012 for 1-, 3-, and 5-km grids, the relocated candidate rain gauges in the National Taiwan University Experimental Forest of Central Taiwan are prioritized. The results show: (1) the network exhibits different locations for first prioritized candidate rain gauges for different spatiotemporal scales; (2) the effect of spatial scales is insignificant compared to temporal scales; and (3) a smaller number and a lower percentage of required stations (PRS) reach stable joint entropy for a long duration at finer spatial scale. Prioritized candidate rain gauges provide key reference points for adjusting the network to capture more accurate information and minimize redundancy.


Introduction
The most crucial information required for planning, constructing, and operating hydraulic structures is rainfall data. The objective of a rainfall network is to provide the rainfall data needed to design hydraulic structures efficiently and economically [1]. However, because of topography, rain patterns, and effects of time, the spatial and temporal distributions of precipitation are uneven. [...] scaling issue for network design. The proposed scheme for analyzing an optimal rainfall network involves kriging to generate the rainfall data of the candidate rain gauge stations, and entropy to evaluate the uncertainty in different combinations of spatial and temporal scales. In this study, we evaluate the spatiotemporal scaling effect on rain gauge network design and suggest an optimal configuration. In particular, because records of several typhoon rainfall events, together with monthly, six dry and wet months, and annual rainfall, were analyzed, the rain gauge network can be optimized for either hydrologic or climatic purposes, on a short- or long-duration basis. Once the optimal network design with maximum information and minimum redundancy is established, rainfall characteristics can be obtained to provide the key reference for hydrologic planning in the watershed.

Methodology
This study involved the following steps: (1) determining the different combinations of spatial and temporal scales and delineating the study area; (2) applying kriging to existing rainfall data to generate the rainfall data of the candidate rain gauge stations for a certain combination; (3) determining the priority sequence of the candidate rain gauge stations and evaluating the minimum number required; and (4) summarizing the spatiotemporal scaling effect.

Spatiotemporal Scale
Stewart et al. [38] pointed out two distinct problems involved in scaling: (1) the requirement for a set of concepts that will allow the correct partitioning of the water balance at any given scale; and (2) the concepts that will allow information gathered at one scale to be used in making predictions at other scales. To solve these two problems, the possible scaling effect should be addressed first. In this study, the study area of 327.86 km² was partitioned into 1 × 1, 3 × 3, and 5 × 5 km grids, which created a total of 346, 45, and 20 grid cells, respectively. The center of each grid cell was assigned as the location of a candidate rain gauge station. At each of the three spatial scales, hourly records for typhoon events and monthly, six dry and wet months, and annual rainfall were individually analyzed. Hourly data are used to investigate short-duration fluctuations during extreme events, while the monthly, six dry and wet months, and annual data depict possible seasonal and annual trends or variations. Therefore, a total of fifteen combinations (three spatial scales by five temporal scales) are evaluated.

Kriging
Kriging is a geostatistical method that interpolates spatially varying rainfall data to estimate values at grid points as linear combinations of the observations [39][40][41][42][43]. In this study, we use the exponential model to fit the semi-variogram from the measurements of rainfall [24]:

$$\gamma(h) = b\left[1 - \exp\left(-\frac{h}{a}\right)\right] \qquad (1)$$

where $h$ is the lag distance between two points, $a$ denotes the range parameter, and $b$ denotes the sill; that is, the critical distance beyond which observations are treated as spatially independent is as large as $3a$.
In particular, three short-term rainfall records that are independent of the 50 rain gauges were used to verify the kriging results.
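As a minimal sketch of the exponential semi-variogram, assuming the common form γ(h) = b[1 − exp(−h/a)] with range parameter a and sill b, the following shows why the practical range is about 3a: at that lag the model reaches roughly 95% of the sill. The parameter values are hypothetical and are not taken from Table 3.

```python
import math

def exponential_semivariogram(h, a, b):
    """Semivariance at lag distance h for range parameter a and sill b.

    Assumed form: gamma(h) = b * (1 - exp(-h / a)), with no nugget term.
    """
    return b * (1.0 - math.exp(-h / a))

# Hypothetical parameters, for illustration only (not values from the study).
a_km, sill = 5.0, 100.0
for lag in (0.0, a_km, 3.0 * a_km):
    fraction_of_sill = exponential_semivariogram(lag, a_km, sill) / sill
    print(f"lag = {lag:5.1f} km, fraction of sill = {fraction_of_sill:.3f}")
```

Under this form the fraction of the sill reached at h = 3a is 1 − exp(−3), independent of the chosen a and b.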

Entropy
The information entropy introduced by Shannon [44,45] is based on probabilities. The entropy value is used to estimate uncertainties:

$$H(x) = -\sum_{i} p_i \log p_i \qquad (2)$$

where $H(x)$ is the entropy value and $p_i$ is the probability of the $i$-th outcome.
Shannon's entropy [45] is a measure of information content, which depends on the current level of knowledge or uncertainty. Mathematically, the amount of information is inversely related to the probability of occurrence. The basic assumptions of entropy are that the amount of information, $I(p)$, is a real, nonnegative, additive measure and a continuous function of the probability $p$. For rational values of $p$, $I(p)$ obeys the same formula as the log function. For any discrete probability distribution, Shannon's entropy is expressed as:

$$H(x_1) = -\sum_{i} p_i \log p_i \qquad (3)$$

where $p_i$ is the probability of event $x_i$. Equation (3) refers only to the information state before receiving data. Thus, $H(x_1)$ measures the average amount of information. $H(x_1) = 0$ when the event is certain ($p_i = 0$ or 1) and there is no surprise. A uniform distribution gives no reason to believe that any outcome is more likely than any other, and therefore corresponds to maximum ignorance. Maximum entropy can be seen as a generalization of the classical principle of indifference and can be used to obtain unbiased probability assessments. The rainfall information of two rain gauge stations may overlap. The rainfall records of the two stations are therefore treated as two variables $x_1$ and $x_2$. Corresponding to Equation (3), the joint entropy of two variables is [24]:

$$H(x_1, x_2) = -\sum_{i}\sum_{j} p_{ij} \log p_{ij} \qquad (4)$$

where $p_{ij}$ is the joint probability of $x_1$ and $x_2$. For three variables $x_1$, $x_2$, and $x_3$, the joint entropy is [24]:

$$H(x_1, x_2, x_3) = -\sum_{i}\sum_{j}\sum_{k} p_{ijk} \log p_{ijk} \qquad (5)$$

where $p_{ijk}$ is the joint probability of $x_1$, $x_2$, and $x_3$.
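The marginal and joint entropies described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: rainfall is discretized into equal-width classes (the class width is hypothetical), probabilities are estimated as relative frequencies, and the natural logarithm is used. The rainfall values are synthetic.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in nats, of a sequence of discrete class labels."""
    n = len(labels)
    # sum of p * log(1/p), with p estimated as a relative frequency
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())

def joint_entropy(*series):
    """Joint entropy of several equally long label sequences (one per gauge)."""
    return entropy(list(zip(*series)))

def to_classes(rain_mm, width_mm):
    """Discretize rainfall depths into equal-width classes (assumed binning)."""
    return [int(r // width_mm) for r in rain_mm]

# Synthetic monthly totals (mm) for two hypothetical gauges.
gauge_1 = to_classes([12.0, 85.0, 140.0, 310.0, 45.0, 220.0], width_mm=100.0)
gauge_2 = to_classes([20.0, 60.0, 180.0, 290.0, 30.0, 150.0], width_mm=100.0)
print("H(x1)     =", entropy(gauge_1))
print("H(x1, x2) =", joint_entropy(gauge_1, gauge_2))
```

Because the two gauges share information, the joint entropy never exceeds the sum of the two marginal entropies, and it equals the marginal entropy when the two series are identical.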
When the $x_1$ rain gauge station is used to record rainfall data, the remaining uncertainty of the $x_2$ rain gauge station is expressed by the conditional entropy. The probability of $x_2$ conditional on $x_1$ can be written as [24]:

$$p(x_2 \mid x_1) = \frac{p(x_1, x_2)}{p(x_1)} \qquad (6)$$

The joint entropy, which measures the amount of information of the joint events, is then derived with the conditional probability as Equation (7) [24]:

$$H(x_1, x_2) = H(x_1) + H(x_2 \mid x_1) \qquad (7)$$

where $H(x_2 \mid x_1)$ is the conditional entropy of $x_2$ given $x_1$. To find the amount of mutual, or overlapped, information of the two stations, the transferable information can be calculated, as if the $x_1$ rain gauge station were used to forecast information from the $x_2$ rain gauge station.
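The conditional entropy and the overlapped (transferable) information can be sketched with the standard identities H(x2|x1) = H(x1, x2) − H(x1) and T(x1, x2) = H(x2) − H(x2|x1). The symbol T and the function names are assumptions made for this illustration, and the label sequences are synthetic.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in nats, of a sequence of discrete class labels."""
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())

def conditional_entropy(x2, x1):
    """H(x2 | x1) = H(x1, x2) - H(x1): uncertainty left at x2 once x1 is known."""
    return entropy(list(zip(x1, x2))) - entropy(x1)

def transinformation(x1, x2):
    """T(x1, x2) = H(x2) - H(x2 | x1): information shared by the two gauges."""
    return entropy(x2) - conditional_entropy(x2, x1)

# Synthetic class labels: one fully redundant pair and one independent pair.
x1 = [0, 0, 1, 1]
redundant = [0, 0, 1, 1]    # identical to x1, so nothing new is learned
independent = [0, 1, 0, 1]  # carries no information about x1
print(transinformation(x1, redundant), transinformation(x1, independent))
```

A fully redundant gauge shares all of its information with the first one, whereas an independent gauge shares none, which is why overlap matters when adding stations.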

Optimization of Network Design
The significance of each rain gauge station in its network can be determined from its entropy value. The greater the value, the higher the uncertainty; thus, the stations are prioritized in descending order of entropy. After the first station, which has the greatest entropy value, is confirmed, the remaining stations are selected and added one at a time, according to the uncertainty remaining in the system and the overlapped information. To minimize the system's uncertainty, the criterion for determining the second most important station in the sequence, and then the n-th station to be added, follows [24]. In the calculation, therefore, the candidate giving the greatest joint entropy is selected at each step, which arranges all the stations by their data overlap: the station with minimum overlap is the first to be added to the network, and the station with maximum overlap is the last to be added.
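One way to read the sequential selection is as a greedy procedure: start with the station of greatest entropy, then repeatedly add the candidate that gives the largest joint entropy with the stations already chosen. The sketch below assumes that reading; the exact criterion of [24] may differ. The station names and class labels are synthetic.

```python
import math
from collections import Counter

def joint_entropy(columns):
    """Joint entropy, in nats, of a list of equally long label sequences."""
    rows = list(zip(*columns))
    n = len(rows)
    return sum((c / n) * math.log(n / c) for c in Counter(rows).values())

def prioritize(stations):
    """Greedy ordering; returns [(name, joint entropy after adding it), ...].

    stations: dict mapping a station name to its sequence of class labels.
    Ties are broken by the insertion order of the dict.
    """
    remaining = dict(stations)
    chosen, order = [], []
    while remaining:
        best = max(remaining,
                   key=lambda name: joint_entropy(chosen + [remaining[name]]))
        chosen.append(remaining.pop(best))
        order.append((best, joint_entropy(chosen)))
    return order

# Synthetic candidates: B duplicates A (maximum overlap), C is independent of A.
candidates = {
    "A": [0, 1, 2, 3, 0, 1, 2, 3],
    "B": [0, 1, 2, 3, 0, 1, 2, 3],
    "C": [0, 0, 0, 0, 1, 1, 1, 1],
}
for name, h in prioritize(candidates):
    print(name, round(h, 3))
```

In this toy case the duplicate station is ranked last because adding it does not raise the joint entropy, which mirrors the idea that the station with maximum overlap is the last to be added.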
The sequence of prioritized stations determined by the entropy values can also be used as the sequence of station elimination. In each selection stage, the objective is to find the station that yields the maximum entropy value. Stations are subsequently added so that the joint entropy increases gradually. However, once a certain number of stations is reached, the joint entropy levels off at a nearly constant value and no longer increases appreciably. In other words, adding more stations has a very limited effect on the network system. An exponential model is fitted to the curve of joint entropy against the number of stations to detect the critical data volume and the corresponding number of stations. The coefficient $k_m$ denotes the ratio of the entropy value at the m-th station to the entropy value of all stations in the study area; it is taken as the proportion of information revealed up to the m-th station. Assuming n stations in the study area and a number of basic stations have been selected, and the addition of new candidate rain gauge stations is prioritized on the basis of the entropy value, the definition of $k_m$ in this study can be expressed as [24]:

$$k_m = \frac{H(x_1, x_2, \ldots, x_m)}{H(x_1, x_2, \ldots, x_n)}, \qquad m = 1, 2, \ldots, n$$

Hence, in determining the number of stations in a catchment area, a threshold value $k_m^*$ must be set, and by setting a limit such as $k_m > k_m^*$, the number can be determined.
The threshold value is determined by the efficiency gained with each increment of $k_m$. In this study, the threshold $k_m^*$ is set to 0.95, that is, 95% of the information. Hence, if the number of rain gauge stations in the existing network is greater than that in the candidate network, the existing stations sequenced behind the candidate stations are to be eliminated; otherwise, more stations are added.
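In this study, there are fifteen combinations of different spatiotemporal scales. To evaluate efficiency in terms of the number of stations at which $k_m$ reaches the threshold, we define the percentage of required stations reaching 95% information, PRS (%), as:

$$\mathrm{PRS} = \frac{m_{95}}{N} \times 100\%$$

where $m_{95}$ is the number of prioritized stations required to reach 95% of the information and $N$ is the total number of candidate stations at the given spatial scale. PRS is used to evaluate the efficiency of the sequence of prioritized stations. The number of prioritized stations may be the same for different combinations of spatiotemporal scales. However, a lower PRS can be regarded as more efficient, because the 95% threshold of measured uncertainty is reached with a smaller share of the stations.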
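The stopping rule and the PRS can be sketched as below, assuming that the information ratio is the joint entropy of the first m prioritized stations divided by the joint entropy of all candidates, and that PRS is the required number of stations divided by the number of candidates. Both definitions are this sketch's reading of the text, and the joint-entropy sequence is synthetic.

```python
def information_ratio(joint_entropies):
    """Ratio of the joint entropy after m stations to that of all stations."""
    total = joint_entropies[-1]
    return [h / total for h in joint_entropies]

def required_stations(joint_entropies, threshold=0.95):
    """Smallest m whose information ratio reaches the threshold."""
    for m, k in enumerate(information_ratio(joint_entropies), start=1):
        if k >= threshold:
            return m
    return len(joint_entropies)

def prs(joint_entropies, threshold=0.95):
    """Percentage of required stations among all candidates."""
    return 100.0 * required_stations(joint_entropies, threshold) / len(joint_entropies)

# Synthetic joint entropies after adding the 1st, 2nd, ... prioritized station.
h_sequence = [2.0, 3.0, 3.6, 3.9, 4.0]
print(required_stations(h_sequence), prs(h_sequence))
```

With this synthetic sequence, four of the five candidates are needed to reach 95% of the total information, so the PRS is 80%.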

Study Area and Data
To illustrate and evaluate the proposed model, rainfall data within and near the National Taiwan University Experimental Forest (NTUEF) are used to demonstrate an optimal rainfall network. Located upstream on the Zhuoshui River, the site lies between 23°48'49" and 23°28'10"N and 121°45'16" and 121°59'15"E, covering an area of approximately 327.86 km² (Figure 1). The catchment area extends from Mt. Jade (elevation 3952 m, the highest peak in Taiwan) to Gueitsuto (220 m). Coursing through major forestland within the Chenyulan catchment, the river flows over 41.4 km, with an average slope of 2.7%. Most of the catchment is covered by mature forests, and the geological features of this region include a complex suite of rocks, such as granite, gneiss, schist, sandstone, conglomerate, and marl. Because of differences in altitude, the climate is divided into subtropical, warm temperate, cold temperate, subfrigid, and frigid zones. The mean annual temperature ranges between a low of 4 °C (Mt. Jade) and a high of 23 °C (Jushan). The NTUEF area usually experiences enhanced rainfall, and it receives an average (1992-2012) annual rainfall of 2408 mm; however, the rainfall is unevenly distributed, with more than 70% of it occurring between May and September. The annual rainfall increases from north to south and from east to west, and exhibits several rainfall centers (Figure 2a). During the period 1992 to 2012, the most severe rainfall event, Typhoon Morakot, which occurred from 5 August to 9 August 2009, delivered almost the entire average annual rainfall within a few days (Figure 2b). The largest rainfall, recorded by the Alishan weather station (bottom left near the boundary in Figure 1), was 2884 mm, which caused large-scale landslides and debris flows.
There are 50 rain gauge stations within or near the NTUEF territory; they are listed in Table 1. These stations are operated by the Central Weather Bureau and the NTUEF, and include the microclimate station and rain gauges set up by the authors. Figure 1 also shows the locations of the 50 existing rain gauge stations and of the candidate rain gauge stations from the 1 × 1, 3 × 3, and 5 × 5 km grids. For completeness of the record, the rainfall data from 1992 to 2012 are used in this study. In addition, only hourly records of seven severe typhoon events during this period are selected to evaluate the short-duration rainfall (Table 2). Three criteria were considered in selecting these seven typhoons: (1) the rainfall covered the study area for most of the typhoon period; (2) the 24-h rainfall approached or exceeded 600 mm; and (3) massive disasters such as landslides and debris flows were recorded in the study area. The sample sizes for hourly, monthly, and annual (as well as six dry and six wet months) rainfall are 385, 252, and 21, respectively.
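The monthly and annual sample sizes quoted above follow directly from the 21-year record, and the same arithmetic gives the number of days and hours in the full record that the discussion later refers to. A small check, using only the calendar (the hourly sample of 385 comes from the selected typhoon events and cannot be derived this way):

```python
from datetime import date

FIRST_YEAR, LAST_YEAR = 1992, 2012

def record_counts(first_year, last_year):
    """Number of annual, monthly, daily, and hourly values in a full record."""
    years = last_year - first_year + 1
    days = (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days
    return {
        "annual": years,
        "monthly": years * 12,
        "daily": days,
        "hourly_all": days * 24,  # every hour, rainy or not
    }

counts = record_counts(FIRST_YEAR, LAST_YEAR)
print(counts)
```

The full hourly record would therefore hold more than 183,000 values, most of them zero, compared with the 385 typhoon hours actually analyzed.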

Validation of Kriging Estimates
Kriging was used in this study to estimate the rainfall at ungauged sites. Validating these estimates against existing observations is therefore an important issue. The fitted parameters and kriging variance are listed in Table 3. As the basic statistics in Table 1 and the spatial distribution in Figure 2 show, the range of hourly, monthly, and annual rainfall is quite large. The sill parameter b increases as the temporal scale lengthens, whereas the range parameter a shows no significant variation except for the six dry months. During the six dry months, the rainfall regime is dominated mainly by frontal rain, which leads to low rainfall intensity in winter and spring and yields a wide range of influence. The kriging variance in the six wet months is far larger than that in the six dry months, reflecting the uneven split of rainfall between the dry (30%) and wet (70%) periods. Besides the 50 rain gauges, rainfall measurements from another three temporary rain gauge stations located in Xitou were used to validate the kriging estimates. Using the semi-variogram constructed from the 50 rain gauges, estimates at these three sites were obtained and compared with the actual point measurements. Monthly rainfall validations are shown in Figure 3. Although the kriging estimates were 11% to 24% lower than the measurements, which is understandable given the high rainfall variability in mountainous areas, the correlation between estimates and measurements was high enough for the estimates to be adjusted using simple regression.
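The adjustment by simple regression mentioned above can be sketched as an ordinary least-squares line that maps kriging estimates to measurements. The numbers below are synthetic and chosen so that the estimates are uniformly 20% low; they are not the Xitou validation data, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic monthly rainfall (mm): kriging estimates that are 20% too low.
kriged = [80.0, 160.0, 240.0, 400.0]
measured = [100.0, 200.0, 300.0, 500.0]

slope, intercept = fit_line(kriged, measured)
adjusted = [slope * k + intercept for k in kriged]
print(slope, intercept, adjusted)
```

When the estimates are highly correlated with the measurements, as in this idealized case, a single fitted line is enough to remove the systematic underestimation.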

Uncertainty Distributed in Space
Entropy was used in this study to analyze the uncertainty of rainfall at individual gauges. The results show that the spatial distribution of entropy differs among typhoon hourly, monthly, six dry and wet months, and annual rainfall, as illustrated in Figure 4. For hourly data, the contour lines are comparatively smooth, and the entropy values increase from north (1.05) to south (1.50) and from east (1.35) to west (1.95). For monthly rainfall, the contour lines change locally, and the values are higher than the typhoon hourly values, especially in the northern and eastern areas, which indicates that higher uncertainty exists at this temporal scale. As mentioned above, the rainy season in the study area lasts from May to September, resulting in an uneven monthly distribution. In addition, the entropy contour lines of annual rainfall are irregularly distributed. The pattern of the annual contour map is distinct from the other two, revealing larger local variation even with small entropy values. Compared with Figure 2a, the contour lines around the rainfall center in Figure 4c are also comparatively smooth. This indicates that the entropy value is smaller in the areas surrounding large rainfall. If the network design is based on uncertainty, this area has a lower priority, and vice versa.

Spatial Scale Effect
For a given temporal scale, the variation across spatial scales is not very significant, and the effect decreases as the temporal scale increases. Because the candidate gauge stations differ among the 1-, 3-, and 5-km grids, only the first 20 prioritized candidate rain gauge stations were compared for the spatial scale effect, as shown in Figure 5. For hourly data, the variation in the 5-km grid is larger than that in the 3-km and 1-km grids. Before the eighth selected candidate rain gauge, the joint entropy values of the 1-, 3-, and 5-km grids oscillate. After the eighth selected candidate rain gauge, however, the joint entropy of the 5-km grid is higher than that of the 3-km and 1-km grids. The joint entropy of the 3-km grid is always higher than that of the 1-km grid. For the annual scale, no significant difference was found among the 1-, 3-, and 5-km grids. In general, for the first 20 prioritized candidate rain gauge stations, the maximum entropy is around 4.5 for hourly rainfall, 4.3 for monthly rainfall, and 3.0 for six dry and wet months and annual rainfall. This implies that the uncertainty at the hourly and monthly scales is higher than at the longer scales, that the spatial scale effect is weaker at long temporal scales, and that fewer rain gauges are required for long-term monitoring. In particular, the hourly data were collected during typhoon events, when rainfall is comparatively heavy and exhibits diverse variability over high terrain relief. Within the 5-km grid, the joint entropy is larger because more uncertainty exists in short-period rainfall. It is therefore suggested that the number of stations in the rain gauge network be increased to capture more detailed variation for hydrologic design and rainfall forecasting.

Temporal Scale Effect
For a given spatial scale, more candidate rain gauge stations are needed to reach a stable value of joint entropy at a short temporal scale. As Figure 6 shows, fewer gauge stations are required to reach the stable value of joint entropy as the temporal scale lengthens. The trends coincide across the different spatial scales. In the 5-km grid, only 4 stations are needed to reach the maximum joint entropy, whereas more than 300 gauges are needed to reach the maximum joint entropy in the 1-km grid. The maximum joint entropy value does not change much among the 1-, 3-, and 5-km grids, but that of the annual and six dry and wet months rainfall (around 3) is clearly separated from that of the hourly and monthly rainfall. This again shows that the spatial scaling effect is not as significant as the temporal scaling effect.

Optimal Rain Gauge Station Network of the NTUEF Area
The priorities of the rain gauge stations are determined by calculating the joint entropies of the study area. The priority sequence obtained on the basis of entropy can also serve as the sequence for removing stations. As mentioned above, the candidate rain gauge stations differ among spatial scales. The number and PRS of required gauge stations at the three spatial scales are listed in Table 4. Figure 7 illustrates the first ten prioritized gauge stations in the 1-, 3-, and 5-km grids. Figure 7a shows that, at the 1-km grid, the first prioritized candidate station is located near the southwestern corner for most of the five temporal scales. For the hourly, monthly, and six dry months temporal scales, it is located around rain gauge No. 1. The annual rainfall around this region exceeds 3600 mm, the largest in the study area. However, the first prioritized candidate station for the six wet months and annual temporal scales is located near the northern rain gauge No. 19 and the eastern rain gauge No. 18, respectively. Three groups along the eastern boundary were clearly identified, implying that the existing rain gauge stations there are crucial across temporal scales; however, no distinct groups were found in Figure 7b,c. More gauge stations are needed for these three concentrated groups. In addition, the second prioritized candidate gauge stations differ considerably among the 1-, 3-, and 5-km grids. It can be inferred that existing rain gauge stations that are not in the neighborhood of prioritized candidates should be considered for suspension or abandonment. The decision on the optimal rain gauge network can be made according to the prioritized and overlapped gauge stations across the five temporal scales. If the number of stations in the network is greater than the minimum number of candidate stations, the stations exceeding the minimum candidate number can be considered for elimination.

In Table 4, the PRS needed to exceed the threshold value of 95% is smaller at finer spatial scales across the different temporal scales. Although the entropy value changes in Figure 4e, only three to four gauges are enough to represent the variability of annual rainfall. For the same spatial scale, the PRS for shorter temporal scales, such as hour and month, is far larger than that for longer temporal scales; for the same temporal scale, the PRS increases as the spatial scale enlarges. It is noted that the six wet months need fewer candidate stations than the six dry months, which implies that more variation and uncertainty of rainfall exist during the dry season. According to the third category of the WMO standard mentioned above, the study area of 327.86 km² is equivalent to 13.1 gauges, which is very close to the number obtained for hourly and monthly rainfall in the 5-km grid in Table 4. More rainfall information can be obtained as the number of required rain gauges increases at the 3- and 1-km scales. For efficiency, however, 13 and 14 candidate stations at the monthly and hourly scales, respectively, in the 5-km grid, equivalent to about one-fourth of the existing rain gauge stations, are enough for general use; for hydrologic design using the prioritized network at the 3- or 1-km grid, the number will double or more. As a compromise between accuracy and network density, 13 candidate stations were identified as the optimal network according to the prioritized and overlapped gauge stations across all spatiotemporal scales, as shown in Figure 8. Compared with Figure 7, these 13 candidate stations are located very close to the three concentrated groups found at the 1-km scale. Kay and Kutiel [46] suggested a new approach to mapping the climate of precipitation and found that the actual rainfall field is more closely represented when more rainfall events and a denser grid are used. In this study, the 1-km grid can capture more rainfall uncertainty than the 3- and 5-km grids but with a low PRS, indicating that more candidate rain gauge stations are needed to yield the same accuracy. Kutiel and Kay [47] found that no single network design recommendation is best for all purposes. Table 4 gives the PRS for the fifteen combinations of spatiotemporal scales. For the best efficiency and lowest cost of rain gauge configuration, the lowest PRS at both the spatial and temporal scales should be chosen, which means that the 1-km grid at the six-month or annual scale is the best choice. However, this choice may satisfy only long-term climate evaluation or research and fail to capture the information needed for short-term purposes such as hydrologic forecasting.
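The station-count arithmetic in this comparison can be checked with a few lines. The density of one gauge per 25 km² is inferred here from the quoted figure of 13.1 gauges for 327.86 km²; it is an assumption of this sketch, not a value stated in the text.

```python
AREA_KM2 = 327.86        # study area, from the text
EXISTING_GAUGES = 50     # existing rain gauge stations, from the text
KM2_PER_GAUGE = 25.0     # assumed density, inferred from 327.86 / 13.1

def gauges_for_density(area_km2, km2_per_gauge):
    """Number of gauges implied by a minimum-density guideline."""
    return area_km2 / km2_per_gauge

guideline = gauges_for_density(AREA_KM2, KM2_PER_GAUGE)
share_of_existing = 13 / EXISTING_GAUGES  # the 13 stations of the optimal network

print(round(guideline, 1), share_of_existing)
```

The 13 stations identified for the 5-km grid thus match the density guideline and amount to roughly one-fourth of the 50 existing gauges.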
The authors did not analyze all the hourly rainfall records for three reasons. First, for hydrologic design and disaster warning and prevention, records with "rain" are far more important than records with "no rain." Second, if all the hourly data were considered, the sample size would exceed 183,000 and would contain too many zeros or very small rainfall records (e.g., 0.5 mm). Such a discrete distribution of data would bias the entropy calculated with Equation (2) relative to that of the monthly and annual data. Third, the rainfall of typhoon events covered most of the study area, which allows the semi-variogram in Equation (1) to be constructed properly and prevents an inadequate semi-variogram resulting from rainfall confined to some local areas. Despite these three reasons, the hourly data analyzed in this study are still only part of the whole dataset and represent the network design for rainy hours only. For the No. 1 rain gauge (Alishan), only about half of the days between 1992 and 2012 (7671 days in total) were rainy. We do not include daily rainfall data for the same reasons, even though the day may be a suitable temporal scale between the hour and the month. Compared with the work by Cheng et al. [5] at the hourly and annual scales, this study includes only hourly data for typhoons, with the selection criterion that over two thirds of the records are non-zero; we neglected the

Figure 1. Candidate and existing rain gauges in the study area (NTUEF).

Figure 2. Contour maps of (a) average annual rainfall between 1992 and 2012; and (b) Typhoon Morakot rainfall of 5-9 August 2009 at the NTUEF area.

Figure 3. Validation of monthly rainfall at three stations in Xitou Tract by Ordinary Kriging; (a) Phoenix 3.8 K Station (January 2004 to September 2005); (b) Liu Long Gully Station (January 2004 to September 2005); and (c) Upper Station of University Gully (December 2004 to September 2005).

Figure 5. Variation of joint entropy for the first 20 prioritized candidate gauges at (a) hourly; (b) monthly; (c) six dry monthly; (d) six wet monthly; and (e) annual scale.

Table 1. Summary description of rain gauge stations in the study area.

Table 2. Typhoon events in this study between 1996 and 2012.

Table 3. Details of kriging estimates.

Table 4. Number and percentage of required stations (PRS) at different spatiotemporal scales.