Entropy Applications to Water Monitoring Network Design: A Review



Introduction
Water monitoring networks account for all aspects of the water-related measurement system, including precipitation, streamflow, water quality, groundwater, soil moisture, etc. [1–3]. Adequate water monitoring networks and quality data from them comprise one of the first and primary steps towards efficient water resource management. The basic design variables of a water monitoring network are the number of monitoring stations, the locations of the stations and the data period or sampling frequency [4,5]. Recent technological advances have allowed a gradual transition from manual sampling to automated observations, although some water quality parameters still require field and/or laboratory analyses of water or other environmental samples. One may expect that the more data we collect, the more efficiently water resource problems are solved. However, this is not always true because irrelevant, inadequate or inefficient data collected in the wrong location or at the wrong time can degrade the quality of a dataset [1,6,7]. More seriously, the decline of water monitoring networks has been a general trend due to financial limitations and changes in monitoring priorities [8–10]. Therefore, determining the adequate number of monitoring stations and their locations has become critical to network design. However, a standardized methodology for water monitoring network design has not yet been established because of the practical and socioeconomic complexity of diverse design cases [1,11].
Existing reviews have investigated a broad range of water monitoring network design methodologies, such as statistical analysis, spatial interpolation, application of information theory, optimization techniques, physiographic analysis, user surveys or expert recommendations and combinations of multiple methods [4,6,10–15]. A prior comprehensive review by Mishra and Coulibaly [10] examined evidence of declining hydrometric network density, highlighted the importance of quality data from well-designed networks and considered a range of approaches by which networks were designed. They also compared statistical, spatial interpolation, physiographic, sampling-driven and entropy-based approaches to hydrometric network design. Mishra and Coulibaly [10] drew several conclusions about the importance of high quality hydrometric data for water resource management that remain valid. They also concluded that one of the most promising approaches for network design was the application of entropy methods, highlighting early studies that used the principle of maximum entropy and information transfer. Therefore, this review focuses on recent studies that have applied information theory. Information theory was initially developed by Shannon in 1948 [16] to measure the information content in a dataset and has been applied to solve water resource problems. Recently, its applications have been extended to water monitoring network design by adopting the concept that entropy can explain the inherent information content of a monitoring station or a monitoring network. The basic objective has naturally been to obtain the maximum amount of information. In other entropy approaches, the stations in a monitoring network should have the least shareable or common information, which is called transinformation. To achieve this, the stations should be as independent from each other as possible.
The scope of this paper includes water monitoring network design and evaluation studies that (1) applied entropy theory in the design process and (2) were published after the existing comprehensive review by Mishra and Coulibaly in 2009 [10]. To the best of our knowledge, no review focused on entropy applications to water monitoring network design existed before the 2009 study by Mishra and Coulibaly [10]. However, there has been considerable progress in the application of entropy theory to monitoring network design since that review, including new entropy-based measures, optimization techniques and approaches to estimating information content at ungauged stations. Therefore, a need was identified to consolidate knowledge and recent advances on the subject. For publications prior to 2010, the reader is referred to Mishra and Coulibaly [10]. This review first describes entropy concepts and various terms that are typically used in network design. The previous studies are then summarized by type of network, i.e., precipitation, streamflow and water level, soil moisture and groundwater and water quality monitoring networks. The integrated design method for multiple types of networks is also reviewed. We define some terminology hereafter to ensure a common understanding for the readers.
The term network evaluation is used when the network quality is assessed without changing any station, while network design is a general term that implies some change in stations. Specifically, network design includes network reduction, network expansion and network redesign. Network reduction applies where some monitoring stations should be removed from the network. On the other hand, if the available budget allows the monitoring needs to be met, further stations can be added to the existing network, which is called network expansion. Network redesign refers to rearranging stations without changing the number of stations. The term optimal network is used only if the network consists of optimal station locations that are identified by the actual use of an optimization technique.

Entropy Concept
In thermodynamics, entropy has been understood as a measure of disorder or randomness of a system. Shannon [16] extended the entropy concept to information theory by recognizing that uncertainty in a system will be decreased when information is added to the system. Therefore, the term entropy in information theory, introduced by Shannon [16] in 1948, describes the amount of information content in a random variable. The likelihood of an event is typically given by a probability $p$. If the probability of an event is very high, such as 0.9999 or one, one will not be surprised by the outcome and can anticipate it with near certainty. On the other hand, any low probability event is highly uncertain, so that a considerable amount of information is gained if it happens. Hence, the information from an event that occurred is inversely related to its probability, $1/p$ [17]. Suppose that there are two independent events A and B with probabilities $p_A$ and $p_B$, respectively. The probability of the joint occurrence of the events A and B is $p_A p_B$, and the information gained by the joint event is then $1/(p_A p_B)$. However, the sum of information from each individual event is not equal to the information from the joint event, that is:

$$\frac{1}{p_A p_B} \neq \frac{1}{p_A} + \frac{1}{p_B} \tag{1}$$

The only transformation that makes both sides of Equation (1) equal is the logarithm [17–19], which can be written as:

$$\log\frac{1}{p_A p_B} = \log\frac{1}{p_A} + \log\frac{1}{p_B} = -\log p_A - \log p_B \tag{2}$$

Likewise, Tribus [20] showed that the uncertainty of an event with probability $p$ is $-\log p$, which became the basis of the Shannon entropy described hereafter.
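The additivity argument can be checked numerically. The following is a minimal sketch; the function name `surprisal` and the probabilities 0.5 and 0.25 are illustrative assumptions, not values taken from the reviewed studies.

```python
import math


def surprisal(p, base=2.0):
    """Information gained from observing an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p) / math.log(base)


# Assumed probabilities of two independent events A and B.
p_a, p_b = 0.5, 0.25

# Reciprocal probabilities are not additive for the joint event.
reciprocal_joint = 1.0 / (p_a * p_b)
reciprocal_sum = 1.0 / p_a + 1.0 / p_b

# Logarithmic information is additive for independent events.
info_joint = surprisal(p_a * p_b)
info_sum = surprisal(p_a) + surprisal(p_b)

print(reciprocal_joint, reciprocal_sum)
print(info_joint, info_sum)
```

With base two, the joint event carries the same number of bits as the two individual events combined, whereas the reciprocals differ.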

Marginal Entropy
When information is provided to a system, one can expect that the uncertainty of the system will be reduced; therefore, the amount of information that is given to a system by knowing a variable is called marginal entropy. If a random variable X is expected to have N outcomes with a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, the (weighted) average information provided by the N events is given by:

$$H(X) = -\sum_{i=1}^{N} p_i \log p_i \tag{3}$$

where $H(X)$ is the marginal entropy of the random variable X. Any base of the logarithm can be used in Equation (3), the choice depending on the problem given. In binary questions (i.e., yes or no questions), base two should be used, and the corresponding unit of entropy is the bit. Similarly, the trit for base 3, the nat for base e and the hartley (also called decit) for base 10 are further examples of units of information. Recall that this review covers entropy applications for hydrometric network design, and the expected answer of the design process is either "use/include/install the station" or "do not use/include/install the station" for the network to be optimal. Therefore, a logarithm of base two is most appropriate for the Shannon entropy calculation in hydrometric network design. Then, the $H(X)$ value from Equation (3) is understood as the information content that a station X can deliver if installed. If a variable K has a known value, the probability of that event is one, while all the other alternative probabilities are zero. The information content of the variable K, $H(K)$, is then zero from Equation (3), representing no uncertainty, i.e., a certain outcome. On the other hand, if a variable U has a uniform distribution (i.e., the probability of each event is equal, $1/N$), the entropy of the variable U is:

$$H(U) = -\sum_{i=1}^{N} \frac{1}{N} \log\frac{1}{N} = \log N \tag{4}$$

The value of Equation (4) is often called the maximum entropy or saturated entropy. These two entropies, $H(K)$ and $H(U)$, define the minimum and maximum boundaries of entropy values, that is:

$$0 \le H(X) \le \log N \tag{5}$$
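The lower and upper limits of the marginal entropy can be illustrated with a short sketch. It assumes NumPy is available; the helper name `marginal_entropy` and the four-outcome distributions are hypothetical choices for illustration.

```python
import numpy as np


def marginal_entropy(probs, base=2.0):
    """Shannon entropy of a discrete probability distribution."""
    p = np.asarray(probs, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("probs must be non-negative and sum to one")
    p = p[p > 0]  # terms with p = 0 contribute nothing (0 log 0 := 0)
    return float(-(p * np.log(p)).sum() / np.log(base))


n_outcomes = 4
h_known = marginal_entropy([1.0, 0.0, 0.0, 0.0])          # certain outcome
h_uniform = marginal_entropy([1.0 / n_outcomes] * n_outcomes)  # uniform case
h_skewed = marginal_entropy([0.7, 0.1, 0.1, 0.1])          # in between

print(h_known, h_skewed, h_uniform)
```

In bits, the certain variable yields zero, the uniform variable yields log2(4) = 2, and any other four-outcome distribution falls between the two.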

Multivariate Joint Entropy
While the marginal entropy described in Section 2.2 is a univariate entropy, entropy values can also be calculated in a bivariate or a multivariate case. The total information content of N variables can be calculated by using the joint probability instead of the univariate probability in Equation (3), given by:

$$H(X_1, X_2, \ldots, X_N) = -\sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \cdots \sum_{i_N=1}^{n_N} p(x_{1,i_1}, x_{2,i_2}, \ldots, x_{N,i_N}) \log p(x_{1,i_1}, x_{2,i_2}, \ldots, x_{N,i_N}) \tag{6}$$

where $p(x_{1,i_1}, x_{2,i_2}, \ldots, x_{N,i_N})$ is the joint probability of the N variables and $n_1, n_2, \ldots, n_N$ are the numbers of class intervals of the corresponding variable distributions [21]. If all variables are stochastically independent, the joint entropy from Equation (6) is equal to the sum of the marginal entropies, which is the maximum value of the joint entropy. Therefore, the joint entropy is bounded by [21]:

$$H(X_1, X_2, \ldots, X_N) \le \sum_{i=1}^{N} H(X_i) \tag{7}$$
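A plug-in estimate of the joint entropy from already discretized station records can be sketched as follows. The helper `joint_entropy`, the random seed and the synthetic four-class series are assumptions made for illustration only; real applications first need a binning step.

```python
import numpy as np


def joint_entropy(data, base=2.0):
    """Plug-in joint entropy of binned series (rows: time steps, columns: stations)."""
    data = np.asarray(data)
    if data.ndim == 1:
        data = data[:, None]
    # Each distinct row is one joint class; its relative frequency estimates p.
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


rng = np.random.default_rng(0)          # fixed seed, synthetic data
a = rng.integers(0, 4, size=500)        # station A, four classes
b = rng.integers(0, 4, size=500)        # station B, generated independently
c = a.copy()                            # station C duplicates station A

h_ab = joint_entropy(np.column_stack([a, b]))
h_ac = joint_entropy(np.column_stack([a, c]))
h_a, h_b = joint_entropy(a), joint_entropy(b)

print(h_ab, h_a + h_b)  # joint entropy never exceeds the sum of marginals
print(h_ac, h_a)        # a duplicated station adds no information
```

The duplicated station leaves the joint entropy at the marginal value, while the independently generated pair approaches the sum of the marginal entropies.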

Conditional Entropy
Conditional entropy is a measure of the information content of one variable that is not deliverable by other variables. If two random variables, A and B, are correlated, providing information from one variable may remove some of the uncertainty of the other variable. In the case of no correlation between the variables, the conditional entropy is equal to the marginal entropy. That is:

$$0 \le H(A|B) \le H(A) \tag{8}$$

where $H(A|B)$ is the conditional entropy of the variable A when the information content of the variable B is given. The conditional entropy can also be expressed through the joint entropy as:

$$H(A|B) = H(A, B) - H(B) \tag{9}$$

Furthermore, conditional entropy can also be presented mathematically using joint and conditional probabilities and Bayes' theorem as:

$$H(A|B) = -\sum_{i=1}^{n_A} \sum_{j=1}^{n_B} p(a_i, b_j) \log p(a_i|b_j) = -\sum_{i=1}^{n_A} \sum_{j=1}^{n_B} p(a_i, b_j) \log \frac{p(a_i, b_j)}{p(b_j)} \tag{10}$$
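The behaviour of conditional entropy for independent and fully dependent variables can be sketched with small constructed series, using the standard identity H(A|B) = H(A, B) − H(B). The helper names and the 16-step series are hypothetical and chosen so that the results are exact.

```python
import numpy as np


def joint_entropy(data, base=2.0):
    """Plug-in joint entropy of binned series (rows: time steps, columns: variables)."""
    data = np.asarray(data)
    if data.ndim == 1:
        data = data[:, None]
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def conditional_entropy(a, b, base=2.0):
    """H(A|B) computed as H(A, B) - H(B)."""
    return joint_entropy(np.column_stack([a, b]), base) - joint_entropy(b, base)


a = np.tile(np.arange(4), 4)      # 0,1,2,3 repeated: uniform over four classes
c = np.repeat(np.arange(4), 4)    # every (a, c) pair occurs once: independent of a
parity = a % 2                    # fully determined by a

h_a = joint_entropy(a)
h_a_given_c = conditional_entropy(a, c)            # equals H(A): c tells nothing about a
h_parity_given_a = conditional_entropy(parity, a)  # zero: a determines parity
h_a_given_parity = conditional_entropy(a, parity)  # one bit of uncertainty remains

print(h_a, h_a_given_c, h_parity_given_a, h_a_given_parity)
```

Knowing an independent variable leaves the entropy of A unchanged at two bits, whereas knowing A removes all uncertainty about a variable derived from it.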

Transinformation
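The two variables, A and B, described in Section 2.4 have some common or shared information because they are correlated; this shared information is called transinformation or mutual information:

$$T(A, B) = H(A) - H(A|B) = H(B) - H(B|A) = H(A) + H(B) - H(A, B) \tag{11}$$

where $T(A, B)$ is the transinformation between the variables A and B. The larger the transinformation, the more strongly the variables depend on each other. In other words, the transinformation indicates how much information content is transferable from other variables. Similar to Equation (10), transinformation can be written as [19]:

$$T(A, B) = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} p(a_i, b_j) \log \frac{p(a_i, b_j)}{p(a_i)\, p(b_j)} \tag{12}$$

Transinformation is typically used for measuring the mutual information between two variables or two groups of variables, and the generalized form for multivariate transinformation is given as:

$$T\big((X_1, \ldots, X_k), (Y_1, \ldots, Y_m)\big) = H(X_1, \ldots, X_k) + H(Y_1, \ldots, Y_m) - H(X_1, \ldots, X_k, Y_1, \ldots, Y_m) \tag{13}$$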
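Transinformation between two binned series can be estimated directly from their contingency table, using the standard double-sum form with the ratio p(a, b) / (p(a) p(b)). This is a minimal sketch; the function name and the constructed series are assumptions for illustration.

```python
import numpy as np


def transinformation(a, b, base=2.0):
    """Plug-in mutual information between two binned series."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    # Contingency table of joint class occurrences.
    table = np.zeros((a_vals.size, b_vals.size))
    np.add.at(table, (a_idx, b_idx), 1.0)
    p_ab = table / table.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of A, shape (n_a, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of B, shape (1, n_b)
    mask = p_ab > 0                          # skip empty cells (0 log 0 := 0)
    ratio = p_ab[mask] / (p_a @ p_b)[mask]
    return float((p_ab[mask] * np.log(ratio)).sum() / np.log(base))


a = np.tile(np.arange(4), 4)
c = np.repeat(np.arange(4), 4)   # independent of a by construction
parity = a % 2                   # shares one bit with a

t_independent = transinformation(a, c)
t_partial = transinformation(a, parity)
t_self = transinformation(a, a)

print(t_independent, t_partial, t_self)
```

Independent series share no information, a derived series shares exactly the information it contains, and a series compared with itself returns its own marginal entropy.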

Total Correlation
While transinformation and mutual information have the same definition, total correlation is not equivalent to them: total correlation is a simple estimate of the amount of information shared among multiple variables. It is defined as the difference between the sum of the marginal entropies of N variables and their joint entropy [22,23], which is given as:

$$C(X_1, X_2, \ldots, X_N) = \sum_{i=1}^{N} H(X_i) - H(X_1, X_2, \ldots, X_N) \tag{14}$$

If N = 2 in Equation (14), the total correlation is equal to the transinformation or mutual information. However, the transinformation is only meaningful between two random variables or two groups of variables, as shown in Equations (11) to (13); therefore, the total correlation and the transinformation values differ if N > 2.
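Total correlation, the sum of the marginal entropies minus the joint entropy, can be sketched for a small synthetic network as follows. The helper names and the constructed series are hypothetical.

```python
import numpy as np


def joint_entropy(data, base=2.0):
    """Plug-in joint entropy of binned series (rows: time steps, columns: stations)."""
    data = np.asarray(data)
    if data.ndim == 1:
        data = data[:, None]
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def total_correlation(data, base=2.0):
    """Sum of marginal entropies minus the joint entropy of all columns."""
    data = np.asarray(data)
    marginals = sum(joint_entropy(data[:, j], base) for j in range(data.shape[1]))
    return marginals - joint_entropy(data, base)


a = np.tile(np.arange(4), 4)
c = np.repeat(np.arange(4), 4)            # independent of a by construction

tc_independent = total_correlation(np.column_stack([a, c]))
tc_duplicates = total_correlation(np.column_stack([a, a, a]))

print(tc_independent)  # independent stations share nothing
print(tc_duplicates)   # three copies of a 2-bit station: 3 * 2 - 2 = 4 bits
```

The three duplicated stations illustrate why total correlation serves as a redundancy measure: each extra copy adds its full marginal entropy to the shared amount without raising the joint entropy.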

Other Entropy Terms
The entropy terms described above (i.e., marginal entropy, joint entropy, conditional entropy, transinformation and total correlation) are the basic measures that have typically been used in entropy applications to water monitoring network design. While many studies developed specific approaches from the basic entropy terms and applied them to case studies, some have extended the terms by deriving new measures from, or combining, the basic ones. Detailed descriptions of the extended entropy terms are not included in this review, but they are briefly explained where needed in Section 3. Interested readers may refer to the original references.

Applications of Entropy to Water Monitoring Network Design
This section summarizes the recent applications of entropy theory to the design of water monitoring networks. The review is categorized by the types of networks, such as precipitation, streamflow or water level, soil moisture or groundwater and water quality networks. A hybrid design method for multivariate water monitoring networks is then discussed. Table 1 presents brief summaries including network types, methods and key findings of the selected research articles that applied entropy theory for designing water monitoring networks and were published in 2010 or after to cover the most recent contributions since the existing review [10].

Precipitation Networks
The design of a representative precipitation monitoring network is an important and still challenging task for which an entropy approach is well suited. High quality precipitation information is necessary for streamflow and flood forecasting, surface water management, agricultural management, climate process understanding and many other applications. However, precipitation is well known to be highly variable in both space and time [59] and is often statistically represented by highly skewed distributions [60], making the application of parametric analysis methods difficult. These challenges also extend to entropy-based approaches for precipitation monitoring. For example, the marginal entropy has been found to be well correlated with total precipitation in northern Brazil because the probability distribution in regions with higher rainfall tended to be more uniform and less skewed [61]. In contrast, Mishra et al. [59] found that the marginal disorder index (MDI), which is the ratio of observed entropy to the maximum possible entropy at a given site, was inversely related to mean annual rainfall in the U.S. state of Texas, where the MDI was found to vary seasonally. Brunsell [30] studied the entropy from monitoring stations across the United States, where little correlation was found between precipitation and marginal entropy with the exception of a breakpoint in entropy at −95° longitude corresponding to high temporal variability precipitation patterns. Several studies have also noted that the temporal sampling of precipitation is an important consideration for calculating entropy and for designing precipitation networks [54,59]. At finer timescales (hourly to daily), precipitation is highly variable, resulting in higher overall entropy, whereas longer time periods (monthly to annual) have less variability, resulting in lower marginal entropy [30,52,59]. The dependence on spatial and temporal scales has also been identified in a network design application. Wei et al. [54] prioritized potential stations in Central Taiwan to maximize the joint entropy of the network at hourly, monthly and annual temporal scales, as well as 1-, 3- and 5-km spatial scales. They found that priority stations changed with both spatial and temporal scales, where changes in temporal scale resulted in more significant changes in station priority than spatial rescaling. The decrease in entropy at longer timescales also had an impact on station density, as fewer stations were required to reach a stable joint entropy value for longer time scales [54]. These findings demonstrate that network objectives are an important first consideration when determining the spatial and temporal sampling used to calculate entropy. However, the research on this topic is still limited, and more work is needed to provide robust guidance on sampling strategies.
Several approaches have been proposed to design or redesign a precipitation monitoring network using one or more entropic measures. Many of these approaches are initialized by building a network around a central station, usually selected as the station with the highest marginal entropy [43,62–64]. In urban Rome, Ridolfi et al. [62] selected stations for the precipitation network by sequentially finding the next station that minimized the conditional entropy of the network and adding that station to the network. A similar approach was taken by Yeh et al. [63] to expand a precipitation network in Taiwan. Hourly rainfall data were normalized with a Box-Cox transform, and kriging was used to interpolate rainfall to candidate grid cells. The joint entropy of the network was calculated using an analytic equation for joint entropy valid for normal data [65], and stations were added sequentially that had the lowest conditional entropy with the rest of the network. The final number of stations needed by the network was accepted when 95% of the network information was captured [63]. Awadallah [64] applied multiple entropy measures sequentially to add stations to a precipitation network. The first new station was selected as the one with the highest entropy. The second station was chosen to minimize the mutual information and the third as the station that maximized conditional entropy.
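Sequential station selection of this kind can be sketched generically. The example below is not the procedure of any cited study: it assumes binned data at all candidate sites, starts from the station with the highest marginal entropy, adds at each step the station that most increases the joint entropy and stops once an assumed 95% of the all-station joint entropy is captured. All names and the synthetic series are hypothetical.

```python
import numpy as np


def joint_entropy(data, base=2.0):
    """Plug-in joint entropy of binned series (rows: time steps, columns: stations)."""
    data = np.asarray(data)
    if data.ndim == 1:
        data = data[:, None]
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def greedy_select(data, fraction=0.95):
    """Greedy forward selection of station columns by joint entropy."""
    n_stations = data.shape[1]
    total = joint_entropy(data)
    marginals = [joint_entropy(data[:, j]) for j in range(n_stations)]
    selected = [int(np.argmax(marginals))]          # central station
    while (len(selected) < n_stations
           and joint_entropy(data[:, selected]) < fraction * total):
        remaining = [j for j in range(n_stations) if j not in selected]
        scores = [joint_entropy(data[:, selected + [j]]) for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


a = np.tile(np.arange(4), 4)
c = np.repeat(np.arange(4), 4)
# Columns: station 0, a duplicate of it, an independent station, a derived station.
stations = np.column_stack([a, a, c, a % 2])

chosen = greedy_select(stations)
print(chosen)
```

In this constructed case the duplicate and the derived station are never chosen, because the first station and the independent station together already carry all of the network information.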
The aforementioned approaches all sequentially add single stations to a monitoring network based on a single criterion. Mahmoudi-Meimand et al. [43] presented a methodology to add stations to a network based on a multivariate cost function. Precipitation data were spatially interpolated from existing stations using kriging, where the error associated with the rainfall estimation is calculated as the kriging error variance. Their method selected the station that maximized transinformation entropy and minimized error variance, using a weighted average of both measures as the objective [43]. This approach balanced the information content in the network with the errors in the interpolation method. Xu et al. [57] used a multi-objective approach to simultaneously select a subset of stations that minimized the sum of pairwise mutual information, minimized bias and maximized Nash-Sutcliffe efficiency. Solutions were generated via Monte Carlo sampling, and network solutions falling along the Pareto front were taken as compromise solutions. Coulibaly and Keum [66] and Samuel and Coulibaly [67] also used a multi-objective approach to add stations to snow monitoring networks in Canada. Their approach used a genetic algorithm to find networks that maximized the joint entropy and minimized the total correlation of the network to form a Pareto front of optimal network designs, some of which also included network cost in the optimization [35,67].
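The idea of a Pareto front for two competing objectives can be sketched with a plain non-dominated filter. The sketch assumes that each candidate network has already been scored by a joint entropy (to be maximized) and a total correlation (to be minimized); the four score pairs are invented for illustration, and no genetic algorithm or Monte Carlo sampler is implemented here.

```python
def pareto_front(objectives):
    """Indices of non-dominated (joint_entropy, total_correlation) pairs.

    A candidate is dominated if another candidate is at least as good in both
    objectives (higher joint entropy, lower total correlation) and strictly
    better in at least one of them.
    """
    front = []
    for i, (h_i, c_i) in enumerate(objectives):
        dominated = any(
            (h_j >= h_i and c_j <= c_i) and (h_j > h_i or c_j < c_i)
            for j, (h_j, c_j) in enumerate(objectives)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical scores of four candidate networks: (joint entropy, total correlation).
candidates = [(4.0, 0.0), (4.0, 2.0), (2.0, 0.0), (5.0, 1.0)]
front = pareto_front(candidates)
print(front)
```

The two surviving candidates represent a trade-off: one has no redundancy, the other carries more information at the price of some redundancy, and neither is better in both respects.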
A challenge for an entropy-based approach to adding stations to a precipitation network is the requirement to have data available at candidate points. For precipitation, this can be challenging because data, at shorter time scales in particular, are well known to be non-normal. Most studies use kriging for interpolation [43,63,64] and address the need for normally distributed data using a Box-Cox transform. Samuel and Coulibaly [67] addressed the interpolation problem by using external data from the Snow Data Assimilation System (SNODAS) for candidate stations. Su and You [52] presented a unique approach to adding stations that maximized the information content of the network. In most cases in the literature, entropic measures at ungauged sites are determined by interpolating observations of precipitation across a watershed. Su and You [52] calculated the transinformation between neighbouring stations to develop a 2D transinformation-distance relationship. Instead of transferring data to ungauged stations, this approach transferred transinformation to ungauged stations and selected the site with the maximum transinformation. This approach should be further tested and contrasted with the data transfer approach.
As previously stated, precipitation data are of critical importance for a variety of applications. Despite this, few studies have explored the impact of precipitation networks designed with an entropy approach on actual water resource applications. Applications found in the literature have taken the reasonable approach of using entropy to reduce network density for comparison to a network that included all stations. In Portugal, Santos et al. [50] compared artificial neural networks, K-means clustering and mutual information (MI) criteria for reducing the density of a precipitation network for drought monitoring at different time scales. They found that the best performing reduction method depended on the region and time scale applied, but noted that all methods performed well. They also found that all subset networks could reliably reproduce the spatial precipitation pattern. Xu et al. [57] used the multi-objective approach previously described to select a subset of precipitation stations from a dense network in the Xiangjiang River Basin in China. Rainfall from the subset networks was used to force the lumped Xinanjiang hydrological model [68] and the distributed SWAT hydrological model [69]. The authors found that lumped model performance became stable with a subset of 20 to 25 stations, whereas the distributed model's performance continued to increase as more stations were added to the network [57]. These analyses are important to demonstrate the utility of precipitation networks and the advantages of entropy-based approaches in designing precipitation networks.

Streamflow and Water Level Networks
Water quantity monitoring, such as the monitoring of streamflow rates and water levels, is one of the essential tasks for water management to prevent flood damage to nature and human beings. Successful floodplain management or flood forecasting and warning systems become feasible through expert forecasters who implement well-calibrated models and reliable tools using quality data [70]. The design of water quantity monitoring networks has been well implemented, not only because entropy-based methods perform well, but also because streamflow and water level data, except for ephemeral or intermittent streams, are largely unaffected by the zero effect, which is caused by the discontinuity of the probability density function due to zero values in the data. To deal with the zero effect in entropy calculations, Chapman [71] and Gong et al. [60] separated the marginal entropy in Equation (3) into nonzero terms and zero values, which are certain. While Gong et al. [60] summarized the possible issues in entropy calculations from hydrologic data as effects due to zero values, histogram binning (including skewness considerations) and measurement errors, some studies noticed that the length and the location of the time window also affect entropy calculations and the corresponding network design. Fahle et al. [31] observed the temporal variability of station rankings by shifting the time window for the design of the water level network of a ditch system in Germany. Mishra and Coulibaly [47] also found that the efficiency of hydrometric networks depends on seasonality. Stosic et al. [51] found an inverse relationship between the network density and the sampling time interval, as a larger number of monitoring stations is required if the time interval is shorter and vice versa. Keum and Coulibaly [34] analyzed the temporal changes of entropy measures and optimal networks by applying daily time series for streamflow network design. They found that the information gain of a monitoring network is not significant when the length of the time series is longer than 10 years, and the total correlation tends to stabilize within five years of data. The optimal networks using data lengths of 5, 10, 15 and 20 years also show no significant differences in the results for 10 years or longer, while the optimal network using five years of data was evidently different from the others. Werstuck and Coulibaly [56] analyzed scaling effects by considering two study areas, one of which is a small watershed that is part of the other. After applying the transinformation analysis and the multi-objective optimization, they concluded that the optimal networks tend to be affected by scaling while the transinformation index is not.
Mishra and Coulibaly [46] evaluated the effects of the class intervals and of infilling missing data by applying the linear regression method to daily time series and concluded that the station rankings based on the transinformation values were not significantly changed. Li et al. [41] also investigated the changes of station rankings based on the maximum information minimum redundancy (MIMR) approach and reached a similar conclusion. However, Fahle et al. [31] and Keum and Coulibaly [35] reached the opposite conclusion that station rankings can be affected by the binning method that defines the class intervals. The conflict comes from the selection of the binning methods compared. The former group applied different parameters to a single binning method, the mathematical floor function, whereas the latter group compared other binning methods with the floor function. Considering that Alfonso et al. [25] found, from a sensitivity analysis of the parameter of the mathematical floor function, that the design solutions were not common in some cases, it is not recommended to adopt a specific binning method without careful consideration.
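The sensitivity of entropy to the binning parameter can be sketched with a floor-function quantization. The rounding rule used here, x_q = a * floor((2x + a) / (2a)), is assumed to be the form used in the studies discussed above; the function names, the eight water level values and the two bin widths are hypothetical.

```python
import numpy as np


def floor_bin(x, a):
    """Quantize values to multiples of a (assumed floor-function rounding rule)."""
    x = np.asarray(x, dtype=float)
    return a * np.floor((2.0 * x + a) / (2.0 * a))


def entropy_of_labels(labels, base=2.0):
    """Plug-in Shannon entropy of a series of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


# Hypothetical water level observations.
levels = np.array([0.2, 0.9, 1.4, 2.6, 3.1, 4.8, 5.0, 7.7])

h_fine = entropy_of_labels(floor_bin(levels, a=1.0))    # narrow class intervals
h_coarse = entropy_of_labels(floor_bin(levels, a=4.0))  # wide class intervals

print(h_fine, h_coarse)
```

The same record yields a noticeably lower entropy with the wider interval, which is why station rankings derived from entropy can shift when the binning parameter or method changes.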
As discussed in the review of precipitation networks in Section 3.1, network redesign and network expansion require data at candidate locations, which are ungauged. Alfonso et al. [27] applied a one-dimensional hydrodynamic model to generate the discharge time series. The model estimated discharge at each segment, with the rivers divided longitudinally into increments of approximately 200 m. The use of the hydrodynamic model made it possible to determine the critical monitoring locations in the main stream and its tributaries. On the other hand, Samuel et al. [49] combined regionalization techniques with entropy calculation in order to estimate the discharge at candidate locations. They compared the performance of various regionalization methods including not only a conceptual hydrologic model, but also spatial proximity, physical similarity and their combinations with the drainage area ratio. Based on the performance statistics from multiple basins, inverse distance weighting coupled with the drainage area ratio performed the best, and this conclusion has been adopted in several studies [34,35,37,55].
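One plausible way to couple inverse distance weighting with the drainage area ratio is sketched below. The exact formulation used in the cited studies is not given in this review, so the combination shown (each donor flow scaled by the area ratio, then averaged with inverse-distance weights of an assumed power of two) and all names and numbers are assumptions.

```python
import numpy as np


def idw_area_ratio(donor_flows, donor_areas, donor_distances, target_area, power=2.0):
    """Estimate flow at an ungauged site from gauged donor stations.

    donor_flows: array of shape (time steps, donors).
    Each donor series is scaled by target_area / donor_area and the scaled
    series are combined with normalized inverse-distance weights.
    """
    flows = np.asarray(donor_flows, dtype=float)
    areas = np.asarray(donor_areas, dtype=float)
    dist = np.asarray(donor_distances, dtype=float)
    if np.any(dist <= 0) or np.any(areas <= 0):
        raise ValueError("distances and areas must be positive")
    weights = dist ** -power
    weights /= weights.sum()
    scaled = flows * (target_area / areas)   # drainage area ratio transfer
    return scaled @ weights                  # inverse distance weighting


# Two hypothetical donors at equal distance from the candidate site.
flows = np.array([[10.0, 40.0],
                  [20.0, 20.0]])
estimate = idw_area_ratio(flows, donor_areas=[100.0, 200.0],
                          donor_distances=[5.0, 5.0], target_area=150.0)
print(estimate)
```

The resulting series could then be binned and used in entropy calculations for the candidate location, in the same way as observed series at gauged stations.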
Some studies have extended the entropy applications for streamflow monitoring network design. Stosic et al. [51] proposed the concept of permutation entropy, which differentiates series based on the order of sequential observations in addition to the histogram frequency used in the basic Shannon entropy measures. Even if the histograms of two different observation series are the same, the permutation entropy value tends to be higher for the series with more variation between time steps. However, network design studies using the permutation entropy are still limited. On the other hand, Leach et al. [37] applied additional features to the network design. While the common objectives in water monitoring network design using an optimization technique are to maximize the information and to minimize the redundancy in the network, they additionally considered the physical properties of watersheds, such as the streamflow signatures [72,73] and the indicators of hydrologic alteration (IHAs) [74,75]. After comparing the optimal streamflow monitoring networks with and without consideration of the streamflow signatures and IHAs, it was concluded that the inclusion of basin physical characteristics yielded a better spatial coverage of the selected locations of the optimal networks.
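The distinction between histogram-based entropy and permutation entropy can be illustrated with a short sketch. It uses the common ordinal-pattern formulation with an assumed pattern length of three and a delay of one; the function name and the two six-value series are hypothetical.

```python
import numpy as np


def permutation_entropy(series, order=3, base=2.0):
    """Entropy of the ordinal patterns of consecutive windows of length `order`."""
    x = np.asarray(series, dtype=float)
    n_windows = x.size - order + 1
    if n_windows < 1:
        raise ValueError("series is shorter than the pattern length")
    windows = np.array([x[i:i + order] for i in range(n_windows)])
    # The argsort of each window encodes its rank ordering (ordinal pattern).
    patterns = np.argsort(windows, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


steady = [1, 2, 3, 4, 5, 6]      # monotonic rise
erratic = [1, 6, 2, 5, 3, 4]     # same values, strongly alternating order

pe_steady = permutation_entropy(steady)
pe_erratic = permutation_entropy(erratic)
print(pe_steady, pe_erratic)
```

Both series contain the same values and therefore have identical histograms, yet only the alternating series has a nonzero permutation entropy, because it contains more than one ordinal pattern.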

Soil Moisture and Groundwater Networks
Soil moisture is a critical water variable as the interface between the atmosphere and the subsurface. Unfortunately, the monitoring of soil moisture is very sparse compared to its spatial variability. To design an optimum network for monitoring soil moisture in the Great Lakes Basin, Kornelsen and Coulibaly [36] proposed using data from the Soil Moisture and Ocean Salinity (SMOS) satellite [76] together with the DEMO algorithm of Samuel et al. [49]. Grid cells were selected for additional monitoring stations so as to maximize joint entropy while minimizing total correlation using only the satellite data. The ascending and descending overpasses were found to contain different information, and the spatial distribution of a network designed with both overpasses was found to contain complementary features from both datasets [36].
Groundwater monitoring allows for a better understanding of the hydrogeology in an area. This is achieved through groundwater quality and quantity monitoring. Groundwater quality monitoring is used to detect contaminant plumes or for long-term monitoring (LTM) of post-remediation effects, and groundwater quantity monitoring is used to determine the water available for drinking, irrigation and industry. However, monitoring groundwater is inherently difficult due to physical barriers between observers and the water. Through the understanding of subsurface flow physics and with flow and contaminant transport models such as MODFLOW, MODPATH and MT3D [77–79], we can simulate the behaviour of groundwater. Unfortunately, our simulations are not always accurate, and the models require real-world observations to be calibrated and validated. Due to constraints such as accessibility and financial cost, it is not feasible to monitor at every possible location in an area of interest. Instead, an optimal monitoring network should be designed to allow for the best placement of monitoring stations and to determine the ideal measurement frequencies. The merit of using information theory entropy has been shown in several cases of groundwater network design [31–33,38,44,48,53,58].
Various methods that utilize information theory entropy have been developed for designing optimal groundwater monitoring networks. These include the use of entropy measures in both single and multi-objective optimization problems and have been applied to network reduction [32,33], expansion [38,58] and redesign [44], as well as to highlighting vulnerable areas that should be monitored [53]. In identifying vulnerable areas in the Victoria County Groundwater Conservation District (VCGCD) in Texas, USA, Uddameri and Andruss [53] developed a monitoring priority index (MPI) based on a weighted stakeholder preference to highlight the areas of interest. They compared kriging standard deviation and marginal entropy as metrics to characterize groundwater variability and found entropy to be the more conservative metric.
In areas where there is excessive monitoring, Mondal and Singh [48] showed that the information transfer index (ITI), the quotient of joint entropy and transinformation, could be used to evaluate the existing monitoring network. Through this evaluation, redundant monitoring stations (wells) could be identified and removed from the groundwater monitoring network. It may also be the case that the existing groundwater monitoring network is not adequate and additional monitoring stations are needed. Yakirevich et al. [58] developed a method that utilizes minimum cross entropy (MCE) to sequentially add monitoring stations to a network. MCE was used as a metric to quantify the difference between two variants of a Hydrus-3D model [80], and the monitoring stations were added to the network where the difference between the models was largest. A multi-objective approach for adding monitoring stations to a groundwater monitoring network was applied by Leach et al. [38]. It utilized two entropy measures, total correlation and joint entropy, and a metric quantifying the spatial distribution of annual recharge, and the results were used to develop maps that highlight areas in which additional monitoring stations should be added. The majority of network design experiments use the entire available time series when calculating entropy measures; however, Fahle et al. [31] showed that combining MIMR with subsets of the data series could be preferable. The subsets were used to represent the intra-annual variability of groundwater levels. This method identified locations that were consistently important in each subset and found that monitoring stations showed similarities during wet periods and uniqueness during dry periods. Fahle et al. [31] also suggest that using subsets of data allows for the design of a network that is focused on floods or droughts.
One issue that can arise with entropy-based methods is the need for lengthy data series to produce accurate measures of entropy. Unfortunately, the area of interest for new monitoring stations will not have data available for all possible locations. To work around this limitation, transinformation-distance (T-D) curves have been applied in the design of optimal groundwater monitoring networks [44,81]. In these studies, T-D curves were developed for sub-areas within the desired study area based on different clustering methods. Additionally, Masoumi and Kerachian [44] showed that this method could be applied temporally as transinformation-time curves, which could then be used to optimize the temporal sampling frequency of the stations. It should be noted that both previously mentioned studies were applied in the same study area using slightly different methods for clustering monitoring stations, and they produced different groundwater monitoring networks, each of which could be considered optimal. This highlights an issue with optimal monitoring network design: it can be subjective and does not have a single solution. A comparison of the two studies by Hosseini and Kerachian [32,33] also illustrates this issue: using different entropy measures (marginal entropy and Bayesian maximum entropy) and different optimization techniques, one experiment found that the optimal monitoring network included 42 monitoring stations, while the other included only 33 stations.
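As a rough illustration of how a T-D curve might be used, the sketch below fits an exponential decay to pairwise transinformation values and reads off the distance at which the shared information drops to a chosen threshold. The exponential form, the threshold and all numbers are assumptions made for this example; the cited studies may use other curve forms and fitting procedures.

```python
import math


def fit_td_curve(distances, transinfo):
    """Fit T(d) = T0 * exp(-k * d) by least squares on ln(T).

    Assumes every transinformation value is strictly positive.
    Returns the pair (T0, k).
    """
    logs = [math.log(t) for t in transinfo]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -slope


def distance_for_threshold(t0, k, threshold):
    """Distance at which the fitted curve falls to the given transinformation."""
    return math.log(t0 / threshold) / k


# Hypothetical station-pair distances (km) and transinformation values (bits),
# generated here from a known curve so the fit can be checked.
pair_distance_km = [1.0, 2.0, 4.0, 8.0]
pair_transinfo = [1.2 * math.exp(-0.3 * d) for d in pair_distance_km]

t0, k = fit_td_curve(pair_distance_km, pair_transinfo)
spacing = distance_for_threshold(t0, k, 0.3)
print(f"T0 = {t0:.3f} bits, k = {k:.3f} per km, spacing = {spacing:.2f} km")
```

The same fit could be applied with time lags in place of distances to obtain a transinformation-time curve.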

Water Quality Networks
Water quality monitoring networks are important because they help identify parameters that exceed water quality standards. Several water quality monitoring strategies, including two methods that used entropy measures [42,45], were recently reviewed by Behmel et al. [15]. This review found that identifying a single approach to water quality monitoring network design would be virtually impossible. Despite this, various applications of the transinformation-distance curve method have shown promise in the optimal redesign and reduction of water quality monitoring networks [29,42,45]. Lee [39] found that an optimal water quality network could be designed for a sewer system by maximizing the multivariate transinformation between chosen and unchosen stations, using the storm water management model to simulate total suspended solids and a GA for optimization. Banik et al. [82] compared information theory, detection time and reliability measures for the design of a sewer system monitoring network through both single- and multi-objective optimization approaches. It was shown that for a small monitoring network, the methods had similar performance, while the single-objective detection time-based method had slightly better performance when the number of monitoring stations was larger. Alameddine et al. [24] used exceedance probabilities to determine the violation entropy of dissolved oxygen and chlorophyll-a in the Neuse River estuary. Along with violation entropy, the total system entropy was used as a measure to identify areas where monitoring is important. A multi-objective optimization scheme based on expert-assigned weights was used to develop a compromise solution from the three entropy measures. Ultimately, the method allowed for the identification of high-uncertainty areas, which would benefit from future water quality monitoring. Data availability is an issue when using entropy methods, particularly when attempting to use them in the design of a monitoring network in an ungauged basin. To address this, Lee et al. [40] developed a method that uses a measure analogous to marginal entropy. This method uses characteristics of the basin, such as the length and number of reaches in the river network, as part of the cost function, which is then optimized using a combined GA and filtering algorithm. This was shown to be a computationally efficient method for optimal network design in an ungauged river basin.
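One plausible reading of an entropy built on exceedance probabilities is the binary (Bernoulli) entropy of the event "the standard is violated". The sketch below follows that reading as an assumption; it is not taken from the cited study, and the sample concentrations and thresholds are hypothetical.

```python
import math


def exceedance_probability(samples, standard, violates_below=False):
    """Fraction of samples that violate the standard.

    Set violates_below=True for parameters such as dissolved oxygen,
    where values under the standard count as violations.
    """
    if violates_below:
        hits = sum(1 for s in samples if s < standard)
    else:
        hits = sum(1 for s in samples if s > standard)
    return hits / len(samples)


def violation_entropy(p):
    """Binary entropy in bits of a violation that occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # outcome is certain, so there is no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Hypothetical samples and standards (illustration only).
dissolved_oxygen = [6.0, 4.5, 3.8, 5.2]   # mg/L, assumed standard 5.0
chlorophyll_a = [10.0, 12.0, 15.0, 45.0]  # ug/L, assumed standard 40.0

p_do = exceedance_probability(dissolved_oxygen, 5.0, violates_below=True)
p_chl = exceedance_probability(chlorophyll_a, 40.0)
print("DO:    p =", p_do, " entropy =", violation_entropy(p_do))
print("Chl-a: p =", p_chl, " entropy =", violation_entropy(p_chl))
```

Under this reading, uncertainty peaks where a violation is as likely as not, so such locations would be flagged as the most informative to monitor.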

Integrated Network Design
To the best of our knowledge, almost all of the previous studies on water monitoring network design have focused on a specific network type (i.e., considering a single hydrologic variable in each study), as reviewed in the previous sections. However, because hydrologic processes are interconnected in the water cycle, there are causes and effects between hydrologic variables. For instance, if a noticeable amount of precipitation occurs, streamflow or groundwater level is likely to increase; hence, the information content of one variable may affect that of other variables. Keum and Coulibaly [35] developed a multivariate network design method by taking conditional entropy as the measure of information that is independent of a given variable. In their study, the method designed precipitation and streamflow monitoring networks simultaneously. Specifically, the method followed the traditional multi-objective approach that maximizes joint entropy and minimizes total correlation, but added another objective that maximizes the conditional entropy of the streamflow network given the precipitation network, mimicking the direction of the water cycle in which streamflow fluctuates in response to precipitation. After comparing the integrated design with the single-variable design, their results showed that the effectiveness of network integration mostly came from reducing the number of additional precipitation stations. It was also found that the integrated network design approach allows a precipitation station to be added at a location that will benefit the stream gauge network.
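The conditional entropy objective can be written with the chain rule as H(Q | P) = H(P, Q) - H(P), where P and Q denote the precipitation and streamflow networks. The sketch below evaluates this quantity for discretized series; the data and function names are hypothetical, and it illustrates the identity only, not the optimization procedure of the cited study.

```python
import math
from collections import Counter


def joint_entropy(series):
    """Joint Shannon entropy in bits of a list of aligned discrete series."""
    n = len(series[0])
    counts = Counter(zip(*series))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given).

    Both arguments are lists of series, one series per station.
    """
    return joint_entropy(target + given) - joint_entropy(given)


# Hypothetical binned records (illustration only).
precip_network = [[0, 0, 1, 1, 2, 2, 3, 3]]
flow_dependent = [[0, 0, 1, 1, 2, 2, 3, 3]]    # fully explained by precipitation
flow_independent = [[0, 1, 0, 1, 0, 1, 0, 1]]  # unrelated to precipitation

print("H(Q|P), dependent gauge:  ",
      conditional_entropy(flow_dependent, precip_network))
print("H(Q|P), independent gauge:",
      conditional_entropy(flow_independent, precip_network))
```

A stream gauge whose record is fully explained by the precipitation network contributes no conditional entropy, so maximizing this objective favors gauges that add information beyond what precipitation already provides.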

Conclusions and Recommendations
It is evident that successful water management cannot be achieved without proper water monitoring networks. Although there has been much progress in network design methods and applications, a standardized design methodology has not yet emerged. Since the introduction of information theory in the 1940s, entropy concepts have been used in a wide range of applications, with recent efforts directed at network design problems. The unique benefit of this approach is that a water monitoring network can be evaluated or designed based on the information the network monitors, in contrast to the set station densities proposed by WMO guidelines. The advantage of the entropy approach is that a network can be better tailored to specific applications or optimized to provide the most gain at densities lower than those suggested in WMO guidelines. In addition, when combined with multi-objective optimization techniques, users' specific criteria can be included in the optimal network design process.
This manuscript provides a comprehensive review of recent research achievements and their applications in entropy-based water monitoring network design. The literature has demonstrated the use of various information theory measures, and adaptations thereof, in network design, with an emerging consensus that the goal of these methods is to select the stations that provide the most information to the monitoring network while simultaneously being independent of each other. Through rigorous testing, information theory has proven to be a robust tool for evaluating and designing an optimal water monitoring network. However, when it comes to evaluating the optimal design, there are still issues that need to be addressed.
The first is that an optimal monitoring network design can be found based on specified design criteria; however, the practical application of the new optimal monitoring network is rarely evaluated in a hydrologic or other model [11,57]. This type of numerical experiment is an important requirement to evaluate the utility of a network rather than just identifying its optimality or information content. Further, it is an important exercise to identify the benefits of entropy-based network designs in order to convince decision makers of the importance of adopting entropy approaches.
Another issue with the optimal network is that it can be subjective, depending on choices made in the calculation of entropy and on the design method chosen, especially when additional objective functions are considered in the design. This extends to the method selected for finding the optimal monitoring network, whether it is an iterative method in which one station is added at a time or a method in which a collection of stations is added all at once. Research has also shown that data length, catchment scale and ordering can influence the design of an optimal network [31,34,51]. Finally, when using discrete entropy, the binning method has been shown to influence the final network design [35]. The influence of binning on entropy calculation has received greater attention in other geophysical network design applications [83][84][85], and similar consideration should be given in the field of water resources, particularly owing to the unique and difficult nature of the spatial and temporal distributions of water variables (e.g., streamflow, precipitation) [30,60]. Thus, bins should be chosen explicitly based on the intended application of the monitoring network, and further research is needed to provide guidance specific to water monitoring networks. Therefore, despite the possibility of finding an optimal network design in a formal sense, the subjectivity induced by the designer's choices and the lack of standardized design methods must be recognized. Future research should focus on comparative studies among multiple entropy design methods, discretization approaches and data characteristics. The current literature provides many novel entropy design approaches and documents the evolution of concepts, but rigorous comparisons are critical to provide generic guidelines for network design. Despite the potential sources of subjectivity identified, entropy methods remain one of the most objective approaches for network design.
In particular, more work is needed on spatial and temporal scaling of data for entropy calculation to provide robust guidance to decision makers. Many new methods and optimization techniques have been reviewed herein, but few examples were found in the literature that explored the data characteristics used in those techniques. Further research is required to provide guidance on the proper length of data in water monitoring network design [34], the sampling frequency of the data [54] and the spatial scale at which information should be measured for various monitoring network applications.
The aforementioned issues are considered crucial gaps that need to be filled to enable practical recommendations or guidelines for the widespread adoption of entropy approaches for designing optimal water monitoring networks. In addition, the entropy-based methods reviewed herein should be robustly compared with network design methods from other disciplines, such as geostatistics, to identify areas of equivalence and disparity [10]. Considerable advances have occurred over the past decade, as reviewed herein, and measures derived from Shannon's base equation [16] have reached a high level of maturity for the task of network design. We challenge the research community to apply similar creativity to the joint consideration of data characteristics, network design and applications, all of which are intricately linked.

Table 1. Summary of significant contributions to water monitoring network design using entropy (author alphabetical order).
- Total correlation should be combined with joint entropy to get the most information out of a monitoring network
Alfonso et al., 2013 [27]; Streamflow; Magdalena River, Colombia
- Max(joint entropy), min(total correlation)
- Rank-based iterative approach
- Rank method is useful in finding extremes on the Pareto front
- When iteratively selecting stations, the information content of the network is not guaranteed to be maximum if the network contains the station with the most information
- By creating an ensemble of solutions through varying the bin size of the initial Pareto optimal solution set, the authors highlight the uncertainty related to choosing bin size
- Found that networks obtain a significant amount of information from data periods of 5 to 10 years, and total correlation tends to stabilize within 5 years when daily time series are applied
- Recommended a minimum data period of 10 years for designing precipitation or streamflow networks using daily time series