The Role of Topography on the Shape of Unit Hydrographs in Small and Medium Sized Watersheds through a Physical Model

: This study intends to establish the main relations between topographic characteristics of the watershed and the main parameters of the unit hydrograph measured at the outlet. It looks to remove the subjectivity found in traditional synthetic methods and the trial and error setting of the main parameters of the hydrograph. The work was developed through physical experimentation of the rainfall-runoff process using the observed information of different watersheds of Chiapas, Mexico, as the reference. The experiments were carried out on a state-of-the-art semi-automatic runoff simulator, which was designed and built speciﬁcally for this study. Polynomial regression and fuzzy logic models were obtained to conﬁrm the hypothesis of hydrological parameters being obtained from topographic data only by assuming uniform precipitation. Empirical relations were found for the peak ﬂow, time to peak, base time and volume of the unit hydrograph and the watershed area, the main stream average slope, and the length of the stream of highest order. The main ﬁnding is that a unit hydrograph can be described based only on the watershed area when fuzzy logic models are applied.


Introduction
The design of rainwater management structures and waterworks, the determination of floodplain limits, and the evaluation of river structures' safety for gauged or ungauged watersheds requires water discharge information. The underestimation of the quantity and velocity of runoff in a region can cause great damage. Clear examples of this are the 1986 disaster in California when the Yuba River broke a levee and damaged almost 4000 homes [1] and when, in 2010, hurricane Alex brought to Mexico flood damages up to 1.5 billion dollars mainly in the city of Monterrey [2]. It is of great importance then to have an accurate estimation of the relations between rainfall and runoff, especially if they can be applied in a simple manner and obtained from precise and available data like topography, geomorphology, or measurable characteristics of the watershed.
Since precipitation varies constantly within a watershed and the availability and precision of pluviometric data can be questionable, especially in developing countries like Mexico, many researchers have focused their efforts on understanding the relationships between the watershed's geomorphology and its hydrology through empiric relations using synthetic (the term "synthetic" suggests that it has been determined based on watershed measurable characteristics and not on the runoff-rainfall relationship information) methods [3][4][5][6][7][8][9][10][11][12][13] and not the actual hyetograph

General Description of the System
Based on the necessity of a deeper understanding of the relations between watershed characteristics and hydrograph parameters and on the fact that water behavior is too complex to model it appropriately, a 3D runoff simulator was designed and built to run physical experiments in order to search for the main topographic characteristics that determine the hydrograph shape of a watershed.
The complete model is comprised of four main parts: topographic surface, gantry robot, rainfall simulator, and runoff collector (see Figure 1). The actual model can be seen in Figure 2. The topographic model allows reproducing various types of 3D terrains at different scales through the elevation of rigid columns according to the DEM of each terrain. The elevation of the columns was done one-by-one by a gantry robot, and its accuracy was tested with a sample of 100 cells. The average error resulted in 1.41% calculated between the average elevation obtained from ArcGIS and the sampled scaled elevation within the physical model.

Robot
The cell size (3 cm by 3 cm) of the model was chosen as reduced as the physical elements and materials allowed and validated according to the distributed hydrological models recommended by Guerra-Cobian [30] since important scale issues arise due to this in hydrological simulations of watersheds with digital and physical models to determine hydrological parameters.
Once the terrain was complete, it was fully covered by a water-resistant liner. Since only one type of cover was used, the roughness of the terrain remained constant.
Precipitation was simulated through a seven nozzle fog jet sprinkler located at the center of the model and 1.50 m above the topographic surface.
The simulation's runoff was collected in an acrylic tank that was connected to a PVC pipe. Here, a pressure transducer was placed at the bottom to measure the water level. The transducer was configured to obtain one sample per second. This setting was designed to reduce the noise of the level readings, generated by the falling water in the tank.

Restrictions and Limiting Conditions
Physical simulation of the rainfall-runoff process has its limitations, and they must be taken into account for the experiments and the analysis of results. These limitations are: (1) Type of terrain and storage: Since the physical model does not allow reproducing different types of terrain, the watersheds to use in the experimentation should be as virgin as possible in order to avoid extensive urban areas and/or large hydraulic structures like reservoirs that have a great influence on the runoff behavior.
(2) Roughness: The model has a fixed value of roughness determined by the waterproof cover used.
(3) Moisture: Moisture conditions are considered such as saturated soil, since only direct runoff is simulated and the imperviousness of the model does not allow for different conditions.
(4) Homogeneity: Due to the assumption of homogeneous rainfall distribution over the watershed, the model is suitable only for small and medium sized watersheds, where this spatial assumption applies.
(5) Scale: Small and medium sized watersheds also minimize scale effects; therefore, these limited the study. In this case, watersheds between 170 and 980 km 2 in extent were studied.

Methodology
Since Gu et al. [31] addressed the weakness of current watershed studies and recommended the use of physically based hydrological models in order to advance hydrologic science, the experimental design was used to establish cause-effect relationships among a group of variables. Figure 3 shows the experimental methodology followed in this study.
A sample of nine real watersheds was used, selected from a region with a high frequency of rainfall events, which allowed for a higher amount of available hydrological data. The watersheds selected corresponded to relatively virgin terrains-no large urban areas and/or important hydraulic structures-and also a relation between the x, y, and z axis scale, which allowed for a distortion ratio below 2 (i.e., x axis scale/z axis scale < 2) at the time of scaling it to the physical model dimensions.
Since no recommendations in the literature were found on the acceptable distortion of hydrological models, a low value was selected, but this reduced the watershed sample to a model where the distortion ratios ranged between 0.96 and 2.  The DEM of the sampled area was obtained from INEGI [32] with a pixel size (X, Y) of 15 m. Each watershed selected represented a topographic sample, and for each sample, the DEM information was processed using ArcGIS 10 by ESRI (see Figure 4) and Excel 2010 by Microsoft (see Figure 5) to obtain an adequate format to feed the robot. The elevations matrix obtained through the processing of the DEM data was validated through a Visual Basic (VB) code (see Appendix B) that colored each cell according to the elevation in order to confirm that the arrangement agreed with the original watershed information (see Figure 5). The scaled matrix in cm, as in the physical model, was imported to ArcGIS to be processed and to obtain the morphologic parameters of the scaled watershed. Once the data were arranged in matrix form, they were then formatted into the Arduino language to be read by the controller of the robot.

Parameters
According to Sukristiyanti et al. [33], the morphometric analysis of a watershed helps in the interpretation of its shape and its hydrological characteristics, as well as in a comparative analysis among different watersheds. Several morphometric parameters were analyzed for every watershed. A number of parameters were measured using ArcGIS (watershed area A, length of the main stream L, maximum elevation H max , minimum elevation H min , distance from the outlet to the centroid of the watershed Lc, length of the stream of highest order Lω, average watershed slope Sb, watershed perimeter P, and watershed axial length Lb) or calculated through formulas (main stream average slope S, area ratio RA, bifurcation ratio RB, length ratio RL, circularity ratio Rc, and form ratio R f ). International system units were used for all of the calculated parameters: • S: The main stream average slope was calculated as: where H max is the maximum elevation in m and H min is the minimum elevation in m, which can be obtained from the DEM layer in ArcGIS in its properties in the statistics section. By using the stream order system, the Horton ratios (RA, RB, and RL) can also be obtained.
The area ratio is a geometric relationship between the average area drained by streams of a given orderĀ w and the corresponding order w and can be calculated as: • RB: The bifurcation ratio is a geometric relationship between the number of streams of a given order w and the number of streams of order w + 1 and can be calculated as: • RL: The length ratio is a geometric relationship between the average length of streams of a given order w and the average length of streams of order w − 1 and can be calculated as: • Rc: The circularity ratio of Miller is calculated as: where the area of the watershed (A) and its perimeter (P) must be in the same system of units (i.e., m 2 and m). • R f : The form ratio of Horton can be calculated as: where the area of the watershed (A) and its axial length (Lb) must be in the same system of units.
The hydrograph parameters used are: unit hydrograph peak flow (q p ), unit hydrograph time to peak (t p ), unit hydrograph base time (t b ), and unit hydrograph runoff volume (V) obtained from the unit hydrograph flow data as: where V is in m 3 /s/cm/h, q i is flow at time i in m 3 /s/cm, and t i is time i in h.

Study Cases
The nine watersheds selected are located in the state of Chiapas, Mexico. The state is located in the southeast of the country; it borders to the north the state of Tabasco, to the west the states of Veracruz and Oaxaca, to the south the Pacific Ocean, and to the east Guatemala. It has a total area of 74,415 km 2 , which represents 3.8% of the total country's area, which makes it the eight largest state in the country. Its total urban area is only 0.36% according to the soil use and vegetation layer from CONABIO [34].
The state capital is Tuxtla Gutierrez with a total population of 598,710 inhabitants to 2015 [35]. This and other important cities in Chiapas can be located in Figure 6.
Chiapas' climate is mainly warm and tempered with high precipitation throughout the year in the north region and in the summer season in the rest of the state. The total annual rainfall varies from 1200-4000 mm. The mean annual precipitation can be seen in Figure 6. The mean annual temperature varies depending on the region from 18-28 • C. The selected watersheds can be located in Figure 6. The topographic, hydrological, and cell size parameters for each watershed can be seen in Table 1.

Experimentation
Two phases of experimentation were carried out. The first phase had the objective of testing the homogeneity of the simulated rainfall at various pressure and height levels of the sprinkler. The second phase of experiments included the actual rainfall-runoff simulations starting with the topography reproduction, followed by the rainfall simulation over the terrain and finally collecting and recording the runoff at the outlet.

Sprinkler Pre-Testing
Three different experiments were held using different configurations of the sprinkler: as delivered by the manufacturer, with the center nozzle totally closed, and with the center nozzle orifice reduced.
An Inverse Distance Weighting (IDW) interpolation was done in ArcGIS using the average millimeters of the water column measured at different points over the model area. The results for the 45 psi for the closed nozzle and 30 psi for the reduced nozzle configurations can be seen in . Each level of pressure in the different configurations is related to rainfall intensity between 39 and 141 mm/h.  It was found that the variance of the data increased with pressure and decreased with height, the height being a more significant factor of change. It was decided to place the sprinkler at our highest physical limit of 1.50 m. The configuration with the closed nozzle resulted in a better dispersion of water over the complete surface, but the configuration with the reduced orifice resulted in lower variance for half of the simulation area. Since some of the study watersheds cover only half of the surface, this configuration was used in those cases. The first configuration resulted in very poor performance of homogeneity and therefore is not reported.

Rainfall-Runoff Process Experiments
Each of the nine watersheds selected was reproduced in the physical model and rainfall simulated over the terrain at different intensities to obtain runoff data related to them. For every experiment, level data in inches of the water column were measured every second at the collection tank placed under the watershed outlet. This was done from the moment the rainfall simulation started and until the level stabilized or stopped accumulating. The level data were then fit to a log-logistic function to eliminate noise and converted into runoff by obtaining the derivative of the log-logistic function, which was then plotted in time to produce the hydrographs to be analyzed for each watershed. The methodology followed can be seen in Figure 10.  Figure 10. Rainfall-runoff simulation methodology.
The first part (Steps 1-2) is related to the topography reproduction of the watershed for which a scale factor related to the z axis must be configured to convert the elevation data from meters to the model scale in centimeters. Once the watershed is modeled, rainfall is simulated according to the previous experiments during a relatively long time (5 min was sufficient to saturate the watersheds in all cases) to obtain first the time of concentration (T c ) (Steps 3-8). Plotting the measured flow at this stage results in a flat bell shape as shown in Figure 11 where T c is observed at the moment when the flow starts to be relatively stable-the equilibrium discharge-in this case at 280 s. A Loess smoothing was applied in R language to reduce the noise and obtain a better estimation of the experimental time of concentration. Once T c was known, rainfall was simulated over a time lower than this to obtain a bell-shape hydrograph; this was before the watershed became saturated. The data were recorded until the water level was stable again; this means that, even after the rainfall had stopped, the data were still being stored. To validate the test, the water level data were then converted to flow and plotted against time to obtain a hydrograph as the one shown in Figure 12 (Steps 9-16).

Physical Simulation Results
Five different pressures of the sprinkler related to different simulated rainfall intensities were tested for each watershed. For each individual test, the runoff was obtained from the measurements of a pressure transducer in inches of the water column that were then converted into millimeters of the water column. A log logistic function was fitted to this data to eliminate the error or noise caused by waves and other unknown perturbations.
An example showing the results from the 118th test of Chicomuselo at 25 psi can be seen in Figures 13-16. Several tests were run from which different unit hydrographs and their parameters were obtained. All of the results for Chicomuselo's tests are summarized in Table 2.
The theta parameters correspond to the log logistic function fit according to Equation (8), obtained by minimizing the Sum of Squared Errors (SSE) through Minitab's non-linear regression feature. It should be noted that a higher SSE usually relates to more perturbations during the test, although this value is only comparable among results from the same watershed due to the differences in magnitudes. The hydrograph and its parameters (q p , t p , t b , and V) were then obtained from the derivative of the log logistic function, through Equation (9), which resulted in runoff data or the volume of water over time. In this case, a represents time in seconds, b mmH 2 O, and b mmH 2 O per second, which can be converted to m 3 /s by multiplying mmH 2 O in meters by the caption tank area, which is 0.1726 m 2 , including the sensor container area used to reduce noise (liters were used to present the data in Table 2 for better visualization since the quantities of water during the simulation were very low).
Due to the changing conditions of a partly controlled physical simulation, the Nalimov test [36] was applied to each watershed's experimental data in order to eliminate the presence of outliers. This test is based on the elimination of data having an N parameter (see Equation (10)) higher than the critical value obtained from reference tables according to the sample size. This test was chosen for its applicability to small samples as is the case for each watershed's experiment set.
whered is the sample mean, std is the standard deviation of the sample, and n the number of values in the sample.

Data Analysis
Analysis on the final set of the experimental data of all nine watersheds was carried out to obtain an expression that could represent the hydrograph parameters based on the topography of the scaled watershed as stable and predictable as possible. Two different experiments were made, first using individual polynomial regressions for each response variable on the final dataset and second using a single fuzzy logic model for all response variables.
Before fitting the data into models, a feature selection technique was used in order to remove predictor variables that did not contribute in a meaningful way to each response variable. This was carried out with a neighborhood component analysis for regression data [37], where the generalization of all executions, i.e., for all response variables, was shown to be: A, S, and Lω. Although A had the highest importance in general, S and Lω were close behind in importance affecting the modeling for each response variable. The following experimentation analysis was executed with these variables only.
A polynomial regression, with a higher degree for each term, was selected in order to best describe the complex behavior of the relation between chosen predictor variables and response variables. Four models were trained, one for each response variable q p , t p , t b , and V; each using 80% of the data for training and the other 20% for validation. Two different base models were utilized, the first shown in Equation (11), where the polynomial is third order, used for predicting q p , and Equation (12), where the polynomial is second order, used for predicting t p , t b , and V.
where y m is the model output for a desired response variable, C n is the coefficient for the nth term, and x 1 , x 2 , and x 3 are the predictor variables A, S, and Lω, respectively. In Equation (11), the higher complexity is quite clear when compared to Equation (12); although the model for the first response variable, q p , can be fitted using the lower complexity of the model from Equation (12), the correlation coefficient results in a value of 0.88. On the other hand, if Equation (11) is used, the correlation reaches up to 0.94. All coefficients used for the four polynomial regression models are shown in Table 3, where the t p , t b , and V response variable models only require 10 coefficients as their models are simpler (second order polynomial regressions), whereas the model for q p requires double the number of coefficients (third order polynomial) compared to the latter. It is worth noticing that the model for q p does not require all coefficients, as these values are zero, but the more complex operations between these variables is what improves the performance of the model; i.e., the model could be compacted into a simpler model without sacrificing performance. −0.000243 The performance of the four models is shown in Table 4, where the coefficient of correlation, R coe f , was used. It should be noticed that the response variable q p did not perform as well when compared to the others, especially when considering that this model required a far more complex model. This can be explained by the fact that q p has a very complex and unpredictable behavior when compared to the other three response variables. Following the previous model's performance, another experiment was carried out in order to find the minimum number of response variables required to achieve an acceptable performance. For this case, a fuzzy logic model was used, more specifically a higher order Sugeno Type-1 fuzzy logic system [38]. In contrast to the first polynomial model shown above, less response variables are required to achieve a slightly better performance.
To allow for a direct comparison to the previous polynomial regression models, the same data separation scheme was used, that is 80% of the data for training and the remaining 20% for validation. It must be noted that, due to the more complex model training technique used by the fuzzy logic model, it was possible to achieve good results through only one response variable. Using this technique, response variable A was enough to model the behavior of all four predictor variables. Here, two hyperparameters for model training are required: the order of the Sugeno consequent and the radii used by the internal clustering of the technique. In this case, these two parameters were set to 2 and 0.8, respectively, for the best results.
The main characteristic of second order Sugeno consequents, as proposed by Castro et al. [38], is how they represent the output, that is, as shown in Equation (13), they can better model a more complex behavior when compared to first or zero order consequents.
where y l is the output of the lth rule, C l n is the nth coefficient of the lth rule, and x 1 is the input value. This resulted in a fuzzy logic model with 1 input, 4 outputs, and 3 rules (see Figure 17). The parameters of the model can be seen in Table 5, where the first set of parameters is for the antecedent Gaussian membership functions and the second set is for the coefficients for all second order Sugeno consequents.
Lastly, the performance was also measured through the coefficient of correlation R coe f , obtaining the results shown in Table 6. By comparing these results to the polynomial regression models, the fuzzy logic model clearly outperformed the latter, especially when considering that only one model was used, instead of four, and that better performance was achieved for two predictor variables, q p and V. This improvement was mostly due to the better training algorithm of the technique used, which used recursive least squares estimation parameter adjustment, compared to the polynomial regression, which uses least squares estimation parameter adjustment. Furthermore, fuzzy models can better adjust to complex behaviors by incrementing the number of rules. In the end, fuzzy logic resulted in a simpler model, which could achieve better modeling traits when compared to polynomial regression models. Antecedents Consequents Figure 17. Graphical representation of antecedents and consequents for the used fuzzy logic system. Antecedents are represented by Gaussian membership functions, and consequents are depicted by four outputs: y q p , y t p , y t b , and y V .

Discussion
The analysis of the experimental results shows that the hydrograph parameters of a watershed can be estimated with a high degree of accuracy through polynomial relations of second and third order of its topographic parameters, the most significant being the drainage area, main channel average slope, and length of the stream of highest order, or through fuzzy logic, where a better fit and less variables are needed, in this case only the drainage area. These relations were possible to obtain by enabling the repetition of different hydrological events, which is not possible in real life, through the use of a physical model that reproduces a specified terrain and simulates homogeneous rainfall over the area to finally measure the resulting direct runoff. Different watersheds under different rainfall intensities were simulated in order to reduce the degree of uncertainty of the results.
It is interesting to note that the resulting fuzzy logic model obtained the minimum number of response variables required to achieve an acceptable performance as the watershed drainage area. This is comparable to the widely used fixed base method of flow separation in hydrographs, which obtains a time after peak estimate based only on the watershed area [39].
Once the main parameters of the hydrograph are estimated, in this case peak flow, time to peak, base time, and volume, the complete shape of the hydrograph can be obtained, for example through the use of Hermitian interpolation [11], which does not include subjectivity or calibration techniques, and it provides a good fit when compared to observed hydrographs. Therefore, it is possible to finally estimate a hydrograph of a watershed based only on topographic information.
Although the roughness coefficient was not estimated in the study, it is considered that there are no relative effects on the results, since all the experiments were carried out with the same cover. It is clear that further analysis should be made in order to determine the relevance of this factor on the final results. This leads to an investigation that may begin in the near future.
The effects of the distortion were not included in the study since the analysis was based on the experimental data collected and no relation was attempted for the models to be applicable in real scale watersheds, which in this case were only a base for the construction of the experimental watersheds, but not a comparable reference between the resulting data.
The time scale is also very relevant for future research in order to find the relations between the experimental results and the real scale watersheds, but it is not part of the present work.
The next step in this process should be to develop a methodology that includes the effects of roughness, distortion, and time scale in order to transform the models found to be applied to the real scale. Since the degrees of magnitude for real watersheds are much higher than the experimental watersheds used, direct extrapolation is not possible.

Conclusions
The main outcomes of the study include, first, the fact that hydrological events, specifically the rainfall-runoff process, can be reproduced through the use of a physical model with certain limitations, but enabling repetition through experimentation. Second, empirical relations can be found between the topographic characteristics of a watershed and the hydrograph parameters of peak flow, time to peak, base time, and volume and that these relations can be as simple as second or third order polynomials. Third, the runoff behavior or the complete shape of the hydrograph can be described based only on topographic information of the watershed. The Fourth and main finding is that a unit hydrograph can be described based only on the watershed area when fuzzy logic models are applied.

Materials and Methods
ArcGIS 10 was used for DEM processing and topographical parameters measurement. Log logistic fit in Minitab 16 was used. Machine learning in MATLAB 2016b was used for the data analysis of the experiments. Excel + VB 2010 was used for the visual analysis of the experimental hydrographs. The R language was used for the estimation of the time of concentration of the experimental watersheds.
Arduino Mega was used as the robot controller. Power Point for Mac 2011 was used for the physical model drawings. Funding: This research received no external funding.

Acknowledgments:
We thank Consejo Nacional de Ciencia y Tecnología (National Council of Science and Technology) for the doctoral grant awarded to Alicia A. Del Rio for the development of this study.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:  Figure A1. Visual Basic code for matrix arrangement and color of cells according to its elevation data..