A Climate-Mathematical Clustering of Rainfall Stations in the Río Bravo-San Juan Basin (Mexico) by Using the Higuchi Fractal Dimension and the Hurst Exponent

: When conducting an analysis of nature’s time series, such as meteorological ones, an important matter is a long-range dependence to quantify the global behavior of the series and connect it with other physical characteristics of the region of study. In this paper, we applied the Higuchi fractal dimension and the Hurst exponent (rescaled range) to quantify the relative trend underlying the time series of historical data from 17 of the 34 weather stations located in the Río Bravo-San Juan Basin, Mexico; these data were provided by the National Water Commission (CONAGUA) in Mexico. In this way, this work aims to perform a comparative study about the level of persistency obtained by using the Higuchi fractal dimension and Hurst exponent for each station of the basin. The comparison is supported by a climate clustering of the stations, according to the Köppen classiﬁcation. Results showed a better ﬁtting between the climate of each station and its Higuchi fractal dimension obtained than when using the Hurst exponent. In fact, we found that the more the aridity of the zone the more the persistency of rainfall, according to Higuchi’s values. In turn, we found more relation between the Hurst exponent and the accumulated amount of rainfall. These are relations between the climate and the long-term persistency of rainfall in the basin that could help to better understand and complete the climatological models of the study region. Trends between the fractal exponents used and the accumulated annual rainfall were also analyzed.


Introduction
Climate changes dynamically over time, and factors like the increase of population, cattle and agriculture, industry, among others contribute to these changes [1]. Climate changes could increase or decrease precipitations, which provoke floods or droughts [2], changes in plant and bacterial diversity [3], and soil sealing [4]. Moreover, global warming has been presented as a concerning factor, in recent years, that have increased heavy precipitations [5].
Rainfall characterization is considered as an useful information to make decisions about water resource availability, agricultural production, or prediction of precipitations [6,7]. Due to the complex nature of rainfall phenomena, some studies suggest obtaining characteristics using fractal algorithms, first because of their cyclic behavior over long-range time series [8], and second, because of the lack of regularity of occurrence of the phenomena [9,10]. Indeed, since decades ago, some studies have used algorithms based on fractals to characterize rainfalls, predict climatology and behavior of water processes [11][12][13][14][15][16][17], some of which have been dealt with the Hurst exponent (H) or the Higuchi fractal dimension (HFD) [16,18]. In addition, such methodologies have been used to quantify the persistency, anti-persistency or randomness of the data from the different weather stations with monthly records, in different periods [10,19].
The measure of the persistency of the series in the Hurst exponent consists of a dimensionless scale between 0 and 1 so that the closer the value is to 1, the more persistency [20]. In detail, if H < 0.5, the time series presents anti-persistency; if H = 0.5, it presents a behaviour similar to Gaussian noise; and H > 0.5, the time series presents persistency. In turn, HFD calculates persistency by values going from 1 (anti-persistency in the series) to 2 (persistency) [21].
In this context, the most used method to calculate Hurst exponent is rescaled range because of their simplicity to calculate it using statistical tools, [19,22], however, it has some disadvantages against trends, Kantelhardt et al. [23]. Rescaled range is not the only method to calculate Hurst exponent, wavelets [24], power spectrum [25] and Annys Lloyd [26] methods have been also reported. Rehman and Siddiqi [16] used a methodology based on Hurst exponent and wavelets to calculate persistency of climate predictability indices, founding that precipitation and temperature indices are independent. Chandrasekaran et al. [18] used H for rainfall prediction, concluding that such index must be close to 1 for obtaining an accurate forecast.
Moreover, Kalauzi et al. [27] calculated fractal dimension (FD) in rainfall time series using HFD with Fast Fourier Transform (FFT) to determine tendency of rainfalls of two regions. Ciobotaru et al. [28] related temperature-humidity index oscillations with HFD to calculated the impact on tourism activity.
The Río Bravo-San Juan (RBSJ) Basin covers an area of 34,000 km 2 within the Mexican States of Coahuila, Nuevo León, and Tamaulipas. It is located in the northeastern region of Mexico. RBSJ Basin belongs to the RH24 hydrological region, one of the largest watershed areas. RH24 is characterized to be constantly hit by hurricanes and heavy rainfalls that provoke dangerous landslides [29]. The main stem of RBSJ Basin is named Río San Juan, but other smaller branches such as Río Pilón, Río Salinas, Río Pesquería, Río Santa Catarina, and Río Ramos are important tributaries [30]. Figure 1 illustrates its three main types of climates according to the National Commission for the Knowledge and Use of Biodiversity (CONABIO) of Mexico [31]: hot semi-arid, hot desert and humid sub-tropical. In general semi-arid to arid (i.e., desert) climates dominate in the RBSJ Basin. These climatic conditions joined to the annual-cyclical patterns of rainfall [32], create an unstable water supply for the region [33]. The humid sub-tropical climate covers the majority of Monterrey Metropolitan Area (MMA), which is one of the third main metropolises in Mexico, and concentrates 90% of the population of Nuevo León. The rapid increasing water demand for industrial, agricultural and domestic usage in the MMA has stressed the current water supply, while severe droughts continue to exhaust the region's known reserves. For these reasons and also because of the constant pollution in the MMA [34], this region must be analyzed in order to prevent possible events that could cause disasters for the population.
In this work, time series obtained from 17 of the 34 weather stations of RBSJ Basin were analyzed. We investigated the underlying long-term dependency behavior patterns of those series through the Higuchi fractal dimension (HFD) and the rescaled range version of the Hurst exponent (Hrs). Then, we related our comparative analysis with the different types of climate of the basin. Finally, a classification or clustering of the stations with similar behaviors was carried out in order to discuss about the climatic characteristics of the region, the amount of rainfall per year, and their relation with the rainfall persistency. The structure of the paper is as follows: Section 2 describes the methodology, from the way to obtain the time series to the building of the fractal exponents used; Section 3 shows our results, including the clustering of the stations; finally, we interpret and discuss our results in Section 4, and show some concluding remarks in Section 5.

Methodology
In our case study, 17 of 34 weather stations of the RBSJ Basin were analyzed. They are located in the northeast region of Mexico, covering an approximate area of 29,420 km 2 . The selected stations were those having uninterrupted records within the period of years 1964-2016, this in order to have homogeneity in the quality of the data for the time series of all the stations. The region of study, including the 17 georeferenced stations are shown in Figure 1, tagged by their station number. For each weather station we have information of the month-accumulated rainfall, which leads to 636 data points for each series. We used monthly data because it is the available public information reported by CONAGUA.

Hurst Exponent (Hrs)
The version of the Hurst exponent we consider is the rescaled range, in which the information of each station is represented as vectors (X k ) k∈1:N , where N is the number of months considered for the analysis. Each vector (X k ) is divided into a set of d-subseries of shorter time with the same quantity of data, such that all the n-th subseries, n = 1, 2, . . . , d, have a length of m. Then, according to [36,37], the following steps are carried out for each subseries:

1.
Calculate the mean M n and the standard deviation S n of the subseries.

2.
Determine the variation V in of each term with respect to the mean: 3.
Obtain the accumulated sum of variation until the ith-term: 4. Calculate the range of each subseries:

5.
Normalize the calculated ranges (this is why it is called rescaled range): Once this is done for each subseries of length m, they are averaged: 7.
Finally, the relation of the statistic R n / S n is given by the following power law: where H refers to Hrs and its a dimensionless number, and c > 0 is a constant.
In this way, Hrs can be estimated by rewriting (7) in logarithmic terms: so that Hrs is the slope of the straight line. In this work, we used a total of d = 53 partitions, each of them with a length of m = 12 months.

Higuchi Fractal Dimension (HFD)
Higuchi fractal dimension (HFD) approximates the fractal dimension in a similar way to Hrs, in the sense of dividing the total time series in windows, however, their procedure are some different. For a time series X, with N data points [38], expresses the fractal length of the subseries X m k , where k ∈ {1, . . . , k max } indicates the k-th time, m ∈ {1, . . . , k} is the interval time of that subseries, and k max is the total number of subseries used. In this work, we used k max = 8, which is a significant number about an intermediate value between the Köppen classification by semesters [35] and the annual periodicity of the phenomenon in the region [37]. In addition, we carried out the calculation of HFD with several k max , obtaining that k max = 8 gave the best relation between HFD and climate in terms of better limits between clusters, see Table A1 in Appendix A.
Thus, the total fractal length L(k) is defined as the mean of all the L m (k)'s, so that In this way, HFD is the slope F (dimensionless) that fits to the equation where b > 0 is a constant.

Results
For illustration of the general behaviour, time series of three representatives rainfall stations (Allende, El Cuchillo and La Popa) are plotted in Figure 2, from 1964 to 2016. In this sense, Allende represents the humid subtropical climate (Cwa), whereas El Cuchillo and La Popa represent the arid climates, namely, hot semi-arid (Bsh) and hot desert (BWh), respectively.  Figure 3 shows the linear adjustments applied to Equation (7) for the selected stations. The calculation of the Hurst exponents was made by means of the least squares method. The value of the slope of the curve fitted is Hrs.  (7), obtaining the coefficient of determination R 2 when considering d = 53, m = 12, which leads to seven data points to fit.
In turn, Figure 4 shows the linear adjustments applied to Equation (10), in order to perform the calculation of HFD for the selected stations, by means of the least squares method. The value of the slope of the curve fitted is HFD.   Table 1 contains the summary of the adjustments carried out for all the stations analyzed. For each station, the table shows its number, name, coordinates (latitude and longitude), the obtained values of Hrs and HFD, accumulated annual rainfall (A.A.R.) in mm, and climate (with colors). Results are shown in an ascending order according to the obtained value of HFD. All the values of Hrs and HFD were calculated up to the second decimal place as there is no significance from the third place [39]. Table 1. Characteristics and results for the 17 stations analyzed. Climate is shown by the corresponding color taken from the CONABIO's clustering that uses the Köppen climate classification, see Figure 1. Rows follow an ascending order with respect to the value of HFD.

Discussion
Accumulated rainfall per month are shown in time series of Figure 2. The amount and dispersion of rainfall are larger for Allende, which belongs to Cwa than for El Cuchillo and La Popa, which belong to more arid and hotter climates, Bsh and BWh, respectively. Table 1 shows the persistency level obtained by Hrs and HFD, and relates their order with the climate clustering of the region provided by the Köppen classification. Hrs and HFD were calculated by fitting the illustrations in Figures 3 and 4, following their respective procedures of Section 2. As seen in Table 1, values of Hrs go from 0.48 to 0.71, i.e., about 0.23 units in range, whereas HFD goes from 1.78 to 1.92, having a range of 0.14 units. Instead of this, HFD fits better to the clustering of climates than Hrs. With the exception of Casillas and Las Enramadas, it is seen that HFD ∈ [1.78, 1.87] correspond to the Cwa, while HFD ∈ [1.88, 1.89] correspond to Bsh, and HFD ∈ [1.90, 1.92] correspond to BWh. Moreover, station Casillas is very close the border between Cwa and Bsh geographical zones, see Figure 1. This could suggest a relation between aridity and long-term memory of rainfall for this region study. In fact, there is a precedent that supports such assumption, in which there was found more persistency in rainfall time series associated to more arid zones of the Sahel region in Africa, although persistency was also found in subtropical areas [40]; nevertheless, the amount of rainfall is only considered for the internal classification of arid climates (type B), according to Köppen [35] in which the A.A.R. and its distribution throughout the year define the sub-climates.
Then, our results of A.A.R. in Table 1 relate to the climate in an inverse way than HFD, so that, the more the amount of rainfall the more humidity. Similarly to HFD, the border between BWh and Bsh is well identified by the amount of rainfall (about 300 mm of separation), but three stations do not permit a compact clustering for separating Bsh (572-731 mm) from Cwa (557-1098 mm), namely, Las Enramadas, and San Juan and El Pajonal. The two latter stations are also located in a geographical border of climates, see Table 1 and Figure 1. In this way, Las Enramadas is the only station, geographically far to the borders, that does not fit to the relations HFD-climate and A.A.R.-climate.
We associate the difference in fit of each model to the way in which they define or calculate fractality. Namely, HFD fits better because each i-measure consists of the difference between rainfall of two timely consecutive i-subseries, in Equation (8), which agrees with the division by seasons in the arid classification of Köppen. In contrast to this, Hrs measures the range of each n-subseries to calculate fractality, in Equation (3).
Regarding our obtained values of Hurst, all of them are higher than 0.5 and do not have a direct relation with climate, as mentioned before. This agrees with the results of López-Lambraño et al. [24], who found persistency in their precipitation series, obtaining Hurst values from 0.5 to 0.9, but contrasts with Kantelhardt et al. [23], who did not found relation between hydrologic persistence and Hurst (values lower than 0.5). Such observations could be related to two main reasons. On the one hand, there is the climate-physical part, which deals with the fact that López-Lambraño et al. [24] analyzed a very similar region about climate than ours, while Kantelhardt et al. [23] studied three very different regions to our work. On the other hand, there is the climate-methodological part, since the studies mentioned in Kantelhardt et al. [23] dealt with large time series of more than 100 years, in which climate could have changed and led to low values of persistence. In fact, López-Lambraño et al. [24] found a decrease in the values of Hrs by increasing the length of their time series from 5 to 30 years.
Additionally, we measured and plotted the relations Hrs-A.A.R. and HFD-A.A.R. in Figure 5. Despite a clear trend in both curves, there is a lot of dispersion indicating a weak relation (R 2 ≈ 0.1 for HFD-A.A.R. relation and R 2 ≈ 0.3 for Hrs-A.A.R. ). In turn, relation HFD − HRS is also weak (R 2 ≈ 0.1), see Figure 6. We associated the larger dispersion in our curves to the predominance of the subtropical climate of the region since arid climates possess more roughness, as mentioned in [41]. Therefore, our final observations are focused on three main points: • The RBSJ Basin is a complex region composed by mainly three climates Cwa, Bsh and BWh. Its complexity consists of a geographical mixing of those climates, which makes it difficult to perform a good interpolation by traditional techniques like the Köppen climate classification and A.A.R., and even by more complex techniques like Hrs and HFD. Indeed, about five weather stations are located in the border of zones cataloged as subtropical (Cwa) and semi-arid (Bsh) climates. • Nevertheless, fractal exponents have shown to have a relation with climates (HFD) and in a weaker sense with A.A.R., which have been reported previously in regions with similar climates [40,41]. In this manner, our study aims to suggest the use of those fractal exponents as an alternative way to understand and complete the climate maps of the region study, positioning HFD as a measure of classification at the same level of A.A.R., and having as an advantage the memory of the time that the method posses. • As future research, we propose to determine the relation between HFD and the weather to a regional scale, extending this to east and north, where the weather is similar, and considering other regions like Veracruz, Tabasco and Chiapas, which are southeast Mexico's states where the weather is different to the studied region. In this form, it could be possible to analyze if the correlation between HFD and the weather change lightly or radically. In the first case, the relation will be not a regional case, and it will derive to a larger scale.

Conclusions
This work aimed to analyze the dependency of long-range time series by using Hurst Exponent and Higuchi Dimensional Fractal. We found a strong dependency between HFD and the climates, but not for Hurst Exponent and the climates.
We evaluated the persistency of 17 rainfall time series of the RBSJ Basin in Mexico utilizing Hrs and HFD, and compared their values with the climate classification proposed by the CONABIO, which uses the Köppen climate classification. We found more relation between the clusters formed by the Köppen classification and the obtained Higuchi's values.
In detail, our Higuchi's results suggest that the persistency of the series increase with the aridity of the zone, so that, the stations located at the humid subtropical zone have the lowest HFD's values, HFD ∈ [1.78, 1.87]; those located at the hot semi-arid zone have HFD ∈ [1.88, 1.89]; and finally, the hot desert zone contains the stations with the highest values HFD ∈ [1.90, 1.92]. This suggests that the arid zones of the study region have more persistent rainfalls, although not necessarily predictable, which has been reported previously in region studies with similar climates [40,41]. In turn, Hrs did not fit very well to the climates and had a weak relation with the A.A.R.
In this way, we concluded that HFD is a powerful indicator for analyzing rainfall time series that can help to understand the climates of the case study in terms of combining a climate classification with a mathematical measure. Thus, the climate-mathematical clustering technique that we present is a primary study that quantifies the limits of the climatic zones within the RBSJ Basin, which could help to complete or redefine the climatological models of the study region.