Precipitation Complexity and its Spatial Difference in the Taihu Lake Basin, China

Due to the rapid urbanization development, the precipitation variability in the Taihu Lake basin (TLB) in East China has become highly complex over the last decades. However, there is limited understanding of the spatiotemporal variability of precipitation complexity and its relationship with the urbanization development in the region. In this article, by considering the whole urbanization process, we use the SampEn index to investigate the precipitation complexity and its spatial differences in different urbanization areas (old urban area, new urban area and suburbs) in TLB. Results indicate that the precipitation complexity and its changes accord well with the urbanization development process in TLB. Higher urbanization degrees correspond to greater complexity degrees of precipitation. Precipitation in old urban areas shows the greatest complexity compared with that in new urban areas and suburbs, not only for the entire precipitation process but also the precipitation extremes. There is a significant negative correlation between the annual precipitation and its SampEn value, and the same change of precipitation can cause a greater complexity change in old urbanization areas compared with the new urban areas and suburbs. It is noted that the enhanced precipitation complexity in a new urban area during recent decades cannot be ignored facing the expanding urbanization.


Introduction
Under the aggravating influence of climate change and anthropogenic effects, the hydrological cycles in many basins and regions worldwide are significantly changing [1,2]. As an important part of the hydrological cycle, precipitation and its variability and change are often related to water disasters. They can cause adverse consequences to economic development and people's lives in affected regions [3,4]. Therefore, detecting the precipitation variability under the changing environment is important and is a necessary basis for better understanding hydrological variability, effectively implementing the prevention of flood-waterlog and drought disasters, and planning and managing water resources.
Hydrological time series is the embodiment and reflection of the comprehensive effects of hydrological variability [5,6] with complex evolution at multi-time scales [7]. Thus, hydrological time series analysis is an effective approach to identify the hydrological variability and changes and further separates the different components in a hydrological time series by considering statistical significance [8,9]. Traditional methods are mainly based on probabilistic statistical theory to analyze certain variability types of hydrological time series, such as monotonic trend [10], jump (or phase

Study Area
The TLB is located in the downstream tributary of the Yangtze River Basin in China. With a total area of 36895 km 2 , the basin is dominated by the plain, accounting for 4/6 of the total area, while the mountain and hills make up only 1/6 in the west, and the remaining 1/6 is composed of the interconnected and crisscrossed water systems. The topographic features in TLB are high in the surrounding area and low in the middle region. According to the terrain and water system of the watershed, the whole basin can be divided into seven parts, as shown in Figure 1.
The TLB has a typical subtropical monsoon climate, dry and cold in winter, but hot and humid in summer. The average annual precipitation in TLB is about 1130. 8 mm (1965-2013), and the average rainfall in flood season (including rainy season from May to July, and typhoon season in August and September) is about 683.2 mm, accounting for 60.4% of annual precipitation. The TLB has a typical subtropical monsoon climate, dry and cold in winter, but hot and humid in summer. The average annual precipitation in TLB is about 1130. 8 mm (1965-2013), and the average rainfall in flood season (including rainy season from May to July, and typhoon season in August and September) is about 683.2 mm, accounting for 60.4% of annual precipitation.
With a 77.6% urbanization rate in 2013, the TLB is one of the most urbanized areas in China. In 2013, the population in the basin was 59.97 million, and the Gross Domestic Product (GDP) reached ¥6.69 trillion, accounting for 9.9% of the national GDP, while the per capita GDP (GDPPC) is 2.3 times that of the whole country. Over the past decades, the large-scale expansion of city groups (centered on Shanghai, including Changzhou, Wuxi, Suzhou, Huzhou, Hangzhou, Jiaxing, and many rapidly developing towns) has been accompanied by the rapid succession of underlying surface and rapid accumulation of population and industry, which has been obviously changing the hydrological regimes in the region. The urbanization process in TLB has been rapidly increasing since the early 1990s. Figure 2 shows the increasing process of two urbanization indicators in TLB. The proportion of urban population in 2013 was 61.3%, with an average increase rate of 1.39% per year over the past decade, much bigger than 0.61% per year before the 1990s. The other indicator of GDPPC also With a 77.6% urbanization rate in 2013, the TLB is one of the most urbanized areas in China. In 2013, the population in the basin was 59.97 million, and the Gross Domestic Product (GDP) reached ¥6.69 trillion, accounting for 9.9% of the national GDP, while the per capita GDP (GDPPC) is 2.3 times that of the whole country. Over the past decades, the large-scale expansion of city groups (centered on Shanghai, including Changzhou, Wuxi, Suzhou, Huzhou, Hangzhou, Jiaxing, and many rapidly developing towns) has been accompanied by the rapid succession of underlying surface and rapid accumulation of population and industry, which has been obviously changing the hydrological regimes in the region. The urbanization process in TLB has been rapidly increasing since the early 1990s. Figure 2 shows the increasing process of two urbanization indicators in TLB. The proportion of urban population in 2013 was 61.3%, with an average increase rate of 1.39% per year over the past decade, much bigger than 0.61% per year before the 1990s. The other indicator of GDPPC also indicates a similar phenomenon.

Data Used
The precipitation data measured at 49 weather stations in TLB, with the period of 1965-2013, are used for the study. The monthly precipitation data are used to interpret the spatial difference of precipitation complexity in TLB, and the daily precipitation data are used for analyzing the temporal variability of precipitation complexity. The quality, consistency and reliability of all data are checked to ensure the accuracy of the results. Due to the spatial heterogeneity in urbanization development, the precipitation complexity in TLB has a spatial difference. The effect of urbanization on precipitation changes is considered here by detecting the spatial division following urbanization, and the precipitation complexity in different urbanization areas is compared; after that, the relationship between precipitation complexity and urbanization development is explored.
In order to clarify the urbanization development and its impacts on the precipitation complexity in TLB, it is necessary to first analyze the temporal and spatial evolution of the urbanization process. For such a rapidly developing region, the areas with mature urbanization before 1990 are taken as old urban areas, while the fast-urbanizing areas after 1990 are taken as new urban areas, and the rest of the slow-urbanizing areas are suburbs. We calculate the built-up area ratios (the ratio of urban area in the total areas) around 5 km × 5 km of each station according to the urban distribution in 1990 and 2013. They are interpreted from the remote sensing images by using the nine-point classification method. According to the spatial distribution of built-up areas in 2013, those stations with built-up area ratios less than 66.7% (i.e., less than 6 points in the 9 points around 5km × 5 km of the station) are considered as suburb stations, and the rest of the stations are further divided into old and new city stations based on the results in 1990, that is, the old city stations with the built-up area ratios greater than 66.7% but the new city stations less than 66.7% in 1990. It is known that new city stations are affected slightly by urbanization before 1990 but strengthened after 1990. The classification of stations is shown in Figure 3, including 14 old city stations, 18 new city stations and 17 suburb stations. The area precipitation data in the three areas are obtained by computing the average precipitation at the corresponding stations.

Data Used
The precipitation data measured at 49 weather stations in TLB, with the period of 1965-2013, are used for the study. The monthly precipitation data are used to interpret the spatial difference of precipitation complexity in TLB, and the daily precipitation data are used for analyzing the temporal variability of precipitation complexity. The quality, consistency and reliability of all data are checked to ensure the accuracy of the results. Due to the spatial heterogeneity in urbanization development, the precipitation complexity in TLB has a spatial difference. The effect of urbanization on precipitation changes is considered here by detecting the spatial division following urbanization, and the precipitation complexity in different urbanization areas is compared; after that, the relationship between precipitation complexity and urbanization development is explored.
In order to clarify the urbanization development and its impacts on the precipitation complexity in TLB, it is necessary to first analyze the temporal and spatial evolution of the urbanization process. For such a rapidly developing region, the areas with mature urbanization before 1990 are taken as old urban areas, while the fast-urbanizing areas after 1990 are taken as new urban areas, and the rest of the slow-urbanizing areas are suburbs. We calculate the built-up area ratios (the ratio of urban area in the total areas) around 5 km × 5 km of each station according to the urban distribution in 1990 and 2013. They are interpreted from the remote sensing images by using the nine-point classification method. According to the spatial distribution of built-up areas in 2013, those stations with built-up area ratios less than 66.7% (i.e., less than 6 points in the 9 points around 5 km × 5 km of the station) are considered as suburb stations, and the rest of the stations are further divided into old and new city stations based on the results in 1990, that is, the old city stations with the built-up area ratios greater than 66.7% but the new city stations less than 66.7% in 1990. It is known that new city stations are affected slightly by urbanization before 1990 but strengthened after 1990. The classification of stations is shown in Figure 3, including 14 old city stations, 18 new city stations and 17 suburb stations. The area precipitation data in the three areas are obtained by computing the average precipitation at the corresponding stations.

Methods
The index of SampEn is used to quantify the complexity of precipitation in TLB. It is calculated as follows: (I) For a N-point precipitation series (II) Define the maximum distance between corresponding scalars x(i+k) and x(j+k) as the distance [ ] ( ), ( ) d X i X j between vectors X(i) and X(j): (III) Set a threshold r, which is a preset tolerance factor and adopted as a percentage of series' standard deviation, to measure the tolerance of mismatch between the two vectors X(i) and X(j). The two vectors X(i) and X(j) are considered as similar if d[X(i), X(j)]< r. The number of vectors X(j) (j≠i) being similar to X(i) is given as Bi, and then the ratio of Bi to the total number N-m-1 is noted as Bi m (r), and B m (r) is the average Bi m (r) for all i: (IV) Repeat steps (I)-(III) to yield B m+1 (r) for vectors of length m + 1, and then the theoretical SampEn value of the series x(i) is given by: In Equations (2)(3)(4), it is noted that ≠ j i , meaning that self-matches (compare a vector with itself) are excluded. The sample entropy for a finite length sequence can be simplified as:

Methods
The index of SampEn is used to quantify the complexity of precipitation in TLB. It is calculated as follows: (I) For a N-point precipitation series x(i) (1 < i < N), construct N − m + 1 new series X(i) of m dimensional vectors from series x(i): (II) Define the maximum distance between corresponding scalars x(i + k) and x(j + k) as the distance d[X(i), X(j)] between vectors X(i) and X(j): (III) Set a threshold r, which is a preset tolerance factor and adopted as a percentage of series' standard deviation, to measure the tolerance of mismatch between the two vectors X(i) and X(j). The two vectors X(i) and X(j) are considered as similar if d[X(i), X(j)] < r. The number of vectors X(j) (j = i) being similar to X(i) is given as B i , and then the ratio of B i to the total number N − m − 1 is noted as B i m (r), and B m (r) is the average B i m (r) for all i: (IV) Repeat steps (I)-(III) to yield B m+1 (r) for vectors of length m + 1, and then the theoretical SampEn value of the series x(i) is given by: In Equations (2)-(4), it is noted that j = i, meaning that self-matches (compare a vector with itself) are excluded. The sample entropy for a finite length sequence can be simplified as: where m and r are two important parameters for computing the sample entropy. They would influence the sample entropy value but would not affect its variation due to the intrinsic consistency of sample entropy. Here, set m = 2 and r = 0.15STD, STD is the standard deviation of x(i), in the analysis of precipitation data in TLB. More details of the SampEn calculation can be found in Reference [26]. The analysis of precipitation complexity using the sample entropy index can also be found in Figure 4. Besides, the Mann-Kendall (MK) test is used to identify the significance of the precipitation and SampEn time series.

Spatial Difference of Precipitation Complexity in TLB
To analyze the spatial difference of precipitation complexity in TLB, the static SampEn values of the monthly precipitation data (with 588 samples) in the old urban area, new urban area and suburbs during 1965-2013 are calculated (shown in Table 1). It shows that the static SampEn values in the three areas have relative difference, with the highest value (2.168) in old urban areas, followed by new urban areas (2.103) and suburbs (2.066), reflecting the relative difference of precipitation complexity among them. Thus, it is thought that the spatial heterogeneity of urbanization development in TLB and its influence on the precipitation variability have spatial differences. Higher

Spatial Difference of Precipitation Complexity in TLB
To analyze the spatial difference of precipitation complexity in TLB, the static SampEn values of the monthly precipitation data (with 588 samples) in the old urban area, new urban area and suburbs during 1965-2013 are calculated (shown in Table 1). It shows that the static SampEn values in the three areas have relative difference, with the highest value (2.168) in old urban areas, followed by new urban areas (2.103) and suburbs (2.066), reflecting the relative difference of precipitation complexity among them. Thus, it is thought that the spatial heterogeneity of urbanization development in TLB and its influence on the precipitation variability have spatial differences. Higher degree of urbanization corresponds to greater entropy value, greater precipitation complexity and lower predictability of precipitation.

Dynamic Change of Precipitation Complexity in TLB
The SampEn values can reflect the complexity of a time series, but it cannot dynamically quantify the change of its complexity over time. Here, the M-SampEn values of precipitation time series during 1965-2013 in the three areas are calculated to further depict the dynamic changes of precipitation complexity, where the 120 months is taken as the sliding window and 1 month as the slide step. The results are shown in Figure 5. In the whole period of 1965-2013, the M-SampEn values of monthly precipitation in the three urbanization areas of TLB are significantly different (p < 0.05), based on the one-way Analysis of Variance (ANOVA) at the 5% significance level (Table 2) and the box plots of M-SampEn values in Figure 6. degree of urbanization corresponds to greater entropy value, greater precipitation complexity and lower predictability of precipitation.

Dynamic Change of Precipitation Complexity in TLB
The SampEn values can reflect the complexity of a time series, but it cannot dynamically quantify the change of its complexity over time. Here, the M-SampEn values of precipitation time series during 1965-2013 in the three areas are calculated to further depict the dynamic changes of precipitation complexity, where the 120 months is taken as the sliding window and 1 month as the slide step. The results are shown in Figure 5. In the whole period of 1965-2013, the M-SampEn values of monthly precipitation in the three urbanization areas of TLB are significantly different (p < 0.05), based on the one-way Analysis of Variance (ANOVA) at the 5% significance level (Table 2) and the box plots of M-SampEn values in Figure 6. Furthermore, results in Figure 5 indicate that the precipitation complexity has periodic variability and increases with time, especially in the old urban area and new urban area. Before 1980, the M-SampEn values in the three areas are similar, but the difference gradually arises afterwards, especially for the extreme points. After 1980s, the SampEn value of precipitation in old urban areas shows greater complexity than that in new urban areas and suburbs. Particularly, the M-SampEn value in new urban areas also obviously increases after the 1990s, implying that the complexity of precipitation is enhanced by the rapid urbanization. Furthermore, results in Figure 5 indicate that the precipitation complexity has periodic variability and increases with time, especially in the old urban area and new urban area. Before 1980, the M-SampEn values in the three areas are similar, but the difference gradually arises afterwards, especially for the extreme points. After 1980s, the SampEn value of precipitation in old urban areas shows greater complexity than that in new urban areas and suburbs. Particularly, the M-SampEn value in new urban areas also obviously increases after the 1990s, implying that the complexity of precipitation is enhanced by the rapid urbanization.   Meteorological and geographical conditions are the two main physical factors that influence regional precipitation. Because the city group in TLB is mainly located in the plain river network area, the difference of regional terrain conditions is less influential. Besides, the trends of annual precipitation in the old and new urban areas and suburbs are basically the same, and their correlation coefficient values are bigger than 0.88, which indicates that precipitation in different zones in TLB is controlled by the same climate conditions. However, there is a slight difference of M-SampEn values among the three zones, which is mainly attributed to the different urbanization degrees. Since 1990, the urbanization process in TLB has been accelerating, accompanied by the underlying surface succession, population growth and industrial development. All kinds of anthropogenic heat and carbon sources affect the physical and chemical properties of atmosphere. Therefore, the urbanization would inevitably cause the changes in local terrestrial surface and energy balance, which may cause mutual enhancement or superposition of urban "rain island effect" and "heat island effect", and increase precipitation complexity, especially in old urban areas and new urban areas that usually lead to large water accumulation and terrible urban waterlog problems.

Temporal Complexity of Precipitation in TLB
The SampEn value of daily precipitation data in each urbanization zone is calculated to further compare its difference with the annual precipitation in each year. As shown in Table 3, the annual precipitation shows a significant upward trend at 10% significance level before 1990, but decreases in all areas after 1990 (shown in Figure 7), although the trend is not significant. Furthermore, Figure 7 exhibits that there is a significant negative correlation between the long-term variation of annual precipitation and that of its SampEn value (with the negative correlation coefficient smaller than −0.30) and a good correlation between the peak precipitation and SampEn valley value. Larger annual precipitation corresponds to lower SampEn value and less complexity. Conversely, smaller annual precipitation with a bigger SampEn value is more complex. Meteorological and geographical conditions are the two main physical factors that influence regional precipitation. Because the city group in TLB is mainly located in the plain river network area, the difference of regional terrain conditions is less influential. Besides, the trends of annual precipitation in the old and new urban areas and suburbs are basically the same, and their correlation coefficient values are bigger than 0.88, which indicates that precipitation in different zones in TLB is controlled by the same climate conditions. However, there is a slight difference of M-SampEn values among the three zones, which is mainly attributed to the different urbanization degrees. Since 1990, the urbanization process in TLB has been accelerating, accompanied by the underlying surface succession, population growth and industrial development. All kinds of anthropogenic heat and carbon sources affect the physical and chemical properties of atmosphere. Therefore, the urbanization would inevitably cause the changes in local terrestrial surface and energy balance, which may cause mutual enhancement or superposition of urban "rain island effect" and "heat island effect", and increase precipitation complexity, especially in old urban areas and new urban areas that usually lead to large water accumulation and terrible urban waterlog problems.

Temporal Complexity of Precipitation in TLB
The SampEn value of daily precipitation data in each urbanization zone is calculated to further compare its difference with the annual precipitation in each year. As shown in Table 3, the annual precipitation shows a significant upward trend at 10% significance level before 1990, but decreases in all areas after 1990 (shown in Figure 7), although the trend is not significant. Furthermore, Figure 7 exhibits that there is a significant negative correlation between the long-term variation of annual precipitation and that of its SampEn value (with the negative correlation coefficient smaller than −0.30) and a good correlation between the peak precipitation and SampEn valley value. Larger annual precipitation corresponds to lower SampEn value and less complexity. Conversely, smaller annual precipitation with a bigger SampEn value is more complex.   We further calculate the ratio u between the relative changes in SampEn value and precipitation changes before and after 1990: 1991 2013 ,1965 1990 ,1991 2013 ,1965 1990 SampEn SampEn where P ρ ( SampEn ρ ) is the average change rate of annual precipitation (SampEn value). Table 4 shows that the value of u in old urban area is 0.00033, which is larger than 0.00021 in the new urban area and 0.00011 in suburbs. The u value in the old urban area is three times that in suburbs. It indicates that the impact of urbanization also varies with zones, which causes more precipitation decrease in the old urban area than that in new urban area, followed by suburbs. Therefore, the same precipitation change can cause greater change in precipitation complexity in the old urban area, although the new urban area is also greatly affected by urbanization. We further calculate the ratio u between the relative changes in SampEn value and precipitation changes before and after 1990: where ρ P (ρ SampEn ) is the average change rate of annual precipitation (SampEn value). Table 4 shows that the value of u in old urban area is 0.00033, which is larger than 0.00021 in the new urban area and 0.00011 in suburbs. The u value in the old urban area is three times that in suburbs. It indicates that the impact of urbanization also varies with zones, which causes more precipitation decrease in the old urban area than that in new urban area, followed by suburbs. Therefore, the same precipitation change can cause greater change in precipitation complexity in the old urban area, although the new urban area is also greatly affected by urbanization.

Conclusions
The spatial heterogeneity of urbanization in TLB has a big influence on the spatial distribution of precipitation and its complexity in the region. In this study, the SampEn index is employed to investigate the spatial and temporal variability of precipitation complexity in different urbanization areas in TLB. Results indicated that the static SampEn value of precipitation in the old urban area is the highest, followed by the new urban area and suburbs, being consistent with the diverse development degrees of urbanization. Overall, a higher degree of urbanization causes a greater SampEn value of precipitation, higher precipitation complexity and lower predictability of precipitation.
Comparatively, precipitation in the old urban area shows more significant variability than that in the new urban area and suburbs, especially for extreme values, which involve the dynamic change of precipitation complexity. The precipitation complexity in new urban areas is enhanced by the rapid urbanization after 1990s. The complexity of precipitation in both old and new urban areas shows an upward trend, especially in recent years. The scale of urban agglomeration has been continuously expanding, and the urban "heat island effect" and "rain island effect" are intensified and superimposed on each other, causing the increase in urban precipitation complexity.
In the whole period, there is a significant negative correlation between annual precipitation and its SampEn value. Larger precipitation has a lower SampEn value and low complexity. By comparing the three urbanization zones, the relative complexity of precipitation in old urban area is the highest, indicating that the same precipitation changes can cause a greater change of precipitation complexity in the old urban area rather than the new urban area and suburbs.
In summary, the urbanization development significantly affects the spatial and temporal distribution of precipitation complexity in TLB, which will be aggravated along with urban agglomeration. It not only promotes us to further explore the physical mechanisms of precipitation change, but also provides a more useful guide for flood control and disaster mitigation, water resources management and project construction in TLB.
Author Contributions: J.H. and Y.L. did the data analysis work and wrote the paper; Y.-F.S. guided the entire study and contributed to the discussion part.