Analysis of the Lake-Effect on Precipitation in the Taihu Lake Basin Based on the GWR Merged Precipitation

Abstract: Using high-density rain gauge observations, geographically weighted regression (GWR) was applied to merge daily gauge precipitation with the Multi-Source Weighted-Ensemble Precipitation V2.1 (MSWEP V2.1) product, generating a new merged daily precipitation dataset (GWR merged precipitation, denoted GWRMP). The precipitation accuracy at the 0.1° × 0.1° grid scale and the lake-effect on precipitation in the Taihu Lake Basin were then investigated. Results show that, at the 0.1° × 0.1° grid scale, GWRMP has higher precision and a stronger ability to resolve spatial patterns than MSWEP across the whole basin, including the lake area, where the rain gauge network is relatively sparse. Topography is the most important factor influencing rainfall in the Taihu Lake Basin: the Pearson correlation coefficient (r) between the DEM and the main precipitation pattern (EOF-1) over the whole basin is 0.64, and according to GWRMP the southwestern mountains are a rainy area, while the plain and lake areas receive less rain. The multi-year average precipitation in the lake upwind area is 8.31% lower than that in the downwind area. In the southwestern mountainous area, the spatial distribution of precipitation is highly consistent with the climatic elements derived from the ERA5 meteorological reanalysis data (|r| > 0.6); in contrast, the consistency is lower in the lake downwind area (|r| < 0.5) and absent in the lake upwind area at the 0.25° × 0.25° grid scale. The southeast monsoon is inferred to be the most important factor affecting the lake-effect on precipitation in the Taihu Lake Basin. The distribution of wind direction and wind speed determines, to a certain extent, the dynamic changes of surface water vapor, and the lake-effect on precipitation is most likely to occur in July.


Introduction
Information on the spatial distribution of precipitation is the basis for water resources management in a basin and is also needed for regional flood control and disaster mitigation. Because surface rainfall stations are unevenly distributed, the capability to observe precipitation differs significantly among terrain conditions such as mountain, plain, and lake regions. For large lakes in particular, where the rain gauge network is relatively sparse, accurately estimating the spatio-temporal distribution of precipitation is of great significance, as it is directly related to flood control, storage, and water resources allocation in the basin. It is also important to diagnose the lake-effect on precipitation, which always exists over large lakes and their upwind and downwind areas. Studies [1][2][3] have shown that there is a typical lake effect on snowfall in the Great Lakes region of North America and that the snowfall is increasing. Compared with the lake effect on snowfall, experimental and analytical results on the lake-effect on precipitation are very limited. At high altitude, lake-effect precipitation has been found to exist and to dominate the spatial distribution of precipitation, as shown for the lakes of the Tibetan Plateau [4].
In recent years, the rapid development of precipitation products based on multi-source satellites, weather radars, and numerical simulations has greatly enriched the available precipitation information and effectively improved the capacity to capture the spatial and temporal distribution of precipitation. By exploiting the advantages of different acquisition methods and merging the spatial information of different precipitation data, multi-source fusion precipitation with high precision and spatial continuity can capture the spatial distribution of precipitation more completely. It has significant advantages for precipitation diagnosis under complex terrain conditions and is becoming an important part of current research on precipitation characteristics at global and regional scales. Widely used multi-source precipitation products include the multi-satellite integrated precipitation TRMM 3B42 [5] (coverage 50° N-50° S, period 1997-2014, resolution 3 h and 0.25° × 0.25°), GPM IMERG precipitation [6] (coverage 60° N-60° S, period 2014-present, resolution 30 min and 0.1° × 0.1°), and the ERA5 [7] reanalysis precipitation (global coverage, period 1979-present, resolution 1 h and 0.25° × 0.25°). Compared with these products, MSWEP V2.1, proposed by Beck et al. [8,9], integrates the advantages of precipitation data from different sources, such as daily gauge data [10], GPCC FDR [11,12], CMORPH [13], GSMaP-MVK [14], TMPA 3B42RT [5], ERA-Interim [15], GridSat [16], JRA-55 [17], and WorldClim [18], and offers a higher spatial-temporal resolution (3 h, 0.1° × 0.1°) and a longer time series (1979-present).
Many studies [19][20][21] have evaluated the daily precipitation accuracy of MSWEP against dense rain gauge networks in different regions, and the results showed that MSWEP is overall highly consistent with surface observations and more precise than TRMM 3B42V7. Studies of MSWEP in countries such as India [22], Iran [23], and China [24], at different time scales and for different levels of rainfall events, show that MSWEP is slightly weaker at monitoring extreme precipitation but has generally higher accuracy for daily precipitation and great potential for global and regional precipitation analysis and hydrological simulation [25,26].
Considering the rainy character of subtropical humid areas, further research on the lake-effect on precipitation of large lakes is necessary, especially for lakes surrounded by a dense population and a highly developed economy, such as Taihu Lake. The precipitation distribution over the lake and its upwind and downwind areas not only affects basin and regional flood control safety but also relates to the water resources allocation of the surrounding cities. This work uses MSWEP V2.1, which has a strong ability to resolve spatial patterns, together with observations from dense rain gauges, which reflect rainfall with high precision, to study the lake-effect on precipitation and its influence mechanism in the Taihu Lake Basin. In developing MSWEP V2.1, Beck et al. [8][9][10] used observations from a limited number of rain gauges worldwide, which may result in inaccurate ground calibration. It is therefore necessary to merge the precipitation information from the dense rain gauges with the MSWEP V2.1 product to obtain a new merged precipitation dataset that combines strong spatial discrimination with accurate precipitation estimates. The Taihu Lake Basin has a long record of precipitation observations from a high-density rain gauge network, and Taihu Lake, located in the center of the basin, has a large water area; the basin is therefore a very suitable study area for verifying and revising MSWEP V2.1 and for diagnosing the lake-effect on precipitation. Research on the spatial distribution of precipitation over Taihu Lake and the possible influence mechanism of the lake-effect on precipitation is of great significance.
On the one hand, it overcomes the inaccurate estimation of precipitation in the lake region caused by the sparse rain gauge network; on the other hand, the quantitatively identified distribution of lake-effect precipitation provides data support for further improving the flood control and storage schemes and the allocation of water resources in the Taihu Lake Basin.
The approach of this paper is as follows: first, the precipitation data with the highest accuracy under different topographic conditions were determined; then, the spatial distribution of precipitation over the whole basin was extracted to analyze the precipitation characteristics over Taihu Lake and its upwind and downwind areas; finally, the lake-effect precipitation was diagnosed. The paper is organized as follows: Section 2 introduces the study area and the datasets used. Section 3 presents the procedure used to generate the benchmark and the newly merged precipitation, the methods adopted to quantify the accuracy of the different precipitation data at the 0.1° × 0.1° grid scale, and the methods used to diagnose the lake-effect on precipitation and its influence mechanism. Section 4 presents the results and related discussion. Section 5 summarizes the conclusions and prospects.

Study Area
The Taihu Lake Basin is located in the Yangtze River Delta of China. It borders the Yangtze River in the north, the Qiantang River in the south, the sea in the east, and Tianmu Mountain and Maoshan Mountain in the west. It spans Jiangsu Province, Zhejiang Province, Anhui Province, and Shanghai, with a total basin area of 36,869 km². The multi-year average precipitation is 1185 mm, of which 726 mm falls in the flood season (May to September) [27]. The basin contains mountains, plains, lakes, and a dense river network. Taihu Lake, in the center of the basin, is the third-largest freshwater lake in China, with a water area of close to 2338 km². The complex terrain makes the basin low in the middle and high around the edges, so that floods occur easily but are difficult to drain. Meanwhile, the basin is densely populated and contains many large and medium-sized cities, making it one of the most developed regions in China as well as one of the areas most vulnerable to large losses. According to the terrain features, the Taihu Lake Basin is divided into three subareas: the mountainous area, the lake area, and the plain area. Figure 1 shows the geographical overview of the Taihu Lake Basin and the distribution of rain gauges.

Datasets
Two precipitation datasets were mainly used: the daily precipitation of MSWEP V2.1 from 1979 to 2016, released by the EU Joint Research Center (EU/JRC), and the daily rain gauge precipitation compiled, quality-controlled, and published in the basin hydrological yearbook. The temporal and spatial resolutions of MSWEP V2.1 are 3 h and 0.1° × 0.1°, respectively, and the data can be downloaded from http://www.gloh2o.org/. The study area has a dense rain gauge network (Figure 1), but the number of available rain gauges varies from year to year during the study period: 175 available and effective gauges from 1979 to 1989, 139 from 1989 to 2005, and 196 from 2006 to 2016.
The meteorological data are derived from the ERA5 monthly climate reanalysis for 1979 to 2016 provided by the KNMI Climate Explorer and include five meteorological elements: surface 2 m temperature (tem), 10 m wind speed (wspd), surface sensible heat flux (shtfl), surface latent heat flux (lhtfl), and 850 mb high-altitude specific humidity (q). The spatial resolution is 0.25° × 0.25°, and the data can be downloaded from http://climexp.knmi.nl/start.cgi.
The Version 4 DMSP-OLS Nighttime Lights Time Series (referred to as DMSP) for 2000 to 2013, issued by the US National Oceanic and Atmospheric Administration, was employed as a quantitative indicator of the urbanization level in the Taihu Lake Basin. The spatial resolution of these data is 1 km × 1 km, and the data can be downloaded from http://www.ngdc.noaa.gov/eog/dmsp/.

GWR-Based Rainfall Merging
The surface observations used in producing MSWEP V2.1 come from a limited number of rain gauges worldwide only. To make full use of the ground observations, the geographically weighted regression (GWR) model was used to combine MSWEP V2.1 with the precipitation of the high-density rain gauges to obtain the GWR merged precipitation (GWRMP) in the Taihu Lake Basin. Hu et al. first proposed the residual-based rainfall merging scheme using GWR in 2015 [28], and the GWR-based merging method has since been widely used in precipitation fusion [29][30][31][32]. The generation of GWRMP involves three primary steps: (1) the rainfall bias at each gauge is obtained as the difference between the gauge precipitation and the rainfall of the corresponding MSWEP V2.1 grid; (2) taking the local characteristics of MSWEP V2.1 as the weight, the bias at the gauges is interpolated, based on the regression principle, to obtain the spatial distribution of the rainfall error; (3) by inverting the operation used to estimate the bias at the gauges, the bias field generated by GWR is superimposed onto the MSWEP V2.1 field to obtain GWRMP. The specific formulas for each step are given in [30,31]. The general framework of the GWR-based rainfall merging algorithm is displayed in Figure 2.
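The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the formulation of [30,31]: the Gaussian distance kernel, the fixed `bandwidth`, the local linear model (bias regressed on the background value), the clipping of negative totals, and all function and argument names are assumptions made here for illustration.

```python
import numpy as np

def gwr_merge(gauge_xy, gauge_p, gauge_bg, grid_xy, grid_bg, bandwidth):
    """Residual-based GWR merging (illustrative sketch).

    gauge_xy : (m, 2) gauge coordinates
    gauge_p  : (m,) gauge precipitation
    gauge_bg : (m,) background (e.g. MSWEP) value of the grid containing each gauge
    grid_xy  : (k, 2) grid-cell centre coordinates
    grid_bg  : (k,) background value of each grid cell
    bandwidth: Gaussian kernel bandwidth (assumed fixed), in coordinate units
    """
    gauge_xy = np.asarray(gauge_xy, dtype=float)
    gauge_p = np.asarray(gauge_p, dtype=float)
    gauge_bg = np.asarray(gauge_bg, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    grid_bg = np.asarray(grid_bg, dtype=float)

    bias = gauge_p - gauge_bg                       # step 1: bias at the gauges
    X = np.column_stack([np.ones_like(gauge_bg), gauge_bg])
    merged = np.empty(len(grid_bg))
    for j, (xy, bg) in enumerate(zip(grid_xy, grid_bg)):
        d2 = np.sum((gauge_xy - xy) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)      # assumed Gaussian distance weights
        XtW = X.T * w
        # step 2: local weighted least squares, bias ~ b0 + b1 * background
        beta = np.linalg.lstsq(XtW @ X, XtW @ bias, rcond=None)[0]
        bias_hat = beta[0] + beta[1] * bg
        # step 3: superimpose the bias on the background; clipping at zero is an assumption
        merged[j] = max(bg + bias_hat, 0.0)
    return merged
```

In practice the bandwidth would be chosen by cross-validation or a similar criterion; a fixed value is used here only to keep the sketch short.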

Generation of the Benchmark Precipitation
Based on the gauge observations, spatial interpolation was used to generate the surface benchmark precipitation (benchmark) at a spatial resolution of 0.1° × 0.1°, which serves as the basis for evaluating the accuracy of MSWEP V2.1 and GWRMP at the 0.1° × 0.1° grid scale. Considering the influence of terrain on precipitation, the inverse distance weighted (IDW) interpolation method [33] was adopted in the lake region and the plain area, while the GWR method was used to interpolate daily precipitation in the mountainous area, with longitude, latitude, and elevation as variables. The calculation of the benchmark precipitation in the mountainous area follows the GWR-based rainfall merging procedure, except that the variables are changed from the bias at the rain gauges and MSWEP V2.1 to the precipitation at the rain gauges and the DEM elevation before the daily precipitation is interpolated.
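For the lake and plain areas, IDW interpolation can be illustrated with the short sketch below. The distance exponent `power=2` and the use of all gauges (rather than a search radius or a fixed number of neighbours) are assumptions; the text does not state the settings used.

```python
import numpy as np

def idw(gauge_xy, gauge_p, target_xy, power=2.0):
    """Inverse distance weighted interpolation of gauge values to target points."""
    gauge_xy = np.asarray(gauge_xy, dtype=float)
    gauge_p = np.asarray(gauge_p, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    # pairwise distances, shape (n_targets, n_gauges)
    d = np.linalg.norm(target_xy[:, None, :] - gauge_xy[None, :, :], axis=2)
    out = np.empty(len(target_xy))
    for i, di in enumerate(d):
        hit = di == 0.0
        if hit.any():
            # the target coincides with a gauge: use the gauge value directly
            out[i] = gauge_p[hit].mean()
        else:
            w = di ** -power
            out[i] = np.sum(w * gauge_p) / np.sum(w)
    return out
```

A target halfway between two gauges reporting 10 mm and 20 mm receives 15 mm, and a target located exactly at a gauge reproduces the gauge value.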

Precipitation Accuracy Evaluation
Quantitative accuracy indicators, namely the relative bias (RB), root mean square error (RMSE), and correlation coefficient (CC), were adopted to assess the consistency of MSWEP V2.1 and GWRMP with the benchmark at the 0.1° × 0.1° grid scale. The Heidke skill score (HSS) [34] and the volumetric hit index (VHI) [35] were used as classification indicators to characterize the ability of MSWEP V2.1 and GWRMP to detect the occurrence and the volume of precipitation, respectively, at the grid scale. The HSS and VHI are calculated as follows:

$$\mathrm{HSS} = \frac{2\,(n_{11} n_{00} - n_{10} n_{01})}{(n_{11} + n_{01})(n_{01} + n_{00}) + (n_{11} + n_{10})(n_{10} + n_{00})} \qquad (1)$$

where $n_{11}$ is the frequency of daily precipitation events detected by both the benchmark and the evaluated precipitation (MSWEP V2.1 or GWRMP); $n_{01}$ is the frequency of daily precipitation events detected by the benchmark but not by the evaluated precipitation; $n_{10}$ is the frequency of daily precipitation events detected by the evaluated precipitation but not by the benchmark; and $n_{00}$ is the frequency of non-precipitation events detected by both the benchmark and the evaluated precipitation.

$$\mathrm{VHI} = \frac{\sum_{i} \left(S_i \mid S_i > P_t \text{ and } G_i > P_t\right)}{\sum_{i} \left(S_i \mid S_i > P_t \text{ and } G_i > P_t\right) + \sum_{i} \left(G_i \mid S_i \le P_t \text{ and } G_i > P_t\right)} \qquad (2)$$

where $S_i$ and $G_i$ represent the daily precipitation of the evaluated and the benchmark data, respectively, at a grid, and $P_t$ is the threshold value of a daily precipitation event.

This work also used the structural similarity index (SS) [36] to complement the evaluation indicators for the accuracy of MSWEP V2.1 and GWRMP. It is usually used in image quality analysis as a part of the structural similarity index measure (SSIM), which was originally proposed by the Laboratory for Image and Video Engineering at the University of Texas at Austin. The SS index compares the spatial matching between the evaluated and the benchmark daily precipitation and thus reflects the ability of the merged precipitation to reproduce the spatial pattern of the benchmark precipitation:

$$\mathrm{SS} = \frac{\sigma_{SG} + c_3}{\sigma_S \sigma_G + c_3} \qquad (3)$$

where $S$ and $G$ represent the daily precipitation images of the evaluated and the benchmark data, respectively; $\sigma_S$ and $\sigma_G$ are the standard deviations of the corresponding data; $\sigma_{SG}$ is the covariance of the evaluated and benchmark daily precipitation; and $c_3$ is a constant that avoids errors caused by a zero denominator.
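A minimal Python sketch of these indicators is given below to make the definitions concrete. The formulas for RB and RMSE follow their usual definitions, which is an assumption because they are not written out above; the value of `c3` and all function names are likewise illustrative.

```python
import numpy as np

def rb(sim, obs):
    """Relative bias in percent (usual definition, assumed here)."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 100.0 * (sim.sum() - obs.sum()) / obs.sum()

def rmse(sim, obs):
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def hss(sim, obs, thr):
    """Heidke skill score from the 2 x 2 contingency table of wet/dry days."""
    s, g = np.asarray(sim) > thr, np.asarray(obs) > thr
    n11 = float(np.sum(s & g))      # detected by both
    n10 = float(np.sum(s & ~g))     # detected by the evaluated data only
    n01 = float(np.sum(~s & g))     # detected by the benchmark only
    n00 = float(np.sum(~s & ~g))    # dry in both
    den = (n11 + n01) * (n01 + n00) + (n11 + n10) * (n10 + n00)
    return 2.0 * (n11 * n00 - n10 * n01) / den   # undefined if den == 0

def vhi(sim, obs, thr):
    """Volumetric hit index: hit volume over hit volume plus missed volume."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    hit = (sim > thr) & (obs > thr)
    miss = (sim <= thr) & (obs > thr)
    return sim[hit].sum() / (sim[hit].sum() + obs[miss].sum())

def ss(sim, obs, c3=1e-6):
    """Structure term of SSIM between two fields; c3 is an assumed small constant."""
    sim, obs = np.asarray(sim, float).ravel(), np.asarray(obs, float).ravel()
    cov = np.mean((sim - sim.mean()) * (obs - obs.mean()))
    return (cov + c3) / (sim.std() * obs.std() + c3)
```

For example, with `sim = [0, 5, 0, 10, 2]`, `obs = [0, 4, 3, 8, 0]` and a 1 mm threshold, the contingency counts are n11 = 2, n10 = 1, n01 = 1, and n00 = 1, giving HSS = 1/6 and VHI = 15/18.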

Lake-Effect on Precipitation Diagnosis
The empirical orthogonal function (EOF) method was used to extract the spatial distribution patterns of precipitation associated with the different dominant factors of precipitation in the Taihu Lake Basin. The Pearson correlation coefficient and the t test were used to analyze the consistency of the precipitation patterns obtained by EOF decomposition with topography (DEM), location (longitude, latitude), urbanization, and other major factors influencing precipitation. The most important precipitation pattern was selected and analyzed in depth to diagnose the abnormal distribution of precipitation between the upwind and downwind areas of Taihu Lake. The ERA5 meteorological reanalysis elements, namely 2 m surface temperature (tem), 10 m wind speed (wspd), latent heat flux (lhtfl), sensible heat flux (shtfl), and high-altitude specific humidity (q), were then used to explore the lake-effect on precipitation in the Taihu Lake Basin. Finally, the distributions of monthly precipitation and meteorological elements were combined to determine the most probable time of lake-effect precipitation.

Figure 3 shows scatter plots of the MSWEP V2.1 and GWRMP daily precipitation against the benchmark precipitation in the flood season, for the Taihu Lake Basin and its terrain subareas, at the grid scale from 1979 to 2016. The coefficient of determination R² of the regression is above 0.7 in the entire basin and in the different subareas, so both MSWEP V2.1 and GWRMP have a strong ability to explain the changes in surface daily precipitation.
Compared with MSWEP V2.1, the GWRMP scatter points for the entire basin and the different subareas are more concentrated near the regression line, and the outliers that deviate strongly from the regression line are markedly fewer, indicating that GWRMP explains the changes in surface daily precipitation considerably better than MSWEP V2.1, which has not been merged with the gauge-based precipitation. MSWEP V2.1 underestimates daily precipitation, especially heavy precipitation; this is a common systematic error in precipitation products and has been verified in many studies [9][10][11][18][19][20]. During the GWR fusion, the underestimation by MSWEP V2.1 was corrected using the precipitation at the surface rain gauges. However, because MSWEP V2.1 is lower than the measured precipitation overall, the errors calculated at the rain gauges are mainly positive; when GWR is used to interpolate the error in space, unnecessary error may be introduced at some grids, and when the error field is superimposed on the MSWEP V2.1 background field, a systematic error is introduced into GWRMP at those grids. In contrast to the underestimation by MSWEP V2.1, GWRMP shows a certain degree of overestimation (RB < 20%). The RMSE of GWRMP is overall 2 mm lower than that of MSWEP, both the CC and the HSS of GWRMP are greater than those of MSWEP, and the VHI of GWRMP is about 0.3 higher than that of MSWEP. All the time-series accuracy indices at each grid reveal that GWRMP is significantly more accurate than MSWEP V2.1, suggesting that GWRMP has a stronger ability than MSWEP V2.1 to classify and identify daily precipitation events in the time series in the Taihu Lake Basin.
Although the structural similarity index SS varies over a large range, its average value is not less than 0.8 for both MSWEP V2.1 and GWRMP and is as high as 0.9 for GWRMP, indicating that both MSWEP V2.1 and GWRMP are highly consistent with the benchmark precipitation in the daily spatial distribution. The precipitation accuracy in the different topographic subregions was also analyzed. On average, RB, HSS, and VHI show that the precipitation accuracy is slightly lower in the lake area than in the plain and mountainous areas, and CC indicates that the accuracy is slightly better in the mountain area, while RMSE suggests that the lake area has higher precipitation precision than the other subregions. Considering the difference in area among the three subregions (plain area > mountain area > lake area), RMSE and CC may be biased by the relatively small number of grid samples in the lake area. The spatial similarity index SS shows that the similarity between the precipitation distribution and the benchmark is slightly lower in the lake and mountain areas than in the plain area. Taken together, the accuracy of precipitation estimation by both MSWEP and GWRMP is highest in the plain area, followed by the mountain area and the lake area. Accurate detection of precipitation in mountainous areas is difficult, which is a common feature of many global precipitation datasets. In the lake region, the relatively sparse and uneven distribution of rain gauges may lead to the relatively low consistency between the benchmark and both MSWEP V2.1 and GWRMP. The daily gridded precipitation generated by spatial interpolation and used as the benchmark therefore carries a certain error over Taihu Lake, where the rain gauges are relatively sparse and uneven.
With the limited gauged rainfall in the lake and coastal areas as the new benchmark, the MSWEP V2.1 and GWRMP values of the grids containing the rain gauges were extracted to check the accuracy of the gridded precipitation at the gauge scale in the lake area (Figure 5). After fusion with the gauged precipitation, the accuracy of GWRMP is significantly improved compared with MSWEP V2.1: the RB is kept within 16%, the CC increases to above 0.85, the maximum RMSE decreases from 10.4 to 8.4, and the HSS and VHI are as high as 0.7 and 0.99, respectively. All the accuracy indices reveal that GWRMP is significantly more accurate than MSWEP V2.1 at the gauge scale, which is consistent with the evaluation results at the grid scale. Compared with MSWEP V2.1, GWRMP is closer to the real precipitation and can be used to estimate precipitation and analyze its spatial distribution in the lake area. To compare how the benchmark, MSWEP V2.1, and GWRMP characterize precipitation over Taihu Lake, the rainfall amounts and spatial distributions over the lake in the flood season were compared (Figure 6 and Table 1). Consistent with the accuracy evaluation, the underestimation of precipitation by MSWEP V2.1 is obvious, while GWRMP is slightly higher than the benchmark. The highest precipitation occurs in June, and the lowest occurs in May and September. The west of Taihu Lake is generally a heavy rainfall area, while the middle and east receive little rainfall; this spatial pattern is likely to cause a high water level in Taihu Lake, increasing the flood control pressure in the lake area and downstream of it.
GWR integrates the spatial information on precipitation from the rain gauges and from MSWEP V2.1. On the one hand, this effectively reduces the rainfall difference from the benchmark; on the other hand, it allows a pronounced spatial pattern of precipitation to be identified, characterized by more precipitation in the southwest of Taihu Lake and less in the northeast. In July, precipitation is significantly lower in the east and higher in the west.

The Spatial Distribution of Precipitation and Its Influencing Factors
Because GWRMP has a stronger ability than MSWEP to characterize the spatio-temporal distribution of precipitation, the gridded GWRMP was used to analyze the spatial distribution of precipitation and its main influencing factors in the Taihu Lake Basin. The EOF analysis and the North test (Figure 7) were carried out on the monthly GWRMP at the 0.1° × 0.1° grid scale in the flood season from 1979 to 2016. The results show that, at the 0.95 confidence level, the variance contribution rate of the first EOF mode (EOF-1) is 95%. EOF-1 represents the main type of spatial distribution of precipitation: high values are distributed in the southwest of the Taihu Lake Basin and low values in the eastern plain region (the upwind area of Taihu Lake). The time coefficients show that the precipitation in September 1999 is the most typical case of EOF-1. The variance contribution rate of the second mode (EOF-2) is 2.69%, and its precipitation has a typical band-shaped distribution, gradually increasing or decreasing from south to north. The variance contribution rates of the third and fourth modes (EOF-3, EOF-4) are less than 1%. EOF-3 also shows a band-shaped distribution of precipitation, but the rainfall gradually increases or decreases from east to west. The precipitation of EOF-4 is distributed semi-circularly from west to east, with the distribution in the eastern, southern, and northern outskirts opposite to that in the central-western region. Table 2 shows the Pearson correlation coefficients between the four EOF precipitation patterns and the potential influencing factors at the 0.1° × 0.1° grid scale, together with the t-test results.
The correlation coefficient between the precipitation of EOF-1 and the DEM is the highest (0.64), indicating that topography is the dominant factor determining the main spatial distribution pattern of precipitation in the entire basin. The precipitation of EOF-2 and EOF-3 has the highest correlation with latitude and longitude, respectively, with both correlation coefficients above 0.90, which confirms that EOF-2 and EOF-3 are typical latitudinal and longitudinal zonal precipitation patterns, respectively. The correlation coefficients between the precipitation of EOF-4 and the four influencing factors are not high; among them, the correlation with DMSP is the highest and passes the t test. The temporal consistency of the spatial correlation coefficient between DMSP and the precipitation of EOF-4 was further examined, as shown in Figure 8. The period from 2000 to 2010 saw rapid urbanization of the entire basin (DMSP increases significantly), and although the spatial correlation coefficient between DMSP and the precipitation of EOF-4 is low, its trend is basically the same as that of DMSP. After 2010, the Taihu Lake Basin entered a stage of stable urbanization (DMSP increases slowly), and the spatial correlation coefficient between DMSP and the precipitation of EOF-4 also increases slowly. Overall, EOF-4 is the spatial distribution pattern of precipitation under the leading role of urbanization.
Note: n represents the total number of raster cells in the entire basin at the 0.1° × 0.1° grid scale.
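The EOF decomposition and the variance contribution rates discussed in this subsection can be sketched with a singular value decomposition. The sketch below is illustrative only: whether the time mean was removed before the decomposition is not stated above, so `center` is left as an option, and the sampling error uses North's rule of thumb with the number of time steps taken as the number of independent samples, which is an assumption. All names are hypothetical.

```python
import numpy as np

def eof(field, center=True):
    """EOF analysis of a (n_time, n_grid) array via SVD.

    Returns spatial patterns (rows), time coefficients, variance fractions,
    eigenvalues, and the approximate sampling error of each eigenvalue.
    """
    field = np.asarray(field, dtype=float)
    anom = field - field.mean(axis=0) if center else field
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    n_time = anom.shape[0]
    eigvals = s ** 2 / (n_time - 1)
    var_frac = eigvals / eigvals.sum()            # variance contribution rates
    north_err = eigvals * np.sqrt(2.0 / n_time)   # rule of thumb: dl ~ l * sqrt(2 / N)
    return vt, u * s, var_frac, eigvals, north_err

def north_separated(eigvals, north_err):
    """True where mode k is separated from mode k + 1 by more than its sampling error."""
    return (eigvals[:-1] - eigvals[1:]) > north_err[:-1]
```

Each row of the returned patterns can then be correlated, grid by grid, with a candidate factor such as the DEM, longitude, latitude, or DMSP.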

The Effect of the Taihu Lake on Precipitation Spatial Distribution
Based on the spatial variation of EOF-1 and the distribution pattern of the DEM, the EOF-1 precipitation data were divided into a mountain rainy area (EOF-1 ≥ 0.056), a plain moderate rain area (0.05 < EOF-1 < 0.056), and a lake upwind less rainfall area (EOF-1 < 0.05) (Figure 9). A Pearson correlation analysis between the precipitation distribution in these three divisions and the DEM was performed, and a t test was carried out (Table 3). The correlation coefficient between the precipitation in the mountain rainy area and the DEM is the highest (0.64). Although it passes the t test at the 0.99 confidence level, the correlation coefficient between the precipitation in the plain moderate rain area and the DEM is less than 0.2, and there is no significant correlation between the precipitation in the lake upwind less rainfall area and the DEM. The precipitation distribution of EOF-1, which is dominated by terrain, therefore responds well to the DEM only in the southwestern mountainous area, while the relationship between the precipitation distribution and the DEM is not significant in the plain and the lake upwind area. Large lakes have a certain influence on the temporal and spatial distribution of precipitation [2,3]. According to the extent of the lake region and the dominant summer wind direction in the study area, the lake region was divided into upwind and downwind areas, with the vertical centerline of the lake as the boundary (Figure 9b). The precipitation of EOF-1 differs significantly between the upwind and downwind areas: according to the statistics, the precipitation in the upwind area is 8.31% less than that in the downwind area. Figure 9b shows that the area with EOF-1 < 0.05, which is the low precipitation area in the division by EOF-1 value, is located in the lake upwind area. The precipitation pattern of the upwind and downwind areas is consistent with lake-effect precipitation.
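The zoning by EOF-1 value and the upwind/downwind comparison can be illustrated as follows. The thresholds 0.05 and 0.056 come from the text; the handling of a value exactly equal to 0.05, the use of a single centre longitude as the "vertical centerline", the assumption that the upwind side lies east of it, and the function names are all illustrative choices.

```python
import numpy as np

def classify_eof1(eof1, low=0.05, high=0.056):
    """Zone code per grid: 0 = lake upwind less rainfall area,
    1 = plain moderate rain area, 2 = mountain rainy area.
    A value exactly equal to `low` is assigned to zone 0 here (assumption)."""
    eof1 = np.asarray(eof1, dtype=float)
    return np.where(eof1 >= high, 2, np.where(eof1 > low, 1, 0))

def upwind_deficit_percent(precip, lon, centre_lon):
    """Percentage by which mean upwind precipitation falls below the downwind mean.
    Grids east of `centre_lon` are treated as upwind (assumed southeast monsoon)."""
    precip = np.asarray(precip, dtype=float)
    lon = np.asarray(lon, dtype=float)
    up = precip[lon > centre_lon].mean()
    down = precip[lon <= centre_lon].mean()
    return 100.0 * (down - up) / down
```

With upwind means of 91 and downwind means of 100 (in any consistent unit), the function returns a deficit of 9%.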
To further explore the evidence for the lake-effect on precipitation in Taihu Lake, the ERA5 monthly meteorological reanalysis data were used to analyze the possible influence mechanism. The correlations between the spatial distribution of EOF-1 precipitation in the various divisions and the climatic elements (Figure 10) show that the correlation coefficient in the southwestern mountain rainy area is generally high (|r| > 0.6), that the correlation coefficient in the plain moderate rain area is less than 0.5, and that the correlation coefficient in the lake upwind less rainfall area does not pass the t test at the 0.90 confidence level. This work also analyzes the relationship between the precipitation of EOF-1 in the lake upwind and downwind areas and the climatic elements. The precipitation distribution in the lake upwind area has no significant correlation with the climatic elements, whereas the correlation coefficient in the lake downwind area is significantly higher. In summary, the distribution of precipitation in the mountain rainy area is highly consistent with the climatic elements, while the precipitation distribution in the plain area is only weakly correlated with them, especially in the lake upwind area, where the consistency between the distribution of precipitation and the spatial distribution of the climatic elements is low.
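The correlation and significance check used here can be sketched with the standard library alone. The t statistic follows the usual relation t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom; treating every grid cell as an independent sample is an assumption of this sketch, and the function name is hypothetical.

```python
import math

def pearson_t(x, y):
    """Return the Pearson correlation r and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < 1.0:
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
    else:
        t = math.copysign(math.inf, r)   # perfect correlation
    return r, t
```

The correlation is regarded as significant when |t| exceeds the critical value of the t distribution for the chosen confidence level and n − 2 degrees of freedom.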
Correlation analysis (Figure 10a) shows that the precipitation in the southwestern mountain rainy area is positively correlated with lhtfl and q but negatively correlated with shtfl, tem, and wspd. Because ERA5 defines heat released from the surface as negative, both lhtfl and shtfl are mainly negative in the flood season; in terms of flux magnitude, rainfall in the mountain rainy area therefore increases with shtfl and decreases with lhtfl. Combined with the multi-year average spatial distribution of the climatic factors in the flood season (Figure 11), it can be seen that in the southwestern mountain rainy area shtfl and q are high, lhtfl is medium, and tem and wspd are low. The combination of higher elevation, lower horizontal wind speed, stronger convection (high shtfl), larger high-altitude specific humidity q, and cooler surface temperature in the mountain area favors precipitation, resulting in heavier rainfall than in the other subareas. The correlation between the climatic factors and the precipitation distribution in the lake downwind area is relatively consistent with that in the southwestern mountainous area, but the consistency in the upwind area is poor.

With reference to the response relationship between climatic factors and precipitation in the southwestern mountainous area, and considering topography, location, and prevailing wind direction, the formation mechanism of precipitation in the lake upwind and downwind areas was explored. In summer, the southeast monsoon prevails in the Taihu Lake Basin (Figure 1d). The coastal area has low tem, high wspd, and low vertical heat flux (Figure 11), and precipitation does not form easily under such meteorological conditions. As the southeast monsoon moves towards the upwind area of Taihu Lake, wspd weakens, tem and lhtfl increase significantly, and shtfl first increases and then decreases significantly.
Although the wind speed of the southeast monsoon is weakened, the vertical heat flux in the lake upwind area is low, and the regional horizontal transport of air is still higher than the vertical turbulent movement. This is reflected in the fact that the southeast monsoon carries the wet and hot water vapor in the upwind area to the lake area, so the precipitation in the upwind area is suppressed. Because the Taihu lake area itself has large wspd, tem and lhtfl , but low shtfl , under the further strengthening by the lake area, the volume of wet and hot water vapor increases significantly, and it is transported to the downwind area. Finally, in the environment with high temperature and low wind speed in the downwind area, the wet and hot water vapor rises vertically to form clouds and cause rain, resulting in abundant rainfall in the downwind area, and less precipitation in the upwind area. In summary, the southeast monsoon is the dominant factor affecting the lake-effect precipitation in Taihu Lake Basin, which suppresses precipitation in the lake area and the upwind area and increases rainfall in the downwind area.
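The sign argument used above for the ERA5 heat fluxes can be written out explicitly. As a sketch, let P denote the EOF-1 precipitation field and F a surface heat flux (lhtfl or shtfl), and assume F < 0 at every grid cell of the region; the text only states that the fluxes are "mainly" negative in the flood season, so this is an idealization. The symbols P, F and r(·,·) are notation introduced here for illustration.

```latex
|F| = -F \quad (F < 0)
\;\Longrightarrow\;
r\bigl(P, |F|\bigr) = r(P, -F) = -\,r(P, F)
```

Under this assumption, a positive correlation with the signed lhtfl means that precipitation decreases as the magnitude of the latent heat flux grows, and a negative correlation with the signed shtfl means that precipitation increases with the magnitude of the sensible heat flux. Where F changes sign within a region, the identity holds only approximately.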
Following the precipitation mechanism described above, the spatial distributions of monthly precipitation and wind speed in the flood season in the Taihu Lake Basin were compared (Figure 12). Precipitation in the Taihu Lake Basin is mainly concentrated in the southwest mountainous area, with less precipitation in the lake area and plain area. In July, however, high precipitation occurs in the west and precipitation increases from east to west, so that, in addition to the southwestern mountainous area, the lake downwind plain area is also a rainy area. Comparing the spatial distribution of wspd in the basin shows that wspd in the flood season is generally low in the southwest mountain rainy area but consistently high in the coastal and lake areas. In most months there is little difference in wspd between the upwind and downwind areas of Taihu Lake, but in July the wind speed in the lake downwind area is significantly lower than that in the upwind area, which is consistent with the distribution of precipitation in that month and indicates that July may be the time when lake-effect precipitation is most likely to occur in the Taihu Lake Basin. Because the Taihu Lake Basin is usually in the late Meiyu period in July, when the water level in the lake is high, the pattern of more precipitation in the lake downwind area (upstream area) will further exacerbate the flood risk in Taihu Lake. Precipitation in the upwind area (downstream area) is lower, but population and economic activity there are highly concentrated and the socio-economic water demand is large, which further increases the difficulty of regulating and storing water resources in Taihu Lake.

Conclusions
Based on the benchmark, the accuracy of the daily precipitation of MSWEP V2.1 and GWRMP was evaluated at grid scale, and the three precipitation datasets were compared in terms of the spatial distribution of precipitation they yield for Taihu Lake. Based on the analysis of the spatial distribution of precipitation and its main influencing factors in the Taihu Lake Basin, the lake-effect on precipitation was explored in combination with ERA5 meteorological reanalysis data. On the whole, the accuracy of GWRMP is better than that of MSWEP V2.1. It captures precipitation events and precipitation amounts more accurately at 0.1° × 0.1° grid scale, in the lake area as well as in the entire basin. The precipitation distribution dominated by terrain is the most important precipitation type in the Taihu Lake Basin. The lake-effect on precipitation does exist and is mainly affected by the southeast monsoon. Based on the results obtained, the following conclusions can be drawn:
(1) At 0.1° × 0.1° grid scale, GWR merged precipitation has a strong ability to detect the daily precipitation in the Taihu Lake Basin from 1979 to 2016, and its accuracy is higher than that of MSWEP V2.1. It has a significant advantage in the analysis of precipitation in Taihu Lake and can largely reproduce the actual distribution of precipitation there. Except in July, when rainfall is higher in the west and lower in the east, precipitation is higher in the southwest and lower in the central, eastern and northern areas of Taihu Lake.
(2) The spatial distribution of precipitation under the effect of topography (EOF-1) is the dominant spatial distribution (95% variance contribution rate). It shows a good response relationship with DEM in the southwest rainy mountainous area (r = 0.64), but no significant relationship in the lake upwind area. The lake-effect on precipitation does exist, and the multi-year average precipitation in the lake upwind area is 8.31% less than that in the lake downwind area.
(3) The distribution of precipitation in the southwest mountain rainy area has a higher consistency with climatic factors (|r| > 0.6) than that in the plain area, especially the lake upwind area. The southeast monsoon is deduced to be the most important factor affecting the lake-effect on precipitation. The distribution of wind direction and wind speed determines the dynamic changes of surface water vapor to a certain extent: it brings the warm, moist air in the upwind area to the lake area, and after further strengthening by the lake, the enhanced warm, moist air is carried to the downwind area, which increases regional precipitation in the lake downwind area while suppressing precipitation in the lake area and upwind area. The lake-effect on precipitation is most evident in July.
(4) Based on the monthly GWRMP and ERA5 meteorological reanalysis data, the possible influence mechanism of the lake-effect on precipitation in the Taihu Lake region was explored preliminarily at 0.25° × 0.25° grid scale. The research mainly uses the distribution consistency determined by correlation analysis as the metric for evaluating the possible influence mechanism. This is only a qualitative analysis, not a quantitative evaluation of the impact threshold of each meteorological element. Thus, quantitative study of the factors that affect the lake-effect on precipitation should be strengthened through further mathematical models and more detailed meteorological data collection in the Taihu Lake Basin.
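The upwind deficit quoted in conclusion (2) is a relative difference. A minimal sketch of that calculation follows, assuming the downwind multi-year mean is the reference value, as the wording "less than that in the lake downwind area" suggests. The function name and the input values are hypothetical and are not the study's data.

```python
def upwind_deficit_percent(p_upwind, p_downwind):
    """Percentage by which upwind precipitation falls short of downwind precipitation."""
    if p_downwind <= 0:
        raise ValueError("downwind precipitation must be positive")
    return (p_downwind - p_upwind) / p_downwind * 100.0


# Made-up multi-year mean precipitation in mm (NOT values from the study).
deficit = upwind_deficit_percent(p_upwind=1100.0, p_downwind=1200.0)
print(f"{deficit:.2f}%")
```

Taking the upwind mean as the reference instead would give a slightly larger percentage, so the choice of denominator should be stated alongside the figure.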