1. Introduction
The Qinghai–Tibetan Plateau (QTP) is often referred to as the “Asian Water Tower” and the “Third Pole of the Earth” [1]. As the source region of several major Asian rivers, including the Yangtze, Yellow, Lancang and Yarlung Zangbo Rivers, it plays a vital role in regional and transboundary water security [2,3]. Precipitation is a key component of the Plateau’s hydrological cycle and directly influences glacier and snow melt, permafrost stability, ecosystem dynamics and downstream runoff and flood risk [4,5,6]. Because precipitation is closely linked to cryospheric processes, changes in its amount and timing can lead to amplified environmental and hydrological impacts over the Plateau. This makes the QTP particularly sensitive to climate change [7,8,9,10].
Among currently available satellite precipitation datasets, the Global Satellite Mapping of Precipitation (GSMaP), developed by the Japan Aerospace Exploration Agency (JAXA) under the Global Precipitation Measurement (GPM) program, has been widely used in plateau and mountain regions because of its relatively high spatial and temporal resolution and its continuously updated retrieval algorithms [11,12,13]. GSMaP integrates observations from multiple passive microwave and infrared sensors and applies time-based propagation and smoothing techniques to generate continuous precipitation fields. It has been released in several versions (v05–v08) and includes multiple product types, such as GSMaP Near-Real-Time (NRT), GSMaP Moving Vector with Kalman filter (MVK), GSMaP gauge-adjusted Near-Real-Time (GNRT) and GSMaP gauge-adjusted product (Gauge), for real-time monitoring, climate analysis and hydrological applications [14,15,16]. Since 2018, GSMaP has been increasingly applied to investigate the spatial and temporal patterns of precipitation over the Qinghai–Tibetan Plateau, support hydrological simulations and analyze extreme precipitation events. Accordingly, its performance in this region has attracted growing attention [17,18,19]. A detailed list of abbreviations and their full names is provided in Table S1.
Previous studies suggest that GSMaP can reasonably describe the large-scale spatial pattern and seasonal variation of precipitation over the QTP. In particular, it is able to capture the general decrease in precipitation from the southeast to the northwest and shows relatively good continuity during the summer monsoon season. For example, Lei et al. reported that GSMaP performs better for warm-season precipitation in the eastern and southeastern Plateau, whereas errors are larger in high-elevation and arid regions [20]. Li et al. found that GSMaP can generally identify the timing of daily precipitation events, but it still shows clear bias in precipitation amounts [21]. Other comparison studies have further shown that GSMaP tends to overestimate light precipitation (0 ≤ p ≤ 1 mm/day), while underestimating moderate precipitation (1 < p ≤ 5 mm/day), heavy precipitation (5 < p ≤ 10 mm/day) and extreme precipitation (p > 10 mm/day) events, with these problems becoming more pronounced in high-elevation areas and snow-covered regions [22].
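The intensity classes quoted above can be written down as a small classifier. The following is a minimal sketch that uses only the thresholds given in the text; the function names and the example series are hypothetical, and dry days (p = 0) fall into the light class because the quoted range starts at zero.

```python
from collections import Counter

# Class boundaries in mm/day, taken from the ranges quoted in the text.
LIGHT_MAX = 1.0
MODERATE_MAX = 5.0
HEAVY_MAX = 10.0


def intensity_class(p):
    """Return the intensity class of a daily precipitation amount p (mm/day)."""
    if p < 0:
        raise ValueError("precipitation must be non-negative")
    if p <= LIGHT_MAX:
        return "light"      # 0 <= p <= 1
    if p <= MODERATE_MAX:
        return "moderate"   # 1 < p <= 5
    if p <= HEAVY_MAX:
        return "heavy"      # 5 < p <= 10
    return "extreme"        # p > 10


def class_counts(daily_values):
    """Count how many days fall into each intensity class."""
    return Counter(intensity_class(p) for p in daily_values)


# Hypothetical daily series for one station (mm/day).
example_days = [0.0, 0.4, 1.0, 2.5, 5.0, 7.2, 10.0, 18.3]
counts = class_counts(example_days)
```

Applying the same classifier to both the gauge and the satellite series makes it possible to compare how often each class occurs in the two datasets.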
These performance limitations are closely related to the environmental conditions of the Plateau. Precipitation processes in high-altitude regions are often influenced by complex terrain and low-temperature conditions, and their microphysical characteristics differ from those at lower elevations. Across much of the high-elevation Qinghai–Tibetan Plateau, precipitation often occurs in solid or mixed-phase form, especially during winter and transitional seasons. Compared with warm-rain processes, snowfall formation involves ice-phase microphysical processes, which can affect the radiative signals received by satellite sensors. In addition, lower temperatures and limited moisture at high elevations often lead to shallower cloud systems and weaker precipitation, increasing the uncertainty of passive microwave and infrared retrievals. As a result, satellite precipitation products in high-altitude regions are more likely to overestimate light precipitation while underestimating heavier rainfall [23,24].
At the same time, clear differences have also been reported among GSMaP products and versions over the Qinghai–Tibetan Plateau. Several studies indicate that the gauge-corrected GSMaP Gauge product generally shows smaller systematic bias in most regions, and its precipitation amount and spatial pattern are closer to rain gauge observations. In contrast, near-real-time products without gauge correction still show relatively large uncertainty in the interior Plateau and other data-sparse areas [25,26]. With continued algorithm updates, some studies suggest that newer GSMaP versions show better consistency and temporal stability than earlier versions [20]. However, improvements in estimating heavy precipitation and cold-region precipitation remain limited. Overall, existing studies indicate that GSMaP has considerable potential for application over the Qinghai–Tibetan Plateau, but its performance still depends strongly on region, season and precipitation type [27].
Despite these efforts, several important gaps remain in the current literature. First, most previous studies have focused on a single GSMaP version or a relatively short study period [25], which limits the understanding of how GSMaP performance evolves over time and makes it difficult to compare different versions clearly under the Plateau’s complex terrain and climatic conditions. As a result, the long-term differences among GSMaP versions are still not fully understood [28]. Although the temporal coverage of different GSMaP versions is not fully consistent and the available periods of the gauge observations and satellite data are also not entirely the same, including v05–v08 together within the data range available for this study still helps reveal the overall trajectory of GSMaP version evolution over the QTP. In particular, v05 provides background information on the earlier stage of product performance, while v07 helps connect the transitional changes between v06 and v08. Second, many evaluations mainly focus on total precipitation and overall statistical agreement with rain gauge observations, while less attention has been given to precipitation event structure, intensity classes and the ability of GSMaP to detect different types of events, especially heavy rainfall and extreme precipitation [29]. These aspects are important for practical applications, because even when long-term precipitation totals appear reasonable, errors in event structure and intensity can still strongly affect hydrological simulations, flood analysis and climate impact assessments [30]. Third, previous studies often differ in reference samples, study periods and evaluation settings, which makes their results difficult to compare directly. Without a unified evaluation framework, it is also difficult to determine whether performance differences arise from algorithm updates, product type, regional conditions, or study design. This, in turn, limits a clearer understanding of the reliability of GSMaP over the Qinghai–Tibetan Plateau [26].
Based on these considerations, we hypothesize that the performance of GSMaP over the Qinghai–Tibetan Plateau is influenced not only by algorithm updates but also by precipitation regime and high-altitude environmental conditions. As a result, its performance may vary across versions, product types and temporal scales. To test this hypothesis, this study conducts a long-term and systematic comparison of multiple GSMaP versions (v05–v08) and products over the Qinghai–Tibetan Plateau within a unified evaluation framework. Compared with previous studies that often focused on a single version, a limited number of products, or a relatively short study period, this study emphasizes a more consistent inter-version comparison and a more integrated assessment of product differences across multiple dimensions. Using long-term daily rain gauge observations as a reference, the precipitation estimates from GSMaP v05–v08 are evaluated in terms of both quantitative agreement and precipitation event characteristics. The analysis further incorporates multiple perspectives, including temporal aggregation, seasonal variation, spatial differences, elevation effects, precipitation intensity and extreme precipitation, in order to examine whether version-related improvements are consistent under different environmental conditions. In this way, the study not only compares whether newer versions perform better but also clarifies the conditional dependence and non-uniformity of GSMaP performance evolution over the Plateau. This framework is expected to provide a clearer basis for product selection in different applications and may also offer useful insights for improving satellite precipitation retrieval algorithms in high-altitude and complex-terrain regions.
4. Results
4.1. Inter-Comparison of GSMaP v05–v08
To provide an overall comparison of the precipitation estimation performance of different GSMaP versions and products, this study conducts an integrated analysis based on multiple evaluation metrics, enabling a consistent comparison across versions and product types. Note that this part of the analysis is based on the full available period of each version; the specific temporal coverage is shown in Figure 2. The distributions of the results for different versions and products are presented in Figure 3.
From the perspective of version changes, the overall performance of GSMaP products appears to show a general tendency toward improvement with successive version updates. Whether grouped by product or by version, the newer releases generally exhibit better overall results than the earlier ones. In v07 and v08, the result distributions of most products shift toward higher values, while the spread becomes somewhat smaller, suggesting that the later versions may have improved both average performance and consistency across stations.
Figure 3 further shows that low-value cases are more common in the earlier versions but become less frequent in the later ones, which may indicate that the updated versions have achieved some improvement at stations where performance was previously weaker.
From the product perspective, the Gauge product shows the best overall performance and the highest stability. Across different versions, its results remain at a relatively high level and are more concentrated in distribution, indicating closer agreement with ground observations and comparatively lower uncertainty. In contrast, the overall performance of NRT, GNRT and MVK is relatively weaker and their result distributions are more dispersed, suggesting that these products are more sensitive to regional differences, precipitation type and temporal variability. Among them, NRT generally shows the largest dispersion.
The multiple metrics shown in Figure 3 further indicate that the Gauge product generally performs better in terms of correlation and precipitation event detection, with typically higher values of the CC, POD and CSI, as well as relatively lower values of RMSE and RB. These characteristics together support its stronger overall performance. By comparison, the non-gauge products in the earlier versions are more likely to show lower correlation, higher false alarm rates and more pronounced biases, which may, to some extent, constrain their overall performance.
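For readers less familiar with the metrics named above, the sketch below computes them from a paired satellite and gauge series using their commonly used definitions. It is illustrative only: the function names are hypothetical, RB is assumed to be the relative bias of total precipitation expressed in percent, and the 0.1 mm/day rain/no-rain threshold is an assumption that this section does not specify.

```python
import math


def continuous_metrics(sat, obs):
    """CC, RMSE and RB for paired daily series (mm/day)."""
    n = len(obs)
    mean_s = sum(sat) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sat, obs))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    cc = cov / math.sqrt(var_s * var_o)
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sat, obs)) / n)
    rb = (sum(sat) - sum(obs)) / sum(obs) * 100.0  # percent
    return {"CC": cc, "RMSE": rmse, "RB": rb}


def detection_metrics(sat, obs, threshold=0.1):
    """POD, FAR and CSI from a 2x2 contingency table.

    threshold (mm/day) separates rain from no rain; 0.1 is an assumed value.
    """
    hits = misses = false_alarms = 0
    for s, o in zip(sat, obs):
        sat_rain = s >= threshold
        obs_rain = o >= threshold
        if sat_rain and obs_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif sat_rain:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return {"POD": pod, "FAR": far, "CSI": csi}


# Hypothetical five-day example (mm/day).
obs_example = [0.0, 2.0, 5.0, 0.0, 1.0]
sat_example = [0.5, 1.0, 6.0, 0.0, 0.0]
cont = continuous_metrics(sat_example, obs_example)
det = detection_metrics(sat_example, obs_example)
```

In this toy series the satellite product produces two hits, one miss and one false alarm, which illustrates why POD, FAR and CSI can move in different directions between versions.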
Within individual versions, the advantage of the Gauge product over the other products is more evident in v06 and v07, whereas the differences among NRT, GNRT and MVK are relatively small. By v08, all products show improvement to varying degrees, with MVK and Gauge exhibiting the more noticeable gains, while the gap among products also becomes smaller. This suggests that continued improvements in retrieval algorithms, spatiotemporal processing schemes and input data may also have enhanced the stability and reliability of the non-gauge products, although a certain gap still remains compared with the Gauge product.
The Taylor diagram in Figure 4 further illustrates the differences among GSMaP products and versions in terms of correlation, variability representation and centered error [57]. Overall, the Gauge product is consistently located closer to the reference point across versions, with correlation coefficients generally close to or above 0.9, standard deviation ratios near 1 and relatively small centered root mean square differences. This suggests that the gauge-adjusted product has stronger agreement with observations and shows more stable performance in representing precipitation variability and controlling random error. In contrast, the NRT, GNRT and MVK products generally show correlation coefficients in the range of 0.6–0.8, together with larger departures of the standard deviation ratio from 1 and relatively higher centered root mean square differences. This indicates that these products still involve some uncertainty in reproducing the magnitude of precipitation variability and in controlling estimation errors, particularly in regions where gauge correction is unavailable or limited.
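The three quantities summarized by a Taylor diagram can be computed directly from a paired series. The sketch below is a generic illustration rather than the plotting code used for Figure 4; the function name is hypothetical and population standard deviations (division by n) are assumed. The three statistics are linked by the law-of-cosines identity E'^2 = σ_s^2 + σ_o^2 − 2 σ_s σ_o R, which is what allows them to be shown in a single diagram.

```python
import math


def taylor_statistics(sat, obs):
    """Correlation, standard deviation ratio and centered RMSD of sat vs obs."""
    n = len(obs)
    mean_s = sum(sat) / n
    mean_o = sum(obs) / n
    anom_s = [s - mean_s for s in sat]   # anomalies about the mean
    anom_o = [o - mean_o for o in obs]
    std_s = math.sqrt(sum(a * a for a in anom_s) / n)
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    corr = sum(a * b for a, b in zip(anom_s, anom_o)) / (n * std_s * std_o)
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_s, anom_o)) / n)
    return {
        "corr": corr,
        "std_ratio": std_s / std_o,
        "crmsd": crmsd,
        "std_sat": std_s,
        "std_obs": std_o,
    }


# Hypothetical series: the satellite values are exactly twice the observations.
obs_series = [1.0, 2.0, 3.0, 4.0]
sat_series = [2.0, 4.0, 6.0, 8.0]
stats = taylor_statistics(sat_series, obs_series)
```

In this example the correlation is 1 but the standard deviation ratio is 2, so the point would lie on the horizontal axis away from the reference point: perfect timing with exaggerated variability.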
Based on the above analysis, from the perspective of version evolution, most products show a gradual improvement in correlation, variability reproduction capability and error control from v05 to v08. This suggests that, with the continuous advancement of retrieval algorithms and data processing methods, the overall ability of satellite precipitation estimation has been enhanced. However, compared to the Gauge product, other products still show some gap, indicating that gauge adjustment plays a crucial role in improving the consistency of precipitation estimates.
Furthermore, because the versions cover different time periods, the results above may be influenced by inconsistencies in temporal coverage. To reduce this effect, the subsequent analyses are conducted within a unified time frame, which makes it possible to verify whether version evolution has led to actual improvements in performance over the Qinghai–Tibetan Plateau.
4.2. Inter-Comparison Within the Common Period of GSMaP v06, v07, and v08 (2017–2022)
To reduce the influence of inconsistent temporal coverage among GSMaP versions, this section compares v06, v07 and v08 over the common period (2017–2022) under consistent temporal conditions. Four products are examined: Gauge, GNRT, NRT and MVK. Version differences are evaluated from two main aspects: the consistency of precipitation amounts with ground observations and the ability to detect precipitation events. All metrics were first transformed into dimensionless form, and their directions were unified so that higher values consistently indicate better performance. Radar plots were then constructed using the station-scale median values for each version and product, as shown in Figure 5.
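The exact transformation used to make the metrics dimensionless and direction-consistent is not given in this section. One plausible approach, shown here purely as an assumption, is min-max scaling across the version and product combinations, inverted for metrics where lower is better (RMSE, FAR) and applied to the absolute value for signed metrics such as RB. The function names and the example numbers are hypothetical.

```python
from statistics import median


def station_median(values_by_station):
    """Station-scale median of one metric for one version/product."""
    return median(values_by_station)


def unify_direction(values, higher_is_better=True, use_abs=False):
    """Min-max scale values to [0, 1] so that 1 is always the best score.

    values           : one metric across version/product combinations
    higher_is_better : False for error-type metrics such as RMSE or FAR
    use_abs          : True for signed metrics such as RB, where 0 is ideal
    """
    vals = [abs(v) for v in values] if use_abs else list(values)
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return [1.0 for _ in vals]  # no spread: all combinations tie
    scaled = [(v - lo) / (hi - lo) for v in vals]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


# Hypothetical median RMSE values (mm/day) for three versions of one product.
rmse_scores = unify_direction([10.0, 20.0, 30.0], higher_is_better=False)

# Hypothetical median RB values (%) for the same three versions.
rb_scores = unify_direction([-20.0, 5.0, 10.0], higher_is_better=False, use_abs=True)
```

Because min-max scaling is relative to the set being compared, radar extents produced this way indicate ranking within the set rather than absolute skill.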
Figure 5 shows some performance differences among GSMaP versions and products. Overall, the Gauge product generally has the largest radar extent and maintains a relative advantage for most metrics, indicating better overall performance. Its POD remains high, while its CC and KGE are also generally higher. At the same time, the direction-adjusted RMSE and RB also perform well, suggesting that Gauge is comparatively stable in terms of precipitation consistency, error control and event detection. By contrast, GNRT, NRT and MVK show smaller overall radar extents and are generally weaker than Gauge, especially in metrics such as the CC, CSI and KGE, indicating that non-gauge-adjusted products still lag behind to some extent.
In terms of version changes, v08 shows a larger radar area for most products, although this improvement is not uniform across all metrics and varies by product. For GNRT and MVK, the increases in KGE and the CC are more evident in v08 and the normalized RB and RMSE values are also generally improved, which may indicate better consistency and reduced error at the station scale in the newer version. NRT also shows some improvement from v06 to v08, mainly in KGE, the CC and RMSE, although the magnitude of change remains relatively limited. In contrast, the POD and CSI do not increase consistently across all products, and for some non-gauge products, the differences between v07 and v08 are small, with a few metrics in v07 remaining comparable to, or slightly better than, those in v08. These version-related differences may be associated with algorithm refinements and updates in data processing, but their effects are clearly product-dependent.
The station-scale boxplots show broadly similar patterns (Figure S1). For consistency-related metrics, median KGE values in v08 are generally higher than those in v06 and v07 for most products, while the CC is mostly stable or shows a slight increase, with relatively clearer improvement for MVK and Gauge. For error-related metrics, the median RMSE in v08 is generally lower than, or close to, that in the earlier versions, suggesting some reduction in error levels. RB values in v08 are also more concentrated overall, and extreme biases appear less pronounced. This may indicate more stable bias behavior at the station scale, although it does not necessarily imply uniform improvement under all rainfall conditions.
For precipitation event detection, the Gauge product maintains the highest POD and relatively high CSI across all three versions, indicating the strongest overall ability to identify precipitation events. However, the figure also shows that the POD for Gauge decreases slightly in v08 compared with the previous two versions, whereas the CSI increases and the FAR decreases noticeably. This suggests that the v08 improvement is not simply reflected in a higher hit rate but more likely in a better balance between missed events and false alarms. For GNRT, NRT and MVK, the FAR generally shifts in a more favorable direction in v08, but the POD does not increase consistently. Their improvements therefore appear to be more related to false alarm control and overall metric balance than to simultaneous enhancement in all detection measures.
Overall, under the same analysis period, GSMaP shows a general tendency toward improved performance from v06 to v08, particularly in terms of consistency, error control and overall stability, with v08 often performing better. However, this improvement should not be interpreted as a uniform enhancement in all aspects of rainfall representation. Rather, it is better understood as a structured improvement that varies by product and metric. Gauge remains the best-performing product overall. Although the non-gauge products become somewhat more stable in v08 and show improved error-related behavior, they still remain clearly behind Gauge.
4.3. Long-Term Comparison Between GSMaP v06 and v08 over the Full Period (2001–2022)
Figure 6 compares the differences in several station-based evaluation metrics among the four GSMaP products in v06 and v08. Overall, the changes associated with the version update are not entirely consistent across products, but most metrics suggest that v08 shows improvement over v06 in several respects. Among the four products, Gauge still exhibits the best overall performance in both versions, maintaining relatively high levels in correlation, consistency and event detection. In v08, its KGE, CC and CSI increase further, while the FAR decreases markedly, suggesting that its overall performance and stability have both improved. MVK also shows relatively clear improvement, especially in KGE and the CSI, where the gains are more evident. By contrast, the changes in GNRT and NRT are more limited. Some metrics show only slight improvement, and their overall gains are smaller than those of Gauge and MVK.
Looking at the individual metrics, the consistency-related indicators show relatively clear version differences. KGE is higher in v08 than in v06 for all four products, with the improvement being more evident for Gauge and MVK, suggesting that the newer version may have enhanced overall consistency. In contrast, the changes in the CC are relatively small. Most products show only a slight increase or remain broadly stable, indicating that the improvement in correlation in v08 is relatively limited.
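The contrast between a clear KGE gain and a nearly unchanged CC is easier to interpret when the structure of KGE is kept in mind: it combines correlation with a variability ratio and a bias ratio, so it can improve even when correlation does not. The sketch below uses the original Gupta et al. (2009) form; whether this study uses that form or a modified variant is an assumption, and the function name is hypothetical.

```python
import math


def kge(sat, obs):
    """Kling-Gupta efficiency, KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2).

    r     : Pearson correlation
    alpha : ratio of standard deviations (sat / obs)
    beta  : ratio of means (sat / obs)
    """
    n = len(obs)
    mean_s = sum(sat) / n
    mean_o = sum(obs) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sat) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sat, obs)) / n
    r = cov / (std_s * std_o)
    alpha = std_s / std_o
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical series: perfect correlation but doubled magnitude.
obs_kge = [1.0, 2.0, 3.0, 4.0]
sat_kge = [2.0, 4.0, 6.0, 8.0]
kge_doubled = kge(sat_kge, obs_kge)
kge_perfect = kge(obs_kge, obs_kge)
```

In the doubled series the correlation is perfect, yet KGE is strongly penalized through the variability and bias terms, which is the kind of behavior a version update can correct without changing the CC.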
For the error metric, RMSE does not show a consistent decrease across all products. RMSE is slightly lower for Gauge in v08 and MVK also shows some improvement, whereas the changes in NRT and GNRT are relatively small and in some cases even slightly higher. This suggests that the improvement in error control in v08 is not uniform across products but instead shows clear product-dependent differences.
For precipitation event detection, the CSI generally increases across all four products, while the FAR shows an overall downward tendency, with the most pronounced reduction found for Gauge. This suggests that v08 may have some advantage in reducing false alarms and improving overall event detection performance. However, the POD does not show a consistent increase, and for some products it is even slightly lower in v08. Combined with the Wilcoxon test results shown in Table 3, v06 appears to perform better overall in terms of the POD, whereas v08 performs better in the FAR and CSI. This indicates that the improvement in event detection in v08 is more closely related to better false alarm control and a more balanced overall detection capability rather than simply to a higher hit rate.
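A paired comparison of station-level metrics between two versions can be tested with the Wilcoxon signed-rank test. The sketch below is a plain standard-library implementation given only for illustration: it assumes the test is paired by station, uses the normal approximation without tie or continuity correction, and the function name and data are hypothetical. For real analyses an established statistics library would be preferable.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (w_plus, w_minus, z, p). Zero differences are dropped, tied
    absolute differences receive average ranks, and p comes from the
    normal approximation (reasonable only for moderately large samples).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w_plus, w_minus, z, p


# Hypothetical station metric values for two versions (x minus y is tested).
result_all_positive = wilcoxon_signed_rank([2, 4, 6, 8, 10], [1, 2, 3, 4, 5])
result_with_ties = wilcoxon_signed_rank([1, 3, 1], [2, 1, 0])
```

A positive z indicates that the first sample tends to exceed the second; applied to POD with v06 first, that would correspond to the pattern described above.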
Figure 7 shows the interannual variation in annual mean precipitation and its spread for different GSMaP products during 2001–2022 [58]. Overall, clear differences can be observed among the products in both interannual variation patterns and the degree of deviation from the observations, and the long-term behavior of v06 and v08 is not fully consistent.
For the Gauge product, both versions follow the observed interannual variation relatively well. Annual mean precipitation is mostly within 1.1–1.6 mm/day. Compared with OBS, v06 is slightly higher in most years, whereas v08 is generally closer to the observations and has a narrower uncertainty range, indicating better long-term consistency. Version differences are more evident for GNRT. Compared with v06, v08 generally follows the observed interannual pattern more closely, suggesting improved temporal consistency. However, noticeable overestimation still appears in some later years, indicating that the improvement is more evident in temporal tracking than in magnitude control. MVK also shows clear version differences. In v06, some years show stronger fluctuations and higher peaks, with values clearly above the observations. By contrast, v08 is smoother, with weaker peaks and values closer to the long-term observed mean, suggesting improved long-term stability. Even so, MVK remains generally higher than the observations in both versions. NRT shows the strongest interannual fluctuation and the widest spread among the four products, indicating the weakest long-term stability. In both v06 and v08, its annual mean precipitation is generally higher than the observations and its variability is clearly larger than that of the other products.
Overall, the response to version updates differs among products from v06 to v08. Gauge shows the most stable improvement, while GNRT and MVK also improve to some extent. In contrast, NRT still shows strong fluctuations and persistent positive bias, suggesting limited long-term stability.
Figure 8 also supports the results above. Compared with v06, some v08 products have point clouds that are more concentrated and closer to the 1:1 line under low-to-medium monthly precipitation, mainly below about 100–150 mm/month. This is clearer for Gauge and MVK and indicates better agreement with gauge observations under common monthly precipitation conditions. For NRT, the change from v06 to v08 is small, and for GNRT the direction of change differs among metrics.
The fitted lines of some v08 products have slopes closer to 1, although the amount of improvement differs among products. Gauge remains the best product in both versions, with the most concentrated point cloud, the highest correlation and the lowest RMSE. MVK also shows a clearer reduction in bias and spread in v08. In contrast, NRT changes only slightly and GNRT does not improve uniformly across statistics.
At higher monthly precipitation, especially above about 200 mm/month, the scatter remains wide for all products and the departure from the 1:1 line remains evident. Gauge mainly shows underestimation in the high-value range, whereas GNRT, MVK and NRT show both underestimation and local overestimation. This suggests that the version update is more effective under low-to-medium monthly precipitation, while estimates at high monthly precipitation remain more uncertain.
The supplementary comparison (Figure S2) further shows that, with version progression, the point clouds of GNRT and MVK become more compact and move closer to the 1:1 line. Gauge maintains the most stable scatter structure across versions, whereas NRT shows the largest dispersion in all versions, especially under high precipitation conditions.
Overall, at both the common-period and long-term levels, GSMaP shows some performance improvement from v06 to v08, although this improvement differs clearly among products and metrics and is not uniform across all aspects. Among them, Gauge consistently remains at the highest level, while GNRT and MVK also show relatively clear improvement. NRT also improves, but its overall performance remains comparatively weaker. The metric-level results further show that the main advantages of v08 are reflected in the KGE, CSI and FAR, suggesting better overall consistency, false alarm control and event detection balance. By contrast, the POD is generally higher in v06, indicating that the improvement in v08 does not come from a simple increase in hit rate but rather from a more balanced event detection performance. The long-term interannual analysis and the monthly scatterplots also show that Gauge remains closest to the observations and most stable in both versions. GNRT and MVK are generally closer to the observations in v08, especially under typical precipitation conditions, whereas NRT still shows stronger fluctuations, wider dispersion and persistent positive bias. Overall, compared with v06, v08 is generally better in terms of consistency, error control and overall stability, but this improvement is better understood as a structured optimization rather than a uniform enhancement across all products, metrics and precipitation conditions.
4.4. Seasonal Characteristics of GSMaP v06 and v08
To evaluate the seasonal performance of GSMaP over the Qinghai–Tibetan Plateau, this study compares four products (Gauge, GNRT, NRT and MVK) from versions v06 and v08 across the four standard meteorological seasons: spring (March–May), summer (June–August), autumn (September–November) and winter (December–February). At the seasonal scale, the Distance between Indices of Simulation and Observation (DISO) is used as a composite error metric to rank the overall performance of different versions and products in each season (Figure 9). This method integrates correlation, bias and error magnitude into a unified framework and uses a single value to represent the distance from the ideal state; a smaller DISO value indicates better agreement with observations and better overall performance [57,59].
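As an illustration of how DISO condenses correlation, bias and error magnitude into one distance, the sketch below uses one commonly cited form, DISO = sqrt((r − 1)^2 + NAE^2 + NRMSE^2), where NAE is the mean error and NRMSE the RMSE, both normalized by the observed mean. The exact normalization used in this study is not stated in this section, so this form is an assumption, as are the function name and the example data.

```python
import math


def diso(sat, obs):
    """Distance between Indices of Simulation and Observation (assumed form).

    The ideal point is r = 1, NAE = 0, NRMSE = 0, so DISO = 0 means a
    perfect match and larger values mean poorer overall agreement.
    """
    n = len(obs)
    mean_s = sum(sat) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sat, obs))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / math.sqrt(var_s * var_o)
    nae = (sum(s - o for s, o in zip(sat, obs)) / n) / mean_o
    nrmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sat, obs)) / n) / mean_o
    return math.sqrt((r - 1.0) ** 2 + nae ** 2 + nrmse ** 2)


# Hypothetical series: the satellite values carry a constant +1 mm/day offset.
obs_diso = [1.0, 2.0, 3.0]
sat_diso = [2.0, 3.0, 4.0]
diso_offset = diso(sat_diso, obs_diso)

# A second hypothetical series with varying values, compared with itself.
diso_perfect = diso([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])
```

The offset example has perfect correlation, yet its DISO is well above zero because the bias and error terms both contribute, which shows why a DISO ranking can differ from a ranking by CC alone.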
According to Figure 9 and Table S3, GSMaP precipitation estimates over the Qinghai–Tibetan Plateau show clear seasonal differences. Overall performance is best in summer, followed by autumn, while spring shows larger variability and winter performs worst. This suggests that seasonal conditions exert a strong influence on product performance. One important reason is that precipitation type, cloud structure and surface background change markedly across seasons. In the warm season, precipitation is more often liquid and associated with stronger convective activity, which usually produces clearer microwave and infrared signals and helps improve consistency and event detection. In the cold season, by contrast, solid or mixed-phase precipitation, lower temperatures, snow cover and more complex land-surface conditions tend to increase retrieval uncertainty, resulting in larger errors, higher false alarms and weaker detection skill.
More specifically, spring precipitation is generally weak and unevenly distributed, so consistency remains relatively low for all products. The CC is mostly in the range of 0.1–0.4, while the POD and CSI are often below 0.4 and the FAR remains comparatively high. Gauge still performs best overall, with the CC reaching about 0.7, but the differences between versions are small, suggesting that version updates provide only limited improvement under light-precipitation conditions. Summer is the best-performing season. The POD is usually above 0.6 and the CSI increases to about 0.4–0.6, indicating that warm-season precipitation is more effectively detected by satellite and shows the strongest agreement with observations. Gauge remains the most stable product in terms of correlation and event detection and GNRT and MVK also show some improvement in v08. However, RMSE and RB remain relatively high in summer, with RMSE commonly in the range of 30–60 mm/day, suggesting that magnitude errors during heavy rainfall are still not fully resolved. Autumn is a transition season, and its performance generally lies between summer and winter. Some non-gauge-adjusted products show increases of about 0.05–0.10 in the CC and CSI in v08, but RB does not decrease accordingly, and bias still exceeds 20% for some products. Winter shows the weakest overall performance. For most products, the CC falls below 0.2, the CSI and ETS approach 0, and the FAR and RB increase markedly; RB can even exceed 100% for some unadjusted products. Overall, Gauge remains the most stable product across all seasons, GNRT and MVK show some improvement in the warm season, and NRT still exhibits relatively large uncertainty in winter and overall. More detailed seasonal metric values are provided in Table S3.
In terms of product type, Gauge shows the strongest and most stable performance in all seasons, highlighting the importance of gauge correction for improving the consistency and stability of precipitation estimates over the Qinghai–Tibetan Plateau. GNRT and MVK show intermediate performance and more evident improvement in the warm season, whereas NRT has the greatest uncertainty overall, especially in winter. In terms of version differences, v08 outperforms v06 in some seasons, mainly through improved consistency and better false alarm control in spring and autumn, while the improvement in winter is limited and the difference in summer is relatively small.
Overall, GSMaP performance over the Qinghai–Tibetan Plateau shows strong seasonal dependence. Products generally perform better in the warm season, when liquid precipitation dominates and precipitation signals are clearer, whereas errors increase substantially in the cold season because of solid precipitation and complex surface conditions. Although version updates help improve stability and consistency in some seasons, they are still not sufficient to overcome fully the limitations imposed by winter conditions and the complex high-elevation environment. Therefore, when applying satellite precipitation products over the Qinghai–Tibetan Plateau, product type and seasonal context should be considered first, rather than relying only on the latest version.
4.5. Spatial Variability of GSMaP v06, v07 and v08
Two diagnostic approaches are used to examine GSMaP performance over the complex terrain of the Qinghai–Tibetan Plateau: station-based spatial pattern analysis (
Figure 10) and elevation-stratified statistical analysis (
Figure 12). First, station-level evaluation metrics are mapped to reveal long-term mean performance and spatial heterogeneity across the Plateau. Second, stations are divided into three elevation classes: low elevation (<2500 m, 24 stations), middle elevation (2500–3500 m, 37 stations) and high elevation (>3500 m, 25 stations). This allows an analysis of how performance metrics vary with elevation and helps identify the systematic influence of topography on precipitation retrieval. In addition, elevation-stratified boxplots of precipitation amounts are used to compare how satellite products and ground observations represent rainfall magnitude under different elevation conditions. Together, these approaches examine the spatial characteristics of GSMaP performance from multiple perspectives, including geographic location, elevation gradient and precipitation structure, and help identify terrain-related factors affecting precipitation estimates [
61,
62].
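The elevation stratification described above can be sketched as follows. The class boundaries follow the text (<2500 m, 2500–3500 m, >3500 m); the function names, the dictionary layout and the station values are hypothetical and serve only to illustrate the grouping step.

```python
from collections import defaultdict
from statistics import median


def elevation_class(elevation_m):
    """Map a station elevation (m) to the three classes used in the text."""
    if elevation_m < 2500:
        return "low"
    if elevation_m <= 3500:
        return "middle"
    return "high"


def median_by_class(stations, metric):
    """Median of one metric per elevation class.

    stations: list of dicts with an 'elevation_m' key and metric keys.
    """
    grouped = defaultdict(list)
    for station in stations:
        grouped[elevation_class(station["elevation_m"])].append(station[metric])
    return {cls: median(values) for cls, values in grouped.items()}


# Hypothetical stations, for illustration only (not values from the study).
stations = [
    {"elevation_m": 2200, "CC": 0.62},
    {"elevation_m": 2450, "CC": 0.58},
    {"elevation_m": 3100, "CC": 0.47},
    {"elevation_m": 3500, "CC": 0.43},
    {"elevation_m": 4300, "CC": 0.31},
]
cc_by_class = median_by_class(stations, "CC")
```

The same grouping can be reused for any station-level metric, which is what a per-class boxplot summarizes graphically.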
To more clearly reveal the spatial performance of different GSMaP products and versions over the Qinghai–Tibetan Plateau, this study first constructed a Composite Performance Index (CPI) based on multiple station-derived evaluation metrics and used it for the spatial display in
Figure 10. The CPI integrates information on correlation, error, consistency and precipitation event detection, and can therefore summarize the overall performance level at each station. Compared with any single metric, the CPI makes it easier to identify areas of relatively stronger and weaker performance among products and versions. To further examine the spatial characteristics associated with version updates,
Figure 11 presents the spatial distribution of the differences in the individual metrics between v08 and v06, so as to illustrate the relative improvement or decline of different products at the station scale and to provide additional explanation for the overall spatial pattern reflected in
Figure 10. The specific calculation procedure of the CPI is shown in
Table 2.
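To make the idea of a composite index concrete, the sketch below shows one generic way to combine several station-level metrics into a single 0–1 score. It is not the procedure of Table 2: the choice of metrics, the min-max normalization across stations, the inversion of error-type metrics and the equal weighting are all assumptions made for illustration, as are the names used.

```python
# Orientation of each metric: True if larger values mean better performance.
# The metric list and equal weights are assumptions, not the study's CPI.
HIGHER_IS_BETTER = {
    "CC": True, "KGE": True, "POD": True, "CSI": True,
    "FAR": False, "RMSE": False, "ABS_RB": False,
}


def min_max(values):
    """Rescale values to [0, 1]; a constant series maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(station_metrics):
    """Equal-weight composite score per station.

    station_metrics: {station_id: {metric_name: value}}
    Returns {station_id: score}, where 1 is best and 0 is worst
    relative to the other stations in the input.
    """
    ids = list(station_metrics)
    totals = {sid: 0.0 for sid in ids}
    for metric, higher_better in HIGHER_IS_BETTER.items():
        scaled = min_max([station_metrics[sid][metric] for sid in ids])
        for sid, x in zip(ids, scaled):
            # Flip error-type metrics so that 1 always means "better".
            totals[sid] += x if higher_better else 1.0 - x
    return {sid: totals[sid] / len(HIGHER_IS_BETTER) for sid in ids}
```

Because the score is relative to the stations supplied, it is suited to mapping stronger and weaker areas within one product, and it should be read together with the individual metrics.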
According to
Figure 10, GSMaP shows fairly clear spatial differences over the Qinghai–Tibetan Plateau, especially in the contrast between the eastern and southeastern regions and the high-elevation central and western areas. Overall, stations in the eastern Plateau and along the southeastern margins generally have higher CPI values, suggesting relatively better overall estimation performance in these regions. In contrast, CPI values are generally lower in the high-elevation interior of the central and western Plateau, indicating that these areas remain the main zones of relatively weak GSMaP performance. This spatial pattern is broadly consistent across products and versions, suggesting that terrain conditions, elevation differences and the regional precipitation background may jointly influence the spatial behavior of satellite precipitation retrievals. Compared with v06, v08 shows slightly higher CPI values in some areas, especially in the east and parts of the central Plateau, but these improvements are expressed mainly as local optimization and do not substantially alter the overall spatial pattern.
Figure 11 further reveals how these version-related differences appear in the individual metrics. Overall, the improvements from v06 to v08 show clear spatial heterogeneity, and both the direction and magnitude of change vary across products and metrics. For the CC, KGE′ and the CSI, many stations in the eastern and parts of the central Plateau show positive changes, suggesting some improvement in correlation, overall consistency and integrated event detection performance in these regions. In contrast, improvements are relatively limited in the high-elevation central–western interior, and some stations still show negative changes. The differences in the POD indicate that v08 does not show improvement in all regions; at many stations, the POD is actually lower than in v06, suggesting that improvements in the hit rate are not spatially consistent in the newer version.
Consistent with this pattern, the differences in the FAR are negative over much of the Plateau, with more evident improvement at some eastern and southeastern stations, suggesting that v08 has some advantage in reducing false alarms. By contrast, the differences in RMSE show a more complex spatial pattern. In some relatively wet regions, RMSE in v08 does not decrease clearly and even increases slightly at some stations, indicating that the benefits of the version update for absolute error control are not uniform. The differences in RB also show marked regional variation, and the direction of change is not consistent across products or regions, implying that errors in precipitation magnitude estimation are still influenced by the combined effects of regional precipitation background and terrain conditions.
From the product perspective, Gauge shows a relatively favorable overall spatial pattern in both versions. However, the metric-difference maps in
Figure 11 do not indicate a widespread and consistent improvement of v08 over v06. Several factors may help explain this result. First, the Gauge product is already adjusted using gauge information and appears to have had a relatively high baseline performance in v06, leaving more limited room for further improvement. Second, algorithmic updates introduced in a new version may not translate into comparable gains across all performance dimensions. This may be especially true for gauge-adjusted products, in which differences associated with upstream retrieval improvements could be partly smoothed by the subsequent correction process. Third, the complex terrain of the Qinghai–Tibetan Plateau, regional differences in precipitation processes and the sparse station distribution in high-elevation areas may also constrain how consistently the benefits of version updates are expressed in space. Therefore, in this study, Gauge may be more appropriately interpreted as showing relatively strong cross-version stability, rather than a marked version-related improvement across most metrics.
Overall, GSMaP shows a relatively clear and broadly stable spatial performance pattern over the Qinghai–Tibetan Plateau, with generally better performance in the eastern region and along the southeastern margins, while the high-elevation central and western interior remains a relatively weak-performance zone. Compared with the other products, Gauge shows relatively better overall spatial stability and integrated performance in both versions. However, this apparent advantage should be interpreted with caution, because Gauge is a gauge-corrected product and its better agreement with station observations may partly reflect the influence of gauge-based correction rather than fully independent skill alone. At the same time, the improvements associated with the version update are not spatially uniform, nor are they expressed as consistent enhancement across all regions and all metrics. This is particularly evident in the high-elevation interior, where positive changes related to the version update remain relatively limited, suggesting that achieving more substantial performance gains under complex terrain and sparse-station conditions may still be challenging [
61,
63].
The elevation-based analysis further supports the patterns described above.
Figure 12 and
Figure S4 show that, with increasing elevation, the performance of most GSMaP products generally tends to weaken, although the magnitude of change is not fully consistent across products or metrics. Therefore, these elevation-related differences should be interpreted with caution, as they may be associated not only with elevation itself but also with the combined effects of station distribution, regional precipitation background and complex terrain conditions [
64].
Among the three elevation classes, the <2500 m group performs relatively well overall. In this group, most products generally show higher CC, POD, CSI and KGE′ values, together with a relatively lower FAR and RMSE, suggesting that consistency, error control and event detection are generally better in low-elevation areas. However, some dispersion is still evident in the boxplots of all products, indicating that inter-station differences remain noticeable. The product differences become clearer in the 2500–3500 m group. In this elevation band, the CC, POD, CSI and KGE′ for most products decrease relative to the low-elevation group, while the RB and RMSE show a wider range of variation, suggesting that retrieval performance is relatively less stable in this elevation range. In the >3500 m group, the performance of some products weakens further. This is particularly evident for the products without gauge correction, which generally show weaker correlation, consistency and event detection, as well as more dispersed result distributions, reflecting greater uncertainty [
65].
The difference between Gauge and the other products is especially evident.
Figure 12 shows that, across all three elevation classes, Gauge generally has higher CC, POD, CSI and KGE′ values, along with a relatively lower FAR and RMSE, and its boxplots are overall more compact. This indicates closer agreement with ground observations and relatively higher stability. This advantage remains evident in the middle- and high-elevation zones, suggesting that gauge correction may, to some extent, help reduce the adverse effects associated with complex terrain and enhanced precipitation variability. By contrast, NRT, GNRT and MVK tend to show wider result distributions and greater uncertainty at higher elevations, with NRT and MVK displaying more pronounced dispersion in some metrics.
From the perspective of version comparison, v08 shows some improvement over v06 in certain elevation bands and for some metrics, with the clearest gains appearing in the Gauge product. For example, Gauge in v08 generally shows higher KGE′, CC and CSI values, together with a lower FAR, across several elevation classes. Some improvement is also found in the other products, but these changes do not appear consistently across all elevation bands and all metrics, suggesting that the gains associated with the version update are somewhat condition-dependent.
Overall, GSMaP over the Qinghai–Tibetan Plateau shows a relatively stable elevation-related performance pattern: low-elevation areas generally perform better, whereas higher-elevation areas, especially those under complex terrain conditions, remain regions of relatively weaker performance. Compared with the other products, Gauge consistently performs better across different elevation bands, further highlighting the importance of gauge correction. By contrast, the improvements associated with version updates alone appear to be relatively limited and are not always very clear at high elevations.
4.6. Precipitation Intensity Analysis
To compare how different GSMaP versions and products describe rainfall intensity over the QTP, this section uses daily station precipitation data from 2001 to 2022 and focuses on the distribution of rainfall intensity classes. Since precipitation in this region is often light and intermittent, the distribution of rainfall intensity provides a direct way to examine how satellite products detect and quantify precipitation. The 1 mm/day threshold defined earlier is used only for the event detection metrics. By contrast, the intensity analysis in this section compares the rainfall intensity structure using four descriptive intervals of the daily precipitation amount p: 0 ≤ p ≤ 1 mm/day, 1 < p ≤ 5 mm/day, 5 < p ≤ 10 mm/day and p > 10 mm/day.
Figure 13 compares the proportions of these four intervals for GSMaP versions (v06 and v08) and products (MVK, Gauge, NRT and GNRT) against ground observations [
66]. Overall, the 0 ≤
p ≤ 1 mm/day interval accounts for the largest proportion over the QTP, followed by the 1 <
p ≤ 5 mm/day interval, whereas the 5 <
p ≤ 10 mm/day and
p > 10 mm/day intervals contribute much less. All GSMaP products reproduce this broad intensity structure, suggesting that they capture the main pattern of regional rainfall intensity distribution.
However, clear differences among versions and products remain across the intensity intervals. Most GSMaP products underestimate the 0 ≤ p ≤ 1 mm/day interval and overestimate the 1 < p ≤ 5 mm/day and 5 < p ≤ 10 mm/day intervals, suggesting a shift in the rainfall contribution structure from weaker to moderate intensities. This pattern is more evident in v06 and v07. By contrast, v08 shows a larger proportion in the 0 ≤ p ≤ 1 mm/day interval and a distribution that is generally closer to the observations, suggesting some improvement in the representation of weaker precipitation.
Among the products, Gauge and GNRT generally show intensity distributions that are closer to the observations, whereas MVK and NRT tend to assign a larger proportion to the 1 < p ≤ 5 mm/day and 5 < p ≤ 10 mm/day intervals. For the p > 10 mm/day interval, all GSMaP products tend to underestimate its contribution, and the differences between versions remain relatively small. This indicates that representing the highest daily precipitation interval remains difficult under the complex terrain conditions of the QTP.
As shown in
Figure 14, the contribution–bias patterns across rainfall intensity intervals are broadly similar among the different GSMaP products, although the bias magnitude and its response to version changes still differ by product [
60]. Overall, the 0 ≤
p ≤ 1 mm/day and 1 <
p ≤ 5 mm/day intervals both show positive bias, indicating that their contributions are generally overestimated. The positive bias is about 10–18% for the 0 ≤
p ≤ 1 mm/day interval and about 8–10% for the 1 <
p ≤ 5 mm/day interval. By contrast, the 5 <
p ≤ 10 mm/day and p > 10 mm/day intervals both show negative bias, indicating underestimation of the contribution from higher-intensity precipitation. The negative bias is relatively small for the 5 <
p ≤ 10 mm/day interval, usually around 3–7%, whereas the underestimation is strongest for the
p > 10 mm/day interval, with negative bias of about 15–20%. This suggests an overall shift in rainfall contribution from higher-intensity intervals toward lower-intensity intervals.
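One plausible way to compute a contribution bias of this kind is sketched below. It assumes that the contribution of an interval is its share of the total precipitation amount and that the bias is the satellite share minus the observed share, in percentage points. The study's exact definition is not given in this section, so the formula and the names are assumptions.

```python
UPPER_EDGES = (1.0, 5.0, 10.0)  # closed upper bounds (mm/day)
LABELS = ("0-1", "1-5", "5-10", ">10")


def contribution_shares(daily_mm):
    """Percentage of total precipitation contributed by each interval."""
    totals = [0.0] * len(LABELS)
    for p in daily_mm:
        # Number of edges exceeded gives the interval index (0 to 3).
        index = sum(p > edge for edge in UPPER_EDGES)
        totals[index] += p
    grand_total = sum(totals)
    if grand_total <= 0:
        raise ValueError("series contains no precipitation")
    return {label: 100.0 * t / grand_total
            for label, t in zip(LABELS, totals)}


def contribution_bias(obs_mm, sat_mm):
    """Satellite minus observed contribution, in percentage points."""
    obs_share = contribution_shares(obs_mm)
    sat_share = contribution_shares(sat_mm)
    return {label: sat_share[label] - obs_share[label] for label in LABELS}
```

Under this definition the biases across the four intervals sum to zero, so an overestimated contribution in one interval is necessarily offset by underestimation elsewhere.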
From a version perspective, the bias values in v08 are generally smaller than those in v06 for most intervals. For the 0 ≤ p ≤ 1 mm/day and 1 < p ≤ 5 mm/day intervals, the positive bias in v08 is overall lower than in v06, typically by about 1–3%. For the 5 < p ≤ 10 mm/day and p > 10 mm/day intervals, the negative bias in v08 is also reduced, with the clearest improvement seen in the p > 10 mm/day interval, although the underestimation there remains evident overall. Even so, the general bias structure remains similar in both versions.
Differences among products are also apparent. Gauge and GNRT show relatively smaller biases across all intervals, suggesting that their rainfall intensity structures are closer to the observations. By contrast, MVK and NRT show larger positive bias in the 0 ≤ p ≤ 1 mm/day interval and stronger negative bias in the p > 10 mm/day interval, suggesting that these two products are more likely to overestimate the contribution of weaker precipitation while underestimating the contribution of the highest-intensity precipitation.
Figure 15 compares the station-scale estimates of extreme precipitation indices from different GSMaP products in v06 and v08. The gray boxplots represent ground observations (OBS) and are used as a reference [
67]. The indices include relative bias and several extreme precipitation metrics based on percentiles, intensity and frequency. The results show clear differences among products and the error pattern changes with index type.
The extreme precipitation indices considered are the high-percentile precipitation amounts (R95p and R99p), the maximum 1-day and 5-day precipitation amounts (Rx1day and Rx5day) and the annual number of heavy precipitation days exceeding 10 mm and 20 mm (R10 and R20). Here, R95p and R99p denote the accumulated precipitation from days exceeding the station-specific 95th and 99th percentile thresholds of wet-day precipitation, respectively.
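The indices defined above can be computed from one station's daily series as in the following sketch. Several choices are assumptions: a wet day is taken as at least 1 mm, the percentile thresholds are estimated from the same series by linear interpolation rather than from a separate base period, R10 and R20 count days with at least 10 mm and 20 mm (the common ETCCDI convention), and the input is treated as a single year.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0-100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def extreme_indices(daily_mm, wet_day_mm=1.0):
    """Extreme precipitation indices for one year of daily data (mm/day)."""
    wet = [p for p in daily_mm if p >= wet_day_mm]
    if not wet:
        raise ValueError("no wet days in the series")
    p95 = percentile(wet, 95)
    p99 = percentile(wet, 99)
    # Largest total over any window of up to 5 consecutive days.
    n_windows = max(1, len(daily_mm) - 4)
    rx5day = max(sum(daily_mm[i:i + 5]) for i in range(n_windows))
    return {
        "R95p": sum(p for p in wet if p > p95),   # total from very wet days
        "R99p": sum(p for p in wet if p > p99),   # total from extremely wet days
        "Rx1day": max(daily_mm),
        "Rx5day": rx5day,
        "R10": sum(1 for p in daily_mm if p >= 10.0),
        "R20": sum(1 for p in daily_mm if p >= 20.0),
    }
```

Running the function on the gauge series and on each product's series at the same station gives paired index values from which a relative bias can be derived.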
Panel (a) shows that, compared with OBS, most satellite products display positive bias in extreme precipitation indices, meaning that extremes are often overestimated. The bias is generally larger for intensity-based indices (Rx1day and Rx5day), commonly reaching 50–150%, while frequency-based indices (R10 and R20) usually fall within 20–80%. This indicates that satellite products tend to amplify rainfall intensity rather than increase the number of events. The Gauge product shows the smallest bias, with most indices within 30%, and is closer to OBS, reflecting the effect of gauge adjustment. Compared with v06, v08 shows reduced bias for most indices, with typical decreases of 10–30% for GNRT and MVK, indicating clearer improvement.
Panel (b) presents the station distributions of R95p and R99p, which represent the contributions of heavy and very heavy precipitation to total rainfall. Relative to OBS, NRT and MVK show higher medians and wider spreads, with R95p often exceeding observations by 500–1500 mm, suggesting enhanced high-percentile contributions. GNRT and Gauge show more compact distributions and medians closer to observations. R99p values are lower than R95p, but their boxes and whiskers are longer, indicating that more extreme events are more sensitive to retrieval error. Compared with v06, v08 shows narrower ranges and fewer outliers, suggesting improved stability.
Panel (c) shows the station distributions of Rx1day and Rx5day. For all products, Rx5day has higher medians and larger spread than Rx1day, with median values often 2–3 times larger, indicating stronger spatial variability for persistent heavy rainfall. Most satellite products overestimate both indices relative to OBS and show many high-value outliers, while GNRT and Gauge have tighter boxes. Compared with v06, v08 shows slightly lower medians and smaller interquartile ranges, but uncertainty for persistent heavy rainfall remains relatively high.
Panel (d) indicates that differences among products are smaller for frequency-based indices than for intensity-based ones. Compared with OBS, most satellite products still tend to overestimate the number of heavy rainfall days, with R10 often higher by 20–40 days. When the threshold increases to R20, product differences become clearer and GNRT and Gauge show more concentrated and stable distributions. Compared with v06, median values in v08 are generally lower for both indices, suggesting some reduction in overestimated frequency.
In general, GSMaP products can represent the main statistical features of station-scale extreme precipitation. However, large uncertainty remains for intensity-based and high-percentile indices. Compared with non-adjusted products, the Gauge product is closer to OBS and shows better stability and consistency. In addition, v08 performs better than v06 for many extreme indices. However, for very intense or long-lasting precipitation events, accurate representation is still limited, and version updates bring only modest improvement.
5. Discussion
5.1. Interpretation of GSMaP Version Performance over the QTP
Results from the spatial, elevation-based, rainfall intensity and multi-time-scale analyses show that GSMaP performance over the Qinghai–Tibetan Plateau varies across regions. From v06 to v08, most products show some improvement in correlation, error control and temporal stability [
43]. However, this improvement does not appear evenly across the Plateau. The elevation classes and rainfall intensity intervals referred to in this discussion follow the thresholds and grouping framework used in the earlier analyses, so that results from different parts of the study can be compared more directly.
At the spatial scale, clearer improvement is mainly found in the eastern Plateau and the southeastern margins, where rainfall is more continuous and mostly liquid and the rain gauge network is denser [
68]. Under these conditions, algorithm updates, such as improved microwave retrievals, multi-sensor merging, infrared cloud tracking and gauge correction, help improve consistency and stability at both station and regional scales. In contrast, performance remains lower in the high-elevation interior and western Plateau. Differences between versions are also small there. This suggests that complex terrain, mixed precipitation types and limited ground observations still restrict improvements in satellite precipitation estimates [
64]. Therefore, the improvements from version updates are more evident in warm and wet regions than in cold, high-elevation areas, mainly because precipitation processes and retrieval difficulty differ between environments. In warm and wet regions, precipitation is usually dominated by liquid rainfall, the rainfall process is more continuous, and the cloud structure and microwave/infrared signals are relatively clear. Under these conditions, algorithm updates are more likely to lead to better consistency and lower errors. In contrast, in cold and high-elevation areas, solid or mixed-phase precipitation is more common, the land surface is more complex, and cloud systems are often shallow with weak precipitation intensity. These factors increase the uncertainty of satellite retrievals and weaken the improvements brought by version updates.
These results also offer some insight into algorithm evolution. At the current stage, it is still difficult to isolate which single algorithmic modification matters most, because the differences among GSMaP versions likely reflect the combined effects of multiple updates. However, the fact that improvements are more evident in warm and relatively wet regions suggests that updates related to precipitation detection, multi-sensor merging, error control and the treatment of liquid precipitation processes may play a relatively important role. By contrast, the still limited gains in cold, high-elevation and complex terrain areas imply that uncertainties related to solid precipitation, mixed-phase precipitation and complex surface backgrounds remain less fully resolved.
More broadly, the results suggest that algorithm evolution should not be understood simply as a uniform increase in accuracy from one version to the next. Instead, version-related changes appear to be conditional, with improvements depending on environmental setting, precipitation regime and performance dimension. In this sense, the value of multi-version comparison lies not only in identifying whether a newer version performs better but also in showing where current algorithm development appears more effective and where important limitations still remain.
These performance limitations are also related to the physical characteristics of precipitation processes in high-altitude regions. Over much of the Qinghai–Tibetan Plateau, precipitation is frequently dominated by solid or mixed-phase precipitation, particularly during winter and transitional seasons. Ice-phase microphysical processes, such as ice crystal growth and aggregation, can modify the particle size distribution and phase structure of precipitation, which affects the radiative signals detected by satellite sensors. In addition, lower temperatures and limited atmospheric moisture often lead to relatively shallow cloud systems and weaker precipitation intensity. These conditions increase the uncertainty of passive microwave and infrared retrievals and may partly explain the tendency of satellite products to overestimate light precipitation and underestimate heavier rainfall in high-elevation areas.
In terms of time scale, the long-term analysis, in which v06 and v08 can be compared over multiple years, shows that v08 has better interannual stability and long-term consistency than v06. Seasonal results show that GSMaP performs better in warm seasons (spring, summer and autumn) than in winter [
69]. Version updates lead to clearer improvements in warm seasons, especially for consistency and error control. In winter, performance differences between versions are smaller. This is likely linked to solid precipitation and complex surface conditions. Monthly analysis further shows that v08 agrees better with observations for low-to-moderate rainfall. For months with high rainfall, systematic underestimation still appears [
45].
In terms of rainfall intensity, all versions describe light rainfall in a relatively stable way, whereas bias remains for moderate rainfall and strong, long-lasting rainfall is often underestimated. Product type plays an important role here: products with gauge correction are more stable across intensity ranges and closer to observations. This shows that ground data help control rainfall magnitude and reduce systematic bias [
38]. This also shows that different products do not benefit from version updates in the same way. In general, gauge-corrected products often show more obvious improvement after version updates. This suggests that when algorithm updates work together with external gauge constraints, they are more helpful for improving consistency, error control and overall stability. In contrast, products that mainly rely on satellite retrievals or near-real-time processing are more sensitive to complex terrain, changes in precipitation phase and sparse observations. As a result, their improvements are usually more limited and less stable.
Near-real-time (NRT) products respond more strongly to algorithm changes and local rainfall conditions. As a result, their performance varies more across regions, rainfall intensities and time scales. This study also includes GSMaP v05 and v07, but they are used differently. The v05 products are less complete, and some cannot be directly compared with later versions; for this reason, v05 is not used for deeper multi-scale analysis. Because of limits in gauge data coverage, v07 is compared with v06 and v08 only over their common time period, to examine short-term algorithm changes rather than long-term performance.
Overall, version updates are associated with some improvement in GSMaP performance over the Qinghai–Tibetan Plateau [
65], but the extent of this improvement is clearly influenced by terrain conditions, precipitation characteristics, time scale and product design. Limitations remain relatively pronounced under strong rainfall conditions, during localized events and at finer scales. These results may provide a useful reference for hydrological modeling and climate research over the Qinghai–Tibetan Plateau. Differences among products in estimating precipitation amount, maintaining consistency and detecting precipitation events may, to some extent, affect watershed runoff simulations, flood risk assessments and long-term hydro-climatic analyses. Therefore, in practical applications, the selection of satellite precipitation products should take into account the characteristics of the study region, precipitation regime and specific research objectives, rather than relying only on version updates. At the same time, this study also helps improve understanding of the uncertainties associated with satellite precipitation estimates in high-altitude regions and may provide a useful reference for future improvement of precipitation retrieval algorithms under complex terrain and high-elevation conditions.
5.2. Implications for the Application of GSMaP
This study builds a simple scoring matrix from the quantitative evaluation results to compare the overall performance of different GSMaP versions and products across several application scenarios. For each evaluation dimension, including precipitation intensity classes, extreme precipitation, seasonal changes, elevation zones and application-related factors, the performance of each product is summarized using key metrics and converted to a unified 0–1 scale, where higher scores indicate better performance. The scores are organized into a matrix and shown as a heatmap with a continuous color scale. This design makes the performance differences between products and versions easy to see across all evaluation dimensions.
The scoring results are shown in
Figure 16. Clear differences appear among GSMaP versions and products across the evaluated dimensions. This result shows that no single product performs best under all conditions. Gauge-adjusted products perform well in most categories, especially for extreme precipitation, seasonal scales and different elevation ranges. This indicates improved quantitative agreement and enhanced stability. Among the products, v08-Gauge attains the highest scores and shows the strongest performance in general evaluation, extreme precipitation detection and across different environmental conditions.
The NRT product shows the highest score for near-real-time use, which reflects its value for operational monitoring and real-time applications. At the same time, its lower scores for precipitation intensity structure and quantitative accuracy suggest that it should be used carefully in detailed analyses and hydrological applications. The GNRT and MVK products show performance between NRT and gauge-adjusted products [
70]. They perform well under some intensity ranges and seasonal conditions, but their overall stability and scores are lower.
The scoring matrix also shows clear effects of season and terrain. Performance in warm seasons (spring, summer and autumn) is generally better than in winter. Scores at low and middle elevations are usually higher than those at high elevations. Uncertainty remains high in high-elevation plateau areas. These performance differences may be partly related to the physical processes of precipitation formation in high-altitude environments. In many high-elevation areas of the Qinghai–Tibetan Plateau, precipitation is often dominated by solid or mixed-phase precipitation, where ice-phase microphysical processes—such as ice crystal growth and aggregation—play an important role. These processes can change the size and phase structure of precipitation particles, which affects the radiative signals detected by satellite sensors. In addition, lower temperatures and limited moisture at high elevations often lead to shallow clouds and weaker precipitation, increasing the uncertainty of passive microwave and infrared retrievals. For this reason, GSMaP products should be chosen and interpreted based on the study area, season and research purpose.
5.3. Uncertainties and Limitations
This study still has some limitations. First, due to natural and observational constraints, rain gauge stations over the Qinghai–Tibetan Plateau are very unevenly distributed. Most stations are located in the eastern and southeastern low-elevation areas, while observations in the interior and high-elevation regions are relatively scarce. As a result, the evaluation results tend to represent areas with denser observations better, and the uncertainties of satellite precipitation estimates in data-sparse regions may not be fully captured. In addition, although the available stations provide valuable long-term observations across the Plateau, the limited number of gauges and their uneven spatial distribution may affect the overall spatial representativeness of station-based evaluations. This means that some regional characteristics, particularly in remote high-elevation areas, may not be fully reflected in the current assessment.
Second, different GSMaP versions have different time coverage and starting years, which makes direct cross-version comparisons more difficult. Although a common study period was selected to reduce the influence of time-scale differences, variations in record length and associated climate background may still affect the comparability of the results. In particular, the data coverage of v05 and v07 is relatively limited. Therefore, they are more suitable for supplementing the background of version evolution and showing transitional changes between versions, rather than for long-term comparisons at the same depth as v06 and v08.
Third, the analysis of precipitation intensity mainly relies on long-term statistics and focuses on overall distributions and mean characteristics. As a result, the spatiotemporal evolution of individual extreme precipitation events during their onset, development and decay is not fully represented, limiting the assessment of dynamic satellite errors at the event scale. In addition, the composite evaluation indices used in this study depend on the selection and normalization of multiple individual metrics. Since these metrics differ in physical meaning and sensitivity, the composite results should be interpreted together with individual indicators and relevant physical processes.
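To illustrate why composite results depend on the selection and normalization of the individual metrics, the following minimal Python sketch builds a composite score from min-max normalized metrics. The function names, the use of min-max scaling, the weights and all numbers are assumptions made for illustration only; they are not the index definition or the results of this study.

```python
import numpy as np

def minmax(values, higher_is_better=True):
    """Scale one metric across products to [0, 1], with 1 as the best value."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        # No spread across products: this metric cannot rank them.
        return np.full_like(v, 0.5)
    scaled = (v - v.min()) / span
    return scaled if higher_is_better else 1.0 - scaled

def composite_score(metrics, orientation, weights=None):
    """Weighted mean of the normalized metrics; returns one score per product."""
    names = list(metrics)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    score = sum(weights[name] * minmax(metrics[name], orientation[name])
                for name in names)
    return score / total

# Hypothetical values for three hypothetical products (NOT results of this study).
metrics = {
    "CC": [0.70, 0.55, 0.60],    # correlation: higher is better
    "RMSE": [4.0, 6.5, 5.0],     # error in mm/day: lower is better
    "CSI": [0.45, 0.40, 0.50],   # detection skill: higher is better
}
orientation = {"CC": True, "RMSE": False, "CSI": True}

equal = composite_score(metrics, orientation)
csi_heavy = composite_score(metrics, orientation,
                            weights={"CC": 1.0, "RMSE": 1.0, "CSI": 4.0})

print("equal weights:", np.round(equal, 3), "best product index:", int(np.argmax(equal)))
print("CSI-weighted: ", np.round(csi_heavy, 3), "best product index:", int(np.argmax(csi_heavy)))
```

With equal weights the first hypothetical product ranks highest, whereas weighting CSI more heavily moves the third product to the top. This sensitivity is the reason composite scores are best read alongside the individual indicators.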
Moreover, the analysis is primarily based on daily precipitation data and does not further distinguish precipitation phase, leading to limited assessment of solid precipitation such as snowfall and mixed rain–snow events. Over high-elevation areas of the Qinghai–Tibetan Plateau, solid precipitation contributes substantially to annual precipitation and regional hydrological processes. However, satellite retrievals still show considerable uncertainty in identifying snowfall and mixed-phase precipitation. Because passive microwave and infrared remote sensing have limited capability under low precipitation rates and ice-phase conditions, the results of this study mainly reflect the overall performance of precipitation estimates. A more detailed assessment of solid precipitation processes would require additional information on precipitation phase as well as data with higher temporal resolution.
It should also be noted that the Gauge and GNRT products used in this study are both gauge-corrected. Based on currently available public information, we cannot confirm whether the gauge data used in their correction overlap, directly or indirectly, with the CMA station network used here. The comparison between these two products and ground observations therefore cannot be treated as a strictly independent external validation: their better agreement with station observations may reflect not only product skill but also, to some extent, the gauge-based correction itself. Accordingly, the results for Gauge and GNRT should be interpreted with additional caution. The present comparison is better suited to evaluating their practical performance under gauge-constrained conditions than to drawing a strictly independent conclusion about the intrinsic superiority of these products over non-gauge-corrected estimates.
Finally, this study does not separate precipitation characteristics under different climate regimes or large-scale circulation conditions. Precipitation over the Plateau is jointly influenced by the Indian monsoon, the westerlies and local thermal processes, with the dominant climate drivers varying across seasons and years. These changes in climate background may influence satellite retrieval performance and the associated error patterns. Accordingly, the results mainly represent long-term average behavior, and their applicability under specific climate anomalies or particular circulation conditions remains to be tested [71].
6. Conclusions
This study focuses on the Qinghai–Tibetan Plateau and evaluates the precipitation estimation performance of several GSMaP versions (v05–v08) and products (Gauge, GNRT, MVK and NRT) at the station scale. By matching satellite estimates with ground-based daily observations, we built an evaluation framework that includes quantitative consistency metrics and a precipitation event detection indicator. We also analyzed precipitation intensity classes and their structure. This allows the comparison of GSMaP products from several aspects, including overall performance, spatial and seasonal variation and rainfall intensity features. The main findings are summarized below.
(1) GSMaP performance generally improves in later versions, but the level of improvement depends on region and product type.
From v06 to v08, GSMaP products show some improvement in correlation, composite performance and temporal stability. Statistical significance tests further indicate that these improvements are evident for some metrics but not consistently significant across all indicators and products. These improvements are more visible along the eastern and southeastern parts of the Plateau. In the interior and high-elevation areas, performance remains relatively low for all versions, and gains from algorithm updates are limited. This suggests that complex terrain, varied land-surface conditions and a climate with frequent light precipitation still affect the accuracy of satellite rainfall estimates.
(2) Product type has a clear influence on performance, and gauge-adjusted products tend to be more stable.
Across most metrics and time scales, the Gauge product shows higher correlation, smaller errors and more reliable event detection, which highlights the role of gauge adjustment in improving GSMaP performance over the Qinghai–Tibetan Plateau. In contrast, the NRT, GNRT and MVK products respond more strongly to regional conditions and version changes and show larger variation, especially in areas with light precipitation and complex terrain. The statistical tests also indicate that some of these product differences are significant, although the significance level varies with the metric considered.
(3) GSMaP performance changes with season, with better results in warm seasons than in cold seasons.
In summer and autumn, when precipitation is stronger and more continuous, all products show better event detection, higher POD and CSI values and clearer differences between versions. In winter, light and solid precipitation occur more often. During this time, correlation decreases and false alarms increase, which lowers detection performance. Version updates do not show clear or consistent improvement in winter.
(4) GSMaP products show a similar pattern of spatial differences across the Plateau.
Spatial analysis shows that performance is generally better in the eastern and southeastern Plateau than in the central and western interior. Performance is also better in wetter areas than in drier ones. This pattern appears in all versions, which suggests that terrain complexity, moisture conditions and precipitation type play an important role in shaping GSMaP accuracy over time.
(5) Rainfall intensity analysis suggests that version updates are associated with some improvement in rainfall intensity representation, but underestimation remains evident in the higher-intensity intervals.
In general, GSMaP versions reproduce the lower-intensity intervals (0–1 mm/day and 1–5 mm/day) relatively well, whereas the 5–10 mm/day and >10 mm/day intervals are still generally underestimated. Compared with v06, v07 and v08 are generally closer to gauge observations in some intensity intervals, indicating partial improvement in rainfall intensity representation. However, density scatterplots, fitted slopes below 1 under higher precipitation conditions and the large spread at high intensities all suggest that underestimation and instability remain evident for stronger rainfall. This limitation is particularly pronounced over the Qinghai–Tibetan Plateau.
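The intensity-interval comparison in finding (5) and the detection scores (POD, false alarms, CSI) in finding (3) can be sketched in Python as follows. The interval edges follow the classes named in the text, and POD, FAR and CSI use the standard contingency-table definitions. The left-closed interval convention, the inclusion of dry days in the lowest interval, the 0.1 mm/day rain/no-rain threshold, the function names and the sample values are assumptions made for illustration, not the settings or data of this study.

```python
import numpy as np

# Inner edges (mm/day) of the classes 0-1, 1-5, 5-10 and >10.
# Assumption: intervals are left-closed, so 10.0 falls in the ">10" class.
INNER_EDGES = [1.0, 5.0, 10.0]
LABELS = ["0-1", "1-5", "5-10", ">10"]

def interval_shares(daily_mm):
    """Fraction of valid days in each intensity interval (dry days count as 0-1)."""
    x = np.asarray(daily_mm, dtype=float)
    x = x[np.isfinite(x) & (x >= 0)]           # drop missing or negative values
    idx = np.digitize(x, INNER_EDGES)          # class index 0..3 for each day
    counts = np.bincount(idx, minlength=len(LABELS))
    return dict(zip(LABELS, (counts / counts.sum()).tolist()))

def detection_scores(gauge_mm, sat_mm, threshold=0.1):
    """POD, FAR and CSI from a rain/no-rain contingency table.

    `threshold` (mm/day) is a hypothetical rain/no-rain cut-off.
    Missing pairs should be removed by the caller beforehand.
    """
    g = np.asarray(gauge_mm, dtype=float) >= threshold
    s = np.asarray(sat_mm, dtype=float) >= threshold
    hits = int(np.sum(g & s))
    misses = int(np.sum(g & ~s))
    false_alarms = int(np.sum(~g & s))
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    csi = (hits / (hits + misses + false_alarms)
           if hits + misses + false_alarms else float("nan"))
    return {"POD": pod, "FAR": far, "CSI": csi}

# Made-up daily totals at one hypothetical station (NOT data from this study).
gauge = [0.0, 0.5, 2.0, 7.0, 12.0, 0.0, 3.0, 15.0]
sat = [0.2, 0.0, 1.5, 4.0, 6.0, 0.0, 3.5, 9.0]

gauge_shares = interval_shares(gauge)
sat_shares = interval_shares(sat)
scores = detection_scores(gauge, sat)

print("gauge shares:    ", gauge_shares)
print("satellite shares:", sat_shares)
print("detection scores:", scores)
```

In this made-up sample the satellite series places no days in the >10 mm/day interval although the gauge series does, which is the kind of high-intensity underestimation that per-interval shares are meant to expose.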
In summary, GSMaP precipitation estimates over the Qinghai–Tibetan Plateau show some improvement across versions, but this improvement is not uniform. Terrain and climatic conditions still constrain overall performance. Statistical significance analysis further indicates that the differences between versions are metric-dependent rather than uniformly significant across all indicators. The gains in newer versions are mainly reflected in overall consistency, false alarm control and integrated performance balance, rather than in a universal improvement across all products, metrics and rainfall conditions. In practical applications, the choice of GSMaP version and product should depend on the research objective, time scale and regional setting.