Evaluation of MERRA-2 Precipitation Products Using Gauge Observation in Nepal

Precipitation is a key variable in the climate system and the dominant driver of land surface hydrologic conditions. Rain gauges provide precipitation estimates at the ground surface; however, these measurements are sparse, especially in the high-elevation areas of Nepal. Reanalysis datasets are a potential alternative to gauge measurements, although they must be evaluated and validated before use. This study evaluates the performance of the second-generation Modern-Era Retrospective analysis for Research and Applications (MERRA-2) datasets against observations from 141 gauges in Nepal between 2000 and 2018 on monthly, seasonal, and annual timescales. Four statistical measures, the Correlation Coefficient (R), Mean Bias (MB), Root-Mean-Square Error (RMSE), and Nash–Sutcliffe efficiency (NSE), were adopted to determine the performance of both MERRA-2 datasets. The results revealed that the gauge-calibrated dataset (MERRA-C) underestimated, whereas the model-only dataset (MERRA-NC) overestimated, the observed seasonal cycle of precipitation. However, both datasets reproduced the observed seasonal precipitation cycle with a high correlation (R ≥ 0.95). MERRA-C showed a more consistent spatial performance (higher R) relative to the observations than MERRA-NC, while MERRA-NC estimated the precipitation amount more reasonably (lower MB) across the country. Both MERRA-2 datasets performed better in winter, post-monsoon, and pre-monsoon than in the summer monsoon. Moreover, MERRA-NC overestimated the observed precipitation in mid- and high-elevation areas, whereas MERRA-C severely underestimated it at most stations throughout all seasons. Of the two datasets, only MERRA-C was able to reproduce the observed elevation dependency pattern. The uncertainties in MERRA-2 precipitation products described above deserve the attention of data developers and users.


Introduction
Precipitation is an important variable in atmospheric circulation for weather and climatic studies and is the dominant driver of land surface hydrologic conditions [1][2][3][4]. However, precipitation is a complex variable to predict and estimate as it varies highly in space and time due to large-scale atmospheric circulation patterns and the geographic and topographic factors of the region [2,5]. Recent climate change has impacted precipitation distribution and triggered its extremes, such as drought, floods, and soil erosion around the globe [6,7].
Precipitation is one of the crucial factors causing major environmental changes and disasters, such as droughts, floods and landslides, across Nepal [8][9][10][11]. Thus, understanding precipitation patterns and other changes in the hydrometeorological cycle requires high-resolution, long-term precipitation records. Rain gauges typically provide precise measurements of precipitation at the earth's surface [12,13]. However, the spatial extent and temporal resolution of rain gauge measurements in the country are inadequate to support the creation of regional precipitation datasets. This is especially true for high-elevation areas because of their complex geography and remote location [9,14]. Moreover, discontinuities and missing values in meteorological records further degrade the results and interpretation of precipitation analyses [15]. Consequently, newly developed climate and weather prediction model (reanalysis) products and satellite-based precipitation products (SBPPs) are potential alternatives to precipitation measurement in station-sparse regions [16,17]. However, the existing satellite-based products cover only a short period. For example, GPM IMERG and TRMM are only available after 1998, and the GSMaP product is no exception [18][19][20][21]. PERSIANN products are available for a longer period, although most satellite products provide only precipitation data [22]. In contrast, reanalysis products provide comprehensive insight into weather and climate conditions on a regular grid for long periods [23]. These reanalysis datasets assimilate in situ and remote sensing observations into a Numerical Weather Prediction (NWP) model to provide global estimates of land surface, oceanic and atmospheric conditions at different timescales [3].
Therefore, evaluation and validation of reanalysis precipitation products are necessary for the improvement of product quality and hydrometeorological research.
Reanalysis precipitation datasets have been evaluated for different applications around the globe [24][25][26][27][28][29]. Their performance has been found to be more reliable and accurate in flatter regions than in complex terrain [30]. The climate models used in reanalysis datasets may not resolve precipitation in areas of complex topographic relief with drastic elevation changes (e.g., Nepal). In such areas, moisture convergence is locally determined, and local convective precipitation depends strongly on the local thermal forcing of the terrain. For example, the ERA-Interim reanalysis overestimated precipitation over the northern slope of the Himalaya [31]. In contrast, the Modern-Era Retrospective analysis for Research and Applications (MERRA) datasets agreed more consistently with observed precipitation than ERA-Interim and CFSR in a region of complex topography in Central Asia [5]. Similarly, MERRA precipitation datasets reproduced observed spatial patterns and their extremes over the United States [32]. The MERRA datasets also captured inter-annual variation and long-term precipitation trends more reasonably than other reanalysis datasets over the East Asian domain [33].
Comprehensive evaluation and validation of global reanalysis datasets remain very limited in Nepal. For example, Barros and Lang [34] evaluated the performance of the National Centers for Environmental Prediction (NCEP-NCAR) reanalysis using a limited number of stations and found that precipitation was consistently underestimated in the Marshyangdi river basin (mid-elevation areas of the central region). The authors also mentioned that these reanalysis datasets were able to represent precipitation trends and spatial variability. Meanwhile, Ichiyanagi et al. [35] found that monthly, seasonal and annual precipitation in the NCEP datasets was positively correlated with All-India Rainfall (AIR) in western Nepal, while precipitation in eastern Nepal was negatively correlated. However, neither study validated or compared the reanalysis performance against observed datasets over the whole country. Moreover, a systematic evaluation of new-generation reanalysis products and an intercomparison of their spatio-temporal performance with observed datasets have not yet been performed in Nepal. Therefore, this study aims to address this research gap by evaluating the performance of second-generation MERRA (MERRA-2) against observations from 141 gauges in Nepal. Among the MERRA-2 products, the model-only dataset is selected to evaluate the accuracy of the MERRA simulation, while the gauge-calibrated dataset is selected to quantify the performance improvement after gauge adjustment. The evaluation is based on monthly, seasonal, and annual timescales, spanning the 19 years from 2000 to 2018. Further, evaluating the MERRA-2 precipitation products over unique topographic and climatic regions helps users select these datasets for different hydrometeorological applications and identifies the tendencies, discrepancies, specific weaknesses, and strengths of the products under different circumstances.

Study Area
The study area includes the southern slopes of the central Himalaya, Nepal. From the lowlands (~60 m) in the south to the high Himalayas (up to 8848 m) in the north, the country spans varying landscapes, topography, weather, and vegetation within a small width (average ~192 km) (Figure 1). Nepal is broadly divided into three ecological zones: Terai (lowland areas), hills, and mountains. The wettest (Lumle) and driest (Mustang and Manang) areas of Nepal are located in the Gandaki river basin. These areas illustrate how topography mediates the precipitation distribution, being located on the windward (Lumle) and leeward (Manang and Mustang) sides of the Annapurna mountain range [9,36]. The Eastern, Central and Western regions are defined by three major river basins, namely Koshi, Gandaki, and Karnali, respectively. Furthermore, the South Asian monsoon and Western disturbances determine the climatology and distribution of precipitation in Nepal. The South Asian monsoon brings widespread precipitation from June to September through the Bay of Bengal, with higher amounts in the Central region than in the Eastern and Western regions [8,35]. Western disturbances, which are eastward-moving air currents in the mid-latitudes that enter the Western region of Nepal after passing through Iran, Afghanistan, Pakistan, and northwest India, bring precipitation from December to February [37,38]. The average annual precipitation in Nepal during 1982-2015 was 1428 mm [39].

Datasets
The meteorological station network of Nepal is maintained by the Department of Hydrology and Meteorology (DHM). This network of gauge stations is irregularly distributed [4]: it is denser in the southern lowlands (below 2500 m elevation) and sparse in the northern mountainous region (above 2500 m elevation), where the terrain is very complex (Figure 1). Such an irregular distribution creates an information gap, which ultimately hinders precipitation-related studies over the study area. Most of the gauge-based datasets are collected manually and are subject to personnel and instrumental errors [40]. Gauges located in high-elevation regions are additionally affected by wind-induced errors. Initially, data from 400 stations were collected from DHM (https://www.dhm.gov.np/contents/resources), and quality control was performed for each station; following the WMO standard [41], any year with more than 15% missing data was excluded from the analysis. After quality control, daily data from 141 DHM stations (one station per ~1044 km² on average) were selected from January 2000 to December 2018 for this study.
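The missing-data screen described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: it assumes each station's daily record is held as a mapping from year to a list of daily values with `None` marking a missing day, and the function names are hypothetical.

```python
def missing_fraction(daily_values):
    """Fraction of daily records that are missing (None)."""
    if not daily_values:
        return 1.0
    missing = sum(1 for v in daily_values if v is None)
    return missing / len(daily_values)


def usable_years(daily_by_year, max_missing=0.15):
    """Return the years whose missing-data fraction does not exceed max_missing."""
    return sorted(
        year
        for year, values in daily_by_year.items()
        if missing_fraction(values) <= max_missing
    )


# Made-up toy record: 2001 has 60 of 365 days missing (about 16%) and is dropped.
records = {
    2000: [1.0] * 366,
    2001: [None] * 60 + [0.5] * 305,
    2002: [None] * 30 + [2.0] * 335,
}
print(usable_years(records))
```

The 15% threshold is taken from the text; how a station with several excluded years is treated afterwards is not specified there and is left out of the sketch.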
MERRA is a reanalysis product of the National Aeronautics and Space Administration (NASA) that provides long-term atmospheric and surface records globally [3,42]. It is a modern reanalysis system that applies advanced numerical models and assimilation schemes to combine observations from multiple sources [42]. The MERRA precipitation product applies three-dimensional variational (3D-Var) data assimilation based on the Gridpoint Statistical Interpolation (GSI) scheme. The second version of the MERRA reanalysis system (MERRA-2) is the latest atmospheric reanalysis of the modern era, produced by NASA's Global Modeling and Assimilation Office (GMAO) [43]. To provide an advanced product for weather and climate applications, MERRA-2 uses the latest version 5 of the Goddard Earth Observing System Model (GEOS) data assimilation system with an updated Gridpoint Statistical Interpolation scheme [44,45]. MERRA-2 includes two precipitation datasets, namely PRECTOT (MERRA-NC) and PRECTOTCORR (MERRA-C). MERRA-NC is the model-generated precipitation, while MERRA-C is corrected with the CPC Unified Gauge-Based Analysis of Global Daily Precipitation (CPCU) product and the CPC Merged Analysis of Precipitation (CMAP) product. With such an advanced and updated data assimilation system, MERRA-2 can be a good alternative for precipitation monitoring and hydrological applications in unique physiographic regions where stations are very sparse. This study uses the monthly mean total precipitation from the MERRA-2 M2TMNXFLX collection, with a spatial resolution of 0.50° × 0.625°, which was downloaded from NASA's website (https://disc.gsfc.nasa.gov/datasets/M2TMNXFLX).

Methodology
Observed precipitation datasets are at point scale (station), while both MERRA-2 datasets provide precipitation for 0.50° × 0.625° grid boxes. The complex topographic gradients and heterogeneous distribution of rain gauge stations within the study region hinder accurate rainfall interpolation [46]. Thus, the point-to-pixel method was adopted to compare rain gauge observations with the grid-based MERRA-2 datasets [47][48][49][50]. Grid-based precipitation was extracted at each rain gauge location using the original resolution of MERRA-2, instead of interpolating the gauge observations, to avoid the additional errors introduced by gridding the observed data [51,52]. Meanwhile, stations falling within the same grid box were averaged so that the station-based data better represent the pixel precipitation.
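A point-to-pixel comparison of this kind can be sketched as below. The sketch assumes a regular grid whose cell centers start at 90°S and 180°W with 0.5° and 0.625° spacing; these origin values and the helper names are assumptions made for illustration, and the station coordinates are made up.

```python
from collections import defaultdict

LAT0, DLAT = -90.0, 0.5     # assumed first latitude center and spacing (degrees)
LON0, DLON = -180.0, 0.625  # assumed first longitude center and spacing (degrees)


def grid_index(lat, lon):
    """Index (i, j) of the grid cell whose center is nearest to the station."""
    i = int(round((lat - LAT0) / DLAT))
    j = int(round((lon - LON0) / DLON))
    return i, j


def pixel_means(stations):
    """Average station values that fall in the same grid cell.

    stations: iterable of (lat, lon, value) tuples.
    Returns {(i, j): mean value of the stations in that cell}.
    """
    cells = defaultdict(list)
    for lat, lon, value in stations:
        cells[grid_index(lat, lon)].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}


# Hypothetical stations: the first two share a grid cell and are averaged.
stations = [
    (28.30, 83.80, 400.0),
    (28.40, 83.90, 300.0),
    (27.70, 85.36, 150.0),
]
result = pixel_means(stations)
print(result)
```

The gridded value for a cell would then be compared with the averaged station value for the same `(i, j)` index, so that no interpolation of the gauge data is required.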
Observed datasets are at a daily timescale, while the MERRA-2 datasets are monthly. To obtain a consistent timescale at each station, monthly totals were computed for months with more than 25 days of available precipitation data; otherwise, the month was treated as a missing value. If the monthly value was missing from the observed dataset, the corresponding monthly precipitation from MERRA-2 was also treated as missing for consistency. The number of rain gauges with a complete data series in each year of the study period is presented in Figure S1.
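The aggregation and masking rules above can be expressed in a few lines. This is a minimal sketch under the assumption that a missing day or month is represented by `None`; the function names are hypothetical.

```python
def monthly_total(daily_values, min_days=25):
    """Sum daily precipitation if more than min_days values are present, else None."""
    valid = [v for v in daily_values if v is not None]
    if len(valid) > min_days:
        return sum(valid)
    return None


def mask_like(observed, estimated):
    """Set estimated months to None wherever the observed month is missing."""
    return [e if o is not None else None for o, e in zip(observed, estimated)]


# Made-up month with 28 valid days (kept) and one with only 20 valid days (missing).
good_month = [2.0] * 28 + [None] * 3
poor_month = [2.0] * 20 + [None] * 10
observed = [monthly_total(good_month), monthly_total(poor_month)]
print(observed)
print(mask_like(observed, [61.5, 48.0]))
```

Whether a partially observed month should be scaled up to a full-month total is not stated in the text, so the sketch simply sums the available days.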
Mean annual and seasonal precipitation (pre-monsoon, summer monsoon, post-monsoon, and winter) of the observed and both MERRA-2 datasets were calculated for each station. Spatial consistency was assessed by comparing the spatial distribution of mean precipitation in different seasons, while temporal consistency was assessed by analyzing the monthly time series and the annual cycle of precipitation in all datasets. Similarly, the datasets were compared across elevation bins to quantify the performance from low-elevation to high-elevation areas. Additionally, two percentile-based precipitation indices (based on the 95th and 5th percentile thresholds of the observed dataset at each station) were calculated for different seasons. The total frequencies of very wet (R95p) and very dry (R5p) events [53,54] are relevant for flood management and for agricultural (drought) management over the region, respectively. The bias in the frequency of these two indices (R95p and R5p) in the MERRA-2 products was calculated for each station.
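One way to compute the frequency bias of the percentile indices is sketched below. Several details are assumptions, since the text does not specify them: the percentile uses linear interpolation between order statistics, events are counted with strict inequalities, and the bias is the estimated count minus the observed count. All names and sample values are illustrative.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def frequency_bias(observed, estimated, q=95, wet=True):
    """Estimated minus observed event count relative to an observed threshold.

    wet=True counts values above the observed q-th percentile (R95p style);
    wet=False counts values below it (R5p style with q=5).
    """
    threshold = percentile(observed, q)

    def count(values):
        if wet:
            return sum(1 for v in values if v > threshold)
        return sum(1 for v in values if v < threshold)

    return count(estimated) - count(observed)


# Made-up series: the estimate is 1.5 times the observation.
obs = [float(x) for x in range(1, 21)]
est = [1.5 * x for x in obs]
print(frequency_bias(obs, est, q=95, wet=True))
print(frequency_bias(obs, est, q=5, wet=False))
```

A positive value means the gridded product produces more events than the gauges beyond the observed threshold; a negative value means fewer.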
Four statistical metrics were calculated to quantify the performance of the MERRA-2 datasets (Equations (1)-(4)). The Root Mean Square Error (RMSE) measures the average magnitude of the deviation of a MERRA-2 dataset from the observed data. Mean bias (MB) measures any persistent tendency of a dataset to either overestimate or underestimate; the correlation coefficient (R) reflects the strength and direction of the linear association between datasets; and the Nash-Sutcliffe efficiency (NSE) expresses the magnitude of the residual variance (noise) relative to the variance of the observed data. A perfect match between observed and estimated values gives RMSE and MB (Equations (1) and (2)) equal to 0, and values farther from 0 indicate increasingly poor consistency. The NSE (Equation (3)) ranges between −∞ and 1, with 1 being a perfect score. The R-value (Equation (4)) ranges between −1 and 1, with values closer to 1 indicating a stronger positive linear association.
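Equations (1)-(4) are not reproduced in this text. Assuming the standard definitions that match the descriptions above, with overbars denoting means over the n samples (a notation assumed here), they can be written as:

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(E_i - O_i\right)^2} \tag{1}\\
\mathrm{MB} &= \frac{1}{n}\sum_{i=1}^{n}\left(E_i - O_i\right) \tag{2}\\
\mathrm{NSE} &= 1 - \frac{\sum_{i=1}^{n}\left(E_i - O_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \tag{3}\\
R &= \frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(E_i - \bar{E}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)^2}} \tag{4}
\end{align}
```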
where, O is the observed data, E is the estimated precipitation by both MERRA-2 datasets, and n is the sample size.
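The four metrics can be computed as in the following sketch, which assumes the standard textbook definitions of RMSE, MB, NSE and the Pearson correlation coefficient; the function names and sample values are illustrative and are not taken from the study.

```python
import math


def mean_bias(obs, est):
    """Mean of (estimated - observed); positive means overestimation."""
    return sum(e - o for o, e in zip(obs, est)) / len(obs)


def rmse(obs, est):
    """Root mean square error between estimated and observed values."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((e - o) ** 2 for o, e in zip(obs, est))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs, est):
    """Pearson correlation coefficient between observed and estimated values."""
    mean_obs = sum(obs) / len(obs)
    mean_est = sum(est) / len(est)
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_est = sum((e - mean_est) ** 2 for e in est)
    return cov / math.sqrt(var_obs * var_est)


# Made-up monthly values (mm/month) for a quick check.
obs = [10.0, 20.0, 30.0, 40.0]
est = [12.0, 18.0, 33.0, 41.0]
print(mean_bias(obs, est), rmse(obs, est), nse(obs, est), pearson_r(obs, est))
```

Months flagged as missing would need to be removed from both series before calling these functions, since the sketch does not handle `None` values.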

Seasonal Pattern of Precipitation
The temporal evaluation of precipitation is critical for many hydrometeorological applications. The monthly precipitation time series (mm) of the observed and MERRA-2 products averaged over Nepal during the study period is shown in Figure 2. Among the MERRA-2 datasets, MERRA-NC overestimated, while MERRA-C underestimated, the observed precipitation throughout the study period. The observed data revealed that precipitation peaks during the monsoon season (June-September), and both datasets captured these summer peaks. However, the overestimation of the peaks by MERRA-NC and the underestimation by MERRA-C were larger during the monsoon season than in the other seasons.
Additionally, boxplots of the annual performance metrics (R, RMSE, MB) of the MERRA-2 products against the observations are presented in Figure 3. These metrics were generated by comparing point data at the interannual timescale. MERRA-C shows median R, RMSE and MB values of 0.64, 190.16 mm/year and −152.44 mm/year, respectively. Consistent with Figure A1, the model-only MERRA-NC overestimated the annual precipitation amount with a larger error, as indicated by higher RMSE values than MERRA-C. Overall, MERRA-C shows a slightly higher median R; however, the difference is marginal.
The monthly cycle of precipitation averaged over 2000-2018 shows that most precipitation in the observed dataset falls from June to September (summer, 80% of annual precipitation), followed by pre-monsoon (13%), post-monsoon (4%), and winter (3%) (Figure 4). Precipitation increases in June, reaches a peak in July, and decreases in August and September. The highest precipitation of ~480 mm was observed in July and the lowest of ~30 mm in November and December. A similar pattern is shown by both MERRA-C and MERRA-NC (Figure 4).
MERRA-C and MERRA-NC showed the highest precipitation of ~300 mm and ~650 mm in July, respectively. It is worth noting that both datasets showed a larger error during the monsoon season than in other seasons. Statistical metrics were also calculated from the monthly mean precipitation averaged over all stations (Table 1). Of the two datasets, MERRA-NC outperforms MERRA-C in estimating the annual precipitation cycle, with lower MB (39.35 mm/month) and RMSE (75.10 mm/month). However, both datasets capture the monthly precipitation variation with a high correlation (R ≥ 0.95) across the country.
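The seasonal shares of annual precipitation quoted in this section follow from a monthly climatology as in the sketch below. The monsoon (June-September) and winter (December-February) months follow the study-area description; the pre-monsoon (March-May) and post-monsoon (October-November) months are an assumption. The monthly values are made up so that the shares are easy to check by hand and are not the study's data.

```python
# Season definitions: monsoon and winter months follow the text; the
# pre-monsoon and post-monsoon months are assumed for illustration.
SEASONS = {
    "pre-monsoon": (3, 4, 5),
    "monsoon": (6, 7, 8, 9),
    "post-monsoon": (10, 11),
    "winter": (12, 1, 2),
}


def seasonal_shares(monthly_mm):
    """Percent of annual precipitation falling in each season.

    monthly_mm: dict mapping month number (1-12) to mean monthly precipitation.
    """
    annual = sum(monthly_mm.values())
    return {
        name: 100.0 * sum(monthly_mm[m] for m in months) / annual
        for name, months in SEASONS.items()
    }


# Made-up climatology (mm/month) summing to 1000 mm/year.
climatology = {
    1: 10.0, 2: 10.0, 3: 20.0, 4: 40.0, 5: 70.0, 6: 150.0,
    7: 300.0, 8: 250.0, 9: 100.0, 10: 30.0, 11: 10.0, 12: 10.0,
}
shares = seasonal_shares(climatology)
print(shares)
```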

Spatial Distribution of Precipitation
The mean annual precipitation was calculated (Figure 5) to study the spatial distribution of precipitation in the observed and both MERRA-2 datasets. In general, the topographical characteristics establish the large spatial variability of precipitation in the country. Precipitation tends to decrease from east to west; the maximum annual precipitation (>3000 mm/year) was observed in mid-elevation areas of the central region (Lumle areas), whereas the minimum precipitation (<500 mm/year) appeared in the high-elevation areas of the central and western regions (Figure 5a). The high mountain range in the central region (Figure 1) acts as a south-north (uplifting) moisture barrier and creates an orographic ascent. This phenomenon generates a "rain shadow" on the northern slope (high-elevation areas of the central and western regions) and considerably increases precipitation on the southern slope of the mountainous region (Lumle). Among the MERRA-2 datasets, only MERRA-C demonstrated a spatial pattern similar to that of the observations, with a noticeable decrease from east to west (Figure 5b) and maximum precipitation (~1800 mm/year) in the lower reaches of the eastern region, followed by the low-elevation areas of the central region (~1500 mm/year). Meanwhile, MERRA-NC showed annual precipitation in the range of 2500 to 4500 mm/year in mid-elevation areas of the country, with the highest precipitation (~4500 mm/year) in the high-elevation areas of the central region (Figure 5c). In general, MERRA-C underestimated, and MERRA-NC overestimated, the mean annual precipitation across the country. However, the spatial patterns depicted by MERRA-C resemble the known precipitation regime over the country, although differences exist in both datasets.
(Figure 6a,b). Both datasets showed a monsoon distribution similar to that of the pre-monsoon, but with a larger MB (Figure 6c,d).
Meanwhile, in the post-monsoon and winter seasons, both datasets showed a much smaller MB; precipitation was mostly underestimated by MERRA-C and overestimated by MERRA-NC at most stations across the country (Figure 6e-h). The model-only dataset (MERRA-NC) overestimated the observed precipitation, especially in mid- and high-elevation areas, whereas the gauge-adjusted MERRA-C severely underestimated the observed precipitation, except at a few stations. Among the seasons, both datasets were most consistent with the observed datasets in winter, followed by post-monsoon, pre-monsoon, and summer monsoon. This might be related to the amount of precipitation in the respective seasons (consistency decreases as the precipitation amount increases). Overall, the result indicates that the gauge calibration in MERRA-2 effectively reduces the positive bias. Statistical metrics were calculated to quantify the overall performance of the MERRA-2 products by averaging the monthly datasets for each season and are presented in Table 2. The performance metrics showed noticeable differences between the two datasets for different seasons. MERRA-C underestimated the mean precipitation in all seasons, while MERRA-NC underestimated it only during the pre-monsoon season. The RMSE of the two datasets was very similar, indicating a similar magnitude of error; however, the slightly smaller RMSE and higher R of MERRA-C revealed more spatial consistency with the observed datasets. The gauge-corrected MERRA-C performed reasonably well, with the lowest RMSE and higher R in all seasons, although the precipitation amount estimated by MERRA-NC was more reliable (lower MB) during the pre-monsoon. For the annual performance, MERRA-C achieved the best overall performance with lower RMSE (86.46 mm/month), higher R (0.34), and NSE of −0.87, although the precipitation amount estimated by MERRA-NC was more reliable (lower MB) than that of MERRA-C.
In general, MERRA-C showed more consistent spatial performance (lower RMSE, higher R, and NSE closer to 1) with the observed datasets, indicating that the gauge-calibrated MERRA-C reproduces the spatial pattern of observed precipitation over the study area more reasonably.
Figure 7 shows the spatial distribution of the correlation coefficient (R) and RMSE in the summer monsoon for both MERRA-2 datasets at each station during the study period (2000-2018). The previous results show that large errors occur during the monsoon season (Table 2 and Figure 6) at most stations across the country. The two MERRA-2 datasets showed very similar correlations at most stations, although MERRA-C achieved a slightly higher correlation in mid-elevation areas of the central region (Figure 7a). MERRA-C showed larger RMSE (>400 mm/month) at most stations located in high-precipitation areas (central region), while MERRA-NC showed large RMSE at most stations in the mid- and high-elevation areas of the central and western regions (Figure 7b). It is possible that the bias correction reduced the systematic error in MERRA-C relative to MERRA-NC, while the random error remained the same in both datasets. Hence, the results of the comparison over these mountainous regions are uncertain, indicating that complex terrain leads to higher uncertainty in rainfall estimates over these regions.

Extreme Precipitation Events
The bias distribution of very wet (R95p) and very dry (R5p) precipitation events in both MERRA-2 datasets was calculated for different seasons. Figure 8 shows the spatial distribution of the bias in the total frequency of R95p at each station across the country. Both datasets show a large positive bias (>10 events) in the high-elevation areas of the country during the pre-monsoon (Figure 8a,b). In the monsoon season, MERRA-NC largely overestimated the frequency (>20 events) at most stations in mid- and high-elevation areas, whereas MERRA-C shows more consistent performance with smaller bias (<−10 events) at most stations (Figure 8c,d). In the post-monsoon and winter seasons, MERRA-NC overestimated, and MERRA-C underestimated, the total frequency of very wet events (Figure 8e-h).
Overall, MERRA-NC shows a large positive bias and tends to overestimate the frequency of very wet events, while MERRA-C reproduces the very wet events more consistently (smaller bias) across the country.
Figure 9 shows the bias in the total frequency of low-intensity extreme events (R5p) for different seasons at each station over the country. Both datasets show a very similar bias distribution of dry events (mostly underestimated by >−10 events) during the pre-monsoon, post-monsoon, and winter seasons. In contrast, MERRA-C largely overestimated the frequency (with larger bias >20 events) at most stations during the monsoon season (Figure 9c,d). In this season, MERRA-NC shows very consistent performance with smaller bias (>−10 events). It is worth noting that, during the monsoon season, MERRA-NC shows a large error for R95p events, while MERRA-C shows a larger error for R5p events at most stations across the country.

Precipitation Based on the Elevation Gradient
Nepal is a mountainous country, and its precipitation distribution depends mainly on its orographic characteristics. Therefore, we examined the variability of mean annual precipitation at individual rain gauge stations and partitioned the station data into 500 m elevation ranges over the study area. The elevation dependency was calculated only below 3000 m because of the limited number of stations in high-elevation areas. Figure 10 presents the elevation dependency pattern of mean annual precipitation over 19 years (2000-2018) from the observed and both MERRA-2 datasets, with the respective number of stations in each interval. In the observed data, annual precipitation increased slightly from ~1800 to 2000 mm/year in the 500-1000 m range, decreased to 1700 mm/year between 1000 and 1500 m, and reached a maximum of ~2200 mm/year in the 1500-2000 m range. Overall, precipitation increases up to 2000 m elevation and decreases at higher elevations. The MERRA-NC precipitation showed a sharp increase from ~1400 mm/year to ~3200 mm/year for elevations below 2500 m, while the MERRA-C data showed an initial increase in precipitation up to 2000 m and a subsequent decrease with elevation, with the highest precipitation of ~1200 mm/year between 1500 and 2000 m elevation. Thus, MERRA-C was able to reproduce the observed elevation dependency pattern, whereas MERRA-NC failed to capture it. It is worth noting that the model-only MERRA-NC dataset failed to reproduce the spatial pattern of precipitation (Figure 5), which might be the reason for its elevation dependency pattern being inconsistent with the observations. Some of these DHM gauges may also contribute to the gauge-based correction, which might be a reason for the better overall performance of MERRA-C. The result suggests that, after gauge calibration, MERRA-C is able to reproduce the evident elevation dependency pattern; however, significant underestimation was observed.
The orographic effect on these datasets is further discussed in Section 4.
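The elevation binning used in this section can be sketched as follows. The 500 m bin width and the 3000 m upper limit come from the text; the data layout, function names and station values are hypothetical.

```python
from collections import defaultdict


def elevation_bin(elevation_m, width=500):
    """Return the (lower, upper) bounds in metres of the bin containing elevation_m."""
    lower = int(elevation_m // width) * width
    return (lower, lower + width)


def mean_by_elevation_bin(stations, width=500, max_elevation=3000):
    """Mean annual precipitation and station count per elevation bin.

    stations: iterable of (elevation_m, mean_annual_precip_mm) tuples.
    Stations at or above max_elevation are skipped.
    """
    bins = defaultdict(list)
    for elevation_m, precip in stations:
        if elevation_m < max_elevation:
            bins[elevation_bin(elevation_m, width)].append(precip)
    return {b: (sum(v) / len(v), len(v)) for b, v in sorted(bins.items())}


# Made-up stations: (elevation in m, mean annual precipitation in mm/year).
stations = [(120, 1800.0), (450, 1900.0), (800, 2000.0), (1700, 2200.0), (3500, 400.0)]
summary = mean_by_elevation_bin(stations)
print(summary)
```

Reporting the station count alongside each bin mean mirrors the presentation in Figure 10 and makes it clear which bins rest on only a few gauges.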

Discussion
This study evaluates and quantifies the spatial and temporal performance of the gauge-calibrated (MERRA-C) and model-only (MERRA-NC) gridded precipitation datasets from MERRA-2 using 141 gauge stations in Nepal. We found that MERRA-C underestimated the monthly precipitation cycle and MERRA-NC largely overestimated it, particularly in wet months. In terms of seasonal performance, both datasets performed more poorly in the monsoon season (when the precipitation amount was higher) than in the other three seasons. Because of the complex mountainous terrain and interference by wind, DHM gauges may not capture the actual precipitation in the study region. Meanwhile, the lack of solid precipitation measurements at DHM stations probably affects the assessed performance of the MERRA-2 products during the winter season. Large uncertainties in reanalysis products for the wettest season have also been reported in previous studies conducted in the Assiniboine River Basin (Canada-US border) by Xu et al. [55], in Central Africa by Nkiaka et al. [56], and in India by Shah and Mishra [57]. In contrast, Wang et al. [58] found that the reanalysis product showed a large bias during the winter season on the eastern fringe of the Tibetan Plateau. Such uncertainties are primarily related to the topography and precipitation distribution of the region. Furthermore, the variation in interannual accuracy (Figures 3 and A1) is subject to errors arising from considerable changes in the number of gauges in the network over the reanalysis period and from errors in the input gauge measurements [3].
The spatial distribution of observed precipitation showed large-scale variability, with the highest precipitation in the central region (Figure 5). Because of the spatially heterogeneous pattern of precipitation and the complex physiography, model datasets have difficulty resolving the orographic effect [5]. The errors over areas of complex orography may be partially related to the weakness of reanalysis models in simulating the effects of complex terrain [55,58,59]. To minimize such uncertainties, the MERRA-C precipitation dataset uses several daily CPC gauge-based precipitation datasets from different countries to calibrate the model-only MERRA-NC dataset. The CPC precipitation datasets, which are used to adjust MERRA-C, are derived from ground-based observations. The performance of MERRA-C therefore depends strongly on the performance of the CPC datasets and on the gauge density within each grid box. Hence, the interpolation techniques used to generate gridded datasets from sparse point gauge measurements can introduce a considerable level of uncertainty into the gridded dataset, particularly in high-elevation and remote regions, where a sufficient number of rain gauge stations is usually not available [13,60]. Meanwhile, 45 DHM gauge datasets are also used to generate the CPC datasets [61], although information on these assimilated gauge datasets is minimal. A previous study also mentioned that the quality and temporal range of the assimilated rain gauges significantly influence the performance of gauge-calibrated precipitation products [62]. DHM rain gauge stations are denser in lowland areas but very sparse in high-elevation areas of the study region.
When these sparse gauge stations are used to calibrate the model-only datasets, the accuracy of the calibrated product may sometimes deteriorate (Tables 1 and 2). Nevertheless, after gauge correction, MERRA-C moderately captured the spatial distribution of observed precipitation. Similarly, previous studies reported improved performance of gauge-calibrated products over denser gauge networks in larger basin areas of West Africa (Nicholson et al. [63]) and Malaysia (Mahmud et al. [64]). Meanwhile, Balcutt et al. [65] evaluated potential bias correction techniques to improve rainfall representation with spatially aggregated precipitation for driving hydrological simulations in Nepal. These studies also mentioned that a higher number of gauges in each grid box during interpolation would enhance the performance of the calibrated product. Further, a dense gauge network represents the actual distribution of precipitation better and captures the orographic effect better than a sparse gauge network, thus improving product accuracy [66]. Moreover, various other factors in the construction of MERRA-2, such as boundary layer parameterization, land-atmosphere interactions, and convective precipitation parameterization, may add uncertainty to how well the product reflects the observed precipitation pattern [67].
The evident elevation dependency pattern (i.e., precipitation initially increases with elevation up to 2000 m and then decreases with increasing elevation) is similar to that reported in previous studies of the same region [8]. The high mountains block the large-scale monsoon flow moving upward and make the windward (leeward) side of the central region very wet (dry). As mentioned earlier, reanalysis models have difficulty resolving the orographic effect on precipitation; this might be the reason for the highest precipitation occurring above 2000 m in the model-only MERRA-NC datasets (Figures 5 and 10). However, after the gauge calibration, MERRA-C was able to reproduce the observed elevation dependency pattern. Scatter plots with the associated statistical metrics were further examined to demonstrate the overall performance of the MERRA-2 products against the observed datasets (Figure 11). The statistics showed that MERRA-C performed relatively better, with a higher correlation (R = 0.71) and lower RMSE (163.85 mm/month) than MERRA-NC, although the precipitation amount estimated by MERRA-NC was more consistent (lower MB) with the observed datasets. In addition, the NSE also shows that MERRA-C outperformed MERRA-NC during the study period. As in Section 3.1, MERRA-C underestimated the observed precipitation, especially in the wettest months (>500 mm/month), when the precipitation amount is generally higher (Figure 11a). Meanwhile, MERRA-NC overestimated the lower precipitation values (Figure 11b). Overall, MERRA-C was more consistent in reproducing the spatial pattern of the observed datasets, while MERRA-NC was more reliable in estimating the observed precipitation amount. These results agree with the finding that including weather stations in reanalysis gridded datasets significantly increases the correlation between observed and gridded products [68].
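For readers who wish to reproduce this kind of comparison, the four metrics discussed above (R, MB, RMSE, and NSE) can be computed from paired observed and estimated series as in the following minimal sketch. The formulas are the standard textbook definitions and are assumed, not confirmed, to match those used in this study; in particular, MB is taken here as the mean of (estimate − observation). The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(obs, sim):
    """Return R, MB, RMSE and NSE for paired observed/estimated values.

    Assumed definitions (standard forms, not taken from the study):
      R    = Pearson correlation coefficient
      MB   = mean(sim) - mean(obs)
      RMSE = sqrt(mean((sim - obs)^2))
      NSE  = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim must have the same length (>= 2)")

    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n

    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    sq_err = sum((s - o) ** 2 for o, s in zip(obs, sim))

    return {
        "R": cov / math.sqrt(ss_obs * ss_sim),
        "MB": mean_sim - mean_obs,
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1.0 - sq_err / ss_obs,
    }


# Hypothetical monthly precipitation values in mm (not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 22.0, 32.0, 42.0]
metrics = evaluate(observed, estimated)
print(metrics)
```

In this toy case the estimate is offset from the observations by a constant amount, so R equals 1 while MB and RMSE are nonzero. This illustrates why a product can rank well on correlation yet poorly on bias, as reported above for MERRA-C relative to MERRA-NC.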
When interpreting the results, it is important to note that the CPC datasets were derived from point ground-based measurements, and the inclusion of some of these DHM gauges might be the reason for the better overall performance of MERRA-C (Table 2). Our results provide evidence that interpolation techniques and the sparseness of gauge measurements can create uncertainties in the performance of the gauge-calibrated MERRA-2 product. On the other hand, the difficulty of the reanalysis model in resolving the orographic effect also influenced the overall accuracy of the MERRA-2 product [67,68]. Further, the inclusion of a diverse set of station data may increase the accuracy of the reanalysis data [5]. In conclusion, this study shows that interpolation techniques and limited or sparse gauge measurements create uncertainties in the performance of reanalysis products, especially in complex terrain such as Nepal. Thus, adequate gauge stations are needed in the high-elevation areas, which would not only reduce the uncertainty in the production and validation of gridded datasets but also be useful for climate and environmental change studies.

Conclusions
In this study, we evaluated the performance of the MERRA-2 (MERRA-C and MERRA-NC) datasets against rain-gauge observations over Nepal between 2000 and 2018. The study region features complex terrain, which has a significant impact on the distribution of precipitation. Based on the above results, the following conclusions were drawn:
The average seasonal cycle of precipitation during the study period shows peak precipitation from June to September. MERRA-C underestimated and MERRA-NC overestimated the observed seasonal cycle of precipitation. Nevertheless, both datasets can reproduce the overall precipitation pattern, although their accuracy is poor in the wettest months.
Gauge-calibrated MERRA-C depicted a spatial distribution of annual precipitation broadly similar to the observed one, i.e., precipitation decreased from east to west, with maximum precipitation in the mid-elevation areas of the central region and minimum precipitation in the high-elevation areas of the central and western regions. Both MERRA-2 datasets performed better in winter, post-monsoon, and pre-monsoon than in the summer monsoon. In terms of spatial performance, MERRA-C achieved a higher correlation than MERRA-NC, while both datasets showed a similar magnitude of RMSE. Furthermore, the high correlation of MERRA-C and the lower MB of MERRA-NC indicate a better spatial distribution and a more reliable estimation of precipitation amount, respectively. During the monsoon season, MERRA-NC overestimated the very wet events (R95p), whereas MERRA-C overestimated the very dry events (R5p). In contrast, both datasets underestimated the R95p and R5p events in the other three seasons at most stations across the country.
The variability of mean annual precipitation was examined over 500 m elevation ranges across the study area. The pattern shows that precipitation initially increases up to 2000 m elevation and then decreases with increasing elevation. MERRA-C was able to reproduce this observed elevation dependency pattern, whereas MERRA-NC failed to capture it. The result suggests that MERRA-C datasets have useful applications in Nepal. However, further improvement is still needed in the MERRA-2 reanalysis product, particularly for mountainous areas such as Nepal.
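The grouping of stations into 500 m elevation ranges mentioned above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure used in the study: bands are assumed to start at 0 m and to be labelled by their lower bound, and the function name `mean_by_elevation_band` and the station records are hypothetical.

```python
from collections import defaultdict


def mean_by_elevation_band(records, band_width=500):
    """Average annual precipitation per elevation band.

    `records` is an iterable of (elevation_m, annual_precip_mm) pairs.
    Bands are keyed by their lower bound, e.g. 1500 covers [1500, 2000) m
    (assumed convention).
    """
    totals = defaultdict(lambda: [0.0, 0])  # lower bound -> [sum, count]
    for elevation, precip in records:
        lower = int(elevation // band_width) * band_width
        totals[lower][0] += precip
        totals[lower][1] += 1
    return {
        lower: total / count
        for lower, (total, count) in sorted(totals.items())
    }


# Hypothetical (elevation in m, annual precipitation in mm) station records.
records = [
    (150, 1800.0), (420, 2000.0), (1300, 2400.0),
    (1800, 3000.0), (2600, 1200.0), (3700, 400.0),
]
band_means = mean_by_elevation_band(records)

for lower, mean_precip in band_means.items():
    print(f"{lower}-{lower + 500} m: {mean_precip:.1f} mm")
```

Applying the same function to gauge values and to the co-located MERRA-C and MERRA-NC values would allow the three elevation profiles to be compared band by band.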
Evaluation of MERRA-2 precipitation datasets is especially important for understanding the spatio-temporal distribution of precipitation in Nepal. Such evaluation will ultimately benefit the understanding of hydrological processes and water resource management, because the quality of the precipitation input affects the output accuracy of hydrological models. However, to enhance precipitation measurement accuracy and to further evaluate the different gridded precipitation products in Nepal, new measuring stations are still needed.