Long-term meteorological datasets play a significant role in advancing our scientific understanding of climate change and in projecting future changes. Analyses of climate data for various meteorological parameters can be used to infer possible connections between extreme meteorological events and climate change. However, despite significant improvements in global/regional models, precipitation remains difficult to quantify and parameterize accurately through models and observations [1]. Although several platforms serve as sources of precipitation datasets, rain-gauge measurements are treated as accurate measurements over land. Hence, rain-gauge-based datasets are used as a reference for evaluating other precipitation estimates obtained from satellites and models. Rain-gauge networks deployed over different spatial domains (regional/continental) at high or low density are often converted to uniform grids of specific spatial/temporal resolution with robust statistical interpolation/extrapolation methods to overcome the limitations of point measurements, i.e., to obtain areal precipitation [4]. Efforts have also been made to improve the interpolation schemes, for instance, by representing local geographical/physical factors such as topography, the climatology of a specific variable, and the spatial variability of parameters in the interpolation methods.
Against this background, several kinds of long-term meteorological datasets based on ground-based observations have emerged in recent decades. Among these, a few are generated exclusively for basic parameters such as precipitation and temperature [6]. The preparation of such datasets involves significant effort, especially in obtaining daily station data from different jurisdictions within the study area. Meteorological organizations have developed public domain datasets such as the Global Surface Summary of the Day (GSOD) and the Global Historical Climatology Network (GHCN) [13]. Nevertheless, the limited density and low quality of measurements often make them unsuitable for specific applications [14]. Therefore, data developers have often combined openly accessible datasets (GSOD or GHCN) with regional station datasets after performing rigorous quality checks and improving interpolation methods [6]. Extensive quality checks and robust interpolation methods certainly improve the quality and applicability of the datasets [4]. However, when the original station datasets are merged to develop gridded products, information is often lacking about the station locations and the period over which the 24-h precipitation was accumulated.
The Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources project (APHRODITE-1) [6] developed long-term daily precipitation and temperature datasets for the meteorological community as well as for those working in related scientific fields. The APHRO_V1101 algorithm was designed and developed to evaluate water resources (e.g., precipitation over mountains and river catchments) [6]. Extensive feedback from data users on the APHRODITE-1 products and their wide applicability motivated the developers of APHRODITE-2 (operated from 2016 to 2019) to extend the availability of APHRODITE products up to 2015, with significant efforts to enhance product quality. The APHRODITE-2 version 1801 (hereafter APHRO_V1801R1) and version 1901 (APHRO_V1901) datasets over Monsoon Asia were generated specifically to evaluate extreme precipitation events (available at http://aphrodite.st.hirosaki-u.ac.jp).
Significant efforts have been made to achieve this goal. For example, we improved quality control (QC) schemes, tested some interpolation schemes, and carefully examined the end of the day (EOD) of each original dataset and applied EOD adjustments. Such efforts were intended to make the APHRODITE-2 products particularly suitable for evaluating extreme precipitation. For the earlier product APHRO_V1801R1, we segregated different EOD data within particular geographical domains/countries. In the latest APHRODITE-2 product APHRO_V1901, we adjusted the end of the day to 00:00–24:00 h UTC and established a gridded product over Monsoon Asia (APHRO_MA).
The purpose of this paper is to explain the concept of EOD and to show the results of a test examining the 24-h accumulation time (EOD), which led to the production of APHRO_V1801R1. To examine EOD, we used both a satellite precipitation product, namely, CMORPH [18], and a meteorological reanalysis dataset, namely, ERA-Interim [20]. The rest of the article is organized into three sections. Section 2 explains the datasets used in this study, the concept of EOD, and the methodology for estimating EOD. Section 3 presents the results and discussion. A brief conclusion is presented in Section 4.
2. Materials and Methods
2.1. Rain-Gauge Data
The rain-gauge data used in this study fall into two categories, namely, offline data and Global Telecommunication System (GTS)-based data. For both APHRODITE-1 and APHRODITE-2, we used individually collected data as well as pre-compiled data covering more than one country (e.g., ASEAN compendium climate database, former USSR data by CDIAC). These are referred to as offline data in this study. Datasets that cover more than one country are separated into data for individual countries.
In addition, we used the Global Surface Summary of the Day (GSOD), a widely used set of GTS-based data. It is available from the website of the National Centers for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA) (https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00516). GSOD provides 18 surface meteorological elements on a daily scale, collected from nearly 9000 weather stations around the globe. GSOD is assembled from sub-daily or daily reports to the GTS, and each precipitation value carries a flag giving quality information, including the process applied at NCEI/NOAA to obtain the “daily” precipitation value. A “daily” value is sometimes only the summation of three 6-hourly precipitation reports or a single 12-hourly precipitation report (Table 1), in which case it does not cover a full 24 h. Therefore, we considered only those reports whose accumulation represents daily precipitation. The three kinds of report are thus four 6-hourly reports (D), two 12-hourly reports (F), and one daily report (G). These report types are indicated with different flags (D, F, and G), as described in the GSOD readme file (ftp://ftp.ncdc.noaa.gov/pub/data/gsod/readme.txt).
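The flag-based selection described above can be sketched as follows. This is a minimal illustration, not the APHRODITE processing code: the record layout (`station`, `prcp_mm`, `flag`) and the sample values are hypothetical, and only the meaning of the D, F, and G flags is taken from the text.

```python
# Flags whose accumulation covers a full 24 h, as described in the text.
FULL_DAY_FLAGS = {
    "D": "four 6-hourly reports",
    "F": "two 12-hourly reports",
    "G": "one 24-hourly report",
}

def keep_full_day(records):
    """Keep only records whose flag indicates a complete 24-h accumulation."""
    return [rec for rec in records if rec["flag"] in FULL_DAY_FLAGS]

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"station": "S1", "prcp_mm": 12.4, "flag": "D"},
    {"station": "S2", "prcp_mm": 3.0, "flag": "C"},   # some other flag: dropped
    {"station": "S3", "prcp_mm": 0.8, "flag": "G"},
    {"station": "S4", "prcp_mm": 5.1, "flag": "F"},
]
selected = keep_full_day(records)
```

Only records flagged D, F, or G survive, so partial accumulations are never compared against 24-h reference totals.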
After the EOD check described in this study, we selected the station precipitation dataset to use country by country and year by year. A list of the countries for both offline and GSOD data over Monsoon Asia is given in Appendix A, with information on EOD and on whether or not each dataset was used for APHRO_V1801R1.
The Climate Prediction Center (CPC) MORPHing technique, CMORPH [18], is a multi-satellite precipitation product derived by a morphing technique that estimates precipitation exclusively from passive microwave (PMW) observations made by low-Earth-orbit satellites. Precipitation features obtained from microwave observations are propagated in space and time by a forward/backward morphing technique. Cloud system advection vectors derived from infrared measurements are used to propagate precipitation features over grids where PMW observations are not available. The dataset is available at multiple spatial and temporal resolutions from 1998 to the present over the latitude range of 60° N to 60° S.
A reprocessed and bias-corrected version (V1.0) of CMORPH daily precipitation data has been released into the public domain. This version has several advantages, such as a fixed interpolation algorithm and the use of microwave observations from identical versions of passive microwave sensors. The updated CMORPH precipitation retrievals showed improvement over the earlier version in reproducing the spatial and temporal variability of precipitation. Moreover, the latest CMORPH precipitation estimates have been shown to outperform the widely used TRMM-3B42 datasets at sub-daily/daily scales [19]. Although CMORPH precipitation estimates are available at various spatial/temporal resolutions, for the EOD estimates we used a special version of hourly precipitation estimates with 0.25° latitude/longitude spatial resolution provided by Dr. Xie.
EOD detection with satellite-based precipitation estimates is impracticable for the pre-satellite era given the limited number of observations. Moreover, satellite-based estimates of precipitation do not work well at high altitudes. Thus, we used 3-hourly/0.75° precipitation forecasts from ERA-Interim to estimate the EOD over various sub-continental regions. We compared the performance of the ERA-Interim and CMORPH datasets to evaluate their strengths and weaknesses. ERA-Interim is a long-term global atmospheric reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It contains multiple meteorological variables available from 1979 to August 2019. The ERA-Interim dataset was developed with a four-dimensional data assimilation scheme incorporating analysis/forecast cycles [20]. In each cycle, the available observations, which do not include precipitation, are combined with prior information from a forecast model to estimate the evolving state of the global atmosphere and its underlying surface. We used the 3-h, 6-h, 9-h, and 12-h precipitation forecasts initialized at 00:00 h UTC (hereafter 00 UTC) and 12:00 h UTC (12 UTC) (Figure 1). Each successive 3-h precipitation value used in this study is obtained by subtracting the accumulated precipitation at consecutive lead times. For example, the 3-h precipitation for 03–06 UTC is obtained by subtracting the 3-h forecast initialized at 00 UTC from the 6-h forecast initialized at 00 UTC. The computational steps to estimate the 3-hourly precipitation from the ERA-Interim forecasts are shown in Figure 1.
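The subtraction of consecutive accumulations described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `deaccumulate` and the sample accumulations (in mm) are hypothetical, and the forecasts are assumed to hold precipitation accumulated since initialization at 3, 6, 9, and 12 h lead times.

```python
def deaccumulate(accumulated):
    """Turn accumulations at successive lead times into per-interval amounts."""
    increments = []
    previous = 0.0
    for total in accumulated:
        increments.append(total - previous)
        previous = total
    return increments

# Hypothetical accumulated precipitation (mm) at 3, 6, 9, and 12 h lead times.
acc_00utc = [1.0, 2.5, 2.5, 4.0]   # forecast initialized at 00 UTC
acc_12utc = [0.0, 0.5, 3.0, 3.2]   # forecast initialized at 12 UTC

# Eight 3-hourly values covering 00-24 UTC:
# 00-03, 03-06, 06-09, 09-12 from the 00 UTC run, then 12-24 from the 12 UTC run.
three_hourly = deaccumulate(acc_00utc) + deaccumulate(acc_12utc)
```

Here the second value, for 03–06 UTC, is the 6-h accumulation minus the 3-h accumulation of the 00 UTC forecast, matching the example in the text.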
2.4. Definition of EOD
EOD is one of the significant aspects through which APHRODITE-2 products constitute improvements over earlier versions. The estimated or deterministic EOD information was used for the time-adjusted product (APHRO_V1901) as well as to choose the station data for APHRO_V1801R1.
Rain-gauge measurements are available at different temporal intervals. Daily data are often collected from different data holders within the same country/region using various 24-h accumulation periods. Each country's dataset is generated according to the local organization’s convenience following WMO guidance. For example, in India, the India Meteorological Department (IMD) measures daily rain-gauge data in the morning at 08:30 h IST (which corresponds to 03 UTC) and stamps the value with that day's date. Precipitation (in mm/day) stamped 11 July is thus the precipitation accumulated from 08:30 h IST on the previous day (10 July) to 08:30 h IST on 11 July. Therefore, EOD is 03 UTC for IMD rain-gauge measurements (see Figure 2a). EOD may differ for each dataset collected from various data sources.
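The IMD example can be checked with a short time-zone conversion. This is only an illustrative sketch: the helper name `accumulation_window_utc` and the year 2015 are assumptions, while the 08:30 h IST observation time and the 11 July stamp come from the example above.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is UTC+05:30.
IST = timezone(timedelta(hours=5, minutes=30))

def accumulation_window_utc(observation_local):
    """Return the (start, end) of the 24-h accumulation period in UTC."""
    end = observation_local.astimezone(timezone.utc)
    return end - timedelta(hours=24), end

# Value stamped 11 July, read at 08:30 h IST (the year is arbitrary).
start, end = accumulation_window_utc(datetime(2015, 7, 11, 8, 30, tzinfo=IST))
eod_utc_hour = end.hour
```

The window runs from 03 UTC on 10 July to 03 UTC on 11 July, so the stamped day closes at EOD = 03 UTC.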
GSOD daily precipitation data are available with different rain flags D, F, and G. Precipitation values with the D, F, and G flags represent the summation of four reports of 6-h precipitation (D), the summation of two reports of 12-h precipitation (F), and one report of 24-h precipitation (G), respectively; however, they are not associated with EOD information. Mixing such datasets, especially GSOD data with different flags and offline data, may reduce the extreme values. In a previous version of the APHRODITE products, namely APHRO_V1101, we mixed all available data, including GSOD with flags other than D, F, and G as well as various offline data. Mixing precipitation data with varying EOD may change the nature of the event. To avoid such ambiguity, the APHRODITE-2 team carefully selected data with unmixed EOD for generating the APHRO_V1801R1 gridded products (see Appendix A).
2.5. Methodology for EOD Estimation
EOD is estimated for each rain-gauge of a specific data source using statistical metrics such as the correlation coefficient (r) and root mean square error (RMSE). We used the updated version (V1.0) of CMORPH and the ERA-Interim dataset to estimate EOD information for each rain-gauge location.
“Daily” precipitation time series were computed from hourly CMORPH (3-hourly ERA-Interim) data as shown in Figure 2b. In addition to the universal “daily” precipitation from 00 UTC to 24 UTC (SOD = 0, EOD = 24), we prepared 71 (23 for ERA-Interim) additional time series with start times ranging from SOD = −24 (EOD = 0) to SOD = 47 (EOD = 71). That is, for each rain-gauge station, there are 72 (24) different “daily” precipitation time series summed from CMORPH hourly (ERA-Interim 3-hourly) precipitation at 1-h (3-h) increments.
The r and RMSE between the reference rain-gauge daily precipitation and the daily accumulated precipitation of the nearest CMORPH 0.25° grid pixel (ERA-Interim 0.75° grid pixel) are then computed. We thus obtain 72 correlation coefficients (r) between the daily rain-gauge precipitation and the CMORPH “daily” precipitation with different SOD (−24 to 47). Figure 3 shows an example of r and RMSE between the daily station precipitation time series and the 72 different CMORPH “daily” time series for the Philippines in 1998. After computing r and RMSE for each station and each CMORPH time series, we averaged r and RMSE over the 52 stations in the Philippines. The highest averaged r and the lowest averaged RMSE were obtained at SOD = 0 (EOD = 24) (Figure 3), which is the correct 24-h accumulation period for the Philippines.
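The shifted-window comparison described above can be sketched for a single station as follows. This is a simplified illustration, not the APHRODITE implementation: the function names are hypothetical, the hourly series is synthetic rather than CMORPH, missing data are ignored, and the first day and last two days are skipped so that every shifted 24-h window fits inside the hourly series.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def estimate_eod(gauge_daily, hourly, sods=range(-24, 48)):
    """Score each candidate window; return (best EOD, {EOD: (r, RMSE)})."""
    # hourly[24 * d] is assumed to be the 00-01 UTC value of day d.
    days = range(1, len(gauge_daily) - 2)
    obs = [gauge_daily[d] for d in days]
    scores = {}
    for sod in sods:
        est = [sum(hourly[24 * d + sod: 24 * d + sod + 24]) for d in days]
        scores[sod + 24] = (pearson(obs, est), rmse(obs, est))  # EOD = SOD + 24
    best_eod = max(scores, key=lambda eod: scores[eod][0])
    return best_eod, scores

# Synthetic check: a gauge whose "day" ends at 03 UTC (SOD = -21, EOD = 3).
random.seed(0)
n_days = 60
hourly = [random.expovariate(1.0) if random.random() < 0.2 else 0.0
          for _ in range(24 * n_days)]
gauge = [0.0] + [sum(hourly[24 * d - 21: 24 * d + 3]) for d in range(1, n_days)]
best_eod, scores = estimate_eod(gauge, hourly)
```

With the default 1-h step this yields the 72 candidate series of the CMORPH case; passing `sods=range(-24, 48, 3)` would give 24 candidates, as for 3-hourly data, provided the input is still indexed hourly. Averaging `scores` over all stations of a country before taking the maximum corresponds to the country-level judgment described in the text.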
For satellite and reanalysis precipitation data, SOD is often used to indicate the time of observation/forecast. However, rain-gauge observations are usually recorded at the time the hydro-meteorological service measures precipitation, which is the EOD. Hence, we mainly use the term EOD rather than SOD in our analysis. Several earlier studies reported that satellite-based remote sensing products underperform in winter and over cold regions [19]. Thus, we estimated EOD for summer (JJAS) as well as for the whole year.
Following the computation of EOD for each rain-gauge, we selected rain-gauges with a correlation coefficient equal to or greater than 0.4 (r ≥ 0.4) and at least 270 valid days (data availability) to obtain stable EOD information (Figure 4 and Figure 5). The thresholds for the summer are the same as the annual ones, except that the number of valid days is relaxed to 90 days (Figure 4). For the GSOD data, the thresholds on r and the number of valid days are relaxed to 0.1 and 50 days, respectively (Figure 6).
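The station screening can be expressed as a simple filter. This sketch assumes that the valid-day counts are minimum requirements; the function name, record layout, and sample stations are hypothetical, while the threshold values come from the text.

```python
def select_stations(stats, min_r=0.4, min_valid_days=270):
    """Return IDs of stations meeting both the r and valid-day thresholds."""
    return [s["id"] for s in stats
            if s["r"] >= min_r and s["valid_days"] >= min_valid_days]

# Hypothetical per-station results at the best-matching EOD.
stations = [
    {"id": "A", "r": 0.55, "valid_days": 300},
    {"id": "B", "r": 0.35, "valid_days": 360},
    {"id": "C", "r": 0.40, "valid_days": 120},
]

annual = select_stations(stations)                      # r >= 0.4, >= 270 days
summer = select_stations(stations, min_valid_days=90)   # JJAS: >= 90 days
gsod = select_stations(stations, min_r=0.1, min_valid_days=50)  # relaxed for GSOD
```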
The proposed methodology is also applied with ERA-Interim data so that its suitability for determining EOD information in the pre-satellite era (before 1998, when the CMORPH record begins) can be evaluated. Moreover, it is useful to compare the EOD detection ability of the two reference datasets, CMORPH and ERA-Interim.
This paper proposes a novel approach to judging the end of the day (EOD) timestamp of rain-gauge datasets across the globe with the help of satellite-based precipitation estimates and reanalysis datasets. Bias-corrected CMORPH V1.0 precipitation data were used for this approach. The ability of ERA-Interim data to reproduce the EOD estimated by CMORPH was also examined, so that ERA-Interim could be used to determine the EOD of rain-gauge data obtained during the pre-satellite era. The methodology was applied to rain-gauge station datasets (1998–2015) collected from various countries by the APHRODITE projects. The method was evaluated for selected countries with known (deterministic) EOD. Based on the evaluation results, it was concluded that CMORPH works better for judging the approximate EOD of rain-gauge stations over India. Although ERA-Interim has coarser temporal/spatial resolution than CMORPH, it yields an EOD distribution very similar to that of CMORPH. To retain the nature of GSOD flagged daily precipitation data, the proposed method was also applied to GSOD data. The methodology adopted in this study should be very helpful for judging the EOD of rain-gauge data with unknown EOD information. Having EOD information helps users of rain-gauge data avoid ambiguity regarding timestamps.
Concerning the EOD judgment results, the estimated EOD for offline datasets over most of Asia fell in the range ≥3 and <27 UTC (e.g., India (3 UTC), China (12, 24 UTC), and Japan (15 UTC)). The EOD information of GSOD data reflected the combination of rain-gauge data with different accumulation periods, but in general the EOD of GSOD is the same as or earlier than that of the offline data. The impact of EOD on the APHRODITE gridded product was evaluated qualitatively and quantitatively over three selected countries. The spatial distribution of extreme events and precipitation fraction revealed a significant impact of EOD on the gridded product. Based on a test for China, the number of extreme events obtained with unmixed EOD was much higher than that captured with mixed EOD.