Ranking of CMIP5 GCM Skills in Simulating Observed Precipitation over the Lower Mekong Basin, Using an Improved Score-Based Method

Abstract: This study assessed the performance of 34 Coupled Model Intercomparison Project Phase 5 (CMIP5) general circulation models (GCMs) in reproducing observed precipitation over the Lower Mekong Basin (LMB). Gauge-based observations from the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) dataset were obtained for 1975 to 2004. An improved score-based method was used to rank the performance of the GCMs in reproducing the observed precipitation over the LMB. The results revealed that most GCMs effectively reproduced precipitation patterns for the mean annual cycle, but they generally overestimated the observed precipitation. The GCMs reproduced the time series characteristics of precipitation better for the annual period than for the wet and dry seasons. Meanwhile, the GCMs reproduced the spatial characteristics of precipitation better for the dry season than for the annual time and the wet season. More than 50% of the GCMs failed to reproduce the positive trend of the observed precipitation for the wet season and the dry season (approximately 52.9% and 64.7%, respectively), and approximately 44.1% of the GCMs failed to reproduce the positive trend for the annual time over the LMB. Furthermore, the robust criteria for assessing the GCMs' performances differed at the seasonal scale, and using multiple criteria was superior to using a single criterion. Overall, the better-performing GCMs were identified, which can provide useful information for future precipitation projection and policy-making over the LMB.


Introduction
Precipitation is a key climate variable in studying the effects of climate change [1]. Changes in precipitation patterns induced by climate change directly or indirectly cause variations in the hydrological cycle and ecological system, as well as in socioeconomic development and human health [2–5]. Under the business-as-usual (BAU) scenario, the world will face a 40% water deficit by 2030 [6]. Climate change therefore poses severe challenges to human survival and development. Accordingly, climate change assessments have been conducted through precipitation simulations, and response measures to climate change effects have become particularly important.
General circulation models (GCMs) are valuable tools for studying past, present, and future climate trends and variability [7,8]. The fifth phase of the Coupled Model Intercomparison Project (CMIP5) of the World Climate Research Programme (WCRP) provided a larger number of GCMs than CMIP3 to enhance the understanding of the mechanisms of climate system change and to improve the capability to simulate climate change [9–12]. For example, Sperber et al. 2013 [9] showed that the CMIP5 multimodel mean (MMM) had better skill in simulating pattern correlations with respect to observations than the CMIP3 MMM did. Sillmann et al. 2013 [10] found that the CMIP5 ensemble showed some improvements over CMIP3 in representing the magnitude of precipitation indices. Meher et al. 2017 [12] showed that CMIP5 GCMs were more skillful than CMIP3 GCMs in simulating the annual cycle of interannual variability of precipitation over the Western Himalayan Region. Additionally, a study by Hasson et al. 2016 [13] showed that CMIP5 GCMs had improved skill in describing the seasonality of precipitation regimes compared to their predecessors over the Asian monsoon region, which includes the Mekong River Basin, but the performance of the GCMs varied among river basins. Because of the uncertainty in precipitation simulations, it is necessary to know how well GCMs can simulate precipitation at a regional scale, such as a river basin, before projecting future climate change.
Numerous scholars have assessed the performance of the CMIP5 GCMs in simulating precipitation at the global scale [14–18], regional scale [11,19–21], and subregional scale [12,22–24]. Thus far, the multimodel ensembles of CMIP5 for projection of climate variables have been effectively used [20], and some authors consider these ensembles to be better than individual GCMs [25]. However, some studies have suggested that multimodel ensembles are deficient in their projection [14,26], and thus it may be essential to consider acceptable GCMs for specific assessments rather than simple multimodel ensembles [8]. Moreover, ensemble methods could be applied to the best-performing GCMs [21]. Thus, assessing the performance of GCMs also provides useful information for future climate change studies with respect to the application of multimodel ensembles.
Climate change may increase the frequency and intensity of extreme hydrological events, as well as the frequency of years with above-normal monsoons or extremely low precipitation [27]. In the past few decades, the demand for water resources has increased with population growth and economic development [28]. Moreover, further increases in population and accelerated urbanization have exacerbated the demand for water resources [29]. Additionally, changes in precipitation are likely to have great impacts on the water cycle system, water resources, and agricultural production of the Lower Mekong Basin (LMB) [5,30,31]. Therefore, assessing GCM performances in simulating precipitation is essential for future precipitation change and policy-making over the LMB. In previous studies, assessing GCM performance has been mainly based on satellite data rather than gauge-based data [17,24,32,33], and the same for the LMB [13]. However, satellite data likely underestimated observed precipitation over the Mekong River Basin [34]. Thus, we used gauge-based data of the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) [35] as the observational data for assessing GCM performance over the LMB.
In addition, few studies have focused on the CMIP5 GCM assessments at a river basin of the Asian monsoon regions such as the LMB, as well as dividing the time scales into the wet season, dry season, and annual time to make comprehensive assessments. Therefore, this study assessed the performance of CMIP5 GCMs in simulating observed precipitation over the LMB to provide useful guidance for the assessment of climate change effects.

Study Area
The Lower Mekong Basin (Figure 1) is located in Southeast Asia within the countries of Laos, Thailand, Cambodia, and Vietnam [36]. It has a catchment area of about 630,000 km² and a total length of about 2668 km. The area has a tropical monsoon climate, with the wet season from May to October and the dry season from November to the following April. Mean annual precipitation ranges from less than 1000 mm in northeast Thailand to more than 3500 mm in north-central Laos [37]. High terrain is predominant in Laos, whereas flat terrain is predominant in northeast Thailand, Cambodia, and the delta in Vietnam (Figure 1).

GCM Data
Thirty-four GCMs from CMIP5 were used in this study [38], including precipitation outputs, specific humidity, and wind (eastward wind and northward wind) data from 1975 to 2004. Table 1 gives an overview of the home institution of the models and their resolution. Further details can be found at the CMIP5 website (http://cmip-pcmdi.llnl.gov/cmip5/index.html).

Water 2018, 10, x FOR PEER REVIEW 6 of 24

Precipitation Data
This study used the gauge-based APHRODITE precipitation data as the observational data. APHRODITE provides gauge-based daily precipitation at a high horizontal resolution of 0.25° × 0.25° [35]. APHRODITE has been shown to be a better gridded precipitation product for the Mekong River Basin, contributing to studies such as climate change, Asian water resources, statistical downscaling, forecast improvements, verification of numerical model simulation, and satellite precipitation estimates [34]. Daily precipitation data of the observed stations from the Global Surface Summary of the Day (GSOD) and the Global Historical Climatology Network (GHCN) were obtained from the National Climatic Data Center (https://gis.ncdc.noaa.gov). APHRODITE, GSOD, and GHCN precipitation were aggregated to monthly values. As shown in Figure 2, monthly precipitation of GSOD and GHCN was significantly correlated with APHRODITE precipitation at the 0.01 significance level, with an R-squared (R²) of 0.834, indicating that APHRODITE precipitation was suitable for assessing the GCMs' performances over the LMB.
Because the GCMs and the observations have different resolutions, monthly precipitation outputs for all of the GCMs and the APHRODITE precipitation data were converted to a 2.5° × 2.5° grid using bilinear interpolation, and 21 grids were selected for comparison (Figure 1). In this study, we considered 1975-2004 as the reference period, including the periods of annual time, January to December; the wet season, May to October; and the dry season, November to the following April. For the APHRODITE data, mean precipitation for the wet season and the dry season was 1213.8 mm and 216.9 mm, which accounted for approximately 86.1% and 13.9% of the annual precipitation over the LMB, respectively. The sample sizes for the annual time, the wet season, and the dry season were 360, 180, and 180, respectively. The LMB precipitation data were calculated by taking the arithmetic mean of the 21 grids.
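The basin averaging and seasonal split described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the synthetic input are hypothetical, the series is assumed to start in January 1975, and months are assigned to seasons purely by calendar month, which reproduces the stated sample sizes of 360, 180, and 180.

```python
from statistics import fmean

WET_MONTHS = {5, 6, 7, 8, 9, 10}    # May to October
DRY_MONTHS = {11, 12, 1, 2, 3, 4}   # November to the following April


def basin_mean(grid_values):
    """Arithmetic mean over the grid cells for one month."""
    return fmean(grid_values)


def split_by_season(monthly_values, start_month=1):
    """Split a monthly series into wet- and dry-season samples by calendar month."""
    wet, dry = [], []
    for k, value in enumerate(monthly_values):
        month = (start_month - 1 + k) % 12 + 1
        if month in WET_MONTHS:
            wet.append(value)
        else:
            dry.append(value)
    return wet, dry


# Hypothetical input: 360 months (1975-2004), each with 21 grid values.
gridded = [[float(m % 12 + g) for g in range(21)] for m in range(360)]
series = [basin_mean(cells) for cells in gridded]
wet, dry = split_by_season(series)
print(len(series), len(wet), len(dry))
```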

National Centers for Environmental Prediction (NCEP) Reanalysis Data
Monthly specific humidity and wind data from 1975 to 2004 were taken from the NCEP reanalysis data (https://www.esrl.noaa.gov/psd/). To explain the differences in precipitation simulation over the LMB, we examined how well the three best-ranked GCMs (overall results) reproduced the main features of atmospheric circulation over the LMB. Spatial distributions of mean monthly specific humidity and wind at the 850 hPa level for the wet season and the dry season of the period 1975-2004 over the LMB were calculated and compared with the corresponding results from the NCEP reanalysis data.

Methods
Multiple criteria were used for the assessment, including root mean square error (RMSE), percentage bias (PBIAS), linear correlation coefficient (r) for monthly series and for spatial distribution, the Mann-Kendall test statistic (Z), Sen's slope, the Brier score (BS), and the significance score (S score). The main assessment steps were as follows: First, we calculated the statistics of the eight criteria. Then, based on the statistics of the criteria, we used an improved rank score (RS) method [39] to calculate the ranking scores of the GCMs' performances by a single criterion. Finally, the overall ranking scores of the GCMs' performances were calculated using ranking scores of the GCMs' performances by multiple criteria. The specific calculation methods were as follows:
RMSE, a common measure of the difference between the GCM and the observed values, was defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_{mi} - p_{oi}\right)^{2}} \quad (1)$$
where $p_{mi}$ and $p_{oi}$ represent the monthly precipitation for the GCM and the observed value of the LMB at time step i, respectively, and n represents the total number of time steps. A smaller RMSE value indicated a relatively better performance of a GCM.
The PBIAS was used to represent the tendency of the difference between the GCM and the observed values, and was defined as follows:
$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{n}\left(p_{mi} - p_{oi}\right)}{\sum_{i=1}^{n} p_{oi}} \times 100\% \quad (2)$$
The variables in the formula are the same as those described in Equation (1). A PBIAS value closer to zero indicated a relatively better performance of a GCM.
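As a small worked illustration of these two error measures, the sketch below assumes the common definitions (RMSE as the root of the mean squared difference, and PBIAS as the summed model-minus-observation difference relative to the summed observation, in percent, so that overestimation is positive). The function names and sample values are hypothetical.

```python
import math


def rmse(model, obs):
    """Root mean square error between model and observed series."""
    n = len(obs)
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)


def pbias(model, obs):
    """Percentage bias; positive values mean the model overestimates."""
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / sum(obs)


# Hypothetical monthly precipitation (mm) for four time steps.
obs = [10.0, 50.0, 200.0, 240.0]
gcm = [20.0, 60.0, 210.0, 250.0]
print(rmse(gcm, obs), pbias(gcm, obs))
```

In this example the model is 10 mm too wet at every step, so the bias is positive and the RMSE equals the constant offset.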
The linear correlation coefficient (r) was used to assess both the monthly series and spatial distribution of precipitation between the observation and GCMs. For the monthly series correlation coefficient (r), the correlation coefficient was calculated between observed and modeled long-term monthly mean values, and the sample sizes were 6, 6, and 12 for the wet season, dry season, and annual time, respectively. For the spatial distribution correlation coefficient (r), the sample sizes were 21 for all three time periods, and r was calculated over the 21 grids based on the observation and GCMs, including the mean annual values, mean values for the annual wet season, and mean values for the annual dry season. The formula was defined as follows:
$$r = \frac{\sum_{i=1}^{n}\left(p_{mi} - \bar{p}_{m}\right)\left(p_{oi} - \bar{p}_{o}\right)}{\sqrt{\sum_{i=1}^{n}\left(p_{mi} - \bar{p}_{m}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(p_{oi} - \bar{p}_{o}\right)^{2}}} \quad (3)$$
Here, for the monthly series correlation coefficient (r), $p_{mi}$ and $p_{oi}$ represent the monthly precipitation for the GCM and the observation of the LMB at month i, respectively, and $\bar{p}_{m}$ and $\bar{p}_{o}$ represent the mean values for precipitation of the GCM and the observation, respectively. For the spatial distribution correlation coefficient (r), $p_{mi}$ and $p_{oi}$ represent precipitation for the GCM and the observation of the mean annual values, mean values for the annual wet season, or mean values for the annual dry season at grid i, respectively, and $\bar{p}_{m}$ and $\bar{p}_{o}$ represent the corresponding mean values for precipitation of the GCM and the observation over all grids, respectively. A larger value of r indicated a relatively better performance of a GCM.
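The same correlation function can serve both uses described above, with either 6 or 12 long-term monthly means or 21 grid values as input. The following is a minimal sketch of the standard Pearson coefficient; the function name and sample values are hypothetical.

```python
import math


def pearson_r(model, obs):
    """Pearson linear correlation coefficient between two equal-length samples."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


# Hypothetical long-term monthly means (mm) for a six-month season.
obs_cycle = [150.0, 190.0, 220.0, 247.0, 230.0, 160.0]
gcm_cycle = [170.0, 200.0, 260.0, 280.0, 240.0, 150.0]
print(pearson_r(gcm_cycle, obs_cycle))
```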
The Mann-Kendall test statistic (Z) and Sen's slope were used to obtain the trends and their magnitudes for GCMs and observation. Thus, the effectiveness of the GCMs in representing the observed trends could be determined. We used the annual time series for the analysis, which included the annual wet season values, annual dry season values, and annual values, which we attributed to the wet season, dry season, and annual time for this analysis, respectively.
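The Mann-Kendall test statistic (Z) is defined as follows [40]:
$$S = \sum_{i=1}^{n-1}\sum_{k=i+1}^{n}\operatorname{sgn}\left(x_{k} - x_{i}\right) \quad (4)$$
$$\operatorname{sgn}\left(x_{k} - x_{i}\right) = \begin{cases} 1, & x_{k} - x_{i} > 0 \\ 0, & x_{k} - x_{i} = 0 \\ -1, & x_{k} - x_{i} < 0 \end{cases} \quad (5)$$
$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{t} t(t-1)(2t+5)}{18} \quad (6)$$
$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\operatorname{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S+1}{\sqrt{\operatorname{Var}(S)}}, & S < 0 \end{cases} \quad (7)$$
Here $x_{k}$ and $x_{i}$ are the sequential precipitation values, n is the length (29) of the dataset, t is the extent of any given tie, and $\sum_{t}$ denotes the summation over all ties. Sen's slope was defined as follows [41,42]:
$$\beta = \operatorname{median}\left(\frac{x_{i} - x_{j}}{i - j}\right) \quad (8)$$
where $1 \le j < i \le n$, and the slope estimator β represents the median of the slopes over all data pairs.
The BS and S score were used to assess the GCM probability density functions (PDFs) of monthly precipitation. The formulas were defined as follows:
$$\mathrm{BS} = \frac{1}{n}\sum_{i=1}^{n}\left(B_{mi} - B_{oi}\right)^{2} \quad (9)$$
$$S_{score} = \sum_{i=1}^{n}\min\left(B_{mi}, B_{oi}\right) \quad (10)$$
Here, $B_{mi}$ and $B_{oi}$ represent the probability of the GCM and observed values in the ith bin, respectively, and n is the number of bins, which was set as 30 according to the data range. BS is a measurement of mean squared error for probability prediction [43], and S score is a measurement of the degree of overlap between the simulated and observed probability distributions [44]. Thus, a smaller BS value and a larger S score value indicated relatively better performance of a GCM [45].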
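The trend and PDF statistics can be sketched in a few lines, assuming the standard textbook forms: the Mann-Kendall Z with tie correction and continuity correction, Sen's slope as the median of pairwise slopes, BS as the mean squared difference of binned probabilities, and S score as the summed minimum of binned probabilities. All function names are hypothetical, and the equal-width binning helper is an assumption, since the text only states that 30 bins were used.

```python
import math
from collections import Counter
from statistics import median


def mann_kendall_z(x):
    """Mann-Kendall Z with tie correction and continuity correction."""
    n = len(x)
    s = sum((x[k] > x[i]) - (x[k] < x[i])
            for i in range(n - 1) for k in range(i + 1, n))
    ties = Counter(x).values()  # group sizes; untied values contribute zero
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sens_slope(x):
    """Median of the slopes between all pairs of points (unit time steps)."""
    n = len(x)
    return median((x[i] - x[j]) / (i - j)
                  for j in range(n - 1) for i in range(j + 1, n))


def bin_probabilities(values, lo, hi, n_bins=30):
    """Empirical probabilities in equal-width bins; assumes lo <= v <= hi."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def brier_score(p_model, p_obs):
    """Mean squared difference between binned probabilities."""
    return sum((pm - po) ** 2 for pm, po in zip(p_model, p_obs)) / len(p_obs)


def s_score(p_model, p_obs):
    """Overlap of two binned distributions (1 means identical)."""
    return sum(min(pm, po) for pm, po in zip(p_model, p_obs))


# Hypothetical annual totals (mm) with a weak upward trend.
annual = [1400.0, 1380.0, 1450.0, 1420.0, 1500.0, 1470.0]
print(mann_kendall_z(annual), sens_slope(annual))
```

For the trend criteria, the quantity that is later scored is the absolute difference between the GCM statistic and the observed statistic, so both series would be passed through the same functions.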
In the original RS method, a smaller value of a relative error index such as RMSE indicates better performance of a GCM, whereas a larger value of a non-relative error index such as the correlation coefficient (r) also indicates better performance, which can easily lead to inconsistent assessment results [39,45]. The improved RS distinguishes between the relative error index and the non-relative error index, and can be used with multiple criteria and climatic variables to synthetically assess the performance of GCMs in applicable regions [39]. The improved RS of each assessment criterion could be calculated according to its statistic as follows [39]:
$$RS_{i} = \begin{cases} \dfrac{T_{max} - T_{i}}{T_{max} - T_{min}}, & T \text{ represents the relative error index} \\ \dfrac{T_{i} - T_{min}}{T_{max} - T_{min}}, & T \text{ represents the non-relative error index} \end{cases} \quad (11)$$
Here, $RS_{i}$ represents the GCM score calculated by the assessment criterion i. For the relative error indices of RMSE, PBIAS, and BS, $T_{i}$ represents the absolute value of the statistic for a GCM, and $T_{min}$ and $T_{max}$ represent the corresponding minimum and maximum values, respectively, in all GCMs. For the relative error indices of Z and Slope, $T_{i}$ represents the absolute error of the statistic calculated between GCM and observation, and $T_{min}$ and $T_{max}$ represent the corresponding minimum and maximum values, respectively, in all GCMs. For the non-relative error indices of correlation coefficient (r) and S score, $T_{i}$ represents the absolute value of the statistic for a GCM, and $T_{min}$ and $T_{max}$ represent the corresponding minimum and maximum values, respectively, in all GCMs.
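A minimal sketch of this scoring step is given below, assuming min-max scaling across GCMs so that the best model scores 1 and the worst scores 0 for either type of index. The function name and sample statistics are hypothetical, and returning 1.0 for every model when all statistics are equal is an added assumption to avoid division by zero.

```python
def rank_scores(stats, error_index):
    """Scale one criterion across GCMs to [0, 1], where 1 is best.

    stats: one value per GCM. For Z and Sen's slope, pass the difference
    between the GCM statistic and the observed statistic.
    error_index: True for RMSE, PBIAS, BS, Z, Slope (smaller is better);
                 False for r and S score (larger is better).
    """
    t = [abs(v) for v in stats]
    t_min, t_max = min(t), max(t)
    span = t_max - t_min
    if span == 0:
        return [1.0] * len(t)  # assumption: identical models share the top score
    if error_index:
        return [(t_max - ti) / span for ti in t]
    return [(ti - t_min) / span for ti in t]


# Hypothetical statistics for three GCMs.
print(rank_scores([30.0, 45.0, 60.0], error_index=True))    # RMSE-like
print(rank_scores([0.5, 0.75, 1.0], error_index=False))     # r-like
```

Because a PBIAS of -10% and +10% are scored identically, the absolute value is taken before scaling.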
Then, the RS for precipitation could be calculated as follows:
$$RS_{p} = \sum_{i=1}^{n}\frac{W_{i}}{W_{s}}\,RS_{i} \quad (12)$$
Here, $RS_{p}$ stands for $RS_{pw}$, $RS_{pd}$, or $RS_{pa}$, which represent the RS of precipitation for the wet season, the dry season, and the annual time, respectively. Here n = 8, and i represents an assessment criterion, with 1-RMSE, 2-PBIAS, 3-Z, 4-Slope, 5-r for monthly distribution, 6-r for spatial distribution, 7-BS, and 8-S score, respectively. $W_{i}$ represents the weight for an assessment criterion, and $W_{s}$ represents the sum of the weights of all assessment criteria. Z and Slope were part of the trend analysis, and BS and S score were part of the PDF analysis. Thus, we set a 0.5 weight for Z, Slope, BS, and S score, and a 1.0 weight for RMSE, PBIAS, r for monthly distribution, and r for spatial distribution.
According to RS i , the overall RS for the criterion RS io could be calculated as follows: Here, RS iw , RS id , and RS ia represent the RS i for the wet season, dry season, and annual time, respectively. We set 0.5, 0.5, and 1 as their respective weights.
Then, the overall RS for precipitation (RS po ) could be calculated as follows: Here, the variables are the same as those defined for Equation (12).
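The two aggregation levels described above can be illustrated with one weighted-average helper. The weights are those stated in the text; dividing by the sum of the weights follows the description of W s for the criterion level, and applying the same normalization to the seasonal weights is an assumption made here so that all scores stay in the range 0 to 1. The names and sample scores are hypothetical.

```python
CRITERION_WEIGHTS = {
    "RMSE": 1.0, "PBIAS": 1.0, "Z": 0.5, "Slope": 0.5,
    "r_monthly": 1.0, "r_spatial": 1.0, "BS": 0.5, "S_score": 0.5,
}
SEASON_WEIGHTS = {"wet": 0.5, "dry": 0.5, "annual": 1.0}


def weighted_score(scores, weights):
    """Weighted mean of ranking scores, normalized by the sum of the weights."""
    w_sum = sum(weights[key] for key in scores)
    return sum(weights[key] * scores[key] for key in scores) / w_sum


# Hypothetical single-criterion scores of one GCM for one time period.
wet_scores = {
    "RMSE": 0.9, "PBIAS": 0.8, "Z": 0.3, "Slope": 0.4,
    "r_monthly": 0.7, "r_spatial": 0.5, "BS": 0.6, "S_score": 0.6,
}
rs_pw = weighted_score(wet_scores, CRITERION_WEIGHTS)

# Hypothetical RMSE scores of the same GCM in the three time periods.
rs_rmse_overall = weighted_score(
    {"wet": 0.8, "dry": 0.4, "annual": 0.6}, SEASON_WEIGHTS
)
print(rs_pw, rs_rmse_overall)
```

With these weights, the trend pair (Z, Slope) and the PDF pair (BS, S score) each count as much as one full-weight criterion, and the two seasons together count as much as the annual period.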

Annual Cycle of Precipitation
Precipitation variation for the observation and 34 GCMs in the mean annual cycle of the period 1975-2004 over the LMB is shown in Figure 3. Most of the GCMs effectively reproduced the single-peak pattern of precipitation for the mean annual cycle, with the mean maximum precipitation of the observation occurring in August (247.1 mm), whereas the mean minimum occurred in January (12.6 mm) over the LMB. The mean annual precipitation of the observation over the LMB was 1430.7 mm, whereas the values for the GCMs ranged from 1379.7 mm to 2022.9 mm. Of the 34 GCMs, 29 (approximately 85.3%) had higher mean annual precipitation than the observation. Precipitation of the mean wet season and dry season for the observation for the LMB was 1213.8 mm and 216.9 mm, whereas the values for the GCMs ranged from 1083.5 mm to 1701.9 mm, and 115.1 mm to 559.6 mm, respectively. Of the 34 GCMs, 29 and 21, or approximately 85.3% and 61.8%, had higher precipitation than the observation for the wet season and the dry season, respectively. This indicated that the GCMs tended to overestimate precipitation compared to the observation, especially for the wet season.

Characteristics of the Statistics of the Criteria
The statistics of the criteria for precipitation concerning the observation and the GCM simulations were calculated and are shown as scatter plots for the wet season, dry season, and annual time ( ... HadGEM2-CC had the lowest absolute PBIAS values at 0.2%, 0.003%, and 0.8%, respectively, for the wet season, dry season, and annual time, indicating good simulation of the observed precipitation. However, MIROC4h, INMCM4.0, and MIROC4h had the highest PBIAS values for the wet season, dry season, and annual time, at 40.2%, 158.0%, and 41.4%, respectively, which represented poor simulation of the observed precipitation.
The monthly series r was clearly higher for the annual time than for the wet and dry seasons, with mean absolute values of 0.96, 0.86, and 0.84, respectively. Moreover, no obvious difference was noted among GCMs for the annual time, with absolute r values ranging from 0.89 to 0.99, indicating that GCMs had good ability in simulating the time series characteristics of precipitation for the annual time. Of the 34 GCMs, 28 and 25, or approximately 82.4% and 73.5%, had absolute r values higher than 0.8 for the wet and dry seasons, respectively, indicating that GCMs had a relatively better ability in reproducing the time series characteristics of precipitation for the wet season than for the dry season. NorESM1-M, IPSL-CM5A-LR, and IPSL-CM5A-LR had the highest absolute r values for the wet season, dry season, and annual time, at 0.99, 0.99, and 0.99, respectively, whereas GISS-E2-H, MIROC-ESM, and IPSL-CM5B-LR had the lowest values, at 0.36, 0.50, and 0.89, respectively.
The spatial correlation r for the dry season was clearly higher than that for the annual time and the wet season, with mean absolute values of 0.69, 0.44, and 0.34, respectively. Of the 34 GCMs, all had an absolute r value higher than 0.5 for the dry season, whereas 5 and 12 of the GCMs, or approximately 14.7% and 35.3%, had absolute r values higher than 0.5 for the wet season and the annual time, respectively. Low spatial correlation for the wet season was also detected in another monsoon region [12]. CMCC-CMS, EC-EARTH, and EC-EARTH had the highest absolute r values for the wet season, the dry season, and the annual time, at 0.63, 0.90, and 0.71, respectively, whereas the lowest were shown by MIROC4h, INMCM4.0, and IPSL-CM5B-LR, at 0.02, 0.51, and 0.03, respectively.
The observed precipitation showed positive trends for the wet season, the dry season, and the annual time, with a Z statistic of 0.21, 1.

Comparison of Ranking Scores of the GCMs' Performances by a Single Criterion
For different criteria in the same time period, a GCM may have performed well for one criterion but poorly for another (Figure 7). For example, MIROC-ESM-CHEM had the highest ranking score of 1 based on the RMSE criterion for the wet season, but a low ranking score of 0.467 for the S score. Although CESM1(WACCM) had the highest ranking scores, both 1, for the BS and S score, it had a low ranking score of 0.321 for the Z for the wet season. The same characteristics were also found for the dry season and the annual time. Moreover, for the same criterion in different time periods, a GCM may have performed well for one time period but poorly for another period or for the overall result (Figure 7). For example, ACCESS1.3 had the highest ranking score of 1 by the PBIAS criterion for the dry season, but low ranking scores of 0.419 and 0.529 for the wet season and the annual time, respectively. Additionally, a GCM that was not the best performer in any of the three time periods could still show the best performance for the overall result (Figure 7). For example, MPI-ESM-LR did not obtain the highest ranking score by the Sen's slope criterion for any of the three time periods, but had the highest ranking score of Sen's slope for the overall result. This indicated that the assessed performance of a GCM depended strongly on the criterion used and varied as the criterion changed. Thus, it is essential to assess GCMs comprehensively by using a multiple criteria method, rather than a single criterion method.

Overall Ranking Scores of the GCMs' Performances by Multiple Criteria
As shown in Table 2 ... Figure 8 shows that the mean ranking scores of the GCMs for the wet season were slightly higher than those for the annual time, the dry season, and the overall results. However, the range of the ranking scores of the GCMs was smallest for the dry season. This indicated that the GCMs performed relatively better for the wet season, and that performance differed between seasons.

Sensitivity Analysis of Ranking Scores of the GCMs' Performances
As shown in Figure 9, with the exception of Z, Sen's slope, and BS for the dry season, all the criteria showed significant correlations (p < 0.05 or 0.01) between the ranking scores of the GCMs obtained from multiple criteria and those obtained from a single criterion over the LMB, including for the wet season, the annual time, and the overall results. The RMSE and PBIAS showed relatively high r values compared to other criteria, whereas the monthly series correlation showed the lowest r for the wet season, the annual time, and the overall results. The results indicated that these criteria were robust for assessing the performance of the GCMs. However, the set of robust criteria differed at the seasonal scale.
Figure 9. Correlation between ranking scores of the GCMs obtained from multiple criteria and ranking scores of the GCMs obtained from a single criterion over the LMB: (a), (b), and (c) represent the time periods of the wet season, the dry season, and the annual time, respectively; (d) represents the correlation between the overall ranking scores and the weight criteria ranking scores; ** represents that correlation was significant at the 0.01 level; * represents that correlation was significant at the 0.05 level.


Discussion
In this paper, we identified the better-performing GCMs in reproducing the observed precipitation over the LMB for the wet season, the dry season, the annual time, and the overall results. Sperber et al. 2013 [9] showed that the IPSL-CM5A-LR and IPSL-CM5A-MR models were top performers in representing the interannual variability of the Indian monsoon. Kadel et al. 2018 [23] showed that ACCESS1.0, CNRM-CM5, EC-EARTH, and HadGEM2-ES were the four best models for precipitation simulation in the central Himalayas. Additionally, Hasson et al. 2016 [13] showed that CCSM4, GFDL-CM3, MIROC-ESM-CHEM, MIROC-ESM, MIROC5, and NorESM1-M simulated a mostly realistic active duration of the monsoon in the Mekong River Basin, with a rapid fractional accumulation (RFA) slope similar to that of the observations. In our study, HadGEM2-ES, CCSM4, and IPSL-CM5A-MR performed similarly well over the LMB, which is part of both the Asian monsoon region and the Mekong River Basin, both of which are affected by the southwest monsoon. However, our results also differed from those of the above studies, which indicates that it is important to assess GCM performance not only at a large scale but also at a regional scale. A river basin such as the LMB deserves particular consideration because of its special geographical position and climatic characteristics, as well as its significant effects [30,31].
The results showed that the GCMs differed in their ability to reproduce the observed precipitation, as reflected in the different criterion statistics and ranking scores, especially at the seasonal scale. This may reflect their ability to reproduce the South Asian summer monsoon, which can be affected by issues related to large-scale atmospheric circulation and underrepresentation of real orography [13]. Moreover, previous studies have shown that atmospheric circulation is a good indicator for explaining discrepancies among GCM simulations [33,47]. Thus, how well the GCMs reproduce atmospheric circulation helps explain their performance in precipitation simulation. More than 50% of the GCMs failed to reproduce the positive trend of the observed precipitation for the wet season and the dry season (approximately 52.9% and 64.7%, respectively), and approximately 44.1% failed to reproduce the positive trend for the annual time over the LMB. Studies of southeastern Australia by Fu et al. 2013 [45] and of the Western Himalayan region by Meher et al. 2017 [12] reported that trend analysis was not a robust criterion for assessing GCM performance. In contrast, our results showed that the Z and Sen's slope were robust criteria for assessing GCM performance except for the dry season, indicating that trend analysis could be used as a robust criterion over the LMB. Nevertheless, more than 50% of the GCMs failed to reproduce a positive trend for the wet season and the dry season. This apparent contradiction can be attributed to the parameter used for assessment: the ranking scores of the Z and Sen's slope were based on the absolute error between the statistic computed from the observed precipitation and that computed from the GCM-simulated precipitation, so a GCM with a small absolute error could rank highly even when the sign of its trend differed from the observed one.
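To make the role of the absolute error concrete, the following is a minimal sketch of the Mann-Kendall Z statistic and Sen's slope, together with their absolute errors between a simulated and an observed series. It assumes the standard textbook definitions without a correction for ties; the function names and the three short series are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall_z(series):
    """Standardized Mann-Kendall statistic Z (assumption: no tie correction)."""
    n = len(series)
    # S = number of increasing pairs minus number of decreasing pairs (i < j).
    s = sum((xj > xi) - (xj < xi) for xi, xj in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sens_slope(series):
    """Sen's slope: median of all pairwise slopes (x_j - x_i) / (j - i), i < j."""
    slopes = [(series[j] - series[i]) / (j - i)
              for i, j in combinations(range(len(series)), 2)]
    return median(slopes)


def trend_abs_errors(obs, sim):
    """Absolute errors of Z and Sen's slope between simulation and observation."""
    return (abs(mann_kendall_z(sim) - mann_kendall_z(obs)),
            abs(sens_slope(sim) - sens_slope(obs)))


# Hypothetical precipitation totals (mm); illustrative only, not study data.
obs = [100, 102, 101, 103, 102, 104]      # weak positive trend
model_a = [103, 102, 103, 101, 102, 101]  # weak negative trend (wrong sign)
model_b = [100, 110, 120, 130, 140, 150]  # strong positive trend (right sign)

for name, sim in (("model_a", model_a), ("model_b", model_b)):
    z_err, slope_err = trend_abs_errors(obs, sim)
    sign_ok = sens_slope(sim) * sens_slope(obs) > 0
    print(name, "sign matches:", sign_ok,
          "Z error:", round(z_err, 2), "slope error:", round(slope_err, 2))
```

In this made-up example, model_a has the smaller Sen's slope error even though its trend has the opposite sign to the observation, which is the kind of situation in which an error-based score and the trend direction can disagree.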
As shown in Figure 12, the absolute errors of the Z and Sen's slope were significantly correlated (at the 0.01 level) with the ranking scores of the GCMs except for the dry season. The same pattern of significance at the 0.01 level was found between the ranking scores obtained from the Z and Sen's slope and the ranking scores obtained from multiple criteria (Figures 9 and 12). These results indicated that although trend analysis was a robust criterion for assessing GCM performance except for the dry season, robustness did not imply a high ability to reproduce the observed precipitation trend. Furthermore, using multiple criteria to assess GCM performance was superior to using a single criterion.
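The comparison between single-criterion and multiple-criteria ranking scores can be sketched as follows. This is an illustration under stated assumptions rather than the method of this paper: the ranking score is assumed to be a min-max normalization of the absolute error (1 for the smallest error, 0 for the largest), the multiple-criteria score is assumed to be an equal-weight mean, the error values are invented, and the significance test behind the 0.01 and 0.05 levels is omitted.

```python
import math
from statistics import mean


def rank_scores(abs_errors):
    """Assumed min-max ranking score: 1 for the smallest error, 0 for the largest."""
    lo, hi = min(abs_errors), max(abs_errors)
    if hi == lo:
        return [1.0 for _ in abs_errors]
    return [1.0 - (e - lo) / (hi - lo) for e in abs_errors]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical absolute errors for five models under three criteria.
abs_errors = {
    "RMSE": [10.0, 30.0, 20.0, 50.0, 40.0],
    "PBIAS": [5.0, 15.0, 25.0, 20.0, 30.0],
    "Sen's slope": [0.9, 0.2, 0.5, 0.1, 0.7],
}

# One list of ranking scores per criterion (one entry per model).
single = {name: rank_scores(errs) for name, errs in abs_errors.items()}

# Multiple-criteria score per model: equal-weight mean across criteria (assumption).
multi = [mean(scores) for scores in zip(*single.values())]

# A criterion whose scores track the multiple-criteria scores closely would
# count as "robust" in the sense used above.
for name, scores in single.items():
    print(name, round(pearson_r(scores, multi), 3))
```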
A score-based method has proved to be applicable for assessing GCM performance [39,45]. In this paper, we used APHRODITE precipitation data as the observations and applied an improved score-based method to provide more detailed assessment results of the GCMs for the three time periods over the LMB. The results provide useful information for further studies related to the application of multimodel ensemble methods and future climate change over the LMB and over monsoon regions with geographic and climatic features similar to those of the LMB. Although the APHRODITE precipitation data have high resolution and have proven to be a better gridded precipitation product for the Mekong River Basin, it is important to compare them with other gridded precipitation products, such as those from the Climate Research Unit (CRU) [48], the Global Precipitation Climatology Project (GPCP) [49], and others. In addition, with the release of a new generation of climate models (CMIP6) in the near future, the improved score-based method can be used to assess their performance in simulating climatic variables and to compare them with the CMIP5 GCMs.

Figure 12. Correlation between the absolute errors for the statistics and ranking scores of the GCMs. Absolute errors for the statistics were between precipitation of the observation and precipitation of the GCM simulations. The black bar represents the Z; the gray bar represents Sen's slope; ** represents that correlation was significant at the 0.01 level.

Conclusions
This study focused mainly on assessing the performance of 34 CMIP5 GCMs in simulating observed precipitation over the LMB. Performance was assessed through RMSE, PBIAS, monthly series correlations, spatial correlations, Z, Sen's slope, BS, and S score for three periods: the wet season, the dry season, and the annual time. The overall ranking scores were obtained for GCM performance over the LMB. The main results of this study are presented in the following points.
Observed precipitation was 1430.7 mm, 1213.8 mm, and 216.9 mm for the mean annual, mean wet season, and mean dry season, respectively, whereas the GCM precipitation ranged from 1379.7 mm to 2022.9 mm, 1083.5 mm to 1701.9 mm, and 115.1 mm to 559.6 mm. Precipitation was higher than the observation in 29, 21, and 29 of the 34 GCMs (approximately 85.3%, 61.8%, and 85.3%, respectively). This indicated that the GCMs tended to overestimate precipitation compared to the observation, especially for the wet season.
The GCMs reproduced the time series characteristics of precipitation better for the annual period than for the wet and dry seasons, and they clearly reproduced the spatial characteristics of precipitation better for the dry season than for the annual time and the wet season. More than 50% of the GCMs failed to reproduce the positive trend of the observed precipitation for the wet season and the dry season (approximately 52.9% and 64.7%, respectively), and approximately 44.1% of the GCMs failed to reproduce the positive trend for the annual time over the LMB. However, most GCMs showed good ability in reproducing the characteristics of the probability distribution function of the observed precipitation.
The results showed that a GCM may perform well for one criterion but poorly for another criterion in the same period. For example, MIROC-ESM-CHEM had the highest ranking score of 1 based on the RMSE criterion for the wet season, but a low ranking score of 0.467 for the S score. Moreover, for the same criterion, a GCM may perform well in one time period but poorly in another period or in the overall result. For example, ACCESS1.3 had the highest ranking score of 1 by the PBIAS criterion for the dry season, but low ranking scores of 0.419 and 0.529 for the wet season and the annual time, respectively. This indicated that the assessed performance of a GCM depended mainly on the criterion used and varied as the criterion changed. Thus, it is essential to assess the GCMs comprehensively by using multiple criteria rather than a single criterion.
Based on the ranking scores of the GCMs, the top five ranking scores of the GCMs over the
Assessing the performance of the GCMs in reproducing observed precipitation is important for projecting future climate change. The results of this study can provide useful information for further studies related to the application of multimodel ensemble methods and future climate change over the LMB and over monsoon regions with geographic and climatic features similar to those of the LMB.
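As a quick arithmetic check on the overestimation shares reported in the conclusions (29, 21, and 29 of the 34 GCMs), the percentages can be recomputed in a few lines. The mapping of the three counts to the annual, wet season, and dry season periods is assumed to follow the order in which the periods are listed in the text; the variable names are illustrative.

```python
N_MODELS = 34  # number of CMIP5 GCMs assessed in the study

# GCMs with higher precipitation than the observation, as reported in the text
# (assumed order: annual, wet season, dry season).
higher_than_obs = {"annual": 29, "wet season": 21, "dry season": 29}

# Share of models per period, in percent, rounded to one decimal place.
shares = {period: round(100 * count / N_MODELS, 1)
          for period, count in higher_than_obs.items()}
print(shares)
```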