Assessing the Performance of CMIP5 GCMs for Projection of Future Temperature Change over the Lower Mekong Basin

In this study, we assessed the performance of 34 Coupled Model Intercomparison Project Phase 5 (CMIP5) general climate models (GCMs) in simulating the observed temperature over the Lower Mekong Basin (LMB) during 1961–2004. An improved score-based method was used to rank the performance of the GCMs over the LMB. Two multi-model ensembles (MMEs), a sub-ensemble of the top 25% ranked GCMs and a full ensemble of all the GCMs, were calculated using the arithmetic mean (AM) method and downscaled using the Delta method to project future temperature change during two future time periods, the near future (2006–2049) and the far future (2050–2093), under three representative concentration pathway (RCP) scenarios (RCP2.6, RCP4.5, and RCP8.5) over the LMB. The improved score-based method, which combines multiple criteria, gave a robust assessment of GCM performance over the LMB and can provide good information for projecting future temperature change. The results showed a significant increase in temperature over the LMB under both ensembles. However, the magnitude of the future temperature increase differed between the two ensembles: the mean annual temperature increase from the full ensemble (sub-ensemble) was 1.26 °C (1.09 °C), 1.90 °C (1.70 °C), and 2.97 °C (2.78 °C) during 2050–2093 under the RCP2.6, RCP4.5, and RCP8.5 scenarios, compared to 0.93 °C (0.87 °C), 0.99 °C (0.95 °C), and 1.09 °C (1.06 °C) during 2006–2049, respectively, relative to the reference time period of 1961–2004. Over the whole future period (2006–2093), the temperature is likely to increase at 0.04 °C, 0.16 °C, and 0.37 °C decade⁻¹ under the RCP2.6, RCP4.5, and RCP8.5 scenarios by the sub-ensemble, while a higher increase of 0.05 °C, 0.17 °C, and 0.39 °C decade⁻¹ was found by the full ensemble.
On the whole, the higher warming mainly occurred in the northern and central areas of the LMB, while the lower warming mainly occurred in the southeast and the southwest, especially under the RCP4.5 and RCP8.5 scenarios, and the warming increased with increasing RCP for both ensembles. Moreover, to reduce the uncertainty of temperature projection in further studies in the LMB, multiple GCM ensemble methods should be considered and compared.


Introduction
Climate change has become one of the most important topics to scientists, the public, and governments around the world. The past 30 years (1983–2012) were probably the hottest in the Northern Hemisphere in the last 1400 years, with the warmest being the first 10 years of the 21st century [1]. Regionally, mean temperature is expected to rise 0.79 °C by the year 2030 compared to 1951–2000 in the Mekong River Basin [2]. Observational evidence from mainland and oceans has shown that the regional climate change associated with the rise in temperature has significant impacts on the population and environment [3].
Recently, multiple criteria have been widely used for assessing the performance of the CMIP5 GCMs in climate variable simulation [4][5][6][7][8]. Rupp et al. [4] assessed the ability of 41 CMIP5 GCMs to simulate temperature and precipitation over the Pacific Northwest, USA, based on observed metrics such as the correlation and variance of mean seasonal spatial patterns, amplitude of seasonal cycle, diurnal temperature range, annual- to decadal-scale variance, and other metrics. Miao et al. [5] assessed the performance of 24 CMIP5 GCMs in simulating intra-annual, annual, and decadal temperature over Northern Eurasia based on the criteria of correlation, the centered root-mean-square error, and the amplitude of the standard deviations. Ahmadalipour et al. [6] used criteria such as mean, standard deviation, coefficient of variation, relative change (variability), Mann-Kendall test, Kolmogorov-Smirnov test (KS test), and other criteria to assess the performance of 20 CMIP5 GCMs for selecting suitable GCMs for climate change impact analysis over the Columbia River Basin in the Pacific Northwest, USA. Dong et al. [7] assessed the performance of 22 CMIP5 GCMs in simulating annual mean surface air temperature over the critical Belt and Road region using criteria of temporal, spatial, and trend analysis. Das et al. [8] assessed the performance of 34 CMIP5 GCMs in reproducing observed temperature statistics over the Western Himalayan Region of India using criteria including mean seasonal cycle, temporal trends, and spatial correlation. Furthermore, additional research has compared the ability of CMIP5 and CMIP3 models to reproduce the observed climate variables, and many of these studies showed that CMIP5 models improved on CMIP3 models [9][10][11][12]. In addition, many studies have assessed the performance of the GCMs in reproducing patterns of large-scale climate variability, such as the North Atlantic Oscillation (NAO), Pacific/North America pattern (PNA), North Pacific Oscillation (NPO), and El Niño Modoki [13][14][15]. Lee and Black [13] found that the low-frequency mode structure from model biases has a significant effect on representing associated regional anomalies in surface air temperature and storm track behavior. Ning and Bradley [14] found that the winter temperature correlation changes over the eastern USA are usually highly affected by the NAO and PNA. Wang et al. [15] showed that future projections of El Niño Modoki of the multi-model simulations can be analyzed based on the changes in the warm wind-evaporation-SST (WES) feedback intensity and simulations of the NPO. Moreover, Hawkins and Sutton [16] suggested that model uncertainty was more important than internal variability for decadal time scales and regional spatial scales (~2000 km). Therefore, a GCM that can simulate observed temperature reasonably well should be selected before a climate change projection is made [5,17,18].
The fifth assessment report of the IPCC put forward a new set of scenarios called representative concentration pathways (RCPs) for climate change projection. In detail, the four typical pathways correspond to radiative forcing levels of 2.6 W/m² (RCP2.6), 4.5 W/m² (RCP4.5), 6.0 W/m² (RCP6.0), and 8.5 W/m² (RCP8.5) by the end of this century, which represent low (RCP2.6), low-medium (RCP4.5), medium-high (RCP6.0), and high (RCP8.5) greenhouse gas emissions [1]. However, because of the coarse spatial resolution of the GCMs, the outputs from the future projections of the GCMs are inadequate to project the regional climate scenarios in detail [19][20][21]. Thus, downscaling methods have been applied to transfer the coarse outputs of the GCMs to regional or local high-resolution climate change scenarios [21,22]. Currently, three downscaling approaches are in use: dynamic downscaling, statistical downscaling, and a combination of both. Statistical downscaling is advantageous in that it requires a small amount of calculation, is simple to use, and does not consider the effects of boundary conditions on the predicted results [23]. A statistical downscaling method called the Delta method, which is simple and easy to operate, has been widely used for projection of future climate change around the world [22,[24][25][26][27][28][29].
Future climate change will affect water yield and rice cultivation in the tributary countries over the Lower Mekong Basin (LMB) [30]. Due to population growth in the LMB, the demand for agricultural products from the LMB is expected to increase by 20–50% in the next 30 years [31]. Additionally, it is likely that the demand for water for agriculture in 2010–2050 will increase as a result of the higher temperatures [32]. Therefore, assessing the performance of the GCMs in temperature simulation is essential for projection of future temperature change and for policy-making over the LMB. Previous studies of GCM performance and future temperature projection have mainly focused on the watersheds in the Mekong River Basin [2,[33][34][35]. However, fewer studies have focused on the LMB. Meanwhile, no research exists that assesses the performance of the GCMs in simulating the observed temperature in the LMB based on an improved score-based method and the gauge-based Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) temperature data. Moreover, the multi-model ensemble projection can improve the temporal simulation relative to a single GCM in the Mekong River Basin [35]. Thus, in this study, we aim to assess the performance of the CMIP5 models in temperature simulation and use multi-model ensembles (MMEs) to project future temperature change during two periods, the near future (2006–2049) and the far future (2050–2093), under the RCP2.6, RCP4.5, and RCP8.5 scenarios over the LMB.

GCM Data
Thirty-four GCMs from the CMIP5 [36] were used in this study. Table 1 provides a detailed description of their origin and spatial resolution. More details can be found on the CMIP5 website (http://cmip-pcmdi.llnl.gov/cmip5/index.html). The future projections for 2006–2093 under the RCP2.6, RCP4.5, and RCP8.5 scenarios were also acquired from the website.

Temperature Data
Gauge-based data from the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) temperature dataset were used as the observational data. APHRODITE temperature data have a high spatial resolution of 0.25° × 0.25° [37] and have been proven to be a better gridded temperature product for studies of climate change in the Mekong River Basin [38]. APHRODITE temperature data are daily. Thus, we aggregated them into monthly data before the assessment.
Because of the different spatial resolutions of the GCMs and the observations, monthly temperature outputs for the 34 GCMs and the APHRODITE temperature data were converted to a common 2.5° × 2.5° grid using bilinear interpolation, and 21 grid cells were selected for comparison (Figure 1). In this paper, we consider 1961–2004 as the reference time period. The LMB temperature data were calculated by taking the arithmetic mean of the 21 grid cells.
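The aggregation steps described above (daily to monthly values, then an arithmetic mean over the selected grid cells) can be sketched as follows. This is a minimal illustration, assuming the data are already on the common grid and stored as a NumPy array of shape (days, grid cells); the function names and the `month_index` argument are hypothetical and not part of the original workflow, and the bilinear regridding itself is not shown.

```python
import numpy as np

def monthly_means(daily, month_index):
    """Average daily values into monthly means.

    daily       : array of shape (n_days, n_grids)
    month_index : integer array of shape (n_days,), giving the running
                  month number (0, 1, 2, ...) that each day belongs to
    """
    daily = np.asarray(daily, dtype=float)
    month_index = np.asarray(month_index)
    n_months = int(month_index.max()) + 1
    sums = np.zeros((n_months, daily.shape[1]))
    np.add.at(sums, month_index, daily)           # accumulate days per month
    counts = np.bincount(month_index, minlength=n_months)
    return sums / counts[:, None]

def basin_mean(monthly):
    """Unweighted arithmetic mean over all grid cells (assumed: 21 cells)."""
    return np.asarray(monthly, dtype=float).mean(axis=1)
```

An unweighted mean is used here because the text specifies an arithmetic mean of the grid cells; an area-weighted mean would be a different design choice.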

Assessment of the Performance of CMIP5 GCMs
A GCM that performs well for a specific statistic does not necessarily perform well for a different statistic; moreover, an individual assessment criterion can produce a biased assessment [39]. It is therefore beneficial to use multiple criteria to comprehensively assess the performance of CMIP5 GCMs. We used seven assessment criteria: mean value (M), standard deviation (SD), normalized root mean square error (NRMSE), linear correlation coefficient (r) for spatial distribution, Mann-Kendall test statistic Z, Sen's slope, and significance score (S_score). We then used an improved RS (Rank Score) method [39,40] to rank the GCMs and comprehensively assess their performance for temperature simulation over the LMB.


1. Mean value

The mean value (M) is defined as follows:

$$M = \frac{1}{n}\sum_{i=1}^{n} T_i$$

Here $T_i$ represents the monthly temperature for the LMB at the ith time step, and n represents the total number of time steps.

2. Standard deviation

The standard deviation (SD) is calculated from the monthly temperature series, where $T_i$ represents the monthly temperature for the LMB at the ith time step, $\overline{T}$ represents the mean monthly temperature for the LMB, and n represents the total number of time steps. A smaller value of SD indicates a better performance of a GCM.

3. Normalized root mean square error

The normalized root mean square error (NRMSE) is calculated from the differences between the simulated and observed series, where $Tm_i$ and $To_i$ represent the monthly temperature for the GCM and the observed value for the LMB at the ith time step, respectively, $\overline{To}$ represents the mean of the observed values for the LMB, and n represents the total number of time steps. A smaller value of NRMSE indicates a better performance of a GCM.

4. Linear correlation coefficient (r) for spatial distribution

The correlation coefficient (r) was used as a measure to compare the spatial distribution of temperature between the observation and the GCMs. The sample size is 21, and r was calculated between the observation and the GCMs for the long-term mean values of each grid. The formula is defined as follows:

$$r = \frac{\sum_{i=1}^{n}\left(Tm_i - \overline{Tm}\right)\left(To_i - \overline{To}\right)}{\sqrt{\sum_{i=1}^{n}\left(Tm_i - \overline{Tm}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(To_i - \overline{To}\right)^2}}$$

Here $Tm_i$ and $To_i$ represent the mean annual temperature of the GCM and the observation at the ith grid, respectively, $\overline{Tm}$ and $\overline{To}$ represent the corresponding mean values over all grids, and n = 21. A larger value of r indicates a better performance of a GCM.

5. Mann-Kendall test statistic Z and Sen's slope

The Mann-Kendall test statistic Z and Sen's slope were used to obtain the trends and their magnitudes for the GCMs and the observations, and thus to assess how well the GCMs represent the observed trend. The annual time series were used for this analysis. The Mann-Kendall test statistic Z is defined as follows [41][42][43]:

$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\mathrm{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\mathrm{Var}(S)}}, & S < 0 \end{cases}$$

and

$$S = \sum_{k=1}^{n-1}\sum_{i=k+1}^{n} \mathrm{sgn}\left(x_i - x_k\right)$$

Here $x_k$ and $x_i$ are the sequential temperature values, n is the length (44) of the dataset, and

$$\mathrm{sgn}(\theta) = \begin{cases} 1, & \theta > 0 \\ 0, & \theta = 0 \\ -1, & \theta < 0 \end{cases}$$

and

$$\mathrm{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{t} t(t-1)(2t+5)}{18}$$

Here t is the extent of any given tie and $\sum_t$ denotes the summation over all ties. Sen's slope is defined as follows [44,45]:

$$\beta = \mathrm{Median}\left(\frac{x_i - x_j}{i - j}\right)$$

Here $1 \le j < i \le n$, and the slope estimator β is the median of the slopes over all pairs of values in the dataset.

6. Probability density functions (PDF)

The significance score ($S_{score}$) was used to assess the GCM's probability density function (PDF) for monthly temperature. The formula is defined as follows [46]:

$$S_{score} = \sum_{i=1}^{n} \min\left(B_{mi}, B_{oi}\right)$$

Here $B_{mi}$ and $B_{oi}$ represent the probability of the GCM and the observed temperature values in the ith bin, respectively, and n is the number of bins, which was set to 100 according to the data range. The $S_{score}$ measures the degree of overlap between the simulated and the observed probability distributions [46]. Thus, a larger $S_{score}$ value indicates a better performance of a GCM [39].

7. Improved RS (Rank Score)

Unlike the original RS method, the improved RS distinguishes between relative error indexes and non-relative error indexes, which avoids inconsistent results [40]. For example, for a relative error index such as NRMSE, a smaller value indicates a better performance of a GCM, while for a non-relative error index such as r, a larger value indicates a better performance. Thus, the improved RS can be used with different assessment criteria and climatic variables to comprehensively assess the performance of GCMs in a region. The Rank Score of each assessment criterion can be calculated from its statistic [40]:

$$RS_i = \begin{cases} \dfrac{T_{max} - T_i}{T_{max} - T_{min}}, & T \text{ represents the relative error index} \\ \dfrac{T_i - T_{min}}{T_{max} - T_{min}}, & T \text{ represents the non-relative error index} \end{cases}$$

Here $RS_i$ represents the score for a GCM calculated by an assessment criterion i. For the relative error indexes M, SD, Z, and Sen's slope, $T_i$ represents the absolute error between a GCM and the observation (Equation (12)), and $T_{min}$ and $T_{max}$ represent the corresponding minimum and maximum among all GCMs. For the relative error index NRMSE, $T_i$ represents the absolute value of the statistic for a GCM, and $T_{min}$ and $T_{max}$ represent the corresponding minimum and maximum among all GCMs. For the non-relative error indexes r and $S_{score}$, $T_i$ represents the absolute value of the statistic for a GCM, and $T_{min}$ and $T_{max}$ represent the corresponding minimum and maximum among all GCMs.

$$T_i = \left|T_{sm} - T_{so}\right| \tag{12}$$

Here $T_{sm}$ and $T_{so}$ represent the statistics of the GCM and the observation, respectively. Therefore, the overall RS for temperature can be calculated as follows:

$$RS_T = \frac{1}{W_s}\sum_{i=1}^{n} W_i \, RS_i$$

Here $RS_T$ represents the overall RS of temperature for the GCM, n = 7, and i represents an assessment criterion (M, SD, NRMSE, Z, Sen's slope, spatial distribution r, or $S_{score}$). $W_i$ represents the weight for assessment criterion i, and $W_s$ represents the sum of the weights of all the assessment criteria. Since Z and Sen's slope are both part of the trend analysis, we set a weight of 0.5 for each of them, and a weight of 1.0 for each of M, SD, NRMSE, r, and $S_{score}$.
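As an illustration of how the trend, PDF, and rank-score criteria above fit together, the following is a minimal Python sketch. It assumes the standard forms of the Mann-Kendall test, Sen's slope, and the PDF overlap score, and a min-max normalization to [0, 1] for the rank score; the function names, the choice of bin edges spanning the combined data range, and the dictionary keys are assumptions made for this example rather than the study's actual implementation.

```python
import numpy as np
from math import sqrt

def mann_kendall_z(x):
    """Mann-Kendall Z with the usual tie correction in Var(S)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = sum(np.sign(x[k + 1:] - x[k]).sum() for k in range(n - 1))
    _, counts = np.unique(x, return_counts=True)
    ties = sum(t * (t - 1) * (2 * t + 5) for t in counts if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if s > 0:
        return (s - 1) / sqrt(var_s)
    if s < 0:
        return (s + 1) / sqrt(var_s)
    return 0.0

def sens_slope(x):
    """Median of all pairwise slopes (x_i - x_j) / (i - j), j < i."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    slopes = [(x[i] - x[j]) / (i - j)
              for j in range(n - 1) for i in range(j + 1, n)]
    return float(np.median(slopes))

def s_score(model, obs, n_bins=100):
    """Overlap of two binned distributions: sum of min(B_m, B_o)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Assumption: common bin edges spanning both series.
    edges = np.linspace(min(model.min(), obs.min()),
                        max(model.max(), obs.max()), n_bins + 1)
    b_m = np.histogram(model, bins=edges)[0] / model.size
    b_o = np.histogram(obs, bins=edges)[0] / obs.size
    return float(np.minimum(b_m, b_o).sum())

def rank_scores(values, relative_error):
    """Min-max rank score per GCM; 1 is best, 0 is worst.

    For relative error indexes pass absolute errors |T_sm - T_so|
    (or NRMSE itself); for non-relative indexes pass r or S_score.
    """
    t = np.abs(np.asarray(values, dtype=float))
    t_min, t_max = t.min(), t.max()
    if relative_error:
        return (t_max - t) / (t_max - t_min)
    return (t - t_min) / (t_max - t_min)

# Weights as stated in the text: 0.5 for each trend criterion, 1.0 otherwise.
WEIGHTS = {"M": 1.0, "SD": 1.0, "NRMSE": 1.0, "r": 1.0,
           "Z": 0.5, "Sen": 0.5, "Sscore": 1.0}

def overall_score(rs_by_criterion, weights=WEIGHTS):
    """Weighted mean of per-criterion rank scores for every GCM."""
    w_sum = sum(weights[c] for c in rs_by_criterion)
    total = sum(weights[c] * np.asarray(rs, dtype=float)
                for c, rs in rs_by_criterion.items())
    return total / w_sum
```

In this sketch, the two trend criteria together carry the same weight as any single other criterion, which mirrors the weighting rationale given above.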

Projection of Future Temperature Change
Based on the results of the GCM performance assessment, we used two multi-model ensemble (MME) methods (sub-ensemble and full ensemble) to compare future temperature projections under the RCP2.6, RCP4.5, and RCP8.5 scenarios. Both ensembles were calculated using the arithmetic mean (AM) method, which can be defined as follows:

$$T_i = \frac{1}{n}\sum_{j=1}^{n} T_{ji}$$

For the sub-ensemble method, j represents a GCM ranked within the top 25% of all the GCMs, $T_{ji}$ represents the future monthly temperature of GCM j at the ith time step, and n represents the total number of GCMs ranked within the top 25%. For the full ensemble method, j represents any GCM that has outputs for all three RCP scenarios, $T_{ji}$ represents the future monthly temperature of GCM j at the ith time step, and n represents the total number of GCMs that have data for the three RCP scenarios.
Subsequently, we applied the Delta method to calculate temperature changes between the historical MME temperature outputs for the reference period 1961–2004 and the temperature projections for future periods of the same 44-year duration, which can be defined as follows:

$$\Delta T_i = T_{s,i} - T_{r,i}$$

Here $T_{s,i}$ and $T_{r,i}$ represent the mean temperature for month i of the 44-year future projection and of the historical MME outputs for the reference period of 1961–2004, respectively. We focused on the two 44-year periods of 2006–2049 and 2050–2093 under the RCP2.6, RCP4.5, and RCP8.5 scenarios for the projection of future temperature changes.
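The ensemble averaging and the Delta calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each GCM series is a 1-D monthly array that starts in January and covers whole years, and the function names and the dictionary layout are hypothetical. Only the change signal is computed, as in the text.

```python
import numpy as np

def ensemble_mean(model_series, members=None):
    """Arithmetic mean (AM) across GCMs at each monthly time step.

    model_series : dict mapping GCM name -> 1-D array of monthly values
    members      : optional list of names (e.g. the top 25% ranked GCMs);
                   if None, every GCM in the dict is used (full ensemble)
    """
    names = list(model_series) if members is None else list(members)
    stacked = [np.asarray(model_series[m], dtype=float) for m in names]
    return np.mean(stacked, axis=0)

def monthly_climatology(series):
    """Mean for each calendar month (assumes Jan start, whole years)."""
    s = np.asarray(series, dtype=float)
    return s.reshape(-1, 12).mean(axis=0)

def delta_change(future, reference):
    """Delta per calendar month: future mean minus reference mean."""
    return monthly_climatology(future) - monthly_climatology(reference)
```

For example, passing the 2050–2093 ensemble series as `future` and the 1961–2004 ensemble series as `reference` would return twelve monthly changes, whose average gives the mean annual change for the far future.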

Annual Cycle of Temperature
Temperature variation of the mean annual cycle for the observation and the 34 GCMs for the period 1961–2004 over the LMB is shown in Figure 2. Most of the GCMs effectively reproduced the single-peak pattern of temperature in the mean annual cycle. The errors between the 34 GCMs and the observed temperature ranged from −3.02 °C for the mean temperature of annual, spring (March-April-May), summer (June-July-August), autumn (September-October-November), and winter (December-January-February), respectively (Figure 2). Moreover, the numbers of GCMs showing negative errors were 29, 18, 26, 32, and 32 for the annual, spring, summer, autumn, and winter mean temperature, respectively. This indicated that most of the GCMs tended to underestimate temperature relative to the observation, especially during the autumn and winter.

Characteristics of Statistics in Criteria
Table 2 shows the M, SD, NRMSE, r, Mann-Kendall test statistic Z, Sen's slope, and S_score of the temperature for the 34 GCMs. The M for the temperature of the GCMs ranged from 21.60 °C to 25.68 °C, with mean and median values of 23.65 °C and 23.67 °C, respectively. A smaller absolute error indicates a better correspondence with the observed temperature for the GCM. CanESM2, HadGEM2-ES, and HadGEM2-CC had the smallest absolute errors (0.14 °C, 0.17
Most of the GCMs showed relatively high r, with minimum and mean r values of 0.83 and 0.92, respectively, and 26 of the 34 GCMs had r > 0.9, indicating good reproducibility of the characteristics of the time series and spatial distributions of the observed temperature. ACCESS1.3, GFDL-CM3, and GFDL-ESM2M had the highest spatial r at 0.972, 0.972, and 0.959, respectively, while INMCM4.0, CanESM2, and FGOALS-g2 had the lowest at 0.829, 0.874, and 0.883, respectively.

The observed temperature showed a significantly positive trend at the 0.01 significance level, with a Z statistic of 2.80 and a Sen's slope of 0.01 °C/year. The Z statistics and Sen's slopes of the 34 GCMs ranged from 0.31 to 5.27 and from 0.0026 to 0.0233 °C/year, respectively. All 34 GCMs reproduced the positive trend of the observed temperature, and 30 (28) GCMs showed a significantly positive trend at the 0.05 (0.01) significance level, compared to the 0.01 significance level for the observed temperature. Moreover, only six GCMs showed a Sen's slope smaller than that of the observed temperature, indicating that most of the GCMs simulated a higher temperature increase during 1961–2004 than was observed. The results indicated that more than 80 percent of the GCMs showed relatively good reproducibility of the positive trend of the observed temperature. FIO-ESM, GFDL-ESM2M, and NorESM1-M had Z statistic values of 2.78, 2.78, and 2.84, the closest to the observed temperature. CESM1(WACCM), CSIRO-Mk3.6.0, and GISS-E2-H had Sen's slope values of 0.0101, 0.0098, and 0.0096, which are in near agreement with the observed temperature.
The S_score of the GCMs ranged from 0.51 to 0.70, with mean and median values of 0.61 and 0.62, respectively. Out of the 34 GCMs, 21 showed S_score values of more than 0.6. This indicated that more than half of the GCMs had relatively good ability in reproducing the characteristics of the probability density function of the observed temperature.

Comparison of the Performance of the GCMs
Ranking scores of the seven criteria of the GCMs, calculated from the criteria statistics, are shown in Figure 3. Overall ranking scores of the GCMs, calculated from the ranking scores of the seven criteria, are shown in Figure 4. Across a set of different criteria, a GCM may perform well for one criterion but badly for another. For example, CanESM2 had the highest ranking score of 1 for M but a low ranking score of 0.14 for Sen's slope. Although GFDL-ESM2M had the lowest ranking score of 0 for S_score, it had the highest ranking score of 1 for Z. Likewise, a GCM may perform well for one or two criteria but badly overall. For example, ACCESS1.3 had the highest ranking score of 1 for r but a low overall ranking score of 0.51. Although INMCM4.0 had a high ranking score of 0.92 for Z, it had the lowest overall ranking score of 0.21. This indicated that the assessed performance of a GCM depends strongly on the criterion used. Thus, using multiple criteria to comprehensively assess the performance of GCMs is essential.


Sensitivity Analysis of the GCMs Performance
The sensitivity analysis of the GCM performance was conducted in two ways. First, the overall ranking scores were compared to the ranking scores after removing one criterion (Figure 5). The results showed that the overall ranking scores did not change as a whole after removing each criterion individually, indicating that the improved score-based method combining multiple criteria gave a robust assessment of the GCM performance. Second, the temporal stability of the overall ranking scores was evaluated by comparing the results for two different reference time periods. In this paper, we selected the longest time series of 1961–2004 as the reference time period, and a relatively short time series of 1975–2004 as another reference time period for comparison. As shown in Figure 6a, as a whole, the ranking scores of the GCMs did not change much as the reference time period changed. Moreover, the top 25% ranked GCMs were listed separately for comparison under the two reference time periods (Figure 6b,c). Although the ranking scores of the top 25% ranked GCMs differed under the two reference time periods, ACCESS1.0, GISS-E2-H, GISS-E2-R, HadGEM2-ES, HadGEM2-CC, MIROC5, and MIROC4h ranked within the top 25% under both. The exceptions were CanESM2 and CNRM-CM5 (red bars in Figure 6b), which ranked within the top 25% only for the reference time period of 1975–2004, and CESM1(WACCM) and MPI-ESM-MR (red bars in Figure 6c), which ranked within the top 25% only for the reference time period of 1961–2004. These results indicated that the GCM performance assessment based on the improved score-based method is reliable and can be used as a reference for projecting future temperature change over the LMB.
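The first sensitivity test (recomputing the overall score with one criterion removed) can be sketched as follows. This is a minimal illustration, assuming the per-criterion rank scores are already available as arrays in a dictionary; the function names are hypothetical, and rounding the top 25% up to a whole number of GCMs is an assumption of this sketch rather than a rule stated in the text.

```python
import numpy as np

def overall_score(rs, weights, skip=None):
    """Weighted mean of rank scores, optionally leaving one criterion out."""
    used = [c for c in rs if c != skip]
    w_sum = sum(weights[c] for c in used)
    total = sum(weights[c] * np.asarray(rs[c], dtype=float) for c in used)
    return total / w_sum

def leave_one_out(rs, weights):
    """Overall scores recomputed once per removed criterion."""
    return {c: overall_score(rs, weights, skip=c) for c in rs}

def top_quarter(scores, names):
    """Names of the GCMs in the top 25% by score (count rounded up)."""
    k = int(np.ceil(0.25 * len(names)))
    order = np.argsort(scores)[::-1][:k]
    return {names[i] for i in order}
```

Comparing `top_quarter` for the full score with the sets obtained from each entry of `leave_one_out`, or from scores computed over a different reference period, gives a simple check of how stable the selection is.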

Figure 5. Comparison of overall RS (ranking scores) and RS after removing one criterion. The numbers on the x-axis represent the ID numbers in Table 1.

Future Temperature Projection
The top 25% ranked GCMs (except ACCESS1.0, HadGEM2-CC, and MIROC4h, which have no outputs under the RCP2.6 scenario) and all the GCMs (except ACCESS1.0, ACCESS1.3, CMCC-CMS, HadGEM2-CC, MIROC4h, INMCM4.0, and IPSL-CM5B-LR, which have no outputs under the RCP2.6 scenario) were used for calculating projections from the sub-ensemble and the full ensemble, respectively, by the arithmetic mean (AM) method. Then, we assessed the future temperature change during the near future (2006–2049) and the far future (2050–2093) under the RCP2.6, RCP4.5, and RCP8.5 scenarios based on the Delta method.
Table 3 shows that the LMB had a higher mean annual temperature increase during 2050–2093 than during 2006–2049 under the RCP2.6, RCP4.5, and RCP8.5 scenarios, relative to the reference period of 1961–2004. Moreover, the projection by the full ensemble showed a higher mean annual temperature increase than the sub-ensemble, with mean temperature increases of 1.26 °C (1.09 °C), 1.90 °C (1.70 °C), and 2.97 °C (2.78 °C) during 2050–2093 and 0.93 °C (0.87 °C), 0.99 °C (0.95 °C), and 1.09 °C (1.06 °C) during 2006–2049 from the full ensemble (sub-ensemble) under the RCP2.6, RCP4.5, and RCP8.5 scenarios, respectively. Over the whole future period (2006–2093), the temperature showed the highest warming under the RCP8.5 scenario, followed by the RCP4.5 scenario, with the RCP2.6 scenario showing the lowest warming. The temperature is likely to increase at 0.04 °C, 0.16 °C, and 0.37 °C decade⁻¹ under the RCP2.6, RCP4.5, and RCP8.5 scenarios by the sub-ensemble, and at the higher rates of 0.05 °C, 0.17 °C, and 0.39 °C decade⁻¹ by the full ensemble. The Z statistics of the annual temperature increase series were 4.72 (3.83), 10.26 (8.26), and 12.16 (12.07) for the sub-ensemble (full ensemble), respectively (Figure 7), indicating a significant temperature increase over the LMB in the future.
On the whole, relative to the reference period of 1961–2004, the mean annual temperature change showed relatively similar spatial distributions in the near future (2006–2049) and the far future (2050–2093) for both the sub-ensemble and the full ensemble under the RCP2.6, RCP4.5, and RCP8.5 scenarios: the higher warming mainly occurred in the northern and central areas of the LMB, while the lower warming mainly occurred in the southeast and the southwest, especially under the RCP4.5 and RCP8.5 scenarios. Moreover, the warming for each individual grid increased as the RCP increased for both the sub-ensemble and the full ensemble. Similar spatial patterns of temperature change were also found for the seasons MAM (March-April-May), JJA (June-July-August), SON (September-October-November), and DJF (December-January-February), but the MAM warming was relatively higher than the warming in the other seasons.
Atmosphere 2019, 10, x FOR PEER REVIEW

Discussion
In this paper, most of the CMIP5 GCMs tended to underestimate the temperature relative to the observation, with a mean annual bias of −3.02 °C for winter over the LMB. Similar results were reported by Huang et al. [35], who found that the CMIP5 GCMs underestimated the annual mean surface air temperature relative to the Climate Research Unit temperature data (CRU TS 3.21) in the Mekong River Basin. Another study, conducted by Bannister et al. [17] in the monsoon region of the Sichuan Basin, showed that mean temperature was underestimated by CMIP5 GCMs, especially during the winter, with bias exceeding −3 °C. Furthermore, a study in the Qinghai-Tibetan Plateau also showed that CMIP5 GCMs underestimated annual and seasonal temperatures, with a bias of −2.3 °C for the annual mean and larger cold biases for autumn and winter [47]. However, a study by Miao et al. [5] showed that most of the CMIP5 GCMs overestimated the annual mean surface air temperature in Northern Eurasia, especially during the winter. In addition, Zazulie et al. [48] also found that most

Conclusions
This study focused mainly on assessing the performance of 34 CMIP5 GCMs in simulating the observed temperature using an improved score-based method, and on projecting future temperature change over the LMB. The performance was assessed using seven criteria: the M, SD, NRMSE, r, Z, Sen's slope, and S score over the LMB. Then, based on the performance results for these GCMs, two multi-model ensembles (MME), a sub-ensemble of the top 25% ranked GCMs and a full ensemble of the entire set of GCMs, were calculated using the arithmetic mean (AM) method and downscaled using the Delta method to project future temperature during two future time periods, the near future (2006-2049) and the far future (2050-2093), under the RCP2.6, RCP4.5, and RCP8.5 scenarios. The major findings of this study are summarized below:
The results of the GCM assessment varied as the criterion changed. It is essential to assess the GCMs comprehensively by multiple criteria rather than by a single criterion. Based on the overall ranking scores, the top 25% ranked GCMs were GISS-E2-H, GISS-E2-R, MIROC5, HadGEM2-ES, ACCESS1.0, MPI-ESM-MR, HadGEM2-CC, CESM1(WACCM), and MIROC4h, which can be used to build the multi-model ensemble (MME) for future temperature projection over the LMB. Meanwhile, the improved score-based method combining multiple criteria showed a robust assessment of GCM performance, which can provide good information for studies of future temperature change.
The results showed a significant temperature increase in the future over the LMB, with a higher mean annual temperature increase from the full ensemble and sub-ensemble at 1.26 °C (1.09 °C), 1.90 °C (1.70 °C), and 2.97 °C (2.78 °C) during 2050-2093 under the RCP2.6, RCP4.5, and RCP8.5 scenarios, compared with 0.93 °C (0.87 °C), 0.99 °C (0.95 °C), and 1.09 °C (1.06 °C) during 2006-2049, respectively, relative to the reference period of 1961-2004. Moreover, the projected seasonal temperature increase in the far future (2050-2093) was also significantly higher than in the near future (2006-2049). In addition, the projection by the full ensemble showed a higher temperature increase than the sub-ensemble. The two ensembles showed similar spatial distributions of temperature change, with the higher warming mainly occurring in the northern and central areas of the LMB and the lower warming mainly occurring in the southeast and the southwest, especially under the RCP4.5 and RCP8.5 scenarios. Moreover, the warming for individual grids increased as the RCP increased for the two ensembles.
The results provide an application of an improved score-based method for GCM assessment and of the multi-model ensemble (MME) for projecting future temperature change over the LMB under three scenarios, which can be useful for GCM assessment and policy making in this region. However, there remains a need to understand why most of the GCMs underestimated the observed temperature over the LMB, to establish more meteorological stations for long-term observation over the LMB, and to extend temperature projection to a larger region, such as the entire Mekong Basin or Southeast Asia, for a better understanding of regional climate change impacts. Moreover, the magnitudes of the future temperature change differed between the two ensembles. Therefore, to reduce the uncertainty of temperature projection in further studies in the LMB, multiple GCM ensemble methods should be taken into consideration and compared.

Figure 1. Location of the Lower Mekong Basin (LMB) and the 21 selected grids (shaded in yellow) over the LMB.

Figure 2. Variation of the observed temperature and GCMs of the mean annual cycle during the reference period 1961-2004 over the LMB.

Table 2 shows the M, SD, NRMSE, r, Mann-Kendall test statistic Z, Sen's slope, and S score of the temperature for the 34 GCMs. The M for the temperature of the GCMs ranged from 21.60 °C to 25.68 °C, with mean and median values of 23.65 °C and 23.67 °C, respectively. A smaller absolute error indicates a better correspondence between the GCM and the observed temperature. CanESM2, HadGEM2-ES, and HadGEM2-CC had the smallest absolute errors (0.14 °C, 0.17 °C, and 0.20 °C, respectively), while INMCM4.0, NorESM1-M, and EC-EARTH had the largest absolute errors (3.02 °C, 2.62 °C, and 2.49 °C, respectively).

The SD values of the GCMs ranged from 1.57 to 3.77, with mean and median values of 2.96 and 3.05, respectively. CanESM2, GISS-E2-R, and FGOALS-g2 had the lowest absolute errors at 0.01, 0.33, and 0.40, respectively, indicating good simulation of the observed temperature. However, ACCESS1.3, CSIRO-Mk3.6.0, and FIO-ESM had the highest absolute errors at 1.64, 1.42, and 1.23, respectively, indicating poor simulation of the observed temperature. The NRMSE values of the GCMs ranged from 0.52 to 1.57, with mean and median values of 0.86 and 0.92, respectively. CanESM2, GISS-E2-R, and MIROC5 had the smallest NRMSE values (0.52, 0.54, and 0.60, respectively), while INMCM4.0, NorESM1-M, and GFDL-ESM2G showed the largest values (1.57, 1.41, and 1.32, respectively).
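As an illustration of how criteria of this kind can be computed for one GCM, the sketch below derives the absolute errors of the mean (M) and standard deviation (SD), an NRMSE, and the correlation coefficient r. The exact definitions are not given in this section, so normalizing the RMSE by the observed standard deviation is an assumption, and the function name and data are hypothetical.

```python
import numpy as np

def basic_criteria(obs, sim):
    """Compare a simulated series with observations on four simple criteria."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return {
        # Absolute error of the mean (M) relative to the observed mean
        "M_abs_error": abs(sim.mean() - obs.mean()),
        # Absolute error of the sample standard deviation (SD)
        "SD_abs_error": abs(sim.std(ddof=1) - obs.std(ddof=1)),
        # Assumption: RMSE normalized by the observed standard deviation
        "NRMSE": rmse / obs.std(ddof=1),
        # Pearson correlation coefficient
        "r": np.corrcoef(obs, sim)[0, 1],
    }

# Hypothetical series (deg C): the simulation is uniformly 1 deg C too cold.
obs = [24.0, 25.0, 26.0, 25.0]
sim = [23.0, 24.0, 25.0, 24.0]
criteria = basic_criteria(obs, sim)
for name, value in criteria.items():
    print(f"{name}: {value:.3f}")
```

A constant cold bias like the one in this toy series leaves SD and r perfect while inflating the M error and NRMSE, which is one reason a multi-criteria assessment can rank models differently from any single criterion.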

Figure 3. Ranking scores of criteria of the GCMs over the LMB.

Figure 4. Ascending order of ranking scores for the performance of the GCMs over the LMB.

Figure 5. Comparison of overall RS (ranking scores) and RS after removing one criterion. The numbers on the x-axis represent the ID numbers in Table 1.

Figure 6. Comparison of overall ranking scores of GCMs under the reference periods of 1975-2004 and 1961-2004 (a), and top 25% ranking scores of GCMs under the reference periods of 1975-2004 (b) and 1961-2004 (c). The numbers on the x-axis of (a) represent the ID numbers in Table 1.

Figure 7. Future temperature changes for the RCP2.6, RCP4.5, and RCP8.5 scenarios relative to the period of 1961-2004 over the LMB under the two multi-model ensembles, after smoothing with a five-year moving average. The red lines represent the mean values of all the grids. The shaded areas represent the range of variation across all the grids. The dotted line represents the linear regression. The degrees of freedom (DOF) are all 83 for the RCP2.6, RCP4.5, and RCP8.5 scenarios.
Future temperature changes under the RCP2.6, RCP4.5, and RCP8.5 scenarios for all 21 grids over the LMB are shown in Figures 8-12. It is projected that the temperature will increase over the LMB. The projected mean annual temperature increase in the far future (2050-2093) was significantly higher than in the near future (2006-2049), and the projection by the full ensemble showed a higher temperature increase than the sub-ensemble. For the full ensemble, the values range from 1.06 °C to 1.39 °C, 1.55 °C to 2.14 °C, and 2.46 °C to 3.33 °C for the far future (2050-2093) and from 0.81 °C to 1.00 °C, 0.83 °C to 1.10 °C, and 0.92 °C to 1.20 °C for the near future (2006-2049). For the sub-ensemble, the values range from 0.92 °C to 1.21 °C, 1.45 °C to 1.86 °C, and 2.36 °C to 3.03 °C for the far future (2050-2093) and from 0.77 °C to 0.99 °C, 0.82 °C to 1.06 °C, and 0.93 °C to 1.15 °C for the near future (2006-2049). All values are relative to the reference period of 1961-2004 and are listed for the RCP2.6, RCP4.5, and RCP8.5 scenarios, respectively.
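The smoothing and linear fit mentioned in the Figure 7 caption can be sketched as follows. This is a minimal illustration, assuming a centered five-year moving average and an ordinary least-squares line; the synthetic anomaly series and the function names are hypothetical and do not reproduce the study's data.

```python
import numpy as np

def moving_average(series, window=5):
    """Moving average that keeps only fully covered windows."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

def trend_per_decade(years, values):
    """Least-squares linear trend, converted from per year to per decade."""
    slope, _intercept = np.polyfit(years, values, 1)
    return 10.0 * slope

# Hypothetical annual anomalies (deg C) for 2006-2093 with a 0.037 deg C/yr trend.
years = np.arange(2006, 2094)
anomaly = 0.5 + 0.037 * (years - 2006)

smoothed = moving_average(anomaly, window=5)
centre_years = years[2:-2]  # centers of the five-year windows
rate = trend_per_decade(centre_years, smoothed)
print(f"{len(smoothed)} smoothed values, trend = {rate:.2f} deg C per decade")
```

Smoothing a strictly linear series leaves its slope unchanged, so the fitted rate here simply recovers the prescribed trend; with real, noisy series the moving average mainly reduces interannual variability before the fit.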

Table 1. Basic information of the CMIP5 models used in this study.

Table 2. Statistics of assessment criteria of the GCMs over the LMB. "Obs" represents the observation; the ID numbers represent the GCMs and are the same as the IDs in Table 1.
Moreover, only six GCMs showed Sen's slope values smaller than that of the observed temperature, indicating that most of the GCMs simulated a higher temperature increase during 1961-2004 than was observed. The results indicated that more than 80 percent of the GCMs showed relatively good reproducibility of the positive trend of the observed temperature. FIO-ESM, GFDL-ESM2M, and NorESM1-M had Z statistic values of 2.78, 2.78, and 2.84, which were the closest to that of the observed temperature. CESM1(WACCM), CSIRO-Mk3.6.0, and GISS-E2-H had Sen's slope values of 0.0101, 0.0098, and 0.0096, which are in close agreement with the observed temperature.
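For readers unfamiliar with the two trend criteria, the following minimal sketch computes the Mann-Kendall Z statistic and Sen's slope for a single annual series. It uses the standard textbook formulas without a correction for tied values, which is a simplifying assumption; the function names and the synthetic series are hypothetical.

```python
import numpy as np

def mann_kendall_z(x):
    """Mann-Kendall Z statistic (no tie correction; an assumption of this sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S counts later-minus-earlier sign differences over all pairs
    s = sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / np.sqrt(var_s)
    if s < 0:
        return (s + 1) / np.sqrt(var_s)
    return 0.0

def sens_slope(x):
    """Sen's slope: median of all pairwise slopes, in units per time step."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return float(np.median(slopes))

# Hypothetical 44-year series (deg C) with a steady 0.01 deg C/yr increase.
series = 24.0 + 0.01 * np.arange(44)
z = mann_kendall_z(series)
slope = sens_slope(series)
print(f"Z = {z:.2f}, Sen's slope = {slope:.4f} deg C per year")
```

Under the usual two-sided test, |Z| greater than 1.96 indicates a trend that is significant at the 5% level.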

Table 3. Changes in mean annual and seasonal temperature (°C) over the LMB under RCP2.6, RCP4.5, and RCP8.5 relative to the period of 1961-2004 under two ensemble methods. Values in parentheses represent the results of the full ensemble.