Performance Assessment of General Circulation Model in Simulating Daily Precipitation and Temperature Using Multiple Gridded Datasets

Abstract: The performance of general circulation models (GCMs) in a region is generally assessed according to their capability to simulate the historical temperature and precipitation of the region. The performance of 31 GCMs of the Coupled Model Intercomparison Project Phase 5 (CMIP5) is evaluated in this study to identify a suitable ensemble for daily maximum temperature, minimum temperature and precipitation for Pakistan using multiple sets of gridded data, namely: Asian Precipitation–Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE), Berkeley Earth Surface Temperature (BEST), Princeton Global Meteorological Forcing (PGF) and Climate Prediction Centre (CPC) data. An entropy-based robust feature selection approach known as symmetrical uncertainty (SU) is used for the ranking of GCMs. The results show that the spatial distribution of the best-ranked GCMs varies for different sets of gridded data. The performance of GCMs is also found to differ between temperature and precipitation. The Commonwealth Scientific and Industrial Research Organization, Australia (CSIRO)-Mk3-6-0 and Max Planck Institute (MPI)-ESM-LR perform well for temperature, while EC-EARTH and MIROC5 perform well for precipitation. A trade-off is formulated to select the common GCMs for different climatic variables and gridded datasets, which identifies six GCMs, namely ACCESS1-3, CESM1-BGC, CMCC-CM, HadGEM2-CC, HadGEM2-ES and MIROC5, for the reliable projection of temperature and precipitation of Pakistan.


Introduction
Simulations of general circulation models (GCMs) are generally used for the assessment of climate change. However, GCM-simulated climate is associated with large uncertainties due to model structure, parametrization, assumptions, calibration processes and so on [1,2]. Therefore, not all GCMs can be used directly to project the future climate of a given region. In general, a small group of GCMs is selected for a region by excluding those that show little similarity to the regional climate, in order to reduce the uncertainty associated with the GCMs [3–6].
GCMs are generally selected using one of several approaches: first, the past-performance approach, which selects GCMs based on their ability to replicate the historical climate [7]; and second, the envelope approach, in which GCMs are selected according to their agreement in projections of the future climate [8]. The past-performance approach does not take account of future projections, while the envelope approach does not consider the ability of a GCM to replicate the past climate. Therefore, a third, hybrid approach, which combines the past-performance approach with the envelope approach, is often used. In this approach, the GCMs are first screened based on their past performance, and the final set of GCMs is then selected using the envelope approach [3]. However, when a whole pool of GCMs is considered for the selection of an ensemble, the past-performance approach is more realistic, as the pool of GCMs covers the whole range of projections. Most previous studies also suggest that past-performance evaluation is one of the most suitable approaches, because the GCMs that best simulate past climatic conditions are more likely to predict the future climate [9,10].
The past performance of GCMs is usually assessed by comparing GCM-simulated climatic variables with gridded climatic variables over a historical period. It is assumed that the gridded data used for this purpose represent the seasonal and spatial variability of the climate of the region of interest. However, the performance of gridded climate data varies considerably in space and time, mainly because of the interpolation technique used, the quality of the observed data and the spatial distribution of the gauges [11]. In addition, temporal variation in the number of observations affects the performance of gridded data over time. Therefore, a past-performance assessment of GCMs based on a single gridded dataset over a long period may not be reliable.
Feature selection approaches are generally used for the selection of GCMs. Feature selection algorithms can be broadly classified into filters and wrappers. In a filter approach, the GCMs are selected based on the underlying characteristics of the data using criteria such as correlation, mean error and so on. A wrapper approach uses a learning process to select a subset of features that have a higher capability to predict the dependent variable. In a filter approach, each feature is considered individually, without consideration of the correlation among features. For GCM selection, it is not necessary to consider the relation between GCM simulations or the predictive capability of GCMs. Therefore, filters are generally used for the selection of GCMs [12].
Various attempts have been made to determine the performance of GCMs with respect to gridded data using different types of filters, including weighted skill scores [13,14], hierarchical clustering [15,16], spectral analysis [17], Bayesian weighting [18], different performance indicators [19,20] and so on. In addition, various statistical indicators have been used for GCM evaluation, ranking and selection [21–27]. The performance metrics are mostly evaluated on the basis of the mean state of the climate, so that temporal variability such as trends or seasonal variability is not given full attention [28].
In recent years, the symmetrical uncertainty (SU) algorithm has become one of the widely used algorithms for variable selection. This filter-based tool [29] selects variables in an unbiased and reliable manner. SU can track the similarity and dissimilarity between the time series of a GCM simulation for the historical period and the observed climate variable, and selects GCMs based on their ability to simulate the observed variable. The major advantage of SU is that its performance does not depend on the underlying distribution or on the conditional dependencies of the data. It provides a general measure of the association between the independent and dependent variables. Therefore, it can be used in place of multiple statistical metrics for performance assessment.
The objective of this study is to evaluate the capability of 31 CMIP5 GCMs in reconstructing the temperature and precipitation of different gridded datasets using SU, in order to identify the most suitable ensemble of GCMs for reliable climate projection. The study is applied to Pakistan, which has a diverse climate with large spatial and seasonal variability. The ensemble of GCMs selected in this study can be used for reliable climate projection and mitigation planning for Pakistan, which is highly vulnerable to climate change.

Geography and Climate of Pakistan
Pakistan (latitudes 24°00′ N–37°30′ N and longitudes 61° E–78° E) covers an area of 796,095 km² and has a predominantly arid to semi-arid climate [30]. The topography of Pakistan varies significantly from high, elevated terrain in the north to flat coastal plains in the south (Figure 1), which produces large spatial and temporal variation in climatic conditions. Precipitation also varies widely, from less than 100 mm in the western and northwestern areas to more than 1100 mm in the northeastern and northern parts of the country. Precipitation increases from the arid southern coastal regions to the northern and northeastern regions of the country. However, most of the country receives less than 500 mm of precipitation [31]. The south-eastern parts of the country have the highest rainfall during summer (June-September) and the least rainfall during winter (December-February), while the northern parts have the highest rainfall during winter and the lowest rainfall in summer. The maximum temperature is often above 50 °C in the south in May, while the minimum temperature is below −20 °C in the northern Himalaya and sub-Himalaya region in January [30].

Gridded Datasets
Four gridded climate datasets are used in this study, namely Berkeley Earth Surface Temperature (BEST), Climate Prediction Centre (CPC), Princeton Global Meteorological Forcing (PGF) and Asian Precipitation Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE). These datasets are chosen because they provide daily data over a long period. Gridded products with daily resolution are used to assess the performance of the GCMs with the expectation that GCMs selected on this basis can be used to assess changes in climatic extremes, which are estimated from daily data. Assessing changes in climate extremes due to global warming is important for Pakistan, as most studies indicate that the major impacts of climate change in the country will arise from changes in climatic extremes such as heat waves, hot days, dry spells and heavy rainfall [32]. GCMs selected using daily gridded rainfall and temperature data are suitable for this purpose. The major characteristics and sources of the gridded data used in this study are summarized in Table 1. The BEST dataset is derived from 36,866 stations [33], while the CPC dataset is developed using data from 30,000 stations [34]. The PGF dataset is developed by merging reanalysis data with observed data collected across the globe [35,36]. APHRODITE, on the other hand, is developed using 5000 to 12,000 valid stations in the Monsoon Asia region (60° E–150° E longitude, 15° S–55° N latitude) [37–39].
Of the four gridded datasets, the maximum and minimum temperature data of BEST, CPC and PGF are used to assess GCM temperature, while the precipitation data of CPC, PGF and APHRODITE are used to assess GCM precipitation. The annual mean maximum temperature, minimum temperature and precipitation for each dataset are presented in Figure 2. The selected data products are available at spatial resolutions ranging from 0.25° to 1°. The gridded datasets are regridded to a common spatial resolution of 2° × 2° to match the common resolution adopted for the GCMs. The placement of these grid points across the study area is shown in Figure 1. A grid-average technique is used to upscale the gridded data to the coarser resolution: the temperature and precipitation data of all grid cells within each 2° × 2° domain are averaged to convert the higher-resolution data to the lower resolution.
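The grid-average upscaling described in this section can be illustrated with a minimal sketch. It assumes a regular fine grid stored as a list of rows, equal weighting of all fine cells within a coarse cell (no latitude-dependent area weighting, which the text does not specify) and `None` for missing values. The function name `grid_average` and the `factor` parameter are illustrative and are not taken from the study.

```python
def grid_average(fine, factor):
    """Upscale a 2-D grid by averaging non-overlapping factor x factor blocks.

    fine   : list of rows (lists) of numbers; None marks a missing cell
    factor : number of fine cells per coarse cell along each axis
             (e.g. 4 when going from 0.5 degree to 2 degree cells)
    """
    n_rows, n_cols = len(fine), len(fine[0])
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            # Collect the valid fine cells that fall inside this coarse cell.
            cells = [
                fine[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if fine[r][c] is not None
            ]
            # Equal-weight mean; a coarse cell with no valid data stays missing.
            row.append(sum(cells) / len(cells) if cells else None)
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 field averaged to 2 x 2.
example = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(grid_average(example, 2))
```

Over a domain spanning about 13° of latitude, area weighting would change the averages only slightly, but that is an assumption of this sketch rather than a statement about the study.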
A total of 31 CMIP5 GCM simulations are used in this study. The name, modeling center and resolution of each GCM are given in Table 2. The GCM simulations are downloaded from the Intergovernmental Panel on Climate Change (IPCC) portal [44]. The GCMs are selected on the basis of the availability of daily simulations for two representative concentration pathway (RCP) scenarios, RCP4.5 and RCP8.5. RCP4.5 is an intermediate pathway scenario that agrees well with the latest policy of lower greenhouse gas emissions adopted by the global community. It is therefore often considered a very-good-case scenario in the context of recent policy directions [45]. RCP8.5, on the other hand, is the business-as-usual scenario, which gives the highest possible impact of climate change. RCP4.5 and RCP8.5 are therefore selected because together they cover the possible range of impacts. The GCMs are available at different resolutions. Therefore, all selected CMIP5 temperature and precipitation simulations are regridded to the common 2° × 2° grid using bilinear interpolation to allow a fair comparison. This technique uses a minimum of four points from the domain and nearby areas to generate the output of each GCM at each grid point and thus provides a smooth interpolation. It is widely used for regridding GCMs for comparison [4].
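As an illustration of the bilinear interpolation mentioned above, the following minimal sketch interpolates a value at a target point from the four surrounding GCM grid points. It treats longitude and latitude as planar coordinates, which is an assumption of the sketch. The function and argument names are illustrative and do not correspond to any particular regridding tool.

```python
def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Bilinear interpolation at (x, y) inside the box [x0, x1] x [y0, y1].

    q00, q10, q01, q11 are the values at (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)  # fractional position along x (longitude)
    ty = (y - y0) / (y1 - y0)  # fractional position along y (latitude)
    bottom = q00 * (1 - tx) + q10 * tx  # interpolate along x at y0
    top = q01 * (1 - tx) + q11 * tx     # interpolate along x at y1
    return bottom * (1 - ty) + top * ty  # interpolate along y


# Hypothetical values at the corners of a unit box; the centre is their mean.
print(bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0))  # 2.5
```

Applied to every target point of the 2° × 2° grid, this gives a smooth field that reproduces the source values exactly at the source grid points.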

Procedure
Temporal alignment does not exist between the daily simulations of GCMs and the observed data. However, GCMs are expected to characterize the wet and dry seasons and the warm and cold seasons reasonably well. Therefore, the daily gridded datasets are converted to the monthly scale, and the association between GCM data and gridded data is evaluated at the monthly scale to assess the performance of the GCMs. Gridded data products with daily resolution are used so that the selected GCMs can subsequently support the assessment of changes in climatic extremes. The procedure for evaluating the GCMs based on their historical simulation of temperature and precipitation is outlined below:
(1) The GCMs are evaluated and ranked using SU at each grid point (36 grid points in total at 2° × 2° resolution to cover the whole of Pakistan, Figure 1) according to their capability to replicate gridded temperature and precipitation data for the period 1979-2005.
(2) The GCMs are ranked based on their positions at the different grid points using a weighting technique, in which more weight is given to GCMs that achieve a higher rank at most of the grid points. A separate ranking list is prepared for each climatic variable (maximum temperature, minimum temperature and precipitation) and each gridded dataset.
(3) The GCMs that appear above the 50th percentile of all GCMs in every list are identified as the most appropriate ensemble for the projection of temperature and precipitation of Pakistan.
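The conversion of daily series to the monthly scale used in this procedure can be sketched as follows. The choice of aggregation is an assumption of the sketch, since the text does not state it: monthly means would be natural for temperature and monthly totals for precipitation. The function name `to_monthly` and its `how` parameter are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_monthly(dates, values, how="mean"):
    """Aggregate a daily series to monthly values keyed by (year, month).

    how = "mean" averages the days of each month (e.g. temperature);
    how = "sum" totals them (e.g. precipitation).
    """
    buckets = defaultdict(list)
    for d, v in zip(dates, values):
        buckets[(d.year, d.month)].append(v)
    monthly = {}
    for key in sorted(buckets):
        total = sum(buckets[key])
        monthly[key] = total if how == "sum" else total / len(buckets[key])
    return monthly


# Hypothetical daily series: 1 mm of rain on each day of Jan-Feb 1979.
days = [date(1979, 1, 1) + timedelta(days=i) for i in range(59)]
print(to_monthly(days, [1.0] * 59, how="sum"))
```

The same aggregation would be applied to both the gridded data and the GCM output, so that the two monthly series cover identical months before they are compared.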
The daily mean temperature could be used to select GCMs instead of selecting them separately for maximum and minimum temperature. However, the projection of extremes requires a good simulation of minimum and maximum temperature rather than of mean temperature, and a good simulation of mean temperature does not guarantee a good simulation of maximum and minimum temperature. Therefore, GCMs are selected in this study based on both maximum and minimum temperature. The ranking of GCMs at each grid point using SU, and the ranking of GCMs for the whole of Pakistan from the grid-point SU ranks using a weighting technique, are elaborated in the following sections.

Symmetric Uncertainty (SU)
GCMs are expected to replicate the annual and seasonal variations of observed rainfall and temperature for the historical period. SU is a filtering approach based on information theory that can be used to assess the similarity between two time series [46]. In this study, SU is used to measure the similarity between the time series of gridded data and the GCM historical simulations [4,5]. For example, the monthly rainfall time series of a gridded data product is compared with the output of each GCM at each grid point to assess the capability of the GCMs to replicate rainfall at that grid point.
SU uses the concept of mutual information (MI) to assess the association between two variables, as shown in Figure 3. MI estimates the amount of information common to two variables. Let A and B be the gridded rainfall and the GCM historical rainfall simulation at a grid point. If p(A) and p(B) are the probability density functions and p(A, B) is the joint probability density function of A and B, then the MI between A and B is [47,48]:
$$MI(A, B) = \sum_{a \in A} \sum_{b \in B} p(a, b) \log \frac{p(a, b)}{p(a)\, p(b)} \qquad (1)$$
MI estimates the common information between two variables as the difference between the sum of their entropies and their joint entropy:
$$MI(A, B) = H(A) + H(B) - H(A, B) \qquad (2)$$
where H(A) and H(B) denote Shannon's entropy of A and B, respectively, and H(A, B) denotes the joint entropy of A and B. The MI estimated using Equation (1) indicates the amount of information shared by the gridded rainfall and the GCM historical rainfall simulation. If the variables are independent of each other, the MI is zero, while a higher value of MI indicates that the GCM-simulated rainfall has a higher similarity to the gridded rainfall.
The MI is biased toward the variable having higher values. This can be overcome using SU, in which the estimated MI is divided by the total entropy of the GCM simulation (B) and the gridded data (A):
$$SU(A, B) = 2\, \frac{MI(A, B)}{H(A) + H(B)} \qquad (3)$$
An SU value of 1 means complete agreement between the gridded data and the GCM simulation, while 0 indicates no agreement [49]. Details of the selection of GCMs based on SU can be found in [4,5].
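The SU calculation described in this section can be sketched in a few lines. The sketch estimates the entropies from equal-width histogram bins, which is an assumption: the text does not state how the continuous monthly series are discretized, and the bin count (`n_bins`) affects the numerical value of SU. All function names are illustrative.

```python
import math
from collections import Counter


def discretize(series, n_bins=10):
    """Map a numeric series to equal-width bin indices (assumed scheme)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in series]


def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def symmetrical_uncertainty(a, b, n_bins=10):
    """SU = 2 * MI(A, B) / (H(A) + H(B)), with MI = H(A) + H(B) - H(A, B)."""
    da, db = discretize(a, n_bins), discretize(b, n_bins)
    h_a, h_b = entropy(da), entropy(db)
    h_ab = entropy(list(zip(da, db)))  # joint entropy from paired bin labels
    if h_a + h_b == 0:
        return 0.0  # both series constant: no information to share
    return 2.0 * (h_a + h_b - h_ab) / (h_a + h_b)


# Hypothetical monthly series: identical series agree completely.
obs = [3.0, 10.0, 25.0, 60.0, 120.0, 80.0, 30.0, 12.0]
print(symmetrical_uncertainty(obs, obs))  # 1.0
```

Ranking the GCMs at a grid point then amounts to sorting them by the SU between each GCM series and the gridded series at that point, in descending order.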

Ranking of GCMs using Weighting Method
SU provides a ranking of GCMs at each grid point. A simple multi-criteria decision-making approach is used in this study to select GCMs for the whole study area based on the rankings obtained at the different grid points. If a GCM achieves the 1st, 2nd, 3rd, 4th and 5th ranks at X1, X2, X3, X4 and X5 grid points, respectively, it is assigned a weighted score according to Equation (4). GCMs ranked lower than 5th at a grid point are not considered, because they are regarded as not fully capable of simulating temperature or precipitation at that grid point.
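The weighting idea can be illustrated with a short sketch. The weights used by default below (the reciprocal of the rank) are a hypothetical choice made only for illustration and should not be read as the weights of Equation (4); the only properties taken from the text are that ranks 1 to 5 are counted and that higher ranks receive more weight. Function names are illustrative.

```python
def rank_counts(grid_ranks, top=5):
    """Count how often a GCM is ranked 1st..top-th over all grid points.

    grid_ranks : the GCM's rank at each grid point (1 = best)
    Returns [X1, X2, ..., Xtop]; ranks below `top` are ignored.
    """
    counts = [0] * top
    for r in grid_ranks:
        if 1 <= r <= top:
            counts[r - 1] += 1
    return counts


def weighted_score(counts, weights=None):
    """Weighted sum of rank counts; higher ranks should carry larger weights."""
    if weights is None:
        # HYPOTHETICAL weights (1, 1/2, 1/3, ...), not taken from the paper.
        weights = [1.0 / rank for rank in range(1, len(counts) + 1)]
    return sum(x * w for x, w in zip(counts, weights))


# Hypothetical ranks of one GCM at six grid points.
counts = rank_counts([1, 1, 2, 6, 5, 3])
print(counts, weighted_score(counts, weights=[5, 4, 3, 2, 1]))
```

Any monotonically decreasing set of weights preserves the stated intent; the ordering of GCMs with similar scores can, however, change with the weights chosen.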

Ranking of GCMs
The rankings of GCMs obtained by SU with respect to their capability to replicate the maximum and minimum temperature of BEST, CPC and PGF and the precipitation of APHRODITE, CPC and PGF are shown in Figure 4 (heatmap plots). The x-axis and y-axis of each plot represent the ranks and the GCMs, respectively. The numbers on the y-axis correspond to the GCM numbers given in Table 2. For example, GCM number 1 represents ACCESS1-0, number 2 represents ACCESS1-3 and so on, as listed in Table 2. The color ramp in the plot indicates the number of grid points. For example, CSIRO-Mk3-6-0 (GCM 12) achieved rank 1 at 11 grid points, while EC-EARTH (GCM 13) is not ranked top at any grid point.

Figure 4. Ranking of the GCMs according to their performance in replicating maximum and minimum temperature and precipitation. The x-axis represents the rank of the GCM, while the y-axis represents the GCM number as described in Table 2. The color bar shows the number of grid points.
Different GCMs are ranked top at different numbers of grid points (Figure 4). However, the plots are consistent with one another, indicating that similar rankings of GCMs are obtained at the grid points from the different gridded datasets. For example, CSIRO-Mk3-6-0 (GCM 12) is found to be best at 11 grid points for CPC and BEST and at 12 grid points for PGF. MPI-ESM-LR (GCM 28) is found to be best at 6 to 8 grid points for the different gridded datasets. Overall, BEST and CPC show closer agreement in GCM ranks than PGF.
A similar consistency is observed for minimum temperature. BEST and CPC produce similar rankings of GCMs, and the GCM ranks obtained using PGF are also very close to those obtained using BEST and CPC. For precipitation, APHRODITE and PGF show the closest agreement, while the ranking obtained using CPC varies slightly. The results show that the rankings of GCMs vary when different gridded datasets are used, although the variation is not large.

Spatial Distribution of Top Ranked GCMs
Although the number of grid points at which each GCM is ranked top varies little between gridded datasets, as shown in the previous section, it remains necessary to assess whether the spatial distribution of the rankings differs between gridded datasets. For this purpose, the spatial distribution of the top-ranked GCMs is mapped for each gridded dataset. Figure 5 shows the spatial distribution of the top-ranked GCMs in simulating maximum temperature, minimum temperature and precipitation for the different gridded datasets. The maps show a marked difference in the spatial distribution of the top-ranked GCMs between gridded datasets for the same climatic variable.
For maximum temperature, CSIRO-Mk3-6-0 (GCM 12) gives the best result at the largest number of grid points (8 to 14 grid points, depending on the gridded dataset), followed by MPI-ESM-LR (5 to 7 grid points). CSIRO-Mk3-6-0 (GCM 12) performs best mainly in the western and north-western mountainous regions of Pakistan, while MPI-ESM-LR (GCM 28) performs best in the south-eastern parts. A few other GCMs are found to be best at 1 to 3 grid points at different locations. CPC and PGF show a similar spatial pattern of best-performing GCMs, which differs somewhat from that of BEST.
The best GCMs in simulating minimum temperature are similar to those for maximum temperature at most of the grid points. CSIRO-Mk3-6-0 (GCM 12) performs best at the largest number of grid points (8 to 11 grid points, depending on the gridded dataset), followed by CMCC-CM (GCM 9) and ACCESS1-3 (GCM 2). CSIRO-Mk3-6-0 (GCM 12) is best in simulating minimum temperature in the southern coastal region and the northern mountainous regions, while CMCC-CM (GCM 9) is best in the central-western part. As for maximum temperature, CPC and PGF show a similar spatial pattern of best-performing GCMs for minimum temperature, while BEST shows a somewhat different pattern.
For precipitation, a marked difference in the spatial pattern of the best-performing GCMs is observed between gridded datasets. MIROC5 (GCM 27) is the best-performing GCM for APHRODITE (13 grid points), followed by EC-EARTH (GCM 13) (9 grid points). For CPC, HadGEM2-CC (GCM 19) is the best-performing GCM (11 grid points), followed by EC-EARTH (GCM 13) (8 grid points). For PGF, HadGEM2-CC (GCM 19) is found best at the largest number of grid points (12 grid points), followed by HadGEM2-CC (GCM 19) (7 grid points). Overall, HadGEM2-CC (GCM 19) performs best in the southwestern part of Pakistan for all the gridded datasets, while no consistency in the best-performing GCM is observed in other parts.

Ranking of GCMs for Pakistan
The rankings of GCMs are estimated using Equation (4). The estimated scores of each GCM in simulating temperature and precipitation over Pakistan are shown in Figure 6. Based on these scores, the GCMs are ranked for each dataset, and the ranks for the three datasets are averaged separately for maximum temperature, minimum temperature and precipitation to calculate the score of each GCM. CSIRO-Mk3-6-0 receives the highest score for both maximum and minimum temperature for all the gridded datasets. MPI-ESM-LR receives the second-highest score for maximum temperature, and ACCESS1-3 for minimum temperature. For precipitation, different GCMs receive the highest scores: EC-EARTH receives the highest score for CPC and PGF and MIROC5 for APHRODITE, while EC-EARTH receives the second-highest score for APHRODITE and MIROC5 for CPC and PGF.

The rankings of GCMs are estimated by Equation (4). The estimated scores for each GCMs in simulating the temperature and the precipitation over Pakistan are shown in Figure 6. Based on the scores the GCMs are ranked for different dataset, the ranks for each of the three datasets in the maximum, the minimum and the precipitation are averaged to calculate the score of the GCMs. CSIRO-Mk3-6-0 is found to receive the highest score for both the maximum and the minimum temperature for all the gridded data. MPI-ESM-LR is found to receive the second highest score for the maximum temperature, while ACCESS1-3 for the minimum temperature. Different GCMs are found to receive the higher score for precipitation. EC-EARTH is found to receive the highest score for CPC and PGF while MIROC5 for APHRODITE. EC-Earth is found to receive the second highest score for APHRODITE while MIROC5 for CPC and PGF. To assess the performance of SU in the selection of GCMs, a ranking of GCMs obtained from SU for the whole of Pakistan is compared with that obtained from conventional statistics. Two statistical metrics namely coefficient of determination (R 2 ) and normalized root mean square error (NRMSE) are used for this purpose. The two metrics are estimated for all the GCMs at each grid point and then Equation (4) is used to estimate the score for each GCM for whole Pakistan. The scores obtained using SU, R 2 and NRMSE for whole Pakistan are given in Table 3. The scores for R 2 and NRMSE are To assess the performance of SU in the selection of GCMs, a ranking of GCMs obtained from SU for the whole of Pakistan is compared with that obtained from conventional statistics. Two statistical metrics namely coefficient of determination (R 2 ) and normalized root mean square error (NRMSE) are used for this purpose. The two metrics are estimated for all the GCMs at each grid point and then Equation (4) is used to estimate the score for each GCM for whole Pakistan. 
The scores obtained using SU, R² and NRMSE for the whole of Pakistan are given in Table 3. The scores for R² and NRMSE are averaged, and the GCMs are then ranked according to the combined score. The rankings obtained using SU and the combination of conventional statistical metrics are very similar, particularly for the top positions. For example, CSIRO-Mk3-6-0 is ranked first by SU and third by the combined statistics, while MPI-ESM-LR is ranked second by SU and first by the combined statistics. This indicates that SU can be used for the ranking of GCMs in place of a number of conventional statistical metrics. In the next step, GCMs are ranked based on their performance scores obtained for different sets of gridded data using SU and Equation (4). The average of the scores obtained by each GCM for the different sets of gridded data is calculated for the ranking of GCMs. The average scores obtained by the GCMs in simulating temperature and precipitation are summarized in Table 4. CSIRO-Mk3-6-0, MPI-ESM-LR, ACCESS1-3, NorESM1-M and CMCC-CM receive the highest average scores for maximum temperature; CSIRO-Mk3-6-0, ACCESS1-3, CMCC-CM, MPI-ESM-LR and MRI-CGCM3 for minimum temperature; and MIROC5, EC-EARTH, HadGEM2-CC, CESM1-BGC and CNRM-CM5 for precipitation.
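The averaging and ranking step described above can be sketched as follows. This is only an illustration: the per-dataset scores are assumed to have been computed already (for example with Equation (4)), and the function name, dataset labels, GCM labels and numbers are hypothetical placeholders rather than values from this study.

```python
def rank_by_average_score(scores_by_dataset):
    """Average each GCM's score over all datasets and rank best first.

    scores_by_dataset: {dataset_name: {gcm_name: score}}
    Returns a list of (gcm_name, average_score), highest average first.
    Ties are broken alphabetically (an assumption for reproducibility).
    """
    per_dataset = list(scores_by_dataset.values())
    # Only GCMs that have a score for every dataset are ranked.
    gcms = set.intersection(*(set(scores) for scores in per_dataset))
    averages = {
        gcm: sum(scores[gcm] for scores in per_dataset) / len(per_dataset)
        for gcm in gcms
    }
    return sorted(averages.items(), key=lambda item: (-item[1], item[0]))


# Placeholder scores for three gridded datasets (not results of the study).
example_scores = {
    "dataset_1": {"GCM-A": 3.0, "GCM-B": 1.0, "GCM-C": 2.0},
    "dataset_2": {"GCM-A": 2.0, "GCM-B": 2.0, "GCM-C": 3.0},
    "dataset_3": {"GCM-A": 4.0, "GCM-B": 0.0, "GCM-C": 1.0},
}

ranking = rank_by_average_score(example_scores)
for position, (gcm, average) in enumerate(ranking, start=1):
    print(position, gcm, average)
```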

Selection of GCM Ensemble
Both temperature and precipitation projections are required for most climate change impact assessments. Therefore, GCMs that can simulate both temperature and precipitation properly are desirable for climate change impact analysis. In this study, the final ensemble of GCMs is selected based on their capability to simulate both variables.
For this purpose, the GCMs are first ranked for each climate variable based on the average score obtained using different sets of gridded data. If the average score of a GCM for a certain climate variable is high, the GCM can be considered capable of simulating that climate variable, and vice versa. In this study, a GCM is assumed to be incapable of simulating a climate variable when it does not achieve a position above the 50th percentile of all GCMs for that variable. Therefore, GCMs that received a position below the 50th percentile of all GCMs for a climate variable were given a zero score for that variable. For example, CMCC-CMS receives an average score of only 1.3, and thus a rank of 18, for maximum temperature, which is below the 50th percentile position (rank 16) of all GCMs. Thus, CMCC-CMS is considered not capable of simulating the maximum temperature of Pakistan, and a zero score is assigned to this GCM for that variable.
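The 50th percentile criterion above can be sketched in a few lines. The sketch assumes that the cutoff position is the rank ceil(n/2), which gives rank 16 for 31 GCMs as in the example in the text, and that ties are broken alphabetically; the function name, GCM labels and scores are hypothetical.

```python
import math


def apply_percentile_cutoff(average_scores):
    """Set to zero the score of every GCM ranked below the 50th
    percentile position for one climate variable.

    average_scores: {gcm_name: average_score} for a single variable.
    """
    # Rank 1 is the highest score; ties broken by name (assumption).
    ranked = sorted(average_scores, key=lambda g: (-average_scores[g], g))
    cutoff_rank = math.ceil(len(ranked) / 2)  # 31 GCMs -> rank 16
    return {
        gcm: (average_scores[gcm] if rank <= cutoff_rank else 0.0)
        for rank, gcm in enumerate(ranked, start=1)
    }


# 31 placeholder GCMs with scores 31, 30, ..., 1 (not values from the study).
placeholder_scores = {f"GCM-{i:02d}": float(32 - i) for i in range(1, 32)}
filtered = apply_percentile_cutoff(placeholder_scores)

retained = [gcm for gcm, score in filtered.items() if score > 0.0]
```

With these placeholder scores, the GCM at rank 16 keeps its score while the GCM at rank 18 is set to zero, mirroring the CMCC-CMS example.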
The scores of GCMs after applying this criterion are given in Table 5. It shows that only six GCMs, namely ACCESS1-3, CESM1-BGC, CMCC-CM, HadGEM2-CC, HadGEM2-ES and MIROC5, receive scores for all climate variables. Some of the GCMs that receive a high average score for one climate variable are found to receive a zero score for other climate variables. For example, EC-EARTH receives the highest average score for precipitation, but a zero score for both maximum and minimum temperature. ACCESS1-3 received a high score for the maximum temperature and precipitation, but the zero score for the minimum temperature. Therefore, GCMs that fail to score for any one climate variable are considered incapable and undesirable for projecting climate change in Pakistan.
The projections are shown in Figure 7 based on the selected RCP scenarios (RCP4.5 and RCP8.5). Figure 7 shows a significant increase in the projection range estimated by all GCMs in all cases. The mean of the projected temperature changes for the selected ensemble shifts systematically to higher values compared with the full ensemble mean. Furthermore, for maximum temperature and precipitation (Figure 7b,c), the selected best models do not include the well-below-average outlier models of the full ensemble. On the other hand, a higher increase in both temperature and precipitation is expected under RCP8.5 than under RCP4.5.
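The selection of GCMs that retain a score for every climate variable, and the comparison of the selected ensemble with the full ensemble, can be illustrated with the sketch below. It assumes that the thresholded scores are already available per variable; the function names, variable keys, GCM labels and all numbers are hypothetical placeholders and do not reproduce Table 5 or Figure 7.

```python
def common_gcms(filtered_scores):
    """Return GCMs with a non-zero score for every climate variable.

    filtered_scores: {variable: {gcm_name: score_after_cutoff}}
    """
    per_variable = list(filtered_scores.values())
    candidates = set(per_variable[0])
    return sorted(
        gcm for gcm in candidates
        if all(scores.get(gcm, 0.0) > 0.0 for scores in per_variable)
    )


def ensemble_summary(projected_change, members):
    """Mean, minimum and maximum projected change over the given members."""
    values = [projected_change[gcm] for gcm in members]
    return sum(values) / len(values), min(values), max(values)


# Placeholder thresholded scores for three variables.
placeholder_filtered = {
    "tmax": {"GCM-A": 2.5, "GCM-B": 0.0, "GCM-C": 1.5},
    "tmin": {"GCM-A": 2.0, "GCM-B": 3.0, "GCM-C": 0.0},
    "pr": {"GCM-A": 1.0, "GCM-B": 2.0, "GCM-C": 2.0},
}
selected = common_gcms(placeholder_filtered)

# Placeholder projected changes (arbitrary units) for each GCM.
placeholder_change = {"GCM-A": 3.0, "GCM-B": 1.0, "GCM-C": 2.0}
full_summary = ensemble_summary(placeholder_change, sorted(placeholder_change))
selected_summary = ensemble_summary(placeholder_change, selected)
```

Requiring a non-zero score for every variable is a strict intersection rule, so a GCM that ranks first for one variable can still be excluded from the ensemble.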

Conclusions
The performance of CMIP5 GCMs in simulating temperature and precipitation is assessed using multiple gridded datasets in order to understand the influence of gridded data on the selection of GCMs and to identify the common GCMs that perform well for different gridded data and climatic variables. The results show that the performance of GCMs varies with the gridded data and the climatic variable. A trade-off is made for the selection of GCMs based on their performance for different gridded datasets and climatic variables. Six GCMs, namely ACCESS1-3, CESM1-BGC, CMCC-CM, HadGEM2-CC, HadGEM2-ES and MIROC5, are found to be the most suitable for the projection of the climate of Pakistan according to their performance in reconstructing the historical climate for the period . The selected GCMs are found to be different from those found by [10]. This is due to the selection of GCMs based on different sets of gridded data. The results emphasize the use of different gridded data in the selection of GCMs to reduce the uncertainty of selection. The GCMs are selected in this study solely based on their performance in simulating historical temperature and precipitation. Future projections of GCMs can be considered along with their past performance in the selection of GCMs.
