Evaluation of Historical CMIP6 Model Simulations of Seasonal Mean Temperature over Pakistan during 1970–2014

This work employed recent model outputs from coupled model intercomparison project phase six to simulate surface mean temperature during the June–July–August (JJA) and December– January–February (DJF) seasons for 1970–2014 over Pakistan. The climatic research unit (CRU TS4.03) dataset was utilized as benchmark data to analyze models’ performance. The JJA season exhibited the highest mean temperature, whilst DJF displayed the lowest mean temperature in the whole study period. The JJA monthly empirical cumulative distribution frequency (ECDF) range (26 to 28 ◦C) was less than that of DJF (7 to 10 ◦C) since JJA matched closely to CRU. The JJA and DJF seasons are warming, with higher warming trends in winters than in summers. On temporal scale, models performed better in JJA with overall low bias, low RMSE (root mean square error), and higher positive CC (correlation coefficient) values. DJF performance was undermined with higher bias and RMSE with weak positive correlation estimates. Overall, CanESM5, CESM2, CESM2-WACCM, GFDL-CM4, HadGEM-GC31-LL, MPI-ESM1-2-LR, MPI-ESM1-2-HR, and MRI-ESM-0 performed better for JJA and DJF.


Introduction
The surface temperature increase has placed scientists on alert since the post-industrial era for its harmful impacts on the earth's ecosystems and human welfare in general. The average warming rate over land has increased by 0.72 • C since 1951 and may increase further by 1.8 to 4 • C by the end of the 21st century [1]. Changes in climate mechanisms, like an increase in temperature, have a considerable impact on hydrology, ecology, and socioeconomics in the form of floods, droughts, heatwaves, and a decrease in low-temperature magnitude and frequency [2,3]. South Asia is the most vulnerable to climate change, under temperature increase, threatening one fourth (1.8 billion) of the global population [4,5]. Rapidly melting Himalayan glaciers under noticeable warming rates, Summer and winter are hydrologically the most important seasons in Pakistan, where most (51-80%) of country water needs are met in the form of rain and snow [11]. Researchers found recently that most of the temperature changes and variability occurred during summer and winter seasons. The minimum temperatures in winters are rising rapidly (0.17-0.37 • C/decade) compared to the summer temperature (which is maximum) over Pakistan [12]. For instance, Ullah et al. [13] observed maximum warming rates (0.22 and 0.33 • C/decade for Tmax and Tmin) during winter and summer (0.20 and 0.25 • C/decade) seasons over Pakistan. Further, Adnan et al. [14] observed maximum temperature during June-July-August (JJA) and minimum temperature during December-January-February (DJF), with the increase in winter minimum temperatures in recent years. The temperature intensity and frequency dynamics with spatiotemporal changes may lead to surface warming, droughts, and heatwaves; under changes in land cover, deforestation, and other human activities [15]. Pakistan, since recent times, is under a warming trend, with a rapid increase in temperature over northern and southwestern mountainous regions, largely due to snow/ice-albedo feedback mechanism, and other factors [16]. Factors like vegetation cover, topography, cloud cover, urbanization scale, land-use change, agricultural practices, industrialization, and others may influence temperature through direct and indirect mechanisms [15][16][17]. The significant cold extremes are warming faster than the warm extremes, over higher latitudes, and altitudes under polar amplification and snow-albedo feedback processes; land surface extremes warm faster than air temperature for the same period [18]. According to [12,[19][20][21], the mean temperature manifests the average change and trends in extremes of temperature, i.e., Tmax and Tmin. This link can be useful in understanding and predicting future trends of extreme weather events. Moreover, the Tmean trend changes may be due to changes in either Tmax or Tmin or both [4].
Climate models are fundamental tools to acquire past and future climate information through utilization of estimation techniques for the land, ocean, atmosphere, and ice interactions [22]. General circulation models (GCMs) involve complex mathematical equations and physics, resulting in simulations at various spatiotemporal scales of a day to centuries [23]. The CMIP6 models are the latest generation of the Climate Models Intercomparison Project (CMIP), formulated around three specific scientific questions for earth response to forcings, the origin of the models' systematic biases and errors, and understanding future climate amidst the internal variability, predictability, and uncertainties. The scientific backdrop of CMIP6 is the seven grand scientific challenges covering climate change dimensions under varying spatiotemporal scales. The CMIP6 has a federated structure with a large number of experiments, uniquely designed. Each CMIP6-endorsed model intercomparison project (21 MIPS) covers unique climate themes. The diagnostic, evaluation, and characterization of Klima (DECK) experiments and historical simulations are developed under different forcings scenarios of past climates. The five newly developed future scenarios of CMIP6, i.e., shared socioeconomic pathways (SSPs), follow the pre-industrial and CO 2 forcings of CMIP5-RCPs with new forcings included subsequent to industrial, socioeconomic policy, technological, and human-induced impacts on the climate [24][25][26].
The CMIP6 models exhibit higher sensitivity to greenhouse gases (GHG) emissions compared to CMIP5 [27]. Grose et al. [26] observed CMIP6 has improved aerosols' effect, models' resolution, parameterization schemes, and more earth system models included. CMIP6 differs from CMIP5 with higher change bias unless bias correction is applied on models to resolve model spread [38]. The remaining part of the paper is arranged as follows: Section 2 provides models and observation used and methods; Section 3 provides the results and discussion, while Section 4 gives a conclusion drawn from the findings.

Study Area
Pakistan is situated in South Asia, extending between 23-37.5° N latitudes and 61-78° E longitudes ( Figure 1) and covering an area of 880,940 km 2 . China borders it in the north, India in the east, Afghanistan and Iran in the west, and the Arabian Sea in the south. It features diverse geography such as the world's highest mountainous regions (K2 peak-8200 m) in the north and northwest, to sea levels of 0 m at the Arabian Sea in the south. The central areas consist of the Indus Plains of Punjab and Sindh, while deserts are spread in the southeast and hyper-arid lands in the southwest [39]. The overall climate is arid (60% areas), featuring hot summers prominently across southern parts and cold winters in portions of northern humid Himalayan foothills (northeast and west) and northern mountains. The central regions have semi-arid to arid features, while coastal areas retain a unique coastal climate [40,41]. Summers are hot while winters are cold across the country; however, for a few past years, winters have been warming faster than summers [42].

Data and Methods
This study utilized thirteen CMIP6 climate models (Table 1), mainly covering the 1970-2014 period (https://esgf-node.llnl.gov/search/cmip6) for historical seasonal (JJA and DJF) mean near-surface temperature (tas) simulations over Pakistan. The 1970-2014 timescale was chosen since it is considered the new climatological normal for Pakistan, particularly in temperature change [13,16]. At first, each model's single members were acquired from first members (member_id = r1i1p1f1) for each model by averaging members' ensembles from 1970 through 2014. Next, models The overall climate is arid (60% areas), featuring hot summers prominently across southern parts and cold winters in portions of northern humid Himalayan foothills (northeast and west) and northern mountains. The central regions have semi-arid to arid features, while coastal areas retain a unique coastal climate [40,41]. Summers are hot while winters are cold across the country; however, for a few past years, winters have been warming faster than summers [42].

Data and Methods
This study utilized thirteen CMIP6 climate models (Table 1), mainly covering the 1970-2014 period (https://esgf-node.llnl.gov/search/cmip6) for historical seasonal (JJA and DJF) mean near-surface temperature (tas) simulations over Pakistan. The 1970-2014 timescale was chosen since it is considered the new climatological normal for Pakistan, particularly in temperature change [13,16]. At first, each model's single members were acquired from first members (member_id = r1i1p1f1) for each model by averaging members' ensembles from 1970 through 2014. Next, models were treated for standard date format, standard variable unit, and common temporal length to get uniformed interpretable simulations. The model outputs were regridded to a common grid of 1.4 × 1.4 • resolutions using the nearest neighbor interpolation technique. The nearest neighbor interpolation allowed better classification of similar close points by weighted average using data triangulation [43]. The land surfaces, particularly hilly terrains, were better interpolated due to sub-regionalization of grid points by the nearest cell center of an input grid [44]. The multi-model ensemble (MM-Ensemble) was created by simple averaging of regridded models for each season, i.e., summer and winter seasons; MM-Ensemble was preferred and believed to contain information from all models [45]. Regridded data were converted into monthly and then to respective seasonal scales. The CRU [46] mean temperature data was utilized as the benchmark for quantifying model simulations performance for summer and winter following [27]. The historical mean climatology of available CMIP6 models and CRU was plotted for summer and winter season. Further, the spatial and temporal performance of each model and MM-Ensemble simulation for seasonal mean simulations against CRU seasonal observations for 1970-2014 was in detail assessed by computing statistical metrics of bias; root mean square error (RMSE), and correlation coefficient (CC). The mathematical procedure of the above metrics was referenced from the following equations [47,48]: Atmosphere 2020, 11, 1005 where M is model-simulated and O is observed values series, i refer to observed and simulated pairs, while n is the total number of pairs. Reliable future projections depend on the accurate picture of the mean climate and past climate trends. The temporal trends in observed and simulated climatology were detected using the Mann-Kendall (MK) trend test [49,50]. The Mann-Kendall test is commonly used in numerous time-series trend studies [9,13,27,51]. This method exempts datasets from normal distribution requirements to handle outliers and missing values [13,49]. The MK test hypothesis states that the trend does not exist (H0), and the trend does exist (H1) in a time series. MK correlation coefficient/Kendall's Tau establishes trends in time series. The strength of the trend is proportional to the magnitude of the MK test statistic, where greater magnitude exhibits stronger trends and lesser magnitudes show a weaker trend.
Trend test statics S is defined as: where x 1 , x 2 ..x n represent n data points, x j represents the data point at jth time. Positive S indicates an increasing trend while low S value exhibits decreasing trends, i.e.: where x j and x i represent the time series observations, and n is length of the time series. When n ≥ 10, the S is approximately independently distributed data with the mean of 0 is with a variance given as: where n is number of data points, m is number of tied groups (a tied group is a set of sample data having the similar value), and t i is number of data points in the ith group. MK test statistic, Z, is computed as: If Z values remain beyond the confidence level of ±1.96, it shows a statistically significant trend at the 95% confidence level. The trend is considered decreasing if Z is negative and vice-versa. In this study, this method was applied to seasonal and annual (temporal only) time-series data. Moreover, Sen's slope (SE) computes trend slope in time series [13,40,49]. For an existing linear trend in a time series, slope estimates of n pairs of data can be computed using the relation: where x j and x i represent data values at time j and i, while (j > i). The median of the N values of T i is considered as Sen's estimator of slope, calculated as: Positive Q i exhibits an increasing trend, while negative shows decreasing trends. For odd N, the upper part of Equation (9) is used to get SE, and for odd N, the lower part of the equation is used. Further, the true slope is obtained by applying a two-sided t-test on Q i at a 100% (1-α) confidence interval. Time series data on seasonal and annual scale was treated with MK and Sen's slope estimator.
Further, the cumulative distribution function (F(x)) was used to fit different theoretical distributions of models' simulations and compare them with the ones from observed patterns, to determine models' symmetries deviating from observed patterns [40,48]. The raw model data was initially sorted, standardized, and processed in the high-performance computing (HPC) platform of Nanjing University of Information Science and Technology. Python, MATLAB, and few other open source software were utilized to analyze the datasets and plot the outputs in this work.

Mean Temperature Annual Cycle
The annual mean temperature cycle over Pakistan for 1970-2014 is shown in Figure 2. All the datasets display a bell-shaped distribution of mean temperature in the yearly cycle. A model is said to underestimate temperature when it simulates temperature values lower than the observed dataset; and vice versa in overestimation (higher values than observed patterns) of temperature. The CRU showed the annual highest mean temperatures during summer (JJA) and lowest mean temperatures during the winter (DJF) season. The months of June, July, and August exhibited 27 • C, 26 • C, and 25 • C temperatures, respectively, while December, January, and February showed 9 • C, 7 • C, and 9 • C. The MM-Ensemble, CESM2, CESM2-WACM, and MRI-ESM0 were consistent with CRU patterns. MIROC6, however, highly overestimated temperatures in April-October, while IPSL-CM6A-LR highly underestimated temperatures throughout the cycle. For summer (JJA) and winter (DJF), both interestingly displayed the maximum and minimum mean temperatures in the annual cycle. Overall, most of the models and MM-Ensemble simulations (except CNRM-ESM2-1, MIROC6, and IPSL-CM6A-LR) were consistent with CRU for all months. The December-March seasons show the highest underestimations, and May-November displayed the highest overestimations. Summer displayed annual maximum, while winter showed annual minimum temperatures. Previous works observed [20,52] that global circulation patterns influence temperatures of certain months or seasons (winter and summer season months by North Atlantic oscillations) with their contribution to winds, convection, diversion, and through dynamical thermal effects, henceforth, changing temperature magnitudes across this region.

Summer Mean Temperature Climatology
The summer (JJA) mean temperature climatology patterns and simulations for 1970-2014 are plotted ( Figure 3) over Pakistan. CRU patterns show the mean temperature in the range of 30 to 35 °C across southeastern, southern, and southwestern parts. Central-western regions exhibit a temperature of 25 °C to 30 °C and 5 to 20 °C over northern parts. The majority of models simulated north-south temperature dipole with high temperatures (20 °C to >35 °C) over central to southern regions and colder over the north. MM-Ensemble, Can-ESM5, CESM2, CESM2-WACCM, FGOALS, GFDL-CM4, and MRI-ESM-0 replicated CRU temperature over central-east, southeast, southwest, and western sides; some models also underestimated temperature over these regions. When compared to other models, MIROC6 highly overestimated temperature over the central to southern parts of Pakistan. Over northern regions, MM-Ensemble, CESM2, CESM2-WACCM, FGOALS-g3, and MIROC6 showed nearly consistent simulations with CRU patterns, and other models overestimated observed patterns. Overall, MM-Ensemble, CNRM-CM6-1, CNRM-ESM2-1, HadGEM-GC31-LL, MPI-ESM-2-HR, and MRI-ESM-0 models were consistent with CRU patterns (5 °C -35 °C) over the whole country. A study by [53] showed similar results with cold and warm bias over the northern and central regions of Pakistan, displaying a dipole pattern with higher temperatures over southern parts and relatively lowers at higher latitudes.

Summer Mean Temperature Climatology
The summer (JJA) mean temperature climatology patterns and simulations for 1970-2014 are plotted ( Figure 3) over Pakistan. CRU patterns show the mean temperature in the range of 30 to 35 • C across southeastern, southern, and southwestern parts. Central-western regions exhibit a temperature of 25 • C to 30 • C and 5 to 20 • C over northern parts. The majority of models simulated north-south temperature dipole with high temperatures (20 • C to >35 • C) over central to southern regions and colder over the north. MM-Ensemble, Can-ESM5, CESM2, CESM2-WACCM, FGOALS, GFDL-CM4, and MRI-ESM-0 replicated CRU temperature over central-east, southeast, southwest, and western sides; some models also underestimated temperature over these regions. When compared to other models, MIROC6 highly overestimated temperature over the central to southern parts of Pakistan. Over northern regions, MM-Ensemble, CESM2, CESM2-WACCM, FGOALS-g3, and MIROC6 showed nearly consistent simulations with CRU patterns, and other models overestimated observed patterns. Overall, MM-Ensemble, CNRM-CM6-1, CNRM-ESM2-1, HadGEM-GC31-LL, MPI-ESM-2-HR, and MRI-ESM-0 models were consistent with CRU patterns (5 • C -35 • C) over the whole country. A study by [53] showed similar results with cold and warm bias over the northern and central regions of Pakistan, displaying a dipole pattern with higher temperatures over southern parts and relatively lowers at higher latitudes.

Winter Mean Temperature Climatology
Winter (DJF) mean temperature climatology for CRU, MM-Ensemble, and models is displayed in Figure 4. During DJF, CRU patterns showed a temperature of 20 • C to 25 • C over the southeast and coastal belt, 5 • C to 15 • C over central and western parts, and −10 • C to 10 • C over northern regions. MM-Ensemble, Can-ESM5, CESM2, MIROC6, MPI-ESM1-2-HR, and MRI-ESM-0 were consistent with observed patterns well in a range of −5 • C to 25 • C over southern parts. Over the north, consistent to the observed temperature patterns were simulated (in range of −20 • C to 5 • C) by MM-Ensemble, Can-ESM5, CESM2, CESM-WACCM, and MIROC6, while the remaining models largely overestimated the mean temperature following the results of [53]. Overall, MM-Ensemble, Can-ESM5, CESM2 CESM2-WACCM, and HadGEM-GC31-LL performed well (range of −5 • C to 25 • C) over the country. The DJF simulations followed a dipole structure with higher temperatures over southern parts and lower at higher and northwestern latitudes, as well as on the east-west stretch. The overall simulations for JJA and DJF temperature matched the findings of previous studies [2,27,35].

Winter Mean Temperature Climatology
Winter (DJF) mean temperature climatology for CRU, MM-Ensemble, and models is displayed in Figure 4. During DJF, CRU patterns showed a temperature of 20 °C to 25 °C over the southeast and coastal belt, 5 °C to 15 °C over central and western parts, and −10 °C to 10 °C over northern regions. MM-Ensemble, Can-ESM5, CESM2, MIROC6, MPI-ESM1-2-HR, and MRI-ESM-0 were consistent with observed patterns well in a range of −5 °C to 25 °C over southern parts. Over the north, consistent to the observed temperature patterns were simulated (in range of −20 °C to 5 °C) by MM-Ensemble, Can-ESM5, CESM2, CESM-WACCM, and MIROC6, while the remaining models largely overestimated the mean temperature following the results of [53]. Overall, MM-Ensemble, Can-ESM5, CESM2 CESM2-WACCM, and HadGEM-GC31-LL performed well (range of −5 °C to 25 °C) over the country. The DJF simulations followed a dipole structure with higher temperatures over

JJA Empirical Cumulative Distribution Function
The empirical cumulative distribution frequencies (ECDFs) of temperature for JJA presented in Figure 5 were utilized to get an insight into the frequency of occurrence and underestimations/ overestimations in monthly temperatures for CRU, MM-Ensemble, and models over Pakistan. The MM-Ensemble, CanESM5, CESM2, CESM2-WACCM, CNRM-CM6, CNRM-ESM2-1, GFDL-CM4, HadGEM-GC31-LL, IPSL-CM6A-LR, and MPI-ESM1-2-LR underestimated the monthly mean temperature (between 20 • C and 27 • C), complimentary to mean JJA temperature underestimates in the northwestern and central regions displayed in Figure 3. FGOALS-g3, MIROC6, MPI-ESM1-2HR, and MRI-ESM-0 overestimated the mean temperature between 26 • C and 30 • C, therefore complemented to Figure 3. CESM2, CESM2-WACCM, FGOALS-g3, and MRI-ESM-0 complemented CRU temperature distribution between 26 • C and 28 • C. A study by Tatebe et al. [54] using the MIROC6 model observed the highest warm bias and RMSE for surface temperature across Asia and the Middle East for MIROC6 compared to those of MIROC5. It was attributed to underestimating mid-level cloud cover, downward OSR (sum of net shortwave and net longwave radiation), and aerosol-radiation interaction. Such bias also usually occurs in many climate models.

Summer and Winter Spatiotemporal Trend Analysis
The summer (JJA) and winter (DJF) spatiotemporal trend change (  Overestimations and underestimations identified biases in models, the presence of systematic inherent errors that developed under unrealistic response to forcings, or unpredictable internal variability different from observations. Other errors included errors of convective parameterizations and unresolved sub-grid scale orography. Many errors were solvable through bias correction techniques; however, biases due to nonlinearity and complex dynamical processes were uncorrectable at the current model developments [55].

Winter Empirical Cumulative Distribution Function
The winter (DJF) ECDF for 1970-2014 over Pakistan is shown in Figure 6. Other studies [35,53] also approved an increase in DJF surface mean temperature over Pakistan with higher warming rates than in JJA season, dominantly over the northern regions. These results were sufficiently in agreement with the spatial JJA and DJF climatology (Figures 3 and 4) and trends (Figures 7 and 8) in this study. The JJA spatial trend change distribution for models, ensemble, and observed datasets over Pakistan is displayed in Figure 7. The results indicated a warming pattern of temperature over most of Pakistan (0.01 °C -0.06 °C/year), except for models such as CESM2-WACCM, CNRM-ESM2-1, and MIROC6 and CRU display, which showed negative trends over some northern and southern regions. CanESM and HadGEM-GC31-LL displayed the highest warming rate of 0.01 °C to 0.08 °C/year over the entire country, with the highest shown over northern and northwestern regions.

Summer and Winter Spatiotemporal Trend Analysis
The summer (JJA) and winter (DJF) spatiotemporal trend change ( Table 2 Other studies [35,53] also approved an increase in DJF surface mean temperature over Pakistan with higher warming rates than in JJA season, dominantly over the northern regions. These results were sufficiently in agreement with the spatial JJA and DJF climatology (Figures 3 and 4) and trends (Figures 7 and 8) in this study. The JJA spatial trend change distribution for models, ensemble, and observed datasets over Pakistan is displayed in Figure 7. The results indicated a warming pattern of temperature over most of Pakistan (0.01 • C -0.06 • C/year), except for models such as CESM2-WACCM, CNRM-ESM2-1, and MIROC6 and CRU display, which showed negative trends over some northern and southern regions. CanESM and HadGEM-GC31-LL displayed the highest warming rate of 0.01 • C to 0.08 • C/year over the entire country, with the highest shown over northern and northwestern regions. Overall, MME-Ensemble, CanESM5, CESM2, CNRM_CM6-1, FGOALS-g3, HadGEM-GC31-LL, and MRI-ESM-0 indicated the highest warming rates over north and northwest areas, in the range of 0.01 to 0.08 • C/year. capture JJA and DJF CRU/observed mean temperature patterns over Pakistan for 1970-2014 and is shown in Figure 9. These metrics were calculated as area-averaged over the whole of Pakistan on JJA and DJF seasonal mean datasets. The long-term bias for JJA and DJF presented in Figure 9 (top) shows diverse results.     DJF compared to that of JJA over Pakistan and north regions was visible in the outputs. The present trend results were similar to study results by [27,[33][34][35], where warming trends were found across the country, with higher warming over northern regions.
The higher warming rates at higher altitudes were attributed to the increase in the magnitude of snow-albedo feedback mechanism, an increase in incoming thermal radiation, and surface heat loss [16,34]. The other factors aiding in higher warming rates across Pakistan could be associated to surface-based feedbacks, increase in thermal radiation budget, aerosol (CFCs and hydrocarbon-based) concentrations, increasing urbanization rates, increasing industrialization, population growth, and intensive land-use change practices [8,9].

Temporal Bias, Correlation, and RMSE
A model with low bias and RMSE with higher correlation coefficient (CC) magnitudes is usually presumed as a well-performing and accounted for projecting future climate characteristics. This study employed three important statistical metrics to measure model performance capability to capture JJA and DJF CRU/observed mean temperature patterns over Pakistan for 1970-2014 and is shown in Figure 9. These metrics were calculated as area-averaged over the whole of Pakistan on JJA and DJF seasonal mean datasets. The long-term bias for JJA and DJF presented in Figure 9  applications since they may not have precisely depicted a specific weather event in a particular year; instead, they showed the aggregate of such events [48].
Atmosphere 2020, 11, x FOR PEER REVIEW 17 of 30 MM-Ensemble, CESM2, CESM2-WACCM, MPI-ESM1-2-HR performed well overall, with the lowest cold and warm bias in the range of 0.1 °C to −1.6 °C for DJF. The results indicate an increase in winter temperatures in the study period. Model biases and other uncertainties resulted from differences in aspects of model parameterizations, (coarse) resolution, and (poor) representation of atmospheric physics and chemistry processes [57].  MIROC6 displayed the highest warm biases, and IPSL-CM6A-LR showed the coldest biases among all models. Persistent warm biases in CMIP5 GCMs simulations over southern plains and cold biases over the mountainous regions of north and northwest areas of Pakistan were also revealed by [56]. The above results also coincided with the temporal bias ( Figure 9) distribution for models and ensemble over Pakistan.

Winter Spatial Bias, RMSE, and Correlation Coefficient Metrics
The winter (DJF) season spatial bias, RMSE, and correlation are given in Figures 13-15. The DJF bias ( Figure 13)   conforming to their better performance in simulating the observed DJF mean temperature. A study by [32] found matching results for bias during JJA (mostly cold bias in the range of −2 °C to >−10 °C) across the whole of Pakistan and in DJF, with mostly warm bias >2 °Cover few northern and southern regions. Similarly, [53] revealed high cold bias over the country's northern parts at annual and seasonal scales.  The JJA RMSE (difference in values of CRU observations and model simulations) of mean temperature is shown in Figure 11. Ideally, a near-zero RMSE value indicates close matching outputs of observed climate features. MM-Ensemble, CNRM-ESM2, HadGEM-GC31-LL, MPI-ESM1-2-HR exhibited the lowest RMSE (0.1 • C to 10 • C range) for most parts of the country, particularly over southern, southeastern, and southwestern regions. However, IPSL-CM6A-LR, MIROC6, and MRI-ESM-0 exhibited the highest (underperformed) RMSE (7 • C to 10 • C) over few northern and northeastern parts, presumably for the models' passiveness to resolve inherent issues of mountainous topography resolution, cloud cover, and snow-albedo feedback parameterization as observed by [18,58].
Atmosphere 2020, 11, x FOR PEER REVIEW 21 of 30 especially over southern regions. The remaining models gave higher values (>5 °C) over most areas, specifically over northern areas. The correlation coefficient (CC) for the DJF season ( Figure 15) revealed the strongest CC (0.01 to <0.8) by MM-Ensemble, CESM2, CESM2-WACCM, GFDL-CM4, and HadGEM-GC31-LL for the whole country, dominantly across the central-eastern to south-eastern regions. The CNRM-CM6-1 showed an apparent disagreement with CRU data over the extreme north; MRI-ESM-0 and CNRM-ESM2-1 agreed (weak) with CRU over the whole country. Overall, the majority of models displayed a weak positive CC over the country.

Discussion
In this study, CMIP6 models simulated seasonal mean temperature reasonably over Pakistan. Earlier, Ather et al. [32] found higher temperatures over southern Pakistan from two GFDL CMIP5 (CM2p1, CM3.0) model runs during DJF (cold westerlies in north and north-westerlies) and southern areas in JJA for (heat low/land-sea heat gradient and role of moisture flux) in recent years. In an attribution study, Bollasina et al. [59] detected the role of thermal forcing and low-level models' simulated seasonal variability over South Asia could be due to the embedded convection schemes' dynamic behavior over the sub-regions [62]. The CRU datasets go through extensive quality control measures and gauge station numbers to reduce uncertainties in climate variability patterns [46]. The multi-decadal temperature changes may be forced by anthropogenic and natural forcing (volcanic aerosol forcing) or could arise unforced from the climate system due to climate sensitivity and unforced variability [63].  The correlation coefficient (CC) for the DJF season ( Figure 15) revealed the strongest CC (0.01 to <0.8) by MM-Ensemble, CESM2, CESM2-WACCM, GFDL-CM4, and HadGEM-GC31-LL for the whole country, dominantly across the central-eastern to south-eastern regions. The CNRM-CM6-1 showed an apparent disagreement with CRU data over the extreme north; MRI-ESM-0 and CNRM-ESM2-1 agreed (weak) with CRU over the whole country. Overall, the majority of models displayed a weak positive CC over the country. Further, in the study, CMIP6 models and CRU datasets showed a significant increase in mean temperature trends in this study, particularly higher over the north during DJF over Pakistan. Many studies [12][13][14]64] identified increasing trends in winter and summer (0.17 °C-0.37 °C/decade) over Pakistan's central and northern regions. Babar et al. [35] discovered a higher (0.21 °C/decade) winter mean temperature rise than in summer (0.21 °C/decade) for CRU; 0.11°C-0.06 °C/decade in UDEL dataset and 0.10°C-0.09 °C/decade in CMIP5 ensemble. The CanESM, CCSM, IPSL, and MPI models showed higher trends for winter, and summer over northern and southwestern regions had the highest trends (0.6 °C/decade) for all datasets. Supportively, [34] identified higher warming rates over the northern, southeastern, and southwestern regions. The global temperature warming has amplified at high altitude environments due to elevation-dependent warming through changes in the response of mechanisms like snow albedo, surface-based feedbacks, water vapor changes, latent heat release, surface water vapor, radiative flux changes, surface heat loss, temperature change, and aerosols, and earth's energy balance system [65]. Over the Tibetan Plateau, the snow-albedo

Discussion
In this study, CMIP6 models simulated seasonal mean temperature reasonably over Pakistan. Earlier, Ather et al. [32] found higher temperatures over southern Pakistan from two GFDL CMIP5 (CM2p1, CM3.0) model runs during DJF (cold westerlies in north and north-westerlies) and southern areas in JJA for (heat low/land-sea heat gradient and role of moisture flux) in recent years.
In an attribution study, Bollasina et al. [59] detected the role of thermal forcing and low-level northerlies (indirect) over the Hindu Kush Mountains in deepening heat low during JJA and DJF over the north. Babar et al. [35] utilized CMIP5 models and established the BCC-CSM1.1, HadGEM2-CC, and NorESM1-M models as best performing with higher temperatures over southern parts compared to northern parts (cold bias). For the CMIP5 ensemble, a recent work by Das et al. [60] exhibited higher JJA temperatures over north, central, southwest, and southeast parts of Pakistan forced by anthropogenic activities and industrialization. Meanwhile, Ahmed et al. [2] observed lower temperature extremes over northern regions and higher over southern regions from CMIP5 multi-mode-ensemble, NorESM1-M, MIROC5, BCC-CSM1-1, and ACCESS1-3 simulations close to CRU patterns. Besides the mentioned local climate factors, model resolution (influence of physical and biological processes representation), benchmark datasets quality (consequence magnitude and the direction of comprehensive circulation processes), model climate sensitivity (planetary energy balance, CO2 change-led earth warming), and changes in the spatiotemporal extent of forcings (annual and seasonal variations) determine simulation accuracy [52,57,61]. The models' simulated seasonal variability over South Asia could be due to the embedded convection schemes' dynamic behavior over the sub-regions [62]. The CRU datasets go through extensive quality control measures and gauge station numbers to reduce uncertainties in climate variability patterns [46]. The multi-decadal temperature changes may be forced by anthropogenic and natural forcing (volcanic aerosol forcing) or could arise unforced from the climate system due to climate sensitivity and unforced variability [63].
Further, in the study, CMIP6 models and CRU datasets showed a significant increase in mean temperature trends in this study, particularly higher over the north during DJF over Pakistan. Many studies [12][13][14]64] identified increasing trends in winter and summer (0.17 • C-0.37 • C/decade) over Pakistan's central and northern regions. Babar et al. [35] discovered a higher (0.21 • C/decade) winter mean temperature rise than in summer (0.21 • C/decade) for CRU; 0.11 • C-0.06 • C/decade in UDEL dataset and 0.10 • C-0.09 • C/decade in CMIP5 ensemble. The CanESM, CCSM, IPSL, and MPI models showed higher trends for winter, and summer over northern and southwestern regions had the highest trends (0.6 • C/decade) for all datasets. Supportively, [34] identified higher warming rates over the northern, southeastern, and southwestern regions. The global temperature warming has amplified at high altitude environments due to elevation-dependent warming through changes in the response of mechanisms like snow albedo, surface-based feedbacks, water vapor changes, latent heat release, surface water vapor, radiative flux changes, surface heat loss, temperature change, and aerosols, and earth's energy balance system [65]. Over the Tibetan Plateau, the snow-albedo feedback has been identified as the primary factor for higher warming and ice melting [66,67]. Moreover, Archer et al. [68] observed a high positive correlation between summer runoff (snowmelt water) and temperature increase over the Upper Indus Basin of Pakistan. Further, Fatima et al. [69] concluded that with the rapid melting of Hindu Kush-Karakorum-Himalayan glaciers, floods and glacial lake outburst floods (GLOFs) are obvious, and depletion of freshwater availability is the next phase. Southern Pakistan also experiences warming temperatures under the influence of industrialization and transportation boom, land-use change, population pressure, vegetation loss, water resources absence, and pollution (aerosols and other chemical compounds), which pave the way for heatwaves and droughts [8,9,70].
The role of internal variability (variability of climate system occurring in absence of external forcings like atmospheric, oceanic, and coupled ocean-atmosphere processes systems) as a major driver of climate change uncertainty is manifested in calculations of trends and climatology estimates [71,72]. Historical trends are highly influenced by internal variability and are useful for climatology information analysis under multi-realizations of larger ensemble members, and vice versa to the forced variability trends. Historical multi-run members' trends are highly influenced by internal variability and forced variability is dominant in future projection trend patterns across model runs. In addition, across most model run pairs (in multi-run member cases), changes are small, and less than intermodal differences, the evaluation of models' spatial climatology can be more informative. The single run-based analyzed trends by any chance may resemble or not resemble observations, possibly bringing ambiguity (considering noise fitting as trends) to model evaluation [72]. Deser et al. [73] studied that near-term (like 2010-2030) surface temperature responses need large multiple realizations ensemble members at middle and higher latitudes to acquire robust estimates of forced and internal variability of climate system in models. Moreover, climate change monitoring can be best served by focusing on thermodynamic components (e.g., air temperature, inbound radiations, ocean heat content) of the climate system. Any historical model-based trend study should involve multiple realizations for models involved, although such realizations possess less effects on future climate projections.
Most models and ensembles in the study performed well in simulating observed patterns of temperature in JJA, although with some differences. Improved performance of CMIP6 models across JJA can be attributed to improvements in climate forcings' effects, aerosols' representation and resolution, detailed parameterization of cloud cover, vegetation, surface convection, physiographic features, diurnal cycles of models, and inclusion of the earth system models [27,31,74]. A CMIP5 study [53] yielded MIROC5, CESM1-WACCM, GISS-E2-H-CC, GISS-E2-H, and MRI-CGCM3 models' better performance in JJA (than DJF) with less bias and RMSE, and sound correlation with CRU. Whilst [35] found INM-CM, IPSL, BCC, EC-Earth, NorESM, and GISS performing well for JJA and DJF mean temperature simulation, model-ensemble displayed a higher cold bias in summer and winter.
The differences in simulations to observed patterns could be due to the complex and diverse geography, landscape, and climatology of Pakistan; ranging from mountains in the north and plains, deserts, and coasts in the south with a blend of humid, arid, semi-arid, and coastal to hyper-arid climates [2,34,75]. Simulation variances also emerge from systematic errors due to internal variability and different responses to forcings, creating contradictory atmospheric processes [24]. Models performed passively across diverse topography and geography for their sensitivity/passiveness to variations such as mountains, with resolution issues all functioning simultaneously [76]. Moreover, higher climate sensitivity caused overestimations and underestimations, leading to biases in present and future climate estimates [29].

Conclusions
Current and historical simulations for climate variables are vital to understanding the prevailing climate and future climate scenarios. This study employed the mean temperature variable from the CMIP6 models for the summer (June-July-August) and winter (December-January-February) seasons for 1970-2014 over Pakistan. Thirteen CMIP6 utilized JJA and DJF climatology, ECDF, trend, bias, RMSE, and correlation coefficient (CC). The highest mean temperature for 1970-2014 over Pakistan was observed (24 to 35 • C) during JJA months and the minimum during DJF months (2 to 9 • C).
The JJA and DJF spatial mean climatology by CRU, models, and MM-Ensemble displayed a dipole structure over north-south with low to high temperature scales. The ECDF for JJA temperature was identified close to the observed and smaller temperature distribution range than the DJF season.
Further, DJF spatiotemporal trends revealed higher increasing trends for all datasets across Pakistan, especially over northern regions, than in JJA, although few models showed insignificant trends.
A model of low bias and RMSE with a higher correlation coefficient (CC) is considered as a well-performing in simulating specific climate characteristics. The temporal bias, RMSE, and CC in JJA and DJF yielded diverse outcomes. For JJA, most models yielded low bias (2 • C to −2 • C), low RMSE (<1.9 • C) and higher CC (0.01-0.4) values for 1970-2014 simulations. However, in DJF, higher negative bias (0.21 • C to −3 • C), higher RMSE (<4 • C), and lower CC values (0.01-0.30) were detected for the 1970-2014 period.
The JJA spatial bias, RMSE, and CC metrics analysis discovered a higher warm bias (1 • C to 20 • C), low RMSE (1 • C to 15 • C), and high CC values (0.2 to >0.8) over the whole of Pakistan. A strong cold bias (0 • C to −20 • C), higher RMSE (2 • C-25 • C range), and a positive but weak CC (0.2-0.8) was detected for the majority of models and ensemble in DJF. The cold bias/underestimation also signaled towards the higher observed temperature during DJF for 1970-2014.
This study revealed diverse outputs of models' simulations and their performance in two critical seasons over Pakistan. After assessing models for bias, RMSE, and CC performance metrics, CanESM5, CESM2, CESM2-WACCM, GFDL-CM4, HadGEM-GC31-LL, MPI-ESM1-2-HR, MPI-ESM1-2-HR, and MRI-ESM-0 were found to perform better overall in simulating observed temperatures over Pakistan. These models could be utilized in future temperature projection and impact studies over Pakistan.