Evaluating the Applicability of a Quantile–Quantile Adjustment Approach for Downscaling Monthly GCM Projections to Site Scale over the Qinghai-Tibet Plateau

Abstract: In the context of global climate change, the Qinghai-Tibet Plateau (QTP) has experienced unprecedented changes in its local climate. While general circulation models (GCMs) are able to forecast global-scale future climate change trends, further work is needed to develop techniques that apply GCM-predicted trends at site scale to facilitate local ecohydrological response studies. Given the QTP's unique altitude-controlled climate pattern, the applicability of the quantile–quantile (Q-Q) adjustment approach for this purpose remains largely unknown and warrants investigation. In this study, the approach was evaluated at 36 sites so that the results are representative of different climatic and surface conditions on the QTP. Considering the practical needs of QTP studies, the study assesses its capability for downscaling monthly GCM simulations of major variables (precipitation, air temperature, wind speed, relative humidity, and air pressure) to the site scale, based on two GCMs. The calibrated projections at the sites were verified against the observations and compared with those from two commonly used adjustment methods: the quantile-mapping method and the delta method. The results show that the general trends of most variables considered are well adjusted at all sites, with a quantile pair of 25–75% for all the variables except precipitation, where 10–90% is used. The calibrated results are generally close to the observed values, with the best performance in air pressure, followed by air temperature and relative humidity. The performance is relatively limited in adjusting wind speed and precipitation. The accuracies decline as the adjustment extends into the future; a wider adjustment window may help increase the performance for the variables subject to climate changes. The performance of the adjustment is found to be generally independent of location and season, but is strongly determined by the quality of the GCM simulations. The Q-Q adjustment works better for meteorological variables with fewer fluctuations and daily extremes. Variables whose probability density functions are more similar between the observations and the GCM simulations tend to perform better in adjustment. Generally, this approach outperforms the two peer methods, with broader applicability and higher accuracies for most major variables.


Introduction
As one of the greatest challenges we confront in the 21st century, climate change is already manifest in changes to climatic, coastal, oceanic, and land surface systems, giving rise to agricultural, political, economic and energy-related issues [1][2][3][4][5]. The warming effect of the greenhouse gas emissions has taken us far away from pre-industrial climatic conditions. Reliable climate predictions provide the scientific foundations for decision making, disaster

Meteorological Observations
Over 100 national meteorological stations are located within the boundaries of the QTP. Of these, 36 reference stations (as shown in Figure 1) were selected for analysis in our study. These stations were selected for their representativeness of the dry-wet climate zones and topographies in which they are located. At these sites, standard meteorological variables, as well as radiation variables (available only at some sites), are measured. Rather than testing the approach on all of the variables, we chose five of them (air temperature, wind speed, air pressure, relative humidity and precipitation), as they have undergone obvious changes over the past decades on the QTP and are available in both the site observations and the GCM simulations [49,66,70]. The data at the selected sites were extracted from the daily dataset of basic meteorological elements provided by the China Meteorological Administration. The daily data were aggregated to monthly data: precipitation is accumulated, while the other variables are averaged. After processing, the monthly data cover the period 1970-2014. However, data for 1970-1972 are missing at the Pulan site, and at the Naqu site the 1955-1969 data are additionally included to form a longer time series for the experiments on varying adjustment windows and error propagation.

GCM Data
According to a recent study [58], the CMIP6 models are generally able to capture the spatial and temporal patterns of atmospheric variables on the QTP, although there is a persistent overestimation of precipitation and cold biases in winter and spring. Over 100 models are included in the project. The simulation data [16] are accessible via the World Climate Research Program (WCRP) CMIP6 Search Interface. The simulations cover the period 1850-2100, of which 1850-2014 is deemed the historical period [16]. Although the simulations provide different sets of outputs, fundamental variables such as the ones selected for the present analysis are all provided. To better evaluate the potential impact of GCM quality on the adjustment results, we chose two different simulations rather than a single GCM: the Earth System Model (ESM) collaboratively developed by the European consortium (EC-Earth 3, or EC for short) and a higher-resolution version of the Max Planck Institute ESM (MPI-ESM1.2-HR, or MPI for short). EC is an ESM combining the feedback from and processes of diverse climatic components, including atmosphere, ocean, sea ice, land surface, dynamic vegetation, atmospheric composition, ocean biogeochemistry and the Greenland ice sheet, with boundary data exchange allowed between components. The horizontal resolution of its atmospheric component is 78 km and the vertical resolution is 91 layers [73,74]. MPI (0.9° spatial resolution) is an improved version of its lower-resolution counterpart MPI-ESM1.2-LR (1.9° spatial resolution), which is the baseline for CMIP6. Its horizontal resolution is about 100 km and its vertical resolution is 95 layers [75,76]. Though overestimation of temperature mean values and variations tends to be unavoidable, the MPI model showed a generally satisfactory performance in most scenarios [77].

The Quantile-Quantile Adjustment Approach
The Q-Q approach was originally developed to adjust the simulated results of a future period based on the daily observations and RCM simulations in a control period [39]. In this study, the approach was adapted to adjust monthly GCM simulations in consideration of the practical needs of climate change studies on the QTP. It employs a nonparametric function characterizing the differences between the distributions of the monthly observations and GCM simulations, which is then applied to remove the GCM biases in the projection period. The adjustment is performed over two successive periods of the same length. The first is the control period, in which both the site observations and the GCM simulations are available. The second is the projection period, in which only the GCM simulations are available and the site projection is unknown. Calibration thus estimates the site-scale projection by adjusting the GCM simulations in the projection period. For a long time series divided into successive equal-length periods where the observations in the first period are known, the Q-Q approach operates on the next period by taking the first period as the control period, and iterates along the time series by moving one period forward. The process for adjusting a given GCM variable can be described by Equation (1):

$$p_i = o_i + g\,\bar{\Delta} + f\,\Delta'_i \tag{1}$$

where $o_i$ denotes the observations known in the control period, and $p_i$ is the adjusted site-scale projection in the second period (i.e., the calibrated future projection), which has the same length as the control period. Subscript $i$ indicates the $i$th month in both periods. The second term on the right-hand side represents the effect of the difference between the GCM simulations in the control and projection periods, where $\bar{\Delta}$ is the mean difference between the pair-wise GCM simulations across the two periods, and $g$ is a parameter used to correct the systematic error by redressing the bias between the observed mean and the simulated mean in the same control period. The third term accounts for monthly variability relative to the mean, where $\Delta'_i$ is the difference anomaly for the $i$th month with respect to the mean difference ($\bar{\Delta}$), and $f$ is an adjusting factor that considers the distributions of both the observations and the GCM simulations in the same control period. The variables in Equation (1) are given by Equations (2)-(6):

$$\Delta_i = s_{fi} - s_{ci} \tag{2}$$

$$\bar{\Delta} = \frac{1}{N}\sum_{i=1}^{N}\Delta_i = \bar{s}_f - \bar{s}_c \tag{3}$$

$$\Delta'_i = \Delta_i - \bar{\Delta} \tag{4}$$

$$g = \frac{\bar{o}}{\bar{s}_c} \tag{5}$$

$$f = \frac{\sigma_o}{\sigma_{s_c}} \approx \frac{IQR|_o}{IQR|_{s_c}} \tag{6}$$

where $s_{fi}$ and $s_{ci}$ are the GCM simulation values for the $i$th month in the projection (subscript $f$) and control (subscript $c$) periods (i.e., the raw future simulation and raw control simulation), respectively. $\Delta_i$ is the difference between the future simulation and the control simulation for the $i$th month. $\bar{s}_f$ and $\bar{s}_c$ denote the means of the future and control simulation values, respectively, and $\bar{o}$ is the mean of the observations in the control period. $N$ is the total number of monthly values of a given climatic variable in either period; $N$ equals 180 for a 15-year time window and 360 for a 30-year window. $f$ can be approximated as the ratio of the standard deviation of the observations in the control period ($\sigma_o$) to the standard deviation of the raw control simulation ($\sigma_{s_c}$) in the same period. As standard deviations are more sensitive to outliers, they can be replaced by $IQR|_o$ and $IQR|_{s_c}$, the interquantile ranges of the observation and raw control simulation distributions.
Figure 2 illustrates the intuitive meanings of the variables involved in the Q-Q approach given the known distributions of the observations (control observed) and the known simulations in both periods (raw control simulation and raw future simulation). Air temperature is used as the example, with the 25th and 75th percentiles as the interquantile range. The adjustment results depend on two important factors, $g$ and $f$, which serve as weights to modify the impact of the average difference between the periods and of the specific differences, respectively. When no systematic biases exist between the observations and GCM simulations, $g$ equals 1; otherwise, $g$ is altered to reflect the GCM biases. Equation (5) assumes proportionality between the GCM simulations and the observations in the control and projection periods. In the case of air temperature, however, $g$ in Equation (5) may cause problems because air temperature is an interval variable, for which ratios are not meaningful. Moreover, when the denominator ($\bar{s}_c$) approaches zero, $g$ becomes unreliable. The factor $f$, on the other hand, is closely related to the distribution characteristics of the simulations and the observations. To reduce the impact of extreme values, the IQR is used to estimate this factor. The previous study [39] recommends the 25th and 75th percentiles for all variables except precipitation, for which the 10th and 90th percentiles are taken.
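To make the roles of $g$, $f$ and the difference terms concrete, the following minimal sketch applies one Q-Q adjustment step to plain Python lists. It is an illustration under stated assumptions, not the authors' code: the function names `iqr` and `qq_adjust` are hypothetical, the explicit form of Equation (1) is inferred from the verbal description of its three right-hand terms, and the series are paired month by month as the text describes.

```python
from statistics import fmean, quantiles


def iqr(values, q_low=25, q_high=75):
    """Interquantile range between two integer percentiles (1-99)."""
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[q_high - 1] - cuts[q_low - 1]


def qq_adjust(obs, sim_ctrl, sim_fut, q_low=25, q_high=75):
    """Adjust sim_fut to site scale with one Q-Q step.

    obs and sim_ctrl belong to the control period, sim_fut to the
    projection period; all three are equal-length monthly series.
    Assumption: values are paired by month index, as in the text.
    """
    if not (len(obs) == len(sim_ctrl) == len(sim_fut)):
        raise ValueError("all three series must have the same length")
    # Month-wise difference between future and control simulations
    delta = [sf - sc for sf, sc in zip(sim_fut, sim_ctrl)]
    mean_delta = fmean(delta)
    # g: ratio of observed to simulated control means
    # (becomes unstable when the simulated mean is close to zero)
    g = fmean(obs) / fmean(sim_ctrl)
    # f: ratio of interquantile ranges in the control period
    f = iqr(obs, q_low, q_high) / iqr(sim_ctrl, q_low, q_high)
    return [o + g * mean_delta + f * (d - mean_delta)
            for o, d in zip(obs, delta)]
```

For precipitation, the 10th and 90th percentiles mentioned in the text would be selected with `qq_adjust(obs, sim_ctrl, sim_fut, q_low=10, q_high=90)`.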
Atmosphere 2021, 12, x FOR PEER REVIEW 6 of 39

Figure 2. Illustrations of the variables involved in the quantile-quantile (Q-Q) adjustment approach with an example of air temperature at the Naqu site, with the control period being 1970-1984 and the projection period being 1985-1999. The left panel shows the probability density functions (PDFs) and the right panel the cumulative distribution functions (CDFs). The notations used are consistent with those in Equations (1)-(6). The lines of calibrated future projection represent the results after the adjustment. The vertical straight lines in the PDF subplot represent the means of the observations ($\bar{o}$) and raw simulations ($\bar{s}_c$) in the control period and the mean of the simulations in the projection period ($\bar{s}_f$). The percentile pair used is the 25th and 75th, and the corresponding values are denoted by Q(25%) and Q(75%) in the CDF subplot. Raw control simulation and raw future simulation data are from the GCM simulations; control observed data are the site observations or the calibrated results produced in the preceding adjustment; and calibrated future projection data are the desired results. The illustrative plot was created by the authors with reference to the concept of Yang et al. [78].

Experimental Design
The recommendation of the quantile ranges in the original study [39] is not based on rigid mathematics. Therefore, it is critical to conduct extensive verifications using sites that are distinct in topography and climate type. A few technical questions require clarification:

1. How does this approach perform when applied to inhomogeneous sites with the recommended settings? Does the performance vary by individual variable, in space and time?
2. How do factors such as the GCM quality, length of adjustment window, quantile pair, etc., influence performance? How do errors propagate when the approach iterates into the future?
3. What is the underlying mechanism that impacts performance?
4. Can it compete with other commonly used peers in terms of adjustment accuracy and effectiveness?
These questions outline the technical aspects of the motivation for this study. The evaluation processes include an overall assessment of the performance of the approach and analyses of possible influencing factors, as well as comparisons with peer downscaling methods to further prove its advantages. In order to assess the applicability of the approach on the QTP, adjustments at all reference sites on the QTP were conducted. The performance on different meteorological variables and in various seasons was examined and its spatial variations were analyzed.
To test how the performance changes as the calibration extends into the future, the time span of the data at the 36 selected reference sites (1970 to 2014) was divided into three successive 15-year periods. The first adjustment targeted the GCM simulations in the second period (1985 to 1999). In this process, the observations and GCM simulations from 1970-1984 served as the control observed values ($o_i$) and raw control simulated values ($s_{ci}$), respectively, while the GCM simulations from 1985 to 1999 were used as the raw future simulated values ($s_{fi}$). Upon completion, the results of the first adjustment serve as the control for the second adjustment, whose results correspond to the 2000-2014 window. Note that when calibrated values of a variable fell outside its valid domain during an adjustment, they were replaced by the nearest domain boundary before entering the subsequent adjustment. For example, if the first precipitation calibration returns a value below zero for the $i$th month, $o_i$ for that month in the second adjustment is set to 0. This problem is inherent to the approach [79], as the difference terms (the second and third terms on the right-hand side of Equation (1)) can be negative, leading to unrealistic values after the adjustment. The same substitution was applied when calculating the performance metrics to ensure physically meaningful results, whereas in the figures showing the adjusted results the values are intentionally left as is to show the direct output of the adjustment.
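The chained procedure described above (adjust one window, clip out-of-domain values, then reuse the result as the next control series) can be sketched as follows. This is a hedged illustration rather than the study's implementation: `qq_adjust` and `chained_adjust` are hypothetical names, the Q-Q step uses the form of Equation (1) as inferred from the text, and the clipping bounds are supplied by the caller.

```python
from statistics import fmean, quantiles


def iqr(values, q_low, q_high):
    """Interquantile range between two integer percentiles (1-99)."""
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[q_high - 1] - cuts[q_low - 1]


def qq_adjust(ctrl, sim_ctrl, sim_fut, q_low=25, q_high=75):
    """One Q-Q step on equal-length monthly series (month-wise pairing)."""
    delta = [sf - sc for sf, sc in zip(sim_fut, sim_ctrl)]
    mean_delta = fmean(delta)
    g = fmean(ctrl) / fmean(sim_ctrl)
    f = iqr(ctrl, q_low, q_high) / iqr(sim_ctrl, q_low, q_high)
    return [o + g * mean_delta + f * (d - mean_delta)
            for o, d in zip(ctrl, delta)]


def chained_adjust(obs_first, sim, window, lower=None, upper=None, **kw):
    """Iterate the adjustment over successive windows of `window` months.

    sim covers every period; obs_first covers only the first one.
    Each result is clipped to [lower, upper] before it becomes the
    control series of the next step (e.g. lower=0 for precipitation).
    """
    if len(sim) % window or len(obs_first) != window:
        raise ValueError("series lengths must match the window length")
    periods = [sim[k:k + window] for k in range(0, len(sim), window)]
    control, results = list(obs_first), []
    for sim_ctrl, sim_fut in zip(periods, periods[1:]):
        adjusted = qq_adjust(control, sim_ctrl, sim_fut, **kw)
        if lower is not None:
            adjusted = [max(v, lower) for v in adjusted]
        if upper is not None:
            adjusted = [min(v, upper) for v in adjusted]
        results.append(adjusted)
        control = adjusted  # errors propagate through this hand-over
    return results
```

With monthly data for 1970-2014 and `window=180`, the function would return two lists, corresponding to the 1985-1999 and 2000-2014 windows.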
Potential influencing factors were analyzed and assessed. Two GCMs are both employed to test whether the quality of the original GCM affects the adjustment results. The influence of time window length was analyzed by selecting two different time windows (15-year and 30-year) and comparing the results of the same projection period. To support this, we carried out the analysis using a longer time series of observations at Naqu from 1955-2014, which is divided into four periods using a 15-year window and two periods using a 30-year window. Note that except for this purpose, the data span at Naqu as one of the 36 selected sites starts from 1970, unless otherwise stated. Different interquantile ranges were tested and compared to analyze whether quantile ranges exert a sizeable influence on the results and whether the recommended quantiles by [39] are the optimized choices as well for use on the QTP. Finally, possible statistical explanations contributing to performance differences were investigated through the PDF and CDF curves of each variable. Two univariate downscaling methods commonly found in the QTP studies, the quantile mapping method (Q-M method) [80] and the delta downscaling method [26], serve as comparisons for adjustment to further assess the performance of the Q-Q approach in terms of accuracy and effectiveness.
Two special experiments were designed for precipitation and air temperature. Precipitation is reported to be overestimated in GCM simulations, and its quality is considerably affected by the failure to predict extreme events. To assess the impact of precipitation extremes on the adjustment, we designed a contrasting experiment in which daily precipitation extremes are removed. Specifically, based on the site observations, we removed the daily precipitation observations larger than the yearly 95th percentile, a threshold taken from the indices recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI) [81][82][83], before summing them to monthly values; on the GCM side, we multiplied the monthly simulation values [84] by 0.95 to address the overestimation. The monthly data with daily extreme rainfall removed then served as the control observations, yielding a statistical relationship for the Q-Q approach with which all GCM simulations in the projection periods were adjusted, for comparison with the results obtained without removing daily extremes. Air temperature differs from the other variables in that its domain covers both negative and positive values. By definition, the factor $g$ becomes abnormally large when the mean simulated air temperature in the control period ($\bar{s}_c$) in Equation (5) equals or is very close to zero, causing discrepancies in the adjusted results. To examine this issue, we conducted an extra experiment in which $g$ is fitted by least squares using the data of the two preceding periods, instead of following Equation (5).
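As a rough illustration of the precipitation preprocessing described above, the sketch below drops daily values above each year's 95th percentile before summing to monthly totals, and scales monthly GCM values by 0.95. The function names are hypothetical, and one detail is an assumption: the percentile is taken over all days of the year, whereas the text does not say whether dry days are included.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles


def monthly_totals_without_extremes(daily, q=95):
    """Sum daily precipitation to months after removing yearly extremes.

    daily is a list of (date, amount) pairs. Days whose amount exceeds
    the q-th percentile of their calendar year are left out of the sum.
    Assumption: the percentile is computed over all days in the year.
    """
    by_year = defaultdict(list)
    for day, amount in daily:
        by_year[day.year].append(amount)
    thresholds = {
        year: quantiles(values, n=100, method="inclusive")[q - 1]
        for year, values in by_year.items()
    }
    totals = defaultdict(float)
    for day, amount in daily:
        keep = amount <= thresholds[day.year]
        totals[(day.year, day.month)] += amount if keep else 0.0
    return dict(totals)


def scale_simulation(sim_monthly, factor=0.95):
    """Scale monthly GCM precipitation by a constant factor."""
    return [factor * value for value in sim_monthly]


# Illustrative data only: 20 days in January 2000 with 1..20 mm
example = [(date(2000, 1, d), float(d)) for d in range(1, 21)]
example_totals = monthly_totals_without_extremes(example)
```

The resulting monthly totals would then serve as the control observations in the Q-Q step.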

Quantile mapping (Q-M) method
The Q-M method employs a transformation function to establish the relationship between the CDF of the GCM simulation at the macroscale and the CDF of the observations at the site scale during the control period, which can be described as:

$$F(o) = G\left(F(s_c)\right) \tag{7}$$

where $F(s_c)$ is the CDF of the GCM simulation in the control period and $F(o)$ is the CDF of the observations during the same period. $G(\cdot)$ is the transformation function for the adjustment.
Assuming that this relationship also holds in the projection period, the inverse function can be applied to map the future simulation to the site scale, so that the biases between site projections and GCM simulations are adjusted [84][85][86][87].
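A common empirical form of quantile mapping can be sketched as below: each future simulated value is located within the control-period simulation, and the observed value at the same relative position is returned. This is an assumption-laden illustration, not necessarily the variant used in [80]: `quantile_map` and its helpers are hypothetical names, the CDFs are empirical with linear interpolation, and values outside the control range are clamped to the ends.

```python
from bisect import bisect_right


def cdf_position(sorted_data, value):
    """Fractional index of value within sorted_data, clamped to the ends."""
    if value <= sorted_data[0]:
        return 0.0
    if value >= sorted_data[-1]:
        return float(len(sorted_data) - 1)
    hi = bisect_right(sorted_data, value)  # first index with data > value
    lo = hi - 1
    span = sorted_data[hi] - sorted_data[lo]
    return lo + (value - sorted_data[lo]) / span


def empirical_quantile(sorted_data, prob):
    """Linearly interpolated quantile of sorted data for prob in [0, 1]."""
    pos = prob * (len(sorted_data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


def quantile_map(obs, sim_ctrl, sim_fut):
    """Map sim_fut to site scale using control-period empirical CDFs."""
    if len(obs) < 2 or len(sim_ctrl) < 2:
        raise ValueError("need at least two control values per series")
    obs_sorted, ctrl_sorted = sorted(obs), sorted(sim_ctrl)
    mapped = []
    for value in sim_fut:
        # Non-exceedance probability in the control simulation
        prob = cdf_position(ctrl_sorted, value) / (len(ctrl_sorted) - 1)
        # Observed value with the same probability
        mapped.append(empirical_quantile(obs_sorted, prob))
    return mapped
```

Clamping at the ends means that future values beyond the control range all map to the observed extremes; other extrapolation rules are possible and would change the tails.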

Delta downscaling method
The adjustment term of the delta downscaling method is a fixed value, determined as the discrepancy between the mean observation ($\bar{X}_o$, where $X$ denotes any variable except precipitation) and the mean GCM simulation ($\bar{X}_{s_c}$) during the control period [26,88]:

$$x_f(t) = x_{s_f}(t) + \left(\bar{X}_o - \bar{X}_{s_c}\right) \tag{8}$$

where $x_f(t)$ is the calibration result in the $t$-th month of the projection period and $x_{s_f}(t)$ is the future GCM simulation in the corresponding month. Let $Z$ denote precipitation. Equation (8) is transformed into Equation (9) by using a multiplicative form to restrict unnecessary overestimation:

$$z_f(t) = z_{s_f}(t) \times \frac{\bar{Z}_o}{\bar{Z}_{s_c}} \tag{9}$$

where $z_f(t)$, $z_{s_f}(t)$, $\bar{Z}_o$ and $\bar{Z}_{s_c}$ are the precipitation counterparts of the terms in Equation (8).
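The two forms of the delta method described above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the fixed adjustment term is the difference (or, for precipitation, the ratio) of the control-period means, as the text states.

```python
from statistics import fmean


def delta_additive(obs_ctrl, sim_ctrl, sim_fut):
    """Shift the future simulation by the control-period mean bias.

    Intended for variables other than precipitation.
    """
    offset = fmean(obs_ctrl) - fmean(sim_ctrl)
    return [x + offset for x in sim_fut]


def delta_multiplicative(obs_ctrl, sim_ctrl, sim_fut):
    """Scale the future simulation by the ratio of control-period means.

    Intended for precipitation; a non-negative input stays non-negative.
    """
    ratio = fmean(obs_ctrl) / fmean(sim_ctrl)
    return [z * ratio for z in sim_fut]
```

Unlike the Q-Q approach, neither form changes the shape of the simulated distribution beyond a shift or a rescaling.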

Evaluation Metrics
The performance metrics used in this study include Pearson's correlation coefficient, root mean square error (RMSE), mean absolute error (MAE) and normalized mean absolute error (NMAE).
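As a rough illustration of these four metrics, the sketch below computes them for two equal-length monthly series. The function names are hypothetical. The normalization used for NMAE is an assumption (division by the observed range); the definition in [93] may differ.

```python
from math import sqrt
from statistics import fmean


def pearson_r(obs, pred):
    """Pearson's correlation coefficient between two series."""
    mean_o, mean_p = fmean(obs), fmean(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / sqrt(var_o * var_p)


def rmse(obs, pred):
    """Root mean square error."""
    return sqrt(fmean((p - o) ** 2 for o, p in zip(obs, pred)))


def mae(obs, pred):
    """Mean absolute error."""
    return fmean(abs(p - o) for o, p in zip(obs, pred))


def nmae(obs, pred):
    """MAE divided by the observed range (assumed normalization)."""
    return mae(obs, pred) / (max(obs) - min(obs))
```

Here `obs` plays the role of the validation dataset and `pred` that of the raw GCM simulations or the calibrated results.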
The Pearson's correlation coefficient ($r$) measures the association between two variables [89,90]; in this study it is computed between the calibrated results (or the GCM simulations) and the corresponding observations following Equation (10):

$$r = \frac{\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)\left(\hat{Y}_i - \bar{\hat{Y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(\hat{Y}_i - \bar{\hat{Y}}\right)^2}} \tag{10}$$

where $Y_i$ is the value of the $i$-th month in the validation dataset and $\hat{Y}_i$ is the value in the prediction dataset (the GCM simulations or the calibrated results) in the corresponding month; $\bar{Y}$ and $\bar{\hat{Y}}$ are the respective means. $N$ is the total number of values of a given climatic variable. The range of the correlation coefficient is [−1, 1]. The closer the coefficient is to 0, the less correlated the predictions are with the validation data. A positive coefficient indicates a positive association between the two variables, while a negative one indicates an inverse association. RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{Y}_i - Y_i\right)^2} \tag{11}$$

where the same notations as in Equation (10) are used. The RMSE is highly sensitive to outliers and can be advantageous when assessing model performance [91]. MAE is computed as:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{Y}_i - Y_i\right| \tag{12}$$

The MAE is an explicit and unambiguous measure of model errors and can best represent the intercomparisons of model performances [92]. To enable cross-variable comparisons, the MAEs are normalized as NMAE [93], as in Equation (13).

(Table 1), as well as Pearson's correlation coefficient (r) (Table 2). In Table 1, the average metrics measured against the observations over the 36 reference sites are presented for the raw GCM simulations before the adjustment and for the calibrated results of the first and second adjustments. The raw GCM simulations and the results of the first adjustment correspond to the same 1985-1999 period. Comparing them shows that, for all the variables except precipitation and EC wind speed, RMSE, MAE and NMAE are considerably reduced after the adjustment. Meanwhile, higher r values are measured against the observations for all the variables except air temperature in the calibrated results than in the raw simulations (Table 1). Although the median of r for air temperature is slightly lower (marked by the downward arrow in Table 2) in the calibrated results than in the raw GCM simulations, the minimum values increase greatly from 0.44 (EC) and 0.47 (MPI) in the raw GCM simulations to 0.90 and 0.84, respectively. This means that the adjustment of air temperature works well at all the sites, whereas the raw GCM simulations are not equally good across those sites. These results consistently indicate that the Q-Q adjustment approach can be well applied to most variables on the QTP.
The best calibrated results come from air pressure, as can be seen in Figure 3e,f, where the patterns indicated by both GCMs are roughly the same, with both R-square values close to one. The errors of the raw GCM simulations of air pressure exceed 300 hPa and are reduced to about one hundredth of that by the adjustment (Table 1). The r values increase from negative ones (−0.41 in EC and −0.22 in MPI) to about 0.60. The projections of relative humidity from both GCMs, shown in Figure 3a,b, are relatively well calibrated, with RMSE and MAE/NMAE reduced by two-thirds and the correlation increased from −0.60 to 0.60 for both GCMs. For air temperature, the calibrations based on MPI are more consistent with the observations than those based on EC, with a coefficient of determination (R-square) of 0.9, as shown in Figure 3h,g. Meanwhile, Figure 3g (corresponding to EC) shows that the adjustment clearly overestimates temperatures above −10 °C and underestimates those below it. Table 1 also demonstrates a better adjustment for the MPI temperature, with only half the RMSE and MAE/NMAE values of the EC temperature. As shown in Table 2, the adjustment achieves no obvious improvement in air temperature at most sites, although at some sites with low-quality GCM simulations the improvements become significant. For wind speed, as indicated by Figure 3i,j, the calibrated projections are unsatisfactory, with the R-square values of the fitted lines for both GCMs falling well below 0.5, although the aggregation patterns implied by the densities are normal. Moreover, although wind speed is never negative, a small proportion of the points are adjusted to negative values. The adjustment performs poorly on precipitation, as presented in Figure 3c,d, where only the points with positive values after the adjustment are displayed. After the adjustment, a considerable number of the calibrated precipitation values are wrongly adjusted to below zero and should be post-corrected. However, the adjustment does improve, to a certain degree, the agreement of the two GCM projections with the observations in terms of the correlation coefficient (Table 2). For some variables, such as wind speed and precipitation, the correlation coefficients are not statistically significant, though all negative coefficients become positive after the adjustment. Overall, the best performance of the adjustment comes from air pressure, followed by relative humidity and air temperature. The improvements in wind speed are limited and those in precipitation are relatively poor.

Table 1. Averaged performance of the Q-Q approach for monthly variables from EC-Earth 3 (EC) and MPI-ESM1.2-HR (MPI) GCM simulations over the 36 selected sites on the QTP against the site observations in terms of root mean square error (RMSE), mean absolute error (MAE) and normalized mean absolute error (NMAE). The results for the first and second adjustments refer to the calibrated results in the 1985-1999 and 2000-2014 periods, and the raw simulation refers to the uncalibrated GCM simulations in 1985-1999 as a reference before the adjustment. The second adjustment was made based on the results of the first adjustment. The units of RMSE and MAE are the same as those of the variables, while NMAE is dimensionless. N = 180 for each site.

Table 2. Statistics of Pearson's correlation coefficient (r) at the 36 QTP sites before (Raw) and after (Calibrated) the first adjustment (1985-1999) based on two GCM simulations (EC and MPI). The coefficients are measured against the observations. Medians are given before the parentheses, and minima and maxima within the parentheses. The upward and downward arrows indicate the gain and loss in r from the adjustment compared to the raw simulations. N = 180 for each site.

Figure 3 and the metrics alone cannot provide a comprehensive view of the behavior of the adjustment. Therefore, we looked into the calibrated results at a specific site. Figure 4 shows the time series of the GCM variables before and after the adjustment at the Naqu site in the 2000-2014 period. Note that three adjustments were made at Naqu using 1955-1969 as the initial control observations. The results for the 2000-2014 period correspond to the last iteration of the adjustment and therefore carry the most uncertainty due to error propagation through the three adjustments. The closer the solid lines in Figure 4 are to each other, the better the adjustment performs for that variable. Consistent with the previous findings, the approach performs best at this site for air pressure. The clearly overestimated raw GCM simulations are satisfactorily corrected to the observation level (Figure 4e). Air temperature comes next. The raw GCM simulations match the observations well in trend, albeit with different magnitudes. The adjustments succeed in bringing the peaks of the raw GCM simulations down close to the observations. The results based on EC are subject to underestimation, while those based on MPI closely approach the observations. In the case of relative humidity, the patterns of the two GCMs' original simulations do not agree with the observations. However, after adjustment, the patterns of the calibrated results become much more synchronized with the observations, although amplitude discrepancies remain. While the approach works satisfactorily even in the third adjustment for these three variables at Naqu, it fails to work effectively for precipitation and wind speed. The adjustment to precipitation roughly captures the major peaks and valleys of the observations, but it clearly overestimates the precipitation peaks in summer and wrongly calibrates low precipitation in fall and winter. Relatively poor performance is also seen for wind speed, with large discrepancies throughout the time series, although similar trends are observed.
Figure 5 illustrates the spatial distribution of the NMAE values of different climatic variables before and after the first adjustment (projection period being 1985-1999) at the 36 selected sites on the QTP, together with the 15-year mean of each variable. NMAE is used so that the variables can be compared with one another. Generally, the accuracies of the projections significantly improve after the adjustment, as indicated by the small circle sizes spread across Figure 5. In this figure, the values of NMAE are divided into five classes, and columns 2 and 3 have consistent divisions. According to the circle sizes on the maps, the ranking from the most to least accurate is air pressure, relative humidity/MPI air temperature, wind speed/EC air temperature, and precipitation, which is consistent with the performance ranking revealed in the previous section.
In space, the small circles on the air pressure (Figure 5n,o) and relative humidity (Figure 5k,l) maps are almost uniform in size, meaning the adjustment for the two variables performs equally well everywhere, independent of the GCM simulation used. However, the MPI air temperature map (Figure 5f) differs markedly from the EC air temperature map (Figure 5e) in circle size, although on each map itself the circle sizes are roughly the same. This suggests that little spatial variation exists in the adjustment performance of air temperature, and the resulting accuracies based on MPI are better than those based on EC. For EC air temperatures, the approach tends to generate better results in low-temperature areas on the northern plateau and in high-altitude areas, as can be seen in Figure 5d. Wind speed exhibits relatively good spatial uniformity before the adjustment, as indicated by the vast majority of circles being dark blue in the EC wind speed map (Figure 5h) and light blue in the MPI map (Figure 5i). The circle sizes show some spatial variations, indicative of inhomogeneous adjustment performance in space. Such inhomogeneity is more evident on the EC map (Figure 5h) than on the MPI map (Figure 5i). Errors in the calibrated results are greater on the northern plateau than on the southern plateau, probably because of the existence of westerlies in the northern QTP [94].

Spatial Variations of Performance
As seen in Figure 5b,c showing precipitation, huge errors exist in the raw simulations (indicated by orange and red) as well as the calibrated results (indicated by the large size of circles) throughout all sites. Before adjustment, precipitation NMAE values range from 1.26 to 13.37 and after the adjustment, they are reduced to a range between 0.45 and 0.95. In the EC map (Figure 5b), the largest errors occur only in the Qaidam Basin on the north QTP, while in the MPI map (Figure 5c), the largest ones pervade the majority of the QTP, apart from the southeast portion. Despite huge errors persisting through the adjustment process and widespread throughout the entire region, the performance of the approach on precipitation seems unrelated to the spatial locations. Table 3 shows the RMSE and MAE values averaged over the 36 sites on the QTP before and after the first adjustment (projection period being 1985-1999) stratified by season based on the GCM EC simulations. We also include the seasonal changes of errors based on the GCM MPI in Table A1 (Appendix A). Similar to the whole year case, for relative humidity, air pressure and air temperature, the errors in the raw GCM simulations have been significantly reduced by the adjustment in the four seasons, while for wind speed and precipitation, the errors remain at the same level or even become larger. For relative humidity, the largest errors exist in winter, followed by summer, spring and autumn for both GCMs before adjustment. Although after adjustment the sequence became winter, autumn, spring and summer, differences between errors for different seasons are paltry. For air pressure, the adjustment results are highly consistent with the observations all year, and differences between seasons are minor. For air temperature, the greatest errors are found in winter both before and after adjustment. Spring and summer follow. 
Generally, differences between seasons after adjustment are much less conspicuous than the raw GCM data.

Seasonal Variations of Performance
Both GCMs indicate maximum RMSE and MAE values for precipitation in summer, followed by autumn, spring, and winter. After adjustment, the ordering changes slightly: autumn precedes spring for EC, whereas the order is reversed for MPI, although the maximum and minimum errors remain in summer and winter, respectively. As expected, summer errors far exceed those of the other seasons for both GCMs after the adjustment. The failure of the adjustment in summer may be linked to the occurrence of extreme precipitation events in this season. Despite this, as with relative humidity, the differences in bias between seasons are reduced by the adjustment.
The RMSE and MAE values of wind speed from the two GCMs are significantly larger in winter and spring than those in summer and autumn, when wind speed on the QTP is apparently smaller than the previous two seasons, as can be seen from Table 3. Note that in some cases, such as calibrated winter results from EC, the RMSE and MAE values after adjustment are larger than the raw simulations. It can probably be related to the presence of strong westerly winds on the QTP in winter [94], which might cause more extremes and variabilities in wind events, adversely affecting the performance.
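The seasonal stratification of errors discussed above can be sketched as follows. This is a minimal illustration rather than the study's own code: the function names, the December-February/March-May/June-August/September-November season grouping, and the NMAE normalization (MAE divided by the mean absolute observation) are all assumptions, since this section does not define them.

```python
import numpy as np

# Assumed meteorological seasons; the section does not state the grouping used.
SEASONS = {"winter": (12, 1, 2), "spring": (3, 4, 5),
           "summer": (6, 7, 8), "autumn": (9, 10, 11)}

def error_metrics(obs, est):
    """Return RMSE, MAE and NMAE of est measured against obs."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    err = est - obs
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Assumption: NMAE = MAE / mean(|obs|); other normalizations exist.
    nmae = mae / float(np.mean(np.abs(obs)))
    return {"rmse": rmse, "mae": mae, "nmae": nmae}

def seasonal_metrics(months, obs, est):
    """Stratify monthly errors by season; months holds calendar months 1-12."""
    months = np.asarray(months)
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    out = {}
    for name, members in SEASONS.items():
        mask = np.isin(months, members)  # select the months of this season
        out[name] = error_metrics(obs[mask], est[mask])
    return out
```

For a 15-year monthly series (N = 180 per site), `months` could be built as `np.tile(np.arange(1, 13), 15)`, with `obs` and `est` holding the observed and calibrated values in the same order.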
Overall, the Q-Q approach effectively reduces the errors in the projection periods and is applicable to different variables from the GCM simulations in all seasons. In terms of seasonal variations in performance, the seasonal distinction of errors in the calibrated results is more likely attributable to the raw GCM simulation qualities and extreme events than to the approach itself. Similar findings can be obtained based on the GCM MPI simulations as presented in Table A1.

Figure 6 illustrates the calibrated results for the second adjustment (projection period corresponding to 2000-2014), with the same notations applied as in Figure 3, which shows the results of the first adjustment (1985-1999). Comparing the two figures shows that the same patterns persist after the second adjustment. Generally, the approach demonstrates good performance for multiple successive adjustments. For relative humidity, the clustering characteristics of points become less prominent after the second adjustment. The decreases in R-squared for wind speed are more significant, while for air pressure, the best agreement is still maintained in the results of the second adjustment. For air temperature, the over- and under-estimations of high and low temperatures in GCM EC become more obvious in the results for the second adjustment, whereas an underestimation trend of low temperature appears in GCM MPI. For precipitation from both GCMs, the performance further deteriorates.

As shown in Table 1, the errors for the second adjustment are generally greater than those of the first adjustment in terms of RMSE and MAE, indicating that errors accumulate through the iteration of the adjustments as the process extends into the future, though the accumulation is modest and the overall performance for the second adjustment is acceptable. This is attributed to the dependence of succeeding adjustments on the calibrated results of the preceding adjustment as the control. It might also be partially related to changes in the future climate, leading to an altered distribution of the atmospheric variable of interest.

Length of Adjustment Window
An individual site, Naqu, which has a longer series of observations, is chosen to further investigate the error propagation in a third adjustment. The results are shown in Table 4. Compared with the second adjustment, the errors further increase in the third adjustment, indicating that the results indeed worsen as the adjustments proceed. Note that for some variables, such as wind speed, the errors of the third adjustment decrease instead, probably because accumulated errors are partly compensated by the effects of climate change along the way. Air pressure and humidity are more stable than other variables, allowing the adjustment to rely less on the first control. In contrast, precipitation and air temperature projections clearly deteriorate over the successive adjustments.

Table 4 also presents the calibrated results using a 15-year adjustment time window versus a 30-year window at the Naqu site. The 15-year adjustments are based on the 1955-1969 observation data as the first control, while the 30-year adjustment makes use of the observations from 1955 to 1984 as its control. The projection period of the 30-year adjustment is 1985-2014, which temporally covers the second and third adjustments using a 15-year window.

As indicated in Table 4, the approach is effective regardless of the time window selected, and in most circumstances, the two GCMs demonstrate similar patterns with respect to the variations in error metrics. For air pressure, both projections based on the 15-year and 30-year windows are highly accurate and the differences between the calibrated results are negligible. For precipitation from both GCMs, air temperature and wind speed from MPI, the errors using the 30-year window are noticeably smaller than those from the second and third adjustments using a 15-year window, the third adjustment especially suffering from the accumulated error. For EC temperature, although the second adjustment for 15-year-window adjustments outperforms the 30-year-window adjustment, the performance in the third adjustment becomes worse. Relative humidity is the only variable that exhibits smaller RMSE and MAE values in the consecutive adjustments using the 15-year time window rather than the 30-year window.
Generally speaking, for variables apparently influenced by climate change, including precipitation, air temperature and wind speed, the performance of the approach can be improved by extending the time window, for instance, from 15 years to 30 years. This is possibly because the longer control period contains more information on potential climate change and helps alleviate error propagation and accumulation. Meanwhile, variables that are relatively stable during the study period already show good performance in the 15-year-window adjustments, and using longer adjustment windows may introduce more uncertainty and lead to mismatches between the control and the simulated data.
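The chaining of adjustments over consecutive windows, where each calibrated result serves as the control for the next window, can be sketched as below. This is only an illustration of the bookkeeping: `mean_shift_adjust` is a hypothetical toy stand-in (a plain mean shift), not the Q-Q adjustment itself, and all names are assumptions.

```python
import numpy as np

def mean_shift_adjust(control, sim_control, sim_future):
    """Toy stand-in for one adjustment step (a mean shift, not the Q-Q method)."""
    shift = np.mean(sim_future) - np.mean(sim_control)
    return np.asarray(control, dtype=float) + shift

def successive_adjustments(obs_control, sim_windows, adjust=mean_shift_adjust):
    """Chain adjustments over consecutive windows of equal length.

    sim_windows[0] is the simulation in the control window; every later
    window is adjusted using the previous calibrated result as its control,
    so errors made in one step are carried into the next.
    """
    control = np.asarray(obs_control, dtype=float)
    results = []
    for sim_prev, sim_next in zip(sim_windows[:-1], sim_windows[1:]):
        control = adjust(control, sim_prev, sim_next)
        results.append(control)
    return results
```

With 15-year windows at Naqu, the simulation would be split into 1955-1969, 1970-1984, 1985-1999 and 2000-2014, giving three chained adjustments; with 30-year windows (1955-1984 and 1985-2014), a single adjustment covers the same final period, so fewer steps can propagate error.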

Impacts of Extreme Events
In order to inspect whether extreme climatic events influence the performance of the Q-Q approach, we examined the results of contrasting experiments of precipitation as an example. Due to the influence of the South Asian Summer Monsoon and the East Asian Summer Monsoon, most extreme precipitation events happen in summer [95]. From Table 3, it can be seen that errors in both the raw and adjusted simulations are the biggest in summer. As shown in Figure 4a, both GCMs have the largest discrepancies in the summer months from the observations on the Naqu site, and EC looks worse than MPI in summer precipitation, with larger error metric values as in Table 5. The adjustments imposed on both GCMs are not much help in removing these discrepancies in the summer months. It suggests that the occurrence of extreme events in the summer months can undermine the adjustment results. We compared the errors obtained from the contrasting experiment, i.e., whether daily extreme rainfalls were removed from the monthly data in the control period, with the results shown in Table 5. The results show obvious decreases in RMSE and MAE at all three sites after the removal, which implies that the exclusion of daily extremes from the aggregate monthly data as the control can help build a more robust statistical relationship, which further reduces the errors. Due to the inability of GCMs to predict extreme climatic events, it is likely that the relationship established upon observations and GCM simulations in the control period cannot be preserved in the future period, resulting in considerable errors after the adjustment. Therefore, necessary pre-processing, such as excluding daily extreme values from the aggregate monthly data as the control, can help improve the performance of adjustment. We selected five sites as representative sites for the investigation (Table 6). 
Naqu, Gaize and Geermu represent the sites holding a negative g, a value approaching g = 1 from the left, and a value approaching g = 1 from the right, respectively. Langkazi and Maduo are the sites with the largest (g = 12.56) and smallest (g = −4.39) values in the original form of g. The control period is 1970-1984 and the projection period is 1985-1999. The fitted g values were obtained via the least-squares method using the observations in the projection period and the MPI GCM simulations. It can be clearly observed that in most cases the RMSE values are reduced to a certain degree after using optimal g values. While the differences in g are not significant at Naqu, Gaize and Geermu, the optimized g values greatly differ from the original g values at the two sites with extreme g values (Langkazi and Maduo). At the Langkazi site, the optimal g for adjusting air temperature is found to be 0.66 and the resulting RMSE value is considerably reduced from 7.50 °C to 3.71 °C. This indicates that the original form of g may occasionally fail, although it is generally applicable for most sites. Though a close-to-zero denominator seems to exacerbate the bias, it is not the full answer at the Langkazi and Maduo sites. In theory, this problem is related to the fact that g takes the form of a ratio, whereas air temperature is actually an interval variable. An interval variable is one where the difference between two values is meaningful, while the ratio of the two is not. A possible resolution is to optimize g values using observations in the first projection period, albeit at the cost of including one more period of observations than the original form requires. In Table 6, we also present the results of the Q-M method as a reference for the experiment.

Table 6. Comparison in adjusting air temperature by using fitted values of the g parameter over the original form of g (Equation (5)) in terms of RMSE at five representative sites. The control and projection periods correspond to 1970-1984 and 1985-1999, respectively. The GCM MPI is used. $\bar{s}_c$ is the average air temperature in the control period at the site. Q-M stands for the quantile mapping method.
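The sensitivity of a ratio-form g to the temperature scale, and one way a least-squares g could be obtained, can be sketched as follows. This is a hedged illustration only: Equation (5) is not reproduced in this section, so the code assumes g is the ratio of the control-period means of observation and simulation and that the adjustment has the form p_i = o_i + g·mean(Δ) + f·(Δ_i − mean(Δ)) on rank-ordered series of equal length. The function names and the numbers are hypothetical.

```python
import numpy as np

def g_ratio(obs_control, sim_control):
    """Ratio-form g (assumed here: mean of observations / mean of simulation)."""
    return float(np.mean(obs_control) / np.mean(sim_control))

def fit_g(obs_control, sim_control, sim_proj, obs_proj, f):
    """Least-squares g for p_i = o_i + g*mean(delta) + f*(delta_i - mean(delta)).

    Assumes equal-length series compared rank by rank and a non-zero mean(delta).
    """
    o = np.sort(np.asarray(obs_control, dtype=float))
    sc = np.sort(np.asarray(sim_control, dtype=float))
    sf = np.sort(np.asarray(sim_proj, dtype=float))
    target = np.sort(np.asarray(obs_proj, dtype=float))
    delta = sf - sc
    d_mean = delta.mean()
    residual = target - o - f * (delta - d_mean)
    # Minimizing sum((residual - g*d_mean)**2) over g gives this closed form.
    return float(residual.mean() / d_mean)

# Hypothetical control-period temperatures near 0 degC, for illustration only.
obs_c = np.array([-3.0, -1.0, 1.0])   # mean -1.0 degC
sim_c = np.array([-1.9, 0.1, 2.1])    # mean  0.1 degC
g_celsius = g_ratio(obs_c, sim_c)                   # about -10
g_kelvin = g_ratio(obs_c + 273.15, sim_c + 273.15)  # about 0.996
```

The same pair of series yields a g of about −10 in degrees Celsius but about 0.996 in kelvin, which illustrates why a ratio is problematic for an interval variable whose zero point is arbitrary.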

Choice of Quantile Range
Different quantile ranges were tested to assess the impacts of the quantile range and to determine an optimal pair of quantiles for the sites on the QTP. The changes in NMAE for all the variables due to applying different quantile pairs based on the two GCMs are presented in Figure 7. We also calculated the changes in RMSE and MAE and included them in Appendix A (Tables A2-A4). The 25th and 75th quantile pair (abbreviated as (25,75)) is recommended in [39] for all variables except precipitation, for which the 10th and 90th quantiles (abbreviated as (10,90)) are suggested. Quantile pairs (5,95) and (15,85) for precipitation and (20,80) and (30,70) for the other four variables were tested, and the results at three sites situated far from each other (Figure 1) are presented as examples. The sites include Naqu in the center of the QTP, Wushaoling in the north-eastern low-land mountain area, and the Shiquan River, which lies in the high-land river valley in the western part of the QTP. Overall, the differences in NMAE caused by the various choices of quantile pairs are relatively small (Figure 7). For both GCM simulations, no alternative pair is apparently superior to the recommended quantile pair (25,75) for relative humidity, air pressure, air temperature and wind speed. Some differences in NMAE arise for precipitation, but the relative changes are still small. The choice of (15,85) looks slightly better than (10,90) for adjusting precipitation, as for both GCMs the values of NMAE (Figure 7), as well as RMSE and MAE (Tables A2-A4), slightly decline at all three sites. In contrast, the choice of (5,95) is likely to result in performance that varies between sites and with the GCM data used. However, compared to the huge discrepancies in the precipitation projection inherent in the GCM simulations, the gains by using (5,95) in place of (10,90) are rather marginal.
Therefore, the recommended pair of 25%-75% works well for adjusting relative humidity, air pressure, air temperature and wind speed across all sites on the QTP. The quantile division 10%-90% for precipitation is also acceptable, although in our test, (15,85) works slightly better than (10,90) using either GCM data on the study sites.

GCM Quality
All our analyses are conducted based on both EC and MPI simulations. As seen from Table 1, both GCM simulations are greatly biased in projecting precipitation (NMAE exceeding 1), followed by air temperature (NMAE close to 1), and are better in the rest variables (about 0.5 in NMAE). The differences between EC and MPI in terms of the error metrics are not obvious. However, MPI looks better in projecting air temperature, as illustrated in Figure 3g,h, where points in MPI are more concentrated around the 1:1 line. It is clear that the accuracies of the calibrated results are highly controlled by the quality of the raw GCM simulations. For precipitation, because of huge biases in the raw GCM simulations, the adjustment seems in no way to eliminate those errors and thus leaves considerable errors in the calibrated results. In the analysis of seasonal differentiations in Table 3, it is clear that the performance of the approach in itself does not display many differences between the four seasons. The main cause of seasonal variances in RMSE and MAE values is the large differences in the raw GCM qualities. As revealed through the spatial analysis, the prevalent distributions of the largest circles in orange and red in Figure 5b,c and small circles in dark blue in Figure 5n,o further confirm the decisive influences of the quality of GCM simulations on the adjustment accuracies. Those results indicate that the more accurate the raw GCM simulation is, the better the adjustment results will be.

Dependence on the Variable Distribution
The Q-Q adjustment approach uses the inter-quantile range to replace standard deviations (Equation (6)), and applies different quantile ranges for precipitation and other variables. Therefore, we investigated the changes in PDF and CDF curves during the successive adjustment process using the Naqu observations and MPI simulations as an example. The results are displayed in Figure 8 with a good case of adjusting air pressure and in Figure 9 with a bad case of adjusting precipitation. In addition, the changes in CDF for all the variables concerned are provided in Figure 10. The PDFs for the remaining variables, air temperature (Figure A1), relative humidity (Figure A2), and wind speed (Figure A3), can be found in Appendix A.

As seen from the figures, for variables showing good performance using the approach, distribution patterns of GCM simulations are generally in good agreement with the control observations. For example, the PDF patterns of air pressure shown in the bottom row of Figure 8 (MPI simulations) and Figure 8a (control observations) are largely similar.
For variables with relatively bad performances, obvious differences exist between the control observation patterns and simulation distributions. For instance, in Figure 9, the control observations of precipitation show that the number of zero rainfalls far exceeds other values, and a small peak exists at approximately 60 mm. In GCM simulations of precipitation shown in the bottom row of Figure 9, however, the peak is around 30 mm and the proportion of zeros is considerably smaller. A similar mismatch can be observed in the case of wind speed ( Figure A3), which also has a relatively bad adjustment performance using this approach.
In addition, it is important for the representativeness of the PDF in the control period to the future PDFs. In the case of relative humidity ( Figure A2), the match in PDF between the control and the GCM simulations is good, but according to the validation data (the middle row, Figure A2), the control PDF pattern cannot well represent the patterns in the future periods. In other words, relative humidity is subject to long-term climate change, which alters its PDF pattern in each subsequent period. Because all the future adjustments depend on the statistic characteristics of the first control observations, the performance becomes worse when the future does not hold the same statistic characteristics. Therefore, variables subject to obvious climate changes suffer such penalties when using this approach. In the case of air temperature ( Figure A1), it looks like a result of the combined effects of the two aspects.
From the CDFs in Figure 10, it can be observed that for variables showing good performance, such as relative humidity and air pressure, the control CDFs (dashed green line) and the GCM simulations (dashed purple line and solid red line) are basically parallel to each other. For variables with inferior performance, such as precipitation and wind speed, the curves differ in shape and cross each other, indicating that the distributions of the control observations and the GCM simulations differ. For air temperature, the curves are parallel at high temperatures but intersect in the low-temperature range. This also explains the findings in Figure 3 and Table 3 that the approach shows greater errors when applied in low-temperature cases (in winter).
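One way to make the "parallel CDFs" observation concrete is to compare matching quantiles of two samples: when the curves are parallel, the quantile offsets are nearly constant. The sketch below is illustrative only; the helper name and the choice of 19 evenly spaced quantile levels are assumptions and are not taken from the study.

```python
import numpy as np

def quantile_offset_spread(control_obs, simulation, levels=None):
    """Return (mean offset, spread of offsets) between matching quantiles.

    A spread near zero means the two empirical CDFs are roughly parallel,
    i.e. the samples differ mainly by a constant shift.
    """
    if levels is None:
        levels = np.linspace(0.05, 0.95, 19)  # assumed quantile levels
    q_obs = np.quantile(np.asarray(control_obs, dtype=float), levels)
    q_sim = np.quantile(np.asarray(simulation, dtype=float), levels)
    offsets = q_sim - q_obs
    return float(offsets.mean()), float(offsets.std())
```

A large spread relative to the mean offset would correspond to curves that differ in shape or cross, the situation described above for precipitation and wind speed.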

Comparisons with Peer Methods
The calibrated results at the 36 QTP sites using the Q-M method and the delta method are displayed in Figures 11 and 12 as scatter points. Table 7 shows the RMSEs and MAEs of the results averaged over the entire plateau for the first adjustment (1970-1984 as the control period and 1985-1999 as the projection period) separately using the three methods, with errors after the second adjustment listed in Table A5 (Appendix A). By comparing the two figures with Figure 3 generated by the Q-Q approach, as well as the corresponding columns in Table 7, it can be seen that the Q-M method (Figure 11a,b) and the delta method (Figure 12a,b) are ineffective for relative humidity on the QTP, whereas the Q-Q approach demonstrates relatively good performance on this variable. For precipitation (Figures 11c,d and 12c,d) and wind speed (Figures 11i,j and 12i,j), the Q-Q approach also outperforms the two peer methods, though its overall performance is not yet satisfactory. In terms of air pressure, the Q-Q approach has a narrower distribution of points along the 1:1 line than the two peers, implying a better performance, which is also evidenced by smaller values of RMSE and MAE than the two peers (Figures 11e,f and 12e,f, Table 7), although all the methods perform well.

The exception is air temperature. The Q-M method (Figure 11g,h) is superior in adjusting this variable over the Q-Q approach (Figure 3g,h) and the delta method (Figure 12g,h) for either GCM. This superiority is especially outstanding in adjusting the EC projections (Figure 11g), as the Q-M method provides better calibration of temperature with no apparent over- or under-estimations. However, the original form of the factor g in Equation (5) may seriously influence the performance of the Q-Q approach in adjusting air temperature. As presented in Table 6, by optimizing g values, the resulting RMSEs of the Q-Q approach at the five exemplified sites turn out to be comparable to or even better than those of the Q-M method. Therefore, caution is required when using the original form of g for any interval variable, and further research is needed. The calibrated results of the two GCM simulations by the delta method (Figure 12g,h) seem to be the worst of the three, since the trend lines in Figure 12g,h deviate more severely from the 1:1 line, though their R-squared values are quite comparable and both performances are acceptable.

All the methods suffer from the propagation of errors (the first adjustment in Table 7 and the second adjustment in Table A5) as the adjustments proceed over multiple periods, with decline rates in accuracy that do not vary much from each other. This indicates that the dependence on the initial observations of the successive adjustments of those statistical methods is roughly the same. Overall, the Q-Q approach is advantageous in the majority of aspects, whereas in other cases its performance is also acceptable.
These results suggest that the Q-Q adjustment approach has a more stable performance when handling variables of different distribution patterns. For most variables on the QTP, it shows stronger adaptability than the two peer methods.
The Q-M method depends on the exact matching of CDFs and quantiles to provide adjustments without considering long-term change trends of the variable, which is, however, very important in the context of climate change, while the delta method merely involves the mean values of the variables, which might increase the method's susceptibility to extreme events and varied statistical distributions. The unsatisfying adaptability and vulnerability of the Q-M method are attributable to the strong dependence of the Q-M result on the exact relationship of the CDFs of observation and simulation in the control period; such an overfitting inclination undermines the extrapolation ability of the method. The Q-Q approach combines both features considering the long-term trends and the CDF characteristics of the variables in the simulation datasets, which allows the adjustment to not only preserve the general tendencies and patterns of the climatic variables, but also to avoid any overfitting problems that undermine the extrapolation capability of the method, ensuring its effectiveness for variables with different types of distributions. To do so, the Q-Q approach employs the quantile range ($\mathrm{IQR}|_{O}$ and $\mathrm{IQR}|_{s_c}$ for $f$) instead of the exact CDF transfer function to improve the extrapolation ability and considers both internal average ($\bar{\Delta}$, $\bar{s}_c$ and $\bar{O}$ for $g$) and differential ($\Delta'_i$) characteristics in an attempt to deal with more extremes, variations and fluctuations.
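The contrast between the three methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the Q-Q step is written as p_i = o_i + g·mean(Δ) + f·Δ'_i with g taken as the ratio of control-period means and f as the ratio of inter-quantile ranges, series are assumed to have equal length and are paired rank by rank, and all function names are hypothetical.

```python
import numpy as np

def qq_adjust(obs_control, sim_control, sim_future, q_lo=25, q_hi=75):
    """Sketch of a Q-Q adjustment: p_i = o_i + g*mean(delta) + f*delta'_i.

    Assumptions: equal-length series, rank-by-rank pairing,
    g = mean(o) / mean(s_c), f = IQR(o) / IQR(s_c) over (q_lo, q_hi).
    """
    o = np.sort(np.asarray(obs_control, dtype=float))
    sc = np.sort(np.asarray(sim_control, dtype=float))
    sf_raw = np.asarray(sim_future, dtype=float)
    order = np.argsort(sf_raw)        # time index of each ranked future value
    delta = sf_raw[order] - sc        # rank-wise simulated change
    d_mean = delta.mean()
    g = o.mean() / sc.mean()
    f = ((np.percentile(o, q_hi) - np.percentile(o, q_lo))
         / (np.percentile(sc, q_hi) - np.percentile(sc, q_lo)))
    p_sorted = o + g * d_mean + f * (delta - d_mean)
    p = np.empty_like(p_sorted)
    p[order] = p_sorted               # restore the simulated time order
    return p

def delta_adjust(obs_control, sim_control, sim_future):
    """Delta method: shift observations by the simulated change in the mean."""
    shift = np.mean(sim_future) - np.mean(sim_control)
    return np.asarray(obs_control, dtype=float) + shift

def quantile_map(obs_control, sim_control, sim_future):
    """Empirical quantile mapping; np.interp clamps outside the control range."""
    return np.interp(sim_future, np.sort(sim_control), np.sort(obs_control))
```

In this sketch, future simulated values beyond the control-period range are clamped by `quantile_map` to the largest or smallest control observation, whereas `qq_adjust` carries the simulated change through the Δ terms; this mirrors the extrapolation argument made above.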

Conclusions
This study has provided a comprehensive evaluation of the quantile-quantile (Q-Q) adjustment approach proposed by Amengual, A., et al. [39] for adjusting monthly general circulation model (GCM) simulations to the site scale at 36 sites across the Qinghai-Tibetan Plateau (QTP). The analyses of performance on different atmospheric variables in space and seasonality, as well as various influencing factors such as GCM quality, choice of quantile ranges, and length of adjustment window, are included. Two commonly used downscaling methods are selected as peer methods to compare with this approach. The conclusions drawn from this study are as follows: 1.
The approach has proven applicable for a wide range of monthly meteorological variables based on two GCMs (EC-Earth3 and MPI-ESM1.2-HR). It shows the best performance on air pressure, followed by air temperature and relative humidity. The performance is limited in adjusting wind speed and precipitation. When working with air temperature, the factor g as a ratio of two temperatures is likely to cause problems in some cases and thus limits the performance. However, the Q-Q approach outperforms the quantile-mapping method and the delta downscaling method, with broader applicability and higher accuracies in most of the major variables.

2. A comparison of the performance of the Q-Q approach across the 36 sites with different quantile ranges shows that the choice of quantile range has a relatively small influence on the calibrated results for all the variables considered. The quantile pair 25-75% works well for air pressure, air temperature, relative humidity, and wind speed, while 10-90% works well for precipitation across all sites. Other quantile pairs bring no significant gains over the recommended ones. No significant spatial or seasonal variations in performance were found.

3. Because each adjustment uses the previously adjusted projection as its control, errors accumulate along the series of adjustments, and the accuracy of the results generally declines as the adjustment process extends into the future and as the impacts of climate change grow in the projection periods. The length of the adjustment time window also influences the performance of the approach. A wider window reduces the number of iterations and includes more climate change information, and it is especially helpful when projecting variables with changing trends.

4. The accuracy of the calibrated results is strongly influenced by the quality of the raw GCM simulations. The Q-Q adjustment works more effectively for atmospheric variables with milder fluctuations and fewer extremes, since GCMs are generally limited in simulating future extremes and therefore cannot provide precise probability density function (PDF) and cumulative distribution function (CDF) curves, which reduces the performance. Our test shows that the weakened performance is closely related to mismatches between the PDFs/CDFs of the GCM simulations and those of the observations.
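The PDF/CDF mismatch noted in the last conclusion can be quantified in several ways. The sketch below uses the maximum vertical distance between two empirical CDFs (the two-sample Kolmogorov-Smirnov distance). This statistic is not named in the study and is shown only as one possible illustration; the function name `ecdf_distance` is hypothetical.

```python
import numpy as np


def ecdf_distance(sim, obs):
    """Maximum absolute difference between two empirical CDFs.

    ASSUMPTION: this two-sample Kolmogorov-Smirnov distance is an
    illustrative choice, not the mismatch measure used in the study.
    Returns a value in [0, 1]; 0 means identical empirical CDFs.
    """
    a = np.sort(np.asarray(sim, dtype=float))
    b = np.sort(np.asarray(obs, dtype=float))
    # Evaluate both step functions at every observed value.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


if __name__ == "__main__":
    print(ecdf_distance([1, 2, 3], [1, 2, 3]))     # identical samples
    print(ecdf_distance([1, 2, 3], [10, 11, 12]))  # non-overlapping samples
```

A larger distance between the raw control-period simulation and the observations would, on this reading, indicate a variable for which the adjustment is likely to be less effective.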

In general, the Q-Q adjustment approach is capable of providing site-scale GCM downscaling results and is generally advantageous compared with other univariate statistical adjustment methods. The main errors stem from mismatches in statistical characteristics between the raw GCM simulations and the observations, which is a drawback common to all statistical adjustment methods. In addition, as a univariate statistical downscaling approach, the Q-Q approach cannot account for correlations between variables and may therefore introduce physical inconsistencies. There is room for further improvement, such as handling extreme events (especially for precipitation), optimizing the g factor for air temperature, and minimizing the influence of the quality of the original GCM simulations; multivariate adjustment approaches are also worth exploring in future studies. Despite these limitations, the approach is recommended overall for QTP studies that require projections of monthly meteorological variables at individual sites.

Appendix A

Table A1. Performance of this adjustment approach averaged over the 36 QTP sites, stratified by season, in terms of RMSE and MAE values before (raw) and after (calibrated) the first adjustment based on the GCM MPI data. The metrics are measured against the site observations. Same as Table 3

Table A2. RMSE and MAE values measured at Naqu between the calibrated results and the monthly observations in the 1985-1999 period using different pairs of quantiles: (25,75), (20,80), (30,70), except for precipitation where (10,90), (15,85) and (

Table A3. RMSE and MAE values measured at Wushaoling between the calibrated results and the monthly observations in the 1985-1999 period using different pairs of quantiles: (25,75), (20,80), (30,70), except for precipitation where (10,90), (15,85) and (5,95) were compared. Same as

Table A4. RMSE and MAE values measured at the Shiquan River site between the calibrated results and the monthly observations in the 1985-1999 period using different pairs of quantiles: (25,75), (20,80), (30,70), except for precipitation where (10,90), (15,85) and (5,95) were compared. Same as
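
For reference, the two error metrics reported in the appendix tables follow their standard definitions, sketched below in Python. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
import numpy as np


def rmse(calibrated, observed):
    """Root-mean-square error between two equally long series."""
    err = np.asarray(calibrated, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(calibrated, observed):
    """Mean absolute error between two equally long series."""
    err = np.asarray(calibrated, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(err)))


if __name__ == "__main__":
    # Made-up monthly values, for illustration only.
    calibrated = [1.0, 2.0, 3.0]
    observed = [1.0, 2.0, 5.0]
    print(rmse(calibrated, observed), mae(calibrated, observed))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives some indication of whether the errors are dominated by a few months with large deviations.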