Site-Adaptation of Modeled Solar Radiation Data: The SiteAdapt Procedure

The adaptation of modeled solar radiation data with coincident ground measurements has become a standard practice of the industry, typically requested by financial institutions in the detailed solar resource assessments of solar projects. This practice mitigates the risk of solar projects, enhancing the adequate solar plant design and reducing the uncertainty of its yield estimates. This work presents a procedure for improving the accuracy of modeled solar irradiance series through site-adaptation with coincident ground-based measurements relying on the use of a regression preprocessing followed by an empirical quantile mapping (eQM) correction. It was tested at nine sites in a wide range of latitudes and climates, resulting in significant improvements of statistical indicators of dispersion, distribution similarity and overall performance: relative bias is reduced on average from −1.8% and −2.3% to 0.1% and 0.3% for GHI and DNI, respectively; relative root mean square deviation is reduced on average from 17.9% and 34.9% to 14.6% and 29.8% for GHI and DNI, respectively; the distribution similarity is also improved after the site-adaptation (KSI is 3.5 and 3.9 times lower for GHI and DNI at hourly scale, respectively). The methodology is freely available as supplementary material and downloadable as R-package from SiteAdapt.


Introduction
The bankability of a solar power project requires the best possible information about the quality and reliability of the solar resource for both technical (planning, dimensioning and designing stages) and financial aspects of the project. The interaction of the Earth's atmosphere with solar radiation result in three fundamental components of solar radiation at surface level: Diffuse horizontal solar irradiance (DHI) is the terrestrial irradiance received by a horizontal surface which has been scattered or diffused by the atmosphere. Conversely, the solar radiation coming from a small solid angle of the sky, centered on the position of the sun's disk itself is the direct normal solar irradiance (DNI), which is of particular interest for concentrating solar power (CSP) and concentrating photovoltaic (CPV) technologies. The sum of the direct and diffuse components which reaches the Earth's surface both on a horizontal (global horizontal solar irradiance, GHI) and tilted (global tilted solar irradiance, GTI) planes is of major interest for non-concentrating photovoltaics (PV). Due to the significant short-term and long-term variability in solar irradiance [1][2][3], the solar resource assessment should consider solar irradiance time series-and not just mean annual averages-to understand the solar dynamics over intraday, daily and seasonal scales [4][5][6] and the interannual variability which defines the plant performance in different probability of exceedance scenarios [7][8][9][10]. These scenarios, as well as their corresponding uncertainties, are modeled to evaluate a project's ability to return the investment for these projects [10,11].
Despite the fact that the ground network of radiometric measurements is continuously growing, it is still incapable to represent in detail the spatial variability of the solar radiation that reaches the Earth's surface and, consequently, long-term (multi-decadal) solar irradiance ground measurements at the site selected for a solar plant are usually not available. For this reason, solar irradiance modeled time series are typically used for solar resource assessments that have been derived mainly through numeric weather prediction (NWP) models and satellite-based models. In NWP models, solar radiation has not traditionally been treated as a priority as it has been mainly used to force the average Earth's surface energy balance. This situation, influenced by the traditional lack of major stakeholders, is changing in recent years in which substantial investments are now being made in this field due to the increasing demand for operational and improved solar forecasts. Notwithstanding, NWP estimates are not as accurate as those derived from satellite-based models in general, which have evolved during the last 30 years (from simple methods based on a crude atmospheric energy balance [12,13] to current comprehensive models [14]). In fact, currently, satellite-based models is the most common approach carried out in solar resource mapping and solar irradiance time series generation [15]. Notwithstanding, significant differences between satellite-based modeled solar irradiance and ground measured data may result, as shown by various comparative and assessment works, especially in DNI [16,17]. In this regard, there are three important potential sources of error associated with the satellite-based solar irradiance modeling: • the treatment of the effect of clouds in solar radiation attenuation; • the irradiance modeling under clear sky conditions; • the area integrated by the satellite pixel leads to inaccuracies mainly during intermittent cloudy weather and changing aerosol load, which is the so-called "nugget effect" [18]; • terrain effects and the high reflective albedo of deserts and snow The effects of these potential error sources translate into bias and disagreement of frequency distribution functions, which introduces uncertainty in the solar resource assessment.
To overcome the inaccuracies introduced by these error sources, a common solution is to post-process the modeled data with coincident and quality assured ground measurements, in order to understand the source of discrepancy and subsequently to improve the accuracy of the resulting time series. In particular, for reliable solar resource assessments at a specific location, the accuracy of satellite-based solar irradiance series is typically enhanced by a short-term ground measurement campaign at the site [17,19,20]. In order to achieve an acceptable small risk level in solar projects, financial institutions typically require the disposal of a coincident period of (at least) one year between solar irradiance measurements and models for the application of this procedure. This procedure has received different names elsewhere [21]: site adaptation, local adaptation, calibration, data fusion, dataset merging, measured record extension, Measure-Correlate-Predict (MCP, which is typically used in wind resource assessments). All these approaches use ground measured and modeled observations during overlapping time periods to decrease the uncertainty in the local solar resource improving the satellite-derived irradiance data (by lowering their random errors and most importantly, reducing their bias). Some of them rely on the adaptation of modeled GHI and DNI values, whether by means of simple correction of bias [4] or focusing on the empirical cumulative distribution function (ECDF) [22,23], while other are based on the adaptation of the parameters or inputs to the solar model [24]. Notwithstanding, there is no standard procedure that can be universally applied to improve the accuracy of any solar irradiance modeled data against a coincident short-term measurement campaign at a given site. In this respect, it is necessary to develop universal procedures that make use of the specific techniques for Remote Sens. 2020, 12, 2127 3 of 17 each type of data and that have no need to modify the parameters of the radiation model. Thus, it should be applicable to any database (i.e., satellite or NWP based, among others).
This work was developed in the framework of Task 16 (solar resource for high penetration and large scale applications) of the photovoltaic power systems program (PVPS) of the International Energy Agency (IEA) and Task V IEA-SolarPACES entitled "Solar Resource for High Penetration and Large Scale Applications" [25], where a benchmarking and review of techniques for site-adaptation of modeled solar irradiance series was carried out [26]. The knowledge gained from this benchmarking was oriented to develop a tailor-made site-adaptation procedure that is universally applicable. In this scenario, this article presents a new correction for reducing bias and other statistical indicators in modeled solar irradiance series with coincident and quality assured surface observations. Unlike many studies available in the literature, the novelty of this work stems from (1) the universal applicability of the methodology as variables with greater impact as well as their interactions are analyzed at the site under study, (2) the sequential application of two complementary procedures to provide better performance according to indicators of different nature (such as bias, distribution or dispersion), (3) the accessibility of the methodology, that is freely available as supplementary material and downloadable as R-package from https://github.com/cfernandezcener/SiteAdapt to facilitate its general use in solar resource assessments.

Solar Radiation Data
For the preparation of this work, an extensive database containing quality assured (according to BSRN standards [27]) solar irradiance measurements from different networks and coincident modeled series from different providers was used. Solar measured data were hourly averaged to be coincident with the corresponding modeled series. Only daytime observations between sunrise and sunset are taken into account. In particular, time intervals (e.g., hour or 15 min periods) where the average solar elevation above the horizon (α) is above 0 • are included in the analysis. Table 1 summarizes the main characteristics of the sites selected, which are shown in the world map along with their climatic zones in Figure 1. and accurate solar radiation data. This network covers the States of Idaho, Montana, Oregon, Utah, Washington and Wyoming, with 39 stations. The data are recorded with an hourly interval before 1995 and of 5 min afterward, and the associated estimated uncertainties for daily irradiances are 2% for DNI and 5% for GHI. The radiometers are calibrated on a yearly basis with periodic on-site checks with reference radiometers. The quality of these recorded data was validated at their original time resolution (1-min for BSRN and 5-min for UO SRML), both visually and by standardized automatic procedures recommended by the BSRN [27].

Modeled Data
The satellite-based modeled solar irradiance data used in this work were extracted from the following providers: • The NSRDB (national solar radiation data base), that provides solar irradiance at a 4-km horizontal resolution for each 30-min interval from 1998 to 2016 over the United States and regions of the surrounding countries [28]. It was computed by the National Renewable Energy Laboratory's (NREL's) physical solar model (PSM) and products from the National Oceanic and Atmospheric Administration's (NOAA's) geostationary operational environmental satellite (GOES), the National Ice Center's (NIC's) interactive multisensor Measured and modeled databases used in this work are detailed in the following subsections.

Ground Measured Data
The selected stations belong to the following radiometric networks: • Baseline surface radiation network (BSRN). The BSRN is a project managed under by the World Climate Research Program (WCRP), and imposes high quality standards, both in instrumentation and maintenance. Global horizontal solar irradiation (GHI) and direct normal solar irradiation (DNI) data were used in this work. • UO SRML: The activity of the solar radiation monitoring laboratory of the University of Oregon started in 1977, with the creation of a five-station global network under the auspices of the Pacific northwest regional commission, motivated by the lack of available and accurate solar radiation data. This network covers the States of Idaho, Montana, Oregon, Utah, Washington and Wyoming, with 39 stations. The data are recorded with an hourly interval before 1995 and of 5 min afterward, and the associated estimated uncertainties for daily irradiances are 2% for DNI and 5% for GHI. The radiometers are calibrated on a yearly basis with periodic on-site checks with reference radiometers.
The quality of these recorded data was validated at their original time resolution (1-min for BSRN and 5-min for UO SRML), both visually and by standardized automatic procedures recommended by the BSRN [27].

Modeled Data
The satellite-based modeled solar irradiance data used in this work were extracted from the following providers: • The NSRDB (national solar radiation data base), that provides solar irradiance at a 4-km horizontal resolution for each 30-min interval from 1998 to 2016 over the United States and regions of the surrounding countries [28]. It was computed by the National Renewable Energy Laboratory's (NREL's) physical solar model (PSM) and products from the National Oceanic and Atmospheric Administration's (NOAA's) geostationary operational environmental satellite (GOES), the National Ice Center's (NIC's) interactive multisensor snow and ice-mapping system Remote Sens. 2020, 12, 2127 5 of 17 (IMS) and the National Aeronautics and Space Administration's (NASA's) moderate resolution imaging spectroradiometer (MODIS) and modern era retrospective analysis for research and applications, version 2 (MERRA-2).

•
The CAMS-RAD database, provided by the atmosphere service of Copernicus, the European program for the establishment of a European capacity for earth observation with respect to land, marine and atmosphere monitoring, emergency management, security and climate change [29]. This database was calculated by means of the Heliosat-4 method [15], a fully physical model using a fast, but still accurate approximation of radiative transfer modeling. It is composed of two models based on abaci, also called lookup tables: the McClear [30] model calculating the irradiance under cloud-free conditions and the McCloud model calculating the extinction of irradiance due to clouds. Both were constructed by using the LibRadtran [31] radiative transfer model. The main inputs to Heliosat-4 are aerosol properties, total column water vapor and ozone content as provided by the CAMS global services every 3 h. Cloud properties are derived from images of the Meteosat Second Generation (MSG) satellites in their 15 min temporal resolution using an adapted APOLLO (AVHRR processing scheme over clouds, land and ocean) scheme.

The SiteAdapt Procedure
The strategy of dividing data into types of days reduces the dispersion between adapted and recorded data as opposed to the division into clear and cloudy moments, which improves the similarity between the adapted and measured distributions [26]. To reduce both the dispersion and improve the distribution of adapted data, the procedure designed relies on the sequential application of two complementary procedures of a different nature, which are applied separately to clear and non-clear data: a preprocessing consisting on a site-specific regression (Section 2.2.1) followed by a bias correction based on empirical quantile mapping (eQM) (Section 2.2.2). A regression-based preprocessing reduces both the dispersion and bias between measured and modeled solar data, favoring a more efficient adaptation. It is important to apply the procedures in this order because otherwise, the results (especially in DNI) are not good. Finally, a postprocessing is carried out to avoid inconsistencies in adapted data (Section 2.2.3).

Preprocessing
A multi-linear regression is carried out on solar modeled data using the variables that have been shown to have the greatest influence. To distinguish if the added values of the variables are independent from each other, and to select the best set of variables and their interactions at each site, this step is applied by means of the R-package 'glmulti' version 1.0.7.1 [32], which allows computing, comparing, and ranking an exhaustive list of generalized linear models (glm) with the smallest Akaike information criterion (AIC). This preprocessing is applied separately to solar data under clear and cloudy skies (which are defined by modeled data in comparison with a clear sky model [33]) and it is based on multiple linear regressions using the following variables for the different solar components: • GHI preprocessing. It relies on the regression of the measured clearness index (kt, the ratio of GHI to top-of-atmosphere solar irradiance on the same plane) on the best combination of the following variables: α; kt of modeled series; relative air mass (m, based on Kasten formulation [34]); modeled clear-sky index (kc, the ratio between modeled GHI and its corresponding value under clear-sky conditions, calculated through the McClear model [30]). • DHI preprocessing: it relies on the regression of the ratio DHI to GHI (kd, diffuse ratio) of measured data on the following variables: m; Kc; α; kt m designed to diminish the solar zenith angle dependence of kt by normalizing it with respect to a standard clear-sky GHI profile, normalized to 1 for a relative air mass of 1 [35] Equation (1): Remote Sens. 2020, 12, 2127 6 of 17 • DNI preprocessing: it is calculated from both preprocessed GHI and DHI by the closure Equation (2): where θ is the solar zenith angle expressed in radians. It was found that using a fourth order polynomial of kt m provides better results than simply using kt m .
The outputs of this preprocessing are used as inputs for the next step (Section 2.2.2).

Empirical Quantile Mapping (eQM) Correction
A correction based on eQM is applied on solar preprocessed data (from Section 2.2.1 procedure) to modify their distribution so that it matches the distribution of the measured coincident dataset [36,37]. The correction for quantile 50 is schematized in Figure 2, where the modeled ECDF (shown in blue) must match the measured ECDF (red) by adding the difference between them (in purple).
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 18 • DNI preprocessing: it is calculated from both preprocessed GHI and DHI by the closure Equation (2): Where θ is the solar zenith angle expressed in radians. It was found that using a fourth order polynomial of ktm provides better results than simply using ktm.
The outputs of this preprocessing are used as inputs for the next step (2.2.2).

Empirical Quantile Mapping (eQM) Correction
A correction based on eQM is applied on solar preprocessed data (from 2.2.1 procedure) to modify their distribution so that it matches the distribution of the measured coincident dataset [36,37]. The correction for quantile 50 is schematized in Figure 2, where the modeled ECDF (shown in blue) must match the measured ECDF (red) by adding the difference between them (in purple). This procedure transforms the modeled data, χm, to a probability domain and subsequently applies the inverse transformation using the cumulative distribution function (CDF) of the observational data to obtain the corrected data, χadapted [38] Equation (3): Where CDFmeas and CDFmodel are the cumulative distribution functions of the observed and modeled data, respectively. This procedure is carried out employing the R-package 'hyfo' (hydrology and climate forecasting) version 1.4.0 [39] and it is applied separately to solar data under clear and cloudy skies, which are defined by modeled time series as in Section 2.2.1.

Postprocessing
The application of the previous procedures (2.2.1 and 2.2.2) to raw modeled database may result in some erroneous values, which are discarded from the adapted datasets and replaced with the raw modeled values (model is assumed to be consistent). These erroneous values were defined as follows: • Inconsistencies between GHI, DNI and DHI, detected from the ratio and the conditions shown in Table 2, extracted from the BSRN recommended quality check tests [27].  This procedure transforms the modeled data, χ m , to a probability domain and subsequently applies the inverse transformation using the cumulative distribution function (CDF) of the observational data to obtain the corrected data, χ adapted [38] Equation (3): where CDF meas and CDF model are the cumulative distribution functions of the observed and modeled data, respectively. This procedure is carried out employing the R-package 'hyfo' (hydrology and climate forecasting) version 1.4.0 [39] and it is applied separately to solar data under clear and cloudy skies, which are defined by modeled time series as in Section 2.2.1.

Postprocessing
The application of the previous procedures (Sections 2.2.1 and 2.2.2) to raw modeled database may result in some erroneous values, which are discarded from the adapted datasets and replaced with the raw modeled values (model is assumed to be consistent). These erroneous values were defined as follows: • Inconsistencies between GHI, DNI and DHI, detected from the ratio and the conditions shown in Table 2, extracted from the BSRN recommended quality check tests [27].
Finally, it is worth mentioning that the SiteAdapt procedure is designed to the solar data at α > 1 • , whereas for α ≤ 1 • , the raw modeled values are maintained.

Statistical Indicators
A number of validation methodologies and statistical performance indicators are used in the present study for validating the modeled solar radiation data in a comprehensive way. Specifically, three main classes of statistical indicators were selected: • Class A: relative mean bias difference (relbias); relative mean absolute difference (relMAD); relative root mean square difference (relRMSD). These indicators quantify the dispersion (or "error") of individual points (their value would be 0 for a perfect model); • Class B: Nash-Sutcliffe's efficiency (NSE, varying between −∞ and 1) [40]; Willmott's index of agreement (WIA, varying between 0 and 1) [41]. These indicators quantify overall performance (a higher value indicates a better model); • Class C: Kolmogorov-Smirnov test Integral (KSI) and OVER [42]. These indicators quantify the distribution similarity (a lower value indicates better distributions similarity). KSI estimates the area between the CDFs of the datasets to be compared and OVER is also an estimate of the area between the CDFs, but only for the parts where a critical value distance is exceeded.
In addition, a combination of some of these indicators is used for assessing the validity of the proposed methodology, which will be referred to as the combined performance index (CPI). This indicator combines conventional information about dispersion and bias with information about distribution likeness and is defined as follows Equation (4): In order to make the comparison homogeneous between sites with different number of years available, the KSI and OVER were calculated for each year and averaged.
The main advantage of this statistical indicator is its high degree of discrimination between different models and thus it is recommended in the cases where only one statistical indicator had to be selected [43]. More details on the statistical indicators used hereafter can be found in a review given by Gueymard [44]. All statistical analyses were performed with the R-package 'hydroGOF' version 0.3-10 [45].

Case Study
Emulating a typical situation in solar resource assessment studies, one year of coincident measured and modeled data was used for the calibration of the model; this calibration was applied to the rest of the period. By doing this, a period of n years with coincident measured and modeled data will provide n site-adapted series with data from other years at the site. Consequently, the statistical indicators used to test the site-adaptation procedure at each site will be quantified by the average of the abovementioned n cases. In addition, the IQR (interquartile range, 25 th to 75 th percentiles) is calculated for the same cases, to assess in detail both the performance and robustness of the site-adaptation methodology.

Preprocessing
The best set of variables (and their interactions) for constructing the multilinear-regression used as preprocessing can be shown to be site-dependent, and also year-dependent (i.e., each year used to calibrate provides different coefficients of the variables selected).
To illustrate this, Table 3 shows the coefficients for the multilinear-regression of measured kt. In order to compare the value of these coefficients between different locations, and also between different periods, we show three different locations (DAA, GOB and SBO) and periods (either the entire period available or a single year). Each regression shown in Table 3 is selected among 1050 models generated from the selected variables and their combinations (the sign "-" refers to the fact that the variable in question is not used for the optimal regression). The regressions found for each site (and period) vary from one location to another, using each one a different set of variables.  It is interesting to note the different model predictor's values in the different years analyzed at SBO site: the use of different single years to develop the site-adaptation models leads to slight differences between them, which in several cases require different predictors. Notwithstanding, there is a high correlation between the parameters found at different years: the coefficient of correlation (R-squared or R 2 ) between parameters found in 2007, 2009 and 2010 is >0.99 and also the R 2 of the parameters found at these years with respect to all period is >0.99. On the contrary, the R 2 of the parameters found in 2008 show lower R 2 values with respect to individual years (R 2 = 0.92) and with all periods (R 2 = 0.96). This suggests that both the model and its correction do not have a homogeneous behavior: although the model can be corrected well with most years, there may be specific years whose behavior is especially different from the others in view of the correlation results obtained (however, this difference is slight). It is also interesting to highlight the stability of the kt predictor value in all cases analyzed at SBO site: The average of their values found for different years (1.8211) is very close to the one found using all period available (1.8366). In addition, their values found at different years are close between them (their standard deviation is 3.7% of their average). The combination of kt with Kc and with m also shows stability between the different cases analyzed.

Site-Adaptation Performance
To illustrate the improvement of Class A metrics (regarding the dispersion or error of individual points), the scatterplots at GOB site between modeled (left) and adapted (right) DNI are shown in Figure 3, where the slope one line is shown in purple. The modeled scatterplot shows a marked overestimation with respect to measured data, as reflected in the accumulation of points above the slope one line (higher data concentration is represented by yellow and red colors). Conversely, the site-adapted scatterplot (Figure 3, right) is placed symmetrically around the slope one line, with the higher data concentration (shown in red) located on this line. There is also an appreciable reduction in dispersion between adapted and measured values with respect to the modeled ones, as reflected by the narrower yellow and red areas.
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 18 Conversely, the site-adapted scatterplot (Figure 3, right) is placed symmetrically around the slope one line, with the higher data concentration (shown in red) located on this line. There is also an appreciable reduction in dispersion between adapted and measured values with respect to the modeled ones, as reflected by the narrower yellow and red areas.  Table 4 illustrates the statistical indicators representing the accuracy of the modeled and adapted solar irradiance time series with respect to the measured ones, averaged over all sites analyzed. The results indicate that the modeled GHI time series show, in general, a better performance than DNI ones, whereas the site-adaptation procedure increases the accuracy in general. It is worth mentioning that the preprocessing (Section 2.2.1) in the site-adaptation procedure reduces both Class A (dispersion) and Class C (distribution similarity) indicators, whereas Class B (overall performance) ones are not affected by this preprocessing. For example, the application of the preprocessing reduces the relRMSD of DNI (averaged on all sites analyzed) by 4.5%: without preprocessing in the site-adaptation, the DNI adapted time series shows on average a relRMSD of 34.3% (close to the modeled one, 34.9%), whereas the application of the preprocessing reduces this value to 29.8%. Similarly, the application of the preprocessing reduces the KSI of DNI and GHI (averaged on all sites analyzed) by 10.1% and 3.8%, respectively, with respect to the same adaptation without preprocessing.  Figure 4 shows bar graphs of the statistical indicators calculated at the analyzed sites, both for modeled (light red) and site-adapted (purple) GHI, showing the IQR of the latter as error bars. A large variation in the performance of modeled series is observed, due to the different approaches in the modeling and diverse site characteristics. Class A metrics reveal that site-adapted GHI is less  Table 4 illustrates the statistical indicators representing the accuracy of the modeled and adapted solar irradiance time series with respect to the measured ones, averaged over all sites analyzed. The results indicate that the modeled GHI time series show, in general, a better performance than DNI ones, whereas the site-adaptation procedure increases the accuracy in general. It is worth mentioning that the preprocessing (Section 2.2.1) in the site-adaptation procedure reduces both Class A (dispersion) and Class C (distribution similarity) indicators, whereas Class B (overall performance) ones are not affected by this preprocessing. For example, the application of the preprocessing reduces the relRMSD of DNI (averaged on all sites analyzed) by 4.5%: without preprocessing in the site-adaptation, the DNI adapted time series shows on average a relRMSD of 34.3% (close to the modeled one, 34.9%), whereas the application of the preprocessing reduces this value to 29.8%. Similarly, the application of the preprocessing reduces the KSI of DNI and GHI (averaged on all sites analyzed) by 10.1% and 3.8%, respectively, with respect to the same adaptation without preprocessing.  Figure 4 shows bar graphs of the statistical indicators calculated at the analyzed sites, both for modeled (light red) and site-adapted (purple) GHI, showing the IQR of the latter as error bars. A large variation in the performance of modeled series is observed, due to the different approaches in the modeling and diverse site characteristics. Class A metrics reveal that site-adapted GHI is less dispersed with respect to measured GHI than the modeled one. Modeled GHI relbias values are~1.8% on average, which is reduced after applying the adaptation procedure to~0.1%. Likewise, a slight reduction is noticed both in relMAD and relRMSD after the site-adaptation procedure: from 11.3% to 8.7% and from 17.9% to 14.6%, respectively. It is worth mentioning the low IQR values found for these parameters. Overall, performance statistical indicators (Class B) show high values in modeled GHI, being slightly improved after the site-adaptation procedure. For example, NSE is increased (on average) from 0.91 to 0.94, whereas WIA remains similar (and close to 1 in all cases). In this case, again, IQR values are low. Finally, the distribution similarity indicators (Class C) are markedly improved after the application of the site-adaptation: KSI is reduced on average by a factor of 3.3 (from 76.8% to 23.1%), whereas OVER is markedly reduced: from 17.3% to~0.7% (on average). IQR of both KSI and OVER of site-adapted GHI have low values, typically 5% of their corresponding modeled values. Finally, CPI values are reduced by a factor of 2.5, from 32.5% to 13.3% (on average), with also a low IQR (1.2%).   Figure 5 shows bar graphs of the statistical indicators calculated at the analyzed sites, both for modeled (blue) and site-adapted (purple) DNI, showing the IQR of the latter as error bars. Class A metrics reveal a high dispersion in some of the modeled DNI (relbias ranges from −15.1% to 15.1%, found at PAY and SBO sites, respectively), that markedly decreases after the site-adaptation (0.3% on average). On the other hand, both relMAD and relRMSD of site-adapted DNI are reduced by 5.1% (from 24.5% to 19.2% and from 34.9% to 29.8% on average), respectively, with low IQR values (below 0.5% on average in both cases). Overall, performance statistical indicators (Class B) show a variety of values in modeled DNI, lower than those obtained for GHI. In this case, again, the site-adaptation procedure results in an improvement of these statistical indicators: NSE is increased from 0.64 to 0.78 (on average), whereas WIA is increased from 0.90 to 0.94 (on average). In this regard, the site-adaptation of DNI shows a more substantial improvement with respect to the adaptation of GHI. The frequency distribution similarity is markedly improved after the application of the site-adaptation, as deduced by Class C metrics: mean KSI of modeled DNI is 222.2%, being reduced on average to 57.6% (a factor of 3.9), whereas OVER is reduced on average by a factor of 9.6 (from 141.0% to 12.1%). IQR of both KSI (31%) and OVER of site-adapted DNI have low values, typically 9% of their corresponding modeled values.  Figure 5 shows bar graphs of the statistical indicators calculated at the analyzed sites, both for modeled (blue) and site-adapted (purple) DNI, showing the IQR of the latter as error bars. Class A metrics reveal a high dispersion in some of the modeled DNI (relbias ranges from −15.1% to 15.1%, found at PAY and SBO sites, respectively), that markedly decreases after the site-adaptation (0.3% on average). On the other hand, both relMAD and relRMSD of site-adapted DNI are reduced by 5.1% (from 24.5% to 19.2% and from 34.9% to 29.8% on average), respectively, with low IQR values (below 0.5% on average in both cases). Overall, performance statistical indicators (Class B) show a variety of values in modeled DNI, lower than those obtained for GHI. In this case, again, the site-adaptation procedure results in an improvement of these statistical indicators: NSE is increased from 0.64 to 0.78 (on average), whereas WIA is increased from 0.90 to 0.94 (on average). In this regard, the site-adaptation of DNI shows a more substantial improvement with respect to the adaptation of GHI. The frequency distribution similarity is markedly improved after the application of the site-adaptation, as deduced by Class C metrics: mean KSI of modeled DNI is 222.2%, being reduced on average to 57.6% (a factor of 3.9), whereas OVER is reduced on average by a factor of 9.6 (from 141.0% to 12.1%). IQR of both KSI (31%) and OVER of site-adapted DNI have low values, typically 9% of their corresponding modeled values. To illustrate the distribution similarity performance of site-adaptation, modeled and site-adapted GHI (left) and DNI (right) ECDFs at GOB site are shown in Figure 6. It is worth mentioning the similarity between measured (red, left graph) and raw modeled (blue, left graph) GHI ECDFs, which is corrected after the adaptation procedure (adapted, in purple, and measured GHI ECDFs are indistinguishable). On the other hand, raw modeled DNI ECDF (blue, right graph) is far from the measured one (red, right graph), especially above 300 W/m 2 , which is markedly corrected after the adaptation procedure (purple, right graph).

Prediction of Site-Adaptation Performance
In solar resource assessment studies, it is desirable to estimate the performance of the site-adaptation procedures since weather is not a linear system and solar irradiance is influenced by To illustrate the distribution similarity performance of site-adaptation, modeled and site-adapted GHI (left) and DNI (right) ECDFs at GOB site are shown in Figure 6. It is worth mentioning the similarity between measured (red, left graph) and raw modeled (blue, left graph) GHI ECDFs, which is corrected after the adaptation procedure (adapted, in purple, and measured GHI ECDFs are indistinguishable). On the other hand, raw modeled DNI ECDF (blue, right graph) is far from the measured one (red, right graph), especially above 300 W/m 2 , which is markedly corrected after the adaptation procedure (purple, right graph). To illustrate the distribution similarity performance of site-adaptation, modeled and site-adapted GHI (left) and DNI (right) ECDFs at GOB site are shown in Figure 6. It is worth mentioning the similarity between measured (red, left graph) and raw modeled (blue, left graph) GHI ECDFs, which is corrected after the adaptation procedure (adapted, in purple, and measured GHI ECDFs are indistinguishable). On the other hand, raw modeled DNI ECDF (blue, right graph) is far from the measured one (red, right graph), especially above 300 W/m 2 , which is markedly corrected after the adaptation procedure (purple, right graph).

Prediction of Site-Adaptation Performance
In solar resource assessment studies, it is desirable to estimate the performance of the site-adaptation procedures since weather is not a linear system and solar irradiance is influenced by

Prediction of Site-Adaptation Performance
In solar resource assessment studies, it is desirable to estimate the performance of the site-adaptation procedures since weather is not a linear system and solar irradiance is influenced by irregular oscillations in weather patterns like ENSO (El Niño, Southern Oscillation) and NAO (North Atlantic Oscillation), aerosols dynamics, anthropogenic aerosols emissions or volcano eruptions. In the previous results, we have seen how aerosols have a higher impact in solar irradiance for year 2008 in GOV site as we can see when we compare the parameter for kc with other parameters and its value in other years. In order to guarantee that the validation results are not biased by an overfitting of the measured data used in the calculation of the model parameters, training and validation should be done with independent data. Usually, the measured dataset is divided in equal percentages or 80% for training and 20% for validation.
To predict the performance of the site-adaptation procedure in a long time series with only one year of measurements, we have used the first fifteen days of each month as a single dataset for defining the site-adaptation procedure, and we have applied this adaptation to the whole year. By repeating this process for all years available at each location analyzed and comparing with the actual performance of the adaptation in the long time series, we can assess the validity of this prediction. Figure 7 shows the prediction of the site-adaptation performance of DNI using only 1 year of coincident measured and modeled data with respect to the actual performance (which is calculated using the whole period available). Vertical bars represent the IQR of the statistical indicators at each site, as every year available is analyzed separately.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 18 irregular oscillations in weather patterns like ENSO (El Niño, Southern Oscillation) and NAO (North Atlantic Oscillation), aerosols dynamics, anthropogenic aerosols emissions or volcano eruptions. In the previous results, we have seen how aerosols have a higher impact in solar irradiance for year 2008 in GOV site as we can see when we compare the parameter for kc with other parameters and its value in other years. In order to guarantee that the validation results are not biased by an overfitting of the measured data used in the calculation of the model parameters, training and validation should be done with independent data. Usually, the measured dataset is divided in equal percentages or 80% for training and 20% for validation.
To predict the performance of the site-adaptation procedure in a long time series with only one year of measurements, we have used the first fifteen days of each month as a single dataset for defining the site-adaptation procedure, and we have applied this adaptation to the whole year. By repeating this process for all years available at each location analyzed and comparing with the actual performance of the adaptation in the long time series, we can assess the validity of this prediction. Figure 7 shows the prediction of the site-adaptation performance of DNI using only 1 year of coincident measured and modeled data with respect to the actual performance (which is calculated using the whole period available). Vertical bars represent the IQR of the statistical indicators at each site, as every year available is analyzed separately.  Table 5 shows the slope of the linear fit between the statistical indicators of all sites analyzed calculated with 1 year (of measured and modeled data) and their actual values (calculated in the whole period available), along with the corresponding R 2 values. It is worth highlighting an almost perfect match of R 2 , NSE and WIA for GHI and DNI, with slopes between 0.986 and 0.999 and R 2 = 1.000. Similarly, relMAD and relRMSD show an R 2 > 0.999, but with slightly higher slopes (in this case, prediction underestimates actual values by ~4% or less). KSI is almost exactly predicted for DHI, but it is underestimated for GHI and DNI with predictable behavior, reflected in R 2 of 0.951 for GHI and 0.905 for DNI. CPI is also underestimated for GHI, DNI and DHI (ranging from 31% to 46%), but also with great predictability (with R 2 between 0.991 and 0.991). Finally, both OVER and  Table 5 shows the slope of the linear fit between the statistical indicators of all sites analyzed calculated with 1 year (of measured and modeled data) and their actual values (calculated in the whole period available), along with the corresponding R 2 values. It is worth highlighting an almost perfect match of R 2 , NSE and WIA for GHI and DNI, with slopes between 0.986 and 0.999 and R 2 = 1.000.
Similarly, relMAD and relRMSD show an R 2 > 0.999, but with slightly higher slopes (in this case, prediction underestimates actual values by~4% or less). KSI is almost exactly predicted for DHI, but it is underestimated for GHI and DNI with predictable behavior, reflected in R 2 of 0.951 for GHI and 0.905 for DNI. CPI is also underestimated for GHI, DNI and DHI (ranging from 31% to 46%), but also with great predictability (with R 2 between 0.991 and 0.991). Finally, both OVER and relbias are not well predicted for both GHI and DNI (R 2 < 0.207), for different reasons: adapted relbias are usually <1% in absolute value, but with great variability within its range, and OVER predicted with a single year of measurements provides low values (<3%), whereas their actual values are up to 40%.

Discussion
The adaptation of modeled solar irradiance time series with coincident high−quality short-term ground measurements has become a standard practice of the industry, being typically requested by financial institutions in the detailed solar resource assessments of solar projects. This practice mitigates the risk of financing solar projects, enhancing the adequate solar plant design and reducing the uncertainty of its yield estimates. Unfortunately, there are no standard procedures for applying this adaptation. The dominating parameter in risk assessments of solar power projects is relbias, while the solar irradiance distribution has implications for the estimations of energy yield. For example, in the case of Concentrating Solar Power plants, their operation doesn't start if DNI values are lower than a certain threshold early in the morning and heat in the solar field and storage is not enough to move the turbine [46]. In addition, the solar field is built to operate with DNI levels up to a maximum value, which represents approximately 90% of the cumulative distribution function (considering the nonzero values of DNI weighted by the cosine of the incident angle), to optimize the cost and energy production [47]. Consequently, exceeding this maximum DNI value may lead to discard energy to avoid system overloads.
In general, the model's performance is worse for DNI than for GHI: averaged relMAD and relRMSD values of DNI are double those of GHI, the averaged KSI value of DNI is 2.9 times higher than the corresponding GHI value, whereas the averaged CPI value of DNI is~3.3 times higher than the averaged CPI value of GHI. This behavior is caused by the higher sensitivity of DNI to the variability of clouds, aerosols and water vapor (and other atmospheric constituents) than GHI. The different sources of error of satellite-based solar radiation modeling result in the occurrence of systematic and random deviations with respect to the ground measurements. Those deviations with a systematic nature can be reduced by site-adaptation, as shown in this work. In particular, the modeled solar irradiance errors at desert climate sites, although of high magnitude (DNI relbias of −15.1% and −10.7% at SBO and GOB, respectively), appear to be predictable and correctable: a trend of the relbias at desert climate sites have a deviation of the same sign in every year at a given site, enabling the predictability of the error (maybe reflecting persistent inaccuracies in clear sky conditions assessment) and thus facilitating its proper adaptation. It is worth highlighting that, after site-adaptation, the results found in this work for DNI are qualitatively similar to those found for GHI, but with higher magnitudes and dispersion.
The site-adaptation procedure proposed relies on the eQM (Section 2.2.2), based on the modification of modeled distributions so that it matches the distribution of the measured coincident dataset. This procedure has shown considerable improvement of both relbias and KSI, but it lacks the reduction of other dispersion indicators (such as relMAD or relRMSD) that provides regression-based approaches [26]. A regression-based preprocessing is useful for preparing the dataset for a more efficient eQM-based site-adaptation: it has shown a significant improvement in both Class A (dispersion) and Class C (distribution similarity) indicators.
The prediction of site-adaptation performance with one year of measured and modeled overlapped data allows for the uncertainty estimation of the long term adapted time series for most of the statistical indicators analyzed. A better performance is found if the adaptation procedure is applied to the year of measurements (using only a half year for calibrating) rather than to the long term period (using 1 year for calibrating), as deduced by their CPI values (Section 3.3). This result is expected, as meteorological conditions of data to be adapted are well represented in the calibrating dataset in the case of the adaptation applied to the year of measurements. Notwithstanding, for the prediction of the performance of both GHI and DNI, several parameters (relMAD, relRMSD, R 2 , NSE and WIA) are almost perfectly correlated (R 2 > 0.999), others (KSI, CPI) are highly correlated (R 2 > 0.9) and the rest (relbias and OVER) in general show low values after site-adaptation (<0.5% and <15%, respectively), allowing for an accurate description of the site-adaptation procedure with only one year of overlapped measurements and models.

Conclusions
The methodology presented in this work for the site-adaptation of modeled solar irradiance has shown significant improvement in terms of dispersion, distribution and overall performance in comparison with measured data. This procedure provides reliable historical time series of solar irradiance at a given site, which can be used to reduce uncertainties in the development of solar projects. In addition, the performance of the adapted solar data can be estimated by using just one year of measurements as we have demonstrated when the validation was done with a set of data independent from the data used for the calculation of the parameters of the models. In general, the method proposed is very effective to reduce below 1% the relbias and to improve the distribution of modeled values (as shown by Class C metrics), but to a lesser extent, the deviations dispersion defined by relMAD and relRMSD.
As future work and on the basis of this approach, one could treat systematic errors differently from occasional ones (such as forest fires or extreme sand storms). Other future works can focus on more detailed separation of solar irradiance data through the consideration of coupled GHI and DNI values [48], as well as more sophisticated techniques to adapt them. In addition, the relative weight and homogeneity of some model predictor's values along with the search of scaled distributions derived from solar irradiance which are similar for different sites [49,50] could lead to extending the approach presented in this work to regional (not only site) adaptation at high-frequency time scales.
Finally, to encourage other researchers to reuse and improve our algorithm, this methodology, along with a brief tutorial, is freely available as an R-package downloadable from https://github.com/ cfernandezcener/SiteAdapt.