Statistical Modeling of Spatio-Temporal Variability in Monthly Average Daily Solar Radiation over Turkey.

Although solar radiation is one of the most significant driving forces behind ecological processes such as biogeochemical cycles and energy flows, conventional ground-based measurements of it are limited or non-existent in many areas, and it is thus often estimated from other meteorological data through (geo)statistical models. In this study, spatial and temporal patterns of monthly average daily solar radiation on a horizontal surface at ground level were quantified using 130 climate stations for the whole of Turkey and its seven conventionally accepted geographical regions. Two approaches were used: multiple linear regression (MLR) models as a function of latitude, longitude, altitude, aspect, distance to sea; minimum, maximum and mean air temperature and relative humidity; soil temperature, cloudiness, precipitation, pan evapotranspiration, day length, maximum possible sunshine duration, monthly average daily extraterrestrial solar radiation, and time (month); and the universal kriging method. The resulting 20 regional best-fit MLR models (up to three MLR models for each region), based on parameterization datasets, had R²_adj values ranging from 91.5% for the Central Anatolia region to 98.0% for the Southeast Anatolia region. Validation of the best-fit MLR models for each region led to R² values ranging from 87.7% for the Mediterranean region to 98.5% for the Southeast Anatolia region. Leave-one-out cross-validation of the best-fit anisotropic semi-variogram models for universal kriging gave R² values ranging from 10.9% in July to 52.4% in November. Surface maps of monthly average daily solar radiation were generated over Turkey, with a grid resolution of 500 m × 500 m.


Introduction
Solar radiation is one of the most significant driving variables that trigger changes in ecological processes such as biogeochemical cycles and energy flows [1][2][3]. The rate of total (both direct and diffuse) incoming solar energy on a horizontal plane at the earth's surface is referred to as global solar radiation and is mathematically expressed as follows:
$$SR_g = SR_d + SR_{db} \cos z$$
where SR_g: global radiation on a horizontal surface; SR_d: diffuse radiation; SR_db: direct beam radiation on a surface perpendicular to the direct beam; and z: Sun's zenith angle. Direct solar radiation is usually measured by a pyrheliometer, while global and diffuse solar radiation are measured by ground-based pyranometers [4]. However, where conventional ground-based measurements are limited or non-existent, solar radiation data are often estimated from statistical models and remotely sensed data. Satellite-derived solar radiation data provide coverage over large regions of 100 to 10,000 km², with relatively long time intervals, and are generally derived from such sensors as the geostationary Earth radiation budget satellites (GERBS), the geostationary operational environmental satellites (GOES), geostationary meteorological satellites (GMS), and NOAA-AVHRR (Advanced Very High Resolution Radiometer) [5][6][7][8][9][10]. Ground-based observations are point measurements made at relatively short time intervals. (Geo)statistical models can produce a reliable solar radiation database at local-to-global scales for a given spatio-temporal range from a single variable (e.g. day length) or multiple variables (e.g. elevation, temperature, and evapotranspiration) [2][4][11][12][13][14].
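The relation between the three radiation components described above (global radiation as the diffuse component plus the direct beam projected onto the horizontal through the cosine of the zenith angle) can be illustrated with a minimal sketch. The function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
import math

def global_radiation(sr_d, sr_db, zenith_deg):
    """Global radiation on a horizontal surface.

    sr_d       -- diffuse radiation
    sr_db      -- direct beam radiation on a surface normal to the beam
    zenith_deg -- Sun's zenith angle in degrees
    All radiation terms must share the same unit.
    """
    return sr_d + sr_db * math.cos(math.radians(zenith_deg))

# Hypothetical values: diffuse = 2.0, direct beam = 10.0, zenith = 60 degrees.
sr_g = global_radiation(2.0, 10.0, 60.0)
```

With a zenith angle of 60°, only half of the direct beam reaches the horizontal surface, so the direct term contributes 5.0 of the total.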
This study aims to quantify, at national and regional scales, the spatial and temporal patterns of monthly average daily solar radiation on a horizontal surface at ground level through multiple linear regression (MLR) and semi-variogram models.

Statistical Performance Indicators and Validation of National and Regional MLR Models of Daily Solar Radiation
The MLR models were based on five geographical variables: latitude (decimal degrees), longitude (decimal degrees), altitude (m), aspect (compass degrees), and distance to sea (DtS, km); 11 monthly-observed climate variables: minimum, maximum and mean air temperature (T_min, T_max and T, °C) and relative humidity (RH_min, RH_max and RH, %), soil temperature (ST, °C, at a depth of 0 to 5 cm), cloudiness (CLD, %), precipitation (PPT, mm), pan evapotranspiration (PET, mm), and day length (S, h); two monthly-derived climate variables: maximum possible sunshine duration (S_o, h) and monthly average daily extraterrestrial solar radiation (H_o, MJ m⁻² day⁻¹); and time (month). Models were fitted for the whole of Turkey and for its seven conventionally accepted geographical regions (Mediterranean, Aegean Sea, Black Sea, Central Anatolia, East Anatolia, Southeast Anatolia, and Marmara). Monthly climate variables for 1968 to 2004 were acquired from 130 climate stations across Turkey through the Turkish State Meteorological Service. Following a jackknifing procedure, the dataset was randomly divided into independent parameterization and validation datasets such that the ratio of the number of validation stations to the number of parameterization stations was equal to or greater than 25% for each region and for the entire country (Figure 1).
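The station split described above can be sketched as a random partition that takes the smallest validation set satisfying the 25% validation-to-parameterization ratio. This is only an illustration: the function name, the seed, and the use of integer station identifiers are assumptions, and the study's actual randomization is not specified in the text.

```python
import random

def split_stations(station_ids, min_ratio=0.25, seed=0):
    """Randomly split stations into (parameterization, validation) lists.

    The validation set is the smallest one for which
    len(validation) / len(parameterization) >= min_ratio.
    """
    ids = list(station_ids)
    if len(ids) < 2:
        raise ValueError("need at least two stations to split")
    random.Random(seed).shuffle(ids)  # seed is a hypothetical choice
    n_val = 1
    while n_val / (len(ids) - n_val) < min_ratio:
        n_val += 1
    return ids[n_val:], ids[:n_val]

# 130 stations, identified here by hypothetical integer ids.
parameterization, validation = split_stations(range(130))
```

For 130 stations this yields 104 parameterization and 26 validation stations, a ratio of exactly 25%. The same function could be applied region by region.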
Using the parameterization datasets for each region and the entire country, the best MLR models with site-specific explanatory variables and parameters were determined. Three optimum MLR models were recommended for each region and the entire country based on a forward stepwise selection. In forward stepwise selection, each variable that is not already in the model is tested for inclusion, one at a time. The most significant of these variables is added to the model, provided that its P value does not exceed the threshold of 0.001 pre-set in this study. In this approach, variables that have entered the model may later be dropped if they are no longer significant once other variables are added.
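The forward part of this selection procedure can be sketched in plain Python. This is a simplified illustration, not the procedure used in the study: it uses a partial F-to-enter threshold in place of P values (10.83 is the large-sample F value corresponding to P = 0.001 for one added variable, an approximation for finite samples), it omits the step in which previously entered variables may be dropped, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit with intercept (normal equations)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(beta[a] * X[i][a] for a in range(k))) ** 2
               for i in range(n))

def forward_select(candidates, y, f_to_enter=10.83):
    """Return variable names in order of entry.

    candidates -- dict mapping variable name to a list of values
    Assumes more observations than parameters and a non-perfect fit.
    """
    n = len(y)
    chosen = []
    current = rss([], y)
    while len(chosen) < len(candidates):
        best = None
        for name in candidates:
            if name in chosen:
                continue
            cols = [candidates[c] for c in chosen + [name]]
            r = rss(cols, y)
            df = n - len(cols) - 1          # residual degrees of freedom
            f = (current - r) / (r / df)    # partial F for the added variable
            if best is None or f > best[1]:
                best = (name, f, r)
        if best[1] < f_to_enter:
            break
        chosen.append(best[0])
        current = best[2]
    return chosen
```

In practice a statistics package that reports P values for each candidate term would replace the hand-written least-squares routines; they are included here only to keep the sketch self-contained.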
The degree of model accuracy, and thus the comparative performance of the MLR models, was quantified using the following four statistical indicators: (1) the coefficient of determination (R², %); (2) the adjusted coefficient of determination (R²_adj, %); (3) the root mean square error (RMSE, MJ m⁻² day⁻¹); and (4) Mallows's Cp statistic [15]. The coefficient of determination (R²) is the proportion of variation in a response variable explained by a regression model, while the adjusted coefficient of determination (R²_adj) is the coefficient of determination modified to account for the number of explanatory variables added to a model and the sample size. The R² and R²_adj are calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (SR_{o,i} - SR_{p,i})^2}{\sum_{i=1}^{n} (SR_{o,i} - SR_m)^2}$$
$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$
where SR_p, SR_o, and SR_m are the predicted, observed and mean values of the response variable, monthly average daily solar radiation, respectively; p is the total number of explanatory variables; and n is the sample size. The RMSE reveals the level of scatter that a model produces and provides a comparison of the absolute deviation between the predicted and observed values. The lower the RMSE, the better the model performs. The RMSE can be calculated as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (SR_{p,i} - SR_{o,i})^2}$$
Mallows's Cp statistic is mathematically expressed as follows:
$$C_p = \frac{SS_{res}}{MS_{res}} - n + 2p$$
where SS_res is the residual sum of squares for the best model with p parameters (here p is the number of parameters in the model, including the intercept), and MS_res is the residual mean square when using all available explanatory variables. If the model fits the data well, the Cp value is expected to be approximately equal to p. Models with considerable lack of fit have Cp values larger than 2p [16]. The three optimum MLR models chosen for each region and the entire country with the forward stepwise selection were tested by comparing observed versus predicted values of daily solar radiation through the validation datasets. The degree of model fit between observed and predicted values of daily solar radiation was quantified using R² values (%).
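The four indicators named above can be computed with a few lines of Python. The sketch below uses the standard textbook forms of these statistics, which is an assumption about the exact expressions used in the study; function names and the sample numbers are hypothetical.

```python
import math

def r_squared(observed, predicted):
    """Proportion of variation in the observations explained by the model."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """R^2 adjusted for n observations and p explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def rmse(observed, predicted):
    """Root mean square error, in the unit of the response variable."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mallows_cp(ss_res_subset, ms_res_full, n, p):
    """Mallows's Cp; p counts the subset model's parameters incl. intercept."""
    return ss_res_subset / ms_res_full - n + 2 * p

# Hypothetical observed and predicted values (MJ m^-2 day^-1).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(obs, pred)
r2_adj = adjusted_r_squared(r2, n=len(obs), p=1)
error = rmse(obs, pred)
```

A Cp value close to p, as in `mallows_cp(24.0, 2.0, n=15, p=3)`, which evaluates to 3, is the pattern the text associates with a well-fitting model.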
[Figure caption, beginning missing] ... and validation of monthly average daily solar radiation models over Turkey.

Construction and Cross-Validation of National Geo-statistical Model of Daily Solar Radiation
The surface maps of monthly average daily solar radiation were created for the whole of Turkey (780,580 km²) with a grid resolution of 500 m × 500 m from the 130 climate stations using ArcGIS 9.1 [17]. The assumption of spatial autocorrelation for the daily solar radiation data from the 130 climate stations was verified by examining Moran's Index (I) values and their statistical significance as an indicator of the strength of correlation between observations as a function of the distance separating them [18]. The values of Moran's I range from 1 to -1 (strong positive and strong negative spatial autocorrelation, respectively), with 0 indicating a random pattern. To satisfy the stationarity assumption prior to spatial interpolation, trend analysis was performed to determine whether a global trend exists, that is, an overriding process that affects all observed data in a deterministic manner. Detrending was implemented by removing first-order trends before fitting the semi-variogram models and adding them back before predictions were made, in order to model the random short-range variation in monthly average daily solar radiation over Turkey more accurately. Directional influences (anisotropy) detected in the spatial autocorrelation were accounted for in the semi-variogram models. Spatial interpolation was carried out using the universal kriging method, which relies on a semi-variogram model that defines semi-variance as a function of distance and direction as follows [19]:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{k=1}^{N(h)} \left[ z(x_k) - z(x_k + h) \right]^2$$
where γ(h) is the semi-variance of variable z as a function of the lag (separation) distance h; N(h) is the number of pairs of observation points separated by h used in each summation; and z(x_k) is the random variable at location x_k.
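The semi-variance described above is half the average squared difference between pairs of observations grouped by separation distance. A minimal sketch of the empirical calculation follows. It is isotropic (it ignores the direction of each pair) and applies no detrending, so it is simpler than the anisotropic, detrended models used in the study; the function name, coordinates, and values are hypothetical.

```python
import math

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Isotropic empirical semi-variogram.

    points    -- list of (x, y) coordinates
    values    -- observed value at each point
    lag_width -- width of each distance bin
    n_lags    -- number of bins; pairs farther apart are ignored
    Returns (gamma, counts); gamma[b] is None for an empty bin.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            b = int(h // lag_width)  # bin b covers [b*w, (b+1)*w)
            if b < n_lags:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    gamma = [s / (2 * c) if c else None for s, c in zip(sums, counts)]
    return gamma, counts

# Four hypothetical stations on a line, one distance unit apart.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0, 7.0]
gamma, counts = empirical_semivariogram(pts, vals, lag_width=1.5, n_lags=3)
```

In this toy example the semi-variance grows with separation distance, which is the behavior a fitted model (such as the spherical model) summarizes through its nugget, sill, and range.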
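The selection of the best-fit semi-variogram model was based on six error statistics of leave-one-out cross-validation: (1) the mean prediction error (MPE), (2) the root mean square prediction error (RMSPE), (3) the average kriging standard error (AKSE), (4) the mean standardized prediction error (MSPE), (5) the root mean square standardized prediction error (RMSSPE), and (6) R², as follows:
$$MPE = \frac{1}{N} \sum_{k=1}^{N} (z_{pk} - z_{ok})$$
$$RMSPE = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (z_{pk} - z_{ok})^2}$$
$$AKSE = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \sigma^2(k)}$$
$$MSPE = \frac{1}{N} \sum_{k=1}^{N} \frac{z_{pk} - z_{ok}}{\sigma(k)}$$
$$RMSSPE = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left[ \frac{z_{pk} - z_{ok}}{\sigma(k)} \right]^2}$$
$$R^2 = 1 - \frac{\sum_{k=1}^{N} (z_{ok} - z_{pk})^2}{\sum_{k=1}^{N} (z_{ok} - \bar{z}_o)^2}$$
where z_ok is the observed value at location k, z_pk is the value predicted at k through the universal kriging method, \bar{z}_o is the mean of the observed values, N is the number of pairs of observed and predicted values, and σ(k) is the prediction standard error for location k.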
As indicators of prediction error, the MPE and MSPE values reveal the degree of bias in model predictions and should be close to zero. In the assessment of uncertainty (variability in predictions), the RMSPE and AKSE values show the precision of prediction and should be equal to one another. Variability in predictions is overestimated when the AKSE is greater than the RMSPE and underestimated when the AKSE is less than the RMSPE. The RMSSPE values compare the error variance with the kriging variance and should be close to unity. Variability is underestimated when the RMSSPE is greater than unity and overestimated when it is less than unity [20].
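The cross-validation diagnostics discussed above can be computed from three lists: the observed values, the leave-one-out predictions, and the kriging standard error at each station. The sketch below follows the conventional definitions of these statistics (prediction error taken as predicted minus observed), which is an assumption about the exact forms used in the study; the function name and sample numbers are hypothetical.

```python
import math

def cv_statistics(observed, predicted, std_errors):
    """Leave-one-out cross-validation error statistics for kriging."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "MPE": sum(errors) / n,                                  # bias
        "RMSPE": math.sqrt(sum(e * e for e in errors) / n),      # precision
        "AKSE": math.sqrt(sum(s * s for s in std_errors) / n),   # avg. std. error
        "MSPE": sum(standardized) / n,                           # standardized bias
        "RMSSPE": math.sqrt(sum(t * t for t in standardized) / n),
    }

# Hypothetical station values (MJ m^-2 day^-1).
stats = cv_statistics(
    observed=[10.0, 12.0, 14.0, 16.0],
    predicted=[11.0, 11.0, 15.0, 15.0],
    std_errors=[2.0, 2.0, 2.0, 2.0],
)
overestimated = stats["AKSE"] > stats["RMSPE"] and stats["RMSSPE"] < 1.0
```

In this example the predictions are unbiased (MPE and MSPE are zero), while the AKSE of 2 exceeds the RMSPE of 1 and the RMSSPE is 0.5, the pattern the text associates with overestimated prediction variability.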

Results and Discussion
Monthly average daily solar radiation data for each month in Turkey followed a Gaussian distribution, judging from their histogram plots and the closeness of their mean and median values (Figure 2). On average, daily solar radiation in Turkey ranged from 5.8 ± 1.1 MJ m⁻² day⁻¹ in December to 22.6 ± 2.2 MJ m⁻² day⁻¹ in June. The best-fit MLR models for each geographical region of Turkey, and their validation against the independent datasets, are presented in Table 1. The 20 regional MLR models had R²_adj values ranging from 91.5% of the variation in the solar radiation data for the Central Anatolia region to 98.0% for the Southeast Anatolia region. Similarly, the RMSE values of the MLR models ranged from 0.89 MJ m⁻² day⁻¹ in the Southeast Anatolia region to 1.86 MJ m⁻² day⁻¹ in the Central Anatolia region. The frequency with which the explanatory variables appeared in the regional MLR models was, in decreasing order: H_o (S/S_o) (100%), PET (55%), CLD (55%), ST (45%), S (40%), RH_max (25%), aspect (25%), PPT (15%), elevation (15%), T_max (15%), RH (5%), RH_min (5%), DtS (5%), latitude (0%), longitude (0%), mean and minimum air temperature (0%), and time (month) (0%).
Monthly PPT and T_max were significant explanatory variables only in the MLR models of the Mediterranean and Aegean regions, respectively (P ≤ 0.001). Monthly RH and elevation were significant explanatory variables in the MLR model of the East Anatolia region. Monthly RH_min and DtS were significant only in the MLR models of the Southeast Anatolia and Central Anatolia regions, respectively.
Comparisons of values observed at the climate stations with values predicted by the best-fit MLR models for each region led to R² values ranging from 87.7% for the Mediterranean region to 98.5% for the Southeast Anatolia region. The validation of the regional MLR models revealed that the highest [...] (Table 2). Validation of the national MLR models indicated that the MLR model with the six explanatory variables H_o (S/S_o), CLD, RH_max, elevation, aspect, and month performed best, with an R² value of 93.3%.
The test of the assumption of spatial autocorrelation based on Moran's I showed a significantly clustered pattern for the months of January to May and August to December (P < 0.01) and for June and July (P < 0.05) (Table 3). The degree of spatial dependence of the solar radiation data was also calculated as the ratio of nugget (c_0) to sill (c_0 + c), and the nugget-to-sill ratios ranged from 54% in February to 87% in June (Table 3). As the nugget-to-sill ratio increases, the spatial dependence of the data decreases.
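Both diagnostics used above are simple to compute once a spatial weights matrix and the fitted semi-variogram parameters are available. The sketch below uses the conventional definition of Moran's I and omits the significance test; the function names, the neighbor-based weights, and the numerical values are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values with a square spatial weights matrix.

    weights[i][j] is the spatial weight between locations i and j
    (zero on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)

def nugget_to_sill(nugget, partial_sill):
    """Ratio c0 / (c0 + c); higher values mean weaker spatial dependence."""
    return nugget / (nugget + partial_sill)

# Four hypothetical stations on a line; adjacent stations are neighbors.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
i_value = morans_i([1.0, 2.0, 3.0, 4.0], w)    # positive: similar values cluster
ratio = nugget_to_sill(2.7, 2.3)               # hypothetical c0 and c
```

A positive Moran's I, as in this example, indicates that neighboring stations tend to have similar values; a ratio such as 0.54 means that just over half of the sill is nugget variance.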
The global trend analysis indicated an overriding trend in the solar radiation data in the south-to-north direction of Turkey (Figure 3). Given the plots of the global trend analysis in Figure 3, first-order trend removal was performed before universal kriging was applied to the solar radiation data. An anisotropic spherical spatial correlation model was used because of significant anisotropy and nugget effect, the latter generally attributed to small-scale variability or measurement error. A large nugget effect in the solar radiation semi-variogram models means that local-scale spatial autocorrelation (spatial dependence) among observations is weak. The nugget was high relative to the sill, indicating that most of the fine-scale variability was not explained by the semi-variogram models. Anisotropic spherical semi-variogram models performed best for the solar radiation data, with 9 neighbors to include (at least 5) and 12 lags.
The specific parameters of the best-fit anisotropic semi-variogram models for universal kriging are presented in Table 3. According to the MPE and MSPE values of the spatial leave-one-out cross-validation, the degree of bias in the predictions of monthly average daily solar radiation was highest for June and lowest for January and February. According to the AKSE, RMSPE and RMSSPE values, variability in the predictions of monthly average daily solar radiation was overestimated for January, April, May, June, September, October, and November and underestimated for the remaining months.
Leave-one-out cross-validation of the monthly average daily solar radiation models revealed that R² values for the comparisons of observed versus predicted solar radiation values ranged from 10.9% in July to 52.4% in November. Geostatistical models performed better for the months of October to March (R² = 37.0 to 52.4%) than for those of April to September (R² = 10.9 to 28.8%) (Table 3). Surface maps of monthly average daily solar radiation over Turkey were generated with a grid resolution of 500 m × 500 m (Figure 4).
[Figure caption] Daily solar radiation values are projected onto the x-z (west) and y-z (north) planes as the green and blue dots of the scatter plots, respectively. Green and blue lines refer to regression lines fitted to the scatter plots on the x-z and y-z planes, respectively.
[Table footnote] All variables are significant at P ≤ 0.001 except those marked a (P < 0.01) and b (P > 0.05). H_o: monthly average daily extraterrestrial solar radiation on a horizontal surface; S: day length; S_o: maximum possible sunshine duration; ST: soil temperature for a depth of 0 to 5 cm; RH_max: maximum relative humidity; PET: potential evapotranspiration; PPT: precipitation; RH: relative humidity; CLD: cloudiness; T_max: maximum air temperature; RH_min: minimum relative humidity; DtS: distance to sea; RMSE: root mean square error; and V/P: ratio of the number of stations in the validation dataset to that in the parameterization dataset.
In this study, the following were quantified, based on 130 climate stations, not only for the whole of Turkey but also for its seven geographical regions, which are differentiated by their specific geographical conditions: (1) the most robust generic MLR models of monthly average daily solar radiation; (2) their performance in predicting temporal variation; (3) the spatial distribution of the solar radiation data interpolated by universal kriging, which discerns both stochastic and deterministic components of spatial variation; (4) jackknifing validation of temporal predictions; and (5) leave-one-out cross-validation of spatial predictions.