1. Introduction
Solar photovoltaic (PV) devices provide 40% of the world’s renewable electricity capacity. This is expected to increase by 825 GW by 2021 [1]. Photovoltaic installations in the UK stood at 11 GW as of the end of August 2016 [2]. The economics of these systems depend on a multitude of factors, but one major element is the perceived risk associated with their performance, i.e., the deviation of real performance from predicted performance. It has been shown that one of the main determinants of modelling accuracy is the prediction of the plane-of-array irradiance [3]. Meteorological data is usually only available as total global horizontal irradiation at specific locations and thus needs to be translated to plane-of-array (PoA) irradiance, which introduces significant uncertainty into the overall modelling chain. PoA irradiation may be obtained from horizontal radiation in the following sequence:
- (1)
Interpolate weather station readings of global horizontal irradiation to produce a national map of values.
- (2)
Compute the solar declination angle, an essential input to the clearness index (kt). kt is a component of the next stage.
- (3)
Split global horizontal irradiation into its components: beam and diffuse irradiation.
- (4)
Convert each component from horizontal radiation to the PoA inclination and azimuth.
- (5)
Sum the results: beam on tilted surface plus diffuse on tilted surface.
- (6)
Allow for albedo.
Total tilt irradiance = beam on tilted surface plus diffuse on tilted surface plus ground-reflected.
This final figure is now appropriate for PV performance estimations.
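Step 2 of the sequence above can be sketched in a few lines. This is a minimal illustration, not the method used in this research: it assumes Cooper's approximation for the declination (the Strous equation is the one employed later in the paper), a solar constant of 1367 W/m², and a single mid-hour hour angle to represent each hourly value. All function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m2; assumed value, not taken from the paper


def declination_deg(day_of_year):
    """Solar declination in degrees (Cooper's approximation, an assumption)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def extraterrestrial_horizontal(day_of_year, latitude_deg, hour_angle_deg):
    """Extraterrestrial irradiance on a horizontal plane (W/m2); 0 at night."""
    phi = math.radians(latitude_deg)
    delta = math.radians(declination_deg(day_of_year))
    omega = math.radians(hour_angle_deg)
    cos_zenith = (math.sin(phi) * math.sin(delta)
                  + math.cos(phi) * math.cos(delta) * math.cos(omega))
    # Simple correction for the varying Earth-Sun distance.
    eccentricity = 1.0 + 0.033 * math.cos(math.radians(360.0 * day_of_year / 365.0))
    return max(0.0, SOLAR_CONSTANT * eccentricity * cos_zenith)


def clearness_index(ghi, day_of_year, latitude_deg, hour_angle_deg):
    """kt = global horizontal / extraterrestrial horizontal; None at night."""
    i0h = extraterrestrial_horizontal(day_of_year, latitude_deg, hour_angle_deg)
    if i0h <= 0.0:
        return None
    return ghi / i0h


# Example: solar noon on day 172 at a latitude of 52.77 degrees north.
kt_noon = clearness_index(600.0, 172, 52.77, 0.0)
```

The hour angle is 0° at solar noon and changes by 15° per hour; a production implementation would integrate over the hour rather than sample its midpoint.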
Work has already taken place in Loughborough on Stages 3 and 4 [4]. This paper focuses on selecting the optimum models for interpolating global horizontal irradiation and for deriving horizontal beam and diffuse irradiation from it. The research is limited to models suitable for hourly datasets and therefore to those capable of automatically processing large numbers of results.
2. Interpolation Outline
2.1. Why Interpolate?
Solar irradiation data is normally obtained from one of two sources: ground observations or satellite-derived data. Quality ground-measured data is usually accepted as more accurate than satellite-modelled data [5]. However, satellite-modelled data is frequently quoted as being more accurate at distances greater than 34 km from a weather station, based on the work of Perez et al. [6]. Weather stations in the UK are, on average, only 40 km apart. A 34 km radius around UK weather stations covers 90% of the country. The gaps are in the Welsh and Scottish mountains, which are unfit for solar installations due to slope. Satellite-based data is also less suitable for the UK because it has reduced accuracy at latitudes greater than 50°, in coastal zones and in regions with persistent cloud cover [7]. Due to the good network of weather stations and the limitations of satellite modelling in these specific circumstances, interpolation of ground measurements was used in this research.
2.2. Review of Interpolation Techniques
Interpolation is the process of filling gaps between sample observations to produce a grid of values [
8]. There are a dozen or more methods, each with a set of up to ten criteria. Prominent examples include linear regression, nearest neighbour, inverse distance weighting, spline (a polynomial function) and kriging. Confusingly, names vary between authors and computer packages. Different techniques can produce very different results [
9]. The chosen tool should be accurate, robust, flexible enough to handle large and varied datasets with input errors, computationally efficient and easy to use. No existing method can satisfy all of these conditions. For this reason, the approach here is to single out only the best known interpolation techniques. Having narrowed down the number of possibilities, several will be examined and the most suitable selected in terms of appropriateness of match to the input data. The original data is evaluated in terms of quantity, spread across the sample area and trends in direction.
Linear regression fits a straight line through the data points. It may be criticised for over-simplifying complex real-world processes. Nearest neighbour (also known as triangulation or Thiessen) obtains the value of a grid cell from the three adjacent points. It requires many data points in order to work well—a feature which meteorological data does not possess. As with the previous procedure, it lacks realism.
The inverse distance weighted (IDW) interpolation method is frequently used. It estimates values as weighted averages, allocating the greatest weights to the nearest points to produce a smooth distance decay effect. However, solar irradiance does not vary continuously. It may display sudden change because of rapid alteration in cloud cover.
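The distance decay effect can be illustrated with a short sketch. The power of 2 is a common default and is an assumption here, as are the function name and the planar coordinates in kilometres.

```python
import math


def idw_predict(stations, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    stations: list of (x_km, y_km, value) tuples
    target:   (x_km, y_km)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations 10 km apart; the estimate leans towards the nearer one.
stations = [(0.0, 0.0, 100.0), (10.0, 0.0, 300.0)]
midpoint = idw_predict(stations, (5.0, 0.0))
near_first = idw_predict(stations, (2.0, 0.0))
```

Because the weights depend on distance alone, the resulting surface is always smooth between stations, which is the limitation noted above for rapidly changing cloud cover.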
Spline uses “rubber-sheeting”. A surface is constructed which passes through known points whilst minimising the overall surface curvature. Spline works well when it is acceptable for the calculated values to exceed maximum and minimum input points and when the number of sample values is comparatively small. It is not appropriate where sample points are clustered together and have extremely different readings (as can be the case with weather stations in London). In addition, spline needs a gentle data variation. As mentioned above, solar irradiance does not exhibit this characteristic.
Kriging is a complex interpolation technique. Like IDW, the closest measurements exert the greatest influence. In contrast to IDW’s simple distance-based algorithm, kriging applies weights derived from a semi-variogram. The semi-variogram is a graph which models the difference between a value at one location and the value at another location according to the distance and direction between them.
Kriging has proven useful and popular in many fields [10]. It is utilised by the most respected solar insolation databases, the Photovoltaic Geographical Information System (PVGIS-3, European Commission, Joint Research Centre) and Meteonorm (Meteotest, Bern, Switzerland; up to Version 6), and is a good choice where the sample points are poorly distributed or there are few of them. It can also mirror directional bias in the data. (According to the UK Met Office, solar radiation increases from north to south and from east to west, for example from Kent westwards to Wiltshire and Dorset.) Furthermore, kriging has the advantage that it is a geostatistical method. This means that it encompasses autocorrelation (i.e., everything is related to everything else, but near things are more related than distant things). This statistical relationship between the measured points enables estimation errors (kriging variance) to be calculated. All the previously discussed interpolation methods are deterministic and therefore do not include autocorrelation and cannot provide error calculations. On account of its widespread use, suitability for the data and error calculations, some form of kriging will be investigated in this research.
3. Decomposition/Separation Model Appraisal
A search of the literature has revealed a large number of models which separate beam and diffuse from global horizontal irradiation. De Miguel identified 250 such models [
11] since the ground-breaking work of Liu and Jordan [
12]. Comprehensive appraisals may be found in [13,14]. A representative few are reviewed here. Liu and Jordan’s early model uses a piece-wise first-order fit of the clearness index to derive the diffuse fraction. Most later models augment Liu and Jordan’s, still proceeding piece-wise by binning the data into three clearness index divisions. A feature common to all of them is that they are parameterised by local or regional observational data. Some models are suitable only for monthly data, whilst others accept all lengths of time step, down to seconds. That said, smaller time steps are known to generate larger random errors.
Separation (also known as decomposition) models may be categorised according to the number of contributory variables they demand (one, two or many). The de Miguel (Climatic Synthetic Time Series for the Mediterranean Belt (CLIMED)), Erbs, Orgill, and Reindl (No. 1) models are fitted by the measured diffuse fraction only [
11,
15,
16,
17]. Reindl’s second model additionally employs solar elevation, that is, it is bivariate [
18]. Maxwell’s Direct Insolation Simulation Code (DISC) model is more complex since it requires sun zenith angle, day of year, and average site atmospheric pressure [
19]. Perez et al. [
20] modified the DISC model by binning inputs according to sky condition. A more recent multivariate model is that of Ridley et al. [
21,
22]. Unlike prior research, this utilises a logistic function to include solar altitude and a persistence factor into just one equation, i.e., there is no binning.
Separation models are site-dependent [
23]. In this research, the aim is to determine the most suitable separation model for weather conditions in Great Britain and to discover whether the same paradigm is applicable to the UK as a whole. Yang et al. [
24] and Gueymard [
25] find that sophisticated models with several inputs do not improve upon the simpler approaches, in contrast to Gueymard’s earlier conclusion, when he discovered more detailed models offer enhanced performance [
14]. (More recent work from these authors has focused on 1-min models [
26,
27].) This lack of consensus led the current author to test models from two categories. Two univariate models (de Miguel and Erbs [
11,
15]) and one complex model (Ridley et al. [
21,
22]) were trialled.
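As an illustration of a univariate, piece-wise model of this kind, the sketch below implements the Erbs correlation with the coefficients commonly quoted for it; they should be checked against [15] before use. The function names are hypothetical, and the horizontal beam component is simply taken as the remainder of the global value.

```python
def erbs_diffuse_fraction(kt):
    """Diffuse fraction as a piece-wise function of the hourly clearness index."""
    if kt <= 0.22:
        return 1.0 - 0.09 * kt
    if kt <= 0.80:
        return (0.9511 - 0.1604 * kt + 4.388 * kt ** 2
                - 16.638 * kt ** 3 + 12.336 * kt ** 4)
    return 0.165


def split_global(ghi, kt):
    """Return (beam_horizontal, diffuse_horizontal) for one hourly value."""
    diffuse = erbs_diffuse_fraction(kt) * ghi
    return ghi - diffuse, diffuse


# Hypothetical hour: 400 Wh/m2 of global horizontal irradiation at kt = 0.5.
beam, diffuse = split_global(400.0, 0.5)
```

The three branches correspond to the three clearness index bins described in the review above: overcast, intermediate and clear skies.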
4. Methodology
The method falls into two stages: kriging (interpolation) and decomposition (separation). It allows the two parts to be performed in either order, i.e., “kriging–decomposition” or “decomposition–kriging”, for the reasons detailed below.
Separation may be performed before interpolation, so that the kriging creates countrywide maps of beam and diffuse irradiation. Alternatively, interpolation may take place first, producing UK coverage of global horizontal irradiation. One or more points may then be selected and the decomposition algorithm applied to synthesise the beam and diffuse components for those locations only. The order in which the steps are performed depends on convenience for the user and the final output required; there is little impact on the eventual accuracy. Burgess [
28] found that interpolation before separation gave a smaller mean bias but had no effect on root mean square error (RMSE). Carrying out decomposition first will furnish information on nationwide cloud cover. On the other hand, starting with interpolation can be more expedient. This is so when the user intends to proceed to the next stage and incorporate a transposition model to calculate PoA irradiation, which must always come last because tilt and orientation are site specific. Just one map of interpolated global horizontal data needs to be loaded and searched for the points of interest, rather than two maps of beam and diffuse values. Moreover, it can be useful to have the interpolation and separation procedures available as two separate packages so that beam and diffuse figures can be estimated directly from original Meteorological Office or University weather station readings. It can be helpful to have the option to use original, non-interpolated data when the place under investigation is very close to a pyranometer position. For instance, it is only 2.2 km from the site of a proposed new solar farm on the outskirts of Loughborough to the University where irradiation is recorded.
5. The Kriging Stage
This section of the paper details the data, software, models and parameters employed by this research for kriging hourly global horizontal irradiation nationally for the UK. The results are validated and displayed.
Initially the four inter-related steps involved in kriging are elaborated: (1) calculate the empirical variogram; (2) fit a model; (3) create the matrices; and (4) make a prediction. The distinct forms of kriging are discussed and ordinary kriging is selected after evaluating the features of the different algorithms. Its precise mathematical structure (semi-variogram) is chosen by comparing the results of five selection methods (matching spatial autocorrelation, data visualisation, manual fitting of semi-variograms, cross-validation and ability to represent reality). Semi-variogram parameters are fitted automatically. The number of input points is determined by RMSE. Both pragmatic and scientific methods of specifying output pixel size were trialled before accepting an approximation guideline dictated by time and processing limits.
Eventually the following kriging model is chosen: ordinary kriging, exponential semi-variogram, automated parameters, all input points, 2.5 km grid. This is validated by leave-one-out cross validation and kriging variance. The results are then meaningfully classified for presentation as thematic maps.
5.1. Current Progress in the UK and Europe
Very little research has been published on the subject of selection of appropriate interpolation techniques for solar insolation in British weather conditions. A literature review discovered just two references [
28,
29], both of which prefer kriging to IDW. Turning to Europe, research on this topic has taken place in regions with rather different climates to the UK. Alsamamra et al. [
30] employed residual (related to universal) kriging in the complex mix of climatic types which exist in southern Spain; Bezzi and Vitti [
31] used universal kriging in an alpine region of Italy, whilst Caglayan et al. [
32] chose universal kriging in Turkey because of the robust average annual trends displayed by global solar radiation in that country. The tangential topic of spatio-temporal kriging has been covered by [
33,
34,
35].
5.2. Data and Software
The UK Meteorological Office currently has a network of approximately 85 automatic weather stations throughout the UK which observe irradiation as well as other meteorological conditions. The data is aggregated to hourly timestamps before being made available for public use as the MIDAS (Meteorological Office Integrated Data Archive System) database hosted by the British Atmospheric Data Centre (BADC) of the Centre for Environmental Data Archival [
36].
Two software packages were used in the kriging research phase. ArcGIS (ArcGIS Desktop: Release 10, Environmental Systems Research Institute (ESRI), Redlands, CA, USA) was used for exploration and visualisation of initial results. It contains a good range of interpolation models. Once the technique was worked out, the free, open-source R software (automap package, Version 3.3.2, R Foundation for Statistical Computing, Vienna, Austria) was preferred for automatic processing and because it is easy to parallelise for big data.
5.3. Kriging Operations
Kriging comprises an initial data exploration, followed by a predictive process. The empirical semi-variogram of the input data is plotted. This is the first use of data to estimate the spatial dependency of the data. Afterward, the theoretical semi-variogram is fitted to the points forming the empirical semi-variogram. This is a second use of data to predict values at unsampled locations.
In kriging, the estimations, i.e., output grid pixels Ẑ, are calculated as weighted averages (weights Wi) of the n known input point values (Zi) (Equation (1)):
$$\hat{Z} = \sum_{i=1}^{n} W_i Z_i \qquad (1)$$
W is based on autocorrelation measures (semivariance), that is, the weight of each point decreases as distance to the point increases. A number of processes are involved:
- (1)
Construct the empirical semi-variogram (see
Section 5.5).
- (2)
Fit a theoretical semi-variogram model to the empirical semi-variogram (see Section 5.6).
- (3)
For all possible input point pairings, determine the straight-line distance between the points and substitute it into the chosen theoretical semi-variogram model (see later). Put differently, each point-pair distance is converted to a semi-variance by reading the value off the user-selected semi-variogram curve. The semi-variogram values obtained fill a data covariance matrix (dcm), to be inverted in preparation for subsequent use (idcm). It is necessary to replace the empirical semi-variogram with a theoretical one to comply with mathematical laws for the kriging equations to be solved.
- (4)
For each output pixel (Ẑ) whose irradiation value is to be predicted, create a vector of distances between itself and all input points. Again, replace each distance with the semi-variance obtained from the semi-variogram graph to create an output pixel semi-variance vector (opsv).
- (5)
Generate a vector of weight factors (w) by multiplying the inverted input points semi-variogram matrix (idcm from Step 3) by the output pixel semi-variogram vector (opsv from Step 4). This is possible on the grounds that the kriging equation, Ẑ = Sum(Wi × Zi), can be expressed as opsv = w × dcm. Re-arranging the equation gives w = idcm × opsv.
- (6)
Finally, for every output pixel, calculate the predicted irradiation value by multiplying each entry in the weight factors vector w by the original input point measurements and summing the set of products. In this case, the irradiance recorded at each weather station is multiplied by a weight and the results totalled for 85 locations. The weights have to be recalculated for each output pixel because the distances to the input points (weather stations) constantly change as the algorithm moves on to make the next prediction. The dcm matrix stays the same but opsv and therefore w constantly change.
This is the basis of simple kriging [
37].
As highlighted previously, one of the benefits of using this approach is that it enables spatial configuration to be quantified. The error variance is worked out by multiplying the weight factors vector, w (result of Step 5) with the output pixel semi-variance vector, opsv (result of Step 4) and totalling the products. The standard error or standard deviation is the square root of the error variance.
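Steps 3 to 6 and the error variance calculation can be sketched as follows. This is an illustrative sketch only, under several assumptions: an exponential semi-variogram with hypothetical parameters, planar coordinates in kilometres, and an extra row and column in the matrix that force the weights to sum to one (the ordinary kriging constraint, which the steps above do not spell out). The names `krige` and `solve` are hypothetical, and the linear system is solved directly rather than by forming the inverse matrix.

```python
import math


def exponential_semivariance(h, nugget, sill, range_km):
    """Assumed exponential model; the semi-variance is 0 at zero distance."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / range_km))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def krige(points, values, target, nugget=0.0, sill=1.0, range_km=50.0):
    """Return (prediction, error_variance, weights) for one output pixel."""
    n = len(points)

    def gamma(p, q):
        return exponential_semivariance(math.dist(p, q), nugget, sill, range_km)

    # Step 3: station-to-station semi-variances (dcm) plus the constraint row/column.
    dcm = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    dcm.append([1.0] * n + [0.0])
    # Step 4: semi-variances between the output pixel and every station (opsv).
    opsv = [gamma(p, target) for p in points] + [1.0]
    # Step 5: weights w (and a Lagrange multiplier mu) from dcm and opsv.
    solution = solve(dcm, opsv)
    w, mu = solution[:n], solution[n]
    # Step 6: prediction as the weighted sum of the station measurements.
    prediction = sum(wi * zi for wi, zi in zip(w, values))
    # Error variance: weights multiplied by opsv and summed, plus the multiplier.
    variance = sum(wi * gi for wi, gi in zip(w, opsv[:n])) + mu
    return prediction, variance, w


# Four hypothetical stations (km) with hourly irradiation values (Wh/m2).
pts = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0), (50.0, 50.0)]
vals = [100.0, 200.0, 300.0, 400.0]
at_station = krige(pts, vals, (30.0, 0.0))
between = krige(pts, vals, (20.0, 20.0))
```

Only `opsv` and the weights change from pixel to pixel, so in a full implementation the station-to-station matrix would be factorised once and reused for every output cell.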
5.4. Forms of Kriging
Having identified kriging as the interpolation method to be used, there is still a convoluted set of choices to be made, summarised in the decision tree in
Figure 1.
Simple kriging assumes a known mean, which frequently poses problems. Unlike simple kriging, ordinary kriging limits the number of points used to calculate the output pixel by dictating a limiting distance and cut-off values for the number of points. The other forms are far more complex. Universal kriging recognises local trends or drifts, creating a gradually changing lattice overlain by regional limits chosen by the user. Indicator kriging classifies the input variables, as does probability kriging. Neither is suitable for data containing trends. Disjunctive kriging has specific data distribution requirements and entails decisions that are difficult to justify. The trends existing in solar irradiation data are elaborate and defy straightforward explanation (see Appendix for example). This rules out universal kriging, which needs elementary dominant trends. Ordinary kriging is generally applicable and is the most commonly used form of kriging. It is the default choice and should be accepted unless there is a strong mathematical rationale for doing otherwise [38].
5.5. Semi-Variogram Type
As indicated above, kriging partitions the spatial variation of natural phenomena into three: a deterministic trend or drift; a random spatially correlated element; and uncorrelated noise [39]. The characteristics of the spatially correlated part may be described by the semi-variogram. The object is to select the optimal values for the interpolation weights. The semi-variogram is constructed as follows:
- (1)
Measure the distance between two locations.
- (2)
Calculate half the squared difference between the values at the two locations. On the x-axis is the distance between the locations (or simplified distance, grouped into lag bins, h), and on the y-axis is half the squared difference of their values, i.e., the semivariance, y(h). Thus, for the purposes of this research, x = distance in km, whilst y = [(irradiation at location i − irradiation at location j)²] ÷ 2.
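The two steps above, applied to every pair of stations and averaged within lag bins, give the empirical semi-variogram. A minimal sketch follows; the lag width, number of lags and function name are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, lag_width_km, n_lags):
    """Return a list of (lag_centre_km, mean_semivariance, pair_count)."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(points)), 2):
        h = math.dist(points[i], points[j])        # step 1: distance
        lag = int(h // lag_width_km)
        if lag < n_lags:
            # Step 2: half the squared difference of the two values.
            sums[lag] += 0.5 * (values[i] - values[j]) ** 2
            counts[lag] += 1
    result = []
    for lag in range(n_lags):
        if counts[lag]:
            centre = (lag + 0.5) * lag_width_km
            result.append((centre, sums[lag] / counts[lag], counts[lag]))
    return result


# Three hypothetical stations on a line, 10 km apart.
pts = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
vals = [100.0, 120.0, 160.0]
bins = empirical_semivariogram(pts, vals, lag_width_km=10.0, n_lags=3)
```

Bins that contain no station pairs are omitted from the result, so sparse networks yield fewer empirical points to fit.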
There are several theoretical semi-variogram models (e.g., spherical, exponential, gaussian). Bailey and Gatrell [
40] give details of the graphs and equations. The parameters of the theoretical semi-variogram (drawn in
Figure 2) must be optimised to obtain the best fit to the empirical.
As point-pair distances on the semi-variogram plot increase (proceeding to the right of the x-axis), spatial dependency decreases, until it reaches a value (the range) where the graph flattens and it ceases entirely. The value of semivariance on the y-axis at which this event occurs is called the sill. In theory, points at the same location should have identical values (e.g., of insolation). Therefore, the plot should pass through (0,0) on the axes. In actual fact, the intercept occurs at a low value on the y-axis, known as the nugget. This represents spatially uncorrelated random noise in the data such as measurement errors. The names of the semi-variogram parameters are gold-mining terms, reflecting the historical origins of kriging.
5.6. Choice of Theoretical Semi-Variogram and Optimisation of Parameters
Advice on how to fit a model to an empirical semi-variogram varies. Bohling [
42] describes the exercise as “more of an art than a science” and is of the opinion that because empirical semi-variograms routinely contain errors and corrupt data, model selection may be influenced by subjective judgment. In an attempt to avoid bias, the author has identified five different methods of selecting a model. The results of applying these to irradiation data are detailed in Appendix A.2 and summarised in the next section.
The semi-variogram has about a dozen forms. Not all of these are recommended. Several authors including Herzfeld [
43] state that only positive-definite models should be used. Positive definiteness means the kriging equation can be solved and kriging variance is positive [
44]. This characteristic is hard to prove.
The four positive definite models are:
- (1)
Spherical: this plot is linear close to the origin, making it suitable for the depiction of phenomena with close range variability. It demonstrates a progressive decrease of spatial autocorrelation until it reaches the sill (top of the semi-variogram curve), where autocorrelation is zero.
- (2)
Exponential: this is also linear near the origin but the exponential model differs from the spherical in that it approaches the sill gradually. Autocorrelation only ceases at infinity.
- (3)
Gaussian: the gaussian model traces a parabolic curve at the origin, representing smoothly varying properties. Like the exponential, it rises gradually to a straight sill at infinite distance.
- (4)
Linear: this model resembles the side of a trapezoid. It factors in a cease in autocorrelation between point-pairs at a determinable distance.
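The four shapes can be compared numerically with the sketch below. The parameterisation is one common convention and is an assumption: `sill` is the total sill including the nugget, `rng` is the range parameter in kilometres, and the linear model is the bounded variant that levels off at the range. Other texts scale the exponential and gaussian ranges differently.

```python
import math


def spherical(h, nugget, sill, rng):
    """Linear near the origin; reaches the sill exactly at the range."""
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, rng):
    """Linear near the origin; approaches the sill asymptotically."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))


def gaussian(h, nugget, sill, rng):
    """Parabolic near the origin; approaches the sill asymptotically."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-(h / rng) ** 2))


def linear_with_sill(h, nugget, sill, rng):
    """Straight line up to the range, flat thereafter."""
    if h >= rng:
        return sill
    return nugget + (sill - nugget) * h / rng


# At a short distance the gaussian model rises far more slowly than the exponential.
short_gaussian = gaussian(5.0, 0.0, 100.0, 50.0)
short_exponential = exponential(5.0, 0.0, 100.0, 50.0)
```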
5.6.1. Summary of Variogram Selection
Table 1 sums up the findings of the various methods of fitting a suitable semi-variogram model (see
Appendix A.2 for details). It can be seen that the exponential model receives slightly more recommendations. It is reliable, produces a detailed surface and is endorsed by the well-documented cross-validation method. For this reason, large quantities of results were generated with this model.
5.6.2. Setting Parameter Values
Having specified the model, the parameters of the semi-variogram curve (nugget, range and sill) need to be given values which minimise deviation from the empirical points. According to the literature, parameter values are best set manually, or failing that by cross-validation. However, the object is to avoid going through thousands of variograms assessing the fits by eye or comparing large statistical datasets. For this reason, the autofitVariogram function from the R automap package is employed. This calculates parameters as follows:
Sill—the mean of the maximum and median semi-variance values;
Range—0.1 multiplied by the value read from the diagonal of the bounding box of the map;
Nugget—the minimum of the semi-variance.
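The three rules listed above can be written out directly. The sketch below is not the automap source code; it only reproduces the rules as stated, with a hypothetical function name and a bounding box given in kilometres. In the automap documentation these quantities are described as initial values for the subsequent fit.

```python
import math
from statistics import median


def initial_variogram_parameters(semivariances, bbox):
    """Starting values for sill, range and nugget.

    semivariances: empirical semi-variance values, one per lag bin
    bbox:          (xmin, ymin, xmax, ymax) of the map, in km
    """
    xmin, ymin, xmax, ymax = bbox
    diagonal = math.hypot(xmax - xmin, ymax - ymin)
    return {
        "sill": (max(semivariances) + median(semivariances)) / 2.0,
        "range": 0.1 * diagonal,
        "nugget": min(semivariances),
    }


# Hypothetical empirical semi-variances and a 300 km by 400 km map extent.
params = initial_variogram_parameters([200.0, 500.0, 900.0, 1800.0],
                                      (0.0, 0.0, 300.0, 400.0))
```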
Another necessary decision is: should the output value for each location be determined using all the input points or a specified number? Guidance in the literature ranges from a minimum of 20 points to 100 points. Experimentation revealed that using fewer points (e.g., 10) creates a surface which is, to some extent, more detailed than using all 85 weather stations. However, the more points used, the lower the RMSE. Therefore, since there was access to sufficient computer processing power, the eventual maps were calculated using all the input points.
The final decision concerns the output grid cell (pixel) size. Cell size needs to be detailed enough for future analysis whilst remaining feasible, because this is a computationally demanding algorithm. Naturally, there is a considerable effect on results, depending on which pixel the point of interest falls in. As with most kriging decisions, there is no preferred method of selecting a suitable grid resolution for output maps. The results of applying several methods to the irradiation data are outlined in Table 2.
Pixel size varies from 1 km to 5 km. There is a need to balance processing time against accuracy. The 1 km map, although possible, did take some time (overnight with ArcGIS for one map) to generate. Therefore, the 2.5 km cell size was decided on as a trade-off between time and detail (10 min with ArcGIS for one map).
5.7. Synopsis of Kriging Decisions
It is comparatively easy to choose kriging options manually for one data set but circumstances dictate that a compromise solution, capable of fitting thousands of hourly datasets, is needed for automated processing. The selections detailed above are presented as suitable for this purpose. Unlike much research employing kriging which simply accepts default options (usually the spherical model), here all choices are scientifically justified to achieve optimal results.
The problem is that there is no “gold standard” solar irradiation map to enable comparison of generated surfaces. Insolation is recorded at comparatively few locations. RMSE can only be estimated for these locations but modelled values can show over 100 Wh/m², i.e., about 10% difference between weather stations. On the other hand, kriging does have the advantage that it provides the ability to calculate error variance. This provides an indication of where on the map the interpolated values are least trustworthy.
There are more refined statistical techniques for prediction at points without data [
38] but these are labour intensive, involving up to four different error calculations for each hourly dataset. After an initial trial, it was found impossible to validate output for thousands of prediction surfaces by this method. The simpler statistics described in the
Appendix and viewing videos of results were preferred.
If there were plenty of well-dispersed weather stations (about 200 for the UK), the choice of kriging model and parameters would be less important but this is not the case. Consequently, kriging options must be carefully selected since they exert a noticeable influence on output. In conclusion, the chosen kriging options are ordinary kriging, exponential semi-variogram, sill and nugget from semi-variance, range from map size, all sample points, 2.5 km grid.
5.8. Success of the Kriging Choices
Interpolation methods can be validated using three techniques: data splitting, cross validation and calculation of the kriging variance [
46]. Data splitting cannot be carried out for the UK irradiation data because the number of observations is relatively low and the weather stations are not evenly dispersed. Therefore, cross validation was scrutinised.
The input data was 51,164 hourly global horizontal solar irradiation datasets for 2005–2014 from MIDAS (all the daylight hours for this period). Kriging took approximately 35 h of actual computer time using an i7 32 GB computer (8300 CMT, Hewlett Packard, Palo Alto, CA, USA) parallelised on all eight cores.
Averaged over the period 2005–2014, the kriging process yields an average hourly cross-validation RMSE of 56 Wh/m² (11%) and an average maximum cross-validation RMSE of 211 Wh/m² (42%). This compares very favourably to PVGIS (yearly average cross-validation RMSE 146 Wh/m²/day (4.5%)).
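Leave-one-out cross-validation, the source of the RMSE figures above, removes each station in turn, predicts its value from the remaining stations and compares the prediction with the measurement. The sketch below shows the loop for one hourly dataset. To keep it short, a nearest-station predictor stands in for the kriging model; that substitution, and all names, are assumptions made for illustration.

```python
import math


def loocv_rmse(points, values, predict):
    """Leave-one-out cross-validation RMSE for any predictor function."""
    squared_errors = []
    for i in range(len(points)):
        train_points = points[:i] + points[i + 1:]
        train_values = values[:i] + values[i + 1:]
        estimate = predict(train_points, train_values, points[i])
        squared_errors.append((estimate - values[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def nearest_station(train_points, train_values, target):
    """Stand-in predictor (not kriging): value of the closest station."""
    k = min(range(len(train_points)),
            key=lambda j: math.dist(train_points[j], target))
    return train_values[k]


# Three hypothetical stations (km) with hourly irradiation values (Wh/m2).
pts = [(0.0, 0.0), (10.0, 0.0), (25.0, 0.0)]
vals = [100.0, 120.0, 160.0]
rmse_hour = loocv_rmse(pts, vals, nearest_station)
```

Passing the predictor as a function means the same loop can score a kriging routine, IDW or any other interpolator without modification.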
Kriging variance is estimated between measurement locations. It provides a spatial view on the measure of success [46]. In the case of MIDAS data, 50% of the UK has high kriging variance. Standard kriging error increases by approximately 50 Wh/m² for every 25–30 km distance band from the nearest weather station.
Having examined uncertainty, output surfaces can at last be displayed. Even this requires decisions. The rasters exhibit a wide range of unique values and must be classified for viewing. This is a seemingly trivial exercise but there are innumerable choices of symbolisation. The author has found that over-simple classification can result in as much loss of detail as a large cell size or a smoother kriging model.
The majority of the datasets contain uniformly distributed values, so an equal interval classification method is appropriate. The Matlab jet colour scheme is familiar to the intended audience. The audience is accustomed to interpreting complex issues, hence a fairly large number of classes may be used. Sturges’ rule (class number = 1 + 3.3 log10(number of observations)) suggests seven classes. Experimentation showed that more than nine classes may be confusing if weather patterns are fragmented rather than trending. In the event, it was found convenient to divide the kriged hourly global horizontal irradiation data into 12 equal classes of 100 Wh/m² each, representing a maximum hourly annual range in the UK of 0–1200 Wh/m². In practice, only six bands at most appear in one hourly map. That is, in any one hour, global horizontal irradiation is no more than 600 Wh/m² greater in Cornwall than in Scotland.
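The class count and the equal interval scheme can be checked with a few lines. The sketch assumes the 85 weather stations as the number of observations and clamps values at the top of the 0–1200 Wh/m² range into the last class; the function names are hypothetical.

```python
import math


def sturges_classes(n_observations):
    """Number of classes suggested by Sturges' rule, rounded to the nearest integer."""
    return round(1 + 3.3 * math.log10(n_observations))


def equal_interval_class(value_wh_m2, width=100.0, n_classes=12):
    """Zero-based class index for an hourly irradiation value."""
    index = int(value_wh_m2 // width)
    return min(max(index, 0), n_classes - 1)


suggested = sturges_classes(85)        # 1 + 3.3 * log10(85) is about 7.4
band = equal_interval_class(650.0)     # falls in the 600-700 Wh/m2 band
```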
Figure 3 illustrates a small sample of the 51,164 surfaces created by this technique.
Figure 4 is the average of the 5045 hourly irradiation maps for 2013. This masks hourly variations and results in the southwest to northeast irradiation trend expected of Great Britain.
6. The Separation Stage
The plan was to obtain measured values of global horizontal, beam and diffuse irradiation. A series of models was then applied to generate beam and diffuse irradiation from the measured input global horizontal. The models were as follows. First, the Strous equation [
47] was employed to generate the solar declination angle which is an input to the separation model. Next, the Erbs et al. [
15], De Miguel et al. [
11] and Ridley et al. [
21,
22] separation models were tested. The diffuse values calculated by these models were compared to the measured diffuse values using the mean bias error (MBE) and RMSE statistics. The separation model achieving the closest match to measured values (i.e., lowest errors) was selected.
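The two statistics can be written out as follows. The sign convention (modelled minus measured, so that a positive MBE indicates overestimation) is an assumption, as are the function names and sample values.

```python
import math


def mean_bias_error(modelled, measured):
    """Average of (modelled - measured); positive means overestimation."""
    return sum(m - o for m, o in zip(modelled, measured)) / len(measured)


def root_mean_square_error(modelled, measured):
    """Square root of the mean squared difference."""
    squared = [(m - o) ** 2 for m, o in zip(modelled, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hourly diffuse values in Wh/m2.
modelled = [110.0, 90.0, 105.0]
measured = [100.0, 100.0, 100.0]
mbe = mean_bias_error(modelled, measured)
rmse = root_mean_square_error(modelled, measured)
```

MBE lets positive and negative errors cancel, so a model can show a small bias alongside a large RMSE; reporting both distinguishes systematic offset from scatter.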
6.1. Data
The component parts of irradiation are recorded at just a few Principal Radiation Stations using a pyranometer fitted with a ring that obscures the sun. Following automation about 10 years ago, there are just two of these stations in the UK: one at Camborne, east of St Ives Bay in Cornwall, and one at Lerwick, capital and main port of the Shetland Islands. The UK Meteorological Office operates Kipp and Zonen AP2 trackers with instruments measuring horizontal global, horizontal diffuse, planar direct, and downwelling longwave irradiation at these two sites. The data is recorded at minute intervals and archived by the Baseline Surface Radiation Network (BSRN, http://bsrn.awi.de/). In order to be compatible with the hourly MIDAS data employed for kriging in this research, hourly aggregated means from the Global Atmosphere Watch (GAW) dataset at the World Radiation Data Centre (WRDC) in St Petersburg (http://wrdc.mgo.rssi.ru/) were utilised. Hourly data for 2013 was downloaded.
This research aims to investigate the nationwide applicability of decomposition models. Therefore, data from the recently commissioned Solys 2 Solar Tracker at Loughborough University was also used to test beam/diffuse split calculations. Although only available for one year (19 March 2015 to April 2016), this data is compatible with that of the UK Meteorological (Met.) Office, since the new pyranometer provides WMO-GAW-BSRN level performance. The data is recorded at one-second intervals and was aggregated to hourly values to match the publicly available Camborne and Lerwick data. Hence, data has been obtained for sites in the north, centre and south of the country.
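The aggregation of sub-hourly readings to hourly values can be sketched as follows. This is a minimal illustration, not the procedure used by the WRDC or in this research: the function name `hourly_irradiation`, the sample data and the choice to label each hour by its start time are all assumptions. It relies on the fact that the mean irradiance over one hour in W/m2 is numerically equal to the irradiation received in that hour in Wh/m2.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_irradiation(samples):
    """Aggregate (timestamp, irradiance in W/m2) samples to hourly Wh/m2.

    The mean irradiance over an hour (W/m2) equals the irradiation
    received during that hour (Wh/m2). Hours are labelled by their start
    time, which is an assumption made for this sketch.
    """
    buckets = defaultdict(list)
    for timestamp, irradiance in samples:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(irradiance)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical minute-resolution readings: 500 W/m2 for one hour, then 300 W/m2.
start = datetime(2015, 6, 1, 12, 0)
samples = [(start + timedelta(minutes=i), 500.0 if i < 60 else 300.0)
           for i in range(120)]
hourly = hourly_irradiation(samples)
```

The same function works for one-second data, since it only groups samples by the hour in which they fall. It does not handle gaps in the record; hours with few samples would need separate quality control.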
6.2. UK Weather
The UK has a temperate maritime climate, but it is famous for its variability. In addition, different parts of the UK have slightly different regional climates:
- (1)
North West—cool summer, mild winter, heavy rain;
- (2)
North East—cool summer, cold winter, moderate rain;
- (3)
South East—warm summer, mild winter, light rain;
- (4)
South West—warm summer, mild winter, heavy rain.
The south coast receives the greatest number of sunshine hours and insolation despite the longer summer days in the north. This is due to southern coastal areas being more likely to be freed of cloud cover by the prevailing south west wind.
The available Met. Office data falls into two of these regional climatic zones (Camborne in the South West and Lerwick in the North East). Loughborough has less rain than either Camborne or Lerwick. It has fewer sunshine hours and is colder than Camborne but is sunnier and warmer than Lerwick. Therefore, a check was carried out to discern whether the same models fitted all locations.
6.3. Software Employed for Decomposition Models
There are several high-level languages equally suitable for this research, e.g., Matlab and Python. However, the decision was taken to employ R software (package solaR), primarily because it was already in use in this project. R is used for meteorological modelling by the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA) and the United States Geological Survey (USGS).
6.4. Irradiation Component Separation Models
The decision was taken to trial two univariate models (Erbs et al. [15] and De Miguel et al. [11]) and one multivariate model (Ridley et al. [21,22]), these being the intra-daily algorithms available in the chosen modelling language (model details are given in Appendix A.3). We refer to them from now on as EKD (Erbs, Klein, Duffie), CLIMED and BRL (Boland, Ridley, Lauret) to comply with R conventions. All three correlate the diffuse fraction with the clearness index. The clearness index is the ratio of global horizontal irradiation measured on the earth’s surface to calculated extra-terrestrial horizontal irradiation at the top of the atmosphere. That is, it is the fraction of extra-terrestrial horizontal irradiation which penetrates earth’s atmosphere. It depends on cloud cover and may range from 0.8 under clear blue skies to practically zero when conditions are overcast. Another factor is the latitude. The UK has relatively high latitudes and therefore may be expected to have a low clearness index because of the longer distance the sun’s rays must travel through the atmosphere under large zenith angles (Table 3).
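To make the relationship between the clearness index and the diffuse fraction concrete, the sketch below computes the clearness index as defined above and applies the piecewise hourly correlation published by Erbs et al. The function names and the input values are hypothetical, and it has not been checked that the solaR implementation used in this research matches this form exactly; the code is an illustration only.

```python
def clearness_index(global_horizontal, extraterrestrial_horizontal):
    """Ratio of measured global horizontal to extra-terrestrial horizontal irradiation."""
    if extraterrestrial_horizontal <= 0:
        raise ValueError("sun below horizon: clearness index undefined")
    return global_horizontal / extraterrestrial_horizontal


def erbs_diffuse_fraction(kt):
    """Hourly diffuse fraction from the clearness index (Erbs et al. correlation)."""
    if kt <= 0.22:
        return 1.0 - 0.09 * kt
    if kt <= 0.80:
        return (0.9511 - 0.1604 * kt + 4.388 * kt**2
                - 16.638 * kt**3 + 12.336 * kt**4)
    return 0.165


# Hypothetical hourly values in Wh/m2.
global_h = 300.0
extraterrestrial_h = 800.0

kt = clearness_index(global_h, extraterrestrial_h)   # 0.375
kd = erbs_diffuse_fraction(kt)
diffuse_h = kd * global_h          # diffuse horizontal irradiation
beam_h = global_h - diffuse_h      # beam horizontal irradiation
```

A univariate model of this kind uses only the clearness index as a predictor. A multivariate model such as BRL adds further predictors, which is why it is treated separately in the text.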
6.5. Results of Irradiation Component Separation Models
Table 4 shows the MBE and RMSE between the calculated global horizontal diffuse irradiation delivered by each separation model and the WRDC/Loughborough Solys 2 measured value. Irradiation values of less than 100 Wh/m2 were filtered out to avoid the inherent inaccuracy in low radiation values. It may be seen that the BRL model delivers the lowest errors for UK locations. This is probably because it has been found to be less dependent on the zenith angle than the other models [28].
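The error statistics and the low-value filter described above can be sketched as follows. Several details are assumptions made for illustration: the function name `error_stats`, the sample values, the sign convention (MBE taken as modelled minus measured, so that a negative value indicates underestimation) and the choice to apply the 100 Wh/m2 threshold to the measured value.

```python
import math


def error_stats(modelled, measured, threshold=100.0):
    """Return (MBE, RMSE) of modelled against measured values, in input units.

    Pairs whose measured value is below `threshold` are discarded.
    MBE is mean(modelled - measured): negative means underestimation.
    """
    pairs = [(mod, mea) for mod, mea in zip(modelled, measured) if mea >= threshold]
    if not pairs:
        raise ValueError("no values at or above the threshold")
    errors = [mod - mea for mod, mea in pairs]
    mbe = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mbe, rmse


# Hypothetical hourly diffuse irradiation values in Wh/m2.
measured = [50.0, 120.0, 200.0, 310.0]
modelled = [70.0, 110.0, 180.0, 300.0]
mbe, rmse = error_stats(modelled, measured)
```

In this example the first pair is dropped by the filter, and the remaining three errors are all negative, so the MBE is negative while the RMSE is positive by construction.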
Plotting all of the modelled values (Figure 5) discloses the tendency of each of the models to underestimate diffuse values, especially during non-winter months (April to October inclusive). Thus, there remain opportunities for enhancement of even the best performing algorithm. The seasonal effect is probably due to the fact that radiation values increase in the summer, so the discrepancy between measured and modelled values is intensified because percentage differences result in higher unit values. The BRL model delivers implausibly low values under clear sky conditions; however, in the UK, these most frequently occur in February and March because cold air cannot hold as much moisture as warm air can. In early spring, irradiation values are reduced in any case.
This graph also shows that the results of the BRL model follow the measured values more closely than those of the other models.
There are only small differences in performance between the models, that is, a maximum of 14 Wh/m2 MBE and 12 Wh/m2 RMSE. These errors may be within the range of pyranometer uncertainty. Despite the minor variations in calculated results, it is still necessary to select the separation technique logically, in order to deliver the required data for PV performance modelling. BRL has been identified as the procedure which most accurately reproduces measured values. This finding is to some extent predictable because it is a universal algorithm, whilst the EKD equation was fitted to US data and the CLIMED to Mediterranean information. Given the regional climate of Lerwick, it is reasonable that the CLIMED equation would not be suitable. From the researcher’s point of view, it is useful to note that the same model is effective for all UK sites tested.
7. Combination of Stages
The following interpolation/separation model combination generates the most accurate results for UK-wide data input and therefore is recommended for PV performance modelling in the UK: ordinary kriging with exponential semi-variogram/BRL separation. Hourly and average annual national maps are produced for five years (2009–2014) for the UK using these algorithms.
Examples of the results of combining the interpolation and separation procedures are given in Figure 6 and Figure 7. Figure 6 presents diffuse irradiation calculated from the kriging/BRL sequence for one location (Loughborough), whilst Figure 7 shows calculated national diffuse irradiation for the UK.
Figure 6 again indicates the underestimation of the BRL model.
Figure 7 gives three examples of national maps of calculated diffuse irradiation values. The June midday map displays high values. The June evening and December midday maps are instances of when sun elevation and therefore irradiation are low. The December map has higher values than the June evening because diffuse irradiation comprises a higher fraction of the total irradiation in the winter. This is particularly so in a high latitude, cloudy location such as the UK.
RMSE values for both the single location and national application were determined. The kriging RMSE for Loughborough was 80 Wh/m2, generated from a dataset of April–December 2015 inclusive, this being a period when both Solys 2 and Met. Office data are available. The overall combined RMSE for Loughborough (from data produced by kriging and the BRL model in sequence and compared to Solys 2 measured diffuse values) was 42 Wh/m2.
The national RMSE of the kriging and separation stages together may be estimated by combining the individual errors as follows (Equation (2)):
Using the national average RMSE of the kriging interpolation algorithm (56 Wh/m2) and the average RMSE of the BRL separation model at the Lerwick, Camborne and Loughborough sites (64 Wh/m2, calculated from values in Table 4), the expected composite UK national RMSE on an hourly time step is 86 Wh/m2. The separation stage is responsible for more than half (54%) of the overall error. These results compare well to related work. Burgess [28] obtained a composite RMSE of 46% following comparison of sequential application of IDW interpolation, EKD separation and Perez transposition [48] models to measured data for one site in Cornwall.
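A common way to combine the errors of two independent stages is a root-sum-of-squares. The sketch below assumes that form; this is an assumption about the combination rule and not necessarily the exact form of Equation (2). The function name `combine_rmse` is hypothetical, and the inputs are the two stage values quoted in the text.

```python
import math


def combine_rmse(*stage_rmse):
    """Root-sum-of-squares combination of independent stage errors."""
    return math.sqrt(sum(r * r for r in stage_rmse))


kriging_rmse = 56.0      # Wh/m2, national kriging average quoted in the text
separation_rmse = 64.0   # Wh/m2, three-site BRL average quoted in the text
composite = combine_rmse(kriging_rmse, separation_rmse)
```

With these rounded inputs the result is about 85 Wh/m2, close to the 86 Wh/m2 reported above. Rounding of the stage values may account for the small difference. The independence assumption is also a simplification, since the text notes that the underestimation of the decomposition model tends to offset the overestimation of the interpolation.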
Considering the succession of the kriging and BRL models used in this research, the underestimation of the decomposition model tends to balance out the overestimation of the interpolation, giving a lower combined error.
8. Conclusions
This research has presented a method for accurate derivation of horizontal diffuse solar irradiation from publicly available data. The procedures selected are applicable for both single locations and UK-wide. Prior to this work, publicly accessible diffuse irradiation data was only obtainable in the form of measured data for just two sites in the UK. The complete irradiation interpolation and beam/diffuse separation sequence described here is fully validated with one year’s measured data.
There are many possible models and model parameters for both the interpolation and separation stages. This work describes how to make a scientific selection. Ordinary kriging was decided upon because it is compatible with the characteristics of solar radiation in the UK. The exponential semi-variogram was chosen for kriging as it results in the lowest cross-validation RMSE. Kriging offers flexibility in terms of input and output data and can generate prediction error maps. The disadvantage is that it also demands a lot of decision-making. There is a large quantity of advice on offer as to how to make those decisions. Some of this guidance is scientific, some pragmatic. It is not clear what is the most appropriate for particular circumstances. This research experiments with a wide range of methods to make the most suitable choices. This is in contrast to the majority of work involving interpolation which often relies on default options supplied by the software or program. The error inherent in the statistical processes cannot easily be removed but selection of less than optimal choices can be avoided.
Turning to the separation stage, the BRL model again provided the lowest RMSE. Kriging decisions require a meticulous approach since values at a given UK location may vary by over 100 Wh/m2 depending on kriging model used. The same cannot be said of the separation models since there is little to choose between them in terms of error values. Nonetheless, this research logically determines the model which is employed. This results in a newer alternative being chosen, rather than relying on a usual long-standing choice.
9. Future Work
Although the choice of model is more important for the interpolation stage than for the separation, it is the separation process which contributes most to the overall error of the two-stage methodology. Even so, both stages present possibilities for improvement.
The BRL model could be enhanced, for instance by an atmospheric aerosol loading factor, or by other means. The UK is subject to sea spray aerosols. Additionally, summer anticyclones may result in increased water holding capacity in the atmosphere and elevated mass concentrations of secondary pollutants.
This paper established that standard kriging errors increase with distance from the closest weather station. Therefore, spatial distribution of solar radiation interpolated from ground-based sensor values could be improved by integration with satellite measurements.
Whilst both stages may be further developed, this research demonstrates the systematic selection and application of the most accurate models currently available to produce countrywide diffuse horizontal irradiation for the UK.