The Ångström-Prescott Regression Coefficients for Six Climatic Zones in South Africa

Abstract: The South African Weather Service (SAWS) manages an in-situ solar irradiance radiometric network of 13 stations and a very dense sunshine recording network, located in all six macro-climate zones of South Africa. A sparsely distributed radiometric network over a landscape with dynamic climate and weather shifts is inadequate for solar energy studies and applications. Therefore, there is a need to develop mathematical models to estimate solar irradiation for a multitude of diverse climates. In this study, the annual regression coefficients, a and b, of the Ångström-Prescott (AP) model, which can be used to estimate global horizontal irradiance (GHI) from observed sunshine hours, were calibrated and validated with observed station data. The AP regression coefficients were calibrated and validated for each of the six macro-climate zones of South Africa using observation data spanning 2013 to 2019. The predictive effectiveness of the calibrated AP model coefficients was evaluated by comparing estimated and observed daily GHI. The maximum annual relative Mean Bias Error (rMBE) was 0.371 %, the maximum relative Mean Absolute Error (rMAE) was 0.745 %, the maximum relative Root Mean Square Error (rRMSE) was 0.910 % and the worst-case coefficient of determination (R²) was 0.910. The validation metrics show a strong linear correlation between observed and estimated solar radiation values. The AP model coefficients calculated in this study can be used with quantitative confidence to estimate daily GHI at locations in South Africa where daily observed sunshine duration data is available.


Introduction
Solar radiation data is important because it is required in many research fields such as meteorology, agriculture, hydrology, ecology and environment [1,2,3,4]. It is also an important reference for many applications such as solar power plants, engineering designs, regional crop growth modelling, evapotranspiration estimation and irrigation system development [1,3,5]. For this reason, the South African Weather Service (SAWS) re-established a global horizontal irradiance (GHI) radiometric network with 13 solar radiometric stations located in all six macro-climatic zones of South Africa [6]. The data collected from the SAWS network helps in the validation of satellite data as well as the development and verification of empirical models [7]. SAWS also manages a very dense sunshine duration recording network over South Africa, from which sunshine duration data has been measured continuously for several years [8]. SAWS GHI stations are sparse, according to [1,2,3,4,5,9]. Maintaining dense radiometric networks is a worldwide challenge because of the high costs involved in installing and maintaining solar radiation stations. To compensate for this, reliable measurements taken from a sparse network are needed to develop and validate empirical models that can be used to estimate and forecast the availability of solar energy at other locations [10]. The main objective of this study is to calibrate the Ångström-Prescott (AP) model regression coefficients a and b that can be used to estimate GHI in the different climatic zones of South Africa, thus increasing the density of available solar radiation data in the country.
The AP model estimates daily GHI using the daily extra-terrestrial (top-of-atmosphere) GHI (GHI_TOA), the daily astronomical day length (N), the daily measured sunshine duration (n) and the model coefficients a and b. The model was first proposed by Ångström [11] in 1924 before Prescott [12] modified it in 1940 by introducing GHI_TOA to replace the GHI on a clear-sky day. The original AP coefficients, a=0.25 and b=0.75, were calculated using data from Stockholm [13]. The regression coefficients a and b are site dependent; therefore, they need to be calibrated, using the linear relationship in equation (1), for the regions where they will be used to estimate GHI [1,4,13]. Researchers such as those from the Chinese Academy of Sciences, the Indian National Academy of Agricultural Research Management, the Brazilian Federal University of Rio Grande do Norte and the Spanish Polytechnic University of Madrid [1,2,3,4,5] calibrated AP coefficients for their own climatic regions using the linear relationship in equation (1). According to three different research groups [1,2,14], sunshine-based models provide better GHI estimates than cloud-based and temperature-based models.
In South Africa, studies to calibrate AP coefficients were carried out by Eberhard [15] and Mulaudzi et al. [16]. The challenge, according to Mulaudzi et al. [16], was the unavailability of a long-term observed GHI data set covering all the climatic regions with which to calibrate and validate the AP coefficients. In this study, a sufficiently large data set, with observations spanning 2013 to 2019 from stations that cover all the climatological zones of South Africa, was used to calibrate the AP coefficients, which were then used to estimate GHI. The estimated GHI was validated against the observed daily average GHI, and the statistical metrics in equations (10) to (16), taken from [17,18], were used to quantify the differences between observed and estimated GHI.
The results from this study, namely the annual AP coefficients a and b for all six macro-climatological regions, could be used to estimate daily GHI from daily observed sunshine duration data. The estimated daily GHI data can in turn be used to develop energy policies and solar energy programmes, and as benchmarks in climate analysis studies.

Materials and Methods
The observed 1-minute GHI data used in the study was collected from 8 SAWS solar radiometric stations during the periods shown in Table 1, which also shows the geographical locations of the stations and the climatic zones in which they are located. GHI data was collected using secondary standard Kipp & Zonen CMP11 pyranometers. Daily GHI data was calculated from the 1-minute GHI data. First, the 1-minute GHI data was quality controlled using the Baseline Surface Radiation Network (BSRN) quality control (QC) procedure outlined by Long and Dutton [19]. GHI values that failed the QC test were regarded as outliers and discarded; only the data that passed the test was used [6,7,20,21,22]. Minute values that passed the BSRN QC were averaged to 15-minute means, and then four consecutive 15-minute means were averaged to obtain an hourly mean [6,7,20,21,22,23]. Hourly means were then averaged to obtain daily averages. The daily averages were further quality checked using the HelioClim model QC described by Geiger et al. [24]; outliers, which were the daily average points coded 1, were discarded before further analysis.
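The averaging chain described above (1-minute values to 15-minute means, to hourly means, to a daily average) can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the code used in the study; the function names, the (timestamp, value) input layout and the assumption that minutes rejected by QC are simply absent from the input are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def block_mean(samples, key):
    """Average (timestamp, value) samples that share the same key(timestamp)."""
    groups = defaultdict(list)
    for ts, value in samples:
        groups[key(ts)].append(value)
    return sorted((k, mean(v)) for k, v in groups.items())


def daily_mean_ghi(minute_samples):
    """1-minute GHI (W/m^2) -> 15-minute -> hourly -> daily means.

    minute_samples: iterable of (datetime, ghi) pairs that already passed QC
    (assumption: rejected minutes were dropped beforehand).
    """
    # 15-minute slots: floor each timestamp to the start of its quarter hour.
    quarter = block_mean(
        minute_samples,
        lambda t: t.replace(minute=t.minute - t.minute % 15, second=0, microsecond=0),
    )
    # Hourly mean from the (up to four) 15-minute means in each hour.
    hourly = block_mean(quarter, lambda t: t.replace(minute=0))
    # Daily average from the hourly means.
    return block_mean(hourly, lambda t: t.date())


# Synthetic example (not station data): 100 W/m^2 before noon, 300 W/m^2 after.
start = datetime(2019, 1, 1)
samples = [(start + timedelta(minutes=m), 100.0 if m < 720 else 300.0)
           for m in range(1440)]
daily = daily_mean_ghi(samples)
```

Averaging in stages, rather than taking one mean over all surviving minutes, keeps an hour with many rejected minutes from being under-weighted in the daily value.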
Hourly sunshine duration data was obtained by determining the burn made by the sun on a coated card in a Campbell-Stokes sunshine recorder [8]. Hourly data was then summed to get the total daily sunshine duration (n). Daily top-of-atmosphere (TOA) irradiance (GHI_TOA) and theoretical sunshine duration (N) were calculated using equations (1) to (9), from Iqbal [13], and the solar angles were calculated using the Solar Position Algorithm (SPA) in Python PVLIB [25,26] and Microsoft Excel. The coefficients a and b of the AP model were calculated by linear regression between the irradiance fraction or clearness index, GHI/GHI_TOA, and the daily sunshine fraction, n/N, for each day, based on the linear relationship shown in equation (1), proposed by Ångström [11] and then modified by Prescott [12]:
$$\frac{\mathrm{GHI}}{\mathrm{GHI}_{\mathrm{TOA}}} = a + b\,\frac{n}{N} \qquad (1)$$
where GHI is the daily global horizontal irradiance in W/m² and GHI_TOA is an approximation of the top-of-atmosphere GHI, or extra-terrestrial radiation on a horizontal surface, i.e. the amount of global horizontal radiation that a location on Earth's surface would receive if there were no atmosphere; it is given by equation (2), as in Duffie and Beckman [27].
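As an illustration of how the daily top-of-atmosphere irradiance and the theoretical sunshine duration N can be computed, the sketch below uses the widely published textbook forms (Cooper's declination approximation and the standard daily extraterrestrial radiation expression). It is not necessarily identical to equations (2) to (9) of this study, which obtained solar angles from the SPA; the solar constant of 1367 W/m², the function names and the example latitude are assumptions.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2; commonly used value, assumed here


def declination_rad(day_of_year):
    """Cooper's approximation of the solar declination (radians)."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)


def sunset_hour_angle_rad(lat_deg, day_of_year):
    """Sunset hour angle (radians) for a horizontal surface."""
    phi = math.radians(lat_deg)
    delta = declination_rad(day_of_year)
    # Clamp to [-1, 1] so polar day or night does not raise a domain error.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    return math.acos(x)


def day_length_hours(lat_deg, day_of_year):
    """Astronomical day length N in hours."""
    return 24.0 / math.pi * sunset_hour_angle_rad(lat_deg, day_of_year)


def daily_mean_toa_irradiance(lat_deg, day_of_year):
    """Daily mean extraterrestrial irradiance on a horizontal surface (W/m^2)."""
    phi = math.radians(lat_deg)
    delta = declination_rad(day_of_year)
    ws = sunset_hour_angle_rad(lat_deg, day_of_year)
    # Eccentricity correction for the varying Earth-Sun distance.
    e0 = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    return SOLAR_CONSTANT / math.pi * e0 * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta)
    )


# Hypothetical southern-hemisphere latitude, used only for illustration.
n_summer = day_length_hours(-28.4, 355)  # late December
n_winter = day_length_hours(-28.4, 172)  # late June
```

Southern latitudes are entered as negative values, so the December day comes out longer than the June day.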
Annual AP coefficients were calculated for 8 stations. The observation periods for concurrent GHI and sunshine duration data for these stations are given in Table 1. Datasets up to the end of 2018 were used for the determination of the AP coefficients, and the daily observation data for 2019 was used to validate the corresponding estimated daily GHI data. For Thohoyandou, the 2017 data was used to validate the coefficients.
The statistical metrics that were used to compare estimated daily GHI data with the observed daily GHI data were derived from the literature [17,18] and are as follows:
1. Mean Bias Error (MBE), which measures the average signed difference between the predicted and the observed values, and relative Mean Bias Error (rMBE), which expresses that bias in percentage terms. The caution with MBE and rMBE is that positive and negative biases can cancel, which can lead to a false interpretation. The metrics are given by equations (10) and (11).
2. Mean Absolute Error (MAE), which measures the absolute value of the differences between the observed and the predicted values and so gives a better idea of the prediction accuracy. The relative Mean Absolute Error (rMAE), which measures the size of the error in percentage terms, was also calculated. The metrics are given by equations (12) and (13).
3. Root Mean Square Error (RMSE), which compares the predicted and observed data sets and measures the statistical variability of the prediction accuracy, is given by equation (14), while equation (15) gives the relative Root Mean Square Error (rRMSE), which measures the size of the error in percentage terms. The RMSE and rRMSE are also indifferent to the direction of the error. They are considered in this study because they put extra weight on large errors.
4. Coefficient of Determination (R²), which is a statistical measure of the strength of the relationship between the predicted and observed values. R² also measures how well the regression line represents the data. Its value is such that 0 ≤ R² ≤ 1; the closer R² is to 1, the better the prediction. The metric is given by equation (16).
The quantities in these equations are the observed value, the estimated value and the average of the observed values, where i is the time point and n is the total number of points used.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 August 2020 doi:10.20944/preprints202008.0038.v1
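The validation metrics listed above can be sketched in Python as follows. The definitions used are assumptions based on common practice and may differ in detail from equations (10) to (16) of this study: errors are taken as estimated minus observed, the relative metrics are normalised by the mean of the observations, and R² is computed as 1 - SS_res/SS_tot. The function name and the example values are hypothetical.

```python
import math


def validation_metrics(observed, estimated):
    """Return MBE, MAE, RMSE, their relative forms (%) and R^2 as a dict."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]  # estimated - observed
    obs_mean = sum(observed) / n

    mbe = sum(errors) / n
    mae = sum(abs(d) for d in errors) / n
    ss_res = sum(d * d for d in errors)
    rmse = math.sqrt(ss_res / n)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)

    return {
        "MBE": mbe,
        "rMBE": 100.0 * mbe / obs_mean,
        "MAE": mae,
        "rMAE": 100.0 * mae / obs_mean,
        "RMSE": rmse,
        "rRMSE": 100.0 * rmse / obs_mean,
        "R2": 1.0 - ss_res / ss_tot,  # assumed form of the coefficient
    }


# Synthetic values (not station data): errors of +10, -10 and 0 W/m^2.
metrics = validation_metrics([200.0, 250.0, 300.0], [210.0, 240.0, 300.0])
```

In this synthetic case the positive and negative errors cancel, so the MBE is zero while the MAE and RMSE are not, which is the cancellation effect the caution on MBE and rMBE refers to.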
The results were converted from W/m² to MJ m⁻² d⁻¹ by dividing by 11.57415, a methodology used by Almorox et al. [5], to allow for easy comparison with other studies in the literature. Monthly averages of each metric were calculated and then aggregated to annual averages; where observation data was not available, it was replaced by NaN. The annual AP coefficients a and b were calculated.
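The unit conversion can be checked with a short calculation: a daily mean irradiance of 1 W/m² sustained for 86 400 s delivers 0.0864 MJ m⁻² d⁻¹, so 1 MJ m⁻² d⁻¹ corresponds to about 11.574 W/m². The sketch below derives the factor from first principles, which gives a value very slightly different from the 11.57415 quoted above; the names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
# W/m^2 of daily mean irradiance per 1 MJ m^-2 d^-1 (about 11.574).
W_PER_MJ_PER_DAY = 1e6 / SECONDS_PER_DAY


def w_m2_to_mj_m2_day(mean_irradiance_w_m2):
    """Convert a daily mean irradiance (W/m^2) to daily irradiation (MJ m^-2 d^-1)."""
    return mean_irradiance_w_m2 / W_PER_MJ_PER_DAY


# A daily mean of 250 W/m^2 expressed as daily irradiation.
daily_irradiation = w_m2_to_mj_m2_day(250.0)
```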

Annual AP results
In this study, the annual AP regression coefficients a and b were calculated using equation (1), with daily n, daily mean GHI, daily mean GHI_TOA and daily N as inputs. The calculated a and b were then used together with daily n, N and GHI_TOA to estimate daily GHI, which was then compared to the corresponding observed daily GHI. The statistical metrics in equations (10) to (16) were used to quantify the errors between the two datasets; the results are shown in Table 2 and Figures 1 to 4.
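The calibration and estimation steps described above amount to an ordinary least-squares fit of the clearness index against the sunshine fraction, followed by applying the fitted line. A minimal sketch is given below, assuming a plain least-squares fit; the function names are hypothetical, and the example uses synthetic points generated from illustrative coefficients (a = 0.20, b = 0.55), not values from any station in this study.

```python
def fit_ap_coefficients(sunshine_fraction, clearness_index):
    """Least-squares fit of clearness_index = a + b * sunshine_fraction.

    sunshine_fraction: daily n/N values; clearness_index: daily GHI/GHI_TOA.
    Returns (a, b).
    """
    n = len(sunshine_fraction)
    if n != len(clearness_index) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(sunshine_fraction) / n
    mean_y = sum(clearness_index) / n
    sxx = sum((x - mean_x) ** 2 for x in sunshine_fraction)
    if sxx == 0.0:
        raise ValueError("sunshine fraction has no variation; slope undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sunshine_fraction, clearness_index))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_daily_ghi(a, b, sunshine_hours, day_length_hours, ghi_toa):
    """Apply the AP relationship: GHI = GHI_TOA * (a + b * n / N)."""
    return ghi_toa * (a + b * sunshine_hours / day_length_hours)


# Synthetic points lying exactly on a line with hypothetical a = 0.20, b = 0.55.
fractions = [0.1, 0.4, 0.7, 0.9]
indices = [0.20 + 0.55 * s for s in fractions]
a_fit, b_fit = fit_ap_coefficients(fractions, indices)
ghi_est = estimate_daily_ghi(a_fit, b_fit, sunshine_hours=8.0,
                             day_length_hours=10.0, ghi_toa=400.0)
```

With real data, the calibration would use the days up to the end of the calibration period and the estimate would be evaluated on the held-out validation year.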
In Figures 1 and 2, the annual AP coefficients and the data points that were used to derive them are displayed. The values of the AP coefficients ranged from 0.188 to 0.243 for a, while those for b ranged from 0.515 to 0.6. Values of a=0.25 and b=0.5 were recommended by Allen et al. [29] for use when there is no local observed GHI data with which to calibrate the coefficients. In this study, both the minimum and maximum values of a were less than 0.25, while the minimum and maximum values of b were greater than 0.5. The difference between the default and the calibrated AP coefficients indicates that local calibration of the coefficients is necessary. Studies by Zhang et al., De Medeiros et al., Almorox et al. and Tsung et al. [3,4,5,14] also found results different from those of Allen et al. [29] when they performed a local calibration. The AP coefficients from this study are in line with the coefficients from similar studies done elsewhere in the … (Table 1), the difference was less than 0.05 for both a and b, which is a very small difference. This means that the AP coefficients a and b calibrated for a climatic zone could be used as representative of the entire climatic zone to estimate GHI when observed sunshine duration data for the location is available.

Validation Results
Estimated GHI values were compared to the measured GHI values; the errors were quantified by the validation metrics in equations (10) to (16) and the results are tabulated in Table 2. Table 2 shows that the rMBE ranged from -1.20 to 0.371 %, the rMAE from 0.311 to 0.745 %, the rRMSE from 0.393 to 0.910 % and R² from 0.910 to 0.948. De Aar, Irene and Thohoyandou had a positive MBE, meaning that the model overestimated GHI, while Upington, Durban, Mthatha, George and Polokwane had a negative MBE, meaning that the model underestimated GHI at these locations. The values of MBE and rMBE for all the stations were less than 1, indicating close agreement between the predicted and observed GHI values. The worst-case R² value was 0.910, suggesting that there is a very strong linear relation between observed and predicted values. … [4] and Tsung et al. [14], respectively, determined. The maximum MAE of 1.425 MJ m⁻² d⁻¹ was less than the 1.8 MJ m⁻² d⁻¹ that Tsung et al. [14] determined. The maximum MBE of 0.733 MJ m⁻² d⁻¹ was less than the 1.040 and 0.85 MJ m⁻² d⁻¹ that De Medeiros et al. [4] and Tsung et al. [14], respectively, determined, and the worst-case R² of 0.910 was greater than the 0.875, 0.74 and 0.8 that Zhang et al. [3], Adamala et al. [2] and De Medeiros et al. [4], respectively, determined. The overall validation results from this study are comparable to, and in some cases better than, those found in similar studies [2,3,4,5,14], which concluded on the basis of their validation results that the AP coefficients could be used to estimate GHI with confidence. The data used in the study was collected using secondary standard pyranometers (CMP11), which according to Urraca et al. [21] generate high-quality records of GHI. The GHI data was subjected to robust quality control methodologies, BSRN QC [19] and HelioClim model QC [24], before any analysis, and outliers were discarded.
The use of Python code in the data analysis enabled large data sets to be handled much more efficiently, and running the analysis as code produced correct and consistent outputs; these are some of the reasons why the results in this study are better. This means that the AP coefficients from this study could also be used with confidence to estimate GHI in the different climatological zones of South Africa. In Figure 3, the observed 2019 monthly GHI data is compared to the corresponding estimated 2019 monthly GHI data. Thohoyandou is the only station where validation was done using 2017 monthly data sets, and the observation data was only available from February to October (the January, November and December 2017 data sets were not available). The need to fill in missing data further motivates this study, i.e. the development and validation of models, and the results of this study can be used to fill any missing monthly mean GHI values for South African locations. Similarly, in Figure 4, the observed 2019 monthly GHI data is compared to the corresponding estimated 2019 monthly GHI data. In Durban, the GHI observation data for September was not available. In Polokwane, the GHI observation data for March, April, May and June was not available. Results from the study can be used to fill those missing monthly mean GHI values.
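The gap-filling use case mentioned above can be sketched as follows: wherever a monthly observed mean is missing, the corresponding model estimate is substituted and the month is flagged as filled. This is an illustrative sketch only; the function names, the encoding of missing months as None or NaN, and the numbers in the example are assumptions, not data from the study.

```python
import math


def is_missing(value):
    """True for None or NaN, the two ways a missing month is encoded here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_monthly_ghi(observed, estimated):
    """Return (filled_values, filled_flags) for two equal-length monthly series."""
    if len(observed) != len(estimated):
        raise ValueError("series must have the same length")
    filled, flags = [], []
    for obs, est in zip(observed, estimated):
        missing = is_missing(obs)
        # Keep the observation when it exists; otherwise use the AP estimate.
        filled.append(est if missing else obs)
        flags.append(missing)
    return filled, flags


# Synthetic monthly means in MJ m^-2 d^-1 (not station data): third month missing.
observed = [25.1, 23.4, float("nan"), 17.8]
estimated = [24.9, 23.6, 20.5, 17.9]
filled, flags = fill_missing_monthly_ghi(observed, estimated)
```

Keeping the flags alongside the filled series lets later analysis distinguish measured months from modelled ones.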

Conclusions
The annual Ångström-Prescott coefficients a and b were calculated using the linear relationship between the ratio of the daily radiation on a horizontal surface to the daily extraterrestrial radiation on that surface and the ratio of the daily sunshine duration to the theoretical sunshine duration. The coefficients were used to estimate global horizontal irradiance, and there was very close agreement with the corresponding observed global horizontal irradiance. The agreement was quantified by the statistical metrics in equations (10) to (16), i.e. relative Mean Bias Error, relative Mean Absolute Error, relative Root Mean Square Error and coefficient of determination (R²). The results were in good agreement with what other studies found. The Ångström-Prescott coefficients calibrated for each station can be used as representative of the climatic zone in which that station is located. The Ångström-Prescott coefficients calculated in this study could enable the estimation of daily global horizontal irradiance at any location in South Africa where daily observed sunshine duration data is available. The estimated daily global horizontal irradiance data can in turn be used to support energy policies and solar energy programmes. It can also be used for benchmarking in climate analysis studies. The methodology used in the study can be applied elsewhere, wherever there is a station that records both global horizontal irradiance and sunshine duration.