## 1. Introduction

Solar radiation data are required in many research fields, such as meteorology, agriculture, hydrology, ecology, and environmental science [1,2,3,4]. They are also an important reference for applications such as solar power plants, engineering design, regional crop growth modelling, evapotranspiration estimation, and irrigation system development [1,3,5]. For this reason, the South African Weather Service (SAWS) re-established a global horizontal irradiance (GHI) radiometric network with 13 solar radiometric stations located in all 6 macroclimatic zones of South Africa [6]. The macroclimatic zones are regions with similar climatic conditions, and they were established to classify different areas based on their maximum energy demand and maximum energy consumption [6]. The data collected from the SAWS network help in the validation of satellite data as well as the development and verification of empirical models [7]. SAWS also manages a very dense sunshine duration recording network over South Africa, in which sunshine duration has been measured continuously for several years [8]. SAWS GHI stations, by contrast, are sparse; according to [1,2,3,4,5,9], maintaining dense radiometric networks is a worldwide challenge because of the high costs of installing and maintaining solar radiation stations. To compensate for this, reliable measurements taken from a sparse network are needed to develop and validate empirical models that can be used to estimate and forecast the availability of solar energy at other locations [10]. The main objective of this study is to calibrate the Ångström–Prescott (AP) model regression coefficients $a$ and $b$ that could be used to estimate GHI in different climatic zones of South Africa, thus increasing the density of available solar radiation data in the country.

The AP model estimates daily GHI using daily extraterrestrial (top of the atmosphere) radiation on a horizontal surface ($GHI_{TOA}$), daily astronomical day length ($N$), daily measured sunshine duration ($n$), and the Ångström model coefficients $a$ and $b$. The model was first proposed by Ångström [11] in 1924 and was modified by Prescott [12] in 1940, who replaced GHI on a clear-sky day with $GHI_{TOA}$. The original AP coefficients were $a$ = 0.25 and $b$ = 0.75; these were calculated using data from Stockholm [13]. The regression coefficients $a$ and $b$ are site-dependent; therefore, they need to be calibrated, using the linear relationship in Equation (1), for the regions where they will be used to estimate GHI [1,4,13,14]. Martinez et al. [14] emphasised that AP regression coefficients with proven accuracy in one climatic region should not be assumed to be equally reliable in another climatic region without additional evidence. Researchers from institutions such as the Chinese Academy of Sciences, the Indian National Academy of Agricultural Research Management, the Federal University of Rio Grande do Norte in Brazil, and the Polytechnic University of Madrid in Spain [1,2,3,4,5] calibrated AP coefficients for their own climatic regions by using the linear relationship in Equation (1). The study from Spain by Almorox et al. [5] focused on only one station, and the study from Brazil by De Medeiros et al. [4] focused on four stations located in only two climatic zones. According to the findings of Zhang et al. [3], the regression coefficients differed between climate zones. The differences in $a$ and $b$ between climatic zones could be due to variations in latitude, altitude, aerosol and water vapor concentrations, surface albedo, and mean solar altitude [14]. The study by Tsung et al. [15] focused on one location using $n$ and GHI data collected from two different stations, because GHI and $n$ were not both available from one location. In this study, eight stations located in all six climatic zones of South Africa, at each of which both $n$ and GHI data were collected at the same location, are considered so that the respective AP coefficients are representative of a climatic zone in the country and not of the whole country.
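As an illustration only, the linear AP relationship can be evaluated with the original coefficients quoted above ($a$ = 0.25, $b$ = 0.75). The sunshine fraction of 0.6 and the $GHI_{TOA}$ of 400 W/m$^{2}$ used here are assumed values chosen for the arithmetic, not measurements from this study:

```latex
\frac{GHI}{GHI_{TOA}} = a + b\,\frac{n}{N} = 0.25 + 0.75 \times 0.6 = 0.70
\quad\Rightarrow\quad
GHI \approx 0.70 \times 400\ \mathrm{W\,m^{-2}} = 280\ \mathrm{W\,m^{-2}}
```

A site with different calibrated coefficients would give a different estimate for the same sunshine fraction, which is why the coefficients are calibrated per climatic zone.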

According to works by four different research groups [1,2,3,14,16], sunshine-based models provided better GHI estimates than cloud-based and temperature-based models. This might be because the amount of GHI reaching the earth’s surface is closely related to sunshine duration [3]. Cloud cover restricts the amount of GHI reaching the earth’s surface, and cloud-based models also perform well [3], but accurate cloud observation data to be used as model input are scarce compared to sunshine duration data. The effect of temperature on GHI is lower than that of sunshine duration and cloud cover. This is because most of the long-wave solar radiation reaching the earth’s surface is absorbed, emitted in the atmosphere, or reflected to space [3]. GHI is the total amount of short-wave solar radiation reaching the earth’s surface and has little dependence on temperature. The availability of reliably measured sunshine duration data in all climatic zones and the performance of sunshine-based models motivated the focus on sunshine-based models in this study. The AP linear regression model was chosen because of its simplicity and because, as reported by Tsung et al. [15], linear models ranked highly on the global performance indicator (GPI) in comparison to other models in a 2015 review.

In South Africa, studies to calibrate AP coefficients were carried out by Eberhard [17] and Mulaudzi et al. [18]. The challenge, according to Mulaudzi et al. [18], was the unavailability of a long-term observed GHI dataset that covers all the climatic regions to calibrate and validate the AP coefficients. In this study, a large enough dataset with observations spanning 2013 to 2019 from stations that cover all the climatological zones of South Africa was used to calibrate the AP coefficients, which were then used to estimate GHI. The estimated GHI was validated using the observed GHI daily averages, while the statistical metrics in Equations (10)–(16), from [19,20], were used to quantify the differences between observed and estimated GHI.

The results of this study provide annual AP coefficients $a$ and $b$ for all six macro-climatological regions, which can be used to estimate daily GHI for the respective climate regions from daily observed sunshine duration data. The estimated daily GHI data can in turn be used to develop energy policies and solar energy programmes, and as benchmarks in climate analysis studies.

## 2. Materials and Methods

The observed 1-min GHI data used in this study were collected from 8 SAWS solar radiometric stations during the periods shown in Table 1, which also shows the geographical locations of the stations and the climatic zones in which they are located. The map in Figure 1 shows the macro-climate regions of South Africa. GHI data were collected using Kipp & Zonen CMP11 secondary standard pyranometers.

The methodology is summarised in the flowcharts in Figure 2 and Figure 3. Daily GHI data were calculated from 1-min GHI data. First, the 1-min GHI data were quality controlled using the Baseline Surface Radiation Network (BSRN) quality control (QC) procedure outlined by Long and Dutton [21]. GHI values that failed the QC test were regarded as outliers and were discarded; only the data that passed the test were used [6,7,20,21,22]. Minute values that passed the BSRN QC were averaged to 15-min means, and then four consecutive 15-min means were averaged to obtain an hourly mean [6,7,22,23,24,25]. Hourly mean values were then averaged to obtain daily average values. Daily average values were further quality checked by subjecting them to the HelioClim model QC described by Geiger et al. [26]; outliers, which were daily average points coded 1, were discarded before further analysis.
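The averaging chain described above (1-min to 15-min to hourly to daily) can be sketched with pandas as follows. This is a minimal sketch, not the authors' script: the function name, the input layout, and the `passed_qc` flag are hypothetical, and the BSRN and HelioClim tests themselves are assumed to have been applied beforehand.

```python
import numpy as np
import pandas as pd


def daily_mean_ghi(ghi_1min: pd.Series, passed_qc: pd.Series) -> pd.Series:
    """Average QC-passed 1-min GHI (W/m^2) to 15-min, hourly, then daily means.

    ghi_1min  : 1-min GHI indexed by a DatetimeIndex (hypothetical input layout).
    passed_qc : boolean Series on the same index, True where the value passed QC.
    """
    # Discard values that failed QC by masking them as NaN.
    clean = ghi_1min.where(passed_qc)
    # 1-min values -> 15-min means (mean() skips NaN values).
    ghi_15min = clean.resample("15min").mean()
    # Four 15-min slots -> one hourly mean.
    ghi_hourly = ghi_15min.resample("60min").mean()
    # Hourly means -> daily mean.
    return ghi_hourly.resample("1D").mean()
```

Because failed minutes are masked rather than replaced, a 15-min slot with a few rejected values is still averaged from the remaining ones; a stricter completeness threshold could be added if the analysis requires it.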

Hourly sunshine duration data were obtained by measuring the burn trace made by the sun on a coated card in a Campbell–Stokes sunshine recorder [8]. Hourly data were then summed to obtain total daily sunshine duration ($n$). Daily top-of-atmosphere (TOA) irradiance ($GHI_{TOA}$) and theoretical sunshine duration ($N$) were calculated using Equations (1)–(9), from Iqbal [13], and the solar angles were calculated using the Solar Position Algorithm (SPA) in the Python package PVLIB [27,28] and Microsoft Excel. The coefficients $a$ and $b$ of the AP model were calculated by linear regression between the irradiance fraction, or clearness index, $\frac{GHI}{GHI_{TOA}}$ and the daily sunshine fraction, $\frac{n}{N}$, for each day, based on the linear relationship shown in Equation (1), proposed by Ångström [11] and then modified by Prescott [12].

$$\frac{GHI}{GHI_{TOA}} = a + b\,\frac{n}{N} \qquad (1)$$

where $GHI$ is the daily Global Horizontal Irradiance in W/m$^{2}$. $GHI_{TOA}$ is an approximation of the top of the atmosphere GHI or extraterrestrial radiation on a horizontal surface, i.e., the amount of global horizontal radiation that a location on Earth’s surface would receive if there was no atmosphere; it is given by Equation (2), as in Duffie and Beckman [29] (World Meteorological Organization recommendation, according to Gueymard [30]), where $D$ is the Julian day and $a$ and $b$ represent the Ångström–Prescott regression coefficients.
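For readers who want to reproduce the two astronomical inputs, the sketch below computes the day length $N$ and a daily-mean $GHI_{TOA}$ from latitude and day of year. It uses common textbook approximations (Cooper's declination formula and a simple eccentricity correction) and an assumed solar constant of 1361 W/m$^{2}$; it is not the SPA-based calculation or the exact Equations (2)–(9) used in the study, and the function name is hypothetical.

```python
import math

G_SC = 1361.0  # assumed solar constant in W/m^2; not taken from the study


def day_length_and_toa(lat_deg: float, day_of_year: int):
    """Return (N in hours, daily-mean GHI_TOA in W/m^2) for a latitude and day.

    Textbook approximations only; an illustrative stand-in for the study's
    Equations (2)-(9), which are not reproduced here.
    """
    phi = math.radians(lat_deg)
    # Solar declination (Cooper's approximation), in radians.
    delta = math.radians(23.45) * math.sin(2 * math.pi * (284 + day_of_year) / 365)
    # Eccentricity correction factor for the Earth-Sun distance.
    e0 = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    # Sunset hour angle; the clamp handles polar day and polar night.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(cos_ws)
    # Astronomical day length in hours (15 degrees of hour angle per hour).
    n_hours = 2 * math.degrees(ws) / 15
    # Horizontal extraterrestrial irradiance averaged over the full 24 h.
    toa = (G_SC * e0 / math.pi) * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta)
    )
    return n_hours, toa
```

For example, `day_length_and_toa(-25.7, 172)` gives a day shorter than 12 h, as expected for a Southern Hemisphere site in June.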

Annual AP coefficients were calculated for 8 stations. The observation periods for concurrent GHI and sunshine duration data for these stations are given in Table 1. Data up to the end of 2018 were used to determine the AP coefficients, and the daily observations for 2019 were used to validate the corresponding estimated daily GHI data. For Thohoyandou, the 2017 data were used to validate the coefficients.
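The calibration and estimation steps can be sketched as an ordinary least-squares fit of the clearness index against the sunshine fraction. This is a minimal sketch assuming daily arrays for one station; the function names are hypothetical, and `numpy.polyfit` is used here as one convenient way to fit a straight line, not necessarily the routine used by the authors.

```python
import numpy as np


def calibrate_ap(ghi, ghi_toa, n, big_n):
    """Fit GHI/GHI_TOA = a + b * (n/N) by ordinary least squares; return (a, b)."""
    clearness = np.asarray(ghi, dtype=float) / np.asarray(ghi_toa, dtype=float)
    sunshine_fraction = np.asarray(n, dtype=float) / np.asarray(big_n, dtype=float)
    # Drop days with missing values in either ratio before fitting.
    valid = np.isfinite(clearness) & np.isfinite(sunshine_fraction)
    # polyfit returns the highest-degree coefficient first: slope b, then intercept a.
    b, a = np.polyfit(sunshine_fraction[valid], clearness[valid], deg=1)
    return a, b


def estimate_ghi(a, b, ghi_toa, n, big_n):
    """Estimate daily GHI from sunshine duration with calibrated coefficients."""
    ratio = np.asarray(n, dtype=float) / np.asarray(big_n, dtype=float)
    return (a + b * ratio) * np.asarray(ghi_toa, dtype=float)
```

In the split described above, `calibrate_ap` would be called on the days up to the end of 2018 and `estimate_ghi` on the validation year, so that the coefficients are never evaluated on the data used to fit them.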

The statistical metrics that were used to compare estimated daily GHI data with the observed daily GHI data were derived from the literature [19,20] and these are:

Mean Bias Error (MBE), which estimates the average error in the prediction. A positive MBE indicates that the prediction is overestimated and vice versa; values of MBE closer to zero indicate closer agreement between the prediction and the observation. A relative Mean Bias Error (rMBE), which expresses the size of the error in percentage terms, was also calculated. The metrics are expressed in Equations (10) and (11).

Mean Absolute Error (MAE), which measures the absolute value of the differences between the observed and the predicted values, gives a better idea of the prediction accuracy; the relative Mean Absolute Error (rMAE), which expresses the size of the error in percentage terms, was also calculated. MBE and rMBE should be interpreted with caution because positive and negative biases can cancel, which can lead to a false interpretation; MAE and rMAE do not suffer from this cancellation. The metrics are expressed in Equations (12) and (13).

Root Mean Square Error (RMSE), which compares the predicted and observed datasets, measures the statistical variability of the prediction accuracy and is expressed in Equation (14), while Equation (15) gives the relative Root Mean Square Error (rRMSE), which expresses the size of the error in percentage terms. Like MAE and rMAE, RMSE and rRMSE are indifferent to the direction of the error. They are considered in this study because they put extra weight on large errors.

Coefficient of Determination ($R^{2}$), which is a statistical measure of the strength of the relationship between the predicted and observed values. $R^{2}$ also measures how well the regression line represents the data. Its value satisfies $0\le R^{2}\le 1$; the closer $R^{2}$ is to 1, the better the prediction. The metric is expressed in Equation (16),

where $O_i$ is the observed value, $P_i$ is the estimated value, $\overline{O}$ is the average of the observed values, $i$ is the time point, and $n$ is the total number of points used.
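The metrics described above can be sketched in a single helper. The forms below are the commonly used definitions and rest on two assumptions not confirmed by the text: the relative metrics are expressed as a percentage of the mean observed value, and $R^{2}$ is computed as one minus the ratio of the residual to the total sum of squares. The function name is hypothetical.

```python
import numpy as np


def validation_metrics(observed, predicted):
    """Return common error metrics between observed and predicted daily GHI.

    Assumptions: relative metrics are percentages of the mean observed value,
    and R^2 is computed as 1 - SS_res / SS_tot.
    """
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    # Use only the days where both values are present.
    valid = np.isfinite(o) & np.isfinite(p)
    o, p = o[valid], p[valid]
    error = p - o  # positive values mean the prediction overestimates
    o_mean = o.mean()
    mbe = error.mean()
    mae = np.abs(error).mean()
    rmse = np.sqrt((error ** 2).mean())
    r2 = 1.0 - (error ** 2).sum() / ((o - o_mean) ** 2).sum()
    return {
        "MBE": mbe,
        "rMBE": 100.0 * mbe / o_mean,
        "MAE": mae,
        "rMAE": 100.0 * mae / o_mean,
        "RMSE": rmse,
        "rRMSE": 100.0 * rmse / o_mean,
        "R2": r2,
    }
```

With this definition, $R^{2}$ can fall below zero for very poor predictions; if a value bounded between 0 and 1 is required, the squared Pearson correlation between observed and predicted values is a common alternative.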

The results were converted from W/m$^{2}$ to MJ m$^{-2}$ d$^{-1}$ by dividing by 11.57415, the approach used by Almorox et al. [5], to allow for easy comparison with other studies in the literature. Monthly averages of each metric were calculated and then aggregated to annual averages; where observation data were not available, the values were set to NaN. The annual AP coefficients $a$ and $b$ were then calculated.
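The unit conversion follows from the number of seconds in a day: a daily-mean irradiance of 1 W/m$^{2}$ sustained for 86,400 s delivers 0.0864 MJ m$^{-2}$. A minimal sketch, with a hypothetical function name, is:

```python
import numpy as np

SECONDS_PER_DAY = 86_400
DIVISOR_FROM_TEXT = 11.57415  # divisor quoted in the text


def wm2_to_mj_per_day(daily_mean_wm2):
    """Convert a daily-mean irradiance in W/m^2 to daily energy in MJ m^-2 d^-1."""
    return np.asarray(daily_mean_wm2, dtype=float) * SECONDS_PER_DAY / 1e6
```

Multiplying by 86,400/10$^{6}$ is equivalent to dividing by about 11.5741, so this agrees with the divisor quoted in the text to within about 0.001%.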

## 4. Conclusions

The annual Ångström–Prescott coefficients $a$ and $b$ were calculated using the linear relationship between the ratio of daily global radiation on a horizontal surface to the daily projected extraterrestrial radiation on that surface and the ratio of daily sunshine duration to the theoretical sunshine duration. They were used to estimate global horizontal irradiance, and the estimates agreed very closely with the corresponding observed global horizontal irradiance. The agreement was quantified by the statistical metrics in Equations (10)–(16), including the relative Mean Bias Error, relative Mean Absolute Error, relative Root Mean Square Error, and coefficient of determination ($R^{2}$). The results were in good agreement with those of other studies.

The methodology used in this study can be applied elsewhere, wherever there is a station that records both global horizontal irradiance and sunshine duration. Practitioners need to check their own climate zones and should not use $a$ and $b$ from one site to represent an entire country, as the coefficients vary between climatic zones due to variations in latitude, cloud cover, aerosols, surface albedo, and day length. The unavailability of reliable daily sunshine observations in some areas might be a drawback for other practitioners.

Further research will focus on extended monitoring of the stability of the coefficients over time in each climate zone. It will also focus on calibrating and validating rainfall-, cloud-, temperature-, and humidity-based models for areas where sunshine data are not recorded, so that daily global solar radiation can be estimated in those areas of South Africa. The Python script used to calculate the linear regression coefficients and to validate the estimated GHI against the observations is available on request from the corresponding author.

The Ångström–Prescott coefficients calibrated for each station can be used as representative of the climatic zone in which that station is located. The coefficients calculated in this study could enable the estimation of daily global horizontal irradiance at any location in South Africa where daily observed sunshine duration data are available and the climate is correctly classified. The estimated daily global horizontal irradiance data can thereby be used to support energy policies and solar energy programmes. They can also be used as benchmarks in climate analysis studies.