New Decomposition Models for Hourly Direct Normal Irradiance Estimations for Southern Africa

Abstract: This research develops and validates new decomposition models for hourly direct normal irradiance (DNI) estimations for Southern African data. Localised models were developed using data collected from the Southern African Universities Radiometric Network (SAURAN). Clustered areas within Southern Africa were identified, and the developed cluster decomposition models highlighted the potential advantages of grouping data based on shared geographical and climatic attributes. This clustering approach could enhance decomposition model performance, particularly when local data are limited or when data are available from multiple nearby stations. Further, a regional Southern African decomposition model, which encompasses a wide spectrum of climatic regions and geographic locations, exhibited notable improvements over the baseline models despite occasional overestimation or underestimation. The results demonstrated improved DNI estimation accuracy compared to the baseline models across all testing and validation datasets. These outcomes suggest that utilising a localised model can significantly enhance DNI estimations for Southern Africa and potentially for developing similar models in diverse geographic regions worldwide. The overall metrics affirm the substantial advancement achieved with the regional model as an accurate decomposition model representing Southern Africa. Two stations were used as a validation study, as an application example where no localised model was available, and the cluster and regional models both outperformed the comparative decomposition models. This study focused on validating the model for hourly DNI in Southern Africa within a $K_t$ range from 0.175 to 0.875, and the range could be expanded and validated in future studies. Implementing accurate decomposition models in developing countries can accelerate the adoption of renewable energy sources, diminishing reliance on coal and fossil fuels.


Introduction
Photovoltaic (PV) systems require accurate modelling and monitoring to ensure their profitability. The amount of irradiance at the site, the global plane-of-array irradiance (GPI), is the foundation of designing, modelling and monitoring PV systems. The GPI comprises the plane-of-array's (POA) direct beam, ground and diffuse irradiance components. GPI is used to model and monitor PV systems, as this shows the amount of generated solar power, and, therefore, it is one of the most important contributing factors to designing a PV system. The global horizontal irradiance (GHI), direct normal irradiance (DNI), and diffuse horizontal irradiance (DHI) components are required to calculate these irradiance components.
A transposition model calculates the GPI ($G_{POA}$) from its irradiance components as

$$G_{POA} = G_{BC} + G_{RC} + G_{DC}, \tag{1}$$

where $G_{BC}$ is the direct beam irradiance, $G_{RC}$ is the ground-reflected irradiance, and $G_{DC}$ is the diffuse irradiance component in the POA on the collector. The GHI, DNI, and DHI components are required to calculate $G_{BC}$, $G_{DC}$ and $G_{RC}$. The sum of the DNI projected onto the horizontal surface using the cosine of the solar zenith angle $\theta_Z$ and DHI gives the GHI, as shown in Figure 1 [1]:

$$\mathrm{GHI} = \mathrm{DNI}\cos\theta_Z + \mathrm{DHI}. \tag{2}$$

Figure 1. The irradiance relationships between GHI, DNI, DHI, and $\theta_Z$.
The units of GHI, DHI, and DNI are W/m². Most ground-based stations at least have measurements of GHI. Other measurements include radiometric data such as DNI, DHI, and ultra-violet and meteorologic data such as the temperature, pressure, rainfall, relative humidity, wind direction and wind speed. Pyranometers measure DHI and GHI, and the pyrheliometer measures DNI.
GHI is measured by a horizontally mounted pyranometer with a hemispherical view. Similar in setup to other pyranometers, the DHI pyranometer includes the additional feature of being shaded from direct sunlight. The pyrheliometer has a narrow view that only measures the beam directly from the Sun and is usually mounted on a Sun tracker for increased accuracy [2]. The irradiance measurements are converted to W/m² and logged accordingly.
Calibrating the equipment to the ISO 9060:2018 standard [3] is necessary, and it is advisable to undergo recalibration every two years to ensure the reliability of measurements. The maintenance required is to clean the domes and regularly check and replace the desiccant, which keeps the instruments dry internally. GHI, DNI, and DHI are interdependent; therefore, having only two irradiance measurements is sufficient to estimate the third using the decomposition models (also sometimes called separation models) [4]. If only the GHI is available, the DNI and DHI are also estimated using the decomposition models. The transposition models calculate GPI using the irradiance components. Therefore, GHI, DHI and DNI correlations are usually empirically expressed as a decomposition model [5].
Indices are relationships between different irradiance components. Decomposition and transposition models utilise these relationships.
The direct beam transmittance $K_n$ and diffuse transmittance $K_d$ are defined as

$$K_n = \frac{\mathrm{DNI}}{G_{0n}}, \tag{3}$$

$$K_d = \frac{\mathrm{DHI}}{G_{0h}}. \tag{4}$$

Ref. [6] defines $K_t$ as

$$K_t = \frac{\mathrm{GHI}}{G_{0h}}. \tag{5}$$

All K-values ($K_t$, $K_n$, and $K_d$) are unitless.
The extraterrestrial irradiance on a normal surface, $G_{0n}$, depends on the day of the year and on the solar constant, which is usually taken as 1367 W/m². The horizontal extraterrestrial irradiance $G_{0h}$ is obtained by multiplying $G_{0n}$ by the cosine of $\theta_Z$, as expressed in Equation (7):

$$G_{0h} = G_{0n}\cos\theta_Z. \tag{7}$$

Multipredictor decomposition models can improve accuracy compared to single predictor models [7]. However, the disadvantage is that multiple measurements must be available, which is not always the case for developing countries or brand-new sites of PV installations.
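The indices above can be illustrated with a short Python sketch. It assumes the usual definitions $K_t = \mathrm{GHI}/G_{0h}$, $K_n = \mathrm{DNI}/G_{0n}$ and $K_d = \mathrm{DHI}/G_{0h}$, and it uses a common eccentricity approximation for $G_{0n}$ that is not taken from this paper. The function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, the value quoted in the text


def extraterrestrial_normal(day_of_year):
    """Approximate G_0n in W/m^2.

    ASSUMPTION: a widely used eccentricity correction, not the paper's own equation.
    """
    return SOLAR_CONSTANT * (1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0))


def irradiance_indices(ghi, dni, dhi, zenith_deg, day_of_year):
    """Return (K_t, K_n, K_d) for one record; irradiances in W/m^2."""
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        raise ValueError("Sun is at or below the horizon")
    g0n = extraterrestrial_normal(day_of_year)
    g0h = g0n * cos_z              # horizontal extraterrestrial irradiance
    return ghi / g0h, dni / g0n, dhi / g0h


# Illustrative record that satisfies GHI = DNI * cos(theta_Z) + DHI.
zenith = 30.0
dni, dhi = 800.0, 100.0
ghi = dni * math.cos(math.radians(zenith)) + dhi
k_t, k_n, k_d = irradiance_indices(ghi, dni, dhi, zenith, day_of_year=80)
print(round(k_t, 3), round(k_n, 3), round(k_d, 3))
```

Under these definitions, a record that satisfies the GHI closure relation also satisfies $K_t = K_n + K_d$, which gives a quick consistency check on measured data.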
Decomposition models have been developed by assessing previous models and improving the accuracy of these estimations. As more data and measurements become available, researchers have the opportunity to develop models for different climates and temporal resolutions. Most models predominantly use $K_t$. Some of the variables used in the decomposition models are the solar altitude angle $\beta$ and dew point temperature $T_d$. Using $K_t$ as the main predictor in decomposition models is popular because of its simplicity and applicability [7].
Ref. [15] developed a relationship between $K_t$ and $K_d$ [15]. Ref. [16] extended the $K_t$-$K_d$ relationship to latitudes from 31° to 42° North [16]. Ref. [17] established a GHI and DNI relationship for a Mediterranean site to estimate $K_n$ using $K_t$ [17].
The Direct Insolation Simulation Code (DISC) was developed by [18]. Ref. [19] developed the Dirint model in the hope of improving the performance of the DISC model [19]. The Dirint model of [19] has shown superior performance when estimating the DNI [20].
In Korea, ref. [21] developed a model using six Korean locations [21]. The authors of [4] developed a new model from the DISC model of [18] by refitting the coefficients [4]. Ref. [22] developed a DNI estimation model using the solar elevation angle for Norway based on hourly GHI and DHI records [22].
The main limitations of decomposition models are that some have a limited climate scope, and the dataset's temporal resolution affects the irradiance estimation accuracy. A decomposition model in a tropical climate may be unsuitable for a desert climate and vice versa. Intra-hourly-based models perform differently from daily or monthly models, which is why many available decomposition models exist.
The accuracy of decomposition models has been evaluated in several regions, such as Belgium [5], China [25], the USA [20], and North Africa [26].
Ref. [27] provided an extensive study of 140 available decomposition models. The authors state that the predicted DNI's accuracy highly depends on the decomposition model. Validation studies exist but are limited to a few models and test stations, i.e., biased to a specific location or climate [27]. Research indicates that no decomposition model has been developed and validated for South Africa.
Ref. [28] states that, in general, decomposition models tend to overestimate DHI and underestimate DNI, and typically, models tend to underestimate DHI in overcast periods and overestimate it during clear-sky periods [20].
Higher-resolution data include higher $K_t$ values, resulting in extreme overestimations of DNI. Consequently, hourly DNI estimates have higher accuracy than 1-min DNI estimates. Subhourly estimations would be highly beneficial for the real-time monitoring and forecasting of solar power [27].
South African research on decomposition models includes the following: ref. [35] published the only Southern African-based study on the relationship between radiation and $K_t$ [35]. However, this relationship is with photosynthetically active radiation related to agricultural practices, not PV systems. Clear-sky model assessments and validation studies have been performed by [36,37] for Southern African countries. Clear-sky models simplify atmospheric attenuation to estimate solar irradiance under clear-sky conditions and are not decomposition models; these studies are therefore not included as comparison models, as they are not relevant to this research.
The thesis of [38] assessed decomposition and transposition models in South Africa and showed that the models tend to overestimate the DHI but underestimate the DNI [38]. Furthermore, the DISC and Dirint decomposition models showed the most accurate estimations of the DNI and DHI for South African climatic conditions [39].
As discussed, decomposition models are empirical relationships between GHI, DHI and DNI. All three irradiance components are required to estimate GPI. Decomposition models are useful as they reduce the measurement equipment by decomposing one irradiance component into two others; for example, they use GHI to estimate DHI and DNI.
Most decomposition models are not universally applicable: they are localised to a specific climate, and their temporal resolution is not always transferable. There has not been extensive literature published representing the Southern African region in decomposition models, which this research article will attempt to address.

Model Development
The methodology to develop a novel decomposition model is based on selected data from the automated quality control (QC) procedure proposed in [40] and addresses three geographical models:

1. A localised decomposition model, which is site-specific;
2. A clustered decomposition model, which encapsulates several sites to group an area based on their geographical location;
3. A regional (Southern African) model, which encapsulates the data from the SAURAN network for developing a model specific to Southern Africa.

SAURAN Database
The Southern African Universities Radiometric Network (SAURAN) is a network that includes multiple stations across Southern Africa, collecting meteorological data such as irradiance and wind, among others [41]. Table 1 summarises the SAURAN stations' corresponding geographical information, such as latitude, longitude, and elevation above sea level. The data points are hourly measurements of the GHI, DNI, and DHI. The split of the train-validation-test datasets is 50:25:25, with the exception of two datasets, ILA and MIN. ILA and MIN have a 0:0:100 data split and serve as two unknown datasets in the test study.
Table 3 also shows each station's mean GHI, DNI, and DHI, determined after applying the QC procedure. Table notes: (1) daylight average; (2) dataset size after quality control as in [40] and $0.175 \le K_t \le 0.875$.
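The 50:25:25 and 0:0:100 splits described above can be sketched as follows. The text does not say whether records were split randomly or chronologically, so the seeded shuffle here is an assumption, and `split_records` is a hypothetical helper name.

```python
import random


def split_records(records, fractions=(0.50, 0.25, 0.25), seed=0):
    """Split records into (train, validation, test) lists.

    ASSUMPTION: a seeded random shuffle; the paper's actual split strategy is not stated.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder goes to the test set
    return train, val, test


hours = list(range(1000))                      # stand-in for hourly records
train, val, test = split_records(hours)        # stations with a localised model
held_out = split_records(hours, fractions=(0.0, 0.0, 1.0))  # ILA / MIN style
```

With a 0:0:100 split, every record of a station lands in the test set, which is how the ILA and MIN datasets are kept unseen during model fitting.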

Comparison Metrics
The comparison metrics are the root mean square error (RMSE), mean absolute error (MAE), and mean bias error (MBE).
The correlation coefficient $r$ is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}. \tag{9}$$

In Equation (9), $x_i$ and $y_i$ represent the individual points with index $i$, and $\bar{x}$ and $\bar{y}$ represent the means of the $x$ and $y$ sample sets. An $r$ closer to -1 indicates a negative correlation, meaning that if one variable increases, the other decreases. In contrast, an $r$ closer to 1 indicates a positive correlation, meaning that if one variable increases, the other also increases [42].
The statistical indicators used as comparison metrics are the MBE, RMSE, and MAE, all expressed as a percentage of the mean measured DNI [27], together with $R^2$. Further comparison metrics are the MAE over two $K_t$ intervals: $K_t < 0.60$ and $K_t \ge 0.60$.
The MBE indicates whether a model over- or underestimates the DNI, and the RMSE indicates the deviation of the errors. A significant difference between MAE and RMSE indicates a larger variance in the data. Lower RMSE and MAE are ideal, whereas an MBE closer to zero is optimal. The MAE is an unbiased estimator and is also evaluated over the two $K_t$ intervals. Lower and higher $K_t$ indicate overcast and clear-sky conditions, respectively. Therefore, the two $K_t$ intervals assess the models under varying weather conditions.
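A minimal Python sketch of these metrics follows. Several choices are assumptions rather than the paper's stated definitions: the error sign is modelled minus measured (so a positive MBE means overestimation), $R^2$ is taken as the squared Pearson correlation, and each $K_t$ subset is normalised by its own mean measured DNI. The function names are hypothetical.

```python
import math


def relative_metrics(measured, modelled):
    """Return MBE, RMSE and MAE as percentages of the mean measured value, plus R^2."""
    n = len(measured)
    if n == 0 or n != len(modelled):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_meas = sum(measured) / n
    mean_mod = sum(modelled) / n
    errors = [m - o for m, o in zip(modelled, measured)]  # modelled minus measured
    mbe = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # ASSUMPTION: R^2 computed as the squared Pearson correlation coefficient.
    sxy = sum((o - mean_meas) * (m - mean_mod) for o, m in zip(measured, modelled))
    sxx = sum((o - mean_meas) ** 2 for o in measured)
    syy = sum((m - mean_mod) ** 2 for m in modelled)
    r2 = (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    scale = 100.0 / mean_meas
    return {"MBE%": mbe * scale, "RMSE%": rmse * scale, "MAE%": mae * scale, "R2": r2}


def mae_by_kt(measured, modelled, kt, threshold=0.60):
    """MAE% for K_t < threshold ('low') and K_t >= threshold ('high')."""
    result = {}
    for label, keep in (("low", lambda k: k < threshold), ("high", lambda k: k >= threshold)):
        pairs = [(o, m) for o, m, k in zip(measured, modelled, kt) if keep(k)]
        if pairs:
            obs, mod = zip(*pairs)
            result[label] = relative_metrics(obs, mod)["MAE%"]
    return result


# Illustrative values only, not data from the study.
measured = [100.0, 200.0, 300.0, 400.0]
modelled = [110.0, 190.0, 310.0, 390.0]
kt = [0.30, 0.50, 0.70, 0.80]
overall = relative_metrics(measured, modelled)
by_interval = mae_by_kt(measured, modelled, kt)
```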

Regression and Fitting
The relationship between two variables is quantified using statistical methods like regression. Regression techniques can be linear, multi-linear and non-linear.
A linear relationship is defined as

$$y = b_0 + b_1 x,$$

where $y$ is the response, $x$ is the regressor, $b_0$ is the intercept, and $b_1$ is the slope. A regression analysis quantifies the strength of a relationship between $y$ and $x$ [42].
The least squares method estimates $b_0$ and $b_1$ so that the sum of the squares of the residuals is at a minimum. The residual sum of squares is denoted as SSE and is the sum of squares of the errors about the regression line. Thus, the method minimises

$$\mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2,$$

where $\hat{y}$ denotes the predicted or fitted value. The coefficient of determination, $R^2$, indicates how good the fit of a model is and is a number between zero and one.
A higher $R^2$ value indicates that the model explains more of the variation in the response variable around its mean, and that the regression model fits the observations better [42].
Polynomial regression is the modelling of a dependent variable, $y$, as an $n$th-degree polynomial of $x$:

$$y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n.$$

Exponential regression is where the best fit of an equation is an exponential function, like

$$y = a + b c^{x}. \tag{14}$$

Multi-linear regression models a response variable as the outcome of multiple variables:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k.$$
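The least squares fit of a straight line, together with SSE and $R^2$, can be sketched with the closed-form solution below. It is a standard-library illustration rather than the study's implementation; `fit_line` is a hypothetical name, and $R^2$ is computed here as $1 - \mathrm{SSE}/\mathrm{SST}$, which is an assumption about the definition used.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, sse, r2)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    r2 = 1.0 - sse / sst
    return b0, b1, sse, r2


# Illustrative points scattered around y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.8]
b0, b1, sse, r2 = fit_line(x, y)
```

In practice, a library routine such as a polynomial least squares fit would normally replace the hand-written sums; the explicit form is used here only to mirror the definitions in the text.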

Software Development Tools
The model development utilises a combination of data science applications and modelling. The primary tool is the open-source language Python with the Anaconda interface [43], and various available libraries [44-46].

Baseline Models
Three comparative models are used as a baseline against which to compare the new models. Based on the literature, the DISC and Dirint models performed well for Southern African climates [39,47].
The Dirint [19] and Lee [4] models are also used for comparison because their foundation is similar to the DISC model [18].
The DISC model [18] is based on the following premises:

1. The relative air mass (AM) is the dominant parameter affecting the relationship between $K_n$ and $K_t$;
2. The physical model used to calculate $K_n$ will provide a physically based reference from which the changes in $K_n$ can be calculated (see Equation (20) below);
3. Seasonal, annual and climate variations in the relationship between $K_n$ and $K_t$ are fully accounted for by parametric functions in $K_t$ that relate $\Delta K_n$ to AM, cloud cover, and precipitable water (PW) vapour.
AM is defined as in [48]. The absolute AM ($AM_a$) is the pressure-normalised AM, expressed as

$$AM_a = AM \, \frac{P}{P_0},$$

where $P$ refers to the atmospheric pressure at the test site and $P_0$ is the atmospheric pressure at sea level. The modelled DNI is determined using Equation (3):

$$\mathrm{DNI} = K_n \, G_{0n}, \tag{19}$$

where

$$K_n = K_{nc} - \Delta K_n \tag{20}$$

and

$$\Delta K_n = a_{DISC} + b_{DISC} \exp(c_{DISC} \, AM). \tag{21}$$

The clear-sky limit $K_{nc}$ is a polynomial in AM. Two $K_t$ intervals determine the coefficients $a_{DISC}$, $b_{DISC}$ and $c_{DISC}$: $K_t \le 0.60$ and $K_t > 0.60$.
The model of [18] possesses a different functional form because the quasi-physical approach is applied; therefore, it partially reflects the physics involved in the atmospheric transmission of solar radiation [4]. The $a_{DISC}$, $b_{DISC}$ and $c_{DISC}$ parameters were fitted based on solar radiation data from Atlanta, Georgia, USA, 1981 [18]. Ref. [18] adopted the Bird clear-sky model for $K_{nc}$ (see Equation (22)). The parameters $a_{DISC}$, $b_{DISC}$ and $c_{DISC}$, as described in Equations (23) and (24), were then fitted based on the dataset.
The DISC model, termed 'quasi-physical', combines a clear-sky model with experimental fits for other sky conditions. The model is a clear-sky irradiance attenuated by a function of $K_t$. The authors of [18] derived the empirical regressions from 12 years of recorded radiation data at 70 stations [5,18].
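The structure of a DISC-style calculation can be sketched in Python as below. The sketch assumes the commonly published DISC form, in which $\Delta K_n$ is an exponential function of air mass and $K_n = K_{nc} - \Delta K_n$. The clear-sky limit and the coefficients are passed in as arguments rather than hard-coded, and the numbers in the example are placeholders, not the published DISC coefficients. The clamp at zero and the default sea-level pressure are additions of this sketch.

```python
import math


def absolute_air_mass(am, pressure, pressure_sea_level=101325.0):
    """Pressure-normalised air mass; pressures in Pa (default sea-level value is an assumption)."""
    return am * pressure / pressure_sea_level


def disc_style_dni(g0n, am, k_nc, a, b, c):
    """DNI from a DISC-style model.

    g0n  : extraterrestrial normal irradiance (W/m^2)
    am   : air mass used in the exponential term
    k_nc : clear-sky limit of the direct beam transmittance
    a, b, c : interval-specific coefficients (supplied by the caller)
    """
    delta_kn = a + b * math.exp(c * am)
    k_n = max(k_nc - delta_kn, 0.0)     # clamp: this sketch's guard against negative DNI
    return k_n * g0n


# Placeholder inputs for illustration only.
am_a = absolute_air_mass(1.5, pressure=85000.0)
dni = disc_style_dni(g0n=1367.0, am=1.5, k_nc=0.7, a=0.1, b=0.2, c=-0.5)
```

Keeping the coefficients as arguments makes the same function usable for any refitted variant that shares this functional form.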
The Dirint model is based on the DISC model and was developed by [19]. The goal was to improve the accuracy of the DISC model by [18].
The Dirint model uses a clearness index variation parameter $K'_t$. Furthermore, a stability index parameter, $\Delta K'_t$, considers the previous ($i-1$), current ($i$) and next ($i+1$) hourly records. When the preceding or following hourly record is missing, $\Delta K'_t$ is calculated from the remaining neighbouring record. A low $\Delta K'_t$ indicates stable conditions, whereas a high $\Delta K'_t$ characterises unstable conditions, which allows the distinction between hazy and partly cloudy conditions. The $T_d$ is an adequate atmospheric PW estimator [19] and is used to estimate the Dirint model's atmospheric PW ($W$). The Dirint model is a four-dimensional conditional model in $\theta_Z$, $K_t$, $\Delta K'_t$ and $W$. Based on the four-dimensional model, the hourly DNI is calculated using the coefficients $a_{Dirint}$ and $b_{Dirint}$, which are taken from a complex lookup table.
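The Dirint predictors can be sketched in Python as below. The formulas are the forms commonly published for the model of [19] and are assumptions as far as this paper's own equations are concerned. The lookup table itself is not reproduced, and the function names are hypothetical.

```python
import math


def kt_prime(k_t, am):
    """Zenith-independent clearness index (commonly published form; an assumption here)."""
    return k_t / (1.031 * math.exp(-1.4 / (0.9 + 9.4 / am)) + 0.1)


def delta_kt_prime(series):
    """Stability index for an hourly series of K't values (None marks a missing record).

    With both neighbours present this is half the sum of the two absolute
    differences; with one neighbour missing it is the single absolute difference.
    """
    out = []
    for i, current in enumerate(series):
        if current is None:
            out.append(None)
            continue
        previous = series[i - 1] if i > 0 else None
        following = series[i + 1] if i + 1 < len(series) else None
        diffs = [abs(current - v) for v in (previous, following) if v is not None]
        out.append(sum(diffs) / len(diffs) if diffs else None)
    return out


def precipitable_water(t_dew_c):
    """Atmospheric precipitable water (cm) from dew point temperature in deg C (assumed form)."""
    return math.exp(0.07 * t_dew_c - 0.075)


# Illustrative hourly K't values with one missing record.
stability = delta_kt_prime([0.5, 0.6, None, 0.4, 0.7])
```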
Ref. [4] created a new model for Korea with the same format as the DISC model of [18], with coefficients defined separately for $K_t \le 0.5$ and $K_t > 0.5$.
The evaluation consists of comparing the localised, clustered and regional models against the three baseline models: DISC, Dirint and Lee. The DISC and Dirint models were selected based on their performance in estimating DNI for Southern African climates. The Lee and Dirint models have foundational similarities to the DISC model. These baselines are used to assess whether the newly developed decomposition models improve the accuracy of hourly DNI estimations for Southern Africa. The accuracy evaluation uses the comparison metrics discussed in the next section.

Decomposition Model Development Methodology
The methodology builds on the DISC model. The DISC model stands out as one of the better-performing models for estimating DNI for South Africa [39]. Its simplicity is evident in its lack of a need for a complex four-dimensional lookup table, unlike the Dirint model.
The original DISC model uses Equation (21), an exponential function. However, the regression model for an exponential function, as discussed in Section 2.3, showed difficulty in finding optimal a, b and c coefficients in all cases. Instead, a second-order polynomial function of AM is a suitable substitute with similar regression results.
The training set is then used to fit a, b and c for the intervals $K_t \le 0.60$ and $K_t > 0.60$, and the validation and testing sets evaluate the model's accuracy. Figure 3 summarises the development for each decomposition model in this study, where each model undergoes the following initial processing steps:

1. Empirical formulae estimate $\theta_Z$, AM, pressure, $I_{0n}$, $K_t$, and $K_n$. From this, the assessment of available models aids in developing a new model.
2. Data are split into $K_t$ intervals of width 0.05, from 0.175 to 0.875.
4. Each interval is then fitted against the function in Equation (34) to determine the a, b and c coefficients using a least squares regression analysis.
5. From the $K_t$-interval function, the $a_0$ to $a_3$, $b_0$ to $b_3$ and $c_0$ to $c_3$ coefficients are fitted to the polynomial of Equation (35) with regard to $K_t$.
6. These equations can be used to determine $\Delta K_n$ and $K_n$, which, in turn, calculate the DNI (see Equations (19) and (20)).
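The binning and two-stage fitting steps above can be sketched in Python as follows. The sketch assumes that the second-order polynomial takes the form $\Delta K_n = a + b\,AM + c\,AM^2$ and that a, b and c are each modelled as a cubic in $K_t$ with four coefficients. For brevity it fits one polynomial over the whole $K_t$ range instead of separate fits for $K_t \le 0.60$ and $K_t > 0.60$. The helper names and the synthetic data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def polyfit(x, y, degree):
    """Least squares polynomial coefficients, lowest order first (normal equations)."""
    size = degree + 1
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(size)] for i in range(size)]
    aty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(size)]
    return solve_linear(ata, aty)


def kt_bin_edges(start=0.175, stop=0.875, width=0.05):
    """Edges of the K_t intervals used for the per-interval fits."""
    n_bins = round((stop - start) / width)
    return [start + i * width for i in range(n_bins + 1)]


def fit_decomposition(kt, am, delta_kn):
    """Stage 1: quadratic in AM per K_t interval. Stage 2: cubic in K_t for a, b and c."""
    edges = kt_bin_edges()
    centres, abc = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, k in enumerate(kt) if lo <= k < hi]
        if len(idx) < 3:                 # a quadratic needs at least three points
            continue
        abc.append(polyfit([am[i] for i in idx], [delta_kn[i] for i in idx], 2))
        centres.append(0.5 * (lo + hi))
    return {name: polyfit(centres, [coeffs[j] for coeffs in abc], 3)
            for j, name in enumerate("abc")}


# Synthetic data with known coefficients: a = 0.1 + 0.2*K_t, b = 0.05, c = 0.01*K_t.
edges = kt_bin_edges()
kt, am, dkn = [], [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    k = 0.5 * (lo + hi)
    for air_mass in (1.0, 1.5, 2.0, 3.0, 4.0):
        kt.append(k)
        am.append(air_mass)
        dkn.append((0.1 + 0.2 * k) + 0.05 * air_mass + (0.01 * k) * air_mass ** 2)
model = fit_decomposition(kt, am, dkn)
```

The normal equations are used here only to keep the sketch free of dependencies; a numerical library's least squares routine would be the more robust choice on real data.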
For each SAURAN station, a localised decomposition model is developed. A clustered decomposition model describes an area with similar irradiance patterns using the clustered areas discussed in [40]. Ref. [49] first presented a two-cluster correlation map using the SAURAN database [49], and, by using this approach, this study formulated four clusters instead of two in Southern Africa, as shown in Figure 4a. Figure 4a shows the clusters' geographical location, and Figure 4b shows the penetration levels of GHI. Table 4 shows the different clusters' training sets' mean GHI, DNI, and DHI.
Cluster 1 receives the most GHI and DNI, and Cluster 3 receives the least, as evident from Figure 4b. The different climates are also evident in these clusters: Cluster 3 is more humid and receives, on average, more DHI than Cluster 1.
Figure 5 shows how the cluster data are combined. Each cluster and the regional (Southern African) model are combined with even distributions of datasets to avoid introducing a bias, as some stations are over-represented in the original dataset. Some stations, such as the SUN, UPR and RVD stations, have considerably more data available as they are either older stations or have not been closed down. The different stations have varying climates, and therefore, a larger representation of one station will result in a model biased towards that station. The advantage of the even distribution is that every station is sufficiently represented and will not cause a model bias, but this reduces the amount of available data.
Cluster 2's stations have higher elevation and summer humidity due to the area's warm, rainy summers and dry, cold winters. The expected annual irradiance levels are lower, as seen in Figure 4b. The stations have higher humidity because of their location and higher DHI levels.
Two stations in Cluster 2, UPR and CSIR, are expected to have more diffuse particles due to the higher air pollution levels and, therefore, higher DHI levels. Cluster 2 is strongly biased towards data from Pretoria, South Africa, through the CSIR and UPR datasets.
Cluster 4 has lower annual irradiance levels, as seen in Figure 4b, and FRH and NMU are closer to the coastline, whereas GRT is inland.
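The even distribution of station data described above can be sketched as a down-sampling step. The sketch assumes that each station is reduced to the size of the smallest station in the group; the text does not state the exact rule, and the record counts below are invented for illustration.

```python
import random


def balance_by_station(records, seed=0):
    """Down-sample so that every station contributes the same number of records.

    records : iterable of (station, payload) pairs
    ASSUMPTION: the target count is the size of the smallest station.
    """
    by_station = {}
    for station, payload in records:
        by_station.setdefault(station, []).append(payload)
    target = min(len(items) for items in by_station.values())
    rng = random.Random(seed)
    balanced = []
    for station in sorted(by_station):
        for payload in rng.sample(by_station[station], target):
            balanced.append((station, payload))
    return balanced


# Hypothetical record counts: one long-running station and two shorter ones.
raw = ([("SUN", i) for i in range(1000)]
       + [("RVD", i) for i in range(800)]
       + [("HLO", i) for i in range(200)])
balanced = balance_by_station(raw)
counts = {}
for station, _ in balanced:
    counts[station] = counts.get(station, 0) + 1
```

This makes the trade-off in the text concrete: the balanced set avoids a bias towards the largest station, at the cost of discarding most of that station's records.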

Development of New Decomposition Models
This section discusses the newly developed a, b and c coefficients of Equation (34). The section consists of three subsections:

1. The localised decomposition models, developed using the training dataset of the SAURAN station;
2. The clustered decomposition models, which are modelled on the training data of all the stations within the cluster, as discussed in Figure 5;
3. The regional model, which is modelled on all the stations' training data (Table 3).

Localised Decomposition Models
The localised decomposition model equations for the a, b and c coefficients are presented in Appendix A.

Cluster 1
Cluster 1 comprises the HLO, NUST, RVD, SUN and VAN datasets, as shown in Figures 4a and 5.
Figure 6 shows the a, b and c coefficients of Cluster 1 and its five stations. The discussion of the different stations is in Appendix A under Appendices A.5 (HLO), A.11 (NUST), A.12 (RVD), A.13 (SUN) and A.19 (VAN).
The RVD model is the only model showing difficulty fitting the coefficients with $K_t$. Table 3

Regional Decomposition Model
The regional (Southern African) decomposition model data are an even distribution of the SAURAN stations regarding the number of data points used per station.Multiple climates, different elevations, and various pollution levels are represented within the dataset, leading to a better decomposition model for Southern Africa and a regional application.
Figure 10 shows the coefficients a, b and c of the regional model and the four clusters. The coefficients for the regional model are a =

Results
Each station is discussed individually by assessing the dataset's comparison metrics: the $R^2$ value, MBE, RMSE, MAE and MAE of two $K_t$ intervals. The results compare the localised, clustered and regional (Southern African) models to the three baseline models, DISC, Dirint and Lee. The tables visualise the results for each station using red and green, with green denoting lower error and red denoting higher error.
Table 3 discusses the validation data. In the previous section, the localised, clustered, and regional models were empirically determined. Appendix A expands on the equations for the localised models.
Sections 3.2 and 3.3 discussed the clustered and regional models. The test data also introduce two unknown datasets, the ILA and MIN datasets. These datasets assess the developed models with new data. ILA and MIN have no localised model, but geographically, they fall within a cluster: ILA falls under Cluster 1 and MIN under Cluster 2.

CSIR
Appendix A.1 shows the decomposition model equations for the CSIR station. Table 5 shows the results of the CSIR station. The results show that the localised, Cluster 2 and regional models outperform the baseline models in all metrics. The localised model significantly improves for lower $K_t$, reducing the MAE from around 60% to 36%.
The test results of the CSIR dataset are presented in Figure A1. As seen in the figure, the localised, cluster, and regional models outperform the baseline models, which is consistent with the validation results of the previous section in Table 5.

CUT
Appendix A.2 shows the decomposition model equations for the CUT station. Table 6 shows the CUT station results. The localised, Cluster 2 and regional models significantly improve the comparison metrics over the three baseline models. The Lee model has a similar MBE to the regional model (±0.7) and an MAE for higher $K_t$ similar to that of Cluster 2. However, the Lee RMSE and MAE still do not outperform the new models. Figure A2 presents the test results of the CUT dataset, where the localised, cluster and regional models outperform the baseline models. The test results are consistent with the validation results presented in Table 6.

FRH
Appendix A.3 shows the decomposition model equations for the FRH station. Table 7 shows the results of the FRH station. The localised model outperforms the baseline models by improving $R^2$ and MBE and reducing MAE and RMSE. The Lee model shows the lowest MAE for higher $K_t$ values; however, it does show an overestimation of DNI with a higher MBE. For most metrics, the localised, Cluster 4 and regional models outperform the baseline models. Figure A3 presents the test results of the FRH dataset. The localised, Cluster 4 and regional models outperform the baselines, but no significant difference exists between the three new models. The test results presented in Figure A3 correspond with the validation results in Table 7.

GRT
Appendix A.4 shows the decomposition model equations for the GRT station. Table 8 shows the GRT station results. The localised model does show improvement over the DISC and Dirint models but does not significantly outperform the Lee model. The Lee model has a higher $R^2$ and lower MBE and RMSE, whereas the localised model has a lower MAE for the entire dataset and the two $K_t$ intervals. The Cluster 4 and regional models perform better than the DISC and Dirint models but do not significantly outperform all the baseline models. Figure A4 shows the test results of the GRT dataset. The results correspond with the validation results in Table 8. The localised, Cluster 4 and regional models do outperform the DISC and Dirint models but only marginally outperform the Lee model.

HLO
Appendix A.5 shows the decomposition model equations for the HLO station. Table 9 shows the HLO station results. The localised model performs better than the baseline models and improves all comparison metrics. Figure A5 shows the test results of the HLO dataset. The validation results in Table 9 and the test results correspond well, indicating that the localised, Cluster 1 and regional models outperform the baseline models.

ILA
Figure A6 presents the test results of the ILA dataset. The ILA dataset has no localised decomposition model; therefore, the testing only assesses the Cluster 1 and regional models. The results show that the Cluster 1 and regional models outperform the baseline models.
The results highlight that a cluster model can serve as a substitute when no localised model is available, provided that the site lies geographically within the cluster area.
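The substitution logic can be sketched as a simple fallback: use a localised model when one exists, otherwise the cluster model for the site's cluster, otherwise the regional model. The priority order is one reading of the text, and the mapping below lists only station and cluster pairs stated in this paper; `select_model` is a hypothetical helper.

```python
# Cluster membership as stated in the text (partial list, for illustration).
STATION_CLUSTER = {
    "HLO": 1, "NUST": 1, "RVD": 1, "SUN": 1, "VAN": 1, "ILA": 1,
    "CSIR": 2, "UPR": 2, "MIN": 2,
    "FRH": 4, "GRT": 4, "NMU": 4,
}

# Stations without a localised model in this study.
NO_LOCALISED_MODEL = {"ILA", "MIN"}


def select_model(station, station_cluster=STATION_CLUSTER,
                 no_localised=NO_LOCALISED_MODEL):
    """Return ('localised', station), ('cluster', n) or ('regional', None).

    ASSUMPTION: localised is preferred over cluster, and cluster over regional.
    """
    if station in station_cluster and station not in no_localised:
        return ("localised", station)
    cluster = station_cluster.get(station)
    if cluster is not None:
        return ("cluster", cluster)
    return ("regional", None)


choice_ila = select_model("ILA")
choice_sun = select_model("SUN")
choice_new = select_model("NEW_SITE")   # hypothetical site outside the listed stations
```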

KZH
Appendix A.7 shows the decomposition model equations for the KZH station. Table 10 shows the results of the KZH station. The localised, Cluster 3 and regional models all show significant improvements in reducing the error over the baseline models. The DISC has a lower MBE than the regional model. Figure A7 shows the test results of the KZH dataset. The localised, Cluster 3 and regional models all outperform the baseline models. The regional model does not significantly outperform Cluster 3 or the localised model.

KZW
Appendix A.8 shows the decomposition model equations for the KZW station. Table 11 shows the results of the KZW station. The localised, clustered, and regional models show improvement over the baseline models with metrics that assess the entire dataset. Figure A8 shows the test results of the KZW dataset. The validation and testing results from Table 11 correspond.

MIN
The test results of the MIN dataset are presented in Figure A9.MIN has no localised decomposition model and falls geographically under Cluster 2. The cluster model and localised model show improvement over the baseline models.Much like the ILA dataset, the MIN dataset demonstrates how the clustered and regional models can serve as alternatives to enhance DNI estimations in Southern Africa.
Appendix A.10 shows the decomposition model equations for the NMU station.Table 12 shows the NMU station results.The localised, Cluster 4 and regional models show significant improvement in reducing the errors from the baseline models.Based on the higher MBE, the Cluster 4 and regional models overestimate the DNI more than the DISC and Dirint models.The test results of the NMU dataset are presented in Figure A10.Localised and cluster models outperform baseline models, which is consistent with the results in Table 12.
4.1.10.NUST Appendix A.11 shows the decomposition model equations for the NUST station.Table 13 shows the results of the NUST station.The localised model shows superior performance over the baseline models, as well as the clustered and regional models.The metrics of the clustered model compared to the baselines indicate that the regional model slightly overestimates the DNI compared to the lowest baseline model (Lee), which slightly underestimates the DNI.The test results of the NUST dataset are presented in Figure A11.Localised, clustered and regional models outperform the baseline models, which is consistent with the validation results presented in Table 13.The regional model shows marginal underperformance compared to the localised and Cluster 1 model, but not significant enough to make it unusable.4.1.11.RVD Appendix A.12 shows the decomposition model equations for the RVD station.Table 14 shows the RVD station results.The localised, clustered and regional models outperform the baseline models.The Lee model performs better than the regional model but does not outperform the localised and cluster models.The RVD station receives more irradiance on average than other stations in the SAURAN database.Figure A12 shows the test results of the RVD dataset.The results indicate that the localised, cluster and regional models outperform the baseline models, which is consistent with the validation results of the previous section in Table 14.The localised model's RMSE is higher than the Lee model; however, the localised model performs best in reducing the error for the other metrics.Though the regional model outperforms the baseline models, it does show the worst performance of the three newly developed models for RVD.4.1.12.SUN Appendix A. 
13 shows the decomposition model equations for the SUN station.Table 15 shows the SUN station results.The localised model outperforms the baseline models by improving R 2 and reducing the MBE, RMSE and MAE.The Cluster 1 and regional models show a slightly worse MBE than the Lee baseline model but otherwise outperform the baseline models.The Lee model also predicts higher K t points with a lower MAE than the regional model; however, the other metrics indicate that the regional model shows better results overall.Figure A13 shows the test results of the SUN dataset.The results indicate that the localised, cluster and regional models outperform the baseline models, which is consistent with the testing results of the previous section in Table 15.As with the validation results, the regional model is the worst-performing new model, but it still outperforms the baseline models.
4.1.13. UBG
Appendix A.14 shows the decomposition model equations for the UBG station. Table 16 shows the UBG station results. The localised, clustered and regional models all outperform the baseline models. The Lee model has a lower MBE than the Cluster 1 and regional models. The Lee model also has a lower MAE for K_t ≥ 0.60; however, the other metrics indicate that it does not improve the R², RMSE, overall MAE or MAE for K_t < 0.60. Figure A14 shows the test results of the UBG dataset. The test results show that the localised, Cluster 1 and regional models outperform the baseline models. The Lee model has a lower MBE than the regional model, which is consistent with the validation results in Table 16.

4.1.14. UFS
Appendix A.15 shows the decomposition model equations for the UFS station. Table 17 shows the UFS station results. The localised, Cluster 2 and regional models outperform the baseline models. The Lee model slightly underestimates the DNI but has a marginally better MBE than the Cluster 2 model. Figure A15 shows the test results of the UFS dataset. All three new decomposition models significantly reduce the errors compared to the baseline models, which is consistent with the validation results in Table 17.

4.1.15. UNV
Appendix A.16 shows the decomposition model equations for the UNV station. Table 18 shows the UNV station results. The localised, Cluster 2 and regional models significantly improve on the baseline models. The Cluster 2 and regional models overestimate the DNI more than the DISC model, based on the MBE. Figure A16 shows the test results of the UNV dataset. The test results correspond with the validation results in Table 18, where the localised, Cluster 2 and regional models outperform the baseline models. The only exception is the MBE, where the Cluster 2 and regional models perform worse than the DISC model. Considering all the metrics, the new models outperform the baselines in reducing the overall error of DNI estimations.

4.1.16. UNZ
Appendix A.17 shows the decomposition model equations for the UNZ station. Table 19 shows the results of the UNZ station. The localised, clustered and regional models all show improvement over the baselines. The Dirint model has a lower MBE than the regional model. Figure A17 shows the test results of the UNZ dataset, which correspond with the validation results in Table 19.
4.1.17. UPR
Appendix A.18 shows the decomposition model equations for the UPR station. Table 20 shows the UPR station results. The localised, cluster and regional models outperform the baseline models. Figure A18 shows the test results of the UPR dataset. The comparison metrics of the entire dataset indicate that the localised, cluster and regional models outperform the baseline models, which is consistent with the results of the validation dataset in Table 20.

4.1.18. VAN
Appendix A.19 shows the decomposition model equations for the VAN station. Table 21 shows the results of the VAN station. The localised model outperforms the baseline models by improving R² and reducing the MBE, RMSE and MAE. The new models significantly reduce the MAE at higher K_t values compared to the baseline models. Although it outperforms the baseline models, the regional model performs worse than the other new models. The VAN station receives a very high average DNI and GHI and a lower DHI than the rest of the database's stations. These results are similar to those of the RVD station, which has significantly higher irradiance levels than the other stations. Figure A19 shows the test results of the VAN dataset. The new models all outperform the baseline models. The regional model shows the worst performance of the new models while still outperforming the baseline models, as at the RVD station, which also receives more irradiance on average than the other stations. These results are consistent with the validation results in Table 21.

Discussion
Table 22 summarises the performance of the localised, clustered and regional models for both the test and validation sets.
As expected, the localised models outperformed the baseline models for all station datasets because they were trained on site-specific climatic data. As discussed in the previous section, the cluster model combines multiple stations in a similar geographical area. The clustered model therefore has significantly more data from which to train. A clustered model is ideal if a site has no data for localised model development using the discussed methodology. The regional (Southern African) model also shows improvement over the baseline models, indicating that it may be appropriate for adoption as a new model for Southern Africa. The two stations with no localised models (ILA and MIN) showed improvement using the clustered and regional models in the validation study.

Conclusions
This article presents the development of new decomposition models for hourly DNI estimation in Southern Africa using the SAURAN database. The new models build on and improve the DISC model, yielding decomposition models localised for Southern African climates. The new decomposition models significantly reduced the DNI estimation errors relative to the baselines for the validation and test datasets of all the SAURAN stations. The proposed methodology can be helpful for the development of local decomposition models for other areas worldwide.
The results indicate that a localised model will improve the estimations of DNI. The clustered models indicate that grouping data based on similar geographical and climatic properties can also improve the performance of decomposition models. A clustered decomposition model is therefore useful when no local model exists or local data are limited, but data are available from two or more geographically close stations.
The overall model, the regional decomposition model, encompasses different climatic regions and geographical locations. There are some exceptions where the model over- or underestimates the DNI; however, the overall metrics indicate that the Southern African model significantly improves on the baseline models.
The validation study conducted at two stations further substantiates the superiority of the developed hourly decomposition models over comparative models, emphasising their robustness and applicability in realistic scenarios where localised models are usually unavailable. The study validates hourly irradiance data for K_t intervals between 0.175 and 0.875. Recommendations for future work include developing models for higher and lower K_t values and models for higher temporal resolutions with increased accuracy, which would be ideal for the real-time monitoring and short-term forecasting of PV power.
An accurate decomposition model can open the path for developing countries to expand their use of renewable energy sources and reduce their dependence on coal and fossil fuels. Good-quality data are needed to ensure the progress of research and development related to solar energy, and this paper addresses the Southern African region's lack of accurate solar resource representation in existing models. The next step is assessing the decomposition models with well-known transposition models to determine improved accuracy for PR estimations.

Figure 2. Validation sites of discussed decomposition models.

Figure 3 gives an overview of the decomposition model development, which is discussed in this section. The first part is ensuring the data are quality-controlled, followed by the development of the decomposition model. The accuracy of the models is rigorously scrutinised to determine if the proposed models outperform the baseline models.

where x_i is the measured value and x̂_i is the predicted value. A low RMSE and MAE indicate a good model, whereas the MBE should be close to zero. RMSE indicates the concentration of data around the line of best fit. Therefore, a smaller RMSE is indicative of a more accurate model. The Pearson correlation coefficient r indicates the correlation between data:
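In its standard form, with x̄ and the mean of the x̂_i denoting the sample means of the measured and predicted values, the Pearson coefficient is

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\hat{x}_i - \bar{\hat{x}})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})^2}}$$

As a minimal sketch of how the four metrics named above could be computed, the following Python function uses their textbook definitions. The function name `error_metrics` is hypothetical, and the sign convention MBE = mean(predicted − measured), under which a positive MBE means overestimation, is an assumption rather than something stated in this section.

```python
import math


def error_metrics(measured, predicted):
    """Return MBE, RMSE, MAE and Pearson r for paired samples.

    Assumed sign convention: MBE = mean(predicted - measured),
    so a positive MBE indicates overestimation.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally sized sequences with >= 2 samples")
    n = len(measured)

    # Point-wise errors between predicted and measured values.
    errors = [p - m for m, p in zip(measured, predicted)]
    mbe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation from centred sums
    # (undefined if either series is constant).
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_m * var_p)

    return {"MBE": mbe, "RMSE": rmse, "MAE": mae, "r": r}
```

Calling `error_metrics(measured_dni, predicted_dni)` on hourly DNI series in W/m² returns the error metrics in the same unit and r as a dimensionless value; R² for a simple linear fit is the square of r.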

Figure 5. Distribution of data within clusters.

Figures 6 -
show the different corresponding clusters' model coefficients.

4.1.5. HLO
Appendix A.5 shows the decomposition model equations for the HLO station.

4.1.14. UFS
Appendix A.15 shows the decomposition model equations for the UFS station.

4.1.15. UNV
Appendix A.16 shows the decomposition model equations for the UNV station.

4.1.18. VAN
Appendix A.19 shows the decomposition model equations for the VAN station.

Table 2 summarises the SAURAN database and dataset sizes, followed by Table 3, which shows the subsequent data points available for model development. Further, the data points assessed have K_t between 0.175 and 0.875.
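The K_t restriction can be illustrated with a short sketch. It assumes the usual definition of the clearness index, K_t = GHI divided by the extraterrestrial irradiance on a horizontal plane, and assumes both bounds are inclusive. The record keys `ghi` and `i0h` and the function names are hypothetical and are not taken from the SAURAN data format.

```python
K_T_MIN = 0.175
K_T_MAX = 0.875


def clearness_index(ghi, extraterrestrial_horizontal):
    """Return K_t = GHI / extraterrestrial horizontal irradiance.

    Returns None when the sun is at or below the horizon
    (non-positive extraterrestrial horizontal irradiance).
    """
    if extraterrestrial_horizontal <= 0:
        return None
    return ghi / extraterrestrial_horizontal


def filter_by_kt(records, kt_min=K_T_MIN, kt_max=K_T_MAX):
    """Keep hourly records whose K_t lies within [kt_min, kt_max].

    Each record is a dict with hypothetical keys 'ghi' and 'i0h'
    (both in W/m^2). The computed K_t is attached under 'kt'.
    """
    kept = []
    for rec in records:
        kt = clearness_index(rec["ghi"], rec["i0h"])
        if kt is not None and kt_min <= kt <= kt_max:
            kept.append({**rec, "kt": kt})
    return kept
```

Records outside the interval are dropped rather than clipped, so the remaining data cover only the range the models are validated for.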

Table 3. Model development stations indicating the mean GHI, DNI, and DHI and the sizes of the training, validation, and testing sets.
4 Mean values of training set.
Table 3 indicates that the RVD station has the highest mean DNI and GHI, with the lowest DHI measurements, compared to the rest of Cluster 1's stations. Cluster 2 consists of the CSIR, CUT, UBG, UFS, UPR, and UNV datasets. Figure 7 shows the a, b and c coefficients of Cluster 2 and its six stations. The different stations are discussed in Appendix A under Appendices A.1 (CSIR), A.2 (CUT), A.14 (UBG), A.15 (UFS), A.18 (UPR) and A.16 (UNV). The UFS station has the greatest deviation from the Cluster 2 fit. Cluster 4 consists of the NMU, FRH and GRT datasets. Figure 9 shows the a, b and c coefficients of Cluster 4 and its three stations.
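To make the role of the a, b and c coefficients concrete, the sketch below assumes they enter the model as in Maxwell's original DISC formulation, where the direct beam transmittance is K_n = K_nc − ΔK_n with ΔK_n = a + b·exp(c·AM) and AM is the air mass. The K_nc polynomial is the one from the original DISC model. The names `disc_style_dni` and `coeff_fn` are hypothetical, and the fitted coefficients themselves (given in Appendix A) are not reproduced here.

```python
import math


def clear_sky_kn(air_mass):
    """Clear-sky direct transmittance K_nc from the original DISC model."""
    am = air_mass
    return (0.866 - 0.122 * am + 0.0121 * am ** 2
            - 0.000653 * am ** 3 + 0.000014 * am ** 4)


def disc_style_dni(kt, air_mass, i0, coeff_fn):
    """Estimate DNI (W/m^2) with a DISC-style decomposition.

    kt       : clearness index for the hour
    air_mass : air mass AM
    i0       : extraterrestrial normal irradiance (W/m^2)
    coeff_fn : hypothetical callable mapping kt -> (a, b, c); a localised,
               cluster or regional fit would be supplied here
    """
    a, b, c = coeff_fn(kt)
    delta_kn = a + b * math.exp(c * air_mass)
    kn = clear_sky_kn(air_mass) - delta_kn
    # A negative transmittance is not physical, so clamp at zero.
    return max(kn, 0.0) * i0
```

Under this assumption, swapping between a localised, cluster or regional model only changes the `coeff_fn` passed in, which is why the coefficient curves of a cluster and its stations can be compared directly.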

Table 5. Hourly validation results of decomposition model development for CSIR.

Table 6. Hourly validation results of decomposition model development for CUT.

Table 7. Hourly validation results of decomposition model development for FRH.

Table 8. Hourly validation results of decomposition model development for GRT.

Table 9. Hourly validation results of decomposition model development for HLO.

Table 10. Hourly validation results of decomposition model development for KZH.

Table 11. Hourly validation results of decomposition model development for KZW.

Table 12. Hourly validation results of decomposition model development for NMU.

Table 13. Hourly validation results of decomposition model development for NUST.

Table 14. Hourly validation results of decomposition model development for RVD.

Table 15. Hourly validation results of decomposition model development for SUN.

Table 16. Hourly validation results of decomposition model development for UBG.

Table 17. Hourly validation results of decomposition model development for UFS.

Table 18. Hourly validation results of decomposition model development for UNV.

Table 19. Hourly validation results of decomposition model development for UNZ.

Table 20. Hourly validation results of decomposition model development for UPR.

Table 21. Hourly validation results of decomposition model development for VAN.

Table 22. Summary of test and validation sets of stations outperforming baseline models.