A New Model for Estimating the Diffuse Fraction of Solar Irradiance for Photovoltaic System Simulations

We present a new model for the calculation of the diffuse fraction of the global solar irradiance for solar system simulations. The importance of an accurate estimation of the horizontal diffuse irradiance is highlighted by findings that an inaccurately calculated diffuse irradiance can lead to significant overor underestimations in the annual energy yield of a photovoltaic (PV) system by as much as 8%. Our model utilizes a time series of global irradiance in one-minute resolution and geographical information as input. The model is validated by measurement data of 28 geographically and climatologically diverse locations worldwide with one year of one-minute data each, taken from the Baseline Surface Radiation Network (BSRN). We show that on average the mean absolute deviation of the modelled and the measured diffuse irradiance is reduced from about 12% to about 6% compared to three reference models. The maximum deviation is less than 20%. In more than 80% of the test cases, the deviation is smaller 10%. The root mean squared error (RMSE) of the calculated diffuse fractions is reduced by about 18%.


Introduction
Adapting the common terminology in energy meteorology to differentiate between the power and energy of the solar radiation, the word 'irradiance' is used in this work to denote the instantaneous solar power per square meter in W/m 2 , whereas the word 'irradiation' refers to the integral of the irradiance over time, thus denoting the energy of the solar radiation in Ws/m 2 or kWh/m 2 [1].
In photovoltaic (PV) system simulations, the global horizontal irradiance and the ambient temperature are the two most important inputs in order to determine the PV system's energy output. The global horizontal irradiance is split up in its direct and diffuse components. These components are then separately translated to a tilted plane if the PV system in question has a module orientation that differs from the horizontal plane. In simple terms the global irradiance incident on a tilted module is then calculated as the sum of the direct and the diffuse irradiance on the tilted plane.
The model for estimating the diffuse fraction of the global horizontal irradiance is hence the first element in a chain of models that is necessary to simulate the electrical output of a PV system. As such, it has strong influence on the final output of the simulation, which is demonstrated by the following comparative simulations for two locations: Lindenberg, Germany and Gobabeb, Namibia. The analysed PV system is a standard 8 kWp (kilo Watt peak) grid connected system, the simulation is conducted in one-minute resolution with measurement data from the Baseline Surface Radiation Network (BSRN) [2].
The rest of the PV system model chain is then simulated with the help of the simulation core of PV software provider Valentin Software (Berlin, Germany) [5]. Table 1 lists the results of the comparison.
During these four days, the total PV energy yield would be 65.6 kWh when using the measured horizontal diffuse irradiance values. With the diffuse irradiance modelled by Reindl et al. [3], the total PV energy yield is only 60.2 kWh-an underestimation of 8.3%. The comparison was also conducted for the whole year 2003 in Lindenberg, where the annual deviation of the modelled diffuse irradiance is −7.2%, leading to a deviation of the annual PV energy output of −2.7%.
The second half of Table 1 lists the results of the same comparison that was conducted for the location of Gobabeb in Namibia, for the year 2014. Here, the deviation of the annual diffuse irradiation is as high as 42% which leads to an overestimation of the global irradiation of the module surface of 8.3% and to an overestimation of the annual PV energy yield of 7.6%.
These examples highlight the importance of a more accurate estimation of the horizontal diffuse irradiance. An inaccurately calculated diffuse irradiance can lead to significant over-or underestimations in the annual energy yield of a PV system. This is especially relevant in the price sensitive market of PVs, where only few percent more or less of PV energy output can render a project possible or uneconomical [6]. Table 1. Measured and modelled diffuse irradiation for Lindenberg, Germany (LIN), and Gobabeb, Namibia (GOB). The top section refers to the figures above, time ranges from 10-13 May 2003. The modelled underestimation of the diffuse irradiation by −17.9% leads to an underestimation of the global irradiation on the tilted PV module by −9.3%, hence leading to an underestimation of the simulated PV energy yield by 8.3%. When considering the whole year (2nd section), the modelled diffuse irradiation differs from the measured value by −7.2%, leading to an underestimation of the irradiation on module surface by −3.1%. The difference in the annual PV energy yield is −2.7%. In the 3rd and 4th section, the results of the same analysis are presented for Gobabeb, Namibia. Here, the PV module faces North and is tilted at 23 • . The modelled sum of diffuse irradiation for selected days (23)(24)(25)(26) July 2014) is 20.4% higher than the measured sum, leading to a deviation in the PV energy of 4.4%. Over the whole year of 2014, the deviation of the diffuse irradiation is even 42%, which causes a difference in the annual PV yield of 7.6%.

Measurement Data and Methodology
In the following section the measurement data and the methodology used in this contribution are presented.

Data basis (Baseline Surface Radiation Network)
As a source of high quality measurement data the data base of the BSRN is used [2]. The BSRN comprises 59 stations worldwide, 44 of which provide one-minute measurements of global horizontal and diffuse horizontal irradiance. The time range of the measurements starts in 1992 for the first stations and is still running until now. For this study the following criteria were applied for selecting the datasets:

•
High annual completeness of one-minute measurements of global and diffuse irradiance; • Between 60 • North and −60 • South; • No leap years. Table 2 gives an overview of the locations and years that were used for validation. In total, 28 locations with one year of measurement each were selected. The datasets feature a high geographic and climatological diversity. The last column of the table lists the annual completeness of the measurements (ACM) in %. The validation datasets comprise more than seven million data points (nights omitted) on which the following analysis is based. Table 2. Overview over the 28 datasets that were used in this work. The locations are spread over the globe between −45 • South and 52 • North. Height above sea level, surface, topography and climate zones (according to Köppen [7]) show a high level of variation. The years of measurement were chosen to provide a high annual completeness of measurement (ACM), i.e., as few missing data points as possible. The total resulting amount of data points that is used in the following analysis exceeds 14.7 million (or approximately 7 million when omitting night time).

Description of Quantities and Models
For the calculation of the position of the sun, the solar position algorithm provided by the National Renewable Energy Laboratory (NREL; Golden, CO, USA) is used [8]. The clear sky irradiance E clear is calculated on the basis of an adaption of the approach of Bourges [9]: (1) where γ S is the elevation of the Sun and E ext is the extra-terrestrial irradiance. The extra-terrestrial irradiance was calculated using Maxwell's approach [10]. In 2014 this formula was identified as the most accurate for another subset of BSRN data by Hofmann et al. [11]. The clearness index kt is the fraction of the measured global irradiance to the clear sky irradiance: In the models for calculating the diffuse fraction of the global irradiance that are presented in Section 2.3 a simpler approach of calculating the clear sky index is used: This causes the kt value at clear sky to be around 0.8 instead of 1 in the existing models. For the comparison of the results in Section 4, the calculation of the clearness index occurs according to the respective model description.
All PV system simulations are conducted using the simulation core of PV*SOL, a commercial PV system planning and simulation software by Valentin Software. More information about the models that are relied on in the simulation core can be found at PV*SOL [5].

Presentation of Existing Models
For the estimation of the diffuse fraction of the global horizontal irradiance, several algorithms were developed in the past. Most of them can be categorized as models with one or two parameters as input. The one-parameter models feature a simple dependency of the diffuse fraction (d f ) on the clearness index (kt), cf. Figure 2.

Description of Quantities and Models
For the calculation of the position of the sun, the solar position algorithm provided by the National Renewable Energy Laboratory (NREL; Golden, CO, USA) is used [8]. The clear sky irradiance clear is calculated on the basis of an adaption of the approach of Bourges [9]: where γ S is the elevation of the Sun and ext is the extra-terrestrial irradiance. The extra-terrestrial irradiance was calculated using Maxwell's approach [10]. In 2014 this formula was identified as the most accurate for another subset of BSRN data by Hofmann et al. [11]. The clearness index is the fraction of the measured global irradiance to the clear sky irradiance: In the models for calculating the diffuse fraction of the global irradiance that are presented in Section 2.3 a simpler approach of calculating the clear sky index is used: This causes the value at clear sky to be around 0.8 instead of 1 in the existing models. For the comparison of the results in Section 4, the calculation of the clearness index occurs according to the respective model description.
All PV system simulations are conducted using the simulation core of PV*SOL, a commercial PV system planning and simulation software by Valentin Software. More information about the models that are relied on in the simulation core can be found at PV*SOL [5].

Presentation of Existing Models
For the estimation of the diffuse fraction of the global horizontal irradiance, several algorithms were developed in the past. Most of them can be categorized as models with one or two parameters as input. The one-parameter models feature a simple dependency of the diffuse fraction ( ) on the clearness index ( ), cf. Figure 2.  These models typically define three functions for different ranges of kt. These kt ranges can be more or less referred to as different cloud situations. A kt value of less than 0.4 means only 40% of the possible global irradiance is measured, which is a good indicator for overcast skies. The maximum kt value that is detected at clear sky conditions is around 0.78-0.8. In between those areas, i.e., for kt values between 0.4 and 0.78, broken cloud situations are most likely [1,12]. Values of kt > 1 are possible due to broken cloud enhancement, firstly stated for the ultraviolet by Nack and Green [13] and later confirmed by Seckmeyer et al. [14]. Values of kt > 1 are possible at all wavelengths of the solar spectrum.
The first model, a one-parameter approach, was presented by Liu and Jordan in 1960 [15], but it soon became apparent that it was not able to produce good results in other locations than it was designed for (Blue Hill, MA, USA) [16,17].
In consequence, other models were developed that can also be categorized as one-parameter models: Orgill and Hollands [18], Erbs, Klein and Duffie [19], Reindl, Beckman and Duffie [3] and Boland and Ridley [20]. A schematic overview of those models is provided in Figure 2, along with sample measurement data of Lindenberg, Germany, 2003, for reference. Other one-parameter approaches include the model by Oliveira et al. [21] that provides varying clearness index polynomials for three periods per year (December-January, April-August and September-March).
Another category of algorithms is formed by the two-parameter models that-in addition to the clearness index kt-also make use of the sun height, γ S . Two-parameter models include the approaches by Reindl, Beckman and Duffie [3], Skartveit and Olseth [12] and Maxwell [10]. The reduced version of the two-parameter model of Reindl, Beckman and Duffie [3] is presented in Figure 3. These models typically define three functions for different ranges of . These ranges can be more or less referred to as different cloud situations. A value of less than 0.4 means only 40% of the possible global irradiance is measured, which is a good indicator for overcast skies. The maximum value that is detected at clear sky conditions is around 0.78-0.8. In between those areas, i.e., for values between 0.4 and 0.78, broken cloud situations are most likely [1,12]. Values of > 1 are possible due to broken cloud enhancement, firstly stated for the ultraviolet by Nack and Green [13] and later confirmed by Seckmeyer et al. [14]. Values of > 1 are possible at all wavelengths of the solar spectrum. The first model, a one-parameter approach, was presented by Liu and Jordan in 1960 [15], but it soon became apparent that it was not able to produce good results in other locations than it was designed for (Blue Hill, MA, USA) [16,17].
In consequence, other models were developed that can also be categorized as one-parameter models: Orgill and Hollands [18], Erbs, Klein and Duffie [19], Reindl, Beckman and Duffie [3] and Boland and Ridley [20]. A schematic overview of those models is provided in Figure 2, along with sample measurement data of Lindenberg, Germany, 2003, for reference. Other one-parameter approaches include the model by Oliveira et al. [21] that provides varying clearness index polynomials for three periods per year (December-January, April-August and September-March).
Another category of algorithms is formed by the two-parameter models that-in addition to the clearness index -also make use of the sun height, γ S . Two-parameter models include the approaches by Reindl, Beckman and Duffie [3], Skartveit and Olseth [12] and Maxwell [10]. The reduced version of the two-parameter model of Reindl, Beckman and Duffie [3] is presented in Figure 3. Not classifiable as one-or two-parameter model is the noteworthy approach by Furlan et al. [22] who developed a multi-parameter regression model for data from Sao Paolo, Brazil. Another important contribution was achieved by the model by Perez and Ineichen [23], which features a dynamic time-series approach to model the direct normal from the global irradiance based on the DISC model by Maxwell [10].
A good overview and an approach of global validation of the above mentioned models for calculating the diffuse fraction, also using BSRN data, is given by Zernikau [17]. In this thesis it was also shown that all analysed one-and two-parameter models showed relative mean absolute errors (rMAE) of (10.4 ± 0.4)% for the 24 BSRN locations that were included in the study. The author also analysed the minimal rMAE that can be achieved with a one-or two-parameter model by generating global medians of measurements of the diffuse fraction and the clearness index. According to this study, the minimal globally achievable rMAE for any two-parameter model is 8.9%. Not classifiable as one-or two-parameter model is the noteworthy approach by Furlan et al. [22] who developed a multi-parameter regression model for data from Sao Paolo, Brazil. Another important contribution was achieved by the model by Perez and Ineichen [23], which features a dynamic time-series approach to model the direct normal from the global irradiance based on the DISC model by Maxwell [10].
A good overview and an approach of global validation of the above mentioned models for calculating the diffuse fraction, also using BSRN data, is given by Zernikau [17]. In this thesis it was also shown that all analysed one-and two-parameter models showed relative mean absolute errors (rMAE) of (10.4 ± 0.4)% for the 24 BSRN locations that were included in the study. The author also analysed the minimal rMAE that can be achieved with a one-or two-parameter model by generating global medians of measurements of the diffuse fraction and the clearness index. According to this study, the minimal globally achievable rMAE for any two-parameter model is 8.9%. Another one-location comparison of the models was conducted for Athens, Greece, by Kambezidis [24]. Similarly, a study comparing ten models was presented by Jacovides et al. for validation data of Athalassa, Cyprus [25]. A model-to-model comparison for Hong Kong without validation on measurement data is provided by Wong [26]. A comparison of eight models for the location of Vienna, Austria, was conducted by Dervishi [27], resulting in findings that are in general agreement to the above mentioned studies.

Presentation of a New Model for the Diffuse Fraction of Solar Irradiance
In order to reduce the uncertainty of PV system simulations, a new model for calculating the diffuse fraction of global horizontal irradiance is presented in this section. The model consists of three parts that are calculated independently and then combined depending on statistic features of the clearness index. Each part is presented and afterwards the combination of the three parts into a single resulting diffuse fraction d f is explained.

Part One. Diffuse Fraction as Function of Clearness Index
Like existing models with one parameter, this part of the new model makes use of the relation between the clearness index kt and the diffuse fraction. Instead of parameterized functions, a matrix of probabilities is utilized. For the generation of the matrix, the one-minute time series of global and diffuse horizontal irradiance are converted into value pairs of the clearness index kt and the diffuse fraction d f , following the equations in Section 2.2. Each value pair is then stored into a matrix with kt ranging from 0 to 1.5 and d f ranging from 0 to 1, both with a step size of 0.01. The frequency of occurrence of a specific d f value for a given kt value is then converted into a probability value, so that for every value of kt a function of cumulated probabilities can be calculated. Another one-location comparison of the models was conducted for Athens, Greece, by Kambezidis [24]. Similarly, a study comparing ten models was presented by Jacovides et al. for validation data of Athalassa, Cyprus [25]. A model-to-model comparison for Hong Kong without validation on measurement data is provided by Wong [26]. A comparison of eight models for the location of Vienna, Austria, was conducted by Dervishi [27], resulting in findings that are in general agreement to the above mentioned studies.

Presentation of a New Model for the Diffuse Fraction of Solar Irradiance
In order to reduce the uncertainty of PV system simulations, a new model for calculating the diffuse fraction of global horizontal irradiance is presented in this section. The model consists of three parts that are calculated independently and then combined depending on statistic features of the clearness index. Each part is presented and afterwards the combination of the three parts into a single resulting diffuse fraction is explained.

Part One. Diffuse Fraction as Function of Clearness Index
Like existing models with one parameter, this part of the new model makes use of the relation between the clearness index and the diffuse fraction. Instead of parameterized functions, a matrix of probabilities is utilized. For the generation of the matrix, the one-minute time series of global and diffuse horizontal irradiance are converted into value pairs of the clearness index and the diffuse fraction , following the equations in Section 2.2. Each value pair is then stored into a matrix with ranging from 0 to 1.5 and ranging from 0 to 1, both with a step size of 0.01. The frequency of occurrence of a specific value for a given value is then converted into a probability value, so that for every value of a function of cumulated probabilities can be calculated.  For each value of , this matrix describes the probability with which a certain value of diffuse fraction will occur. The matrix correlates with the existing simple one-parameter models mentioned in Section 2.3, but it is based on measurements. Therefore the natural variability is better described by the model especially for high values of ( > 1.1, irradiance enhancement due to reflections by broken clouds) while preserving the strong relation at low levels of ( < 0.4, overcast sky).

Figure 4.
A probability matrix of the diffuse fraction as a function of the clearness index kt. For each value of kt, this matrix describes the probability with which a certain value of diffuse fraction will occur. The matrix correlates with the existing simple one-parameter models mentioned in Section 2.3, but it is based on measurements. Therefore the natural variability is better described by the model especially for high values of kt (kt > 1.1, irradiance enhancement due to reflections by broken clouds) while preserving the strong relation at low levels of kt (kt < 0.4, overcast sky). In order to determine a value for d f for a given kt, the procedure is as follows. Since this is the first part of the new model, the diffuse fraction of this part is referred to as d f 1 : (1) Select column of probability matrix that corresponds to the kt value; (2) Generate a Markov number r M (randomized number between 0 and 1) [28]; (3) Select the row where r M is smaller than the cumulated probabilities for the first time; (4) The selected row corresponds to d f 1 value.
The usage of real measurement values, incorporated into a matrix of probabilities, holds the advantage of preserving the natural relationship of the diffuse fraction and the clearness index and additionally resulting in a more realistic variability of the modelled d f value series.

Part Two. Change of df as Function of Change of kt
By analyzing the extensive BSRN measurement database, a strong correlation has been found between the relative changes of the clearness index (from one minute to the next) to changes of the diffuse fraction. In Figure 5 this correlation is shown for Lindenberg, Germany, for the year 2003.
Energies 2017, 10,248 8 of 20 In order to determine a value for for a given , the procedure is as follows. Since this is the first part of the new model, the diffuse fraction of this part is referred to as : (1) Select column of probability matrix that corresponds to the value; (2) Generate a Markov number M (randomized number between 0 and 1) [28]; (3) Select the row where M is smaller than the cumulated probabilities for the first time; (4) The selected row corresponds to value.
The usage of real measurement values, incorporated into a matrix of probabilities, holds the advantage of preserving the natural relationship of the diffuse fraction and the clearness index and additionally resulting in a more realistic variability of the modelled value series.

Part Two. Change of df as Function of Change of kt
By analyzing the extensive BSRN measurement database, a strong correlation has been found between the relative changes of the clearness index (from one minute to the next) to changes of the diffuse fraction. In Figure 5 this correlation is shown for Lindenberg, Germany, for the year 2003. . This strong relation is very valuable for modelling a realistic behaviour of the diffuse fraction over the day, since it depends highly on the behaviour of . The area where the change of df is 0 while kt shows relatives changes between −0.5 and 0.5, i.e., is changing while is not, indicates days with movement of broken clouds, the reflection on which couses the measured global irradiance to change rapidly without changing its diffuse fraction.
It was observed that for positive relative changes of (when the current is higher than the one minute before), the diffuse fraction will most likely show a negative relative change. If the relative change of is negative, the change of the diffuse fraction will be positive. There are situations, however, where is changing from one minute to the next without an observable change of (compare the horizontal value accumulation at = 0). These situations are typical for days with rapid irradiance enhancements due to moving broken clouds. In correspondence to part 1, the relationship between the relative change of and the relative change of kt ( ) is also expressed in a matrix of probabilities, displayed in Figure 6. This matrix is only used in the range of −0.5 to 1, since the amount of measurement values outside of this range is too small, which results in unwanted noise. The procedure to retrieve a value for the diffuse fraction in this part, , is as follows: (1) Calculate the relative change of as: (2) For greater than −0.5 and smaller than 1 Figure 5. Scatter plot of the relative changes of the diffuse fraction over the relative changes of the clearness index for Lindenberg, Germany, 2003. This strong relation is very valuable for modelling a realistic behaviour of the diffuse fraction over the day, since it depends highly on the behaviour of kt. The area where the change of df is 0 while kt shows relatives changes between −0.5 and 0.5, i.e., d f is changing while kt is not, indicates days with movement of broken clouds, the reflection on which couses the measured global irradiance to change rapidly without changing its diffuse fraction.
It was observed that for positive relative changes of kt (when the current kt is higher than the kt one minute before), the diffuse fraction will most likely show a negative relative change. If the relative change of kt is negative, the change of the diffuse fraction will be positive.
There are situations, however, where kt is changing from one minute to the next without an observable change of d f (compare the horizontal value accumulation at d d f = 0). These situations are typical for days with rapid irradiance enhancements due to moving broken clouds.
In correspondence to part 1, the relationship between the relative change of d f and the relative change of kt (d kt ) is also expressed in a matrix of probabilities, displayed in Figure 6. This matrix is only used in the d kt range of −0.5 to 1, since the amount of measurement values outside of this range is too small, which results in unwanted noise. The procedure to retrieve a value for the diffuse fraction in this part, d f 2 , is as follows: (1) Calculate the relative change of kt as: Energies 2017, 10, 248 9 of 21 (2) For d kt greater than −0.5 and smaller than 1 a. Select the column of the probability matrix that corresponds to d kt ; b.
Generate a Markov number r M (randomized number between 0 and 1) [28]; c.
Select the row where r M is smaller than the cumulated probabilities for the first time; d.
The selected row corresponds to change of d f , that is: (3) For d kt smaller than −0.5, d d f is not taken from the matrix, but extrapolated as: (4) For d kt greater than 1, d d f is extrapolated as: (5) The diffuse fraction for part 2, d f 2 , can now be calculated as: Energies 2017, 10, 248 9 of 20 a. Select the column of the probability matrix that corresponds to ; b. Generate a Markov number M (randomized number between 0 and 1) [28]; c. Select the row where M is smaller than the cumulated probabilities for the first time; d. The selected row corresponds to change of , that is: (3) For smaller than −0.5, is not taken from the matrix, but extrapolated as: (4) For greater than 1, is extrapolated as: (5) The diffuse fraction for part 2, , can now be calculated as: = before (8) Figure 6. The same relation between changes of and changes of as in Figure 5, here as the probability matrix that is used in the model, corresponding to Figure 4. In the model, only relative changes of −0.5 to 1 are computed with this matrix. In the matrix shown here, measurement values from the same locations and years as in Figure 4 were incorporated.

Calculation of the Daily Course of
In the case of clear sky, is mainly dependent on the air mass relative to its daily minimum. For that reason, in this part of the model a geometric approach has been chosen capable of reproducing the characteristic daily course of the diffuse fraction for clear sky days. The diffuse fraction of this part is calculated as: The air mass can be modelled as a function of the elevation of the sun. The minimal air mass min is calculated for each day by using the maximum elevation angle γ S, max : In the case of clear sky, d f is mainly dependent on the air mass relative to its daily minimum. For that reason, in this part of the model a geometric approach has been chosen capable of reproducing the characteristic daily course of the diffuse fraction for clear sky days. The diffuse fraction of this part is calculated as: The air mass can be modelled as a function of the elevation of the sun. The minimal air mass AM min is calculated for each day by using the maximum elevation angle γ S, max : Figure 7 displays the measured (blue) and modelled (green dashed) course of the diffuse fraction over an exemplary day in Tateno, Japan (13 February 2006). While the clearness index kt (black) is relatively stable around 1, the diffuse fraction is around 0.5 shortly after sunrise and before sunset and is falling down to a minimal diffuse fraction d f min at noon, to 0.136 in this example.  Figure 7 displays the measured (blue) and modelled (green dashed) course of the diffuse fraction over an exemplary day in Tateno, Japan (13 February 2006). While the clearness index (black) is relatively stable around 1, the diffuse fraction is around 0.5 shortly after sunrise and before sunset and is falling down to a minimal diffuse fraction min at noon, to 0.136 in this example. However, the main challenge in modelling the diffuse fraction over the course of a clear sky day is to find a good approximation for the minimal diffuse fraction of the day, since this factor is subject to strong variations in every possible respect: from location to location, from season to season and even from day to day.

Daily Variation of min
In order to illustrate the daily variation of min , seven consecutive days in Tamanrasset  However, the main challenge in modelling the diffuse fraction over the course of a clear sky day is to find a good approximation for the minimal diffuse fraction of the day, since this factor is subject to strong variations in every possible respect: from location to location, from season to season and even from day to day.

Daily Variation of d f min
In order to illustrate the daily variation of d f min , seven consecutive days in Tamanrasset

Seasonal Variation of min
In addition to daily variations, min also shows seasonal variation on some locations.

Seasonal Variation of d f min
In addition to daily variations, d f min also shows seasonal variation on some locations.

Seasonal Variation of min
In addition to daily variations, min also shows seasonal variation on some locations. values (grey crosses), no significant correlation can be observed which implies that other factors must have influence on the minimal daily diffuse fraction. The monthly means of aerosol optical depth (red) and the water vapor (blue dashed) taken from the NASA Terra/MODIS satellite [29,30] however feature a seasonal behavior similar to min .   Tamanrasset, 2006. While d f min is mostly close to 0.1 in wintertime, it varies strongly from spring to autumn, with no clear relation to the mean clearness index of the corresponding day (grey). It was found that changing levels of aerosols (red) and water vapour (dotted blue) may cause this effect.

Summary of Factors Influencing d f min
This leads to the conclusion that d f min is dependent on a series of factors. A list of factors that proved influential on d f min is given below: (1) The clearness index kt. The values of kt are averaged in a range of 120 min around noon: (2) The variability of the clearness index, kt Var . For the same period of time, the changes of kt are registered: (3) The maximum elevation of the sun during the day, γ S,max and the minimum air mass during the day, AM min , compare to top of this section. (4) The aerosol optical depth (AOD) and the water vapour (wv) of the respective month. These values are taken from the NASA Terra/MODIS satellite [29,30] and averaged on a month per month basis between 2001 and 2015. Figure 10 gives an impression of the worldwide seasonal characteristics of AOD and wv. These values are taken from the NASA Terra/MODIS satellite [29,30] and averaged on a month per month basis between 2001 and 2015. Figure 10 gives an impression of the worldwide seasonal characteristics of AOD and wv. (5) The up and down time of in the morning and in the evening. As a measure of the steepness of the curve, the time span is determined between sunrise and when first reaches the threshold of 1 in the morning. A second time span between the moment when is at last above 1 in the evening and sunset is measured as well. The two values are averaged and are a good indicator for min in places with high day-to-day variation of min : The longer the up/down time, the higher min will be.

Resulting Equations for min
The factors that influence min mentioned in the above section are combined in a series of posynomials, depending on the location and availability of data. The coefficients and exponents of

Resulting Equations for d f min
The factors that influence d f min mentioned in the above section are combined in a series of posynomials, depending on the location and availability of data. The coefficients and exponents of the following posynomials were fitted with the help of the Levenberg-Marquardt algorithm implementation of the software gnuplot 5.0 [31].
The datasets used for the posynomial fits are taken from the same locations as in Table 2 Case 1: AOD and water vapor data available, location features strong seasonal changes of AOD (indicator for seasonal aerosol concentrations e.g., due to sandstorms in desert regions, see Table 3 the values for factors a, b and c). Case 2: AOD and water vapor data available, no strong seasonal changes of AOD (see Table 4 the values for factors a, b and c).  Table 5 the values for factors a, b and c):    Table 6 the values for factors a, b and c).

Combination of the Three Parts
The three parts of the algorithm generate the values d f 1 , d f 2 and d f 3 . Depending on the current weather situation, expressed by characteristics and statistical features of kt, they are combined to one single, resulting d f : The mean absolute deviation of kt at a given time of day t x that is used as condition above is calculated as follows: with t range = 30 min. For illustration of the weighing conditions mentioned in Table 7, Figure 11 displays all-sky camera images from the Institute of Meteorology and Climatology of the Leibniz University Hannover [32,33]. Picture A shows a moment where no clouds are visible. It is classified as "Clear Sky" since kt = 1.03 and mad kt = 0.0025. Picture B shows a moment where kt = 0.147 and mad kt = 0.107, hence being classified as "Standard". In picture C some light clouds are visible around the sun. This moment is classified as "Transition" as kt = 1.01 and mad kt = 0.028. The "Transition" condition can be interpreted as clear sky with only few light clouds. The generated matrices and other model data can be obtained from the authors upon request.

Combination of the Three Parts
The three parts of the algorithm generate the values , and . Depending on the current weather situation, expressed by characteristics and statistical features of , they are combined to one single, resulting : The mean absolute deviation of at a given time of day x that is used as condition above is calculated as follows: with range = 30 min. For illustration of the weighing conditions mentioned in Table 7, Figure  The generated matrices and other model data can be obtained from the authors upon request.

Results
In this section, the results of the validation of the new algorithm are presented. As mentioned in Section 2.1, the validation is conducted for 28 locations with one year of one-minute values each, basing the validation on more than seven million data points worldwide.
The overall results are then compared to the results of three existing models for the diffuse fraction: the model of Orgill and Hollands [18] (OH), a one-parameter model, the reduced version of the two-parameter model of Reindl et al. [3] (RR), and the model by Perez and Ineichen [23] (PZ), all introduced in Section 2.3. The first two models were also identified as two of the three best performing models among the eight investigated approaches by Dervishi [27]. The model by Perez and Ineichen [23] is still popular in the community and widely made use of. The model by Skartveit [12] was not used for the model comparison since no indications were found that show a significant advantage of this model over Orgill and Hollands [18], Reindl et al. [3] or Perez and Ineichen [23]. Figure 12 displays two weeks in Alice Springs, Australia, 2005. The measured global horizontal irradiance is plotted on top (green); the resulting clearness index kt is plotted for reference underneath (black). In the three following plots, the measured diffuse fraction (black) is displayed, together with the diffuse fraction that was modelled with the new approach (blue), with the model from Orgill and Hollands [18] (grey) , the model from Reindl et al. [3] (orange) and the model from Perez and Ineichen [23]. While the new model is able to reproduce the diffuse fraction in good accordance to the measurement values most of time, the inherent problem of models with static one-or two-parameter relationships between the clearness index and the diffuse fraction becomes apparent. Especially on clear sky days the existing models fail to reproduce the characteristic behavior of the diffuse fraction.
In order to evaluate the performance of the new model in statistical terms, the root mean squared errors (RMSEs) are calculated for the new model as well as for the three reference models. Figure 13 shows the RMSE for the four models over all test data sets. The RMSE produced by the new model is smaller than those produced by the models of Orgill and Hollands [18], Reindl et al. [3] and Perez and Ineichen [23], in parts significantly, except for one case in Izaña, Spain, 2011 (iza 2011). The overall RMSE, averaged over all test data sets, can be reduced from 0.138 (OH), 0.134 (RR) and 0.139 (PZ) to 0.116 for the new model, which equals an amelioration of 16%-20%.
A further validation is conducted by comparing the annual diffuse irradiation values that are estimated by the models with the measured value. Figure 14 lists the relative deviations of the modelled from the measured annual diffuse irradiation. In most of the cases, the deviation resulting from the new model is significantly smaller than the deviation resulting from the models of OH, RR or PZ. There are few cases where the model leads to higher deviations than the existing ones, e.g., for Billings, USA (bil 2003), Solar Village, Saudi Arabia (sov 2001) or Tamanrasset, Algeria (tam 2006). Extreme deviations of more than 20%, however, as apparent in some of the test cases for the two existing models, do not occur when using the new model. The average of the absolute (i.e. unsigned) relative deviations for all test cases can be reduced by nearly 50% from 11.9% for OH, 12.7% for RR and 10.9% for PZ to only 6.4% for the new model.
The histogram of the mean absolute deviations of the annual diffuse irradiation displayed in Figure 15 illustrates the frequency of the deviations each model produces. While the model of Reindl et al. [3] (RR) has its peak in the class of 0 to 5%, it still has several outliers of 35%-55%. The model of Orgill and Hollands [18] (OH) features only one extreme outlier at 35%-40%, but has most of its results lying in the class of 10 to 15%. The model by Perez and Ineichen [23] shows no outliers of more than 25% but has its results evenly distributed between 0 and 15%. The new model does not produce any outliers and has its peak in the class of 0%-5%, covering 50% of the test cases alone. Figure 12 displays two weeks in Alice Springs, Australia, 2005. The measured global horizontal irradiance is plotted on top (green); the resulting clearness index is plotted for reference underneath (black). In the three following plots, the measured diffuse fraction (black) is displayed, together with the diffuse fraction that was modelled with the new approach (blue), with the model from Orgill and Hollands [18] (grey) , the model from Reindl et al. [3] (orange) and the model from Perez and Ineichen [23].  [18], orange for Reindl et al. [3], yellow for Perez and Ineichen [23]). Most of the time, the output of the new model leads to good conformity for clear sky days as well as for days with broken clouds. The inherent problem of static one-or two-parameter models becomes apparent when comparing the measurement values to the output of the models by Orgill and Hollands [18], Reindl et al. [3] and Perez and Ineichen [23], especially for clear sky days. squared errors (RMSEs) are calculated for the new model as well as for the three reference models. Figure 13 shows the RMSE for the four models over all test data sets. The RMSE produced by the new model is smaller than those produced by the models of Orgill and Hollands [18], Reindl et al. [3] and Perez and Ineichen [23], in parts significantly, except for one case in Izaña, Spain, 2011 (iza 2011). The overall RMSE, averaged over all test data sets, can be reduced from 0.138 (OH), 0.134 (RR) and 0.139 (PZ) to 0.116 for the new model, which equals an amelioration of 16%-20%. A further validation is conducted by comparing the annual diffuse irradiation values that are estimated by the models with the measured value. Figure 14 lists the relative deviations of the modelled from the measured annual diffuse irradiation. In most of the cases, the deviation resulting from the new model is significantly smaller than the deviation resulting from the models of OH, RR or PZ. There are few cases where the model leads to higher deviations than the existing ones, e.g., for Billings, USA (bil 2003), Solar Village, Saudi Arabia (sov 2001) or Tamanrasset, Algeria (tam 2006). Extreme deviations of more than 20%, however, as apparent in some of the test cases for the two  The histogram of the mean absolute deviations of the annual diffuse irradiation displayed in Figure 15 illustrates the frequency of the deviations each model produces. While the model of Reindl et al. [3] (RR) has its peak in the class of 0 to 5%, it still has several outliers of 35%-55%. The model of Orgill and Hollands [18] (OH) features only one extreme outlier at 35%-40%, but has most of its results lying in the class of 10 to 15%. The model by Perez and Ineichen [23] shows no outliers of more than 25% but has its results evenly distributed between 0 and 15%. The new model does not produce any outliers and has its peak in the class of 0%-5%, covering 50% of the test cases alone. Relative deviation of the modelled annual diffuse irradiation from the measured diffuse irradiation for all analysed datasets. The new model performs better than the three reference models by Orgill and Hollands [18] (OH, grey), Reindl et al. [3] (RR, orange) and Perez and Ineichen [23] in nearly all cases, except for desert-like locations such as Solar Village, Saudi Arabia (sov) or Tamanrasset, Algeria (tam). None of the test cases shows deviations of more than ±20% for the new model. The mean absolute deviation over all test cases for the new model is 6.4%, whereas it reaches 11.9% for OH, 12.7% for RR and 10.9% for PZ. The mean absolute deviation can thus approximately be halved when using the new model. Figure 15 illustrates the frequency of the deviations each model produces. While the model of Reindl et al. [3] (RR) has its peak in the class of 0 to 5%, it still has several outliers of 35%-55%. The model of Orgill and Hollands [18] (OH) features only one extreme outlier at 35%-40%, but has most of its results lying in the class of 10 to 15%. The model by Perez and Ineichen [23] shows no outliers of more than 25% but has its results evenly distributed between 0 and 15%. The new model does not produce any outliers and has its peak in the class of 0%-5%, covering 50% of the test cases alone.  Figure 16). In none of the test cases deviations of more than 20% can be observed. While the model by Reindl et al. [3] (RR, orange) has most of its test cases in classes <15%, it still produces in some cases results of more than 40%. The model of Orgill and Hollands [18] (OH, grey) features less extreme deviations but shows a strong frequency of deviations between 10% and 20%. The model by Perez and Ineichen [23] (PZ, yellow) does not produce outliers and has its results distributed evenly between 0 and 15%.  Figure 16). In none of the test cases deviations of more than 20% can be observed. While the model by Reindl et al. [3] (RR, orange) has most of its test cases in classes <15%, it still produces in some cases results of more than 40%. The model of Orgill and Hollands [18] (OH, grey) features less extreme deviations but shows a strong frequency of deviations between 10% and 20%. The model by Perez and Ineichen [23] (PZ, yellow) does not produce outliers and has its results distributed evenly between 0 and 15%.
This histogram can be converted into confidence probabilities, depicted in Figure 16. The new model results in mean absolute deviations of less than 5% in 50% of the cases. In more than 80% of the cases, the deviation is less than 10%. Compared to the other three models this is a significant improvement, where the probability of producing less than 5% deviation is only 25% (OH), 36% (RR) and 25% (PZ) and the probability of producing less than 10% deviation is 36% (OH), 54% (RR) and 50% (PZ). Figure 16. When using the new model for calculating the diffuse irradiance, the annual deviation of the modelled irradiation will be smaller than 5% in more than 40% of the cases, and smaller than 10% in over 80% of the cases. When using the model of Orgill and Hollands [18] (OH), these confidence probabilities reduce to 25% and 36%, while using the model of Reindl et al. [3] (RR) reduces the probabilities to 36% and 54% respectively. The use of the model of Perez and Ineichen [23] (PZ) results in probabilities of 25% and 50%.

Conclusions
The newly developed model for the diffuse fraction of solar irradiance on PV systems provides significantly better agreement with measurements than the other models published so far. This is achieved by the following features: the first part utilizes the dependency of the diffuse fraction on the clearness index , in analogy to existing one-parameter models. In the new model, the correlation is expressed as probability matrices rather than single functions, leading to realistic, more natural diffuse fraction characteristics. Also taking advantage of probability matrices, the second part uses the relation of the relative changes of over the relative changes of . The third Figure 16. When using the new model for calculating the diffuse irradiance, the annual deviation of the modelled irradiation will be smaller than 5% in more than 40% of the cases, and smaller than 10% in over 80% of the cases. When using the model of Orgill and Hollands [18] (OH), these confidence probabilities reduce to 25% and 36%, while using the model of Reindl et al. [3] (RR) reduces the probabilities to 36% and 54% respectively. The use of the model of Perez and Ineichen [23] (PZ) results in probabilities of 25% and 50%.
This histogram can be converted into confidence probabilities, depicted in Figure 16. The new model results in mean absolute deviations of less than 5% in 50% of the cases. In more than 80% of the cases, the deviation is less than 10%. Compared to the other three models this is a significant improvement, where the probability of producing less than 5% deviation is only 25% (OH), 36% (RR) and 25% (PZ) and the probability of producing less than 10% deviation is 36% (OH), 54% (RR) and 50% (PZ).

Conclusions
The newly developed model for the diffuse fraction of solar irradiance on PV systems provides significantly better agreement with measurements than the other models published so far. This is achieved by the following features: the first part utilizes the dependency of the diffuse fraction d f on the clearness index kt, in analogy to existing one-parameter models. In the new model, the correlation is expressed as probability matrices rather than single functions, leading to realistic, more natural diffuse fraction characteristics. Also taking advantage of probability matrices, the second part uses the relation of the relative changes of d f over the relative changes of kt. The third part takes into account the diffuse fraction characteristics of days with clear sky only using a geometrical approach. The crucial factor for the third part is the minimum daily diffuse fraction for which a posynomial model has been introduced. The presented new model was analyzed and compared to two other models for 28 locations worldwide with one year of one-minute measurement data each. It was shown that the new model has a high quality of modeling the diffuse irradiance. The mean RMSE over all test cases was reduced by 16%-20%, whereas the mean absolute deviation of the annual diffuse irradiation was found to be nearly 50% smaller compared to the reference models. In more than 80% of the test cases, the deviation of the annual diffuse irradiation is smaller than 10%, with an overall maximum deviation of 20%.
With the new model, the diffuse irradiance can be calculated with much lower uncertainty, hence significantly reducing the uncertainty of PV energy yield simulations. Possible future work for the improvement of the model will include further investigations on the minimal daily diffuse fraction that has a very decisive influence on the model quality for clear sky days. Such days may be identified by cloud cameras that will allow for a much better estimation of the cloud fraction compared to satellite images [33].