Mathematical Determination of the Upper and Lower Limits of the Diffuse Fraction at Any Site

: A mathematical method for accurately determining the upper and lower diffuse-fraction (k d ) limits that divide the sky into clear, intermediate, and overcast is developed. Fourteen sites around the world are selected for demonstrating the methodology. The upper and lower k d values for these sites are determined from scatter plots of direct-normal solar radiation vs. k d pairs over the typical meteorological year of each site. They vary between 0.73 and 0.80 for the upper and between 0.24 and 0.27 for the lower k d limits. Plots of sunshine duration (SSD) vs. k d are prepared for 12 of the 14 sites. These plots show a decreasing trend in SSD with increasing values of k d , as anticipated. According to local climatology, the number of the SSD values in each sky-condition classiﬁcation varies from site-to-site.


Introduction
The estimation of solar radiation on the surface of the Earth on either horizontal or tilted planes has long been recognised an important parameter in a variety of fields such as atmospheric environment, e.g., [1], terrestrial ecosystems, e.g., [2], terrestrial climate, e.g., [3], and solar energy applications, e.g., [4]. Because of the scarcity of solar radiation platforms that may provide measurements of the total (or global) solar radiation to the above applications, solar models started appearing in the international literature as early as in 1950 s. The total solar radiation is the sum of two components: the diffuse solar radiation and the direct (or beam) one. The latter can accurately be estimated via explicit atmospheric transmission relationships. It is, therefore, the diffuse component that must be simulated satisfactorily; however, this solar component is very difficult to accurately be estimated due to numerous scattering processes of the solar rays that take place in the atmosphere under a clear sky and moreover in the presence of clouds on a cloudy day. This fact has triggered a lot of research for the development of various models that pursue to estimate as accurately as possible the diffuse solar component. Such models are numerous in the international literature, e.g., [5][6][7]. Most of them use the diffuse fraction (k d ), which is the ratio of diffuse-to-global solar radiation as function of other parameter (or parameters) that can easily be calculated, be measured or are known; such parameters are the clearness index (k t ), which is the ratio of the global-to-the-extraterrestrial solar radiation, or the geographical latitude (ϕ), e.g., [8]. Even though k t or k d can be computed in one way or another, it is very important for solar energy engineers to know their upper and lower limits that divide the sky conditions into clear, intermediate and overcast. Several such limits have been provided by many researchers. All of them are based on solar measurements or solar modelling, e.g., [9,10]. This is, therefore, the reason that these limits are considered empirical as they are not based on a standard methodology.
The above gap is bridged in the present study. In other words, this work develops a mathematical methodology for determining the upper and lower limits of k d . The methodology proves to be universal, but the values of the upper and lower k d limits are site-dependent. Such a work is presented for the first time in the international literature.

Materials and Methods
This Section gives a full account of the new (mathematical) method that accurately determines the upper and lower limits of k d at a site.
To apply the methodology, a typical meteorological year (TMY) must preferably be used instead of a long period of measurements or model simulations. This peculiarity is based on the fact that a TMY refers to all representative situations of the included parameters that occur within a year, while a long-term series of measurements or model simulations contains extreme values of the included parameters; the second, therefore, option may result in a broader dispersion of the calculated k d values. On the other hand, TMYs are nowadays used more often in many climatological studies, e.g., [11][12][13].
The selected solar radiation parameter to vary with k d is the direct horizontal solar radiation (H b ) because of much less dispersion of the H b -k d paired values in comparison with the cases of the functions H g (global horizontal solar radiation) vs. k d or H d (diffuse horizontal solar radiation) vs. k d . Even in the case of the direct horizontal solar radiation, the scatter plot of H b vs. k d is found to be greater than that of the relationship H bn (directnormal solar radiation) vs. k d . Figure 1 shows this difference; here hourly mean H b , H bn and k d values have been plotted for the typical month of January of the TMY-BOU (Boulder, CO, USA) derived from the PV-GIS platform [14,15] (see below for description). Therefore, the H bn solar component is considered in the rest of the analysis. measurements or solar modelling, e.g., [9,10]. This is, therefore, the reason that these limits are considered empirical as they are not based on a standard methodology.
The above gap is bridged in the present study. In other words, this work develops a mathematical methodology for determining the upper and lower limits of kd. The methodology proves to be universal, but the values of the upper and lower kd limits are sitedependent. Such a work is presented for the first time in the international literature.

Materials and Methods
This Section gives a full account of the new (mathematical) method that accurately determines the upper and lower limits of kd at a site.
To apply the methodology, a typical meteorological year (TMY) must preferably be used instead of a long period of measurements or model simulations. This peculiarity is based on the fact that a TMY refers to all representative situations of the included parameters that occur within a year, while a long-term series of measurements or model simulations contains extreme values of the included parameters; the second, therefore, option may result in a broader dispersion of the calculated kd values. On the other hand, TMYs are nowadays used more often in many climatological studies, e.g., [11][12][13].
The selected solar radiation parameter to vary with kd is the direct horizontal solar radiation (Hb) because of much less dispersion of the Hb-kd paired values in comparison with the cases of the functions Hg (global horizontal solar radiation) vs. kd or Hd (diffuse horizontal solar radiation) vs. kd. Even in the case of the direct horizontal solar radiation, the scatter plot of Hb vs. kd is found to be greater than that of the relationship Hbn (directnormal solar radiation) vs. kd. Figure 1 shows this difference; here hourly mean Hb, Hbn and kd values have been plotted for the typical month of January of the TMY-BOU (Boulder, USA) derived from the PV-GIS platform [14,15] (see below for description). Therefore, the Hbn solar component is considered in the rest of the analysis. The mathematical determination of the upper and lower limits of kd in a diagram as that of Figure 1b is fully described here. The first step is to find the best-fit curve to the Hbn-kd data points. A second-order polynomial of the form Hbn = a kd 2 + b kd + c was used, where a, b, c are the polynomial coefficients. This type of best-fit curve was selected as the results obtained showed that it is sufficient. Figure 2 shows the best-fit black curve to the scatter plot of Figure 1b, which has the expression Hbn = 902.30 kd 2 − 2087.44 kd + 1188.58, R 2 = 0.996. As WMO recommends that the sunshine duration is measured when Hbn > 120 Wm −2 , the green line represents this threshold. This straight line crosses the best-fit curve The mathematical determination of the upper and lower limits of k d in a diagram as that of Figure 1b is fully described here. The first step is to find the best-fit curve to the H bn -k d data points. A second-order polynomial of the form H bn = a k d 2 + b k d + c was used, where a, b, c are the polynomial coefficients. This type of best-fit curve was selected as the results obtained showed that it is sufficient. Figure 2 shows the best-fit black curve to the scatter plot of Figure 1b, which has the expression H bn = 902.30 k d 2 − 2087.44 k d + 1188.58, R 2 = 0.996. As WMO recommends that the sunshine duration is measured when H bn > 120 Wm −2 , the green line represents this threshold. This straight line crosses the best-fit curve at a certain point (k du ,H bnWMO = 120). Since the y-axis value is known (120 Wm −2 ), solving the above second-order equation for k d , two roots are obtained and the lower-than-1 is considered because k d cannot go higher than 1 by definition; in this case, the accepted solution is k du = 0.76. The third step is to draw the two medians from the vertexes A and B of the triang ABC to their opposite sides, i.e., the lines AD and BE, respectively. The two lines a crossed at M; if a straight line parallel to the y-axis is drawn through M, i.e., the blue lin MG, this crosses the x-axis at a value, which is the lower limit of kd. From the Euclidia geometry, it is known that AG = AC/3. This means that if the upper kd value at the poi C is obtained (kdu), then very easily the lower kd value at the point G (kdl) can be calculate kdl = kdu/3. This way, the estimated kdu and kdl values divide the sky conditions in cle skies, intermediate skies, and overcast skies at a site as shown in Figure 2.  The bold black solid line is the best-fit curve to the H bn -k d data points that correspond to their hourly values; in this case the typical meteorological month of January, which belongs to the PV-GIS TMY-BOU, has been considered. The line BC is tangent to the best-fit line at C (0.76,120). M is the crossing point of the medians AD and BE. The MG line is normal to AC. From the Euclidian geometry AG = AC/3 in the rectangle triangle ABC; therefore, k dl = k du /3 = 0.76/3 ≈ 0.25. The next step is to find the equation of the straight line that is tangent to the bestfit curve at the point C (0.76,120); this point is the crossing point between the best-fit curve and the WMO-defined threshold for SSD measurements. This line expresses the first derivative of the best-fit curve at the point C; in other words, it represents the slope of the best-fit line at C. This is done by taking the first derivative of the H bn expression (H bn ) in respect to k d and replacing k d with 0.76. It is found that H bn = −715.94 Wm −2 . Then, H bn − 120 = −715.94 (k d − 0.76) from which the expression for the tangent line is H bn = −715.94 k d + 664.12. This straight line is the BC in Figure 2. A rectangle triangle ABC is then formed.
The third step is to draw the two medians from the vertexes A and B of the triangle ABC to their opposite sides, i.e., the lines AD and BE, respectively. The two lines are crossed at M; if a straight line parallel to the y-axis is drawn through M, i.e., the blue line MG, this crosses the x-axis at a value, which is the lower limit of k d . From the Euclidian geometry, it is known that AG = AC/3. This means that if the upper k d value at the point C is obtained (k du ), then very easily the lower k d value at the point G (k dl ) can be calculated: k dl = k du /3. This way, the estimated k du and k dl values divide the sky conditions in clear skies, intermediate skies, and overcast skies at a site as shown in Figure 2.
The above methodology was applied to 14 sites around the world. The selection has been based on the following criteria: (i) different environmental characteristics, (ii) spread across the continents, and (iii) availability of TMY. Table 1 deploys the selected sites in alphabetical order together with their geographical coordinates and environmental description. Sites that do not meet criterion (iii) may not be appropriate for applying this methodology. Figure 3 gives the distribution of the 14 sites over the world. Table 1. Selected sites for the application of the mathematical determination of the k du and k dl limits. ϕ and λ are the geographical latitudes and longitudes of the sites, respectively; ϕ is given in the northern (N) or the southern (S) hemisphere, and λ east (E) or west (W) of the Greenwich meridian. The values of both ϕ and λ have been rounded to the second decimal digit. In column 9, I denotes rural and II denotes urban environment. The period of measurements is given in the last column. The selection of the sites and their solar radiation data were based on the BSRN (Baseline Surface Radiation Network), except for Athens (ATH); in this case, data from the Actinometric Station of the National Observatory of Athens not belonging to BSRN were used. The abbreviations of the sites (except for Athens) are those provided by the BSRN typology. Description of the BSRN operation is found in [16]. For the sake of the present analysis, TMYs for the selected sites have been downloaded from the PV-GIS platform [17,18]. These typical meteorological years have been derived from a combination of satellite and re-analysis data in the period 2005-2014. Each TMY contains the following parameters: date, time (h UTC), dry-bulb temperature (in degC), relative humidity (in %), global horizontal solar irradiance (H g , in Wm −2 ), direct-normal solar irradiance (H bn , in Wm −2 ), diffuse horizontal solar irradiance (H d , in Wm −2 ), infrared horizontal solar irradiance (in Wm −2 ), wind speed (in ms −1 ), wind direction (in deg), barometric pressure (P z , in Pa) at the site's altitude (z, in m), and a control parameter. The meteorological and solar radiation values are hourly averages. Therefore, from the 14 downloaded TMYs new files were created to contain the necessary parameters: date, time (h UTC), H g , H bn , and H d . Conversion of time from UTC (universal time constant) into LST (local standard time) at each site was applied to all files. Then, an own-developed routine, named XRONOS.bas in Basic programming language (xronos means time in Greek with x being spelled as ch), was used to derive the solar altitude (γ, in deg) at 30 min past the hour LST. For example, the value of any parameter in the data base at 10.00 LST denotes its average value between 09.01 LST and 10.00 LST. The XRONOS.bas algorithm calculated γ at 09.30 LST and this value was assigned to all parameters at 10.00 LST. Finally, the hourly values of k d were computed from concurrent values of H g and H d , i.e., k d = H d /H g . In cases that H d and/or H g contained zero values, no value was assigned to k d . Correspondingly, no value was assigned to H bn , too. In plotting the (x, y) data pairs, i.e., (k d , H bn ), the hourly values of these two parameters were considered for all γ > 5 deg. This way, 14 plots of H bn -k d were derived to apply the methodology and derive the individual k dl and k du limits for the sites. The plots are deployed and discussed in Section 3.1.  For the sake of the present analysis, TMYs for the selected sites have been downloaded from the PV-GIS platform [17,18]. These typical meteorological years have been derived from a combination of satellite and re-analysis data in the period 2005-2014. Each TMY contains the following parameters: date, time (h UTC), dry-bulb temperature (in degC), relative humidity (in %), global horizontal solar irradiance (Hg, in Wm −2 ), directnormal solar irradiance (Hbn, in Wm −2 ), diffuse horizontal solar irradiance (Hd, in Wm −2 ), infra-red horizontal solar irradiance (in Wm

Determination of the k d Limits
This section presents the results from the application of the developed methodology to the 14 sites. Figures 4-10 show the graphical estimation of the k du and k dl values for all

Determination of the kd Limits
This section presents the results from the application of the developed methodology to the 14 sites. Figures 4-10

Determination of the kd Limits
This section presents the results from the application of the developed methodology to the 14 sites.       From the above Figures it is seen that different spreads in the scatter plots exist. This variance is least at the ATH, GAN, KIS, LIN, REG, and SOV sites and greatest at the BOU, LER, and POY ones. The broader variance in the H bn -k d data pairs is possibly due to the changing weather conditions at these sites (fast moving clouds, prevailing heavy/light cloudiness most of the time), which affect the H d values in the estimation of k d . On the other hand, missing solar radiation values occur in the case of the OHY site; this results in fewer data points than at the rest of the sites. Table 2 gives the expressions of the best-fit curves, their R 2 that determine the variance, and the estimation of the k du and k dl values. Since these values are similar for all sites examined, one could adopt universal values that are the average of the distinct ones; in this case a universal k du would be 0.78 and a universal k dl 0.26. Nevertheless, these averages are only indicative, because they refer to a small number of sites over the globe. From the above Figures it is seen that different spreads in the scatter plots exist. This variance is least at the ATH, GAN, KIS, LIN, REG, and SOV sites and greatest at the BOU, LER, and POY ones. The broader variance in the Hbn-kd data pairs is possibly due to the changing weather conditions at these sites (fast moving clouds, prevailing heavy/light cloudiness most of the time), which affect the Hd values in the estimation of kd. On the other hand, missing solar radiation values occur in the case of the OHY site; this results in fewer data points than at the rest of the sites. Table 2 gives the expressions of the best-fit curves, their R 2 that determine the variance, and the estimation of the kdu and kdl values. Since these values are similar for all sites examined, one could adopt universal values that are the average of the distinct ones; in this case a universal kdu would be 0.78 and a universal kdl 0.26. Nevertheless, these averages are only indicative, because they refer to a small number of sites over the globe.

Evaluation of the Methodology
To show the applicability of the method developed in Section 2, Figures 11-17 show the distribution of the (k d ,SSD) pairs during the years shown in parentheses in column 1, Table 2. The SSD values were derived from the BSRN data, meeting the H bn > 120 Wm −2 WMO criterion; moreover, the SSD values from the ATH station are real 1-min measurements with the aid of an EKO MS-90 direct-normal-irradiance sensor. All (k d ,SSD) data values are hourly ones in the corresponding year of the site (column 1, Table 2). Therefore, SSDs for k du < k d ≤ 1 correspond to overcast-, for k dl < k d ≤ k du to intermediate-, and for 0 ≤ k d ≤ k dl to clear-sky conditions.

Evaluation of the Methodology
To show the applicability of the method developed in Section 2, Figures 11-17 show the distribution of the (kd,SSD) pairs during the years shown in parentheses in column 1, Table 2. The SSD values were derived from the BSRN data, meeting the Hbn > 120 Wm −2 WMO criterion; moreover, the SSD values from the ATH station are real 1-min measurements with the aid of an EKO MS-90 direct-normal-irradiance sensor. All (kd,SSD) data values are hourly ones in the corresponding year of the site (column 1, Table 2). Therefore, SSDs for kdu ≤ kd ≤ 1 correspond to overcast-, for kdl ≤ kd < kdu to intermediate-, and for 0 ≤ kd < kdl to clear-sky conditions.

Evaluation of the Methodology
To show the applicability of the method developed in Section 2, Figures 11-17 show the distribution of the (kd,SSD) pairs during the years shown in parentheses in column 1, Table 2. The SSD values were derived from the BSRN data, meeting the Hbn > 120 Wm −2 WMO criterion; moreover, the SSD values from the ATH station are real 1-min measurements with the aid of an EKO MS-90 direct-normal-irradiance sensor. All (kd,SSD) data values are hourly ones in the corresponding year of the site (column 1, Table 2). Therefore, SSDs for kdu ≤ kd ≤ 1 correspond to overcast-, for kdl ≤ kd < kdu to intermediate-, and for 0 ≤ kd < kdl to clear-sky conditions.     The shape of the scattered data points in the above diagrams is the expected one, i.e., decreasing SSD values with increasing kd. Nevertheless, this self-evident pattern is not repeated at the GAN and ILO sites; this may be due to the solar radiation data quality, which, in the case of ILO, may be unacceptable. Therefore, these two sites are excluded from further analysis. Moreover, the site of OHY presents some low SSD values for clear skies (i.e., for 0 ≤ kd ≤ 0.26), but the number of such values is small, and, therefore, this site is included in the analysis that follows.
In each of the remaining 14 sites, the number of the (kd,SSD) hourly pairs is maximum 8760 or 8784 for a non-leap or a leap year, respectively; these numbers include night-time values, which were not taken into account. The analysis that follows concerns the number of daytime hourly pairs that represent the three sky condition statuses (overcast, intermediate, clear). Table 3 shows the results. The shape of the scattered data points in the above diagrams is the expected one, i.e., decreasing SSD values with increasing kd. Nevertheless, this self-evident pattern is not repeated at the GAN and ILO sites; this may be due to the solar radiation data quality, which, in the case of ILO, may be unacceptable. Therefore, these two sites are excluded from further analysis. Moreover, the site of OHY presents some low SSD values for clear skies (i.e., for 0 ≤ kd ≤ 0.26), but the number of such values is small, and, therefore, this site is included in the analysis that follows.
In each of the remaining 14 sites, the number of the (kd,SSD) hourly pairs is maximum 8760 or 8784 for a non-leap or a leap year, respectively; these numbers include night-time values, which were not taken into account. The analysis that follows concerns the number of daytime hourly pairs that represent the three sky condition statuses (overcast, intermediate, clear). Table 3 shows the results. The shape of the scattered data points in the above diagrams is the expected one, i.e., decreasing SSD values with increasing k d . Nevertheless, this self-evident pattern is not repeated at the GAN and ILO sites; this may be due to the solar radiation data quality, which, in the case of ILO, may be unacceptable. Therefore, these two sites are excluded from further analysis. Moreover, the site of OHY presents some low SSD values for clear skies (i.e., for 0 ≤ k d ≤ 0.26), but the number of such values is small, and, therefore, this site is included in the analysis that follows.
In each of the remaining 14 sites, the number of the (k d ,SSD) hourly pairs is maximum (8760 or 8784 for a non-leap or a leap year, respectively); these numbers include nighttime values, which were not taken into account. The analysis that follows concerns the number of daytime hourly pairs that represent the three sky condition statuses (overcast, intermediate, clear). Table 3 shows the results. The differentiation in the total daytime values (column 5, Table 3) is due to the geographical location of the sites, which dictates the daylight pattern, i.e., the daylength. On the other hand, the variation in the SSD values under the three sky conditions is much influenced by the prevailing climatology at each site.
showed that the k dl and k du values ranged between 0.24 and 0.26, and between 0.73 and 0.80, respectively, as average values over the selected sites.
The value of the presented methodology lies not only in the accurate determination of the k du and k dl values at any site, but also contributes to the solar radiation climatology of a site, and the derivation of its SSD values, if they are not available. The estimation of the SSD values is not, of course, that accurate because of the great dispersion of the SSD-k d pairs (Figures 11-17). Nevertheless, the methodology gives a qualitative indication; this is something than nothing.
It must be mentioned here that various attempts to estimate k d in the past were based on modelling than measurements as in the present study. The use of models may result in reduced accuracy because the intrinsic error in the estimation of k d increases in comparison with a measurements-based methodology. For instance, Dervishi and Mahdavi [9] used 8 models for the estimation of k d as function of k t ; all of them gave high MBD (mean bias difference, in %) and RMSD (root mean square difference, in %) values. Last but not least, the present study provided a methodology to compute the two k d limits from which one may categorise the sky at his/her site in the three main statuses: clear, intermediate, and overcast. This was done for the first-time worldwide.
The developed mathematical formulation, therefore, clearly determines the upper and lower diffuse-fraction limits, and seems to work quite efficiently over the globe. Therefore, it may have a universal character. However, the method should be tested for applicability at sites in Russia, central and south Asia, Japan, Australia, and Oceania in order to fully manifest its universality.
Author Contributions: Conceptualisation, methodology, data download, data curation, software application, writing-original draft preparation, H.D.K.; data curation, visualisation, writingreview and editing, S.I.K.; writing-review and editing, D.K. All authors have read and agreed to the published version of the manuscript.