Attenuation Factor Estimation of Direct Normal Irradiance Combining Sky Camera Images and Mathematical Models in an Inter-Tropical Area

Nowadays, it is of great interest to know and forecast the solar energy resource that will be constantly available in order to optimize its use. The generation of electrical energy using CSP (concentrated solar power) plants is mostly affected by atmospheric changes. Therefore, forecasting solar irradiance is essential for planning a plant’s operation. Solar irradiance/atmospheric (clouds) interaction studies using satellite and sky images can help to prepare plant operators for solar surface irradiance fluctuations. In this work, we present three methodologies that allow us to estimate direct normal irradiance (DNI). The study was carried out at the Solar Irradiance Observatory (SIO) at the Geophysics Institute (UNAM) in Mexico City using corresponding images obtained with a sky camera and starting from a clear sky model. The multiple linear regression and polynomial regression models as well as the neural networks model designed in the present study, were structured to work under all sky conditions (cloudy, partly cloudy and cloudless), obtaining estimation results with 82% certainty for all sky types.


Introduction
One of the main factors supporting the continued consumption of energy from renewable sources, such as solar energy, is their potential to substitute approximately 4% of the electricity currently generated from burning fossil fuels. Given their high energy consumption, the world's largest economies are aiming to sustainably develop their own power generation processes by implementing new technologies and methodologies. Using rigorous analytical models, research studies in this area have evaluated the future costs, benefits and disadvantages for electricity generation; they have also analyzed how solar energy generation will evolve [1][2][3]. According to the latest Global Data report, the solar energy capacity is estimated to increase significantly from about 600 gigawatt (GW) in 2019 to about 1600 GW in 2030 following significant additional capacity coming from China, India, Germany, the US and Japan [4].
Over recent years, a wide body of research has been carried out to optimize the solar energy resource using new technologies and taking advantage of regions where there is a high concentration of surface solar irradiance. The beam irradiance has been predicted for cloudless, partially cloudy to characterize cloudiness is paramount. In this context, the technical use of sky cameras has been expanding as a tool to forecast solar irradiance. There are records of its use at the University of California Merced's solar observatory station (2012), at CIESOL, University of Almeria, Spain (2013), at the University of California, San Diego (2014), at UNAM in Mexico (2015) and at the University of Singapore (2019), amongst others. Using this technology with models that convert the digitized information into irradiance indices, it is possible to estimate the direct solar irradiance for different sky conditions, as they occur, by modeling a camera calibration system (a process where an image's pixel intensity is related to the amount of solar radiation present at that moment) [5,24,25]. In addition, remote sensing techniques (satellite images and sky cameras) have been combined with radiometric data for sky classification processing [26].
Solar companies recognize that cost remains the main drawback of Concentrated Solar Power (CSP) systems. Consequently, important projects such as CSPIMP (Concentrated Solar Power efficiency IMProvement) have managed to make CSP plants more competitive [27]. R.Chauvin and J. Nou, forecasted the cloud cover for different sky conditions over 5-30 min with a spatial resolution of (1 km 2 ) using sky images and algorithm development [27,28]. F. Batlles and J. Montesinos established that detecting and classifying clouds as well as determining their trajectory are essential factors in forecasting cloud cover [29]. Likewise, they demonstrated that infrared channels are essential for cloud height assignment and the visible channel is necessary for cloud opacity determination; however, this was solely applied to short-term cloud forecasting using multispectral satellite imagery [29,30]. It is clear that a wide range of different technologies and systems have been used to estimate the solar resource; nonetheless, most publications focus on solar resource assessment and not on estimating the attenuation factor.
In this paper, three models were applied to estimate the direct normal irradiance for Mexico City based on determining the attenuation percentage caused by clouds under different sky conditions. To perform the study, a total sky camera was used (TSI-880 model) from which the digitized image levels were characterized and modelled to determine the attenuation coefficient of this solar irradiance component.

Data
The data and images used in this work were obtained from the Solar Irradiance Observatory (SIO) at the Geophysics Institute of the National Autonomous University of Mexico (latitude: 19.32, longitude: −99.17, elevation: 2280 m above sea level). The observatory serves as the Regional Center for the Measurement of Solar Irradiance, part of the World Meteorological Organization (WMO).
Direct Normal Irradiance (DNI) observations were taken over the period from 2016 to 2017 using a CHP1 Kipp&Zonen pyrheliometer. Regarding the pyrheliometer's operation, Kipp&Zonen state that a maximum uncertainty of 2% is expected for hourly totals and 1% for daily totals. These data pass from the instrument's sensor to the Kipp&Zonen CR3000 data logger. The solar height (α) and azimuth angle were recorded (in degrees) every minute.
A total sky camera with a rotational shadow band (a TSI 880 model) providing a hemispheric view of the sky was used to obtain minute-by-minute images from 6:00 a.m. to 6 To work with solar irradiance models, it is necessary to include additional parameters. We included local monthly values of the Linke turbidity index obtained from the SODA web page [31], supported by Meteotest, Switzerland. The data processing and image analysis were carried out using MATLAB R mathematical software.

Determination of the Attenuation Percentage
In the present study, an approach was carried out that uses sky camera images to estimate solar irradiance for all sky types. This model allows us to determine the amount of lost solar irradiance caused by the cloudiness factor.
To have reference elements for the attenuation percentage of the lost solar irradiance, we start from a clear sky model, where it is possible to calculate 100% of the direct solar irradiance received at each study moment. To estimate the solar irradiance attenuation, we started with the European Solar Radiation Atlas (ESRA) clear sky model, which requires the Linke turbidity index to determine the theoretical local solar irradiance of a clear sky [32,33]. With ESRA, we obtained the maximum DNI clear-sky value corresponding to the date and time for 400 different images.
The sample of 400 images was selected from an image database recorded from February 2016 to January 2017. We chose 400 images taken under different sky types (cloudy, partly cloudy and clear), covering all the seasons of the year. The selected images were taken from sunrise to sunset every hour, guaranteeing different solar zenith elevations and different sky conditions. The camera was correctly maintained to provide good image definition and clarity thus reducing the risk of working with pixels that were not typical of the sky or cloud type when digitally analyzing the images. All the images were selected when the solar elevation was greater than 10 degrees. The images were classified into the following sky types-233 belonging to partially cloudy skies, 68 to cloudy skies and 99 to clear skies.
The attenuation percentage recorded by the pyrheliometer sensor was calculated considering the value of the direct solar irradiance corresponding to the date and the solar altitude at which the images were taken. For any given sky camera image, the maximum DNI value was calculated from the ESRA model and the DNI value obtained from the pyrheliometer was multiplied by the sine of the solar height. Subsequently, we obtained the percentage of attenuation caused by the different cloud types traversing the solar disk at the time the image was taken. According to the pyrheliometer specifications, the attenuation percentage data obtained had an uncertainty of 2% because we considered the recorded DNI data on an hourly basis. Nonetheless, there were clear days where the maximum DNI value obtained by the ESRA model could not be reached. In this way, a small percentage of attenuation is recorded that is important for solar energy capture.

Digitized Data That Represent the Percentage of Attenuation
A new database was created that included the percentage of DNI attenuation generated by the different sky conditions and the respective cloud-type classification that caused it. Then, we calibrated the DNI measured on the surface with the pixel values of the clouds near the solar disk. These values better represent (at the pixel level) the solar irradiance attenuation caused by the cloud at that moment. With this procedure, we hope to find patterns between the pixel values, the sky type and the estimated DNI value measured [34]. This method offers better image calibration at the pixel level with respect to the DNI attenuation than do other methods. Indeed, other methods study such a large image area around the solar disk that different sky conditions can be present at the same time.

Extraction of the Image's Digitized Data
In situations where the sun position is not known (due to the presence of clouds), an algorithm was developed that allows us to know the sun position as Cartesian coordinates, which are those that govern the interpretation of the image's spatial position [35]. The solar position algorithm was presented by Reda and Andreas (2013) to obtain the sun's geographical location [36].
Therefore, the sun's position can be ascertained on any day of the year regardless of the sky conditions. In the 400 images we analyzed, different areas were plotted where we observed the same pixel intensity -this was done to identify the pixel value corresponding to each sky type, which would attenuate the DNI 1 to 3 minutes later. The method applied to define these uniform areas was visual using an algorithm elaborated. We checked the uniformity of the areas by extracting the values of all the pixels for each selected area. In this way we could ensure that the pixels had the same values. For conditions with different pixel values, the area was considered non-uniform and a new area was redrawn. We selected areas near the sun's position to avoid those areas where sunlight might provide information that was not specific to the sky type. This is shown in Figure 1. During the area selection process, the digitized channel values attributed to each area were extracted. Using an algorithm developed for the purpose, the pixel intensity averages were obtained for the areas, and for each channel [R G B H S V E]. Since the average values of the areas are related to solar height, we added an external geographic variable [A(α)] to our value sets, which describes the selected areas. We will call the set of values for [R G B H S V E A] digitized data (DD). With this method, it is possible to obtain the digitized data causing the attenuation recorded on the surface. The temporal space between the selected area and the subsequent position of the sun fluctuates between 1 and 2 min.

Combination of Digitized Data
The digitized channel values for blue sky during the day are not the same in the morning as they are in the afternoon. The path that the sunlight takes through the atmosphere to the measurement point changes, giving different shades of color in the sky owing to the light dispersion caused by the presence of atmospheric particles. The blue wavelength is mainly scattered in the atmosphere while the raindrops that make up the clouds do not disperse the sunlight wavelengths, hence making them appear white to gray depending on their density (Rayleigh scattering). Accordingly, we used other variables to identify or contrast the cloud or sky pixels, such as dividing the digitized data (e.g., R/B) or multiplying certain digitized data by the sin of the solar height (e.g., V sin (α)), among others. Essentially, we performed these division and multiplication operations on the DDs because this is a widely used mechanism in digital image analysis. In our case, each channel is not equally relevant for characterizing the clouds; therefore, it was necessary to manipulate the channels in this way to distinguish the cloud pixels from the sky pixels. These results were then used as the new input variables for the models. In this way, adequate patterns were acquired to determine the DNI attenuation caused by cloud presence.

Estimation of the DNI Attenuation Factor Using Multiple Linear Regression
An initial model for measuring the surface DNI attenuation factor starts with the digitized image data, developed using the multiple linear regression method. This model is similar to simple linear regression, the difference being that we need to consider more than one explanatory variable (x 1 , x 2 , ..., x n ). The multiple linear regression model, using the least squares criterion in matrix form, finds the coefficients (β 1 , β 2 , ..., β k ) that best represent the dependent variable (y) via the independent variables.
Considering that the regression function that relates both variables is linear, the model to be solved in its matrix form is defined by Equation (1): where y est is the estimated direct solar irradiance in percentage of attenuation plus an error term between the independent and dependent variables.
To solve the proposed model, an algorithm was developed that uses the tools offered by the software employed to resolve the multiple linear regression. The independent input variables chosen for the model were DD.
Once the 8 independent variables were defined, we will call register to each area selected with their respective [R G B H S V E A] value. In total, 400 registers were selected. To do this, 300 registers were chosen randomly, generating a matrix (M1) with 2,400 data points in total (300 registers multiplied by 8 variables) for training the programmed multiple linear regression model. The algorithm allows us to obtain the regression coefficients (β i ) necessary to integrate the regression function. We used digitized image analysis to combine the DD set, to obtain the maximum correlation coefficient (Equation (2)) for the direct solar radiation measured on the surface in percentage attenuation terms (y mea ) and the direct solar radiation estimated by the model (y est ).
where σ y est y mea is the covariance between the estimated and measured input databases, σ y est is the standard deviation of y est and σ y mea is the standard deviation from y mea . When we observed combination types where the correlation coefficient tended to decrease, we did not continue with those combination structures. Once the best combination was found from the DD providing the best DNI estimation, the multiple linear regression function was validated with the remaining 100 registers (M2, with 800 data points in total-100 registers multiplied by 8 variables), where the same combination of the variables remained in the data validation set.

Estimation of the DNI Attenuation Factor Using polynomial Regression
Polynomial regression is another model used to estimate the DNI attenuation factor through the sky camera. Similarly, by means of the least squares criterion, it returns the coefficients (p) for a polynomial of degree (n) which is better adapted to the data behavior trend; the length of p is n + 1. Unlike the multiple linear model, when applying the polynomial model, one or two independent variables are required to generate the regression function that fits a nonlinear relationship between the value with independent variable exponents (x n ) versus the dependent variable or explanatory variable (y est ). The matrix representation is shown in Equation (3).
Following graphical observation, the value dispersion between the estimated attenuation factors was calculated using the clear sky model, and the polynomial behavior was observed. By performing mathematical operations combining digitized channels and solar height to obtain a single independent variable, we selected the one that had the highest correlation coefficient in terms of its linear relationship. Once the independent input variable was defined, we created an algorithm that provided the polynomial coefficients that best represented the dependent variable. To train the model, we worked with 300 registers at random, that is, 1200 data points in total according to the combination found. Once the polynomial function was defined, we validated it by making the same channel combination for the 100 registers (400 data points) missing from the database. The polynomial regression using two independent input variables to generate the polynomial function z = f (x, y) (x and y being the independent variables) presented a maximum correlation coefficient of r = 0.5 therefore we decided to continue with the polynomial regression of one independent variable.

Estimation of the DNI Attenuation Factor Using Neural Networks
We also implemented the neural networks method to estimate direct solar irradiance (y est ). For this, the Neural Net Fitting function was utilized, a function included in the interactive applications (APPS) contained in the MATLAB tools. An artificial neural network perceptron of interconnected multilayers was used, as shown in Figure 2. The input data used for executing the neural networks were the same independent variables (x j ) as in the previous models. Each data entry in the neuron is multiplied by validating its corresponding weight or significance (w ij ). All the weighted inputs are summed, and a neuron activation function is activated, such that it generates neuronal learning; when passing to the last neuronal layer, the output data are obtained -in our case, the (y est ) results. A representation of the artificial neural networks model is described, as in Equation (4), where N is the total number of neurons [37,38].
The Levenberg-Marquardt algorithm was used to train the neurons since it has a faster MSE convergence speed, facilitating the computation time [37,38]. To maintain the same proportion as in the previous models, we used 300 registers to train the neural network; 50 registers were occupied for validation and 50 for testing (400 registers in total). Multiple combinations were performed between the dependent variables to find the digitized data combination that returned the r and nRMSE with the best results.
Once the best input values were found to explain the dependent variable, the neural network was trained by modifying the number of hidden layers such that the r and RMSE results had the best relationship between the estimated and the measured solar irradiance.

Results
The results from the proposed models are presented to estimate the attenuation factor of the direct solar irradiance at the pixel level by means of images. To develop the models, we worked with 400 images chosen from a file containing a year's worth of images captured every minute from February 2016 to January 2017 with their respective DNI surface measurements.
To quantify the models' validity, the statistical estimation analysis was obtained for the three cases: RMSE-the root mean square error (5), in which the results are shown as a value between 0 and 1 (0 being 0 attenuation and 1 being 100% attenuation), and where N is the total number of estimations; nRMSE-the normalized mean square error (6), measured in percent (%); MBE-the mean deviation (7), with the same unit as the RMSE; and nMBE-given by (8), expressed in (%) and the dimensionless correlation coefficient Equation (2). Table 1 displays the digitized data (DD) combinations of M1, carried out randomly, that had the highest r value from which the best combination was chosen; this was based on the behavior of the digitized channels in performing the linear regression training with 2400 data points, and then validating the model for M2 with the remaining 800 data points.

Multiple Linear Regression Model
In Table 1, we observe how the DD combination that allows a high correlation between the measured and estimated data dominates the image's red component. Once the best digitized data (DD) combination was selected and used as the model's independent input values, we proceeded to validate Equation (9) on the M2 matrix. Table 2 shows the coefficients obtained when developing the multiple linear regression during the model training and when applied to the same DD combinations performed in M1.   Within the model validation, we noticed that the highest correlation between y mea and y est was not necessarily presented in the same combination as that which gave the highest training value (r = 0.782). Considering the same combination as in the test data, there is a value of r = 0.677 with an nRMSE = 42.36%, while for combination No. 3, a value of r = 0.681 was obtained with an nRMSE = 42.34% and an overestimated value of nMBE = 10.29% (negative), indicating it was a slightly greater correlation than that selected in the training.
The training (M1) and test (M2) graphs ( Figure 3) display the behavior of the multiple linear regression model on a linear trend; both graphs have a high concentration of points in larger attenuations, but far from the linear trend.

Polynomial Regression Model
As discussed previously, it is essential to introduce an input variable to develop the polynomial regression. The input variable was selected after creating combinations between the DDs. After 50 tests between the DD combinations, we noticed that the r value tendencies did not present better results. For this reason, in Table 3, there are only 4 combinations shown -these were chosen because they had the highest r value. Once the input variable was defined as a DD combination, by means of an algorithm developed, the characteristic polynomial function (Equation (10)) was obtained that best defined the percentage of attenuation suffered by the solar irradiance, as measured on the surface.
Considering the x-axis as the percentage of measured DNI attenuation and the y-axis as the percentage of estimated DNI attenuation, we used a polynomial adjustment created in MATLAB to compute the polynomial which best described our scatter plot for its highest value at r = 0.766; the grade was n = 9, which is plotted in the same graph (Figure 4). For this reason, the polynomial regression for the M1 matrix was developed, applying a polynomial of degree 9 ( Figure 5) and, using a developed algorithm, the polynomial coefficients (p) were obtained. Figure 6 shows the model validation applied to the M2 matrix.  Predictably, both polynomial adjustment plots present a similar trend but with a slight precision with respect to the scattering of the points. Table 4 shows the polynomial coefficients obtained from the M1 training and the statistical results when validating the model with the M2 matrix. Regarding the input variables in the polynomial model, the ratios between the RGB channels multiplied by the gray scale data (E) predominated; this did not depend on the solar height recorded in each image. Analyzing Table 4, we note that, with the polynomial model, a relationship coefficient between the y mea and the y est of r = 0.6756 was obtained with an nRMSE = 42.77% and an overestimated value of the nMBE = 10.633% (in negative), showing a very similar fit to that of the multivariable linear regression model.

Neural Networks Model
Starting from the complete DD set, the number of variables was reduced, obtaining different combinations among the DDs to identify which contributed less to the DNI estimation. Using Matlab APPS to develop the model by means of artificial neural networks, we acquired the following statistical results (Table 5), where the term AD refers to all the data, TRA refers to the training data, VAL refers to the validation data and TST refers to the test data. Table 5. Combination of the digitized data as input parameters for the neural networks model. Considering Table 5, we see that different DD combinations are presented, which are input information parameters required by the neural network for the training. It operates by finding patterns of interest between the DD and the percentage of attenuation values for the measured DNI (y mea ) allowing the neural network to learn.

R G B H S V E
The model results were obtained by training the network, which occupied 10 neurons in a perceptron of two layers, (1. The hidden learning layer and 2. The output layer). After one hundred input parameter combinations were introduced into the network, the combination [R G B H S E A] was selected with the highest r value to make computational runs with a greater number of neurons -this was done to achieve better network learning and thus, generate the computational model that obtained the estimated DNI attenuation percentage with the best approximation to the surface measurement. Table 6 shows that the final computational model for estimating the DNI attenuation percentage using artificial neural networks was defined for a network model containing 60 neurons in the hidden learning layer. Table 6. Results of the neuronal network training varying the number of neurons.

Functioning of the Models
As we can see in Figures 3,5 and 6 the models better estimate DNI attenuation for partly cloudy and totally cloudy skies. With respect to the linear and exponential trend lines, we see that the correlation between the attenuation of the measured DNI and the estimated DNI is greater in the range of values between 0.6 to 1 (considering 1 as 100% attenuation). Obtaining attenuations above 0.6, we performed digitized analysis of the images where the percentage of cloudiness was very dense or we analyzed images where low clouds were present, usually nimbostratus or cumulonimbus-types in different seasons and at different times of the year. Values below 0.6 indicated that we were assessing images where high clouds, clear skies or low-density clouds were present, analogous to different seasons and times of the year. These types of cloud, usually altostratus or cumulus clouds which were not very dense, attenuate the DNI lesser degrees.
Regarding Figure 7, which corresponds to the neural networks model, we see that the training graph exhibits a more uniform correlation between the attenuation of the measured and estimated DNI for all sky conditions in all the range of values, from clear skies (0% attenuation) to cloudy skies (1 ; i.e., 100% attenuation). In terms of validation and testing, we understand that the model functions better (as with the 2 previous models) for partly cloudy and cloudy skies.

Discussion
We have determined that, of the three methods considered and applied to estimate the percentage of DNI attenuation under all sky conditions, the neural networks model provided the best results in terms of the correlation coefficient obtained, achieving a calibration system for the digitized data and the DNI attenuation. To improve the model's efficiency, more parameters need to be integrated to help explain the dependent variable, such as adding cloud classification. Similarly, it is important to experiment by adding more layers within the neural network so that learning by patterns of interest becomes more and more accurate, thus obtaining output data comparable to what is measured or known.
For all three methods, the estimates might be better if thousands of images were processed, thus minimizing and compensating for errors. Another important point is that we should consider the presence of different cloud layers. Viewing the cloud cover from a terrestrial viewpoint, each cloud has a particular value in the digitized channels. The cloud is seen with particular digitized data and, therefore, it has an attenuation coefficient. However, it is possible to have the same (or similar) digitized data for a cloud but with a different attenuation coefficient because there are further cloud layers above the low cloud, hidden from the sky camera. Such situations can occur and, consequently, they increase the error in the models.

Conclusions
In this paper, we presented three models for estimating the DNI attenuation percentage under all sky conditions. Images were taken using a sky camera with a rotational shadow band (model TSI 880). This is the first study carried out for Mexico City that estimates direct solar irradiance based on determining the attenuation resulting from the percentage of cloudiness that is present.
Estimation by means of the multiple linear regression model offers an r reliability of approximately 0.70 while it can compute an error of 42%; however, it also achieves an approximation of r = 0.78, decreasing the normalized error to 33%. The best combination was modeled to obtain the final function of the linear model. The red component, used as the input parameter, turned out to be the most predominant for the model, showing that the cloudiness contrast can be detected more easily on this channel than on the other channels. The final model function consisted of 10 summed terms, of which 9 multiply the respective combination between the (DD) by the coefficients extracted from the multiple linear regression.
With regard to the polynomial model, the best model representation obtained was a polynomial function of degree n = 9, achieving a certainty of r = 0.76 with a normalized error of 35%. The most important combined digitized data for generating a single independent input variable are the RGB channels and the pixel values in grayscale. Similarly, for the multiple linear regression, the final function consisted of 10 summed terms, where 9 of them respectively multiply the coefficients of the polynomial by the value of (DD) raised to a certain exponential degree.
The model built using neural networks achieved a better approximation of the data for the direct solar irradiance attenuation percentage. Through neural networks, it is possible to have data reliability of r = 0.82 in a layer of 60 neurons. The best DD combination forms part of the information data for the neuronal model learning. Unlike the two previous cases, the DD combinations are integrated directly. The input data are entered separately and mostly occupy the defined digitized data, excluding the V channel, which represents the brightness in the image. The final model turns out to be an execution code which determines the weights W ij and assigns them to the digitized data with respect to the already known y mea data.
It was possible to develop a first DNI estimation system using sky images applicable in real time for Mexico City. From these sky cam images, we developed a tool that can be applied to any measurement system, complementing the solar irradiance information that reaches the Earth's surface. Our work serves as a preliminary study that helps anticipate drops in solar irradiance. Consequently, if this system is installed in a solar plant, the operator can predict how much irradiance might be lost as a result of passing clouds. This system may also contribute to the classification of clouds depending on the level of attenuated irradiance and in determining the percentage of local cloudiness in the sky on a particular day. The study complements a previous work that predicted solar irradiance over the short or medium term in the hope of determining the vector movement of clouds. Accordingly, the present article aims to be a reference for predicting DNI in the CSP environment.