Retrieval of Aerosol Optical Properties via an All-Sky Imager and Machine Learning: Uncertainty in Direct Normal Irradiance Estimations

Quality-assured aerosol optical properties (AOP) with high spatiotemporal resolution are vital for the accurate estimation of direct aerosol radiative forcing and solar irradiance under clear skies. In this study, the sky information from an all-sky imager (ASI) is used in synergy with machine learning (ML) to estimate the aerosol optical depth (AOD) and the Ångström exponent (AE). The retrieved AODs (AE) showed good accuracy, with a dispersion error lower than 0.07 (0.15). The retrieved ML AOPs are then used to estimate the direct normal irradiance (DNI) by applying radiative transfer modeling. The ML-based DNI estimations reproduced the reference values with adequate accuracy and relatively low uncertainties.


Introduction
Aerosol particles constitute the largest source of uncertainty in direct aerosol radiative forcing and solar irradiance estimations under cloud-free conditions [1]. Accurate aerosol optical and microphysical properties are therefore vital for reducing the uncertainties in solar resource and climate change studies. Currently, several monitoring techniques are available for AOP retrieval, based on satellite and ground-based measurements. The most widely used ground-based network is AERONET (AERosol RObotic NETwork). AERONET delivers AOP at the highest temporal resolution (5-15 min), using several inversion algorithms and CIMEL sun-photometers [2][3][4]. From space, several passive sensors aboard satellites are available, such as MODIS (MODerate resolution Imaging Spectroradiometer) [5][6][7] and MISR (Multi-angle Imaging SpectroRadiometer) [8]. Despite the strong performance of these aerosol remote sensing data sets, some limitations remain.
Ground-based measurements are optimal in terms of accuracy, but their global coverage is sparse. Satellite-based AOP retrievals, on the other hand, provide near-global coverage, but their temporal resolution is limited to at most two measurements per day. For this reason, several alternative techniques for AOP retrieval have emerged in the last few years to fill the gap and extend the availability of accurate AOP. In this study, a supervised ML algorithm is applied to retrieve AOPs, namely the spectral AOD and the AE, by using the sky information captured from an all-sky imager (ASI) and the aerosol information from AERONET. In addition, the potential of using the ML-retrieved AOPs to estimate the direct normal irradiance (DNI) through radiative transfer modeling is presented, together with an analysis of the associated uncertainties.

Data
The data applied in this work were acquired from an ASI and a CIMEL sun-photometer installed at the National Observatory of Athens (NOA) in Thissio, Greece. The CE318 sun-sky photometer installed at the station is the standard instrument of AERONET. For this study, Level 2.0 data from the latest AERONET Version 3 were used, including pre-field and post-field calibrations [3], providing quality-assured data that are optimal for the application of the proposed methodology. The ASI is a commercial Mobotix Q24M model, which captures images from the entire upper hemisphere every 640 µs and stores them in 24-bit JPEG format with a spatial resolution of 1024 × 768 pixels. The sensor has a red-green-blue (RGB) filter, so images in the three spectral channels (blue, green, and red, associated with 440, 500, and 675 nm, respectively) were used, with color intensities ranging from 0 to 255.
The AERONET measurements were temporally synchronized with the closest ASI image, spanning a time frame between 1 January 2021 and 18 November 2021. All images were visually inspected to confirm cloud-free conditions. In addition, measurements with a solar zenith angle (SZA) greater than 70° were excluded from the analysis to avoid cases with the sun at the edges of the image, where many obstacles interfere. In total, 3212 images were retained for the proposed methodology.
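The matching and filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the record layout, the function names, and the 5-minute matching tolerance (`MAX_GAP`) are assumptions introduced here, and only the 70° SZA threshold comes from the text.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

MAX_SZA_DEG = 70.0              # threshold stated in the text
MAX_GAP = timedelta(minutes=5)  # assumed tolerance, not given in the paper


def nearest_image(image_times, t):
    """Return the image timestamp closest to t (image_times must be sorted and non-empty)."""
    i = bisect_left(image_times, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = image_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda ti: abs(ti - t))


def match_records(aeronet, image_times):
    """Pair each AERONET record (time, sza_deg, aod) with its closest ASI image."""
    pairs = []
    for t, sza, aod in aeronet:
        if sza > MAX_SZA_DEG:
            continue  # sun too close to the image edge
        img_t = nearest_image(image_times, t)
        if abs(img_t - t) <= MAX_GAP:
            pairs.append((t, img_t, sza, aod))
    return pairs
```

The cloud screening is not represented here, since the text describes it as a visual inspection rather than an automated test.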

Methodology
In this study, a machine learning algorithm, the Light Gradient Boosting Machine (LGBM) [9], is applied to retrieve AOD and AE. The input parameters of the ML algorithm are the following: (1) RGB information from the ASI, (2) SZA, and (3) total column water vapor (TCWV). TCWV data were obtained from the CAMS (Copernicus Atmosphere Monitoring Service) reanalysis product [10]. The RGB information from the ASI is extracted in two ways. Firstly, 60 pixels were selected from the RGB images outside the sun-saturated area, comprising (1) 30 pixels with a 2° step across the principal plane and (2) 30 pixels with a 2° step across the almucantar. Secondly, the saturation area (SAT, in %) is used, which is defined as the ratio between the number of pixels around the sun that include sunlight and the total number of image pixels. SAT is related to Mie theory [11], providing valuable information for the training of the ML algorithm.
The dataset was initially separated into a training and a test dataset, which include 70% and 30% of the whole dataset, respectively. In total, four LGBM models were trained: three for retrieving the spectral AOD and one for the AE. For the AOD440nm, AOD500nm, and AOD675nm ML models, the corresponding image values of the blue, green, and red channels are used, while for the AE, all RGB values are applied. All four models included the SZA and TCWV. Before the training process, all input parameters are normalized to values between 0 and 1. In particular, Min-Max normalization is applied to all parameters except the SZA, for which the cosine function is used. To achieve optimal accuracy, a randomized search was performed during the LGBM training to find the best combination of hyperparameters, using 10-fold cross-validation with the mean square error as the loss function. After training, the LGBM scheme with the highest performance, i.e., with the best combination of hyperparameters, is applied to the test dataset for evaluation.
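As a rough illustration of the SAT feature, the sketch below counts saturated pixels and expresses them as a percentage of all image pixels. The saturation criterion (all three channels at or above a threshold of 250) and the function name are assumptions made here; the text does not specify how pixels that include sunlight are identified. The sketch also counts saturated pixels anywhere in the image, which only approximates the region around the sun under cloud-free conditions.

```python
def saturation_area_percent(image, threshold=250):
    """Percentage of pixels whose R, G and B intensities are all >= threshold.

    image: iterable of rows, each row an iterable of (r, g, b) tuples (0-255).
    threshold: assumed saturation criterion, not taken from the paper.
    """
    total = 0
    saturated = 0
    for row in image:
        for r, g, b in row:
            total += 1
            if min(r, g, b) >= threshold:
                saturated += 1
    if total == 0:
        raise ValueError("empty image")
    return 100.0 * saturated / total
```

For a 1024 × 768 image the denominator is 786,432 pixels, so SAT values of a few percent correspond to tens of thousands of saturated pixels.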
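The preprocessing described above (Min-Max scaling, cosine of the SZA, and a 70/30 split) can be sketched in plain Python as follows. The function names are hypothetical, and the random shuffling with a fixed seed is an assumption: the text does not say whether the split was random or chronological, nor whether the scaling bounds were derived from the full dataset or from the training subset only.

```python
import math
import random


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_sza(sza_deg):
    """Replace each solar zenith angle (degrees) by its cosine."""
    return [math.cos(math.radians(s)) for s in sza_deg]


def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # assumed random split
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]
```

With the 3212 retained images, a 70/30 split under this rounding gives 2248 training and 964 test samples. Since the SZA is limited to 70°, its cosine lies between about 0.34 and 1, which is consistent with the stated 0 to 1 input range.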

The coefficient of determination (R²) ranged between 0.79 and 0.86 for the AODs and equaled 0.85 for AE440-675nm, revealing an adequately good linear relationship between the ML-based AOPs from the ASI and AERONET (Table 1).
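The agreement statistics quoted in this work can be computed along the following lines. This is a sketch under two assumptions that the text does not state explicitly: R² is taken as the squared Pearson correlation coefficient, and the mean bias error (MBE) is defined as the mean of estimated minus reference values, so that a positive MBE indicates overestimation by the ML product.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def mean_bias_error(estimated, reference):
    """Mean of (estimated - reference); positive means overestimation."""
    return mean([e - r for e, r in zip(estimated, reference)])


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Coefficient of determination, taken here as the squared Pearson r."""
    return pearson_r(x, y) ** 2
```

Under this definition, the R² of 0.90 reported for the DNI comparison corresponds to a Pearson coefficient of about 0.95, which matches the value quoted in the conclusions.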

DNI Estimations and Uncertainties
In the second section of the results, the DNI is calculated using the libRadtran software package and the UVSPEC radiative transfer code [12,13]. In the libRadtran simulations, the radiative transfer equation is solved numerically using the SDISORT radiative transfer solver, which includes a pseudospherical approximation [14]. The simulations were performed using a band parameterization based on an optimized version of the correlated-k approximation [15], KATO2. Regarding aerosol, the default background aerosol model of Shettle (1989) [16] is used, with constant climatological AERONET values of single scattering albedo, asymmetry parameter, and ozone at the study station, while the AE and AOD are taken from either the AERONET measurements or the ML retrievals. The DNI estimations using the retrieved ML and AERONET AOPs are hereafter abbreviated as "DNI ML" and "DNI AERONET", respectively.
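The AOD and AE together describe the spectral dependence of the aerosol extinction through the Ångström power law, which is why both quantities are passed to the radiative transfer model. The sketch below shows the two-wavelength form of that law. It is a simplification for illustration only: the function names are hypothetical, and it does not reproduce how AERONET or libRadtran handle the spectral AOD internally.

```python
import math


def angstrom_exponent(aod1, wl1_nm, aod2, wl2_nm):
    """Two-wavelength Angstrom exponent: AE = -ln(aod1/aod2) / ln(wl1/wl2)."""
    return -math.log(aod1 / aod2) / math.log(wl1_nm / wl2_nm)


def aod_at(wl_nm, aod_ref, wl_ref_nm, ae):
    """Scale a reference AOD to another wavelength with the Angstrom power law."""
    return aod_ref * (wl_nm / wl_ref_nm) ** (-ae)


# Example with made-up values: AOD of 0.30 at 440 nm and 0.15 at 675 nm.
ae_example = angstrom_exponent(0.30, 440.0, 0.15, 675.0)
aod_500_example = aod_at(500.0, 0.30, 440.0, ae_example)
```

Because the AOD decreases with wavelength in this example, the resulting exponent is positive and the interpolated AOD at 500 nm falls between the two input values.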
Figure 2a illustrates the linear relationship between the ML and AERONET DNI. An R² of 0.90 is found, revealing an adequately good linear relationship. The MBE revealed a modest bias (0.54 W m⁻²), indicating a small overestimation of the ML DNI compared to AERONET. The highest discrepancies between the ML and AERONET DNI are observed for high AOD cases (Figure 2a), where a pronounced underestimation of the AOD leads to a correspondingly large DNI overestimation.
Figure 2b depicts the differences between the AERONET and ML AOD and AE, and the resulting differences in DNI when using the simulated AOPs instead of those from AERONET. It is apparent that the AOD retrieval performance regulates the ΔDNI. In particular, an ML AOD underestimation (ΔAOD > 0) or overestimation (ΔAOD < 0) translates into a negative or positive ΔDNI of up to 300 and 150 W m⁻² in magnitude, respectively. Regarding the AE, a negative ΔAE440-675nm leads to a more negative ΔDNI, and vice versa.
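The sign relationship between ΔAOD and ΔDNI can be illustrated with the Beer-Lambert law. The sketch below is a deliberately simplified, aerosol-only, single-wavelength model with a plane-parallel air mass of 1/cos(SZA); it is not the libRadtran calculation used in this work, and the reference irradiance `E0` is an arbitrary illustrative value. The differences follow the convention ΔDNI = DNI_AERONET − DNI_ML and ΔAOD = AOD_AERONET − AOD_ML.

```python
import math

E0 = 1000.0  # arbitrary illustrative reference irradiance in W m^-2 (assumption)


def dni_beer_lambert(aod, sza_deg, e0=E0):
    """Direct beam attenuated by aerosol only: e0 * exp(-m * aod), m = 1/cos(SZA)."""
    air_mass = 1.0 / math.cos(math.radians(sza_deg))
    return e0 * math.exp(-air_mass * aod)


def delta_dni(aod_aeronet, aod_ml, sza_deg):
    """DNI difference (AERONET minus ML) for a given pair of AOD values."""
    return dni_beer_lambert(aod_aeronet, sza_deg) - dni_beer_lambert(aod_ml, sza_deg)
```

In this toy model, an ML AOD that is lower than the AERONET value (ΔAOD > 0) produces a higher ML DNI and hence a negative ΔDNI, in line with the behavior described for Figure 2b. Because the attenuation is exponential, the same absolute AOD error has a larger effect at low AOD than at high AOD.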

Conclusions
In this study, an ML-based approach for simulating AOPs from the sky information captured by an ASI is evaluated. Auxiliary parameters, such as the water vapor column and solar geometry, were also used. Given the adequate performance of the ML AOP simulation, their potential use for DNI estimations was investigated. The ML DNI estimations showed good results, with a Pearson correlation coefficient of 0.95 and a bias of around 0.54 W m⁻², demonstrating the applicability of ML AOPs for solar resource estimation with small uncertainties.


Figure
Figure 2b depicts the linear relationship between the discrepancies of DNI (ΔDNI = DNI_AERONET − DNI_ML) and AOD500nm (ΔAOD = AOD_AERONET − AOD_ML).
