Deep Neural Networks for Aerosol Optical Depth Retrieval

Aerosol Optical Depth (AOD) is a measure of the extinction of solar radiation by aerosols in the atmosphere. Understanding the variations of global AOD is necessary for precisely determining the role of aerosols. Arctic warming is partially caused by aerosols transported over vast distances, including those released during biomass burning events (BBEs). However, measuring AOD is challenging, typically requiring active LIDAR systems or passive sun photometers. Both are limited to cloud-free conditions, and sun photometers provide only point measurements and therefore limited spatial coverage. A more viable method to obtain accurate AOD may be found through machine learning. This study uses deep neural networks (DNNs) to estimate Svalbard’s AODs using a minimal set of meteorological parameters (temperature, air mass, water vapor, wind speed, latitude, longitude, and time of year). The mean absolute error (MAE) between predicted and true data was 0.00401 for the entire set and 0.0079 for the validation set. It was then shown that the inclusion of BBE data improves predictions by 42.167%. These results demonstrate that AODs may be accurately estimated without the use of expensive instrumentation, using machine learning and minimal data. Similar models may be developed for other regions, allowing immediate improvement of current meteorological models.


Introduction
Aerosols play a critical role in most atmospheric processes. They shape the atmospheric thermal structure by absorbing and scattering incoming and outgoing radiation, which contributes to accelerated climate change. This is most evident in the Arctic, which is currently warming two to three times faster than the rest of the Earth [1]. This phenomenon, known as Arctic amplification, reduces glacial coverage, increases sea levels, and disrupts global weather patterns [2]. There is a consensus that this is due in large part to the presence of atmospheric aerosols such as black carbon (BC) and sulfate (SO₄²⁻) [3].
Unfortunately, aerosol readings in the Arctic are sparse due to temporal and spatial limitations [4]. Moreover, the sources of such particles tend to be predominantly remote, which further complicates measurements because it is difficult to track their origin. To track aerosol presence in the atmosphere, various remote sensing methods are employed, including LIDAR and AERONET (Aerosol Robotic Network) sun photometer systems. Unfortunately, these instruments suffer from technological limitations, including satellite bias and spatial limitations [5]. Since these instruments are attempting to capture a relatively weak aerosol signal against strong background reflections, all AOD measurements in the Arctic are currently limited to cloud-free and ice/snow-free regions. In addition, ship-based sun photometers can operate only during the short periods when there is sunlight, and they have a low spatial reach. Moreover, satellites are expensive and not always feasible for certain regions such as the Arctic [6]. The limited data that can be retrieved are analyzed to determine the aerosol optical depth (AOD), the most comprehensive parameter for determining aerosol load [7]. This lack of aerosol data makes it very difficult to generate proper climate models for this critical region. Therefore, finding ways to obtain more frequent AOD measurements is critical to understanding aerosol involvement in climate change and to improving current climate models. Machine learning provides a possible work-around for the expensive instruments traditionally required for AOD estimation [8].
In this study, a deep neural network (DNN) is employed to estimate AODs using easy-to-obtain parameters from the Norwegian Climate Center that include temperature, wind speed, water vapor, date/time, latitude, and longitude. These predictors have been found to have a notable correlation with AOD levels, particularly wind speed [9]. The main goal of this paper is to demonstrate the applicability of the deep neural network method. For such a study, a relatively clean area without local emission sources was needed; thus, the Arctic archipelago of Svalbard was chosen.
In addition, biomass burning events (BBEs) that give rise to atmospheric aerosols such as black carbon (BC) and sulfate (SO₄²⁻) have been shown to alter AOD levels significantly and are thought to be one of the major factors behind the current Arctic amplification. Therefore, this study also determines whether the inclusion of BBE data in DNN regression models improves prediction accuracy.

Computational Background
A DNN is a type of machine learning regression model, meaning that the input parameters are turned into a multilayer 'function', and the function can be applied to new inputs to predict unknown values. Given input data, the machine creates a map of virtual neurons that connect the input data to the predicted output. Each connection between neurons carries a weight, which is initialized randomly and adjusted as the machine learns. The weights are adjusted with each iteration so that the predicted output is as close as possible to the true values. A loss function measures the difference between the predicted output and the true values. The losses are fed back to the network, and the system 'learns' by adjusting the weights between the layers to minimize losses. The learning rate can be regulated to maximize prediction accuracy and efficiency.
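The loss-driven weight adjustment described above can be illustrated with a deliberately tiny sketch: a single weight fitted by (sub)gradient descent on the mean absolute error. This is not the network used in this study; the function names, the toy data, and the inverse-time learning rate decay are assumptions chosen for illustration. A DNN applies the same idea to many weights at once through backpropagation.

```python
def mae(preds, targets):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def train_single_weight(xs, ys, lr0=0.1, decay=0.05, epochs=500):
    """Fit y ~ w * x by subgradient descent on the MAE (toy example)."""
    w = 0.0  # a real network would start from small random weights
    for epoch in range(epochs):
        # Subgradient of the MAE with respect to w.
        grad = sum(sign(w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Assumed schedule: the step size shrinks as training proceeds.
        lr = lr0 / (1.0 + decay * epoch)
        w -= lr * grad
    return w


# Toy data generated with a "true" weight of 2 (not data from this study).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = train_single_weight(xs, ys)
loss = mae([w * x for x in xs], ys)
print(f"learned weight: {w:.3f}, final MAE: {loss:.4f}")
```

With a constant step size the weight would keep overshooting the target by a fixed amount; shrinking the step lets it settle, which is the motivation for regulating the learning rate.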
There are two major challenges in using DNNs: premature convergence and overfitting. Premature convergence occurs when the weights and biases of the DNN settle at a local minimum of the loss function instead of the global minimum. Overfitting occurs when the DNN is highly tailored to a given training dataset but does not adapt to other test sets. To avoid premature convergence, the learning rate can be adjusted. To avoid overfitting, random dropout is used to randomly select and omit a certain percentage of data from the training data. During every iteration, the learned weights are applied to the omitted data, also called the validation data, to determine the model's performance on data it has never seen before. This also provides a realistic picture of the machine's performance in the real world, where input data does not come paired with known output data.
In this project, the regression model builds a function where the meteorological parameters are the inputs and AOD values are the outputs. When this function is applied to new meteorological inputs, it can be used to estimate the respective AOD values.
To improve the machine's minimization of the loss function, properties of the machine, otherwise known as hyperparameters, were manipulated, including the number of hidden layers and the number of nodes per layer.

Defining the Area
This work used data from the Svalbard archipelago and its surrounding maritime areas, namely the waters adjacent to the west coast of Spitsbergen, which include the northeastern Greenland Sea and eastern Fram Strait. The authors deliberately used data from Svalbard, since the main goal of the paper was to demonstrate the applicability of the deep neural network method. Thus, Svalbard, a relatively clean area without local emission sources, was chosen.
The measurements were taken within 500 km of Longyearbyen, chosen as a central point in the studied area. Within the Svalbard archipelago, three sites were specified: Ny-Ålesund (Nya), Longyearbyen (Ly), and Hornsund (Hor). The data were then divided according to a buffer of 75 km around each land station (Hornsund, Longyearbyen, Ny-Ålesund). Those sectors were named after each station's area. The boundaries of these areas are mapped in the figure below (Figure 1). The rest of the measurements were assigned to the open waters station.

AERONET (MAN): AOD Data
The Maritime Aerosol Network (MAN), which collects ship-borne AOD data from sun photometers, has been developed as a reliable component of AERONET and can be used to provide high-quality AOD values with a small amount of known uncertainty [10].
Since this project focuses on the Svalbard archipelago, only data from the surrounding seas (Greenland Sea, Fram Strait) were used. The analysis used measurements from 13 research cruises (mostly from R/V Oceania, R/V Polarstern, R/V Jan Mayen, and R/V Knorr), which account for almost 56% of the original data that coincide with the research area [11]. Only level 2.0 data from the MAN network were used, to which AERONET's pre- and post-processing algorithms (calibration and data processing standards) have been applied.
The dataset is composed of results for each of the three locations (Hornsund, Longyearbyen, and Ny-Ålesund) on Spitsbergen for the period 2007-2015. The sun photometers provide data only for cloud-free daytime conditions. Thus, all data used in this project were collected in June, July, and August, when Svalbard experiences 24-h sunlight.
From the fire catalog, the following attributes were used: latitude, longitude, brightness, date, time, and fire radiative power (FRP). Each pixel covers approximately 1 km². Latitude and longitude are measured from the center of the pixel. Brightness refers to the brightness temperature, in kelvins, of the hotspot/active fire pixel; in other words, it is a measure of the photons received at a certain wavelength, expressed in kelvins. Date and time refer to the acquisition of the hotspot/active fire pixel. FRP measures the radiant heat output of detected fires in megawatts.
Previous studies identifying the source of aerosols and the pattern of aerosol transport in the Ny-Ålesund/Southern Svalbard area demonstrated that the primary sources were northern Europe and Siberian Russia. Based on these studies, fire data were collected for Russia, Norway, and Finland.

Making the BBE Data: Evaluating BBE Intensity
Since long-range wind transport spreads aerosol particles over the course of approximately four to seven days [12], comparing the AOD and brightness values for the same day would not be very effective. Instead, for each data point in the MAN data, the BBE intensity was calculated for each of the seven preceding days. The BBE intensity is a function of the fire's brightness value and the distance between the fire and the location of the MAN measurement. For each AOD measurement in the MAN dataset, the corresponding fires from the week of measurement were selected, and the intensity was calculated. For a given pair of AOD and BBE measurements, the intensity formula is shown in Equation (1), where I refers to the intensity, B refers to the brightness value, Lat and Long refer to the latitude and longitude of the measuring location, respectively, and the MAN and BBE quantities are differentiated by subscript.
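The selection and summation steps described above can be sketched as follows. Because Equation (1) is not reproduced here, the sketch does not implement it: the intensity formula is passed in as a caller-supplied function, and the one used in the usage example is only a placeholder. The haversine great-circle distance, the lag convention (lag 1 is the day before the AOD measurement), and all names are assumptions made for illustration.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def daily_intensities(aod_date, aod_lat, aod_lon, fires, intensity_fn, n_days=7):
    """Sum per-fire intensities for each of the n_days before aod_date.

    fires: iterable of (fire_date, lat, lon, brightness) tuples.
    intensity_fn: callable(brightness, distance_km), standing in for Equation (1).
    Returns a list where index 0 is the day before the AOD measurement.
    """
    totals = [0.0] * n_days
    for fire_date, lat, lon, brightness in fires:
        lag = (aod_date - fire_date).days
        if 1 <= lag <= n_days:
            dist = haversine_km(aod_lat, aod_lon, lat, lon)
            totals[lag - 1] += intensity_fn(brightness, dist)
    return totals


# Usage with made-up fires and a placeholder formula (NOT Equation (1)).
example_fires = [
    (date(2010, 7, 9), 60.0, 30.0, 320.0),
    (date(2010, 7, 5), 62.0, 40.0, 335.0),
]
example = daily_intensities(
    date(2010, 7, 10), 78.2, 15.6, example_fires,
    intensity_fn=lambda brightness, dist_km: brightness / (1.0 + dist_km),
)
print(example)
```

Keeping the formula as a parameter means the aggregation logic can be checked independently of whichever intensity definition is used.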
The intensities of all fires occurring on a given day were added together, this sum being the daily fire intensity. The seven daily intensities and seven daily FRPs were added as parameters for each of the three countries, creating a total of 42 new parameters (7 days × 2 quantities × 3 countries).
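As a quick check on the parameter count, the sketch below enumerates one possible layout of the BBE columns. The column naming scheme is hypothetical; only the countries, the seven-day window, and the two quantities (intensity and FRP) come from the text.

```python
COUNTRIES = ["Russia", "Norway", "Finland"]
N_DAYS = 7
QUANTITIES = ["intensity", "frp"]


def bbe_feature_names(countries=COUNTRIES, n_days=N_DAYS, quantities=QUANTITIES):
    """Build one column name per (country, day lag, quantity) combination."""
    names = []
    for country in countries:
        for lag in range(1, n_days + 1):
            for quantity in quantities:
                # Hypothetical naming scheme, e.g. "Russia_intensity_d1".
                names.append(f"{country}_{quantity}_d{lag}")
    return names


feature_names = bbe_feature_names()
print(len(feature_names))
```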
Overall, two versions of the machine were created: one which included BBE data (+BBE) in the training data (thus learning the importance of BBE data), and one which did not include BBE data (−BBE) in the training data. After the BBE columns were added, changes in the machine were also made to optimize performance.

Machine/Experiment Settings
Input parameters from each of the datasets are included in Table 1. For +BBE, seven pairs of BBE intensity and FRP (14 values), one pair for each of the seven days leading up to the date of acquisition of the AOD, were also included for each of the three countries.
In this DNN, the multiple parameters were used as the input layer, and the AOD (one single value) was used as the output layer. To avoid overfitting, random dropout was performed on the training data to randomly remove 20 % of the data and use it as the validation data.
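The 20% random holdout described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the code used in the study: the function name, the fixed seed, and the stand-in rows are all hypothetical.

```python
import random


def holdout_split(rows, val_fraction=0.2, seed=0):
    """Randomly set aside a fraction of rows as validation data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    validation = shuffled[:n_val]
    training = shuffled[n_val:]
    return training, validation


rows = list(range(100))  # stand-in for (inputs, AOD) samples
train_rows, val_rows = holdout_split(rows)
print(len(train_rows), len(val_rows))
```

The validation rows are never used to adjust weights, so the loss measured on them indicates how the model behaves on data it has not seen.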
For the loss function, the mean absolute error (MAE) was used instead of the mean squared error (MSE); the MAE is less sensitive to outliers and, therefore, less affected by day-to-day measurement changes. To optimize the loss function, this project uses an optimization algorithm called Adam [13], which adapts the step size for each weight using running estimates of the first and second moments of the gradient, rather than applying a single global learning rate as in traditional stochastic gradient descent. The learning rate was initialized at 0.001, and learning rate decay was used to decrease the learning rate over time as the predicted outputs approached the true values.
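The difference in outlier sensitivity between the two losses can be seen with a small worked example. The AOD-like numbers below are invented for illustration and are not taken from the dataset.

```python
def mae(preds, targets):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


targets = [0.05, 0.05, 0.05, 0.05]
preds_clean = [0.06, 0.06, 0.06, 0.06]    # every prediction off by 0.01
preds_outlier = [0.06, 0.06, 0.06, 0.15]  # one prediction off by 0.10

# How much each loss grows when a single outlier is introduced.
mae_growth = mae(preds_outlier, targets) / mae(preds_clean, targets)
mse_growth = mse(preds_outlier, targets) / mse(preds_clean, targets)
print(f"MAE grows by a factor of {mae_growth:.2f}")
print(f"MSE grows by a factor of {mse_growth:.2f}")
```

Because the MSE squares each error, the single large error dominates it, while the MAE grows only in proportion to the size of the error.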
This was performed for 300 epochs. The deep neural network (DNN) was implemented using the Keras Application Programming Interface [14].

Linear Model vs. DNN
The unique capabilities of a DNN were demonstrated by comparing the results of the DNN to a typical linear model. In this project, the MAE between estimated AODs (predictions) and true AODs (original data) was used to represent the loss for each epoch. The change in MAE over 300 epochs for the linear model and the DNN is shown in Figure 2.
If the machine learns properly, the loss should decrease over the epochs. In the linear regression, the MAE remained high, with average values of 0.0865 for MAE_training and 0.109 for MAE_val, indicating that the machine was not improving. This was expected, as the linear regression model could not represent non-linear relationships, so no adjustment of its weights and biases could capture them. In contrast, the MAEs of the DNN model decreased after every epoch and stabilized around average values of 0.00401 for the training data and 0.00443 for the validation data. The steady decrease of MAEs in the DNN indicates that the machine could properly adjust the weights and biases to make accurate AOD estimates.

Therefore, the DNN was able to provide highly accurate AOD estimations that matched current AERONET AOD measurements using only common meteorological and geographical parameters, suggesting that an alternative model using machine learning may be more feasible than the installation of new satellites.

DNN without BBE (−BBE) vs. DNN with BBE (+BBE)
BBEs are significant contributors to atmospheric pollutants, specifically aerosol particles in the Arctic region. In an attempt to improve AOD prediction accuracy, seven days' worth of biomass burning event (BBE) data before the date of each determined AOD value was added to the DNN parameter list to account for long-range wind transport. Upon addition of the BBE data, a significant improvement was observed, with an MAE_training of 0.002339 (MAE_val = 0.002562) when BBE data was included, as compared to an MAE_training of 0.00401 (MAE_val = 0.00443) in the absence of BBE data over 300 epochs (Figure 3). Therefore, the inclusion of BBE data decreased MAE_training by 0.001671 and MAE_val by 0.001868. Relative to the model without BBE data, this corresponds to a reduction of 41.67% in MAE_training and 42.167% in MAE_val.
The inclusion of BBE data in the +BBE model resulted in a smaller difference between the losses of the training data and validation data (0.000223) compared to the −BBE model (0.00042); this means that the inclusion of BBE data improved the machine's performance on data it had never seen before. This suggests that the inclusion of BBE data helped the machine avoid overfitting and that the +BBE model is more accurate in real-world applications.
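The comparison between the two models can be recomputed directly from the four reported MAE values, as in the sketch below (variable and function names are illustrative). With these values, the relative reduction works out to about 41.67% for the training MAE and about 42.17% for the validation MAE, and the training/validation gaps are 0.00042 without BBE data and 0.000223 with it.

```python
def reduction(before, after):
    """Return the absolute and relative decrease from before to after."""
    absolute = before - after
    return absolute, absolute / before


# Reported MAE values after 300 epochs.
MAE_TRAIN_NO_BBE, MAE_VAL_NO_BBE = 0.00401, 0.00443
MAE_TRAIN_BBE, MAE_VAL_BBE = 0.002339, 0.002562

train_abs, train_rel = reduction(MAE_TRAIN_NO_BBE, MAE_TRAIN_BBE)
val_abs, val_rel = reduction(MAE_VAL_NO_BBE, MAE_VAL_BBE)

# Gap between validation and training loss, a rough indicator of overfitting.
gap_no_bbe = MAE_VAL_NO_BBE - MAE_TRAIN_NO_BBE
gap_bbe = MAE_VAL_BBE - MAE_TRAIN_BBE

print(f"training MAE reduction:   {train_abs:.6f} ({100 * train_rel:.2f}%)")
print(f"validation MAE reduction: {val_abs:.6f} ({100 * val_rel:.2f}%)")
print(f"gap without BBE: {gap_no_bbe:.6f}, gap with BBE: {gap_bbe:.6f}")
```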
The relationship between true AOD values and predicted AOD values across 300 epochs is shown for both −BBE and +BBE (Figure 4). The r² values for −BBE and +BBE are 0.9904 and 0.993, respectively. In addition, the slope for +BBE is 0.989, which is closer to 1 than the slope for −BBE, 0.981. This suggests that the addition of BBE data improved AOD prediction accuracy.
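One common way to obtain a slope and an r² value for a plot of predicted against true AOD is an ordinary least-squares line fit. The sketch below assumes that approach, with r² taken as the squared Pearson correlation; the text does not state how the reported values were computed, and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys against xs; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy ** 2) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


# Invented true and predicted AOD values, not data from this study.
true_aod = [0.02, 0.04, 0.06, 0.08, 0.10]
pred_aod = [0.021, 0.039, 0.061, 0.079, 0.099]
slope, intercept, r2 = fit_line(true_aod, pred_aod)
print(f"slope={slope:.3f}, intercept={intercept:.4f}, r2={r2:.4f}")
```

A slope near 1 together with an r² near 1 indicates that predictions track the true values closely and without systematic scaling.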
When comparing the average of the true and predicted AODs from 2007 to 2015, our results are consistent with other studies that indicate that BBEs cause an increase in AOD. Studies have attributed this effect of increased AODs to long-range wind transport. Long-range wind transport of emissions from biomass burning events and industrial anthropogenic burning events has been identified as a major cause of the warming of the Arctic due to the transport of pollutants and increased emissions [15].
This study presents a DNN algorithm to accurately predict AOD values from only a limited number of meteorological and geographical parameters. This exact model, however, is not applicable to other regions. Since the primary goal of the paper was to demonstrate the applicability of the deep neural network method, the authors focused only on the described region. Accurate AOD levels improve atmospheric models, allowing for better simulations and predictions, which are especially important when addressing Arctic amplification. In addition, AOD retrieval is particularly challenging in the Arctic due to the high reflectance of snow and issues with large solar zenith angles [16]. Therefore, remote sensing is scarcely used in the Arctic region, and most satellite aerosol products do not have sufficient coverage in the Arctic [17]. Instead, existing aerosol retrieval algorithms for passive remote sensing focus mainly on snow-free and cloud-free regions. Due to this, there is much uncertainty regarding the influence of aerosols and aerosol activity in the Arctic [4]. Therefore, it is especially important both to clarify existing data in the Arctic and to use the prediction ability of DNNs to expand and further extrapolate data.
This project focused only on data from Svalbard since its primary purpose was to examine the potential of using DNN models for predicting AOD values. In the future, the DNN model for AOD estimation or retrieval may be developed for other locations which also lack significant satellite coverage. In urban areas, for example, AOD retrieval is particularly important in the context of air quality and human health [18]. Additionally, future work may consider using data sources from other periods of the year. This project used data from June to August, as these are "solar" days; the majority of AOD data in the Arctic comes from these months. To consider data from winter, star photometer data is required. Though available, this data is scarce. Because DNNs perform better with large datasets, and validation with field-collected data may not be possible, models built using these inputs may not be as reliable.
In addition to BBEs, other urban and industrial emission sources may be investigated. In particular, the burning activity of industrial complexes has been found to transport aerosol pollutants in Eurasia [19]. The DNN model can fill in missing data and investigate the role of aerosols in these regions.

Summary
This project used a deep neural network to predict AOD values from easy-to-obtain meteorological parameters. The network had high accuracy, predicting values that had a mean difference of 0.00401 from the RTM values. The addition of biomass burning event data improved AOD prediction by 42.167%. We have shown for the first time that AODs can be accurately and precisely determined using a machine learning algorithm and minimal data without the use of expensive instrumentation. This should allow for the immediate improvement of current meteorological models, which is especially important considering the current state of global warming.