Retrieval of Horizontal Visibility Using MODIS Data: A Deep Learning Approach

Horizontal visibility (HVIS) is a primary index used for assessing air quality. Although satellite images provide information regarding atmospheric aerosols, atmospheric visibility is not directly measured. In this paper, a deep learning approach is proposed to retrieve HVIS using moderate-resolution imaging spectroradiometer (MODIS) aerosol optical depth (AOD) data, the European Centre for Medium-Range Weather Forecasts reanalysis dataset, and ground-based visibility observations. The deep neural network model comprises a multi-layer unsupervised restricted Boltzmann machine (RBM) and a layer for supervised learning. The dropout mechanism was used in the training process to overcome the errors caused by over-fitting. The results demonstrate that the correlation coefficient values between HVIS observations and retrievals during training, pre-validating, and evaluation were 0.74, 0.723, and 0.697, respectively. The retrieved HVIS in Eastern China exhibited a north-to-south increasing trend, increasing and decreasing in summer and winter, respectively. In conclusion, the proposed model presents an effective and more reliable method for HVIS retrieval. However, the small samples, low AOD, low albedo, high total column water, high longitude, and the low vertical wind component at 10 m likely cause HVIS bias.


Introduction
Atmospheric visibility, a primary index used for describing air quality, commonly refers to horizontal visibility (HVIS) [1,2]. Low HVIS can result in human illness and diseases as well as transportation risks [3][4][5]. The degradation of HVIS is caused by the scattering and absorption of light by air molecules, hydrometeors (rain, snow, fog, and clouds), and aerosols. HVIS is affected by various natural and anthropogenic factors [6][7][8]. The contamination of air by anthropogenic aerosol particulates is the main factor causing low HVIS [6,7]. Generally, highly accurate atmospheric HVIS observations can be obtained from ground-based observations. However, the sparse and uneven spatial distribution of ground-based observations limits data availability. Satellite-derived aerosol optical depth (AOD) remote sensing data with wide spatial coverage and fine spatial resolution, e.g., the moderate-resolution imaging spectroradiometer (MODIS) Collection 6 Level 2 product with a spatial resolution of 3 km, can provide useful information for air quality [9][10][11].
Satellite-derived AOD is defined as the integration of light extinction in the entire atmospheric column. Satellite sensors have exhibited highly accurate retrieval of AOD [12,13]. Operational AOD products have been widely used in various aerosol studies [9,14,15]; however, few studies have spatial correlations of HVIS in the simulation process. The experimental results demonstrated that the proposed methods exhibit a more accurate performance than existing methods for determining HVIS. The remainder of this paper is structured as follows: Section 2 provides a brief introduction of data and describes the proposed method in detail. The results are discussed in Section 3. The conclusions are summarized in Section 4.

MODIS AOD Product
The MODIS operational AOD data have been widely used to investigate the spatiotemporal distribution of aerosols and to evaluate air quality [12,43]. The MODIS Collection 6 (C6) AOD Level 2 product with a spatial resolution of 3 km uses the dark target algorithm. The AOD over land is derived from observed top-of-atmosphere reflectance at 0.47, 0.66, and 2.12 µm bands [10,11]. Remer et al. [11] compared the MODIS AOD 3 km product with aerosol robotic network (AERONET) measurements and showed that the MODIS AOD retrievals fall within one standard deviation of the predicted uncertainty of ∆AOD (equal to ±0.05 ± 0.20 AOD over land). Xie et al. [44] compared 3 km MODIS AOD products from Aqua with AERONET at three sites located in Beijing, China. The results showed that MODIS AOD is highly consistent with AERONET measurements, with a Pearson correlation coefficient of 0.93 and the average difference in AOD was 0.29 at the three sites. In this study, the MODIS C6 AOD Level 2 Aqua product with a spatial resolution of 3 × 3 km was applied for the retrieval of the HVIS.

European Centre for Medium-Range Weather Forecasts ERA-Interim Data
The reanalysis data used in this study were obtained from the European Centre for Medium-Range Weather Forecasts Re-Analysis Interim (ERA-Interim) dataset. ERA-Interim is the latest global atmospheric reanalysis data produced. The ERA-Interim project was conducted in part to prepare for a new atmospheric reanalysis to replace ERA-40, which extends back to the early part of the 20th century [45]. In this study, data at 06:00 UTC (14:00 LST) were selected to match the equatorial crossing time at 13:30 LST of Aqua. The ERA-Interim reanalysis data of albedo (AL), dewpoint temperature at 2 m (D2m), surface pressure (SP), air temperature at 2 m (T2m), total column ozone (TCO3), total column water (TCW), vertical wind component at 10 m (V10), and BLH with a 0.125 • × 0.125 • spatial resolution confined to the area 23 • -35 • N and 113 • -123 • E were used.

HVIS Data
HVIS data from 168 ground-based stations ( Figure 1) were obtained from the China Meteorological Administration (CMA). According to the forward scattering principle of light, the forward scattering visibility meter emits infrared pulses to measure the forward scattering intensity of suspended particles in the air. Based on Rayleigh scattering theory, the extinction coefficient is calculated by scattering data, and then the HVIS is calculated by the equation HVIS = 3.912/σ s (aerosol extinction coefficients of 550 nm at surface) [16]. The accuracy of HVIS is ±10% in the range of 10 to 10,000 m and ±20% in the range of 10,000 to 35,000 m. This dataset, with a 1 h temporal resolution, covered the period from 1 January 2014 to 31 December 2017. To match MODIS AOD data, the time difference was limited to ±1 h.

Methodology
In this study, a typical DL method was designed for retrieving HVIS from MODIS AOD data. As shown in Figure 2, the HVIS_DBN model is a multi-layered probabilistic, generative model [28,29]. It is a semi-supervised learning method that combines the advantages of unsupervised and supervised learning methods. The model has a strong ability to classify and predict high dimensional feature vectors. The HVIS_DBN model comprises a multi-layer unsupervised RBM and a layer for supervised learning. The training process of the model includes three main stages: Pre-processing, pre-training, and fine-tuning. It uses a layer-by-layer unsupervised learning for pre-training the initial weights of the networks and global supervised learning for fine-tuning.

Methodology
In this study, a typical DL method was designed for retrieving HVIS from MODIS AOD data. As shown in Figure 2, the HVIS_DBN model is a multi-layered probabilistic, generative model [28,29]. It is a semi-supervised learning method that combines the advantages of unsupervised and supervised learning methods. The model has a strong ability to classify and predict high dimensional feature vectors. The HVIS_DBN model comprises a multi-layer unsupervised RBM and a layer for supervised learning. The training process of the model includes three main stages: Pre-processing, pre-training, and fine-tuning. It uses a layer-by-layer unsupervised learning for pre-training the initial weights of the networks and global supervised learning for fine-tuning.

Pre-Processing
In the pre-processing stage, the input variables and parameters of HVIS_DBN were first initialized using zero-mean method. The input variables included HVIS (m), AOD, AL, RH (%), SP (Pa), TCO3 (kg m -2 ), TCW (kg m -2 ), V10 (m s -1 ), BLH (m), altitude (m), longitude (°), and latitude (°): where V , is the standardized input variable, V is the original input variable, μ is the average of V , and σ is the variance of V .
Since the ECMWF ERA-Interim dataset did not include the variable RH, we used the following formula was used to calculate RH [46]: where D2m is the dewpoint temperature at 2 m and T2m is the air temperature at 2 m.

Pre-Training
After pre-processing, the input variables and parameters were trained using the layer-by-layer network training method. For each training cycle, the unsupervised learning method was used for training every layer of RBM. The weight and bias values of each layer can be obtained by pre-training. The RBM is an energy-based model, consisting of a visible layer (V) and a hidden layer (H). No neuronal connections were observed within the same inner layer; however, between layers, neurons were fully connected.
In a binary RBM, a joint configuration (υ, h) of visible and hidden units has the following energy: where θ = (W , a , b ) represents the parameter of RBM; ν and h are the binary states of visible unit i and hidden units j, respectively; a and b are the biases of visible unit i and hidden units j, respectively; W is the connecting weight between visible unit i and hidden units j; and m and n are the numbers of visible and hidden units, respectively. The purpose of the training algorithm is to obtain θ, which determines the performance of the RBM network. The lower the energy, the better the state of the network.

Pre-Processing
In the pre-processing stage, the input variables and parameters of HVIS_DBN were first initialized using zero-mean method. The input variables included HVIS (m), AOD, AL, RH (%), SP (Pa), where V std,i is the standardized input variable, V i is the original input variable, µ i is the average of V i , and σ i is the variance of V i . Since the ECMWF ERA-Interim dataset did not include the variable RH, we used the following formula was used to calculate RH [46]: where D2m is the dewpoint temperature at 2 m and T2m is the air temperature at 2 m.

Pre-Training
After pre-processing, the input variables and parameters were trained using the layer-by-layer network training method. For each training cycle, the unsupervised learning method was used for training every layer of RBM. The weight and bias values of each layer can be obtained by pre-training. The RBM is an energy-based model, consisting of a visible layer (V) and a hidden layer (H). No neuronal connections were observed within the same inner layer; however, between layers, neurons were fully connected.
In a binary RBM, a joint configuration (υ, h) of visible and hidden units has the following energy: where θ = W ij , a i , b j represents the parameter of RBM; ν i and h j are the binary states of visible unit i and hidden units j, respectively; a i and b j are the biases of visible unit i and hidden units j, respectively; W ij is the connecting weight between visible unit i and hidden units j; and m and n are the numbers of visible and hidden units, respectively. The purpose of the training algorithm is to obtain θ, which determines the performance of the RBM network. The lower the energy, the better the state of the network. Based on the energy function, the joint probability of (υ, h) can be written as where Z is a partition function. It is used for normalizing: The marginal probability of the joint probability that the network distributed to υ, can be defined as follows: The gradient or derivative of the log probability of training vectors with respect to W ij , a i , and b j can be derived as follows, respectively: where . data represents the expectation of the probability defined by the dataset and . model is the expection on the probability defined by the model. The learning rule for stochastic steepest ascent in the log probability of the training data can be expressed as ∆b j = η h j data − h j model (12) where η is the learning rate. Because the units in a single hidden layer are unrelated, the conditional distribution P(h|v) can be calculated as follows: Due to the units in a single visible layer being unrelated, the conditional distributions P( v|h ) can be calculated as θ can be calculated by the maximum-likelihood estimation of the training set [29]. However, the maximum-likelihood learning is infeasible in a large RBM because it is expensive to compute the derivative of the log probability of training data. Expected outcomes obtained using the model are difficult to achieve. A highly satisfactory stochastic approximation, known as the contrastive divergence (CD) algorithm, makes θ suitable as building blocks for learning DBN [28,29]. This algorithm uses Gibbs sampling, which alternates between stochastically updating the hidden and visible units. Even when only one iteration of Gibbs sampling is used, the CD algorithm provides satisfactory results [29]. The weight and bias are updated according to the following equations: (17) where . data represents the expectation on the probability defined by dataset, and . recon is the expectation of the probability defined by the reconstructed model.

Fine-Tuning
Fine-tuning is a supervised training process with labeled data. To improve network performance, the gradient descent algorithm is used for fine-tuning parameters. The back propagation (BP) algorithm, which was used to adjust and optimize weight parameters extracted in the pre-training stage, was used to fine-tune the entire network's parameters in a top-down fashion. The weight of each layer was pretrained by the RBM before fine-tuning. However, it was not in random initialization because BP neural networks avoid local convergence. Through multiple-iteration forward and back propagation, the weights between neurons were modified. When the error between the actual value and the output value meets the requirement, the training stopped. Finally, the HVIS were retrieved using Equation (18): where V HVIS_retrieved is the retrieved HVIS, V std,HVIS is the standardized output, µ HVIS is the average of the HVIS training set, and σ i is the variance of the HVIS training set.

Model Training and Pre-Validation
We selected data for training and pre-validation of the HVIS_DBN model from 1 January 2014 to 31 December 2016 to evaluate the effectiveness and performance of the model. In the three years of the study, the matchup samples totaled 31,377 pairs of input data and HVIS. In our experiment, we randomly selected 80% of the data as the training set and 20% as the validation set [47]. The input variables of HVIS_DBN were standardized according to Equation (1). To overcome potential HVIS_DBN model over-fitting, dropout techniques were applied to train the model. The model with dropout exhibited a higher predictive accuracy than without dropout, especially in the small dataset, where dropout-DBN produced the best performance [29,32].
The dropout technique is a random retreat mechanism used to overcome the data problem of over-fitting [32]. The basic idea of the dropout technique is to randomly ignore neurons of the hidden layer in the training process to prevent over-fitting. In the pre-training process of the HVIS_DBN model, some of the random sections of nodes were not involved in the forward propagation training process and the weight was reserved during each iteration process. These neurons may be involved in training in the next iteration. The dropout technique improves the generalization ability and effectively overcomes the time-consuming problem of network training, thus preventing interdependence among features and distinctly improving precision. The mean absolute error (MAE), RMSE, and R were used to evaluate model prediction accuracy.  (Figure 3a), the R was 0.74 and RMSE was 4.725 km. For the pre-validation period (Figure 3b), the R value only decreased by 0.017 and RMSE only increased by 0.173 km. Basically, the predicted HVIS values were slightly overestimated relative to observations below 20 km altitude, however HVIS values were underpredicted above 20 km. This was consistent for both model training and pre-validation periods.

Model Evaluation
The goal of the trained HVIS_DBN model is to be widely used for retrieving HVIS over Eastern China. To provide a more objective evaluation of the HVIS_DBN model, the data from 2017 (different from the training data) were applied. For evaluation, we matched HVIS from ground-based observations with retrieval results from the HVIS_DBN model. The number of matchup samples in 2017 was 8717. As shown in Figure 4, the relationship between observed and retrieved HVIS was approximately linear with an R of 0.697 and RMSE of 4.996 km. Compared with the training and prevalidating results, the R of retrieved results only decreased by 0.043 and 0.026, respectively. The RMSE only increased by 0.271 km and 0.098 km, respectively.

Model Evaluation
The goal of the trained HVIS_DBN model is to be widely used for retrieving HVIS over Eastern China. To provide a more objective evaluation of the HVIS_DBN model, the data from 2017 (different from the training data) were applied. For evaluation, we matched HVIS from ground-based observations with retrieval results from the HVIS_DBN model. The number of matchup samples in 2017 was 8717. As shown in Figure 4, the relationship between observed and retrieved HVIS was approximately linear with an R of 0.697 and RMSE of 4.996 km. Compared with the training and pre-validating results, the R of retrieved results only decreased by 0.043 and 0.026, respectively. The RMSE only increased by 0.271 km and 0.098 km, respectively. Figure 5 provides the time series of daily averaged HVIS between ground-based observations and retrievals from the HVIS_DBN model in 2017. The R and RMSE values of the daily mean HVIS were better than those of the separated matchup samples. The R was 0.77 with an RMSE of 3.08 km. According to statistical analysis, approximately 83.4% of the daily mean HVIS samples had a MAE of <4 km. The percentage of the samples with a MAE of <2 km was approximately 59.2%. The average number of matchup samples was approximately 31.6 when the MAE was less than 4 km. However, when the MAE was >4 km, the average number of matchup samples decreases to approximately 8.5. One of the reasons for the lower precision of HVIS retrieved by the HVIS_DBN model is an inadequate number of samples. from the training data) were applied. For evaluation, we matched HVIS from ground-based observations with retrieval results from the HVIS_DBN model. The number of matchup samples in 2017 was 8717. As shown in Figure 4, the relationship between observed and retrieved HVIS was approximately linear with an R of 0.697 and RMSE of 4.996 km. Compared with the training and prevalidating results, the R of retrieved results only decreased by 0.043 and 0.026, respectively. The RMSE only increased by 0.271 km and 0.098 km, respectively.      For all the matchup samples, the value of AOD ranged from 0 to 3.7. To analyze the precision of retrievals, the AOD was graded in 0.1 intervals. As shown in Figure 6a, the value of RMSE ranged from 0.641 to 11.087 km. The number of samples indicated significant variations with different AOD values. When the number of samples exceeded 20, the RMSE, ranging from 2.9 to 5.5 km, decreased with increasing values of AOD. The linear regression between AOD and RMSE produced correlation coefficient of −0.87 (significant at the 0.01 level(two-tailed)). This indicated that lower AOD values may have higher biases when retrieving HVIS using the HVIS_DBN model. Figure 6b-e illustrates the variations in RMSE with AL, TCW, longitude, and V10. The graded intervals were 0.004, 2 kg m −2 , 0.25, and 0.25 m s −1 , respectively. The RMSE ranged from 2.403 to 7.536 km for AL, from 3.224 to 9.782 km for TCW, from 2.859 to 6.824 km for longitude, and from 0.069 to 8.438 km for V10. When the number of samples exceeded 20, the RMSE between observed HVIS and predicted HVIS showed a gradually decreasing trend with increasing AL (Figure 6b). The values of RMSE increased when TCW increased (Figure 6c). The values of RMSE increased slightly when longitude increased (Figure 6d); the values of RMSE decreased slightly when V10 increased (Figure 6e). The R values of linear fitting were −0.58, 0.53, 0.41, and −0.34, respectively. The correlation coefficient between RMSE and AL, and the correlation coefficient between RMSE and TCW, all passed the t-test at a 0.01 level of significance (two-tailed). The correlation coefficient between RMSE and longitude, and the correlation coefficient between RMSE and V10, were significant at the 0.05 level (two-tailed). According to the analysis, the lower precision of HVIS retrievals by the HVIS_DBN model were probably due to lower AL, higher TCW, higher longitude, and lower V10.  For all the matchup samples, the value of AOD ranged from 0 to 3.7. To analyze the precision of retrievals, the AOD was graded in 0.1 intervals. As shown in Figure 6a, the value of RMSE ranged from 0.641 to 11.087 km. The number of samples indicated significant variations with different AOD values. When the number of samples exceeded 20, the RMSE, ranging from 2.9 to 5.5 km, decreased with increasing values of AOD. The linear regression between AOD and RMSE produced correlation   Figure 7 reveals the high consistency between predicted and observed HVIS both for spatial and seasonal variations. The HVIS_DBN model effectively improves HVIS retrievals. Compared with ground-based observations, it performed well when analyzing the spatial distributions of HVIS, revealing that HVIS values followed a north-to-south increasing trend. The lower values of HVIS were mainly located in Jiangsu Province. The spatial distribution of HVIS also showed that the HVIS over water bodies (such as Taihu Lake and Yangtze River) was lower than other land covers. The HVIS exhibited a strong seasonal variation, which increased in summer and decreased in winter. km for AL, from 3.224 to 9.782 km for TCW, from 2.859 to 6.824 km for longitude, and from 0.069 to 8.438 km for V10. When the number of samples exceeded 20, the RMSE between observed HVIS and predicted HVIS showed a gradually decreasing trend with increasing AL (Figure 6b). The values of RMSE increased when TCW increased (Figure 6c). The values of RMSE increased slightly when longitude increased (Figure 6d); the values of RMSE decreased slightly when V10 increased ( Figure  6e). The R values of linear fitting were -0.58, 0.53, 0.41, and -0.34, respectively. The correlation coefficient between RMSE and AL, and the correlation coefficient between RMSE and TCW, all passed the t-test at a 0.01 level of significance (two-tailed). The correlation coefficient between RMSE and longitude, and the correlation coefficient between RMSE and V10, were significant at the 0.05 level (two-tailed). According to the analysis, the lower precision of HVIS retrievals by the    Figure 8a. In total, 96 matchup samples were used for the analysis. According to the statistical result, the R of all the samples was 0.87 and the RMSE was 3.978 km. There were 59 samples with a MAE less than 3 km and 18 samples with a MAE larger than 5 km. The values of MAE in the regions with lower AOD (such as Fujian) were always higher than 5 km (Figure 8b). This finding is consistent with the analysis in Figure 6a. Figure 8c displays the distributions of retrieved HVIS over Eastern China on 1 November 2017. The distribution revealed that the regions with lower HVIS always had lower MAE. The findings indicate that the HVIS_DBN model is more reliable for retrieving low HVIS.  Figure 7 reveals the high consistency between predicted and observed HVIS both for spatial and seasonal variations. The HVIS_DBN model effectively improves HVIS retrievals. Compared with ground-based observations, it performed well when analyzing the spatial distributions of HVIS, revealing that HVIS values followed a north-to-south increasing trend. The lower values of HVIS were mainly located in Jiangsu Province. The spatial distribution of HVIS also showed that the HVIS over water bodies (such as Taihu Lake and Yangtze River) was lower than other land covers. The HVIS exhibited a strong seasonal variation, which increased in summer and decreased in winter.

Conclusions
In this study, a DL method for retrieving HVIS was presented. This method is primarily based on a HVIS_DBN model with data from the MODIS AOD product, ECMWF reanalysis data, and the HVIS observations of CMA. The HVIS_DBN model comprises a multi-layer unsupervised RBM and layer for supervised learning. The training of the HVIS_DBN model involved three main steps: Pre-processing, pre-training, and fine tuning. In the pre-processing stage, the input variables and parameters of HVIS_DBN were first initialized using zero-mean method. In the pre-training step, the input variables and parameters were first initialized. The BP algorithm, which is used to adjust and optimize weight parameters extracted in the pre-training stage, was used for fine-tuning the entire network's parameters in a top-down manner. The dropout mechanism introduced in the HVIS_DBN training process was used to overcome the over-fitting problem.
We used data from three years (2014-2016) for training and pre-validating the HVIS_DBN model. The data from 2017 were applied for model evaluations. The results demonstrated that the R values between observations and retrievals were 0.74 for training, 0.723 for pre-validating, and 0.697 for evaluations. The values of RMSE were all less than 5 km. The time series result of daily averaged HVIS showed high consistency between the ground-based observations and retrievals, with R of 0.77 and RMSE of 3.08 km; a small number of samples resulted in low precision. The precision analysis revealed that the bias of HVIS retrievals by the HVIS_DBN model was probably caused by lower AOD, lower albedo, higher TCW, higher longitude, and lower V10. The spatial distribution of HVIS followed a north-to-south increasing trend, showing that the HVIS over water bodies (such as Taihu Lake and Yangtze River) is lower than over other land cover types. The HVIS exhibited a strong seasonal variation, which increased in summer and decreased in winter. The regions with lower HVIS invariably exhibited lower MAE, which indicates that the HVIS_DBN model is more reliable in retrieving low HVIS.
Overall, the HVIS_DBN model provides an effective method for retrieving HVIS. The evaluations also exhibited a higher performance than ground-based HVIS. In future studies, the model should be improved to adapt to different ranges of input data (including higher AOD, higher albedo, lower TCW, lower longitude, and higher V10), and a larger number of training samples should be used, particularly for the samples with lower HVIS to improve the accuracy of the HVIS_DBN model. The retrieved HVIS can be further applied for PM 2.5 estimation by introducing humidity correction for hygroscopic growth. This work would have certain application in atmospheric environmental monitoring and air quality forecasts over Eastern China.