Application of Machine-Learning-Based Fusion Model in Visibility Forecast: A Case Study of Shanghai, China

: A visibility forecast model called a boosting-based fusion model (BFM) was established in this study. The model uses a fusion machine learning model based on multisource data, including air pollutants, meteorological observations, moderate resolution imaging spectroradiometer (MODIS) aerosol optical depth (AOD) data, and an operational regional atmospheric environmental modeling System for eastern China (RAEMS) outputs. Extreme gradient boosting (XGBoost), a light gradient boosting machine (LightGBM), and a numerical prediction method, i.e., RAEMS were fused to establish this prediction model. Three sets of prediction models, that is, BFM, LightGBM based on multisource data (LGBM), and RAEMS, were used to conduct visibility prediction tasks. The training set was from 1 January 2015 to 31 December 2018 and used several data pre-processing methods, including a synthetic minority over-sampling technique (SMOTE) data resampling, a loss function adjustment, and a 10-fold cross veriﬁcation. Moreover, apart from the basic features (variables), more spatial and temporal gradient features were considered. The testing set was from 1 January to 31 December 2019 and was adopted to validate the feasibility of the BFM, LGBM, and RAEMS. Statistical indicators conﬁrmed that the machine learning methods improved the RAEMS forecast signiﬁcantly and consistently. The root mean square error and correlation coefﬁcient of BFM for the next 24/48 h were 5.01/5.47 km and 0.80/0.77, respectively, which were much higher than those of RAEMS. The statistics and binary score analysis for different areas in Shanghai also proved the reliability and accuracy of using BFM, particularly in low-visibility forecasting. Overall, BFM is a suitable tool for predicting the visibility. It provides a more accurate visibility forecast for the next 24 and 48 h in Shanghai than LGBM and RAEMS. The results of this study provide support for real-time operational visibility forecasts. currently runs RAEMS based on WRF-Chem operational modeling to forecast daily environmental atmospheric pollutants and visibility [8]. However, large uncertainties still exist in numerical model predictions. The aim of this study is to build a new fusion model based on multisource data to improve visibility forecast. Single classiﬁcation and regression methods using historical observed data do not meet the prediction accuracy requirements for local visibility forecasts. In this study, a new model for visibility prediction was established using a machine-learning-based model fusion based on operational regional atmospheric environmental modelling system for eastern China (RAEMS) outputs, air pollutants, meteorological observations, and MODIS AOD data. Machine learning and numerical prediction methods were combined to build a new model. The new model uses two machine learning algorithms (XGBoost and LightGBM), and several special data processes are applied. We named this new model the boosting-based fusion model (BFM). The performance of this BFM prediction was compared with the predictions of the LightGBM model based on RAEMS and RAEMS itself. than LGBM and RAEMS in Shanghai. The results of this study provide support for real-time operational


Introduction
Visibility indicates the transparency of the atmosphere and is closely related to the daily life of the public. Low visibility, including precipitation, fog, haze, dust, and smoke, is a meteorological disaster that affects all forms of transport, particularly within the fields of land, aviation, and shipping, and causes casualties and property losses [1][2][3][4]. As a result, the study of atmospheric visibility has been a significant public concern. Visibility forecasts are complicated and dominated by meteorological conditions, such as particulate matter, relative humidity, wind speed, and wind direction [5,6]. At present, several approaches are commonly used to predict atmospheric visibility, including meteorological numerical forecasting and statistical forecasting based on machine learning. Numerical forecasts 2 of 17 based on meteorology and atmospheric chemistry, such as weather research and forecasting models coupled with chemistry (WRF-Chem) [7,8], the community multiscale air quality (CMAQ) model [9,10], and the CMA Unified Atmosphere Chemistry Environment (CUACE) [11] were developed to conduct real-time operational visibility and atmospheric pollutant forecasts. In fact, to determine the uncertainties of numerical model forecast, detailed regional emissions and suitable physical and chemical schemes are both required based on a deep understanding of physical and chemical mechanisms [12]. It is difficult to accurately quantify every atmospheric process theoretically because of the complexity of the atmosphere, which leads to large errors and uncertainties during prediction [13][14][15].
The second approach to visibility prediction is statistical forecasting based on machine learning methods, which is a branch of artificial intelligence [16,17], studies on the mechanism of human cognition, and the establishment of various learning models with the support of computer systems. Comparing with traditional statistical methods, a non-linear regression prediction based on machine learning forecast method performs better to some extent because machine learning methods do not require a time-consuming model selection for each different cell and reduce large forecasting errors [18,19]. The availability of large datasets also improves the forecasting performance [20]. Machine learning algorithms have been applied to environmental and meteorological forecasting and research in recent years [21][22][23][24][25][26][27]. Such algorithms detect and predict meteorological phenomena, including poor visibility events [28]. Several trials using machine learning to predict the visibility have been conducted by researchers and forecasters, such as artificial networks [29], tree-based methods [30], and multiple linear regression [31]. To date, previous studies on forecasting visibility have often used a single machine learning algorithm and meteorological historical data to determine the relationship between visibility and other observations. However, many other factors that contribute to local visibility forecast have not been considered.
Machine learning model fusion is an effective method of constructing multiple base classifiers (using multiple machine learning algorithms) and then combining them to complete a learning task and solve a particular computational problem [32,33]. Specifically, when the error rates of each base classifier are independent of each other, the error rate of the model fusion decreases exponentially to zero. In fact, each classifier is the result of solving the same problem, and the error rate has difficulty being independent. For the same problem, the more accurate the base classifier, the more similar it will be. As a result, classifiers that are too strong will significantly affect the following results and may even make it impossible to apply the following classifiers. Logistic regression [34] and support vector machines [35] are typical base classifiers. Machine-learning-based model fusion shows a better accuracy than other single machine learning algorithms, but with more complexity and lower efficiency. According to the relationship between learners, machine-learning-based fusion models can be divided into two categories: a serialization method with strong dependence among learners, as represented by boosting [36], and a parallel method with independent learners, as represented by bagging [37]. Boosting algorithms include AdaBoost [38], the gradient boosting decision tree (GBDT) method [39], XGBoost [40], LightGBM [41], CatBoost [42], and random forest [43]. Unlike the bagging algorithm, the boosting algorithm tries to add new classifiers where previous classifiers have failed. Boosting also determines the weights for the data and reduces the model bias [44].
Although model fusion has been used in several fields of meteorology [18,45,46], it has not been widely used to predict the visibility in China. Shanghai is in the east of China and has a huge population. The visibility trends of Shanghai, along with important meteorological and environmental factors, were investigated. Since 2000, the percentage of bad visibility days (visibility < 5 km) fluctuated, peeking in 2003 and 2015, and the number of good visibility days (visibility >15 km) has declined significantly since 2012 [6]. It is important to conduct accurate visibility forecasting in Shanghai to support disaster prevention and mitigation of this megacity. The Shanghai Meteorological Service (SMS) Remote Sens. 2021, 13, 2096 3 of 17 currently runs RAEMS based on WRF-Chem operational modeling to forecast daily environmental atmospheric pollutants and visibility [8]. However, large uncertainties still exist in numerical model predictions.
The aim of this study is to build a new fusion model based on multisource data to improve visibility forecast. Single classification and regression methods using historical observed data do not meet the prediction accuracy requirements for local visibility forecasts. In this study, a new model for visibility prediction was established using a machinelearning-based model fusion based on operational regional atmospheric environmental modelling system for eastern China (RAEMS) outputs, air pollutants, meteorological observations, and MODIS AOD data. Machine learning and numerical prediction methods were combined to build a new model. The new model uses two machine learning algorithms (XGBoost and LightGBM), and several special data processes are applied. We named this new model the boosting-based fusion model (BFM). The performance of this BFM prediction was compared with the predictions of the LightGBM model based on RAEMS and RAEMS itself.

Data Introduction
In this study, meteorological observational data and RAEMS modeling forecast data were collected from the Shanghai Meteorological Service. Observational pollutant data were obtained from the Shanghai Municipal Bureau of Ecology and Environment. The data covered 5 years from 1 January 2015 to 31 December 2019, with a time granularity of 1 h. Eleven national synoptic stations provided observed meteorological variables ( Figure 1). The meteorological data included surface visibility, temperature, pressure, relative humidity, precipitation, wind speed, wind direction, and radiation-related factors. The national environmental stations provided six basic observational pollutant variables, i.e., PM 2.5 , PM 10 , O 3 , NO x , CO, and SO 2 . In addition, RAEMS provided both surface and high-level meteorological and chemical forecast variables. The RAEMS surface forecast data included meteorological variables and the six air pollutants mentioned above. The RAEMS high-level forecast meteorological variables were collected at 1000, 925, 850, 700, and 500 hPa. The details of RAEMS were introduced in Section 2.2.
ens. 2021, 13, x FOR PEER REVIEW 4 of 18 meteorological conditions. The initial chemical condition was the previous 24 h operational forecast results. The MOZART monthly global simulation data were used as the gaseous chemical boundary condition. For biogenic emissions, MEGAN2 online data were used. Moreover, this forecast system used several physical and chemical parameterization schemes, which showed a good performance over eastern China [8]. The WSM 6-class microphysics scheme, the Monlin_Obukhov surface layer scheme, the Unified Noah land surface scheme, the YSU boundary layer scheme, the Dudhia short-wave radiation scheme and the RRTM long-wave radiation scheme, were used in model meteorological parameterization. The gas-phase chemistry scheme, the inorganic aerosol chemistry scheme, and the organic aerosol chemistry scheme used RADM2, ISORROPIA II, and SORGAM, respectively. Apart from these model details, the Emission Inventory for China (MEIC) at a 0.25° resolution, which was developed by Tsinghua University, was used as the emission data for WRF-Chem. Specifically, according to Shanghai local emission monitoring, emissions were hourly-distributed with the diurnal profile provided by the Shanghai Academy of Environmental Science.

Machine-Learning-Based Model Fusion
In this study, machine-learning-based model fusion was conducted using two models, XGBoost and LightGBM. These two boosting models can convert weak base classifiers into strong classifiers. First, an original dataset is trained to obtain a weak classifier. Then, the distribution of the new dataset is adjusted based on the performance of this weak classifier, allowing the incorrect training samples to receive more attention during the follow-up training process. Third, the adjusted sample data distribution is used for the next round of training to obtain the next weak classifier. We obtained several base weak classifiers after training for several rounds and combined these classifiers to build the final classifier. Single weak classifiers may not perform well, although the final Moreover, because aerosol optical depth characterizes atmospheric turbidity, which is highly related to visibility, MODIS AOD data from National Aeronautics and Space Administration (NASA) were involved in this study [47]. The MODIS AOD data, MCD19A2, was a multi-angle implementation of atmospheric correction (MAIAC) algorithm-based gridded aerosol optical depth product. This product was generated by using the Ross-Thick Li-Sparse (RTLS) bi-directional reflectance distribution function (BRDF), spectral surface Remote Sens. 2021, 13, 2096 4 of 17 albedo, bidirectional reflectance factors (BRF), and the semi-analytical Green's function solution models [48][49][50][51]. It was derived by Terra and Aqua MODIS inputs. It provided daily AOD data with a spatial resolution of 1 km from 1 January 2015 to 31 December 2019. In this study, we used the AOD data at 0.55 micron. Based on the high-resolution data of this products, the AOD data were directly extracted at the locations of the 11 synoptic stations.

Introduction to RAEMS
The RAEMS is a regional operational forecast system based on a numerical model WRF-Chem, which started a formal operational forecast from 1 April 2013 [8]. It consists of 400 grids in the south-north and 360 grids in the west-east, with a 6 km horizontal resolution ( Figure 1). Vertically it has 35 layers up to the top pressure layer of 50 hPa. The meteorology and chemistry integrated time steps are 30 s and 60 s, respectively. The forecast length was 4 d (96 h). Global Forecast System (GFS) data from the National Centers for Environmental Prediction (NCEP) were used as the initial and boundary meteorological conditions. The initial chemical condition was the previous 24 h operational forecast results. The MOZART monthly global simulation data were used as the gaseous chemical boundary condition. For biogenic emissions, MEGAN2 online data were used. Moreover, this forecast system used several physical and chemical parameterization schemes, which showed a good performance over eastern China [8]. The WSM 6-class microphysics scheme, the Monlin_Obukhov surface layer scheme, the Unified Noah land surface scheme, the YSU boundary layer scheme, the Dudhia short-wave radiation scheme and the RRTM long-wave radiation scheme, were used in model meteorological parameterization. The gas-phase chemistry scheme, the inorganic aerosol chemistry scheme, and the organic aerosol chemistry scheme used RADM2, ISORROPIA II, and SORGAM, respectively. Apart from these model details, the Emission Inventory for China (MEIC) at a 0.25 • resolution, which was developed by Tsinghua University, was used as the emission data for WRF-Chem. Specifically, according to Shanghai local emission monitoring, emissions were hourly-distributed with the diurnal profile provided by the Shanghai Academy of Environmental Science.

Machine-Learning-Based Model Fusion
In this study, machine-learning-based model fusion was conducted using two models, XGBoost and LightGBM. These two boosting models can convert weak base classifiers into strong classifiers. First, an original dataset is trained to obtain a weak classifier. Then, the distribution of the new dataset is adjusted based on the performance of this weak classifier, allowing the incorrect training samples to receive more attention during the follow-up training process. Third, the adjusted sample data distribution is used for the next round of training to obtain the next weak classifier. We obtained several base weak classifiers after training for several rounds and combined these classifiers to build the final classifier. Single weak classifiers may not perform well, although the final classifier exhibits a good performance. In addition, XGBoost implements a general tree-boosting algorithm. It adds Lasso (L1) [52] or Ridge (L2) [53] regulation to avoid an over-fitting, uses the second derivative information of the cost function, and introduces the idea of column sampling, as compared with GBDT. XGBoost significantly improves the efficiency and generalization of the prediction model. For example, this algorithm has been applied to the estimation of PM 2.5 , which is highly correlated with visibility, and has shown a better performance than some other statistical and machine learning models [54][55][56]. LightGBM is a tree-based gradient-boosting framework. It was developed for distribution and shows a good performance in terms of both efficiency and memory consumption. This algorithm has also proven to be effective and acceptable in PM 2.5 and visibility studies [33,57]. The framework of the machine-learning-based model fusion using XGBoost and LightGBM in this study is shown in Figure 2. visibility studies [33,57]. The framework of the machine-learning-based model fusion using XGBoost and LightGBM in this study is shown in Figure 2.

Feature Extraction
The model feature extraction was divided into three sections: basic, spatial, and temporal. To predict the visibility at one station, the basic features of the model included the observed meteorological and environmental variables, as well as the daily RAEMS predicted surface and high-altitude variables, as introduced in Section 2.1. For the spatial features, because the distribution of the variables at different altitudes presents the vertical movement and convection of the atmosphere, the differences between the same variable at different altitudes were calculated as new features representing the gradient change in the atmosphere in the vertical direction. In addition, considering horizontal atmospheric movement and correlation, important features at the four nearest stations were collected to predict the visibility at one station. For the temporal features, the differences between variables at adjacent moments 24 h before the initial forecast time were new features representing the tendencies of visibility and other related variables. To represent the regulation of visibility during recent periods, the previous 24, 48, 72, and 96 h visibility and related variables were also considered.

Data Sampling
Data from January 2015 to December 2018 were used as the training set for this visibility forecast, along with 385,377 data samples for 11 national synoptic stations, among which 327 data (0.08%) were missing. Five visibility levels were used to determine the visibility distribution (Table 1). Over half of the visibility hours exceeded 10 km. Compared to this high visibility, low visibility, which was below 5 km, only accounted for 23.4% of the entire period. In fact, the public is more concerned with the accuracy of low visibility forecast because it may result in discomfort or even meteorological disasters. To

Feature Extraction
The model feature extraction was divided into three sections: basic, spatial, and temporal. To predict the visibility at one station, the basic features of the model included the observed meteorological and environmental variables, as well as the daily RAEMS predicted surface and high-altitude variables, as introduced in Section 2.1. For the spatial features, because the distribution of the variables at different altitudes presents the vertical movement and convection of the atmosphere, the differences between the same variable at different altitudes were calculated as new features representing the gradient change in the atmosphere in the vertical direction. In addition, considering horizontal atmospheric movement and correlation, important features at the four nearest stations were collected to predict the visibility at one station. For the temporal features, the differences between variables at adjacent moments 24 h before the initial forecast time were new features representing the tendencies of visibility and other related variables. To represent the regulation of visibility during recent periods, the previous 24, 48, 72, and 96 h visibility and related variables were also considered.

Data Sampling
Data from January 2015 to December 2018 were used as the training set for this visibility forecast, along with 385,377 data samples for 11 national synoptic stations, among which 327 data (0.08%) were missing. Five visibility levels were used to determine the visibility distribution (Table 1). Over half of the visibility hours exceeded 10 km. Compared to this high visibility, low visibility, which was below 5 km, only accounted for 23.4% of the entire period. In fact, the public is more concerned with the accuracy of low visibility forecast because it may result in discomfort or even meteorological disasters. To improve the accuracy of low-visibility prediction, the synthetic minority oversampling technique (SMOTE) was used in this study to adjust the training set and eliminate the influence of an imbalanced data distribution. SMOTE [58] generates virtual training data for the minority class based on linear interpolation (k-nearest neighbor). For each data sample in the minority class, one or more k-nearest neighbors are randomly selected to build a virtual dataset for training. After the oversampling process, several classification methods are applied to process the new dataset. In this study, after applying the SMOTE process, the number of data points in the training set increased from 385,377 to 1,033,085 (approximately 93,916 for each synoptic station), and the number of data points for each visibility level was equivalent. The new training set was used for the formal training and forecasting. A 10-fold cross-verification [59,60] was also applied in this study to ensure not only the randomness of the verification set but also the similarity between the verification and training set checks. According to the verification results, the model was adjusted to a suitable parameterization.

Loss Function Adjustment
The training set was pre-processed and prepared as described in Sections 2.3.1 and 2.3.2. Another important step used to conduct model fusing is to set different prediction tasks based on the loss function. The loss function is the mean square error (MSE), which is given by Equation (1): where N refers to the sample numbers, and y i and o i refer to the forecasted value and the real value of the ith sample, respectively. Considering the importance of the low visibility forecast, we adjusted the loss function and applied it to Equation (2): where j is a constant. Different values of j represent a different emphasis on low visibility. With an increase in j, the root mean square error (RMSE) of the predicted low visibility decreased, whereas the RMSE of the entire range of visibility prediction increased. Taking a 24 h prediction as the adjustment example (Figure 3), when j = 2, the RMSEs of the XGBoost prediction and the RAEMS prediction were close to each other. For low-visibility situations, the RMSE of the XGBoost prediction was significantly smaller than that of the RAEMS prediction. Therefore, to obtain the best adjustment for low-visibility prediction in this study, j = 2 was used. A binary classification used to judge whether the predicted visibility was greater than 10 km was conducted based on the loss function adjustment. This step considers the accuracy of both high-and low-visibility forecasts. If the predicted visibility was greater than 10 km, we used the high visibility prediction model (j = 0); otherwise, we used the low-visibility prediction model (j = 2). Using binary classification and model fusion, the RMSEs of the optimized XGBoost model (O-XGB) were significantly smaller than those of the other situations (  A binary classification used to judge whether the predicted visibility was greater than 10 km was conducted based on the loss function adjustment. This step considers the accuracy of both high-and low-visibility forecasts. If the predicted visibility was greater than 10 km, we used the high visibility prediction model (j = 0); otherwise, we used the low-visibility prediction model (j = 2). Using binary classification and model fusion, the RMSEs of the optimized XGBoost model (O-XGB) were significantly smaller than those of the other situations (Table 2).

Model Fusion
XGBoost and LightGBM were merged to build a new prediction model, the boostingbased fusion model (BFM). The normalized mean bias (NMB) and mean error (ME) were used as the moving weights for model merging [61]. First, we calculated the NMBs of XGBoost and LightGBM for the period prior to prediction (Equation (3)), where k denotes the two boosting models XGBoost and LightGBM, Nd is the number of days before prediction, y k,i and o i refer to the forecasted and real values of the ith sample, respectively. In this study, Nd equals to 10 days, which was reasonable after an evaluation of the number of days from 1 to 30. ME k denotes the mean bias of the modified forecast value (Equation (4)). W k denotes the prediction weight for each model (Equation (5)), where Nm is the number of models (Nm = 2). Moreover, FF is the final prediction (Equation (6)). Four tasks, that is, high-and low-visibility prediction using XGBoost and LightGBM, respectively, were conducted. The different advantages of XGBoost and LightGBM are expected to improve the performance of the final model. For the testing data from 2019, the XGBoost and LightGBM models started forecasting at 6:00 a.m. each day. Thus, the 24 and 48 h forecasts in this study referred to the forecast visibility for every hour from 6:00 a.m. on the first day to 5:00 a.m. on the second day, and from 6:00 a.m. on the second day to 5:00 a.m. on the third day.

Statistical Scores
To quantitatively evaluate the accuracy of the forecast, the mean bias (MB), mean absolute error (MAE), mean relative error (MRE), root mean square error (RMSE), correlation coefficient (CC), 25% percentile, 75% percentile, median, and mean values were calculated. The calculations of MB, MAE, MRE, RMSE, and CC are shown in Equations (7)-(11), respectively: where N is the number of samples, and y i and o i refer to the forecasted and real values of the ith sample, respectively.

A Case Study Evaluation
In this section, to study the forecast performance of these three models, the average forecasts for the next 24 and 48 h in Shanghai city using the BFM were analyzed and compared with the RAEMS and LGBM predictions (single LightGBM prediction based on RAEMS, LGBM). The case of 8 March to 13 March 2019 was applied because a significant fluctuation in visibility occurred during this period and the lowest visibility reached less than 1 km for over 7 h, which turned out to be an extremely low visibility event. Both the 24 and 48 h forecast results revealed that the LGBM tended to perform well but with less accuracy than the other models in terms of the low-visibility prediction (Figure 4). RAEMS performed quite well under low visibility, but not as well as the other models when the visibility was above 10 km. All three models reflected the variations in the observations over time. Specifically, the BFM showed the lowest RMSE within the entire range and a low visibility forecast for both 24 and 48 h periods. Because of the low accuracy of LGBM in a low-visibility forecast, MB and RMSE (OBS < 5 km) were the largest among the three models. Both LGBM and BFM had close correlation coefficients with the observations during this event. In addition, for the prediction of extremely low visibility at below 1 km during this period, none of these three models were able to correctly predict within the same range, which revealed the shortage of this forecast algorithm. Overall, the best performance in this case was achieved using the BFM (Table 3).
where N is the number of samples, and yi and oi refer to the forecasted and real values of the ith sample, respectively.

A Case Study Evaluation
In this section, to study the forecast performance of these three models, the average forecasts for the next 24 h and 48 h in Shanghai city using the BFM were analyzed and compared with the RAEMS and LGBM predictions (single LightGBM prediction based on RAEMS, LGBM). The case of 8 March to 13 March 2019 was applied because a significant fluctuation in visibility occurred during this period and the lowest visibility reached less than 1 km for over 7 h, which turned out to be an extremely low visibility event. Both the 24 and 48 h forecast results revealed that the LGBM tended to perform well but with less accuracy than the other models in terms of the low-visibility prediction (Figure 4). RAEMS performed quite well under low visibility, but not as well as the other models when the visibility was above 10 km. All three models reflected the variations in the observations over time. Specifically, the BFM showed the lowest RMSE within the entire range and a low visibility forecast for both 24 and 48 h periods. Because of the low accuracy of LGBM in a low-visibility forecast, MB and RMSE (OBS < 5 km) were the largest among the three models. Both LGBM and BFM had close correlation coefficients with the observations during this event. In addition, for the prediction of extremely low visibility at below 1 km during this period, none of these three models were able to correctly predict within the same range, which revealed the shortage of this forecast algorithm. Overall, the best performance in this case was achieved using the BFM (Table 3).

Overall City Evaluation
The city-averaged forecast results for Shanghai from the 2019 testing dataset are presented in Table 4  To analyze the error source, the RMSEs and MREs of the hourly prediction results for the next 24 h were calculated and compared ( Figure 5). From the viewpoint of the RMSE variation over time, the RMSE of BFM was less than those of RAEMS and LGBM most of the time. After the third forecast hour, the RMSE of the three models increased dramatically and then fluctuated. There were two decreasing trends, from the 5th to the 10th forecast hour (11:00 a.m. to 16 p.m.) and from the 13th to the 22nd forecast hour (19:00 p.m. to 3 a.m. the next day). Both the BFM and LGBM reduced the RMSE and MAE of RAEMS. The smallest MRE of the three models appeared at the 8th forecast hour, which was at 13:00 in the afternoon, whereas the largest MRE appeared at the 22nd forecast hour, which was at 3:00 in the morning. These typical statistical results showed differences between the daytime and nighttime forecasts. During the daytime, particularly in the afternoon, the RMSE and MRE decreased to a relatively low level, whereas during the nighttime, they increased to a higher level. The results were in accord with those of a previous study [62].  Figure 5. Variation of (a) forecast RMSE and (b) MRE over time.

Station Evaluation
To evaluate the model error over Shanghai occurring from geographic differences, forecasts at Chongming (island station), Xujiahui (urban station), and Qingpu (suburban station) were investigated. Three typical synoptic stations are shown in Figure 5. In addition, because the BFM and LGBM predictions were both based on RAEMSforecasted data, the error of RAEMS had a direct influence on the accuracy of the machine learning models. If the errors of RAEMS were large or had high randomness, XGBoost and LGBM would not recognize such errors, leading to a decrease in the accuracy of the visibility forecast. In this study, the two most dominant factors were the errors of the surface PM 2.5 and the relative humidity from RAEMS. The correlation coefficient between the two above factors and the BFM forecasted visibility error were −0.39 and −0.45, respectively.

Station Evaluation
To evaluate the model error over Shanghai occurring from geographic differences, forecasts at Chongming (island station), Xujiahui (urban station), and Qingpu (suburban station) were investigated. Three typical synoptic stations are shown in Figure 5. Chongming Station is located on Chongming Island, north of Shanghai. Xujiahui is the core urban area of Shanghai. Qingpu is an inland suburban area located west of Shanghai. Statistics of 24 h forecast using RAEMS, BFM, and LGBM were illustrated to identify the detailed differences among the different model algorithms and different areas ( Figure 6 and Table 5). Figure 6 indicates that BFM and LGBM improved the prediction to a large extent. The distribution of BFM revealed the advantages of RAEMS (25th percentile) and LGBM (median and 75th percentile). The comparison also focused on MB, RMSE, MAE, MRE, and CC between the observed and predicted visibility. The 24 and 48 h predictions of BFM and LGBM had a strong linear correlation (greater than 0.5) with the observed visibility at all three stations, which was more than 0.1 higher than that of RAEMS. All mean biases for the three stations were negative, which means that the average prediction underestimated the observation. The statistical indexes including RMSE, MAE, and MRE for the three stations showed that those of BFM were the smallest, whereas those of RAEMS were the largest. For Xujiahui Station, the predicted RMSE of RAEMS reached over 10 km, but with a significant improvement by the BFM to 4.32 km (24 h) and 5.69 km (48 h). A smaller MAE and MRE also appeared at Xujiahui and Qingpu, as compared with Chongming. Along with the fact that the observed visibility of Chongming had a larger deviation than the other inland stations, it was more difficult to achieve a similar forecast performance for the island areas than for the inland areas. tained a lower CSI and ETS than the BFM. In the inland suburb area (Qingpu), although the BFM had the highest FAR, its CSI and ETS were both the highest among the three models. When comparing the forecasts for the three areas, the ETS of RAEMS, BFM, and LGBM were 0.29, 0.33, and 0.29, respectively, for the forecasting visibility in the suburban area (Qingpu), which were all higher than those in urban and island areas, as were the CSI values. These results prove that low visibility in the island area was more difficult to predict from another perspective. In total, the binary score results were acceptable. It was concluded that BFM outperformed RAEMS and LGBM with a high correlation and the lowest error over Shanghai.    Another approach to assessing the models is to calculate the scores for the forecasted visibility of the model, regardless of visibility. In this section, low visibility was defined as less than 5 km. Frequency bias (FB), percent correct (PC), probability of omission (PO), probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), and equitable threat score (ETS) were used to assess the performance of low-visibility forecasts. The algorithm details were recorded in a document titled 'Guidelines on Performance Assessment of Public Weather Services (WMO/TD No. 1023)' [63]. According to the score results listed in Table 6, in the island area (Chongming), the BFM had higher FB, POD, and FAR values than the other two models, but with the lowest PO, which indicated a slight over-forecasting of low visibility. The BFM had the highest CSI, whereas LGBM had the highest ETS. In the central urban area (Xujiahui), RAEMS and BFM shared the same PO (0.36) and POD (0.64), which indicates that these two models performed similarly in lowvisibility forecasts. However, with a higher FB and FAR, RAEMS obtained a lower CSI and ETS than the BFM. In the inland suburb area (Qingpu), although the BFM had the highest FAR, its CSI and ETS were both the highest among the three models. When comparing the forecasts for the three areas, the ETS of RAEMS, BFM, and LGBM were 0.29, 0.33, and 0.29, respectively, for the forecasting visibility in the suburban area (Qingpu), which were all higher than those in urban and island areas, as were the CSI values. These results prove that low visibility in the island area was more difficult to predict from another perspective. In total, the binary score results were acceptable. It was concluded that BFM outperformed RAEMS and LGBM with a high correlation and the lowest error over Shanghai.

Discussion
Owing to the rapid increase of meteorological and environmental observational and numerical modeling data in recent years, researchers all over the world are engaging to use machine learning algorithms to conduct local visibility forecast. Many previous works used single algorithms to conduct visibility forecast at single stations and produced high quality results [28][29][30]. In accordance with the studies of Bari and Ouagabi [62] and Caruana and Niculescu-Mizil [64], where XGBoost and LightGBM ranked the top two algorithms among several single linear regression and machine learning methods in visibility forecasts, these two tree-based algorithms were chosen to conduct our prediction tasks. In our study, the correlation coefficient of BFM and LGBM reached 0.8, which was close to the results of Bari and Ouagabi. In fact, the model fusion prediction performed better. Zhang et al. [33] used a machine-learning-based multimodal fusion by combining surface and satellite atmospheric observations to predict the visibility within the Beijing-Tianjin-Hebei region. Similar to their study, both of our approaches used suitable surface and satellite observational data and showed that the RMSE of model fusion was significantly better than any single machine learning method or numerical model, which means that model fusion can significantly improve the local forecast capability. Moreover, the RMSEs of our BFM and LGBM predictions were both less than 6 km, which is smaller than that of Zhang et al.'s fused model prediction (6.71 km). The application of a numerical model system (RAEMS) and different geographic and environmental conditions may be important factors contributing to the lower prediction RMSEs and errors.
In fact, the accuracy of visibility forecasting is a problem in Shanghai, China. Researchers and forecasters have focused to use numerical models to study and predict haze/fog or air quality (PM 2.5 and PM 10 ) in Shanghai [8,65,66]. Zhou et al. [65] found detailed criteria for low visibility based on single variables including PM 2.5 , PM 10 , relative humidity, and wind speed. Zhou et al. [8] also used RAEMS to predict PM 2.5 and PM 10 , which were well forecasted. In addition, Wang et al. [66] investigated that the correlation coefficient between daily observations and regional numerical model predictions based on RAEMS in Shanghai was 0.534 for the entire year from January to December. Compared with these previous studies, our work improved visibility prediction in Shanghai, which revealed an encouraging prospect of using machine-learning based model fusion to conduct visibility forecasts.
The model fusion method has several advantages. First, this method can capture forecast features from the RAEMS numerical modeling and historical observational features from observational stations and MODIS AOD products. It combines numerical modeling and machine learning to enhance the hourly forecast accuracy. Second, it includes a moving weight algorithm while merging XGBoost and LightGBM, thereby applying the best performances of these two boosting methods. Third, the results of this method demonstrate a reliable and effective way to spread to operational visibility forecasts in Shanghai.
Although this study successfully applied machine learning approaches to predict the visibility in Shanghai, certain limitations should be considered. First, as mentioned in Section 3.2.1, the forecast error of WRF-Chem-based RAEMS contributed to the BFM forecast and limited the final accuracy. Second, the model fusion-based machine learning model indicated different forecast effects when applied to individual stations in different areas of Shanghai; thus, there were some limitations in spatial representation. Third, there were many modeling and testing periods in this study, and the stability of the conclusions might have some limitations, which would be further optimized and revised in the future. Eventually, applying other deep learning algorithms, building ensemble forecast techniques, and improving the current machine learning algorithms will increase the accuracy of the visibility forecasting.

Conclusions
A BFM model for visibility prediction using a machine-learning-based model fusion based on multisource data, including RAEMS outputs, air pollutants, meteorological observations, and MODIS AOD data, was established in this study. Machine learning methods (XGBoost and LightGBM) and the numerical prediction method RAEMS were fused to build this prediction model. Three sets of prediction models, BFM, LightGBM based on multisource data (LGBM), and RAEMS were used to conduct prediction tasks. The prediction models were constructed based on the training set from 1 January 2015 to 31 December 2018 after several effective data pre-processing steps, including SMOTE data resampling, a loss function adjustment, and a 10-fold cross verification. Moreover, apart from the basic features (variables), more spatial and temporal gradient features were considered. A test set (from 1 January 2019 to 31 December 2019) was adopted to validate the feasibility of the BFM, RAEMS, and LGBM approaches. Statistical indicators including MB, MAE, MRE, RMSE, and CC confirmed that the machine learning methods improved the RAEMS forecast significantly and consistently. The correlation coefficient and root mean square error of BFM for the next 24/48 h periods were 0.80/0.77, and 5.01/5.47 km, respectively, which are much higher than those of RAEMS. The statistics and binary score analysis for different areas in Shanghai also proved the reliability and accuracy of using BFM, particularly in low-visibility forecasting. Overall, the BFM is a suitable tool for predicting visibility. It provides a more accurate visibility forecast for the next 24 and 48 h