Visibility Prediction Based on Machine Learning Algorithms
Abstract
1. Introduction
2. Data and Methods
2.1. Data
2.2. Method
- Step 1: Ground observation data from January 2016 to January 2020 are preprocessed (e.g., normalization and outlier removal).
- Step 2: PCA is applied to the preprocessed data.
- Step 3: The selected elements are used as the initial data and divided into two categories: data without PCA and data with PCA.
- Step 4: The accuracy of the six classification algorithms is evaluated. Confusion matrices are used to identify the algorithms that are better at predicting low visibility.
- Step 5: The better-performing classification algorithms are selected and applied to a weather process. The predicted visibility is compared with the observed visibility and with numerical weather forecast results.
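Step 1 names normalization and outlier removal but does not say which variants were used. The following is a minimal sketch under two assumptions that are not from the source: outliers are rows with any feature more than three standard deviations from its mean, and normalization is min-max scaling to [0, 1]. The function names and the demo data are hypothetical.

```python
import numpy as np

def remove_outliers(X, z_max=3.0):
    """Drop rows where any feature is more than z_max standard deviations from its mean."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns never trigger the filter
    keep = (np.abs((X - mean) / std) <= z_max).all(axis=1)
    return X[keep], keep

def min_max_normalize(X):
    """Scale every column to the range [0, 1]; constant columns map to 0."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    return (X - lo) / span

# Hypothetical data: 21 hourly records, 2 features, one obviously bad reading.
X_raw = np.column_stack([np.arange(21.0), np.ones(21)])
X_raw[20, 0] = 1000.0
X_clean, kept = remove_outliers(X_raw)
X_norm = min_max_normalize(X_clean)
```

Filtering before scaling keeps a single bad reading from compressing the remaining values into a narrow part of the [0, 1] range.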
2.2.1. Principal Component Analysis (PCA)
- Step 1: The original data are formed into a matrix X with m rows and n columns. Rows represent observation times, and columns represent features.
- Step 2: Each column of X is centered to zero mean.
- Step 3: The covariance matrix is calculated.
- Step 4: The eigenvalues of the covariance matrix and the corresponding eigenvectors are calculated.
- Step 5: The eigenvalues are sorted from largest to smallest, and the eigenvectors corresponding to the k largest eigenvalues form the columns of the matrix U.
- Step 6: Y = UᵀX is the new data reduced to k dimensions (with samples arranged as the columns of X; for the row-wise layout of Step 1, the equivalent form is Y = XU).
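The six PCA steps above can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code; the function name `pca` and the random demo matrix are assumptions. Samples are stored as rows, so the projection is written as `X_centered @ U`.

```python
import numpy as np

def pca(X, k):
    """Project X (m samples x n features) onto its first k principal components."""
    X_centered = X - X.mean(axis=0)              # Step 2: zero-mean columns
    cov = np.cov(X_centered, rowvar=False)       # Step 3: n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # Step 4: eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]            # Step 5: sort eigenvalues, largest first
    U = eigvecs[:, order[:k]]                    # n x k matrix of leading eigenvectors
    Y = X_centered @ U                           # Step 6: m x k reduced data
    return Y, U, eigvals[order]

# Step 1: hypothetical data matrix with 200 time steps and 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
Y, U, eigvals = pca(X, k=2)
explained = eigvals[:2].sum() / eigvals.sum()    # variance share kept by 2 components
```

The ratio `explained` is a common way to choose k, although the source does not state how k was selected.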
2.2.2. Machine Learning Classification Algorithm
- (1) LDA is used to reduce the dimensionality of the input sample.
- (2) According to the probability density function, the probability that the reduced-dimensionality sample belongs to each class is calculated.
- (3) The category corresponding to the largest probability is identified as the predicted category.
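The three steps of the linear discriminant classifier can be illustrated with a small NumPy sketch. It assumes Gaussian class densities with one shared (full) covariance matrix and uses the standard Fisher projection; the function names and the synthetic three-class data are hypothetical and not taken from the source.

```python
import numpy as np

def fit_lda(X, y):
    classes = np.unique(y)
    n, d = X.shape
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))                       # within-class scatter
    Sb = np.zeros((d, d))                       # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # (1) Projection: leading eigenvectors of Sw^-1 Sb, at most C - 1 directions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    W = eigvecs[:, order].real
    Z = X @ W
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (n - len(classes))   # pooled covariance
    priors = np.array([(y == c).mean() for c in classes])
    return {"classes": classes, "W": W, "means": means,
            "cov_inv": np.linalg.inv(cov), "priors": priors}

def predict_proba(model, X):
    # (2) Gaussian log-density per class plus log prior. The normalising
    # constant is identical for all classes (shared covariance), so it cancels.
    Z = X @ model["W"]
    diff = Z[:, None, :] - model["means"][None, :, :]
    maha = np.einsum("mck,kl,mcl->mc", diff, model["cov_inv"], diff)
    log_post = -0.5 * maha + np.log(model["priors"])
    log_post -= log_post.max(axis=1, keepdims=True)
    p = np.exp(log_post)
    return p / p.sum(axis=1, keepdims=True)

def predict(model, X):
    # (3) The class with the largest probability is the prediction.
    return model["classes"][predict_proba(model, X).argmax(axis=1)]

# Hypothetical, well-separated three-class data with four features.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 50)
X = rng.normal(size=(150, 4))
X[:, 0] += 6.0 * y
X[:, 1] -= 4.0 * (y == 1)
model = fit_lda(X, y)
proba = predict_proba(model, X)
accuracy = (predict(model, X) == y).mean()
```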
- Step 1: The network structure is chosen.
- Step 2: The weights are randomly initialized.
- Step 3: The forward propagation (FP) algorithm is executed.
- Step 4: The cost function J is computed.
- Step 5: The backpropagation algorithm is executed.
- Step 6: A gradient check is performed.
- Step 7: The cost function J is minimized using an optimization algorithm together with backpropagation.
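The seven training steps above can be sketched with a small NumPy network. The layout (two hidden layers of 10 ReLU units) follows the settings listed for the neural network in this article. The softmax output, the cross-entropy cost, plain gradient descent, the learning rate, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def init_params(sizes, rng):
    # Step 2: small random weights and zero biases for each layer.
    return [(rng.normal(0.0, 0.3, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    # Step 3: forward propagation; keeps the activations of every layer.
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        if i < len(params) - 1:
            acts.append(np.maximum(z, 0.0))                # ReLU hidden layer
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))   # numerically stable softmax
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts

def cost(params, X, Y):
    # Step 4: mean cross-entropy J between one-hot targets Y and predictions.
    return -np.mean(np.sum(Y * np.log(forward(params, X)[-1]), axis=1))

def backward(params, X, Y):
    # Step 5: backpropagation; returns (dJ/dW, dJ/db) for every layer.
    acts = forward(params, X)
    delta = (acts[-1] - Y) / len(X)
    grads = []
    for i in reversed(range(len(params))):
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            delta = (delta @ params[i][0].T) * (acts[i] > 0)
    return grads[::-1]

def gradient_check(params, X, Y, eps=1e-6):
    # Step 6: compare backprop with central differences on the first weight matrix.
    analytic = backward(params, X, Y)[0][0]
    W = params[0][0]
    numeric = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        j_plus = cost(params, X, Y)
        W[idx] = old - eps
        j_minus = cost(params, X, Y)
        W[idx] = old
        numeric[idx] = (j_plus - j_minus) / (2 * eps)
    return np.max(np.abs(numeric - analytic))

def train(params, X, Y, lr=0.1, steps=500):
    # Step 7: minimise J with plain gradient descent.
    for _ in range(steps):
        grads = backward(params, X, Y)
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return params

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 9))                 # hypothetical: 40 samples, 9 predictors
Y = np.eye(4)[np.argmax(X[:, :4], axis=1)]   # hypothetical: 4 synthetic classes
params = init_params([9, 10, 10, 4], rng)    # Step 1: network structure
grad_error = gradient_check(params, X, Y)
j_before = cost(params, X, Y)
params = train(params, X, Y)
j_after = cost(params, X, Y)
```

The gradient check is only a debugging aid: it is run once before training and is normally switched off afterwards because it needs two cost evaluations per weight.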
3. Experiment Design
3.1. Accuracy Test
3.2. Confusion Matrix
3.3. Case Analysis
4. Discussion
- Ground observation and radiosonde data from Chengdu were used in this study.
- Six representative machine learning algorithms were used to predict visibility, and their performance was compared.
- The usefulness of PCA was judged by comparing prediction accuracy with and without PCA.
- The ECMWF and NCEP forecasts and the neural network predictions were compared with actual observation data.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Dataset Source | Name | Time Resolution | Location |
---|---|---|---|
Ground observation data | Total cloud cover | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Low cloud cover | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Wind direction | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Wind speed | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Temperature | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Humidity | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Vapor pressure | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Dew point temperature | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Atmospheric pressure | Every hour from January 2016 to January 2020 | Chengdu |
Ground observation data | Visibility | Every hour from January 2016 to January 2020 | Chengdu |
Algorithm | Key Setting 1 | Key Setting 2 | Key Setting 3 | Key Setting 4 |
---|---|---|---|---|
Decision tree | Maximum number of splits: 100 | Split criterion: Gini diversity index | | |
Linear discriminant | Covariance structure: full | | | |
Naive Bayes | Distribution for numerical predictors: Gaussian | | | |
Linear SVM | Kernel function: linear | Kernel scale: automatic | Box constraint level: 1 | Multi-class method: one-vs-one |
KNN | Number of neighbors: 1 | Distance metric: Euclidean | Distance weight: equal | |
Neural Networks | Number of fully connected layers: 2 | Size of the first and second layers: 10 | Activation function: ReLU | Iteration limit: 1000 |
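The source does not say which software produced the settings above. As a hedged illustration only, the sketch below maps them to approximate scikit-learn counterparts. The mapping is an assumption: for example, a limit of 100 splits is expressed as at most 101 leaves, the box constraint as `C=1.0`, and the automatic kernel scale has no direct counterpart for a linear kernel. `build_models` and the demo data are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_models(random_state=0):
    """Return the six classifiers with settings close to those in the table."""
    return {
        # A binary tree with 100 splits has 101 leaves.
        "Decision tree": DecisionTreeClassifier(
            criterion="gini", max_leaf_nodes=101, random_state=random_state),
        # The default solver estimates one full covariance shared by all classes.
        "Linear discriminant": LinearDiscriminantAnalysis(),
        "Naive Bayes": GaussianNB(),
        # SVC handles multi-class problems with a one-vs-one scheme.
        "Linear SVM": SVC(kernel="linear", C=1.0, decision_function_shape="ovo"),
        "KNN": KNeighborsClassifier(
            n_neighbors=1, metric="euclidean", weights="uniform"),
        "Neural Networks": MLPClassifier(
            hidden_layer_sizes=(10, 10), activation="relu",
            max_iter=1000, random_state=random_state),
    }

# Hypothetical, well-separated data: 4 classes, 9 predictors.
rng = np.random.default_rng(0)
y_demo = np.repeat(np.arange(4), 30)
X_demo = rng.normal(size=(120, 9))
X_demo[np.arange(120), y_demo] += 5.0
scores = {name: model.fit(X_demo, y_demo).score(X_demo, y_demo)
          for name, model in build_models().items()}
```

The scores here are training accuracies on synthetic data and say nothing about visibility prediction; a real comparison would use held-out observations.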
Algorithm | 2 h | 4 h | 6 h | 8 h | 10 h | 12 h |
---|---|---|---|---|---|---|
Decision tree | 87 | 78.2 | 65.4 | 65.7 | 69.9 | 64.2 |
Linear discriminant | 82.1 | 79.5 | 63.9 | 64.7 | 71.9 | 61.9 |
Naive Bayes | 80.8 | 68.4 | 59 | 57.8 | 62.5 | 56.6 |
Linear SVM | 88.4 | 79.6 | 64.2 | 64.8 | 69.8 | 61.3 |
KNN | 77 | 73.2 | 57 | 57.8 | 64.8 | 55.4 |
Neural Networks | 87.6 | 79.4 | 66 | 66.4 | 72 | 65.2 |
Algorithm | 2 h | 4 h | 6 h | 8 h | 10 h | 12 h |
---|---|---|---|---|---|---|
Decision tree | 87.1 | 77.3 | 65.3 | 64 | 70.4 | 64.8 |
Linear discriminant | 83.6 | 79.1 | 63.9 | 63.8 | 71.1 | 62.3 |
Naive Bayes | 85.6 | 77.8 | 61.6 | 62.2 | 70.1 | 57.3 |
Linear SVM | 88.3 | 79.6 | 64.5 | 64.6 | 69.8 | 61.3 |
KNN | 84 | 72.9 | 58.2 | 58.1 | 64.3 | 56.9 |
Neural Networks | 88.5 | 79.5 | 66.6 | 64.7 | 72.2 | 66.6 |
Algorithm | Metric | 2 h: <1 km | 2 h: 1–2 km | 2 h: 2–4 km | 2 h: >4 km | 4 h: <1 km | 4 h: 1–2 km | 4 h: 2–4 km | 4 h: >4 km | 6 h: <1 km | 6 h: 1–2 km | 6 h: 2–4 km | 6 h: >4 km |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Decision tree | TPR | 53.80% | 66.30% | 84.60% | 89.50% | 0.00% | 3.10% | 27.60% | 72.00% | 24.80% | 29.20% | 56.00% | 81.40% |
Decision tree | TNR | 86.97% | 77.31% | 82.83% | 91.35% | 23.31% | 12.33% | 36.41% | 66.83% | 31.25% | 33.10% | 65.23% | 91.89% |
Linear discriminant | TPR | 0.00% | 0.00% | 86.00% | 86.30% | 0.00% | 0.00% | 7.50% | 92.90% | 8.30% | 4.60% | 76.40% | 75.10% |
Linear discriminant | TNR | 0.00% | 2.3% | 73.20% | 87.22% | 0.00% | 3.33% | 10.20% | 98.20% | 12.30% | 6.33% | 83.22% | 82.12% |
Naive Bayes | TPR | 73.10% | 54.70% | 76.40% | 84.60% | 52.40% | 53.10% | 68.70% | 83.30% | 39.50% | 50.20% | 41.60% | 73.40% |
Naive Bayes | TNR | 63.33% | 52.21% | 45.23% | 93.21% | 46.60% | 63.78% | 82.03% | 91.21% | 46.00% | 62.23% | 51.01% | 82.88% |
Linear SVM | TPR | 46.20% | 66.30% | 86.30% | 91.20% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 75.70% | 77.10% |
Linear SVM | TNR | 53.39% | 78.25% | 89.35% | 96.23% | 0.00% | 2.00% | 3.33% | 98.33% | 5.2% | 3.2% | 68.2% | 88.20% |
KNN | TPR | 34.60% | 25.60% | 67.80% | 85.50% | 31.20% | 22.40% | 53.20% | 87.40% | 24.60% | 32.10% | 46.20% | 72.40% |
KNN | TNR | 35.6% | 24.40% | 72.20% | 79.50% | 28.80% | 37.60% | 67.80% | 93.60% | 35.60% | 27.90% | 43.80% | 87.60% |
Neural Networks | TPR | 50.00% | 58.70% | 80.60% | 90.60% | 0.00% | 2.50% | 29.90% | 69.10% | 28.00% | 31.90% | 59.30% | 82.40% |
Neural Networks | TNR | 67.00% | 55.30% | 89.40% | 89.40% | 0.00% | 5.50% | 36.10% | 77.90% | 33.06% | 88.10% | 60.70% | 77.60% |

Algorithm | Metric | 8 h: <1 km | 8 h: 1–2 km | 8 h: 2–4 km | 8 h: >4 km | 10 h: <1 km | 10 h: 1–2 km | 10 h: 2–4 km | 10 h: >4 km | 12 h: <1 km | 12 h: 1–2 km | 12 h: 2–4 km | 12 h: >4 km |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Decision tree | TPR | 23.20% | 26.30% | 46.40% | 83.10% | 12.50% | 23.90% | 37.00% | 86.70% | 3.70% | 11.50% | 27.10% | 92.50% |
Decision tree | TNR | 36.80% | 56.70% | 51.60% | 78.90% | 15.50% | 26.10% | 43.00% | 83.90% | 5.40% | 15.20% | 36.30% | 88.20% |
Linear discriminant | TPR | 6.50% | 4.40% | 62.00% | 81.50% | 0.00% | 3.90% | 22.00% | 95.80% | 0.00% | 0.00% | 6.10% | 99.20% |
Linear discriminant | TNR | 3.50% | 5.60% | 78.00% | 78.50% | 0.00% | 6.10% | 32.20% | 96.40% | 0.00% | 0.00% | 23.90% | 89.80% |
Naive Bayes | TPR | 59.40% | 31.90% | 34.30% | 74.60% | 51.40% | 29.30% | 30.40% | 76.00% | 22.20% | 54.60% | 31.40% | 77.00% |
Naive Bayes | TNR | 52.60% | 23.10% | 54.40% | 87.45% | 63.36% | 36.75% | 38.67% | 87.70% | 17.87% | 56.44% | 35.64% | 87.40% |
Linear SVM | TPR | 0.00% | 0.00% | 64.30% | 81.40% | 0.00% | 0.00% | 0.30% | 99.90% | 0.00% | 0.00% | 0.00% | 100.00% |
Linear SVM | TNR | 0.00% | 0.00% | 71.36% | 88.60% | 0.00% | 0.00% | 9.70% | 96.32% | 0.00% | 7.21% | 12.36% | 96.45% |
KNN | TPR | 21.30% | 27.30% | 41.10% | 74.30% | 15.30% | 24.30% | 33.30% | 79.30% | 0.00% | 17.50% | 32.60% | 85.20% |
KNN | TNR | 18.70% | 32.50% | 68.96% | 85.70% | 24.74% | 25.70% | 36.75% | 82.77% | 0.00% | 22.55% | 47.48% | 86.32% |
Neural Networks | TPR | 24.50% | 28.20% | 49.00% | 83.60% | 22.20% | 24.30% | 33.10% | 89.90% | 3.70% | 15.30% | 26.00% | 93.40% |
Neural Networks | TNR | 35.55% | 31.53% | 41.52% | 76.45% | 37.85% | 35.73% | 26.50% | 90.10% | 6.35% | 14.3% | 31.23% | 94.33% |
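The table above reports TPR and TNR for four visibility classes. The sketch below shows how such per-class rates are conventionally computed from a multi-class confusion matrix in a one-vs-rest fashion. It is not the authors' code, and it assumes a convention for the class boundaries (a value exactly on an edge falls into the higher class). The sample observations and predictions are invented for the example.

```python
import numpy as np

BIN_EDGES_KM = [1.0, 2.0, 4.0]                       # boundaries between the four classes
CLASS_NAMES = ["<1 km", "1–2 km", "2–4 km", ">4 km"]

def to_class(visibility_km):
    """Map visibility in km to a class index 0..3 (edge values go to the higher class)."""
    return np.digitize(visibility_km, BIN_EDGES_KM)

def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows are observed classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def _safe_ratio(num, den):
    # NaN marks classes for which the rate is undefined (zero denominator).
    return np.divide(num, den, out=np.full(len(num), np.nan), where=den > 0)

def tpr_tnr(cm):
    """One-vs-rest true positive rate and true negative rate for each class."""
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # observed in the class, predicted elsewhere
    fp = cm.sum(axis=0) - tp          # predicted in the class, observed elsewhere
    tn = cm.sum() - tp - fn - fp
    return _safe_ratio(tp, tp + fn), _safe_ratio(tn, tn + fp)

# Invented observations and predictions (km), for illustration only.
observed = np.array([0.5, 1.5, 3.0, 8.0, 0.8, 5.0])
predicted = np.array([0.7, 2.5, 3.5, 6.0, 1.2, 3.0])
cm = confusion_matrix(to_class(observed), to_class(predicted))
tpr, tnr = tpr_tnr(cm)
```

For these six invented samples, the lowest class (<1 km) has TPR = 0.5 because one of its two observed cases is predicted as 1–2 km, and TNR = 1.0 because no other case is predicted as <1 km.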
Ct | Cl | Wd (°) | Ws (m/s) | T (°C) | H (%) | Vp (hPa) | Td (°C) | V (km) | P (hPa) |
---|---|---|---|---|---|---|---|---|---|
10 | 3 | 10 | 0 | 18.8 | 96 | 20.8 | 18.1 | 15 | 1004.6 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Wang, Y.; Zhu, Y.; Yang, L.; Ge, L.; Luo, C. Visibility Prediction Based on Machine Learning Algorithms. Atmosphere 2022, 13, 1125. https://doi.org/10.3390/atmos13071125