Identification of Airline Turbulence Using WOA-CatBoost Algorithm in Airborne Quick Access Record (QAR) Data
Abstract
1. Introduction
2. Related Work
2.1. CatBoost Classifier
- Initialize model parameters: set the parameters of the CatBoost classifier before training;
- Build initial tree: CatBoost uses gradient-boosted decision trees (GBDT) as its base learner. Training begins by constructing an initial decision tree as the base model;
- Iterative optimization: the base model is improved step by step. At each iteration, CatBoost computes the residuals (the differences between predicted and actual values) and builds a new tree to reduce them;
- Feature scaling: at each iteration, CatBoost scales the features according to their distribution to improve the stability and generalization ability of the model;
- Ensemble of trees: each new tree is combined with the trees built before it, so the final prediction integrates the outputs of all trees, as shown in Figure 1;
- Early stopping strategy: performance on the validation set is monitored during training, and training stops early once it no longer improves, which avoids overfitting;
- Model evaluation: after training, the model is evaluated on test data to measure its performance on unseen data;
- Model application: the trained model is applied to practical classification tasks to predict class labels for unknown data.
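The boosting loop described in the steps above can be illustrated with a minimal sketch. It is plain gradient boosting on squared error with depth-1 trees (stumps) over a single feature, not CatBoost's own implementation, and it omits the feature-scaling step. The function names `fit_stump`, `predict`, and `boost` are hypothetical, the toy data are made up, and the training feature is assumed to take at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree: choose the split that minimises squared error."""
    best = None
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict(base, learning_rate, stumps, x):
    """Ensemble prediction: base value plus the scaled sum of all stumps."""
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


def mean_squared_error(base, learning_rate, stumps, xs, ys):
    return sum((y - predict(base, learning_rate, stumps, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def boost(train, valid, learning_rate=0.3, iterations=200, early_stopping_rounds=10):
    xs, ys = train
    base = sum(ys) / len(ys)  # initial model: a constant prediction
    stumps, best_error, best_size = [], float("inf"), 0
    for _ in range(iterations):
        # Residuals of the current ensemble on the training set.
        residuals = [y - predict(base, learning_rate, stumps, x)
                     for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))  # new tree fitted to the residuals
        error = mean_squared_error(base, learning_rate, stumps, *valid)
        if error < best_error:
            best_error, best_size = error, len(stumps)
        elif len(stumps) - best_size >= early_stopping_rounds:
            break  # validation error has stopped improving
    # Keep only the trees up to the best validation score.
    return base, learning_rate, stumps[:best_size]


# Toy data (made up): the label switches from 0 to 1 when the feature exceeds 0.5.
train_x = [i / 20 for i in range(20)]
train_y = [1.0 if x > 0.5 else 0.0 for x in train_x]
valid_x = [x + 0.025 for x in train_x]
valid_y = [1.0 if x > 0.5 else 0.0 for x in valid_x]
base, lr, stumps = boost((train_x, train_y), (valid_x, valid_y))
```

Calling `predict(base, lr, stumps, x)` on a new value then plays the role of the model application step; a real classifier would use a classification loss and many features rather than this one-dimensional toy.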
Kernel Parameters
2.2. Whale Optimization Algorithm
3. Method
3.1. WOA-CatBoost for Turbulence Classification
3.1.1. Fitness Definition
3.1.2. The Procedures of the WOA-CatBoost Model
4. Experimental Framework
4.1. Data Description
4.2. Data Preprocessing
4.2.1. Feature Selection Using RFE Algorithm
- Input all features into the CatBoost model and record the model’s performance metric (accuracy).
- Rank the features by weight (importance) and remove the lowest-ranked feature, i.e., the one with the minimum weight, to obtain a new feature set.
- Retrain the model on the new feature set and recompute the performance metric.
- Repeat the removal and retraining steps, eliminating one feature at a time, until the number of features reaches the predetermined value or no further features can be removed.
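The elimination loop above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: `rfe` and `toy_fit_model` are hypothetical names, and the toy model returns made-up importance weights in place of a trained CatBoost model, so the numbers carry no empirical meaning.

```python
def rfe(features, fit_model, n_features_to_keep):
    """Recursive feature elimination.

    fit_model(selected) must return (score, importance), where importance
    maps each selected feature name to its weight.
    """
    selected = list(features)
    history = []
    while len(selected) > n_features_to_keep:
        score, importance = fit_model(selected)       # train and evaluate
        history.append((tuple(selected), score))
        weakest = min(selected, key=lambda f: importance[f])
        selected.remove(weakest)                      # drop the lowest-weight feature
    score, _ = fit_model(selected)                    # score of the final feature set
    history.append((tuple(selected), score))
    return selected, history


# Made-up importance weights standing in for a trained model (assumption).
TOY_WEIGHTS = {
    "Vertical acceleration": 0.50,
    "Pitch angle": 0.30,
    "Wind speed": 0.15,
    "Longitude": 0.05,
}


def toy_fit_model(selected):
    importance = {f: TOY_WEIGHTS[f] for f in selected}
    score = sum(importance.values())  # placeholder for accuracy
    return score, importance


kept, history = rfe(list(TOY_WEIGHTS), toy_fit_model, n_features_to_keep=2)
```

In practice `fit_model` would retrain the classifier on the selected columns and return its accuracy together with the per-feature importances; `history` records the score at each step so the effect of every removal can be inspected.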
4.2.2. EDR Data Interpolation
4.3. Model Training
5. Results
5.1. Evaluation Indicators
5.2. Fitness Value Curve
5.3. Comparison Experiments
5.4. Application
6. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Acronym | Full Name |
---|---|
WOA | Whale Optimization Algorithm |
CatBoost | Categorical Boosting |
QAR | Quick Access Record |
EDR | Eddy Dissipation Rate |
RFE | Recursive Feature Elimination |
GBDT | Gradient Boosting Decision Tree |
ROC | Receiver Operating Characteristic |
TPR | True Positive Rate |
FPR | False Positive Rate |
AUC | Area Under the Curve |
SVM | Support Vector Machine |
Model | Parameters | Explanations |
---|---|---|
CatBoost | early_stopping_rounds | Training is terminated when no improvement is observed over a set number of consecutive iterations, to prevent overfitting |
CatBoost | learning_rate | Regulates the learning progress of the model |
CatBoost | depth | Maximum depth of each decision tree |
CatBoost | iterations | Number of trees |
WOA | population_size | Whale population size |
WOA | iterations | Iteration count |
WOA | fitness | Individual fitness |
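To show how the WOA parameters in the table interact, here is a minimal sketch of the standard whale optimization update rules (encircling the best whale, random search, and the spiral move, with the spiral constant fixed at 1). It is not the authors' implementation. `woa_minimize` and `toy_fitness` are hypothetical names, and the fitness surface is a made-up surrogate with its minimum at a learning rate of 0.1 and a depth of 6; in the WOA-CatBoost setting the fitness would instead come from training and validating a CatBoost model for each candidate position.

```python
import math
import random


def woa_minimize(fitness, bounds, population_size=20, iterations=50, seed=0):
    """Minimise `fitness` over the box `bounds` with a basic WOA."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, d):
        return min(max(value, bounds[d][0]), bounds[d][1])

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(population_size)]
    best = min(whales, key=fitness)[:]
    best_fit = fitness(best)

    for t in range(iterations):
        a = 2.0 * (1 - t / iterations)        # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                ref = best if abs(A) < 1 else rng.choice(whales)
                new = [ref[d] - A * abs(C * ref[d] - x[d]) for d in range(dim)]
            else:
                # Spiral (bubble-net) move around the best whale.
                l = rng.uniform(-1, 1)
                new = [abs(best[d] - x[d]) * math.exp(l) * math.cos(2 * math.pi * l)
                       + best[d] for d in range(dim)]
            whales[i] = [clip(v, d) for d, v in enumerate(new)]
            f = fitness(whales[i])
            if f < best_fit:                  # keep the best position found so far
                best, best_fit = whales[i][:], f
    return best, best_fit


def toy_fitness(position):
    """Made-up surrogate for a validation error (assumption, not measured)."""
    learning_rate, depth = position
    return ((learning_rate - 0.1) / 0.29) ** 2 + ((depth - 6.0) / 8.0) ** 2


BOUNDS = [(0.01, 0.30), (2.0, 10.0)]          # (learning_rate, depth) search ranges, assumed
best, best_fit = woa_minimize(toy_fitness, BOUNDS)
```

Because tree depth is an integer hyperparameter, a real search would round the second coordinate before training; the continuous version is kept here only to keep the sketch short.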
Parameters | Unit | Parameters | Unit |
---|---|---|---|
AOA1 | deg | Pitch angle | deg |
AOA2 | deg | Roll angle | deg |
Gross weight | kilogram | Wind direction | deg |
Altitude | feet | Ground speed | knots |
Latitude | deg | Indicated air speed | knots |
Longitude | deg | Mach | / |
Radio height | feet | True air speed | knots |
Total air temp | °C | Wind speed | knots |
Static air temp | °C | Vertical acceleration | G |
Display heading | deg | Lateral acceleration | G |
Drift angle | deg | Longitudinal acceleration | G |
Vertical speed | feet/min | / | / |
Model | Accuracy | Precision | Recall | F1 | TP + TN | FP + FN |
---|---|---|---|---|---|---|
WOA-CatBoost | 0.95644 | 0.95916 | 0.89886 | 0.92803 | 6718 | 306 |
CatBoost | 0.86062 | 0.74675 | 0.83827 | 0.78987 | 6045 | 979 |
Extra Trees | 0.77919 | 0.64098 | 0.66697 | 0.65372 | 5528 | 1496 |
Random Forest | 0.76224 | 0.59865 | 0.72574 | 0.65610 | 5354 | 1670 |
Logistic Regression | 0.70857 | 0.52884 | 0.61822 | 0.57005 | 4977 | 2047 |
SVM | 0.67298 | 0.48678 | 0.85558 | 0.62052 | 4727 | 2297 |
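The columns of the comparison table are linked by the usual definitions: accuracy is the share of correct predictions, (TP + TN) / (TP + TN + FP + FN), and F1 is the harmonic mean of precision and recall. The short check below recomputes both from the other columns for the first two rows; the helper names `accuracy` and `f1_score` are hypothetical, and the inputs are copied from the table.

```python
def accuracy(correct, incorrect):
    """Accuracy from the counts of correct (TP + TN) and incorrect (FP + FN) predictions."""
    return correct / (correct + incorrect)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, TP + TN, FP + FN) as listed in the table.
rows = {
    "WOA-CatBoost": (0.95916, 0.89886, 6718, 306),
    "CatBoost": (0.74675, 0.83827, 6045, 979),
}

recomputed = {
    name: (accuracy(correct, wrong), f1_score(p, r))
    for name, (p, r, correct, wrong) in rows.items()
}

for name, (acc, f1) in recomputed.items():
    print(f"{name}: accuracy={acc:.5f}, F1={f1:.5f}")
```

For these two rows the recomputed values agree with the tabulated accuracy and F1 to within rounding.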
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhuang, Z.; Li, H.; Shao, J.; Chan, P.-W.; Tai, H. Identification of Airline Turbulence Using WOA-CatBoost Algorithm in Airborne Quick Access Record (QAR) Data. Appl. Sci. 2024, 14, 4419. https://doi.org/10.3390/app14114419