Hybrid Machine Learning–Statistical Method for Anomaly Detection in Flight Data
Abstract
1. Introduction
2. Literature Review
3. Local Outlier Factor
3.1. Basic Concept
3.2. Implementation
3.2.1. Flight Data
3.2.2. Pre-Processing
3.2.3. Matrix Formation
3.2.4. k Nearest Neighbor
- LOF value approximately equal to one means that the density of point A is comparable to that of its neighbors, and thus A is not an outlier;
- LOF value less than one means that point A has a higher density than its neighbors, and thus A is an inlier;
- LOF value greater than one means that point A has a lower density than its neighbors, and thus A is an outlier.
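The three cases above can be illustrated with a minimal brute-force sketch of the LOF computation. It is not the authors' implementation. It assumes Euclidean distance, takes exactly k neighbors per point (ties at the k-th distance are not expanded, unlike the original LOF definition), and assumes no duplicate points. The function names `lof_scores` and `interpret`, the tolerance `tol`, and the toy point set are hypothetical.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor of every point (brute force, O(n^2))."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    # Assumes distinct points, so the mean is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(k / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


def interpret(score, tol=0.1):
    """Map a LOF score to the three cases; tol is an illustrative tolerance."""
    if score > 1 + tol:
        return "lower density than neighbors (outlier)"
    if score < 1 - tol:
        return "higher density than neighbors (inlier)"
    return "density comparable to neighbors"


# Toy data: a 3 x 3 grid of points plus one isolated point far from the grid.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = lof_scores(points, k=3)

for point, score in zip(points, scores):
    print(point, round(score, 3), interpret(score))
```

In this toy set the isolated point receives a score well above one, while the grid points stay close to one. How far above one a score must be before a point is flagged is a separate thresholding decision.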
3.2.5. LOF Implementation
3.2.6. A Hybrid Statistical Threshold Analysis
- The interquartile range (IQR) is the difference between the upper quartile (Q3) and the lower quartile (Q1);
- Inner fences are fixed at a distance of 1.5 times IQR below Q1 and above Q3. They are given by: [Q1 − (1.5 × IQR), Q3 + (1.5 × IQR)];
- Outer fences are fixed at a distance of 3 times IQR below Q1 and above Q3. They are given by: [Q1 − (3 × IQR), Q3 + (3 × IQR)].
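The fence definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code. It assumes the fences are applied to a list of LOF scores, it uses the default "exclusive" quartile method of `statistics.quantiles` (other quartile conventions give slightly different fences), and the labels "mild outlier" and "extreme outlier" follow the common Tukey convention rather than wording from the source. The score values are made up for the example.

```python
import statistics


def iqr_fences(values):
    """Return the inner and outer fences of a sample as (lower, upper) tuples."""
    # Quartiles via the stdlib default ("exclusive") method; an assumption,
    # since the quartile convention is not specified in the text.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(value, fences):
    """Label a value by its position relative to the fences."""
    inner_low, inner_high = fences["inner"]
    outer_low, outer_high = fences["outer"]
    if value < outer_low or value > outer_high:
        return "extreme outlier"
    if value < inner_low or value > inner_high:
        return "mild outlier"
    return "within fences"


# Hypothetical LOF scores for ten flights (not data from the paper).
lof_values = [0.98, 1.00, 1.01, 1.02, 1.03, 1.05, 1.07, 1.09, 1.45, 1.90]
fences = iqr_fences(lof_values)

for value in lof_values:
    print(value, classify(value, fences))
```

Deriving the thresholds from the score distribution itself avoids hand-picking a fixed LOF cut-off, at the cost of depending on the quartile convention and on the sample of flights used.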
3.2.7. Post Processing
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Flights | LOF Score (k = 13) | LOF Score (k = 49) | LOF Score (k = 80) |
---|---|---|---|
Flight 1 | 1.89719894601340 | 1.794999571605941 | 1.61952094645704 |
Flight 2 | 1.24792810898067 | 1.09357548151929 | 1.00903988539139 |
Flight 3 | 1.20925468652125 | 1.12810045075146 | 1.07174161543744 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jasra, S.K.; Valentino, G.; Muscat, A.; Camilleri, R. Hybrid Machine Learning–Statistical Method for Anomaly Detection in Flight Data. Appl. Sci. 2022, 12, 10261. https://doi.org/10.3390/app122010261