A Systematic Review of Machine Learning Analytic Methods for Aviation Accident Research
Abstract
1. Introduction
- How do variations in demographic factors such as aviation application, electronic database, publication type, study year, trends, data sources, and authors influence the comprehensiveness, reliability, and applicability of findings in aviation accident research?
- What types of machine learning approaches have been applied in post-accident analysis within the aviation industry?
  - a. What specific machine learning types have been employed in this context?
  - b. Which machine learning tasks have been targeted to enhance post-accident analysis?
  - c. What are the predominant machine learning algorithms used in the aviation industry for post-accident analysis?
- How have these machine learning techniques contributed to enhancing safety measures and providing insights into aviation accidents?
2. Background
2.1. Artificial Intelligence
2.2. Machine Learning
2.3. Machine Learning Types
- Supervised Learning: In supervised learning, models are trained on labeled datasets, where input data is associated with corresponding desired outputs. The model learns the relationships between inputs and outputs, enabling it to make predictions on new, unseen data. Supervised learning is widely employed in tasks such as classification, where the goal is to assign input data to predefined categories, and regression, where continuous values are predicted based on inputs [19,20].
- Unsupervised Learning: Unsupervised learning involves training models on unlabeled data, with the aim of discovering patterns and structures within the data. Clustering is a common task in unsupervised learning, where similar data points are grouped together. Dimensionality reduction is another application, simplifying the data while preserving its key characteristics [21,22].
- Semi-Supervised Learning: This type of learning combines labeled and unlabeled data to enhance model performance. Often, labeled data is scarce, but unlabeled data is more abundant. By leveraging both types of data, models can generalize better and make more accurate predictions [23].
- Reinforcement Learning: Reinforcement learning is rooted in the concept of learning through interaction, often with simulations of varying fidelity and often in environments with significant variability and uncertainty [24]. An algorithm interacts with a system and adjusts its decisions to maximize the cumulative reward defined by an objective function; through trial and error, it learns optimal strategies to achieve its goals [25].
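The trial-and-error loop described for reinforcement learning can be sketched with tabular Q-learning, used here as one common illustrative algorithm rather than a method taken from the reviewed studies. The corridor environment, the reward of 1 at the goal, and all names and parameter values (`step`, `train`, `alpha`, `gamma`, `epsilon`) are assumptions made for this example.

```python
import random

N_STATES = 5                  # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = ("left", "right")   # index 0 = left, index 1 = right

def step(state, action):
    """Toy deterministic environment: reward 1.0 only when the goal is reached."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by interacting with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):  # cap on episode length
            # Explore with probability epsilon (or when values are tied),
            # otherwise exploit the action with the higher estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best-known action for every non-goal state."""
    return [ACTIONS[0 if q[s][0] > q[s][1] else 1] for s in range(N_STATES - 1)]

q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

The agent is never told that moving right is correct; it infers this from the rewards it accumulates, which is the sense in which the strategy is learned through interaction.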
2.4. Machine Learning Tasks
- Classification: Classification involves assigning input data to predefined categories or classes. The model learns patterns from labeled training data, enabling it to classify new, unseen data accurately. Applications range from email spam detection to medical diagnosis [19].
- Regression: Regression is concerned with predicting continuous values based on input features. The model learns the relationships between variables from training data and can then make predictions on new data points. Examples include predicting housing prices or stock market trends [26].
- Clustering: Clustering involves grouping similar data points together based on inherent patterns in the data. The model identifies clusters or segments within the data without requiring predefined categories. This task finds applications in customer segmentation and anomaly detection [27].
- Data Reduction: Data reduction involves techniques to reduce the complexity of large datasets while retaining crucial information. These techniques help minimize computational overhead, improve model efficiency, and avoid overfitting. Methods like Principal Component Analysis (PCA) and feature selection are commonly used for data reduction [21].
- Natural Language Processing (NLP): NLP is a specialized task involving the interaction between computers and human language. It encompasses various sub-tasks such as sentiment analysis, text generation, and language translation. NLP enables machines to understand, interpret, and generate human language, with applications spanning from chatbots to language translation services [29,30]. Because of its role in communication, NLP is often a necessary additional step, applied in sequence with the other listed ML tasks to enable them. Because each domain has its own ontology, it is often necessary to tailor NLP systematically to a new domain, for example by extending a dictionary or lexicon [31,32].
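The idea of tailoring NLP to a domain by extending a lexicon can be illustrated with a minimal sketch. The abbreviations, their expansions, the sample narrative, and the helper names (`extend_lexicon`, `normalize`) are hypothetical and chosen only for illustration; they are not drawn from the reviewed studies.

```python
import re
from collections import Counter

# Hypothetical base lexicon mapping domain shorthand to canonical tokens.
BASE_LEXICON = {"acft": "aircraft", "rwy": "runway"}

def extend_lexicon(base, extra):
    """Return a new lexicon with additional domain-specific entries."""
    merged = dict(base)
    merged.update(extra)
    return merged

def normalize(text, lexicon):
    """Lowercase, tokenize on letters and digits, and map shorthand to canonical tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [lexicon.get(tok, tok) for tok in tokens]

# Extend the lexicon for a new domain before any downstream ML task runs.
lexicon = extend_lexicon(BASE_LEXICON, {"twr": "tower", "clrd": "cleared"})

narrative = "ACFT clrd to land RWY 27; TWR advised acft of traffic."
tokens = normalize(narrative, lexicon)
counts = Counter(tokens)
print(tokens)
```

After normalization, both spellings of the same concept contribute to a single token count, so a later classification or clustering step sees one feature instead of several sparse variants.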
2.5. Machine Learning Techniques in Aviation Safety
- Support Vector Machine (SVM): SVM is a powerful ML technique commonly used for classification and regression tasks. It works by finding the hyperplane that separates data points belonging to different classes with the largest possible margin, which allows effective classification of new, unseen data points. In the context of aviation safety analysis, SVM has proven to be a valuable tool for categorizing aviation accidents based on their characteristics, such as severity, causal factors, and contributing variables [33]. In that research, SVM was used alongside other techniques such as Random Forest and Naive Bayes to classify aviation accidents over 20 years using the ASRS dataset.
- Decision Tree: A Decision Tree is a powerful tool used for both classification and regression tasks in ML. It takes the form of a tree-like structure where each internal node represents a decision based on a feature, each branch represents an outcome of that decision, and each leaf node represents a classification label or a predicted value. Decision Trees have found applications in aviation safety and analysis due to their ability to handle complex decision-making processes and their transparency. For instance, refs. [34,35,36] utilized Decision Trees, along with other ML algorithms, to classify and reduce data from airline databases spanning 42 years. This research enabled the classification of aircraft accident data into different categories, contributing to insights into warning levels in accidents. Another study extended the use of Decision Trees in the aviation safety domain by applying them in conjunction with NLP techniques. That study extracted word-level meaning from safety report narratives and used these meanings to analyze a large dataset of 186,000 reports over 39 years. The combination of NLP and Decision Trees allowed for the extraction of valuable insights from textual data, contributing to improved safety analysis and risk assessment in the aviation industry [37].
- K-Nearest Neighbors (KNN): The KNN algorithm is a fundamental ML technique used for classification and regression tasks. It is a non-parametric, instance-based learning method that makes predictions based on the similarity between input data points. KNN is intuitive and easy to understand, making it a popular choice for beginners in ML, although it is often applied to find nearest neighbors in high-dimensional feature spaces (10 or more dimensions). The algorithm operates on the principle that similar data points tend to share common characteristics and attributes. The KNN algorithm’s performance depends on the choice of the parameter “k” and the distance metric used to measure similarity. A smaller “k” value can lead to noisy predictions, while a larger “k” value can result in smoother but potentially biased predictions. Additionally, selecting an appropriate distance metric is crucial, as it affects how the algorithm measures similarity between data points. Koteeswaran et al. [38] employed the KNN technique as part of their study on predicting the topmost causes of aviation accidents using data mining algorithms. In this research, along with other ML algorithms like Naive Bayes (NB) and Decision Trees (DT), KNN was used to classify aviation accidents spanning 95 years using data from the FAA. By applying KNN to the FAA dataset, the study identified commonalities and patterns in accidents that could contribute to better accident prevention strategies and safety measures.
- Neural Networks: Neural Networks (NNs) are a class of ML algorithms inspired by the structure and functioning of the biological neural networks of the human brain. They are designed to process and recognize patterns in data, making them highly suitable for tasks involving complex relationships and non-linear interactions. NNs consist of interconnected nodes, or artificial neurons, organized in layers, each responsible for specific computations. The decisions of neural networks can be difficult to trace, even when they satisfy objective functions and outperform the baseline performance of humans or of more explainable AI methods [39]. In the context of aviation safety research, ref. [40] harnessed NNs to address a critical task in aviation safety management systems. That study classified risk factors within the aviation domain using data obtained from the ASRS over 24 years. In addition, the study in [41] applied NNs to predict Human Factors Analysis and Classification System (HFACS) unsafe acts based on the pre-conditions of those unsafe acts.
- Naive Bayes (NB): This technique is a probabilistic ML algorithm rooted in Bayes’ theorem. It is particularly well-suited for classification tasks, where the goal is to assign predefined categories or labels to input data based on observed features. Naive Bayes assumes feature independence, which simplifies calculations and makes it a relatively efficient algorithm for text classification and other categorical data. In their study, Koteeswaran et al. [38] used NB as a data mining technique to predict the topmost causes of accidents in aviation. NB was compared with other techniques, such as the SVM and KNN described above, and in this instance was the most effective for their classification tasks.
- Random Forest (RF): RF “combines several randomized decision trees and aggregates their predictions by averaging” [42]. It is particularly effective for classification and regression tasks. RF is based on the concept of creating multiple decision trees during the training phase and combining their predictions to improve accuracy. Each decision tree is constructed using a random subset of the training data and a random subset of features, which helps to introduce diversity and robustness to the model [43]. Zhang & Mahadevan [9] utilized the Random Forest technique along with Deep Neural Networks (DNN) and SVM for NLP and classification of aviation incident reports from the ASRS dataset. The study focused on analyzing and classifying aviation incident reports spanning an 11-year period. By incorporating RF in their analysis, the researchers leveraged its ensemble capabilities to improve the accuracy and reliability of their classification model, ultimately enhancing the understanding of safety incidents in aviation.
- Latent Dirichlet Allocation (LDA): LDA is a widely used ML technique for topic modeling and document clustering. It is particularly applicable to textual data analysis, such as the analysis of aviation accident reports [44,45,46,47,48,49]. LDA assumes that each document is a mixture of a small number of topics or themes, and each topic is characterized by a distribution of words. The goal of LDA is to uncover these hidden topics and their associated word distributions from a collection of documents. In the study in [50], “Text Mining Classification and Prediction of Aviation Accidents Based on TF-IDF-SVR Method,” various machine learning techniques, including LDA, SVM, NB, RF, and LR, were employed to analyze 20,000 aviation accidents spanning 59 years from NTSB data. There, LDA aids in uncovering underlying topics within accident reports, contributing to NLP-driven classification efforts for improved accident prediction and prevention.
- Long Short-Term Memory (LSTM): LSTM is a type of Recurrent Neural Network (RNN) architecture that is well-suited for processing sequential data, such as time series or text. It is particularly effective in capturing long-range dependencies and patterns because it can maintain and update information over extended sequences [51,52,53]. This makes LSTM suitable for tasks where past information can significantly affect future predictions, and potentially for tasks where variability or uncertainty tends to confuse classical algorithms [54]. Zeng et al. applied LSTM to aviation safety prediction [55]. Specifically, the study employs LSTM with variable selection methods, including LASSO (Least Absolute Shrinkage and Selection Operator), to predict aviation safety-related outcomes. The dataset used in that research is sourced from the ASRS, which collects and analyzes incident and accident reports from aviation professionals. There, LSTM is used to model the temporal patterns of safety-related incidents: the network learns from historical data and captures complex relationships between variables, allowing it to make predictions about potential safety outcomes based on past incident reports.
- Principal Component Analysis (PCA): PCA is a widely used dimensionality reduction technique in ML and data analysis. It aims to transform high-dimensional data into a lower-dimensional space while preserving as much of the original data’s variance as possible. This reduction in dimensionality helps in simplifying the dataset and removing redundant or less informative features, making it easier to work with and potentially improving the performance of machine learning algorithms [56]. In the study conducted by İnan & Gökmen İnan [57], PCA was applied in conjunction with NNs and DTs (Decision Trees) to classify survivor and non-survivor passengers in fatal aviation accidents based on data reduction. The researchers aimed to identify significant patterns and features that could distinguish between passengers who survived accidents and those who did not.
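To make the KNN item in the list above more concrete, the following minimal sketch shows how the choice of “k” and of the distance metric enters a prediction. The data points, labels, and function names (`knn_predict`, `euclidean`, `manhattan`) are hypothetical and are not taken from any of the cited studies; one training point is deliberately mislabeled to show the sensitivity of small “k” to noise.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature dataset with two classes.
TRAIN = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
    ((1.1, 1.0), "B"),  # noisy label placed inside the "A" cluster
]
QUERY = (1.1, 0.95)

pred_k1 = knn_predict(TRAIN, QUERY, k=1)  # follows the single nearest (noisy) point
pred_k3 = knn_predict(TRAIN, QUERY, k=3)  # the vote smooths out the noisy point
pred_k3_manhattan = knn_predict(TRAIN, QUERY, k=3, distance=manhattan)
print(pred_k1, pred_k3, pred_k3_manhattan)
```

With k = 1 the prediction copies the mislabeled neighbor, while k = 3 recovers the label of the surrounding cluster, which mirrors the noise versus smoothness trade-off described for “k”. In this small example both distance metrics rank the neighbors identically; on other data they need not.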
3. Materials and Methods
3.1. Search Strategy
3.2. Database Searches
3.3. Backward Search
3.4. Selection Criteria
- Inclusion Criteria
  - Relevance to Aviation Accidents: Articles must directly address the application of machine learning techniques within the aviation industry.
  - Publication Language: Only articles written in the English language were considered to ensure accessibility for the research team.
  - Study Focus: The primary focus of the article should be on accident analysis within the aviation/air transport sector.
  - Study Type: Peer-reviewed journal articles, book chapters, theses, and conference proceedings were included only if they presented full-length papers with rigorous methodology and were published in reputable venues such as IEEE or AIAA.
- Exclusion Criteria
  - Non-English Language: Articles published in languages other than English were excluded due to language limitations.
  - Irrelevant Focus: Studies unrelated to aviation accident analysis were excluded.
  - Non-Aviation Applications: Articles discussing machine learning applications in contexts other than aviation accident analysis were excluded.
  - Publication Type: Abstract-only or lightly detailed conference proceedings and other non-peer-reviewed sources were excluded during full-text screening to ensure scholarly rigor.
  - Duplication: Duplicated records were removed to maintain the uniqueness of the dataset.
  - Restricted Access: Articles for which full-text content was not accessible due to restrictions were excluded.
3.5. Data Extraction, Coding, and Quality
3.6. Data Synthesis
4. Results
4.1. Aviation Application
4.2. Sources and Types of Publications
4.3. Duration of Studies
4.4. Publication Trends
4.5. Software Packages Utilized for Analysis
4.6. Data Source Utilized in the Studies
4.7. Country of Publication Analysis
4.8. Notable Contributors in the Literature
4.9. Machine Learning Dimensions
4.9.1. Machine Learning Approaches
4.9.2. Machine Learning Types
4.9.3. Machine Learning Tasks
4.9.4. Machine Learning Algorithms
4.9.5. Machine Learning Summary
4.10. Enhancements to Safety
4.10.1. Risk Assessment and Prediction
4.10.2. Anomaly Detection
4.10.3. Classification of Accidents and Factors
4.10.4. Natural Language Processing (NLP) Applications
4.10.5. Clustering and Pattern Identification
4.10.6. Dimensionality Reduction
4.10.7. Human Factors Analysis
5. Discussion
5.1. Diverse Machine Learning Techniques and Applications
5.2. Data Sources and Quality Assessment
5.3. Safety Enhancement and Insights Generation
5.4. Variability in Study Types and Purposes
5.5. Critical Reflections on Cited Studies
5.6. Future Research Directions
5.6.1. Interpretable and Explainable AI
5.6.2. Real-Time Accident Prediction
5.6.3. Hybrid Models
5.6.4. Handling Unbalanced Data Sets
5.6.5. Privacy and Data Security
5.6.6. Human–Machine Interfaces for Safety Professionals
5.6.7. Regulatory Implications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Full Form |
---|---|
ACARS | Aircraft Communications Addressing and Reporting System |
ADTs | Alternating Decision Trees |
AI | Artificial Intelligence |
ASN | Aviation Safety Network |
ASR | Automatic Speech Recognition |
ASRS | Aviation Safety Reporting System |
BMR | Bayesian Multi-label Regression |
BN | Bayesian Network |
CNN | Convolutional Neural Network |
DBN | Deep Belief Network |
DL | Deep Learning |
DNN | Deep Neural Network |
DT | Decision Tree |
FAA | Federal Aviation Administration |
GB | Gradient Boosting |
ICAO | International Civil Aviation Organization |
KNN | K-Nearest Neighbors |
LDA | Latent Dirichlet Allocation |
LR | Logistic Regression |
LSA | Latent Semantic Analysis |
LSTM | Long Short-Term Memory |
ML | Machine Learning |
NB | Naive Bayes |
NN | Neural Network |
NLP | Natural Language Processing |
NTSB | National Transportation Safety Board |
PCA | Principal Component Analysis |
RF | Random Forest |
RNN | Recurrent Neural Network |
RST | Rough Set Theory |
SBA | State-Based Approach |
SVDA | Support Vector Discriminant Analysis |
SVM | Support Vector Machine |
Appendix A
ID | Reference | Title | Algorithms Used | Number of Accidents (Period) | ML Tasks | Data Source |
---|---|---|---|---|---|---|
1. | [33] | Setting up new standards in the aviation industry with the help of artificial intelligence—machine learning application | SVM, RF, NB | 3000 (20 years) | Classification, Anomaly Detection | ASRS |
2. | [34] | Prediction of warning level in aircraft accidents using data mining techniques | DT, KNN, SVM, NN and NB, PCA | - (42 years) | Classification, Data reduction | Airline databases |
3. | [64] | Prediction of Warning Level in Aircraft Accidents Using Classification Techniques: An Empirical Study | DT, KNN, SVM, NN, NB | 500 (62 years) | Classification | FAA |
4. | [35] | Large-scale data analysis on aviation accident database using different data mining techniques | DT, NB, SVM, KNN, NN | | Clustering | |
5. | [100] | Feature selection techniques for prediction of warning level in aircraft accidents | PCA, DT | 500 (62 years) | Classification | FAA |
6. | [101] | Enabling the Discovery of Recurring Anomalies in Aerospace Problem Reports using High-Dimensional Clustering Techniques | SVM, NB, LDA, LR, ADT | 62 (-) | Clustering | ASRS |
7. | [46] | Text mining of accident reports using semi-supervised keyword extraction and topic modeling | LDA | 37,678 (9 years) | Data Reduction | ASRS, PHMSA |
8. | [102] | An Ensemble Machine and Deep Learning Model for Risk Prediction in Aviation Systems | DL, SVM, NB | 6 (12 years) | Classification | ASRS |
9. | [103] | Analyzing Aviation Safety Reports: From Topic Modeling to Scalable Multi-Label Classification. | LDA, BMR | 66,309 (-) | Classification | ASRS |
10. | [104] | Knowledge Graph–Deep Learning: A Case Study in Question Answering in the Aviation Safety Domain | DT | 4000 (53 years) | NLP | NTSB |
11. | [105] | Deep learning for extracting word-level meaning from safety report narratives | DT | 186,000 (39 years) | NLP | ASRS |
12. | [106] | A state-based approach to modeling general aviation accidents | - | 6180 (33 years) | Classification | NTSB |
13. | [107] | Natural Language Processing of Aviation Safety Reports to Identify Inefficient Operational Patterns | DT | 4195 (23 years) | Anomaly Detection | ASRS |
14. | [108] | Predicting General Aviation Accidents Using Machine Learning Algorithms | DT, NN, RF, LR, GB | 27,786 (20 years) | Classification | NTSB |
15. | [86] | Augmenting topic findings in the NASA aviation safety reporting system using topic modeling | LDA | 100 (3 years) | NLP | ASRS |
16. | [109] | Incorporation of Pilot Factors into Risk Analysis of Civil Aviation Accidents from 2008 to 2020: A Data-Driven Bayesian Network Approach. | BN | 163 (12 years) | Classification | NTSB |
17. | [110] | Multi-concept document classification using a perceptron-like algorithm. | Perceptron-like algorithm | - | Classification | ASRS |
18. | [66] | Computational Solution to Prevent Aeronautic Accidents Cause by Wake Turbulence Using Machine Learning | KNN, NN, DT, NB | (1 year) | Anomaly Detection | EUROCONTROL |
19. | [68] | Deep learning-based approach for civil aircraft hazard identification and prediction. | RNN, KNN, LSTM, NN | 1244 (2 years) | Anomaly Detection | ACARS |
20. | [111] | Machine learning for helicopter accident analysis using supervised classification: Inference, prediction, and implications | KNN, DT, ADT, RF, NB, DNN | 13,055 (11 years) | Classification | NTSB |
21. | [83] | Advanced text mining algorithms for aerospace anomaly identification. | NN, NB, ADT, LDA, LR, SVM | 9910 (24 years) | Anomaly Detection | ASRS |
22. | [37] | The effect of COVID-19 on self-reported safety incidents in aviation: An examination of the heterogeneous effects using causal machine learning | RF | 7246 (2 years) | Anomaly Detection | ASRS |
23. | [112] | Hybrid safety analysis method based on SVM and RST: An application to carrier landing of aircraft | SVM, RST | 635 | Classification | NADC |
24. | [113] | Textual indicator extraction from aviation accident reports | SVM, DNN | 61,687 (35 years) | NLP | NTSB |
25. | [90] | Civil aviation safety evaluation based on deep belief network and principal component analysis. | DBN, PCA | 0 (5 years) | Classification, Data Reduction | |
26. | [114] | A hybrid data-driven approach to analyzing aviation incident reports | DNN, SVM | 64,573 (11 years) | NLP, Classification | ASRS |
27. | [9] | Ensemble machine learning models for aviation incident risk prediction | DNN, SVM | 64,573 (11 years) | Classification | ASRS |
28. | [115] | Bayesian network modeling of accident investigation reports for aviation safety assessment. | BN | 2243 (24 years) | | NTSB |
29. | [116] | Classification of aviation safety reports using machine learning. | RF, SVM, KNN, NN, NB | 73,000 (5 years) | Classification | ICAO |
30. | [117] | The analysis of fatal aviation accidents more than 100 dead passengers: an application of machine learning | SVM, NN, PCA, LR, DT | 220 | Classification, Data Reduction | ICAO |
31. | [118] | Semi-supervised learning with semantic knowledge extraction for improved speech recognition in air traffic control | DNN | 6004 | NLP | ASR |
32. | [119] | Predicting airline crash due to bird strike using machine learning | DT, KNN, NB | - | Classification | NTSB |
33. | [120] | Deep learning-based Time Series Forecasting of Go-around Incidents in the National Airspace System | LSTM | 3835 (24 years) | Classification | ASRS |
34. | [67] | An Innovative Approach to Modeling Aviation Safety Incidents | CNN, LSTM, SVM, RF, NB, LR | 158,070 | Classification | ASRS |
35. | [76] | Flight crash investigation using data mining techniques | K-Means | 5268 (101 years) | Clustering | - |
36. | [121] | Applying Distilled BERT for Question Answering on ASRS Reports | BERT | 1,625,738 (43 years) | NLP | ASRS |
37. | [122] | Application of Machine Learning to mapping Primary Causal Factors in self-reported safety narratives | LSA | 7484 (4 years) | NLP | ASRS |
38. | [123] | Visual representation of safety narratives | LSA | 4497 (2 years) | NLP | ASRS |
39. | [124] | Temporal topic modeling applied to aviation safety reports: A subject matter expert review | LDA | 64,776 (14 years) | Clustering | ASRS |
40. | [38] | Data mining application on aviation accident data for predicting topmost causes of accidents | NN, SVM, KNN, DT, NB | 1610 (95 years) | Classification | FAA |
41. | [82] | Application of machine learning techniques for incident-accident classification problem in aviation safety management | SVM, NB, DT, RF | 84,262 (57 years) | Classification | NTSB |
42. | [70] | Learning Methods and Predictive Modeling to Identify Failure by Human Factors in the Aviation Industry. | NN, RF | 1105 (10 years) | Classification | ASN |
43. | [75] | Application of structural topic modeling to aviation safety data. | LDA | 386 (8 years) | NLP | ASRS, NTSB |
44. | [87] | Natural language processing-based method for clustering and analysis of aviation safety narratives. | PCA, K-Means | 13,336 (10 years) | Clustering | ASRS |
45. | [125] | A textual analysis of dangerous goods incidents on aircraft. | SVDA | 383 (10 years) | NLP | ASRS |
46. | [65] | Prediction of injuries and fatalities in aviation accidents through machine learning | DT, KNN, SVM, NN | 31,974 (27 years) | Classification | FAA |
47. | [126] | Prediction of aviation accidents using logistic regression model. | LR | 7415 | Classification | ASN |
48. | [127] | Airline Safety Data: How Predictable Are Accidents and Fatalities? | NN | 10 (61 years) | Classification | FAA |
49. | [88] | Aircraft safety analysis using clustering algorithms | K-Means | 1500 (25 years) | Clustering | - |
50. | [128] | Analysis of General Aviation fixed-wing aircraft accidents involving inflight loss of control using a state-based approach | SBA | 5726 (18 years) | Clustering | NTSB |
51. | [129] | Apriori algorithm for association rules mining in aircraft runway excursions. | AR | 434 (10 years) | Classification | ASN |
52. | [73] | Using correlation-based subspace clustering for multi-label text data classification | KNN | 15,000 | Classification | ASRS, Reuters, 20 Newsgroups |
53. | [130] | A model fusion strategy for identifying aircraft risk using CNN and Att-BiLSTM | CNN, BLSTM, DNN | 32 (10 years) | Classification | ASRS |
54. | [41] | Using Neural Networks to predict HFACS unsafe acts from the pre-conditions of unsafe acts | NN | 523 (24 years) | Classification | ROC |
55. | [40] | A data-mining approach to identification of risk factors in safety management systems | NB | 168,227 (24 years) | Classification | ASRS |
56. | [131] | Causes and risk factors for fatal accidents in non-commercial twin engine piston general aviation aircraft. | LR | 376 (10 years) | Classification | NTSB |
57. | [132] | Examination of Aircraft Accidents That Occurred in the Last 20 Years in the World. | KNN, NB, DT, LR, GBM | 588 (20 years) | Classification | - |
58. | [85] | Classification of aviation accidents using data mining algorithms | DT, NB, SMO | 588 (20 years) | Classification | - |
59. | [93] | Predictive safety analytics: Inferring aviation accident shaping factors and causation | BN | 315 (23 years) | Classification | NTSB |
60. | [133] | Analysis of Helicopter Accidents and Certification Categories Using Machine Learning. | RF, DT | 1576 (10 years) | Classification | NTSB |
61. | [84] | Application of machine learning for aviation safety risk metric | GBM, RNN, SVM | 10,634 (20 years) | Classification | NTSB, MOR, ASIAS |
62. | [134] | Descriptive and predictive analyses of data representing aviation accidents. | DT, KNN, RF | 25,000 (4 years) | Classification | FAA |
63. | [135] | Flight Accident Modeling and Predicting Based on Least Squares Support Vector Machine | SVM | 40 years | Classification | NTSB |
64. | [55] | A novel method of aviation safety prediction based on Lstm-Rbf model | LSTM, RBF | 1 year | Classification | |
65. | [136] | On the chaos analysis and prediction of aircraft accidents based on multi-timescales | SVM | 59,511 (55 years) | Classification | NTSB |
66. | [137] | Critical parameter identification for safety events in commercial aviation using machine learning. | NB, RF, DT, KNN | 70 (6 years) | Classification | FOQA |
67. | [138] | PIA Accidents Analysis Using Naïve Bayes Classifier | NB | 22 (6 years) | Classification | - |
68. | [139] | Failing &! Falling (F&! F): Learning to Classify Accidents and Incidents in Aircraft Data | DT, NN | 137,236 | Classification | FAA |
69. | [140] | Subjectivity classification and analysis of the ASRS corpus | SVM, ADT | 140,599 (2 years) | Classification | ASRS |
70. | [141] | Using random forests to diagnose aviation turbulence | RF, KNN, LR | 778 (2 years) | Classification | NTSB |
71. | [142] | Understanding general aviation accidents in terms of safety systems. | - | 2303 (10 years) | Classification | NTSB |
72. | [143] | Using Machine Learning Models to Study Human Error-Related Factors in Aviation Accidents and Incidents | NB, RF, LR, SVM, NN | 90,000 (47 years) | Classification | NTSB |
73. | [144] | Using structural topic modeling to identify latent topics and trends in aviation incident reports. | LDA | 25,706 (5 years) | NLP | ASRS |
74. | [145] | Automated aviation occurrences categorization | NN | 12,500 (6 years) | Classification | ASRS |
75. | [89] | Understanding large text corpora via sparse machine learning. | PCA, LDA, LASSO | 20,000 (4 years) | NLP | ASRS |
76. | [146] | Sparse machine learning methods for understanding large text corpora. | PCA, LDA, LASSO | 20,000 (4 years) | NLP | ASRS |
77. | [50] | Text Mining Classification and Prediction of Aviation Accidents Based on TF-IDF-SVR Method. | LDA, SVM, NB, RF, LR | 20,000 (59 years) | NLP, Classification | NTSB |
78. | [147] | Cause identification from aviation safety incident reports via weakly supervised semantic lexicon construction | SVM | 140,599 (9 years) | Classification | ASRS |
79. | [148] | Analysis of Aviation Accidents Data. | RF, NB, KNN, DT, GBT | 19,455 (14 years) | Classification | NTSB |
80. | [149] | Document classification using nonnegative matrix factorization and underapproximation. | NMF | 21,519 (1 year) | Clustering | ASRS |
81. | [150] | Towards online prediction of safety-critical landing metrics in aviation using supervised machine learning | LSTM, NN, RF | 623 | Regression | FOQA |
82. | [151] | Identifying Incident Causal Factors to Improve Aviation Transportation Safety: Proposing a Deep Learning Approach | LSTM | 200,000 (32 years) | Classification | ASRS |
83. | [69] | Recent Experiences with Data Mining in Aviation Safety | DT | 1256 (9 years) | Classification | ASRS |
84. | [152] | Safer Approaches and Landings: A Multivariate Analysis of Critical Factors | DT, LR | 287 (16 years) | Classification | NTSB, ASRS |
85. | [74] | Multi-label asrs dataset classification using semi-supervised subspace clustering | KNN | 10,000 | Clustering | ASRS, Reuters, 20 Newsgroups |
86. | [153] | Sequential Classification of Aviation Safety Occurrences with Natural Language Processing. | LSTM, BLSTM, GRU, RNN | 27,000 (15 years) | Classification | NTSB |
87. | [57] | Classification of Survivor/Non-Survivor Passengers in Fatal Aviation Accidents: A Machine Learning Approach | NN, DT, PCA | 100 (1 year) | Classification, Data Reduction | BAAA |
References
- Dileep, M.R.; Kurien, A. Air Transport and Tourism: Interrelationship, Operations and Strategies; Routledge: London, UK, 2021.
- Netjasov, F.; Janic, M. A review of research on risk and safety modelling in civil aviation. J. Air Transp. Manag. 2008, 14, 213–220.
- Aderibigbe, A. Root cause analysis of a jet fuel tanker accident. Int. J. Appl. Eng. Res. 2017, 12, 14974–14983.
- Tanguy, L.; Tulechki, N.; Urieli, A.; Hermann, E.; Raynal, C. Natural language processing for aviation safety reports: From classification to interactive analysis. Comput. Ind. 2016, 78, 80–95.
- Wiener, E.L.; Nagel, D.C. Human Factors in Aviation; Gulf Professional Publishing: Woburn, MA, USA, 1988.
- Janakiraman, V.M.; Nielsen, D. Anomaly detection in aviation data using extreme learning machines. In Proceedings of the IEEE 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 1993–2000.
- Vasigh, B.; Fleming, K.; Tacker, T. Introduction to Air Transport Economics: From Theory to Applications; Routledge: London, UK, 2018.
- Wild, G.; Baxter, G.; Srisaeng, P.; Richardson, S. Machine learning for air transport planning and management. In Proceedings of the AIAA Aviation 2022 Forum, Chicago, IL, USA, 27 June–1 July 2022; p. 3706.
- Zhang, X.; Mahadevan, S. Ensemble machine learning models for aviation incident risk prediction. Decis. Support Syst. 2019, 116, 48–63.
- Ladenbauer, S. European Union Policymaking in the Field of Air Traffic Management: The Endeavor to Implement Functional Airspace Blocks in Light of Fragmented National Interests: A Case Study on the Functional Airspace Block Europe Central (FABEC); University of Zurich: Zurich, Switzerland, 2012.
- Verma, S.; Kumar, P. A Comparative Overview of Accident Forecasting Approaches for Aviation Safety. J. Phys. Conf. Ser. 2021, 1767, 012015.
- Weber, L. International Civil Aviation Organization; ICAO: Montreal, QC, Canada, 2023.
- Kasula, B.Y. Machine Learning Unleashed: Innovations, Applications, and Impact Across Industries. Int. Trans. Artif. Intell. 2017, 1, 1–7. [Google Scholar]
- Ongsulee, P. Artificial intelligence, machine learning and deep learning. In Proceedings of the IEEE 2017 15th International Conference on ICT and Knowledge Engineering (ICT&KE), Bangkok, Thailand, 22–24 November 2017; pp. 1–6. [Google Scholar]
- Mohammadpour, A.; Karan, E.; Asadi, S. Artificial intelligence techniques to support design and construction. In Proceedings of the International Symposium on Automation and Robotics in Construction (ISARC), Banff, AB, Canada, 21–24 May 2019; IAARC Publications: Montreal, QC, Canada, 2019; Volume 36, pp. 1282–1289. [Google Scholar]
- Cody, T.; Lanus, E.; Doyle, D.D.; Freeman, L. Systematic training and testing for machine learning using combinatorial interaction testing. In Proceedings of the 2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Valencia, Spain, 4–13 April 2022; pp. 102–109. [Google Scholar]
- Lanus, E.; Freeman, L.J.; Kuhn, D.R.; Kacker, R.N. Combinatorial testing metrics for machine learning. In Proceedings of the 2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Porto de Galinhas, Brazil, 12–16 April 2021; pp. 81–84. [Google Scholar]
- Kang, Z.; Catal, C.; Tekinerdogan, B. Machine learning applications in production lines: A systematic literature review. Comput. Ind. Eng. 2020, 149, 106773. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Friedman, J. Overview of supervised learning. In The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; pp. 9–41. [Google Scholar]
- Ibrahim, K.; Sorayya, M.; Aziida, N.; Sazzli, S.K. Preliminary study on application of machine learning method in predicting survival versus non-survival after myocardial infarction in Malaysian population. Int. J. Cardiol. 2018, 273, 8. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Friedman, J. Unsupervised learning. In The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; pp. 485–585. [Google Scholar]
- Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4. [Google Scholar]
- Chapelle, O.; Scholkopf, B.; Zien, A. Semi-supervised learning (Chapelle, O. et al., Eds.; 2006) [Book reviews]. IEEE Trans. Neural Netw. 2009, 20, 542. [Google Scholar] [CrossRef]
- Wiering, M.A.; Van Otterlo, M. Reinforcement learning. Adapt. Learn. Optim. 2012, 12, 729. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013; Volume 112. [Google Scholar]
- Alpaydin, E. Introduction to Machine Learning; MIT Press: Cambridge, MA, USA, 2020. [Google Scholar]
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58. [Google Scholar] [CrossRef]
- Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
- Meurers, D. Natural language processing and language learning. Encycl. Appl. Linguist. 2012, 10, 4193–4205. [Google Scholar] [CrossRef]
- Bayat, B.; Bermejo-Alonso, J.; Carbonera, J.; Facchinetti, T.; Fiorini, S.; Goncalves, P.; Jorge, V.A.; Habib, M.; Khamis, A.; Melo, K.; et al. Requirements for building an ontology for autonomous robots. Ind. Robot. Int. J. 2016, 43, 469–480. [Google Scholar] [CrossRef]
- Mahmud, F. Human-Intelligence and Machine-Intelligence Decision Governance Formal Ontology; Old Dominion University: Norfolk, VA, USA, 2018. [Google Scholar]
- Andrei, A.; Balasa, R.; Semenescu, A. Setting up new standards in aviation industry with the help of artificial intelligent–machine learning application. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; p. 012014. [Google Scholar]
- Christopher, A.A.; alias Balamurugan, S.A. Prediction of warning level in aircraft accidents using data mining techniques. Aeronaut. J. 2014, 118, 935–952. [Google Scholar] [CrossRef]
- Christopher, A.A.; Vivekanandam, V.S.; Anderson, A.A.; Markkandeyan, S.; Sivakumar, V. Large-scale data analysis on aviation accident database using different data mining techniques. Aeronaut. J. 2016, 120, 1849–1866. [Google Scholar] [CrossRef]
- Malek, S.; Hui, C.; Aziida, N.; Cheen, S.; Toh, S.; Milow, P. Ecosystem monitoring through predictive modeling. Encycl. Bioinform. Comput. Biol. 2019, 3, 1–8. [Google Scholar] [CrossRef]
- Choi, Y.; Gibson, J.R. The effect of COVID-19 on self-reported safety incidents in aviation: An examination of the heterogeneous effects using causal machine learning. J. Saf. Res. 2023, 84, 393–403. [Google Scholar] [CrossRef] [PubMed]
- Koteeswaran, S.; Malarvizhi, N.; Kannan, E.; Sasikala, S.; Geetha, S. Data mining application on aviation accident data for predicting topmost causes for accidents. Clust. Comput. 2019, 22, 11379–11399. [Google Scholar] [CrossRef]
- Minh, D.; Wang, H.X.; Li, Y.F.; Nguyen, T.N. Explainable artificial intelligence: A comprehensive review. Artif. Intell. Rev. 2022, 55, 3503–3568. [Google Scholar] [CrossRef]
- Shi, D.; Guan, J.; Zurada, J.; Manikas, A. A data-mining approach to identification of risk factors in safety management systems. J. Manag. Inf. Syst. 2017, 34, 1054–1081. [Google Scholar] [CrossRef]
- Harris, D.; Li, W.-C. Using Neural Networks to predict HFACS unsafe acts from the pre-conditions of unsafe acts. Ergonomics 2019, 62, 181–191. [Google Scholar] [CrossRef]
- Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227. [Google Scholar] [CrossRef]
- Aziida, N.; Malek, S.; Aziz, F.; Ibrahim, K.S.; Kasim, S. Predicting 30-day mortality after an acute coronary syndrome (ACS) using machine learning methods for feature selection, classification and visualisation. Sains Malays. 2021, 50, 753–768. [Google Scholar] [CrossRef]
- Nanyonga, A.; Joiner, K.; Turhan, U.; Wild, G. Does the Choice of Topic Modeling Technique Impact the Interpretation of Aviation Incident Reports? A Methodological Assessment. Technologies 2025, 13, 209. [Google Scholar] [CrossRef]
- Nanyonga, A.; Wild, G. Analyzing Aviation Safety Narratives with LDA, NMF and PLSA: A Case Study Using Socrata Datasets. arXiv 2025, arXiv:2501.01690. [Google Scholar] [CrossRef]
- Ahadh, A.; Binish, G.V.; Srinivasan, R. Text mining of accident reports using semi-supervised keyword extraction and topic modeling. Process. Saf. Environ. Prot. 2021, 155, 455–465. [Google Scholar] [CrossRef]
- Nanyonga, A.; Wasswa, H.; Wild, G. Topic Modeling Analysis of Aviation Accident Reports: A Comparative Study between LDA and NMF Models. In Proceedings of the IEEE 2023 3rd International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India, 29–31 December 2023; pp. 1–2. [Google Scholar]
- Nanyonga, A.; Wasswa, H.; Turhan, U.; Joiner, K.; Wild, G. Comparative Analysis of Topic Modeling Techniques on ATSB Text Narratives Using Natural Language Processing. In Proceedings of the IEEE 2024 3rd International Conference for Innovation in Technology (INOCON), Bangalore, India, 1–3 March 2024; pp. 1–7. [Google Scholar]
- Nanyonga, A.; Joiner, K.; Turhan, U.; Wild, G. Applications of natural language processing in aviation safety: A review and qualitative analysis. In Proceedings of the AIAA SCITECH 2025 Forum, Orlando, FL, USA, 6–10 January 2025; p. 2153. [Google Scholar]
- Zhao, L.; Zhang, L.; Wang, J. Text Mining Classification and Prediction of Aviation Accidents Based on TF-IDF-SVR Method. In Proceedings of the IEEE 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2–4 December 2022; pp. 322–327. [Google Scholar]
- Nanyonga, A.; Wasswa, H.; Wild, G. Comparative Study of Deep Learning Architectures for Textual Damage Level Classification. In Proceedings of the IEEE 2024 11th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 21–22 March 2024; pp. 421–426. [Google Scholar]
- Nanyonga, A.; Wild, G. Classification of Operational Records in Aviation Using Deep Learning Approaches. In Proceedings of the 2025 International Conference on Pervasive Computational Technologies (ICPCT), Greater Noida, India, 8–9 February 2025. [Google Scholar]
- Nanyonga, A.; Wasswa, H.; Wild, G. Aviation Safety Enhancement via NLP & Deep Learning: Classifying Flight Phases in ATSB Safety Reports. In Proceedings of the IEEE 2023 Global Conference on Information Technologies and Communications (GCITC), Bangalore, India, 1–3 December 2023; pp. 1–5. [Google Scholar]
- Van Houdt, G.; Mosquera, C.; Nápoles, G. A review on the long short-term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
- Zeng, H.; Ren, B.; Zhang, H.; Wu, J.; Liu, C.; Ren, H. A novel method of aviation safety prediction based on LSTM-RBF model. In Proceedings of the 12th International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering (QR2MSE 2022), Emeishan, China, 27–30 July 2022; pp. 1592–1598. [Google Scholar]
- Greenacre, M.; Groenen, P.J.; Hastie, T.; d’Enza, A.I.; Markos, A.; Tuzhilina, E. Principal component analysis. Nat. Rev. Methods Primers 2022, 2, 100. [Google Scholar] [CrossRef]
- İnan, D.; Tolga, T. Classification of Survivor/Non-Survivor Passengers in Fatal Aviation Accidents: A Machine Learning Approach. Int. J. Aviat. Aeronaut. Aerosp. 2022, 9, 8. [Google Scholar] [CrossRef]
- Kitchenham, B.; Brereton, O.P.; Budgen, D.; Turner, M.; Bailey, J.; Linkman, S. Systematic literature reviews in software engineering–a systematic literature review. Inf. Softw. Technol. 2009, 51, 7–15. [Google Scholar] [CrossRef]
- Van Dinter, R.; Tekinerdogan, B.; Catal, C. Automation of systematic literature reviews: A systematic literature review. Inf. Softw. Technol. 2021, 136, 106589. [Google Scholar] [CrossRef]
- Chadegani, A.A.; Salehi, H.; Yunus, M.M.; Farhadi, H.; Fooladi, M.; Farhadi, M.; Ebrahim, N.A. A comparison between two main academic literature collections: Web of Science and Scopus databases. arXiv 2013, arXiv:1305.0377. [Google Scholar] [CrossRef]
- Burnham, J.F. Scopus database: A review. Biomed. Digit. Libr. 2006, 3, 1–8. [Google Scholar] [CrossRef]
- Khallaf, R.; Khallaf, M. Classification and analysis of deep learning applications in construction: A systematic literature review. Autom. Constr. 2021, 129, 103760. [Google Scholar] [CrossRef]
- Slikboer, R.; Muir, S.D.; Silva, S.S.M.; Meyer, D. A systematic review of statistical models and outcomes of predicting fatal and serious injury crashes from driver crash and offense history data. Syst. Rev. 2020, 9, 1–15. [Google Scholar] [CrossRef] [PubMed]
- Arockia Christopher, A.; Appavu alias Balamurugan, S. Prediction of warning level in aircraft accidents using classification techniques: An empirical study. In Intelligent Computing, Networking, and Informatics, Proceedings of the International Conference on Advanced Computing, Networking, and Informatics, Chhattisgarh, India, 12–14 June 2013; Springer: New York, NY, USA, 2014; pp. 1217–1223. [Google Scholar]
- Burnett, R.A.; Si, D. Prediction of injuries and fatalities in aviation accidents through machine learning. In Proceedings of the International Conference on Compute and Data Analysis, Lakeland, FL, USA, 19–23 May 2017; pp. 60–68. [Google Scholar]
- Leite, D.V.; Weigang, L.; Barreto, A.B.; Crespo, A.M. Computational Solution to Prevent Aeronautics Accidents Cause by Wake Turbulence Using Machine Learning. In Proceedings of the IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 124–129. [Google Scholar]
- Shi, D.; Cao, S.; Zurada, J.; Guan, J. An Innovative Approach to Modeling Aviation Safety Incidents. In Proceedings of the 55th Hawaii International Conference on System Sciences, Maui, HI, USA, 4–7 January 2022. [Google Scholar]
- Zhou, D.; Zhuang, X.; Zuo, H.; Wang, H.; Yan, H. Deep learning-based approach for civil aircraft hazard identification and prediction. IEEE Access 2020, 8, 103665–103683. [Google Scholar] [CrossRef]
- Harris, E.; Bloedorn, E.; Rothleder, N.; Chaudhuri, S.; Dayal, U. Recent experiences with data mining in aviation safety. In Proceedings of the Special Interest Group on Management of Data, Data Mining and Knowledge Discovery (SIGMOD-DMKD) Workshop, Seattle, WA, USA, 2–4 June 1998. [Google Scholar]
- Nogueira, R.P.; Melicio, R.; Valério, D.; Santos, L.F. Learning methods and predictive modeling to identify failure by human factors in the aviation industry. Appl. Sci. 2023, 13, 4069. [Google Scholar] [CrossRef]
- Dhruv, A.J.; Patel, R.; Doshi, N. Python: The most advanced programming language for computer science applications. In Proceedings of the International Conference on Culture Heritage, Education, Sustainable Tourism, and Innovation Technologies (CESIT 2020), Online, 17 September 2020; pp. 292–299. [Google Scholar]
- Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18. [Google Scholar] [CrossRef]
- Ahmed, M.S.; Khan, L.; Rajeswari, M. Using correlation based subspace clustering for multi-label text data classification. In Proceedings of the 2010 22nd IEEE International Conference on Tools with Artificial Intelligence, Arras, France, 27–29 October 2010; pp. 296–303. [Google Scholar]
- Ahmed, M.S.; Khan, L.; Oza, N.C.; Rajeswari, M. Multi-label ASRS Dataset Classification Using Semi Supervised Subspace Clustering. In Proceedings of the CIDU, Mountain View, CA, USA, 5–6 October 2010; pp. 285–299. [Google Scholar]
- Rose, R.L.; Puranik, T.G.; Mavris, D.N.; Rao, A.H. Application of structural topic modeling to aviation safety data. Reliab. Eng. Syst. Saf. 2022, 224, 108522. [Google Scholar] [CrossRef]
- Sharma, S.; Sabitha, A.S. Flight crash investigation using data mining techniques. In Proceedings of the IEEE 2016 1st India International Conference on Information Processing (IICIP), Delhi, India, 12–14 August 2016; pp. 1–7. [Google Scholar]
- Paul, S.; Purkaystha, B.S.; Das, P. NLP Tools Used in Civil Aviation: A Survey. Int. J. Adv. Res. Comput. Sci. 2018, 9, 109–114. [Google Scholar] [CrossRef]
- Pimm, C.; Raynal, C.; Tulechki, N.; Hermann, E.; Caudy, G.; Tanguy, L. Natural Language Processing (NLP) tools for the analysis of incident and accident reports. In Proceedings of the International Conference on Human-Computer Interaction in Aerospace (HCI-Aero), Brussels, Belgium, 13 September 2012. [Google Scholar]
- Ackley, J.L.; Puranik, T.G.; Mavris, D. A supervised learning approach for safety event precursor identification in commercial aviation. In Proceedings of the AIAA Aviation 2020 Forum, Virtual, 15–19 June 2020; p. 2880. [Google Scholar]
- Nanyonga, A.; Joiner, K.; Turhan, U.; Wild, G. Semantic Topic Modeling of Aviation Safety Reports: A Comparative Analysis Using BERTopic and PLSA. Aerospace 2025, 12, 551. [Google Scholar] [CrossRef]
- Basora, L.; Olive, X.; Dubot, T. Recent advances in anomaly detection methods applied to aviation. Aerospace 2019, 6, 117. [Google Scholar] [CrossRef]
- Rukabu, O. Application of Machine Learning Techniques for Incident-Accident Classification Problem in Aviation Safety Management. Master’s Thesis, University of Rwanda, Kigali, Rwanda, 2021. [Google Scholar]
- Bluvband, Z.; Porotsky, S. Advanced Text Mining Algorithms for Aerospace Anomaly Identification; Taylor & Francis Group: Boca Raton, FL, USA, 2012. [Google Scholar]
- Bati, F.; Withington, L. Application of machine learning for aviation safety risk metric. In Proceedings of the 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), Mission Bay, the Hilton San Diego Resort and Spa, San Diego, CA, USA, 8–12 September 2019; pp. 1–9. [Google Scholar]
- Kuşkapan, E.; Sahraei, M.A.; Çodur, M.Y. Classification of aviation accidents using data mining algorithms. Balk. J. Electr. Comput. Eng. 2021, 10, 10–15. [Google Scholar] [CrossRef]
- Paradis, C.; Kazman, R.; Davies, M.; Hooey, B. Augmenting topic finding in the NASA Aviation Safety Reporting System using topic modeling. In Proceedings of the AIAA Scitech 2021 Forum, Virtual, 11–15 and 19–21 January 2021; p. 1981. [Google Scholar]
- Rose, R.L.; Puranik, T.G.; Mavris, D.N. Natural language processing based method for clustering and analysis of aviation safety narratives. Aerospace 2020, 7, 143. [Google Scholar] [CrossRef]
- Čokorilo, O.; De Luca, M.; Dell’Acqua, G. Aircraft safety analysis using clustering algorithms. J. Risk Res. 2014, 17, 1325–1340. [Google Scholar] [CrossRef]
- El Ghaoui, L.; Pham, V.; Li, G.C.; Duong, V.A.; Srivastava, A.; Bhaduri, K. Understanding large text corpora via sparse machine learning. Stat. Anal. Data Mining ASA Data Sci. J. 2013, 6, 221–242. [Google Scholar] [CrossRef]
- Ni, X.; Wang, H.; Che, C.; Hong, J.; Sun, Z. Civil aviation safety evaluation based on deep belief network and principal component analysis. Saf. Sci. 2019, 112, 90–95. [Google Scholar] [CrossRef]
- Nanyonga, A.; Wild, G. Impact of Dataset Size & Data Source on Aviation Safety Incident Prediction Models with Natural Language Processing. In Proceedings of the IEEE 2023 Global Conference on Information Technologies and Communications (GCITC), Bangalore, India, 1–3 December 2023; pp. 1–7. [Google Scholar]
- Nanyonga, A.; Joiner, K.; Turhan, U.; Wild, G. Natural Language Processing for Aviation Safety: Predicting Injury Levels from Incident Reports in Australia. Modelling 2025, 6, 40. [Google Scholar] [CrossRef]
- Ancel, E.; Shih, A.T.; Jones, S.M.; Reveley, M.S.; Luxhøj, J.T.; Evans, J.K. Predictive safety analytics: Inferring aviation accident shaping factors and causation. J. Risk Res. 2015, 18, 428–451. [Google Scholar] [CrossRef]
- Kilkenny, M.F.; Robinson, K.M. Data quality: “Garbage in–garbage out”. Health Inf. Manag. J. 2018, 47, 103–105. [Google Scholar] [CrossRef]
- Nanyonga, A.; Wasswa, H.; Joiner, K.; Turhan, U.; Wild, G. A Multi-Head Attention-Based Transformer Model for Predicting Causes in Aviation Incident. Modelling 2025, 6, 27. [Google Scholar] [CrossRef]
- Nanyonga, A.; Wasswa, H.; Joiner, K.; Turhan, U.; Wild, G. Explainable Supervised Learning Models for Aviation Predictions in Australia. Aerospace 2025, 12, 223. [Google Scholar] [CrossRef]
- Thai-Nghe, N.; Nghi, D.; Schmidt-Thieme, L. Learning optimal threshold on resampling data to deal with class imbalance. In Proceedings of the IEEE RIVF International Conference on Computing and Telecommunication Technologies, Hanoi, Vietnam, 1–4 November 2010; pp. 71–76. [Google Scholar]
- Dwork, C. Differential privacy. In Proceedings of the International Colloquium on Automata, Languages, and Programming, Venice, Italy, 10–14 July 2006; pp. 1–12. [Google Scholar]
- Yao, A.C. Protocols for secure computations. In Proceedings of the IEEE 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), Chicago, IL, USA, 3–5 November 1982; pp. 160–164. [Google Scholar]
- Christopher, A.A.; alias Balamurugan, S.A. Feature selection techniques for prediction of warning level in aircraft accidents. In Proceedings of the IEEE 2013 International Conference on Advanced Computing and Communication Systems, Coimbatore, India, 19–21 December 2013; pp. 1–6. [Google Scholar]
- Srivastava, A.N. Enabling the discovery of recurring anomalies in aerospace problem reports using high-dimensional clustering techniques. In Proceedings of the 2006 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2006; p. 17. [Google Scholar]
- Alkhamisi, A.O.; Mehmood, R. An ensemble machine and deep learning model for risk prediction in aviation systems. In Proceedings of the IEEE 2020 6th Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 4–5 March 2020; pp. 54–59. [Google Scholar]
- Agovic, A.; Shan, H.; Banerjee, A. Analyzing Aviation Safety Reports: From Topic Modeling to Scalable Multi-Label Classification. In Proceedings of the CIDU, Mountain View, CA, USA, 5–6 October 2012; pp. 83–97. [Google Scholar]
- Agarwal, A.; Gite, R.; Laddha, S.; Bhattacharyya, P.; Kar, S.; Ekbal, A.; Thind, P.; Zele, R.; Shankar, R. Knowledge graph-deep learning: A case study in question answering in aviation safety domain. arXiv 2022, arXiv:2205.15952. [Google Scholar]
- Chanen, A. Deep learning for extracting word-level meaning from safety report narratives. In Proceedings of the IEEE 2016 Integrated Communications Navigation and Surveillance (ICNS), Herndon, VA, USA, 19–21 April 2016; pp. 5D2-1–5D2-15. [Google Scholar]
- Rao, A.H.; Marais, K. A state-based approach to modeling general aviation accidents. Reliab. Eng. Syst. Saf. 2020, 193, 106670. [Google Scholar] [CrossRef]
- Miyamoto, A.; Bendarkar, M.V.; Mavris, D.N. Natural language processing of aviation safety reports to identify inefficient operational patterns. Aerospace 2022, 9, 450. [Google Scholar] [CrossRef]
- Baugh, B.S. Predicting General Aviation Accidents Using Machine Learning Algorithms; Embry-Riddle Aeronautical University: Daytona Beach, FL, USA, 2020. [Google Scholar]
- Zhang, C.; Liu, C.; Liu, H.; Jiang, C.; Fu, L.; Wen, C.; Cao, W. Incorporation of Pilot Factors into Risk Analysis of Civil Aviation Accidents from 2008 to 2020: A Data-Driven Bayesian Network Approach. Aerospace 2022, 10, 9. [Google Scholar] [CrossRef]
- Woolam, C.; Khan, L. Multi-concept document classification using a perceptron-like algorithm. In Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Sydney, NSW, Australia, 9–12 December 2008; pp. 570–574. [Google Scholar]
- Xu, Z.; Saleh, J.H.; Subagia, R. Machine learning for helicopter accident analysis using supervised classification: Inference, prediction, and implications. Reliab. Eng. Syst. Saf. 2020, 204, 107210. [Google Scholar] [CrossRef]
- Dai, Y.; Tian, J.; Rong, H.; Zhao, T. Hybrid safety analysis method based on SVM and RST: An application to carrier landing of aircraft. Saf. Sci. 2015, 80, 56–65. [Google Scholar] [CrossRef]
- Hu, X.; Wu, J.; He, J. Textual indicator extraction from aviation accident reports. In Proceedings of the AIAA Aviation 2019 Forum, Dallas, TX, USA, 17–21 June 2019; p. 2939. [Google Scholar]
- Zhang, X.; Mahadevan, S. A hybrid data-driven approach to analyze aviation incident reports. In Proceedings of the 2018 Aviation Technology, Integration, and Operations Conference, Atlanta, GA, USA, 25–29 June 2018; p. 3982. [Google Scholar]
- Zhang, X.; Mahadevan, S. Bayesian network modeling of accident investigation reports for aviation safety assessment. Reliab. Eng. Syst. Saf. 2021, 209, 107371. [Google Scholar] [CrossRef]
- de Vries, V. Classification of aviation safety reports using machine learning. In Proceedings of the IEEE 2020 International Conference on Artificial Intelligence and Data Analytics for Air Transportation (AIDA-AT), Singapore, 3–4 February 2020; pp. 1–6. [Google Scholar]
- İnan, T.T.; Gökmen İnan, N. The analysis of fatal aviation accidents more than 100 dead passengers: An application of machine learning. Opsearch 2022, 59, 1377–1395. [Google Scholar] [CrossRef]
- Srinivasamurthy, A.; Motlicek, P.; Himawan, I.; Szaszak, G.; Oualil, Y.; Helmke, H. Semi-supervised learning with semantic knowledge extraction for improved speech recognition in air traffic control. In Proceedings of the Interspeech, Stockholm, Sweden, 20–24 August 2017; pp. 2406–2410. [Google Scholar]
- Lakshman, N.; Raj, R.; Mukkamala, Y. Bird strike analysis of jet engine fan blade. In Proceedings of the 2014 IEEE Aerospace Conference, Big Sky, MT, USA, 1–8 March 2014; pp. 1–7. [Google Scholar]
- Subramanian, S.V.; Rao, A.H. Deep-learning based time series forecasting of go-around incidents in the national airspace system. In Proceedings of the 2018 AIAA Modeling and Simulation Technologies Conference, Kissimmee, FL, USA, 8–12 January 2018; p. 0424. [Google Scholar]
- Kierszbaum, S.; Lapasset, L. Applying distilled BERT for question answering on ASRS reports. In Proceedings of the IEEE 2020 New Trends in Civil Aviation (NTCA), Prague, Czech Republic, 23–24 November 2020; pp. 33–38. [Google Scholar]
- Robinson, S.D.; Irwin, W.J.; Kelly, T.K.; Wu, X.O. Application of machine learning to mapping primary causal factors in self reported safety narratives. Saf. Sci. 2015, 75, 118–129. [Google Scholar] [CrossRef]
- Robinson, S. Visual representation of safety narratives. Saf. Sci. 2016, 88, 123–128. [Google Scholar] [CrossRef]
- Robinson, S.D. Temporal topic modeling applied to aviation safety reports: A subject matter expert review. Saf. Sci. 2019, 116, 275–286. [Google Scholar] [CrossRef]
- Walton, R.O.; Marion, J.W. A textual analysis of dangerous goods incidents on aircraft. Transp. Res. Procedia 2020, 51, 152–159. [Google Scholar] [CrossRef]
- Mathur, P.; Khatri, S.K.; Sharma, M. Prediction of aviation accidents using logistic regression model. In Proceedings of the IEEE 2017 International Conference on Infocom Technologies and Unmanned Systems (Trends and Future Directions)(ICTUS), Dubai, United Arab Emirates, 18–20 December 2017; pp. 725–728. [Google Scholar]
- Rogers, P.; Pavur, R. Airline Safety Data: How Predictable Are Accidents and Fatalities? Fed. Bus. Discipl. J. 2019, 8, 19–29. [Google Scholar]
- Majumdar, N.; Marais, K.; Rao, A. Analysis of General Aviation fixed-wing aircraft accidents involving inflight loss of control using a state-based approach. Aviation 2021, 25, 283–294. [Google Scholar] [CrossRef]
- Distefano, N.; Leonardi, S. Apriori algorithm for association rules mining in aircraft runway excursions. Civ. Eng. Archit. 2020, 8, 206–217. [Google Scholar] [CrossRef]
- Zhou, D.; Zhuang, X.; Zuo, H.; Cai, J.; Zhao, X.; Xiang, J. A model fusion strategy for identifying aircraft risk using CNN and Att-BiLSTM. Reliab. Eng. Syst. Saf. 2022, 228, 108750. [Google Scholar] [CrossRef]
- Boyd, D.D. Causes and risk factors for fatal accidents in non-commercial twin engine piston general aviation aircraft. Accid. Anal. Prev. 2015, 77, 113–119. [Google Scholar] [CrossRef]
- Kuşkapan, E.; Çodur, M.Y. Examination of Aircraft Accidents That Occurred in the Last 20 Years in the World. Düzce Üniversitesi Bilim Ve Teknol. Derg. 2021, 9, 174–188. [Google Scholar] [CrossRef]
- Mangortey, E.; Speirs, A.; Bendarkar, M.V.; Bui, V. Analysis of Helicopter Accidents and Certification Categories Using Machine Learning. In Proceedings of the AIAA SCITECH 2022 Forum, San Diego, CA, USA, 3–7 January 2022; p. 0249. [Google Scholar]
- Babič, F.; Lukáčová, A.; Paralič, J. Descriptive and predictive analyses of data representing aviation accidents. In Proceedings of the New Research in Multimedia and Internet Systems, Wroclaw, Poland, 17–19 September 2014; pp. 181–190. [Google Scholar]
- Xusheng, G.; Jingshun, D.; Wei, C. Flight accident modeling and predicting based on least squares support vector machine. In Proceedings of the IEEE 2010 International Conference on Educational and Information Technology, Chongqing, China, 17–19 September 2010; pp. V3-256–V3-259. [Google Scholar]
- Yu, H.; Li, X. On the chaos analysis and prediction of aircraft accidents based on multi-timescales. Phys. A Stat. Mech. Its Appl. 2019, 534, 120828. [Google Scholar] [CrossRef]
- Lee, H.; Madar, S.; Sairam, S.; Puranik, T.G.; Payan, A.P.; Kirby, M.; Pinon, O.J.; Mavris, D.N. Critical parameter identification for safety events in commercial aviation using machine learning. Aerospace 2020, 7, 73. [Google Scholar] [CrossRef]
- Bhanbhro, J.; Yousuf, F.; Narejo, S.; Furqan, M. PIA Accidents Analysis Using Naïve Bayes Classifier. In Proceedings of the International Conference on Computational Sciences and Technologies (INCCST’20), Virtual, 1–4 July 2020; pp. 17–19. [Google Scholar]
- Carson, J.; Hollingsworth, K.; Datta, R.; Segev, A. Failing & falling (F&F): Learning to classify accidents and incidents in aircraft data. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 4357–4365. [Google Scholar]
- Switzer, J.; Khan, L.; Muhaya, F.B. Subjectivity classification and analysis of the ASRS corpus. In Proceedings of the 2011 IEEE International Conference on Information Reuse & Integration, Las Vegas, NV, USA, 3–5 August 2011; pp. 160–165. [Google Scholar]
- Williams, J.K. Using random forests to diagnose aviation turbulence. Mach. Learn. 2014, 95, 51–70. [Google Scholar] [CrossRef]
- Fuller, J.G.; Hook, L.R. Understanding general aviation accidents in terms of safety systems. In Proceedings of the 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, 11–15 October 2020; pp. 1–9. [Google Scholar]
- Kazi, N.M.S. Using Machine Learning Models to Study Human Error Related Factors in Aviation Accidents and Incidents. Doctoral Thesis, National College of Ireland, Dublin, Ireland, 2020. [Google Scholar]
- Kuhn, K.D. Using structural topic modeling to identify latent topics and trends in aviation incident reports. Transp. Res. Part C Emerg. Technol. 2018, 87, 105–122. [Google Scholar] [CrossRef]
- Marev, K.; Georgiev, K. Automated aviation occurrences categorization. In Proceedings of the IEEE 2019 International Conference on Military Technologies (ICMT), Brno, Czech Republic, 30–31 May 2019; pp. 1–5. [Google Scholar]
- El Ghaoui, L.; Li, G.-C.; Duong, V.-A.; Pham, V.; Srivastava, A.N.; Bhaduri, K. Sparse machine learning methods for understanding large text corpora. In Proceedings of the CIDU, Mountain View, CA, USA, 19–21 October 2011; pp. 159–173. [Google Scholar]
- Abedin, M.; Ng, V.; Khan, L. Cause identification from aviation safety incident reports via weakly supervised semantic lexicon construction. J. Artif. Intell. Res. 2010, 38, 569–631.
- Kerfoot, D.; Hofmann, M. Analysis of Aviation Accidents Data. In CERI 2018 Proceedings; Civil Engineering Research Association of Ireland: Dublin, Ireland, 2018; pp. 350–355.
- Berry, M.W.; Gillis, N.; Glineur, F. Document classification using nonnegative matrix factorization and underapproximation. In Proceedings of the 2009 IEEE International Symposium on Circuits and Systems, Iasi, Romania, 9–10 July 2009; pp. 2782–2785.
- Puranik, T.G.; Rodriguez, N.; Mavris, D.N. Towards online prediction of safety-critical landing metrics in aviation using supervised machine learning. Transp. Res. Part C Emerg. Technol. 2020, 120, 102819.
- Dong, T.; Yang, Q.; Ebadi, N.; Luo, X.R.; Rad, P.J. Identifying incident causal factors to improve aviation transportation safety: Proposing a deep learning approach. J. Adv. Transp. 2021, 2021, 5540046.
- Heinrich, D.J. Safer Approaches and Landings: A Multivariate Analysis of Critical Factors; Capella University: Minneapolis, MN, USA, 2004.
- Nanyonga, A.; Wasswa, H.; Turhan, U.; Molloy, O.; Wild, G. Sequential classification of aviation safety occurrences with natural language processing. In Proceedings of the AIAA AVIATION 2023 Forum, San Diego, CA, USA, 12–16 June 2023; p. 4325.
Search Criteria | Initial No. | Duplicates | Irrelevant | Final |
---|---|---|---|---|
NASA ASRS | 9 | - | 6 | 3 |
ICAO accident database | 15 | - | 13 | 2 |
Accident safety network | 8 | - | 7 | 1 |
NTSB Aviation accident database | 86 | - | 78 | 8 |
ICAO safety occurrence database | 4 | - | 3 | 1 |
Australian transport safety | 29 | - | 29 | 0 |
Aviation Accident Analysis AI | 12 | - | 12 | 0 |
Aviation safety reporting system | 159 | 2 | 132 | 25 |
Aviation Accident Analysis ML | 61 | 5 | 44 | 10 |
Aviation accident database | 457 | 10 | 440 | 7 |
Aviation accident analysis | 2992 | 16 | 2972 | 4 |
Scopus article Total | 3832 | 33 | 3736 | 61 |
Backward search (Google Scholar) | - | - | - | 26 |
Total | - | - | - | 87 |
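The arithmetic behind the search-results table can be re-tallied with a short script. The following is a minimal sketch, not part of the review's methodology: the figures are transcribed from the table, a dash is treated as zero, and the per-row identity Final = Initial − Duplicates − Irrelevant is an assumption introduced here for checking purposes. The names `ROWS`, `column_sums`, and `inconsistent_rows` are hypothetical.

```python
# Rows transcribed from the search-results table:
# (initial, duplicates, irrelevant, final). A dash in the table is read as 0 (assumption).
ROWS = {
    "NASA ASRS": (9, 0, 6, 3),
    "ICAO accident database": (15, 0, 13, 2),
    "Accident safety network": (8, 0, 7, 1),
    "NTSB Aviation accident database": (86, 0, 78, 8),
    "ICAO safety occurrence database": (4, 0, 3, 1),
    "Australian transport safety": (29, 0, 29, 0),
    "Aviation Accident Analysis AI": (12, 0, 12, 0),
    "Aviation safety reporting system": (159, 2, 132, 25),
    "Aviation Accident Analysis ML": (61, 5, 44, 10),
    "Aviation accident database": (457, 10, 440, 7),
    "Aviation accident analysis": (2992, 16, 2972, 4),
}

# Summary rows as printed in the table.
SCOPUS_TOTAL = (3832, 33, 3736, 61)
BACKWARD_SEARCH = 26
GRAND_TOTAL = 87


def column_sums(rows):
    """Sum each of the four columns across all search-criteria rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


def inconsistent_rows(rows):
    """Names of rows where final != initial - duplicates - irrelevant.

    The identity is an assumption of this sketch, not a rule stated in the source.
    """
    return [
        name
        for name, (initial, duplicates, irrelevant, final) in rows.items()
        if initial - duplicates - irrelevant != final
    ]


if __name__ == "__main__":
    print("Column sums:", column_sums(ROWS))
    print("Matches Scopus total row:", column_sums(ROWS) == SCOPUS_TOTAL)
    print("Grand total check:", SCOPUS_TOTAL[3] + BACKWARD_SEARCH == GRAND_TOTAL)
    print("Rows not satisfying the assumed identity:", inconsistent_rows(ROWS))
```

With the figures as printed, the column sums reproduce the Scopus total row, and 61 + 26 gives the overall total of 87. Under the assumed identity, one row (Aviation Accident Analysis ML) does not balance, which may reflect a screening step not shown in the table or a transcription difference.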
Quality Control Questions | Scope |
---|---|
1. Is the research objective clearly defined? | All |
2. Is the context of the research clearly defined? | All |
3. Does the study bring value to academia or industry? | All |
4. Are the findings clearly stated and supported by the results? | All |
5. Are limitations explicitly mentioned and analyzed? | All |
6. Is the methodology clearly defined and justified? | All |
7. Is the experiment clearly defined and justified? | All |
8. Has the utilization of ML techniques been comprehensively described and justified? | ML |
9. Are the chosen ML types and tasks appropriate for addressing the research questions? | ML |
10. Is there clarity on the ML algorithms employed, including their rationale and suitability? | ML |
11. Have biases and ethical considerations in the application of ML techniques been addressed? | ML |
12. Are the implications of utilizing ML in post-accident analysis discussed? | ML |
13. Is the integration of ML insights with safety measures thoroughly explored and elucidated? | ML |
Author | Count | First Year | Last Year | Affiliation |
---|---|---|---|---|
Khan, L | 5 | 2008 | 2020 | Department of Computer Science, The University of Texas at Dallas |
Mavris, DN | 5 | 2020 | 2022 | Georgia Institute of Technology, Atlanta, GA 30332, United States |
Christopher, AA | 4 | 2013 | 2022 | Research Scholar, Anna University, Tamil Nadu, India |
Puranik, TG | 4 | 2020 | 2022 | Universities Space Research Association, NASA Ames Research Center, Moffett Field, CA, USA |
alias Balamurugan, SA | 3 | 2013 | 2022 | Research Scholar, Anna University, Tamil Nadu, India |
Mahadevan, S | 3 | 2015 | 2021 | Department of Civil and Environmental Engineering, Vanderbilt University, Nashville, TN, USA |
Rao, AH | 3 | 2018 | 2020 | Collins Aerospace, 400 Collins RD, MS 124–319 Cedar Rapids, USA |
Robinson, SD | 3 | 2015 | 2019 | Parks College of Engineering, Aviation and Technology, Saint Louis University, Saint Louis, MO 63103, USA |
Zhang, X | 3 | 2015 | 2018 | Department of Civil and Environmental Engineering, School of Engineering, Vanderbilt University, Nashville, TN, 37235, USA |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nanyonga, A.; Turhan, U.; Wild, G. A Systematic Review of Machine Learning Analytic Methods for Aviation Accident Research. Sci 2025, 7, 124. https://doi.org/10.3390/sci7030124