The timely identification of probable causes in aviation incidents is crucial for averting future tragedies and safeguarding passengers. Investigators typically rely on flight data recorders; however, delays in data retrieval or damage to the devices can impede progress. In such instances, experts turn to supplementary sources, such as eyewitness testimony and radar data, to construct analysis narratives. Delays in this process have tangible consequences, as evidenced by the Boeing 737 MAX accidents involving Lion Air and Ethiopian Airlines, in which the same design flaw led to catastrophic outcomes. To streamline investigations, researchers have advocated natural language processing (NLP) and topic modelling methods, which organize pertinent aviation terms for rapid analysis. However, existing techniques provide no direct mechanism for deducing probable causes. To bridge this gap, this study trains a transformer-based model to predict the likely causes of aviation incidents from long, raw-text analysis narratives and evaluates its performance. Unlike traditional models that classify incidents into predefined categories such as human error, weather conditions, or maintenance issues, the trained model infers the likely cause and generates it as a human-like narrative, providing a more interpretable and contextually rich explanation. Trained on comprehensive aviation incident investigation reports, such as those from the National Transportation Safety Board (NTSB), the model exhibits promising performance across key evaluation metrics, including BERTScore with
precision (M = 0.749, SD = 0.109), recall (M = 0.772, SD = 0.101), and F1-score (M = 0.758, SD = 0.097); Bilingual Evaluation Understudy (BLEU) (M = 0.727, SD = 0.33); Latent Semantic Analysis (LSA) similarity (M = 0.696, SD = 0.152); and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) precision, recall, and F-measure scores of (M = 0.666, SD = 0.217), (M = 0.610, SD = 0.211), and (M = 0.618, SD = 0.192) for ROUGE-1; (M = 0.488, SD = 0.264), (M = 0.448, SD = 0.257), and (M = 0.452, SD = 0.248) for ROUGE-2; and (M = 0.602, SD = 0.241), (M = 0.553, SD = 0.235), and (M = 0.556, SD = 0.220) for ROUGE-L, respectively. These results demonstrate the model's potential to expedite investigations by promptly identifying probable causes from analysis narratives, thereby strengthening aviation safety protocols.
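The ROUGE-1, ROUGE-2, and ROUGE-L precision, recall, and F-measure values reported above compare each generated probable-cause narrative against the reference cause. The values are then summarized as a mean (M) and standard deviation (SD) across reports. The sketch below shows how such scores can be computed. It makes several assumptions that the abstract does not confirm: lowercase whitespace tokenization with no stemming, a balanced F1 as the F-measure, and the sample standard deviation for SD. The helper names (`rouge_n`, `rouge_l`, `mean_sd`) are hypothetical, and this is not necessarily the package or configuration used in the study.

```python
from collections import Counter
from statistics import mean, stdev


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; ROUGE packages may also stem and strip punctuation.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of n-grams so repeated n-grams are matched with clipping.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, cand_total: int, ref_total: int) -> tuple[float, float, float]:
    # Precision is relative to the candidate, recall to the reference; F is balanced F1.
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def rouge_n(candidate: str, reference: str, n: int) -> tuple[float, float, float]:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # Counter intersection = clipped n-gram matches
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic program for the longest common subsequence, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> tuple[float, float, float]:
    cand, ref = tokenize(candidate), tokenize(reference)
    return prf(lcs_length(cand, ref), len(cand), len(ref))


def mean_sd(scores: list[float]) -> tuple[float, float]:
    # Assumption: SD is the sample standard deviation (needs at least two scores).
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Hypothetical generated/reference cause pair, for illustration only.
    generated = "the pilot failed to maintain adequate airspeed"
    reference = "the pilot failed to maintain airspeed"
    print("ROUGE-1 (P, R, F):", rouge_n(generated, reference, 1))
    print("ROUGE-2 (P, R, F):", rouge_n(generated, reference, 2))
    print("ROUGE-L (P, R, F):", rouge_l(generated, reference))
```

In this example, the generated sentence contains one extra word. That word lowers ROUGE-1 precision to 6/7 while recall stays at 1.0. ROUGE-2 drops further, to a precision of 4/6 and a recall of 4/5, because the inserted word breaks two bigrams. Applying `mean_sd` to the per-report F-measures yields figures in the same M/SD form as those quoted above.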