Abstract
The causal analysis of historical aviation accidents documented in investigation reports is important for the design, manufacture, operation, and maintenance of aircraft. However, because most accident data are unstructured or semi-structured, identifying and extracting causal information remains labor-intensive and inefficient. The difficulty is compounded by tasks that require extensive domain-specific knowledge, such as identifying the affected system from component-level information. In addition, there is a substantial need to analyze causation patterns across multiple accidents and to extract critical causation chains. To bridge these gaps, this study proposes an aviation accident causation and relation analysis framework that integrates prompt engineering with a retrieval-augmented generation (RAG) approach. A total of 343 real-world accident reports from the National Transportation Safety Board (NTSB) were analyzed to extract causal factors and their interrelations. A new causation classification schema was also developed to cluster the extracted causations. The clustering accuracies for the four main causation categories (Human, Aircraft, Environment, and Organization) reached 0.958, 0.865, 0.979, and 0.903, respectively. Based on the clustering results, a causation knowledge graph for aviation accidents was constructed. A set of safety evaluation indicators was then designed, and these indicators identified “pilot—decision error” and “landing gear system malfunction” as high-risk causations. For each high-risk causation, critical combinations of causation chains were identified. For example, “Aircraft operator—policy or procedural deficiency/pilot—procedural violation/Runway contamination → pilot—decision error → pilot procedural violation/32 landing gear/57 wings” was identified as the critical causation combination for “pilot—decision error”.
Finally, safety recommendations for organizations and personnel were proposed on the basis of the analysis results, offering practical guidance for aviation risk prevention and mitigation. The proposed approach demonstrates the potential of combining AI techniques with domain knowledge to achieve scalable, data-driven causation analysis and to strengthen proactive safety decision-making in aviation.
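The causation-chain notation used in the abstract joins upstream causes with “/” and links them to a high-risk causation and its consequences with “→”. The minimal sketch below shows one way such a string could be derived from a knowledge graph. It rests on several assumptions. Extracted relations are assumed to be available as (cause, effect) pairs. The “critical” neighbors of a node are taken to be the ones linked to it most often. The paper's actual safety evaluation indicators are not reproduced here. The function names and toy labels are hypothetical and are not the authors' implementation.

```python
from collections import Counter

# Hypothetical representation: one (cause, effect) pair per extracted relation.
Edge = tuple[str, str]


def build_causation_graph(pairs: list[Edge]) -> Counter:
    """Aggregate extracted cause -> effect pairs into weighted directed edges.

    A pair that recurs across reports gets a higher weight (its count).
    """
    return Counter(pairs)


def critical_combination(graph: Counter, target: str, k: int = 3) -> str:
    """Format the k most frequent upstream causes and downstream effects of
    `target` as 'A/B → target → C/D'.

    Ranking neighbors by edge count is an illustrative assumption, not the
    paper's safety evaluation indicators.
    """
    upstream: Counter = Counter()
    downstream: Counter = Counter()
    for (cause, effect), weight in graph.items():
        if effect == target:
            upstream[cause] += weight
        if cause == target:
            downstream[effect] += weight
    ups = "/".join(name for name, _ in upstream.most_common(k))
    downs = "/".join(name for name, _ in downstream.most_common(k))
    return f"{ups} → {target} → {downs}"


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    pairs = [
        ("operator procedural deficiency", "pilot decision error"),
        ("operator procedural deficiency", "pilot decision error"),
        ("runway contamination", "pilot decision error"),
        ("pilot decision error", "landing gear damage"),
        ("pilot decision error", "landing gear damage"),
        ("pilot decision error", "wing damage"),
    ]
    graph = build_causation_graph(pairs)
    print(critical_combination(graph, "pilot decision error", k=2))
    # prints: operator procedural deficiency/runway contamination → pilot decision error → landing gear damage/wing damage
```

Raw co-occurrence counts are used here only because they are the simplest weight to explain. A real analysis would substitute whatever risk indicators the study defines.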
Keywords:
aviation accident; causation analysis; airworthiness; LLM; RAG; prompt engineering; knowledge graph