From Text to Safety: A Novel Framework for Mining Unsafe Aviation Events Using Advanced Neural Network and Feature Network

Wang, Qiang; Xia, Ruicong; Yu, Jiayang; Liu, Qiuhan; Tong, Sirong; Xu, Ziling

doi:10.3390/aerospace11100843



by Qiang Wang, Ruicong Xia ^*, Jiayang Yu, Qiuhan Liu, Sirong Tong and Ziling Xu

Equipment Management and UAV Engineering College, Air Force Engineering University, Xi’an 710051, China

^* Author to whom correspondence should be addressed.

Aerospace 2024, 11(10), 843; https://doi.org/10.3390/aerospace11100843

Submission received: 25 July 2024 / Revised: 20 September 2024 / Accepted: 12 October 2024 / Published: 12 October 2024

(This article belongs to the Section Air Traffic and Transportation)


Abstract

The rapid growth of the aviation industry highlights the need for strong safety management. Analyzing data on unsafe aviation events is crucial for preventing risks. This paper presents a new method that integrates the Transformer network model, clustering analysis, and feature network modeling to analyze Chinese text data on unsafe aviation events. Initially, the Transformer model is used to generate summaries of event texts, and the performance of three pre-trained Chinese models is evaluated and compared. Next, the Jieba tool is applied to segment both summarized and original texts to extract key features of unsafe events and prove the effectiveness of the pre-trained Transformer model in simplifying lengthy and redundant original texts. Then, cluster analysis based on text similarity categorizes the extracted features. By solving the correlation matrix of these features, this paper constructs a feature network for unsafe aviation events. The network’s global and individual metrics are calculated and then used to identify key feature nodes, which alert aviation professionals to focus more on the decision-making process for safety management. Based on the established network and these metrics, a data-driven hidden danger warning strategy is proposed and illustrated. Overall, the proposed method can effectively analyze Chinese texts of unsafe aviation events and provide a basis for improving aviation safety management.

Keywords:

unsafe aviation event; transformer network model; text segmentation; cluster analysis; feature network analysis

1. Introduction

Aviation equipment is a fundamental element for the completion of flight missions, and ensuring flight safety has always been a matter of great concern. Aviation safety management is a crucial tool to achieve this goal. The International Civil Aviation Organization (ICAO) states in its ‘Safety Management Manual’ that accurate and timely reporting of information related to incidents or accidents is a fundamental activity of safety management [1]. Moreover, historical accident information and lessons learned from similar products can be used to identify all potential hazards throughout the aircraft’s lifecycle. The effective utilization of unsafe event records, such as faults, incidents, and accidents that occur during the service phase of aircraft, is crucial for discovering latent risk factors and patterns. It holds significant importance for improving the safety level of aviation flights, serving as an important data support for ensuring flight safety and identifying risk hazards.

Unsafe aviation event data are typically unstructured text data with inconsistent lengths among various event records. Common data sources include safety reports collected by the Aviation Safety Reporting System (ASRS) in the United States [2], monthly updates of civil aviation accident investigation reports from the National Transportation Safety Board (NTSB) [3], and the Aviation Safety Network (ASN) [4]. Currently, most research utilizes machine learning-based Natural Language Processing (NLP) methods to analyze and process unsafe aviation event datasets. Rodrigo L. Rose and his team conducted Structural Topic Modeling (STM) analysis on standardized event narrative text sets collected from ASRS and unstructured accident and incident texts from NTSB, demonstrating the effectiveness of this approach and providing decision-makers with valuable reference information [5]. João S. D. Garcia and colleagues focused on commonly occurring runway excursion accidents, analyzing and predicting the severity of runway excursions using a random forest model on ASRS safety disclaimer report texts. This method exhibited significantly higher prediction accuracy than Naive Bayes and Gradient Boosting methods [6]. Tomás Madeira and his team employed semi-supervised label propagation and supervised support vector machine methods to model ASN text data, demonstrating that their proposed approach can effectively predict safety accidents caused by human factors [7]. In general, as explainable algorithms, machine learning methods are effective in dealing with aviation data, but their complexity and their dependence on well-structured data cannot be ignored.

Different from traditional machine learning methods, deep learning methods, specifically neural networks, as semi-explainable or non-explainable algorithms, are rapidly extending their applications in the aviation field, such as weather forecasting [8], aviation travel question and answer systems [9], and fretting fatigue predictions [10]. With the advantages of less dependence on data structure, better generalization performance, and standardized procedures, deep learning approaches have also been utilized in aviation safety report analysis. Xiaoge Zhang and his team combined Word Embedding with Long Short-term Memory (LSTM) neural networks to establish a classification model for NTSB data, predicting unsafe events such as accidents, aircraft damage, and casualties. This approach facilitates understanding of the relationships between different event sequences, unsafe events, accident probabilities, aircraft damage, or casualties [11]. Tianxi Dong et al. proposed models that can automate causal factor identification of ASRS incident reports based on deep recurrent neural networks, and the results showed higher accuracy and adaptability than traditional machine learning methods [12]. Monika, Verma, S. and Kumar, P. provided a comparative analysis of time-series-based machine learning and deep learning methods to predict aviation accidents based on the ASRS database, and the results showed that bidirectional LSTM was superior among several time-series models [13]. Sequoia R. Andrade and Hannah S. Walsh developed a safety-informed aerospace-specific language model by pre-training a Transformer network model using datasets from ASRS and NTSB, which can be leveraged in NLP tasks related to the aviation field, such as named-entity recognition, relation detection, and information retrieval [14]. To summarize, deep learning networks are becoming increasingly popular in aviation data processing. Transformer-based networks, first proposed by Ashish Vaswani et al. [15], now prevail in NLP tasks and show great potential in aviation safety report analysis.

In this paper, considering the inconsistent record lengths and prominent unstructured characteristics of Chinese unsafe aviation event texts, we propose integrating the Transformer model, clustering analysis, and feature network modeling to mine these text data.

Firstly, we adopted three pre-trained Transformer network models and trained them further to generate semi-structured summaries of long Chinese texts of unsafe aviation events. This initial step was crucial as it allowed us to condense the verbose and complex original texts into more manageable summaries while retaining the essence, which is vital for the subsequent analytical processes. The performance of the three models was evaluated using universal metrics, identifying the GLM model as the most effective in summary generation due to its superior ability to capture the key information within the Chinese language context.

Taking these semi-structured and refined event texts as input, we carried out text feature cluster analysis based on Jieba word segmentation and text similarity calculation to determine and categorize unsafe event features. By comparing the accuracy of word segmentation of original and generated summarized event texts, the effectiveness of the pre-trained Transformer model was proven, demonstrating its capability to simplify and enhance the original texts for feature extraction.

After obtaining these features, we used the Pointwise Mutual Information (PMI) method to calculate the feature correlation matrix, and then a feature network was constructed. This network modeling is a novel approach that provides a visual and analytical representation of the relationships between different features, aiding in the identification of key areas of focus for safety management. Then, both individual and global network metrics were defined and calculated, providing a quantified identification of key feature nodes in the network.

Following this, we put forward a data-driven risk early warning strategy for aviation maintenance activities. This strategy was designed to provide early warning clues for risk investigation and control, thereby enhancing proactive safety measures. The results showed that the proposed method can effectively deal with Chinese unsafe aviation event data and can assist aviation managerial personnel with discovering key safety risks to improve decision-making and safety management.

The technical route of this paper is illustrated in Figure 1.

2. Summary Generation by Neural Network Model

2.1. Dataset Creation

This study compiled a comprehensive dataset of Chinese text data pertaining to aviation safety incidents, sourced from a diverse array of platforms. These include online searches, the authoritative Aviation Safety Information System maintained by the Civil Aviation Administration of China (CAAC) [16], the Sino Confidential Aviation Safety Reporting System (SCASS) [17], and additional pertinent regulatory bodies. We collected a total of 3684 distinct records of unsafe events. The dataset creation process was guided by two pivotal considerations. One was the partial uniformity of textual structure and strong objectivity of wording in these events, resulting from the documentation work carried out by official authorities, which differs from the aviation safety narratives in [5]. The other was the robust small-sample learning proficiency of the Transformer-based neural network models, which were planned for use in our subsequent model training phase. We numbered all 3684 records from 1 to 3684 and used a random tool in Python to generate 550 integers between 1 and 3684; according to these numbers, we selected 550 records. Then, we renumbered these 550 records from 1 to 550, performed the same operation to obtain 50 random integers, and took the corresponding records as the test subset. The remaining 500 records made up the training subset. Each record in the dataset was manually annotated with the desired summary text. A representative example of this annotation is given in Table 1.
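The two-stage random selection described above can be sketched as follows. This is a minimal illustration rather than the authors' script: it assumes the random integers are drawn without repetition, and the function name `split_records` and the fixed `seed` are hypothetical choices made only for reproducibility.

```python
import random


def split_records(total=3684, n_labeled=550, n_test=50, seed=0):
    """Pick n_labeled record numbers, then hold out n_test of them for testing."""
    rng = random.Random(seed)  # seed is an assumption, used for reproducibility
    # Stage 1: draw distinct record numbers from 1..total.
    selected = rng.sample(range(1, total + 1), n_labeled)
    # Stage 2: renumber the selection 1..n_labeled and draw the test positions.
    test_positions = set(rng.sample(range(1, n_labeled + 1), n_test))
    train, test = [], []
    for position, record_id in enumerate(selected, start=1):
        (test if position in test_positions else train).append(record_id)
    return train, test


train_ids, test_ids = split_records()
print(len(train_ids), len(test_ids))
```

Sampling without replacement guarantees that the training and test subsets never share a record, which matters when only 50 records are available for evaluation.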

Here, we explain why only 500 and 50 records were used as the training and test subsets, respectively. We initially planned to label 1000 records as the training set and 100 as the test set. However, because manual labeling is time-consuming and labor-intensive, we proceeded with training and testing after labeling only 550 records. We found that the models’ training performance met our expectations, especially for the GLM model. This also demonstrated, to some extent, the excellent small-sample learning ability of the Transformer models. Meanwhile, 50 test entries were sufficient for calculating the ROUGE metrics (discussed in Section 2.3.2) and comparing the three models.

2.2. Model Deployment

2.2.1. Model Introduction

The Transformer model, introduced by Vaswani et al. in 2017 [15], is distinguished by its self-attention mechanism, which enables the model to capture dependencies between any two positions in a sequence, regardless of the distance between them. This effectively addresses the vanishing or exploding gradient issues encountered by traditional Recurrent Neural Networks (RNNs) when processing long sequences. The structure of a typical Transformer model is shown in Figure 2.

The prominent advantage of the Transformer network lies in its self-attention mechanism, which allows the model to compute attention weights at each position, focusing on other words in the input sequence to capture contextual information. Multi-head self-attention enables the model to perform multiple attention operations in parallel, capturing different aspects or features of the sequence from various perspectives or representational subspaces. For instance, one head might focus on syntactic information, while another concentrates on semantic details.
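To make the attention-weight computation concrete, the sketch below implements single-head scaled dot-product self-attention in plain Python. It is a simplified illustration: a real Transformer first applies learned projection matrices to obtain queries, keys, and values, whereas here the toy token vectors are used directly for all three, which is an assumption made for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Return (attention weights, outputs) for one attention head."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Scaled dot product between this position and every position.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    width = len(values[0])
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(width)]
        for row in weights
    ]
    return weights, outputs


# Three toy 2-dimensional token vectors (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn_weights, attn_out = self_attention(tokens, tokens, tokens)
for row in attn_weights:
    print([round(w, 3) for w in row])
```

Each row of `attn_weights` sums to 1 and states how strongly one position attends to every other position, regardless of how far apart they are. Multi-head attention repeats this computation with several independent projections and concatenates the results.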

To date, the Transformer architecture has become the dominant framework across a spectrum of NLP applications. Because the written language of any nation is fundamentally a combination of characters, Transformer models pre-trained on language-specific corpora can address many NLP challenges particular to those languages, such as Spanish [18], Korean [19], and Japanese [20].

Given the current abundance of open-source Transformer-based network models, each tailored for various text processing scenarios, our study on the task of unsafe events text summarization has identified the Sequence-to-Sequence (Seq2Seq) paradigm as fundamental. This paradigm involves the generation of a short text sequence from a longer one. Consequently, to compare the model performance, we selected three open-source Transformer models renowned for their outstanding performance in Seq2Seq tasks:

T5 Model (Text-to-Text Transfer Transformer): This model employs a unified text-to-text framework capable of handling a variety of Seq2Seq tasks by unifying multiple NLP tasks into a text-to-text format;
GLM Model (General Language Model): This model integrates auto-encoding and auto-regressive pre-training methods, enabling it to construct output text progressively, which aids in generating coherent and fluent text;
BART Model (Bidirectional and Auto-Regressive Transformers): Leveraging the strengths of both BERT and GPT, this model introduces random perturbations to the input text data, allowing it to better learn the semantic and structural information of the text.

2.2.2. Local Model Deployment

Because this task is text summarization in a Chinese context, it is necessary to deploy network models that have been pre-trained on Chinese corpora. The three Transformer models utilized in this study are all open source: the “mengzi-t5-base” model proposed by the Chinese company Langboat Technology, with a parameter scale of 220 million [21]; the “glm-large-chinese” model proposed by the Data Mining Laboratory of Tsinghua University, with a parameter scale of 335 million [22]; and the “bart-base-chinese” model proposed by the Natural Language Processing Laboratory of Fudan University, with a parameter scale of 110 million [23]. Hereafter, the three models will be referred to as T5, GLM, and BART, respectively.

The aforementioned open-source network model codes were downloaded from the Hugging Face website and deployed on a local computer. The hardware specifications of the computer and the parameters of the deep learning platform are detailed in Table 2.

2.3. Model Training and Evaluation

2.3.1. Training Parameters Setting

In the training phase, identical parameter settings were applied to the three models, covering the batch size, the number of training epochs, and the learning rate. The batch size determines the volume of input data at each step. Because an overly large batch size can exhaust the limited memory of the local GPU and cause training to fail, we determined through experimentation that a batch size of 4 was suitable. The number of training epochs indicates how many times the training set is iterated; our experiments showed that 4 epochs met the performance requirements of the task while preventing overfitting. Compared with traditional network models, the Transformer models deployed in this study were considerably larger in scale, necessitating a lower learning rate (0.0001) to ensure adequate model training.
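As a quick check on what these settings imply for the 500-record training subset of Section 2.1, the following sketch computes the number of parameter updates. The class name `TrainConfig` is hypothetical, and the calculation assumes no gradient accumulation and that a final partial batch, if any, is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Shared settings reported for T5, GLM, and BART (container name is hypothetical)."""
    train_records: int = 500
    batch_size: int = 4
    epochs: int = 4
    learning_rate: float = 1e-4

    def steps_per_epoch(self) -> int:
        # Assumes the last partial batch is kept rather than dropped.
        return math.ceil(self.train_records / self.batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.epochs


cfg = TrainConfig()
print(cfg.steps_per_epoch(), cfg.total_steps())
```

Under these assumptions, each model sees 125 batches per epoch and receives 500 parameter updates in total.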

2.3.2. Evaluation Method

The assessment of models for the text summarization task was primarily conducted through two avenues. On the one hand, it involved a quantitative evaluation by calculating the ROUGE scores from the outcomes on the test dataset. On the other hand, it included the generation of summaries for original texts that were not part of the training or testing datasets using the trained network. These summaries were then qualitatively evaluated by human judges, who assessed the fluency of the text, grammatical structure, and clarity of semantics. The following section will concentrate on the computation of ROUGE scores.

The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric system is a prevalent tool for assessing the efficacy of automatic text summarization. Initially introduced by Lin in 2003 [24], ROUGE employs a length-normalized co-occurrence statistical approach to measure the alignment between an automatically generated summary and its reference counterpart. Within the ROUGE family of metrics, ROUGE-1 (unigram), ROUGE-2 (bigram), and ROUGE-L (longest common subsequence) have gathered significant attention for their utility.

\mathrm{ROUGE\text{-}1} = \frac{\sum_{S \in \{Ref\}} \sum_{1\text{-gram} \in S} Count_{match}(1\text{-gram})}{\sum_{S \in \{Ref\}} \sum_{1\text{-gram} \in S} Count(1\text{-gram})}

(1)

\mathrm{ROUGE\text{-}2} = \frac{\sum_{S \in \{Ref\}} \sum_{2\text{-gram} \in S} Count_{match}(2\text{-gram})}{\sum_{S \in \{Ref\}} \sum_{2\text{-gram} \in S} Count(2\text{-gram})}

(2)

\mathrm{ROUGE\text{-}L} = \frac{(1 + \beta^{2}) R_{LCS} P_{LCS}}{R_{LCS} + \beta^{2} P_{LCS}}, \quad R_{LCS} = \frac{LCS(r, s)}{|r|}, \quad P_{LCS} = \frac{LCS(r, s)}{|s|}

(3)

where 1-gram and 2-gram refer to unigram and bigram tokenization, respectively, and Ref stands for the set of reference summaries. The functions Count_match(1-gram) and Count_match(2-gram) give the number of 1-grams and 2-grams that occur in both the generated summary and the reference summary, while Count(1-gram) and Count(2-gram) give the total number of 1-grams and 2-grams in the reference summary. R_LCS represents recall, P_LCS represents precision, LCS(r, s) is the length of the longest common subsequence between the reference summary r and the generated summary s, |r| and |s| are their lengths, and β is a parameter that balances recall and precision, often set to a large value.
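The three metrics can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the evaluation script used in the study: it handles a single reference summary, treats each Chinese character as one token, and uses β = 1 in the example (the value of β used in the paper is not stated).

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n):
    """Recall of reference n-grams that also occur in the candidate."""
    ref_counts = ngrams(reference, n)
    cand_counts = ngrams(candidate, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return matched / total


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta):
    """F-measure of LCS-based recall and precision, weighted by beta."""
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return (1 + beta ** 2) * recall * precision / (recall + beta ** 2 * precision)


# Character-level tokens for a toy reference and generated summary.
ref = list("发动机滑油泄漏")
gen = list("发动机漏油")
print(rouge_n(ref, gen, 1), rouge_n(ref, gen, 2), rouge_l(ref, gen, beta=1.0))
```

In this toy pair, five of the seven reference characters and two of the six reference bigrams appear in the generated summary, and the longest common subsequence has length 4.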

2.3.3. Experiment Results

Following the parameter settings, we initially trained the three network models using the training dataset, documenting the training duration for each model. Concurrently, we calculated the ROUGE scores for the three network models post-training using the test set, with the results presented in Table 3. Upon completion of the training phase, we utilized 3 raw records of unsafe aviation events that were not included in the dataset to generate summaries with the aforementioned trained networks, and the outcomes are displayed in Table 4.

The training outcomes presented in Table 3 indicate that under identical training parameter configurations, the GLM model, with the largest parameter scale, exhibited a longer training duration but achieved a higher ROUGE score. This suggests that the GLM model’s text summarization performance is superior, with a greater match between the generated summaries and the ideal summaries. In contrast, the BART model, despite having the smallest parameter scale, did not have the shortest training time, and its ROUGE score was lower than that of the GLM model but slightly higher than that of the T5 model. The T5 model, with a moderate parameter scale, had the shortest training time but the lowest ROUGE score, indicating that while its training speed is commendable, the training outcome is the poorest.

As demonstrated in Table 4, which compares the summarization results of the three models, the GLM model’s summarization quality significantly outperformed the other two in terms of linguistic fluency, grammatical structure, semantic completeness, and accuracy. The summaries generated by the other two models were marred by grammatical errors and semantic confusion, rendering them nearly unusable. Moreover, the GLM model adeptly encapsulated the specific causes of issues present in the original text, underscoring its high applicability and utility.

2.4. Impact Assessment of Summarization on Text Segmentation

Given the unpredictability of how machine learning algorithms divide Chinese words, the outcomes can often stray from the true meanings of the words. The purpose of creating condensed versions of event texts is to strip away unnecessary details and keep only the essential information, which in turn lowers the chance of errors in later word segmentation. To illustrate the benefit of text summarization for word segmentation, which is the first step of text mining, we set up a comparison test. We picked 10 random samples of unsafe event texts. Both the original texts and their GLM-generated summaries were segmented into words using the Jieba tool. Table 5 presents the detailed word segmentation results of the first sample, while Figure 3 presents the overall statistics for all the samples.

As shown in Table 5, the event summary focuses on the key details of the event and leaves out parts of the original text that do not have much to do with what caused the event. If a word at the beginning of a sentence is wrongly segmented, it can set off a chain where the next few words are also wrongly segmented. This example suggests that a summary can prevent possible mistakes in how words are divided and make the important information clearer.

In Figure 3a, after word segmentation, the summaries contain fewer correct words and fewer incorrect words than the original texts, and fewer words overall. At first glance, a reduction in both correct and incorrect words might seem to leave the segmentation accuracy unchanged, but Figure 3b shows that the segmentation accuracy of the summaries is higher than that of the original texts. We attribute this to the pre-trained network that generates the summaries: it breaks down the text in a way that matches real words, so the summaries are more likely to consist of meaningful words.

In general, the results from the comparison show that dividing the summaries leads to fewer unwanted words, fewer mistakes, better accuracy, and better efficiency. This shows that using a pre-trained network for processing aviation text data is a good idea and works well.

3. Cluster Analysis of Summarized Text

To study the topics of interest in unsafe aviation events, we carried out a cluster analysis of the GLM-generated summaries. The cluster analysis comprises the following steps: text feature extraction, similarity calculation, and the clustering process.

3.1. Text Feature Extraction

Text feature extraction segments the text into words and selects the words of interest as text features; the Jieba method was utilized to segment the GLM-generated texts. This process yields segmented words across various grammatical categories, such as nouns, prepositions, adverbs, verbs, and adjectives. In the context of unsafe aviation events, we concentrated on the development of the events. Therefore, the features we focused on were nouns and verbs. We counted the nouns and verbs that appeared more than 30 times, which resulted in 132 features. Since some words may be classified differently based on context (for example, ‘工作’ (work) can be both a noun and a verb), we reviewed and refined these 132 features, eliminating 6 features with overlapping grammatical roles. This left us with 126 distinct unsafe event features.
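The frequency filter can be illustrated with a short sketch. It assumes the segmenter has already produced (word, part-of-speech tag) pairs in which noun tags start with "n" and verb tags start with "v"; the function name `frequent_features` and the toy data are hypothetical, and the manual review step described above is reduced here to collapsing words that occur under both roles.

```python
from collections import Counter


def frequent_features(tagged_docs, min_count=30):
    """Keep nouns and verbs that occur more than min_count times."""
    counts = Counter()
    for doc in tagged_docs:
        for word, flag in doc:
            role = flag[:1]
            if role in ("n", "v"):
                counts[(word, role)] += 1
    frequent = [key for key, count in counts.items() if count > min_count]
    # A word counted as both noun and verb is kept only once.
    distinct_words = sorted({word for word, _ in frequent})
    return frequent, distinct_words


# Toy pre-tagged summaries; a low threshold is used so the example is non-empty.
docs = [
    [("发动机", "n"), ("检查", "v"), ("的", "uj")],
    [("检查", "n"), ("发动机", "n"), ("检查", "v")],
    [("检查", "n"), ("发动机", "n"), ("着陆", "v")],
]
frequent, distinct = frequent_features(docs, min_count=1)
print(len(frequent), distinct)
```

Here ‘检查’ passes the threshold as both a noun and a verb, so three (word, role) entries reduce to two distinct features, mirroring the reduction from 132 to 126 features in the study.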

Considering factors such as the preferences of recorders and word variations, which can influence the phrasing of incident reports, the same event might be documented with different combinations of features (for example, ‘超’ (exceed), ‘超过’ (surpass), and ‘超出’ (go beyond) all suggest going beyond a limit). In text mining, these are treated as synonyms. To ensure the accuracy of the text mining outcomes, it is essential to merge these synonyms. Here, we used the Hash mapping technique (further details to follow) to combine synonyms, ending up with 121 features. These features form a standard library for subsequent analysis. Some of these safety incident features of high occurrence frequency are displayed in Table 6.

3.2. Hash-Mapping Text Similarity Calculation

The calculation of similarity between texts has a significant impact on clustering effectiveness. As unstructured data, the text features of unsafe events cannot be directly used for similarity calculations. A common practice is to process them into structured data before analysis. Hash mapping can convert unstructured data, such as texts, into a series of binary values, which has several advantages. First, it is sensitive to input values, and the probability of the same Hash value for different original data is very small. Second, Hash mapping is highly efficient and can handle large sample datasets efficiently. Therefore, it is widely used for processing unstructured data.

The basic idea of calculating similarity with Hash mapping is as follows: First, a hash function is used to map the features of unsafe events into b-bit binary hash values (Simhash fingerprints), where the length b can be adjusted as needed. Then, the Hamming distance D_hm between the Simhash fingerprint values of two unsafe event features is calculated, and the similarity (SI) between the two features is computed using the following equation:

SI = 1 - \frac{D_{hm}}{b}

(4)

From Equation (4), it can be inferred that the larger the Hamming distance D_hm between the features, the lower their similarity (SI) will be. Figure 4 shows an example of calculating the similarity of two features using Hash mapping.
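A small worked example makes the relationship between Hamming distance and similarity explicit. The sketch assumes the similarity is defined so that it decreases linearly with the Hamming distance, as the text describes (SI = 1 − D_hm / b); the 8-bit fingerprints are made-up values, not those of Figure 4.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a, b, bits):
    """Similarity that falls from 1 to 0 as the Hamming distance grows to `bits`."""
    return 1 - hamming_distance(a, b) / bits


# Two hypothetical 8-bit fingerprints that differ in two positions.
fp_a = 0b10110010
fp_b = 0b10011010
print(hamming_distance(fp_a, fp_b), similarity(fp_a, fp_b, bits=8))
```

The two fingerprints differ in 2 of 8 bits, so their similarity is 0.75; identical fingerprints give 1 and fully complementary ones give 0.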

3.3. Clustering Process Based on Simhash Algorithm

This work used text similarity as the clustering method for grouping unstructured data and designed a feature consistency index to evaluate the clustering performance. The main idea is as follows:

First, the number of clusters N is determined by the total amount M of text features:

N = \max (2, \sqrt{M})

(5)

Second, N cluster centers are randomly selected from the input text features.

After that, the similarity between each remaining text feature and every cluster center is calculated, and each feature is assigned to the cluster whose center it is most similar to.

Then, the cluster centers are recalculated, and the previous step is repeated until the categories of text features no longer change.

The specific process is illustrated in Figure 5.
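The iterative procedure above can be sketched on binary fingerprints as follows. This is a simplified illustration rather than the authors' implementation: the number of clusters is rounded to the nearest integer, the cluster centers are updated by a bitwise majority vote instead of the Simhash-based update used in the paper, and `seed` and `max_iter` are hypothetical parameters.

```python
import math
import random


def hamming(a, b):
    return bin(a ^ b).count("1")


def majority_center(members, bits):
    """Bitwise majority vote over the fingerprints in one cluster."""
    center = 0
    for i in range(bits):
        ones = sum((m >> i) & 1 for m in members)
        if 2 * ones > len(members):
            center |= 1 << i
    return center


def cluster_fingerprints(fingerprints, bits, seed=0, max_iter=100):
    """Assign each fingerprint to the most similar center until labels stabilize."""
    k = max(2, round(math.sqrt(len(fingerprints))))  # N = max(2, sqrt(M)), rounded
    rng = random.Random(seed)
    centers = rng.sample(fingerprints, k)  # requires at least k fingerprints
    labels = None
    for _ in range(max_iter):
        # Smallest Hamming distance corresponds to highest similarity.
        new_labels = [
            min(range(k), key=lambda c: hamming(fp, centers[c])) for fp in fingerprints
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [fp for fp, lab in zip(fingerprints, labels) if lab == c]
            if members:
                centers[c] = majority_center(members, bits)
    return labels, centers


# Nine hypothetical 8-bit fingerprints, so N = max(2, sqrt(9)) = 3 clusters.
fps = [0b00000001, 0b00000011, 0b00000010,
       0b11110000, 0b11100000, 0b11110001,
       0b00111100, 0b00111000, 0b00011100]
labels, centers = cluster_fingerprints(fps, bits=8)
print(labels)
```

With M = 121 features, the same rule gives N = 11 clusters. Because the initial centers are chosen at random, different seeds can lead to different final groupings.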

The Simhash algorithm is a widely used technique for reducing the complexity of text data [25,26,27]. It is a locality-sensitive hashing method, meaning that similar texts receive similar hash values. The process starts with basic text preparation, such as word segmentation, to obtain the key terms of the text and the weight of each term. The weight is measured by a common method called Term Frequency-Inverse Document Frequency (TF-IDF) [5].

After that, a hash function maps each term to a hash value. These hash values are weighted and combined bit by bit, and a short string of bits, known as a Simhash fingerprint, is created. The fingerprint is formed using a simple rule: a bit is set to ‘1’ if the combined value for that bit position is greater than zero and to ‘0’ otherwise. The length of the fingerprint can be changed to fit different needs.
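The fingerprint construction can be sketched as below. The choice of MD5 as the underlying hash function, the 64-bit length, and the example term weights are all assumptions made for illustration; the paper does not specify them.

```python
import hashlib


def token_hash(token, bits=64):
    """Map a term to a bits-wide integer (MD5 is an assumed choice of hash)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(weighted_tokens, bits=64):
    """Combine weighted term hashes into one fingerprint."""
    totals = [0.0] * bits
    for token, weight in weighted_tokens.items():
        h = token_hash(token, bits)
        for i in range(bits):
            # A 1-bit adds the term weight, a 0-bit subtracts it.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:  # bit is 1 only if the combined value is positive
            fingerprint |= 1 << i
    return fingerprint


# Hypothetical TF-IDF-style weights for the terms of one feature text.
weights = {"发动机": 0.8, "滑油": 0.5, "泄漏": 0.6}
fp = simhash(weights)
print(format(fp, "064b"))
```

Because every term votes on every bit in proportion to its weight, texts that share their heavily weighted terms tend to agree on most bits, which is what makes the Hamming distance between fingerprints a usable similarity signal.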

When the center of a group of texts is updated, the step of breaking down the text into words is not needed. Figure 6 shows a diagram of how the centers of text groups are updated using the Simhash algorithm.

3.4. Clustering Results

Following the application of a text similarity-based clustering method to 121 unsafe event feature texts, we identified 11 distinct clusters. The clusters varied considerably in size, with the largest containing 26 features and the smallest only 5. Subsequently, a detailed manual review was undertaken to evaluate the clustering outcomes.

Upon review, certain misclassified unsafe event features were identified and reassigned to more suitable groups. For instance, the feature ‘commander’, initially categorized with the seventh cluster dominated by aircraft component-related features such as frames, covers, and rivets, was relocated to the first cluster, which pertains to human-related aspects.

Furthermore, clusters with closely related unsafe event feature meanings were examined for potential consolidation. These refinements led to the merging of the original 11 clusters into a more streamlined set of 9 clusters. The revised clustering results are detailed in Table 7.

As can be seen from Table 7, the meanings represented by the features within the same category were generally similar, while those between different categories differed significantly. If an accident that belongs to a certain unsafe event feature group occurs, it indicates that other unsafe event features within that category may also occur and require close attention. Additionally, if an unsafe event occurs that belongs to multiple feature groups, it is necessary to consider the relationships and interactions between these groups. This approach allows for a more comprehensive understanding of the potential hazards associated with unsafe event features and can aid in the identification of areas of concern for safety improvement.

4. Unsafe Feature Network Analysis

In this section, the obtained unsafe event features were used to construct a feature network, and the relevant metrics were calculated. Firstly, the correlation between features was quantified using the normalized Pointwise Mutual Information (PMI) method [27,28]. PMI is an efficient and intuitive measure for assessing the co-occurrence of words, offering a straightforward way to capture lexical associations without the need for complex modeling or extensive computational resources. Secondly, the Gephi software was utilized to construct the unsafe event feature network. Both global and individual network structure metrics were calculated to identify key unsafe event features.

4.1. Feature Correlation Matrix

Pointwise Mutual Information (PMI) can quantify the correlation between two events or entities [27,28]. It can take positive, negative, or zero values. To assess the correlation among various unsafe event features, calculating their PMI values is a useful approach. The equation for PMI is as follows:

PMI(w_{1}, w_{2}) = \log_{2} \frac{P(w_{1}, w_{2})}{P(w_{1}) P(w_{2})}

(6)

Specifically, P(w₁,w₂) represents the probability of both unsafe event features w₁ and w₂ occurring together in the same accident. On the other hand, P(w₁) and P(w₂) refer to the probabilities of unsafe event features w₁ and w₂ occurring alone, respectively, without considering the presence of the other feature.

The PMI between any two unsafe event features is calculated based on Equation (6). A positive PMI value indicates a correlation between the two unsafe event features, and the stronger the correlation, the higher the value. When the PMI is close to zero, the correlation is weak. Therefore, a threshold for unsafe event feature correlation, ε₁ = 0.5, is set. If PMI(w₁, w₂) ≥ ε₁, the correlation between the unsafe event features w₁ and w₂ is considered strong, and the corresponding element A_ij in the feature correlation matrix A is set to 1; otherwise, A_ij is set to 0. Based on this, the feature correlation matrix A is obtained.
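The construction of the binary correlation matrix can be sketched as follows. The example uses the plain PMI of Equation (6) with probabilities estimated as relative frequencies over event records; the function name `pmi_adjacency` and the toy events are hypothetical, and pairs that never co-occur (for which PMI is undefined) are simply left at 0.

```python
import math
from itertools import combinations


def pmi_adjacency(events, threshold=0.5):
    """Return pairwise PMI values and a 0/1 adjacency mapping."""
    n = len(events)
    features = sorted({f for event in events for f in event})
    single = {f: sum(f in event for event in events) / n for f in features}
    adjacency = {f: {g: 0 for g in features} for f in features}
    pmi = {}
    for a, b in combinations(features, 2):
        joint = sum(a in event and b in event for event in events) / n
        if joint == 0:
            continue  # never co-occur: log of zero is undefined, entry stays 0
        value = math.log2(joint / (single[a] * single[b]))
        pmi[(a, b)] = value
        if value >= threshold:
            adjacency[a][b] = adjacency[b][a] = 1
    return pmi, adjacency


# Four toy event records, each reduced to its set of features.
events = [
    {"engine", "leak"},
    {"engine", "leak"},
    {"landing", "gear"},
    {"landing", "inspect"},
]
pmi, adjacency = pmi_adjacency(events)
print(pmi[("engine", "leak")], adjacency["engine"]["leak"], adjacency["engine"]["landing"])
```

In the toy data, "engine" and "leak" each appear in half of the events and always together, so PMI = log2(0.5 / 0.25) = 1, which exceeds the threshold of 0.5 and produces an edge.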

4.2. Feature Network Construction

Based on the feature correlation matrix A, Gephi 0.10.1 software is used to generate a network graph of unsafe event features, as shown in Figure 7. Different colors in the network indicate different groups of unsafe event features. A direct connection between two nodes indicates a relationship between the two unsafe event features. To further explain the relationships between individual unsafe event features and the network itself, relevant metrics were calculated. These metrics provide a quantitative analysis of the network from both global and individual perspectives.

4.3. Network Metrics Calculation

To assess the unsafe event feature network, both global and individual metrics were calculated. Table 8 lists the metrics and their formulas.

In terms of the global metrics, De = 0.255 indicates that roughly a quarter of all possible connections between unsafe event features are present in the network. Additionally, Tr = 0.411 suggests that the average level of closeness between unsafe event features and their neighboring features in the network is relatively high, i.e., features tend to form tight clusters. Both metrics indicate that, overall, there is good connectivity among the unsafe event features in the network.
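As a rough illustration of the two global metrics in Table 8, the following Python sketch computes them for a small undirected graph stored as an adjacency dictionary. It assumes that AEN_i in the transitivity formula denotes the number of edges among the neighbors of node x_i, and that nodes with fewer than two neighbors contribute zero; both are interpretations rather than statements from the paper. The toy graph and function names are hypothetical.

```python
from itertools import combinations

def network_density(adj):
    """De = 2 * AEN / (n * (n - 1)) for an undirected graph."""
    n = len(adj)
    if n < 2:
        return 0.0
    edge_count = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is stored twice
    return 2 * edge_count / (n * (n - 1))

def network_transitivity(adj):
    """Tr = mean over nodes of 2 * AEN_i / (k * (k - 1)).

    AEN_i is taken as the number of edges among the k neighbors of a node
    (assumed interpretation); nodes with k < 2 contribute 0.
    """
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Hypothetical toy graph: a triangle A-B-C with a pendant node D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(round(network_density(toy), 3), round(network_transitivity(toy), 3))
# prints 0.667 0.583
```

Note that graph tools may define transitivity differently (for example, as a global ratio of triangles to connected triples), so values from this sketch need not match those reported by a particular software package.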

As for the individual metrics, due to the large number of unsafe event features, partial calculation results are presented in Table 9.

From Table 9, we observed that “Inspect” achieved the highest Closeness Centrality (CC) score of 0.638. This suggests that “Inspect” is closely interconnected with the other network features, facilitating efficient information dissemination. “Engine” and “Landing” also exhibited high CC values, 0.622 and 0.561, respectively, which positioned them as central elements within the network.

Shifting focus to Degree Centrality (DC), “Inspect” claimed the top spot with a value of 0.600, denoting its numerous direct associations with other features. “Engine”, with a DC of 0.558, was a close second, underscoring its extensive connectivity. Eigenvector Centrality (EC) likewise highlighted “Inspect”, with the maximum score of 21.624. This signified not only numerous connections but also the importance of the connected features, thereby amplifying the overall impact of “Inspect”.
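The three individual metrics can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the procedure used by Gephi: degree centrality is normalized by n − 1 (the common convention, whereas Table 8 writes the denominator as k − 1), closeness centrality uses breadth-first search on a connected graph, and eigenvector centrality is approximated by power iteration and scaled so that the largest value is 1, so its scale differs from the EC values in Table 9. The toy graph is hypothetical.

```python
from collections import deque

def degree_centrality(adj):
    """Number of neighbors divided by n - 1 (assumed normalization)."""
    n = len(adj)
    return {v: len(nbrs) / max(n - 1, 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """CC_i = (n - 1) / sum of shortest path lengths from node i."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search for unweighted shortest paths
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        # Nodes that cannot reach every other node get 0 in this sketch.
        result[source] = (n - 1) / total if total and len(dist) == n else 0.0
    return result

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on A + I, scaled so the largest entry equals 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: val / top for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

# Hypothetical toy graph: a triangle A-B-C with a pendant node D attached to C.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
dc = degree_centrality(toy_graph)
cc = closeness_centrality(toy_graph)
ec = eigenvector_centrality(toy_graph)
```

In the toy graph, node C is adjacent to every other node and therefore ranks first on all three metrics, which mirrors the way “Inspect” leads each column of Table 9.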

In summary, “Inspect” and “Engine” stood out across all metrics, affirming their status as pivotal event features in the network. This insight indicates that inspections are instrumental in identifying a majority of risks associated with aircraft. The frequency of engine-related unsafe events underscores the engine’s role as a crucial component, where any malfunction could pose substantial safety risks.

The elevated status of “Inspect” and “Engine” in the network analysis emphasizes the importance of rigorous inspection protocols and frequent maintenance for aircraft engines. It also suggests the need for robust monitoring systems to swiftly identify and rectify any issues detected during inspections. These insights are invaluable for crafting a strategic aviation safety management plan, where timely issue detection can prevent the progression to grave incidents.

4.4. Data-Driven Risk Early Warning Strategy

To conduct more efficient safety hazard investigations, this paper proposes a feature data-driven aviation safety risk early warning strategy. Based on the textual records of each safety hazard investigation, the strategy achieves the identification and warning of potential safety hazards through standardization of safety hazard data, extraction of safety hazard features, deduction of associated safety hazard features, and ranking of key safety hazard features.

As illustrated in Figure 8, consider a regular maintenance inspection of airplane engines. The standardized description is “Regular inspection on airplane engines”, and the features extracted from this operation are “Inspect” and “Engine”. Using the feature network, the features connected to both “Inspect” and “Engine” are identified, as shown in Figure 9. These features are then ranked according to their individual metrics. From the ranking of these unsafe features and the clusters to which they belong, the potential risk information can be specified. In the example of Figure 8, when performing a regular inspection of airplane engines, more attention should be paid to the following potential risks: oil leakage of the fuel pump, grease abnormality, wear of spouts and attachments, abnormal venting temperature, and so on.
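The deduction and ranking steps of this example can be sketched in Python as below. The sketch assumes the network is available as an adjacency dictionary and that a single individual metric is used as the ranking score. The function name `rank_associated_features` and the edges of the toy network are hypothetical; the scores are the DC values of the corresponding features in Table 9.

```python
def rank_associated_features(adj, observed, score, top_k=5):
    """Return features linked to every observed feature, highest score first."""
    observed = [f for f in observed if f in adj]
    if not observed:
        return []
    # Keep only features adjacent to all observed features.
    common = set.intersection(*(set(adj[f]) for f in observed)) - set(observed)
    # Sort by descending score; ties are broken alphabetically.
    return sorted(common, key=lambda f: (-score.get(f, 0.0), f))[:top_k]

# Hypothetical edges, for illustration only.
toy_network = {
    "Inspect": {"Engine", "Fuel Pump", "Temperature", "Landing"},
    "Engine": {"Inspect", "Fuel Pump", "Temperature"},
    "Fuel Pump": {"Inspect", "Engine"},
    "Temperature": {"Inspect", "Engine"},
    "Landing": {"Inspect"},
}
# DC values of these features as listed in Table 9.
dc_score = {"Fuel Pump": 0.367, "Temperature": 0.358, "Landing": 0.383}

ranked = rank_associated_features(toy_network, ["Inspect", "Engine"], dc_score)
print(ranked)  # prints ['Fuel Pump', 'Temperature']
```

“Landing” is excluded even though its score is higher, because in the toy network it is connected to “Inspect” only. A weighted combination of several metrics could replace the single score; that choice is left open here.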

The risk warning strategy described above, based on the analysis of the hidden unsafe feature network, can improve timeliness in two respects. First, in the warning process, the unsafe features found during an inspection are associated with related potential hidden risks, which are ranked by importance. This gives safety management personnel, whose time and attention are limited, a direction and clues for investigation, thereby avoiding the inefficiency of the unplanned search used in the original risk investigation. Second, in terms of warning tools, the strategy can be implemented as a programmable information system. Safety management personnel input the unsafe information found during the hidden risk investigation; the system backend then carries out standardization and the subsequent steps and returns the related hidden risks together with their ranking, ensuring timely feedback of investigation clues.

In practical applications, the update frequency of the network model can be determined according to management needs and the available computing power. We suggest an on-condition update, in which newly recorded event data are used to track the changing unsafe features. The update timing depends on the number of new events recorded over a period: when this number reaches a threshold, the proposed network model should be updated. If the threshold is set too high, changes in unsafe features may be recognized late, missing the best opportunity for hazard investigation and control. If the threshold is set too low, computational resources are consumed heavily while the unsafe features change only insignificantly, which provides no clear guidance for hazard investigation and control. Therefore, optimizing the threshold setting needs further exploration.
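A minimal Python sketch of such an on-condition update is given below. The class name `OnConditionUpdater`, the `rebuild` callback, and the threshold value are hypothetical; in practice the callback would recompute the PMI matrix and the network metrics from the accumulated data.

```python
class OnConditionUpdater:
    """Trigger a network rebuild once enough new events have accumulated."""

    def __init__(self, threshold, rebuild):
        self.threshold = threshold  # number of new events that triggers an update
        self.rebuild = rebuild      # callback receiving the batch of new events
        self.pending = []
        self.rebuild_count = 0

    def add_event(self, features):
        """Record one new event; return True if it triggered a rebuild."""
        self.pending.append(set(features))
        if len(self.pending) >= self.threshold:
            self.rebuild(self.pending)
            self.pending = []  # start a fresh batch
            self.rebuild_count += 1
            return True
        return False

# Illustration: the callback only records the batch size here.
batch_sizes = []
updater = OnConditionUpdater(threshold=3, rebuild=lambda batch: batch_sizes.append(len(batch)))
for _ in range(7):
    updater.add_event({"Inspect", "Engine"})
```

With a threshold of 3, seven new events trigger two rebuilds and leave one event pending, which makes the trade-off visible: a larger threshold means fewer rebuilds but a longer delay before new unsafe features enter the network.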

5. Conclusions

In order to harness the Chinese unsafe aviation event dataset more efficiently to mine useful information, this paper proposes a standardization and feature processing method for aviation unsafe data that combines transformer neural networks, cluster analysis, and feature network analysis. Initially, the transformer Chinese pre-trained models are utilized to streamline the original event text while retaining key information. A dataset for the event text summary generation task is constructed, and the selected T5, GLM, and BART models are trained and evaluated. Among them, the GLM model demonstrates the best performance in summary generation. Furthermore, by analyzing the impact of summary generation on word segmentation, the advantages of summary generation are elucidated. Based on the aviation unsafe event summary text generated by GLM, the Jieba tool is employed to extract unsafe features, which are then subjected to cluster analysis using the Simhash algorithm, resulting in a total of 9 categories and 121 features. Based on these features, a correlation matrix is solved using the PMI method, and a feature network model is established. The global and individual indicators of the network are calculated, and a data-driven hidden danger warning strategy is proposed based on this network and calculated metrics, providing early warning clues for risk investigation and control in aviation maintenance activities. This paper presents a new method for processing Chinese text data of unsafe aviation events, offering a fresh perspective for further promoting the level of aviation safety management.

6. Discussion

The evolution of aviation report texts from unstructured to semi-structured or well-structured formats is a significant development aimed at enhancing the efficiency and accuracy of data processing. However, unstructured text reports retain their value, particularly in providing detailed narratives and rich contextual information that aid in a comprehensive understanding of complex events. This study addresses the challenge of mining information from such unstructured and verbose Chinese aviation unsafe event reports, leveraging the capabilities of transformer network models to generate concise summaries that retain key information.

This study does not make a direct comparison with specific existing methods, because the primary issue addressed by the proposed method is how to mine information from unstructured and verbose Chinese reports of unsafe aviation events. The use of Transformer network models enabled the generation of concise summaries, ensuring the cleanliness of the subsequent input data, which is a capability that traditional methods do not possess. Additionally, Section 2.4 analyzed how the generated summaries can improve the efficiency and accuracy of Chinese word segmentation. Most datasets used in existing research are English aviation safety texts, and the significant structural differences between Chinese and English lead to considerably different word segmentation operations. From the perspective of initial data input, we therefore consider that the method proposed in this paper is not directly comparable with other studies.

Although the method is innovative and effective in processing unstructured text data of Chinese unsafe aviation events, it also has some limitations:

Generalizability of the Model: Although the Transformer models used in this study have performed well on specific datasets, their performance may decline on other languages or domains. This means that the model may require retraining and adjustment for new datasets to maintain its effectiveness;
Consumption of Computational Resources: Transformer models with large-scale parameters typically require substantial computational resources for training and inference. This may limit their application in environments with limited resources, especially in scenarios that require real-time or near-real-time analysis;
Limitations of Cluster Analysis: Although cluster analysis is used in this study for feature categorization, the method may be influenced by initial conditions and algorithm choices, leading to different clustering results. For this reason, the clustering results need manual adjustment;
Inherent Limitations of Data-Driven Approaches: The method proposed in this study relies on historical data to predict and identify potential risk patterns. However, this approach may not fully capture emerging risk factors or those events that do not occur frequently but have significant impacts.

Despite these limitations, future research can build upon the existing framework in several directions:

Cross-Linguistic Application: Extending the methodology to other languages requires the adaptation of language-specific pre-trained Transformer models and word segmentation tools. This expansion will enable the application of the method to a broader range of aviation reports, enhancing its global relevance;
Model Updating Strategy Optimization: As aviation activities continue to evolve, the associated data and risk patterns will also change. Therefore, the information update strategy should be optimized according to newly recorded unsafe events, with the goal of keeping the information timely under limited computational resources;
Temporal Analysis of Unsafe Features: Analyzing the evolution and distribution patterns of unsafe features over time is another area for future exploration. This temporal analysis will provide insights into the dynamics of aviation safety, potentially revealing trends and patterns that can inform proactive safety measures.

Author Contributions

Conceptualization, Q.W. and R.X.; methodology, R.X. and J.Y.; software, R.X. and J.Y.; validation, Q.L. and S.T.; formal analysis, R.X.; investigation, R.X.; resources, Q.W.; data curation, Q.W.; writing—original draft preparation, R.X. and J.Y.; writing—review and editing, S.T. and Z.X.; visualization, Z.X.; supervision, Q.W.; project administration, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are not publicly available due to privacy. However, some data can be accessed on the Chinese websites https://safety.caac.gov.cn/index/initpage.act (accessed on 2 February 2024) and http://scass.huahangxinyan.com/pcToReportQuery.do (accessed on 2 February 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. International Civil Aviation Organization. Doc9859 Safety Management Manual, 4th ed.; 999 Robert-Bourassa Boulevard: Montréal, QC, Canada, 2018.
2. Aviation Safety Reporting System Database. Available online: https://asrs.arc.nasa.gov/search/dbol.html (accessed on 2 February 2024).
3. National Transportation Safety Board Aviation Database. Available online: https://www.ntsb.gov/Pages/AviationQueryV2.aspx (accessed on 2 February 2024).
4. Aviation Safety Network. Available online: https://aviation-safety.net/database/databases.php (accessed on 2 February 2024).
5. Rose, R.L.; Puranik, T.G.; Mavris, D.N. Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives. Aerospace 2020, 7, 143.
6. Garcia, J.S.D.; Jaedicke, C.; Lim, G.L.; Truong, D. Predicting the Severity of Runway Excursions from Aviation Safety Reports. J. Aerosp. Inf. Syst. 2023, 20, 555–564.
7. Madeira, T.; Melício, R.; Valério, D.; Santos, L. Machine Learning and Natural Language Processing for Prediction of Human Factors in Aviation Incident Reports. Aerospace 2021, 8, 47.
8. Chen, C.J.; Huang, C.N.; Yang, S.M. Application of Deep Learning to Multivariate Aviation Weather Forecasting by Long Short-term Memory. J. Intell. Fuzzy Syst. 2023, 44, 4987–4997.
9. Gong, W.; Guan, Z.; Sun, Y.; Zhu, Z.; Ye, S.; Zhang, S.; Yu, P.; Zhao, H. Civil Aviation Travel Question and Answer Method Using Knowledge Graphs and Deep Learning. Electronics 2023, 12, 2913.
10. Han, S.; Khatir, S.; Wahab, M.A. A deep learning approach to predict fretting fatigue crack initiation location. Tribol. Int. 2023, 185, 108528.
11. Zhang, X.; Srinivasan, P.; Mahadevan, S. Sequential deep learning from NTSB reports for aviation safety prognosis. Saf. Sci. 2021, 142, 105390.
12. Dong, T.; Yang, Q.; Ebadi, N.; Luo, X.R.; Rad, P. Identifying Incident Causal Factors to Improve Aviation Transportation Safety: Proposing a Deep Learning Approach. J. Adv. Transport. 2021, 2021, 5540046.
13. Monika; Verma, S.; Kumar, P. Generic Deep-Learning-Based Time Series Models for Aviation Accident Analysis and Forecasting. SN Comput. Sci. 2024, 5, 32.
14. Andrade, S.R.; Walsh, H.S. SafeAeroBERT: Towards a Safety-Informed Aerospace-Specific Language Model. AIAA Aviation Forum. 2023, 2023–3437.
15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
16. Aviation Safety Information System of Civil Aviation Administration of China. Available online: https://safety.caac.gov.cn/index/initpage.act (accessed on 2 February 2024).
17. Sino Confidential Aviation Safety Reporting System. Available online: http://scass.huahangxinyan.com/pcToReportQuery.do (accessed on 2 February 2024).
18. González, J.; Hurtado, L.-F.; Pla, F. TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter. Neurocomputing 2021, 426, 58–69.
19. Choi, Y.-S.; Park, Y.-H.; Lee, K.J. Building a Korean morphological analyzer using two Korean BERT models. PeerJ Comput. Sci. 2022, 8, e968.
20. Kawara, Y.; Chu, C.; Arase, Y. Preordering Encoding on Transformer for Translation. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 644–655.
21. Zhang, Z.; Zhang, H.; Chen, K.; Guo, Y.; Hua, J.; Wang, Y.; Zhou, M. Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese. arXiv 2021, arXiv:2110.06696.
22. Du, Z.; Qian, Y.; Liu, X.; Ding, M.; Qiu, J.; Yang, Z.; Tang, J. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. arXiv 2021, arXiv:2103.10360.
23. Shao, Y.; Geng, Z.; Liu, Y.; Dai, J.; Yan, H.; Yang, F.; Qiu, X. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation. Sci. China Inf. Sci.
24. Lin, C.-Y.; Hovy, E.H. Automatic evaluation of summaries using N-gram co-occurrence statistics. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Edmonton, AB, Canada, 27 May–1 June 2003.
25. Qin, J.; Cao, Y.; Xiang, X.; Tan, Y.; Xiang, L.; Zhang, J. An encrypted image retrieval method based on simhash in cloud computing. CMC-Comput. Mater. Con. 2020, 63, 389–399.
26. Kwon, Y.-M.; An, J.-J.; Lim, M.-J.; Cho, S.; Gal, W.-M. Malware Classification Using Simhash Encoding and PCA (MCSP). Symmetry 2020, 12, 830.
27. Deng, T.; Huang, Y.; Yang, G.; Wang, C. Pointwise mutual information sparsely embedded feature selection. Int. J. Approx. Reason. 2022, 151, 251–270.
28. Kucuk, S.; Yukse, S.E. Pointwise Mutual Information-Based Graph Laplacian Regularized Sparse Unmixing. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.

Figure 1. Technical route of this paper.

Figure 2. The structure of a typical Transformer model.

Figure 3. Word segmentation results of original and summarized event texts: (a) Word segmentation counts; (b) Word segmentation accuracy. The highlighted numbers in red are the average accuracy.

Figure 4. Calculating the similarity of two features using Hash mapping.

Figure 5. Workflow of the clustering process.

Figure 6. Diagram of updating cluster centers using the Simhash algorithm.

Figure 7. Feature network of unsafe aviation events.

Figure 8. An example of a risk early warning strategy.

Figure 9. Unsafe features connected with “Inspect” and “Engine”.

Table 1. One record of an unsafe aviation event in a Chinese text dataset.

Aviation Unsafe Event Text	Chinese	XX飞机执行XX-XX航班，落地后航班进入M11机位后，机务检查发现飞机右水平尾翼前缘被外物击伤，有3个凹坑（100毫米 × 50毫米 × 5.5毫米、60毫米 × 70毫米 × 3.5毫米、70毫米×40毫米×4.0毫米），机务确认飞机损伤超标，停场执行修理工作。23:49–23:58，飞行区管理部检查跑道及滑行道均未见异常，该飞机无后续出港计划。
Aviation Unsafe Event Text	English	XX aircraft, operating flight XX-XX, was found to have suffered impact damage to the leading edge of the right horizontal stabilizer after landing and parking at the M11 gate. The damage consisted of three indentations (100 mm × 50 mm × 5.5 mm, 60 mm × 70 mm × 3.5 mm, 70 mm × 40 mm × 4.0 mm). The maintenance crew determined that the damage exceeded acceptable limits, and the aircraft was grounded for repairs. From 23:49 to 23:58, the Flight Area Management Department inspected the runway and taxiway and found no abnormalities. The aircraft had no subsequent departure plans.
Manually Labeled Summary	Chinese	XX飞机落地后机务检查发现，飞机右水平尾翼前缘被外物击伤，损伤超标，执行修理工作。
Manually Labeled Summary	English	After landing, maintenance inspection of XX aircraft revealed impact damage to the leading edge of the right horizontal stabilizer, exceeding acceptable limits, and repairs were performed.

Here, we presented the Chinese texts with English translations, and we would like to explain our considerations. We acknowledged that there are significant differences in the structures and principles of Chinese and English texts. Substituting all Chinese text with English translations was problematic, especially in the word segmentation Section 3.1, where language structure plays a crucial role. Accurately conveying the nuances of Chinese word segmentation in English is challenging, which is why Section 2.4 does not include English translations for these results. Conversely, presenting Chinese text without English translations would alienate readers unfamiliar with the language. To balance, we opted to include both Chinese text and English translations in the summary generation Section 2 and cluster analysis Section 3, as these processes involved direct engagement with Chinese sentences. Regarding the feature network analysis in Section 4, since it focused on individual words rather than overall language structure, we utilized English translations to present all feature words.

Table 2. Hardware and software specifications.

Hardware		Software
CPU	Intel Core i7-7700HQ	Environment	Anaconda
GPU	NVIDIA GTX 1070 8 GB	Python Version	3.9.18
RAM	32 GB @2400 MHz	CUDA Version	11.6
ROM	1 TB SSD	Deep Learning Platform	Pytorch 1.13.1
OS	Windows 10 22H2	Transformer Library Version	4.32.1

Table 3. Model training duration and ROUGE scores.

Model Name	Parameter Scale	Training Duration	ROUGE-1	ROUGE-2	ROUGE-L
mengzi-t5-base (T5)	220 M	3 min 30 s	0.644	0.555	0.573
glm-large-chinese (GLM)	335 M	135 min 39 s	0.832	0.729	0.748
bart-base-chinese (BART)	110 M	13 min 4 s	0.690	0.559	0.611

Table 4. Text summary generation results of three unsafe events.

Original Text of Event #1	Chinese text: 2021年3月30日，XXX公司XXX飞机执行XXX(XXX-XXX)航班。在XXX机场26号跑道着陆过程中，无线电高度10英尺以下遭遇低空风切变:风向风速急剧变化，飞机状态出现较大偏差，机组执行复飞。在复飞过程中出现“BANK ANGLE”(坡度角)语音警告，左坡度最大峰值20.92度，左翼尖擦地。再次进近后在26号跑道正常落地。地面检查发现左大翼翼尖处损伤，部分材料缺失。
	English translation: On 30 March 2021, XXX company’s XXX aircraft was performing the XXX (XXX-XXX) flight. During the landing process on runway 26 at XXX airport, below 10 feet of radio altitude, the aircraft encountered low-altitude wind shear: there was a sudden change in wind direction and speed, and the aircraft’s state deviated significantly, leading the crew to execute a go-around. During the go-around, a “BANK ANGLE” voice warning occurred, with the maximum peak of the left bank angle reaching 20.92 degrees, and the left wingtip scraped the ground. After the second approach, the aircraft landed normally on runway 26. Ground inspection found damage at the tip of the left wing, with some materials missing.
T5 Summary	Chinese text: XXX飞机着陆过程中，飞机状态偏差，复飞出现警告，最大峰值20.92度，材料缺失。
	English translation: During the XXX aircraft’s landing process, there was a significant deviation in the aircraft’s state, and a warning occurred during the go-around, with a maximum peak of 20.92 degrees, and some materials were missing.
GLM Summary	Chinese text: XXX飞机着陆复飞过程中，左翼尖擦地，左大翼翼尖处损伤，部分材料缺失。
	English translation: During the XXX aircraft’s landing and go-around process, the left wingtip scraped the ground, resulting in damage at the tip of the left wing, with some materials missing.
BART Summary	Chinese text: XXX飞机跑道着陆过程复飞过程中，无线电高度低空风切变，出现坡度角，左翼尖擦地。
	English translation: During the XXX aircraft’s runway landing and go-around process, low-altitude wind shear was encountered at a low radio altitude, a bank angle occurred, and the left wingtip scraped the ground.
Original Text of Event #2	Chinese text: 2020年8月29日，XXX航空有限公司XXX号直升机在江西省赣州市龙南县境内执行果树农喷作业。执行任务过程中坠机，机体受损严重。机上共一名人员(飞行员)经现场抢救无效死亡。该事件最大可能是飞行员飞行时未能观察到高压线位置，在从高压线下方经过时，旋翼桨叶挂断高压线一片桨叶断裂(断裂飞出桨叶长约1.5米)，直升机失去平衡、失控翻转坠落在山坡上，并翻滚倒扣。
	English translation: On 29 August 2020, a helicopter operated by XXX Aviation Limited Company, with registration number XXX, crashed while conducting agricultural spraying for fruit trees in Longnan County, Ganzhou City, Jiangxi Province. During the mission, the helicopter suffered a severe crash, resulting in extensive damage to the aircraft. There was one person on board (the pilot) who was pronounced dead after rescue efforts at the scene were unsuccessful. The most likely cause of the incident was that the pilot failed to observe the position of the high-voltage power lines during flight. As the helicopter passed underneath the power lines, a rotor blade struck the high-voltage line, causing one of the blades to break off (the broken blade, approximately 1.5 m in length, flew off). The helicopter lost balance, went out of control, flipped over, and fell onto a hillside, rolling over and landing upside down.
T5 Summary	Chinese text: XXX号直升机执行果树农喷作业任务过程中坠机，机体受损严重，飞行时未能观察到高压线位置。
	English translation: Helicopter XXX, while performing a fruit tree spraying mission, crashed, resulting in severe damage to the aircraft body due to failure to observe the location of high-voltage power lines during flight.
GLM Summary	Chinese text: XXX号直升机坠机，旋翼桨叶挂断高压线一片桨叶断裂，直升机失去平衡、失控翻转坠落。
	English translation: Helicopter XXX crashed, with the rotor blade hitting and breaking a high-voltage power line, causing one of the blades to fracture. The helicopter lost balance and control, flipping and falling out of control.
BART Summary	Chinese text: XXX号直升机执行果树农喷作业，现场抢救无效死亡，翻转坠落在山坡上，并翻滚倒扣。
	English translation: During the fruit tree spraying mission, Helicopter XXX experienced an accident that led to death after ineffective on-site rescue efforts. The helicopter flipped and fell onto a hillside, rolling over and landing upside down.
Original Text of Event #3	Chinese text: 2023年7月30日，XXX有限公司XXX号机执行XXX航班，在XX机场19号跑道着陆过程中偏出跑道，导致跑道边灯和飞机受损，机上人员安全。经调查，该事件是由于机组对夜间大雨天气运行风险管控能力不足，进近和着陆准备不充分，SOP执行不到位；飞机穿过决断高度以后，机长主要依靠HUD指示控制飞行，至30英尺跟丢HUD指引未及时复飞；30英尺以下，机长视线尚未及时完全散开，对飞机状态不清晰不掌握，导致飞机带左坡度（5.4度）、左交叉（7度）和左侧滑惯性在着陆区中线左侧（约9米）接地（1.55 G）；接地后机长虽然立即持续蹬右满舵修正，但已无法制止飞机左偏趋势，飞机向左侧偏出跑道。
	English translation: On 30 July 2023, XXX Limited Company’s aircraft XXX, performing XXX flight, veered off the runway during the landing process on runway 19 at XX Airport, resulting in damage to the runway edge lights and the aircraft, with the crew and passengers remaining safe. The investigation found that the incident was due to the flight crew’s insufficient ability to control the operational risks in heavy rain at night, inadequate approach and landing preparation, and incomplete execution of Standard Operating Procedures (SOPs). After the aircraft passed the decision height, the captain mainly relied on the HUD (Head-Up Display) indications to control the flight and did not initiate a go-around in time after losing the HUD guidance at 30 feet. Below 30 feet, the captain’s vision had not fully cleared in time, resulting in unclear and uncontrolled awareness of the aircraft’s state, leading the aircraft to touch down with a left bank (5.4 degrees), left crab (7 degrees), and a left skid inertia on the left side of the landing zone’s centerline (about 9 m) with a vertical force of 1.55 G. After touchdown, although the captain immediately and continuously applied full right rudder to correct, it was impossible to stop the aircraft’s tendency to deviate to the left, causing the aircraft to veer off the runway to the left.
T5 Summary	Chinese text: XXX号机着陆过程中偏出跑道，跑道边灯和飞机受损，30英尺跟丢HUD指引未及时复飞，飞机状态不清晰不掌握。
	English translation: Aircraft XXX veered off the runway during the landing process, resulting in damage to the runway edge lights and the aircraft. The aircraft lost the HUD guidance at 30 feet and failed to initiate a go-around in time, with the aircraft’s state being unclear and not under control.
GLM Summary	Chinese text: XXX号机着陆过程中偏出跑道，跑道边灯和飞机受损，是由于机组运行风险管控能力不足，进近和着陆准备不充分。
	English translation: Aircraft XXX veered off the runway during the landing process, with damage to the runway edge lights and the aircraft. This was due to the flight crew’s insufficient ability to manage operational risks and inadequate preparation for approach and landing.
BART Summary	Chinese text: XXX号机着陆过程中偏出跑道，机上人员安全，SOP执行不到位，飞机在着陆区中线左侧（约9米）接地（1.55 G）。
	English translation: Aircraft XXX veered off the runway during the landing process, with all crew and passengers on board safe. SOPs were not executed properly, and the aircraft touched down on the left side of the landing zone’s centerline (approximately 9 m) with a vertical force of 1.55 G.

Table 5. Word segmentation results of event example text and its summary.

Original Text	Chinese	维修部中XXX飞机更换右发，装机前孔探发现低压涡轮叶片进气边根部磨损,该发动机XXXX年XX月第X次大修出厂,修后使用XXX小时XX分,总使用时间XXX小时XX分。
	English	In the maintenance department, during the replacement of the right engine for the XXX aircraft, a pre-installation borescope inspection revealed wear at the root of the inlet edge of the low-pressure turbine blade. The engine left the factory after its X-th overhaul in month XX of year XXXX; it had been used for XXX hours and XX minutes since the overhaul, with a total service time of XXX hours and XX minutes.
Segmentation Result		维修部中/XXX/飞机/更换/右发/，装机/前孔/探发现/低压/涡轮/叶片/进气/边根部/磨损/,该发动机/XXXX年XX月/第X次/大修/出厂/，修后/使用/XXX小时XX分,总使用/时间/XXX小时XX分。
Comments		25 segmented words in total, 5 incorrect words (in red color), 20 correct words.
Summarized Text	Chinese	飞机换发装机前发现，涡扇发动机低压涡轮叶片进气边根部磨损。
	English	Before the replacement and installation of the aircraft engine, it was found that the inlet edge of the low-pressure turbine blade of the turbofan engine was worn at the root.
Segmentation Result		飞机/换发/装机前/发现/,涡扇/发动机/低压/涡轮/叶片/进气边/根部/磨损。
Comments		12 segmented words in total, all correct.

Table 6. Unsafe event features of high occurrence frequency.

Unsafe Events Feature	Word Frequency	Part of Speech	Unsafe Events Feature	Word Frequency	Part of Speech
发动机 (Engine)	1398	noun	检查 (Inspect)	1167	verb
裂纹 (Crack)	656	noun	故障 (Fail)	377	verb
工作 (Work)	557	noun	试车 (Trial Run)	285	verb
叶片 (Blade)	547	noun	告警 (Warn)	183	verb
机务 (Maintenance)	383	noun	断裂 (Breakage)	155	verb
涡扇 (Turbofan)	361	noun	超过 (Exceed)	138	verb
客舱 (Cabin)	109	noun	起飞 (Take off)	133	verb
起落架 (Undercarriage)	123	noun	脱落 (Fall off)	104	verb
空中 (Midair)	122	noun	刹车 (Brake)	90	verb
导航 (Navigation)	115	noun	渗漏 (Leakage)	38	verb

Table 7. Clustering results of unsafe event features.

No.	Cluster Name	Examples of Unsafe Event Features	Number of Features	Comments
1	Scene (The location of the flight)	“空中” “地面” “跑道” midair, ground, runway	3	The relatively low number of features in this cluster suggests that unsafe events related to location may be rare or easily identifiable.
2	Personnel and their operations	“机组” “检查” crew, check	15	The significance and diversity of personnel operations in aviation safety is highlighted as indicated by the higher number of features.
3	Oil and gas	“渗漏” “油管” “油箱” leakage, tubing, fuel tank	15	The moderate number of features in this cluster indicates that the safe management of fuel systems is crucial for aviation safety.
4	Aircraft system	“发动机” “机身” “机翼” engine, airframe, wing	17	The high number of features in this cluster underscores the prevalence and importance of the main aircraft systems in safety events.
5	Aircraft subsystem	“涡轮” “轴承” “叶片” turbine, bearing, blade	23	With the highest number of features, this cluster suggests that meticulous monitoring of aircraft subsystems may be key to preventing accidents.
6	Aircraft damage state	“失效” “裂纹” failure, crack	10	The moderate number of features indicates that early identification of aircraft damage is crucial for preventing accidents.
7	Aircraft system/subsystem status	“温度” “密封” temperature, seal	14	This cluster serves as a reminder of the necessity to monitor system statuses to prevent failures and accidents.
8	Aircraft damage	“断裂” “打伤” “脱落” break, hurt, fall off	7	Although there are not many features in this cluster, the potential impact of each event can be very serious.
9	Aircraft status	“发生” “停车” “中转” occurrence, stop, transfer	17	The higher number of features may be related to state changes during aircraft operations, which require real-time monitoring and management.

In the remainder of this paper, the Chinese words are replaced by their English translations.

Table 8. Calculated metrics.

Global Metrics	Formula	Description
Network Density $De$	$De = \frac{2 \times AEN}{n \times (n - 1)}$	Compares the actual number of edges with the maximum possible number, indicating how closely the nodes are connected.
Network Transitivity $Tr$	$Tr = \frac{1}{n} \sum_{i = 1}^{n} \frac{2 \times AEN_{i}}{k \times (k - 1)}$	High transitivity means that nodes in the network tend to form tight clusters.
Note	$n$ is the total number of nodes in the network, $AEN$ / $AEN_{i}$ is the actual total edge number of the network / node $x_i$, and $k$ is the number of adjacent nodes to node $x_i$.
Individual Metrics	Formula	Description
Degree Centrality $DC$	$DC_{i} = \frac{AEN_{i}}{k - 1}$	Counts direct connections to a node.
Eigenvector Centrality $EC$	$A p = \lambda p$, $p = (EC_{1}, EC_{2}, \dots, EC_{i}, \dots, EC_{N})$	Accounts for a node’s connections and their importance.
Closeness Centrality $CC$	$CC_{i} = \frac{n - 1}{\sum_{j \neq i} d_{ij}}$	Indicates how close a node is to all others, affecting information flow.
Note	$A$ is the adjacency matrix and $\lambda$ is the maximum eigenvalue of $A$; $d_{ij}$ is the shortest path length between nodes $x_i$ and $x_j$.
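The metrics in Table 8 can be illustrated on a small undirected graph. The following is a minimal sketch in plain Python, not the authors' implementation. The four-node graph `toy_adj` and all function names are hypothetical. Several interpretations are assumptions: transitivity is computed as the average local clustering coefficient, with the numerator counting edges among the neighbours of $x_i$; degree centrality uses the standard normalisation by $n - 1$ (the DC values in Table 9 appear consistent with multiples of 1/120); eigenvector centrality is approximated by power iteration and rescaled so that the largest entry is 1, so its scale differs from the EC values in Table 9.

```python
from collections import deque

# Hypothetical toy feature network (undirected), stored as adjacency sets.
toy_adj = {
    "inspect": {"engine", "blade"},
    "engine": {"inspect", "blade"},
    "blade": {"inspect", "engine", "crack"},
    "crack": {"blade"},
}

def density(adj):
    """De = 2 * AEN / (n * (n - 1))."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * edges / (n * (n - 1))

def transitivity(adj):
    """Average local clustering coefficient (assumed reading of Tr)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # a node with fewer than two neighbours contributes 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def degree_centrality(adj):
    """Number of neighbours divided by n - 1 (assumed normalisation)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """CC_i = (n - 1) / sum of shortest-path lengths; assumes a connected graph."""
    n = len(adj)
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search gives unweighted shortest paths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[src] = (n - 1) / sum(dist.values())
    return result

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I, which has the same leading eigenvector as A
    but avoids oscillation on bipartite graphs. Scaled so the maximum is 1."""
    p = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: p[v] + sum(p[u] for u in adj[v]) for v in adj}
        norm = max(nxt.values())
        p = {v: x / norm for v, x in nxt.items()}
    return p

if __name__ == "__main__":
    print("De:", density(toy_adj))
    print("Tr:", transitivity(toy_adj))
    print("DC:", degree_centrality(toy_adj))
    print("CC:", closeness_centrality(toy_adj))
    print("EC:", eigenvector_centrality(toy_adj))
```

In this toy graph the node "blade" is adjacent to every other node, so it scores highest on all three individual metrics, which mirrors how key feature nodes are singled out in the feature network.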

Table 9. Calculation results of individual metrics of 14 features.

No.	Event Feature	CC	DC	EC	No.	Event Feature	CC	DC	EC
1	Inspect	0.638	0.600	21.624	8	Turbofan	0.556	0.367	14.611
2	Engine	0.622	0.558	21.476	9	Fuel Pump	0.553	0.367	16.309
3	Landing	0.561	0.383	14.641	10	Trial Run	0.553	0.358	16.233
4	Ground	0.558	0.375	15.866	11	Switch On	0.553	0.358	15.918
5	System	0.558	0.375	15.526	12	Temperature	0.553	0.358	15.910
6	Crew	0.558	0.375	13.890	13	Pressure	0.553	0.375	15.780
7	Abnormal	0.556	0.367	15.986	14	Alarm	0.553	0.358	14.425
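One simple way to read Table 9 is to rank the features under each metric and keep those that appear near the top of every ranking. The sketch below does this with the values copied from Table 9. The helper names `top_k` and `consistent_key_features` and the cut-off `k = 3` are illustrative assumptions, not the selection rule used in the paper.

```python
# Values copied from Table 9: (feature, CC, DC, EC).
TABLE_9 = [
    ("Inspect", 0.638, 0.600, 21.624),
    ("Engine", 0.622, 0.558, 21.476),
    ("Landing", 0.561, 0.383, 14.641),
    ("Ground", 0.558, 0.375, 15.866),
    ("System", 0.558, 0.375, 15.526),
    ("Crew", 0.558, 0.375, 13.890),
    ("Abnormal", 0.556, 0.367, 15.986),
    ("Turbofan", 0.556, 0.367, 14.611),
    ("Fuel Pump", 0.553, 0.367, 16.309),
    ("Trial Run", 0.553, 0.358, 16.233),
    ("Switch On", 0.553, 0.358, 15.918),
    ("Temperature", 0.553, 0.358, 15.910),
    ("Pressure", 0.553, 0.375, 15.780),
    ("Alarm", 0.553, 0.358, 14.425),
]

# Column index of each metric inside a TABLE_9 row.
METRIC_COLUMN = {"CC": 1, "DC": 2, "EC": 3}

def top_k(metric, k=3, rows=TABLE_9):
    """Return the names of the k features with the largest value of `metric`."""
    col = METRIC_COLUMN[metric]
    ranked = sorted(rows, key=lambda row: row[col], reverse=True)
    return [row[0] for row in ranked[:k]]

def consistent_key_features(k=3, rows=TABLE_9):
    """Features that are in the top k under CC, DC and EC simultaneously."""
    tops = [set(top_k(metric, k, rows)) for metric in METRIC_COLUMN]
    return set.intersection(*tops)

if __name__ == "__main__":
    for metric in METRIC_COLUMN:
        print(metric, top_k(metric))
    print("In every top 3:", sorted(consistent_key_features()))
```

Requiring agreement across all three metrics is a deliberately conservative choice: a feature that ranks highly on only one metric is left for closer manual inspection rather than flagged automatically.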

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
