A Text-Driven Aircraft Fault Diagnosis Model Based on a Word2vec and Priori-Knowledge Convolutional Neural Network

In the process of aircraft maintenance and support, a large amount of fault description text data is recorded. However, most existing fault diagnosis models are based on structured data and are therefore not suitable for unstructured data such as text. This paper therefore proposes a text-driven aircraft fault diagnosis model based on Word to Vector (Word2vec) and a prior-knowledge Convolutional Neural Network (CNN). The fault text first passes through Word2vec for text feature extraction, and the extracted text feature vectors are then fed into the proposed prior-knowledge CNN to train the fault classifier. The prior-knowledge CNN introduces expert fault knowledge through Cloud Similarity Measurement (CSM) to improve the performance of the fault classifier. Validation experiments on five years of maintenance log data from a civil aircraft verify the effectiveness of the proposed model.


Introduction
As an extremely complex system, an aircraft is prone to faults caused by human error, material defects, manufacturing errors, fluctuations in the operating environment, and other factors [1]. When such faults occur, maintainers usually first judge the fault type subjectively from experience and then decide which maintenance strategy to adopt. However, the aircraft system is too complex for the fault type to be judged accurately from subjective experience alone, especially by young and inexperienced maintainers. Scholars have therefore been actively exploring how to judge the fault type objectively at the data level.
With the development of machine learning and sensor technology in particular, data-driven fault diagnosis has advanced rapidly [2,3], and data-driven fault diagnosis models are increasingly being proposed. Nguyen et al. [4,5] proposed a magnitude order balance method to diagnose quadcopter actuator faults from sensor data and developed an attitude fault-tolerant control based on a nonsingular fast terminal sliding mode and a neural network to compensate for the actuator faults. Gao et al. [6] proposed a novel artificial neural network model that fuses a Deep Belief Network (DBN) and a Quantum Inspired Neural Network (QINN), and injected four fault modes to build an aircraft fuel system fault diagnosis model based on oil pressure data. Shen et al. [7] developed a novel hybrid multimode machine learning framework that exploits the health information embedded in Input/Output (I/O) sensor data to monitor the health status of aircraft gas turbine engines, which effectively improved fault diagnosis accuracy.
Although these data-driven aircraft fault diagnosis models perform well, they are mostly based on structured data. Because computers cannot directly process unstructured data, aircraft fault diagnosis driven by unstructured data such as text and images has not been widely studied. In practice, however, most data are unstructured or semi-structured [8]. Throughout the aircraft life cycle in particular, abundant maintenance and support text is recorded during every fault maintenance activity. These fault texts usually record abnormal working states, fault phenomena, and other fault knowledge that can be used to judge the fault type. Without effective processing techniques, however, such fault description text goes largely unused, which is a considerable waste.
In light of these problems, the research objective of this paper is to develop an effective text-based aircraft fault diagnosis model that makes full use of aircraft fault description text and improves the level of aircraft fault diagnosis. To this end, Word2vec is used as the text feature extraction algorithm to overcome the fact that computers cannot process text data directly, and a novel prior-knowledge CNN is proposed as the classifier to improve fault diagnosis accuracy. Verification experiments on five years of maintenance log data from a civil aircraft were carried out to verify the effectiveness of the proposed text-driven model. The main contributions of this work are a novel text-driven aircraft fault diagnosis model and a prior-knowledge CNN classifier that incorporates, as prior knowledge, an expert fault knowledge base composed of historical fault texts whose fault types have been judged by experts.
The merits of our model are as follows: (1) as a data-driven model, it can automatically, quickly, and objectively determine which fault type a failure belongs to once a fault description text is entered; (2) Word2vec, a more efficient method, is used for text feature extraction instead of the traditional Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA); (3) a novel prior-knowledge CNN that incorporates expert fault knowledge is proposed to improve fault diagnosis accuracy.
The remainder of this paper is organized as follows:
1. Section 2 presents a literature review of text feature extraction and CNNs.
2. Section 3 first discusses the proposed text-driven aircraft fault diagnosis model and then explains its three core parts in detail: text data preprocessing, Word2vec text feature extraction, and the prior-knowledge CNN.
3. Section 4 describes the experiment and discusses the experimental results.

Text Feature Extraction
Text feature extraction transforms text data into a structured format, solving the problem that computers cannot directly process unstructured data such as text [9]. Vector space models are currently widely used for structuring text data [10]; TF-IDF [11] and LDA [12] are two typical examples, and both are widely used in text-driven fault diagnosis models. Rodrigues et al. [13] used TF-IDF and a Multilayer Perceptron (MLP) to perform aircraft interior failure pattern recognition. Wang et al. [14] used LDA and a Support Vector Machine (SVM) to develop a fault diagnosis model for railway systems. TF-IDF and LDA are easy to use and run efficiently. However, TF-IDF generates word vectors without considering context and easily leads to dimension explosion [15]. Although LDA considers context, it is an unsupervised algorithm, so its word vector generation is somewhat blind [16]. To address these shortcomings, Zhou et al. [14] proposed TI-LDA, a fusion feature extraction model based on TF-IDF and LDA, and applied it to text-driven aircraft fault diagnosis. TI-LDA considers context and word order and also avoids ambiguity, but it still suffers from dimension explosion. To solve these problems, Mikolov et al. [17] proposed the Word2vec text feature extraction algorithm. Word2vec trains a three-layer neural network either to predict the current word from its context words or to predict the context words from the current word, thereby mapping words into a low-dimensional vector space; Word2vec therefore avoids dimension explosion while still taking context and word order into account [18]. As a result, Word2vec is widely used in fault diagnosis and has achieved good results. Chang et al. [19] applied the Word2vec moving distance model to obtain a failure occurrence sequence, which effectively improved fault diagnosis accuracy. Bai et al. [20] used Word2vec to extract features from power grid alarm text and fed them into an ensemble classifier for power grid fault diagnosis; their experimental results show that the model identifies faults well.
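To make the dimension-explosion argument concrete, the sketch below computes plain TF-IDF vectors in Python. It assumes one common variant (term frequency normalized by document length and the unsmoothed inverse document frequency $\log(N/df)$); libraries differ in these details. Each document vector has one entry per vocabulary word, so its length grows with the vocabulary, whereas a Word2vec vector keeps a fixed dimension. The token lists are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Variant assumed here: tf = count / document length, idf = log(N / df).
    """
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents in which each word appears at least once.
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # One coordinate per vocabulary word, so every vector has length |vocab|.
        vectors.append([tf[w] / len(doc) * idf[w] for w in vocab])
    return vocab, vectors


if __name__ == "__main__":
    logs = [["engine", "oil", "pressure", "low"],
            ["engine", "fan", "noise"],
            ["cabin", "lamp", "off"]]
    vocab, vecs = tfidf(logs)
    print(len(vocab), len(vecs[0]))  # 9 9: the vector length equals the vocabulary size
```

Adding a fourth log entry with two new words would lengthen every vector by two, which is the growth that Word2vec's fixed-size vectors avoid.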

CNN
The CNN is a well-known deep learning framework inspired by the natural visual perception mechanism of living creatures [21]. Since LeCun et al. [22] published the seminal paper establishing the modern CNN framework in 1990, CNNs have been used in image recognition [23], real-time object detection [24], time series prediction [25], and other tasks. As deep learning reshaped traditional fault diagnosis in the 2010s [26,27], the CNN also came into wide use in fault diagnosis. Eren et al. [28] developed a generic real-time bearing fault diagnosis approach based on a one-dimensional CNN classifier that works directly on raw time series sensor data. Zhong et al. [29] investigated a transfer learning method based on a CNN and an SVM for gas turbine fault diagnosis with small fault samples. Zhao et al. [30] proposed a normalized CNN for diagnosing rolling bearing faults of different severities and orientations under data imbalance and variable working conditions. Although these single CNN models have achieved good results, CNNs that incorporate prior knowledge have been shown to perform better. Ma et al. [31] encoded expert prior knowledge into Regional Convolutional Neural Networks (R-CNN), which effectively improved the accuracy of facial action unit recognition. In Wei's work [32], an end-to-end weak scratch inspection model was built by embedding prior knowledge into an encoder-decoder CNN, significantly improving the accuracy of weak scratch inspection on optical components. These studies show that a prior-knowledge CNN can be more effective than a single CNN.

Methodology
To make full use of aircraft fault text, a novel text-driven aircraft fault diagnosis model is proposed based on the Word2vec text feature extraction algorithm and a prior-knowledge CNN classification algorithm. The construction process of the proposed model is shown in Figure 1. First, the input aircraft fault text data are preprocessed: repeated and missing records are eliminated, words are segmented, and stop words are removed. Second, the preprocessed text data are mapped into the word vector space by Word2vec to obtain aircraft fault text vectors. Finally, the text vectors are fed into the prior-knowledge CNN to train the classifier. Given an aircraft fault description text, the trained prior-knowledge CNN classifier automatically outputs the corresponding fault type, realizing intelligent aircraft fault diagnosis. The three parts of the model, namely text data preprocessing, Word2vec text feature extraction, and the prior-knowledge CNN, are described below.

Figure 1. Aircraft Fault Text → Text Preprocessing → Word2vec Feature Extraction → Prior-Knowledge CNN → Fault Diagnosis

Text Data Preprocessing
Text data preprocessing differs considerably from structured data preprocessing. In addition to the usual steps such as eliminating repeated and missing data, text data also require stop word removal. Stop words mainly refer to modal particles and punctuation marks, which contribute nothing to semantic expression. Stop words not only inflate the dimension of text feature vectors artificially but also interfere with classifier training, so they must be removed.
In addition, for languages such as Chinese, the language of the text data used in this paper, word segmentation is needed before stop words are removed. Chinese text has no explicit separators between words; it is a continuous string of Chinese characters. Word segmentation, the first step in Chinese text processing, splits the sentences of a text into words according to certain rules and methods. Common word segmentation methods include dictionary-based methods, statistics-based methods, and rule-based statistical methods [33]. At present, dictionary-based segmentation tools such as Jieba are the most widely used and perform best in practice. Jieba scans a sentence against a Trie-based prefix dictionary [34] to build a directed acyclic graph (DAG) of all candidate words, and then uses dynamic programming to find the maximum-probability path through it based on word frequencies, which yields the segmentation result. It uses a Hidden Markov Model (HMM) [35] and the Viterbi algorithm [36] to identify unregistered (out-of-vocabulary) words, and custom dictionaries can be added to improve disambiguation. Liu et al. [37] proposed a new approach to processing unknown words in financial public opinion text with Jieba. Yu et al. [38] combined the Jieba lexicon to explicitly display the central words of a movie. For word recognition in Chinese log text, Jieba is currently the most effective tool.
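The dictionary-based idea above can be sketched in a few lines of Python. This is a simplified illustration, not Jieba's implementation: it builds the DAG of candidate dictionary words and uses dynamic programming to pick the maximum-probability path, but it omits the HMM/Viterbi handling of unregistered words (an unknown character simply becomes its own token). The word frequencies, the stop-word set, and the helper names are hypothetical; in practice one would call `jieba.lcut(text)` with a real dictionary.

```python
import math


def build_dag(sentence, freq):
    """For each start index i, list every end index j such that sentence[i:j+1] is a dictionary word."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]  # no dictionary word starts here: keep the single character
    return dag


def segment(sentence, freq):
    """Maximum-probability segmentation by dynamic programming over the DAG, right to left."""
    log_total = math.log(sum(freq.values()))
    dag = build_dag(sentence, freq)
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of the first word on that path)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


def preprocess(records, freq, stop_words):
    """Drop missing and repeated records, segment each text, and remove stop words."""
    seen, result = set(), []
    for text in records:
        if not text or text in seen:  # missing (None or empty) or repeated record
            continue
        seen.add(text)
        result.append([w for w in segment(text, freq) if w not in stop_words])
    return result


# Hypothetical toy dictionary (word -> corpus frequency) and stop-word list.
TOY_FREQ = {"发动机": 50, "发动": 20, "动机": 10, "故障": 80, "指示灯": 30, "不亮": 20, "了": 5}
STOP_WORDS = {"了"}

if __name__ == "__main__":
    print(preprocess(["发动机故障了", "发动机故障了", None, "指示灯不亮"], TOY_FREQ, STOP_WORDS))
    # [['发动机', '故障'], ['指示灯', '不亮']]
```

Here "发动机" wins over "发动" + "机" because the single high-frequency word gives a larger path log-probability than two lower-probability pieces.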

Text Feature Extraction Based on Word2vec
Because TF-IDF easily leads to dimension explosion and LDA tends to be ambiguous, Word2vec is used in this paper for text feature extraction. Word2vec is a neural probabilistic language model proposed by Mikolov et al. [17] and is mainly used to transform text from an unstructured form into a vectorized form [39]. Compared with the traditional high-dimensional TF-IDF word vector, a Word2vec word vector usually has only 100 to 300 dimensions. This low dimension greatly reduces computational complexity and the risk of dimension explosion. In addition, Word2vec word vectors are computed from context and word order and therefore capture the semantic information of the text. As a result, Word2vec has been widely used and studied since its release. Depending on how the word vectors are trained, Word2vec has two models: the Continuous Skip-gram Model (Skip-gram) and the Continuous Bag-of-Words Model (CBOW). Skip-gram takes the current word as input to predict the surrounding words, whereas CBOW takes the surrounding words as input to predict the current word. In comparison, CBOW is more effective on small corpora, while Skip-gram is more suitable for large corpora. The aircraft maintenance text log used in this paper is a typical small corpus, so CBOW is more suitable for text feature extraction in this study.
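The difference between the two training schemes is easiest to see in the training samples each one generates from a token sequence. The Python sketch below uses hypothetical helper names (not part of any Word2vec library) and a window of $c$ words on each side, so a context holds up to $2c$ words; near the sequence boundaries the context is simply shorter.

```python
def cbow_pairs(tokens, c):
    """CBOW samples: (context words, current word); the context predicts the word."""
    pairs = []
    for t, word in enumerate(tokens):
        context = tokens[max(0, t - c):t] + tokens[t + 1:t + 1 + c]
        if context:
            pairs.append((context, word))
    return pairs


def skipgram_pairs(tokens, c):
    """Skip-gram samples: (current word, one context word); the word predicts each neighbor."""
    pairs = []
    for t, word in enumerate(tokens):
        for neighbor in tokens[max(0, t - c):t] + tokens[t + 1:t + 1 + c]:
            pairs.append((word, neighbor))
    return pairs


if __name__ == "__main__":
    log_tokens = ["engine", "oil", "pressure", "low"]  # hypothetical segmented fault text
    print(cbow_pairs(log_tokens, 1))
    print(skipgram_pairs(log_tokens, 1))
```

From the same four tokens, CBOW yields four samples (one per position) while Skip-gram yields six (one per word-neighbor pair).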

The core idea of the CBOW model is to use the set of $2c$ surrounding words, $Context(w) = \{Context(w)_1, Context(w)_2, \ldots, Context(w)_{2c}\}$, to predict the current word $w$, where $2c$ means taking $c$ words before and $c$ words after $w$. As shown in Figure 2, CBOW is a three-layer neural network consisting of an input layer, a projection layer, and an output layer.
Aerospace 2021, 8, x 5 of 14

Input Layer: the word vectors $v(Context(w)_1), v(Context(w)_2), \ldots, v(Context(w)_{2c})$ of the $2c$ context words.

Projection Layer: the $2c$ vectors of the input layer are summed to obtain $X_w$, namely
$$X_w = \sum_{i=1}^{2c} v\left(Context(w)_i\right)$$

Output Layer: the output layer corresponds to a binary tree, namely a Huffman tree constructed with the words of the corpus as leaf nodes and the number of times each word appears in the corpus as its weight. The Huffman tree has $n$ leaf nodes ($n = |D|$), corresponding to the words in dictionary $D$, and $n - 1$ non-leaf nodes.
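Constructing the output-layer Huffman tree from word frequencies can be sketched with Python's `heapq`. This is an illustrative helper (its name and return format are our own, not a Word2vec library API): for every word it returns the non-leaf nodes on its root-to-leaf path and the binary codes of the branches taken, which is the information hierarchical softmax needs. Labeling the left branch 0 and the right branch 1 is an arbitrary convention here.

```python
import heapq
import itertools


def build_huffman(freq):
    """Build a Huffman tree from {word: frequency}.

    Returns {word: (non-leaf node ids on the root-to-leaf path, branch codes)}.
    Non-leaf nodes are numbered 0..n-2 in creation order, so the root is n-2.
    """
    tie = itertools.count()  # unique tie-breaker so heapq never compares node payloads
    heap = [(f, next(tie), ("leaf", w)) for w, f in freq.items()]
    heapq.heapify(heap)
    children = []  # children[k] = (left, right) of non-leaf node k
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        children.append((left, right))
        heapq.heappush(heap, (f1 + f2, next(tie), ("node", len(children) - 1)))
    paths = {}
    stack = [(heap[0][2], [], [])]  # (subtree, nodes so far, codes so far)
    while stack:
        (kind, value), nodes, codes = stack.pop()
        if kind == "leaf":
            paths[value] = (nodes, codes)
        else:
            left, right = children[value]
            stack.append((left, nodes + [value], codes + [0]))
            stack.append((right, nodes + [value], codes + [1]))
    return paths


if __name__ == "__main__":
    for word, (nodes, codes) in build_huffman({"oil": 5, "pressure": 3, "low": 1, "engine": 1}).items():
        print(word, nodes, codes)
```

Frequent words end up close to the root, so their paths (and the number of binary classifications needed to reach them) are short.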
For the corpus $C$, the objective function of CBOW is usually the log-likelihood function shown in Equation (1); that is, the probability of the current word being $w$ given $Context(w)$ is maximized:
$$\mathcal{L} = \sum_{w \in C} \log p\left(w \mid Context(w)\right) \tag{1}$$
For any word $w$ in the dictionary $D$, there is a unique path $P^w$ from the root node to the leaf node of $w$ in the Huffman tree, and this path contains $|P^w| - 1$ branches. If each branch is regarded as a binary classification, each classification produces a probability, and multiplying these probabilities gives the required $p(w \mid Context(w))$. Stochastic gradient ascent is then used to maximize the objective function. Finally, the trained vector $v(w)$ associated with each word (that is, with each leaf node of the Huffman tree) is taken as the final word vector of $w$.
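The path-product probability and one stochastic gradient ascent step can be sketched with NumPy. The sketch follows the standard hierarchical-softmax formulation of CBOW, which we assume for the details the text leaves implicit: each non-leaf node $k$ on the path of $w$ carries a parameter vector $\theta_k$; a branch with code $d$ has probability $\sigma(X_w^\top\theta_k)$ if $d = 0$ and $1 - \sigma(X_w^\top\theta_k)$ if $d = 1$; and the gradient of that branch's log-probability is $(1 - d - \sigma(X_w^\top\theta_k))X_w$ with respect to $\theta_k$ and $(1 - d - \sigma(X_w^\top\theta_k))\theta_k$ with respect to $X_w$. The four-word tree, vector size, and learning rate `eta` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def path_probability(x_w, theta, nodes, codes):
    """p(w | Context(w)): product of the binary-branch probabilities along the Huffman path."""
    p = 1.0
    for k, d in zip(nodes, codes):
        s = sigmoid(x_w @ theta[k])
        p *= s if d == 0 else 1.0 - s
    return p


def cbow_hs_step(V, theta, context_ids, nodes, codes, eta=0.05):
    """One stochastic gradient ascent step on log p(w | Context(w)); updates V and theta in place."""
    x_w = V[context_ids].sum(axis=0)  # projection layer: X_w = sum of the 2c context vectors
    grad_x = np.zeros_like(x_w)       # accumulated (scaled) gradient with respect to X_w
    for k, d in zip(nodes, codes):
        g = eta * (1.0 - d - sigmoid(x_w @ theta[k]))
        grad_x += g * theta[k]        # uses theta[k] before its update
        theta[k] += g * x_w
    for i in context_ids:             # every context word vector receives the same update
        V[i] += grad_x


# Hypothetical 4-word vocabulary and Huffman tree with 3 non-leaf nodes (node 0 = root).
WORDS = ["oil", "pressure", "low", "engine"]
PATHS = {  # word index -> (non-leaf nodes on the path, branch codes)
    0: ([0], [1]),
    1: ([0, 1], [0, 1]),
    2: ([0, 1, 2], [0, 0, 0]),
    3: ([0, 1, 2], [0, 0, 1]),
}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.normal(scale=0.1, size=(4, 8))  # input word vectors v(w)
    theta = np.zeros((3, 8))                # non-leaf node parameter vectors
    context, target = [0, 1, 3], 2          # predict "low" from its context
    before = path_probability(V[context].sum(axis=0), theta, *PATHS[target])
    for _ in range(100):
        cbow_hs_step(V, theta, context, *PATHS[target])
    after = path_probability(V[context].sum(axis=0), theta, *PATHS[target])
    print(f"p(low | context): {before:.3f} -> {after:.3f}")
```

Because every leaf's probability is a product of complementary branch probabilities, the probabilities of all words sum to 1 for any $X_w$, so no normalization over the whole dictionary is needed.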

Prior-Knowledge CNN Based on Cloud Similarity Measurement (CSM)
A prior-knowledge CNN model is used to construct the classifier in this paper. Unlike a traditional CNN, the prior-knowledge CNN encodes expert prior knowledge, mainly an expert fault knowledge base, into the model. A similarity measurement algorithm named Cloud Similarity Measurement (CSM) [40,41] is introduced to quantify the similarity between the text to be classified and the historical fault texts in the expert fault knowledge base.

CNN Algorithm
This paper uses labeled maintenance log data, so a supervised learning algorithm suits the application scenario and data characteristics. Common supervised learning algorithms include the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Support Vector Machine (SVM). A CNN is a neural network that uses convolution in place of general matrix multiplication in at least one of its layers. It learns local neighborhood patterns through nonlinear mappings, which reduces data dimensionality, and because the filter weights of a convolutional layer are shared, the number of parameters to be learned is greatly reduced. This makes the CNN well suited to the high dimensionality of unstructured data. As a deep learning algorithm, the CNN has been successfully applied in natural language processing, image processing, and video processing. Jin et al. [42] used a deep convolutional neural network to solve inverse problems in imaging. Acharya et al. [43] proposed an algorithm for the automated detection and diagnosis of seizures from Electroencephalogram (EEG) signals using a convolutional neural network. Poria [44] presented the first deep learning approach to aspect extraction in opinion mining using a CNN.
The CNN is a feed-forward neural network built on three basic concepts: local receptive fields, weight sharing, and pooling. A local receptive field connects each neuron only to a local region of its input, which reduces the number of weights to be trained. Weight sharing means that all neurons produced by the same convolution kernel use the same weights, which further reduces the number of training parameters. Pooling reduces the size of the features and makes them more invariant. As a result, a CNN is robust to displacement, tilt, scaling, and other deformations of the input features.
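A quick parameter count illustrates the saving from local receptive fields and weight sharing. The sizes are hypothetical: a fault text of 50 tokens, each a 100-dimensional Word2vec vector (within the 100 to 300 range mentioned earlier), mapped to 64 output units either by a fully connected layer or by 64 convolution kernels spanning 3 consecutive words and the full embedding width (a common text-CNN layout, assumed here).

```python
def dense_params(n_inputs, n_outputs):
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_height, kernel_width, n_kernels):
    """Convolutional layer: each kernel's weights (plus one bias) are shared across all positions."""
    return n_kernels * (kernel_height * kernel_width + 1)


if __name__ == "__main__":
    tokens, dim, units = 50, 100, 64
    print(dense_params(tokens * dim, units))  # 320064
    print(conv_params(3, dim, units))         # 19264
```

The convolutional count does not depend on the number of tokens at all, whereas the fully connected count grows linearly with the text length.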
A CNN consists of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer. From the viewpoint of data processing, the overall structure of a CNN can be divided into two parts: feature extraction, comprising the input, convolutional, and pooling layers; and classification, comprising the fully connected layer and the output layer. The convolutional and pooling layers act as the feature extractor, extracting latent features from the original data, while the fully connected layer acts as the classifier, taking the features produced by the last pooling layer as input. The CNN structure is shown in Figure 3.

A CNN structure generally contains multiple convolutional layers. Each performs local feature detection on the output of the previous layer (not necessarily the input layer) and stores the detection results as a feature map. A convolutional layer usually has multiple convolution kernels, each trying to detect a different latent feature in the input data.
Assuming that the input to the convolutional layer is a two-dimensional matrix, the output of a convolution kernel is given by Equation (2):
$$y_{ij} = \sigma\left(\sum_{r=0}^{F-1}\sum_{c=0}^{F-1} w_{rc}\, x_{(r+i\times S)(c+j\times S)} + b\right), \quad 0 \le i \le \left\lfloor \frac{H-F}{S} \right\rfloor,\; 0 \le j \le \left\lfloor \frac{W-F}{S} \right\rfloor \tag{2}$$
where $y_{ij}$ is the value of an output point in the feature map; $H$ and $W$ are the vertical and horizontal dimensions of the input data; $F$ is the height and width of the convolution kernel; $S$ is the stride, i.e., the step by which the kernel moves each time; $x_{(r+i\times S)(c+j\times S)}$ is the value of the input data at coordinate $(r + i \times S, c + j \times S)$; $b$ is the offset and $w_{rc}$ the weight at coordinate $(r, c)$; and $\sigma$ is a nonlinear activation function used for feature extraction.

The pooling layer usually follows the convolutional layer. It takes the output of the convolutional layer as input and reduces the dimensionality of the feature data by aggregating regions of the feature map. Maximum pooling selects the maximum value within each region covered by the pooling filter, whereas mean pooling computes the average of all feature values in the region.
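Both can be written as the general pooling operation of Equation (3):
$$y = \left(\frac{1}{|R|}\sum_{(i,j)\in R} x(i,j)^{P}\right)^{1/P} \tag{3}$$
where $R$ is the pooling region, $|R|$ is the number of elements in it, and $P$ is the pooling parameter: $P = 1$ gives mean pooling and $P = \infty$ gives maximum pooling (for nonnegative feature values). In effect, the pooling layer performs a second round of feature extraction, which reduces model complexity while retaining much of the original information.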
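Equations (2) and (3) translate directly into NumPy. The sketch below is an unoptimized reference version under stated assumptions: a single square $F \times F$ kernel, "valid" convolution without padding (computed, as in most CNN libraries, as cross-correlation, i.e., without flipping the kernel), and non-overlapping square pooling windows. The general pooling formula is applied to nonnegative inputs, such as feature maps after a ReLU, because fractional powers of negative values are undefined. Function names are hypothetical.

```python
import numpy as np


def conv2d(x, w, b, stride=1, act=np.tanh):
    """Equation (2): valid convolution of an H x W input with one F x F kernel and stride S."""
    H, W = x.shape
    F = w.shape[0]
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Input values x[r + i*S, c + j*S] for r, c in 0..F-1
            patch = x[i * stride:i * stride + F, j * stride:j * stride + F]
            y[i, j] = act(np.sum(w * patch) + b)
    return y


def lp_pool(region, p):
    """Equation (3) on one region: p = 1 gives mean pooling, p = inf gives max pooling."""
    region = np.asarray(region, dtype=float)
    if np.isinf(p):
        return float(region.max())
    return float(np.mean(region ** p) ** (1.0 / p))


def pool2d(fmap, size, p):
    """Apply lp_pool over non-overlapping size x size windows of a feature map."""
    out_h, out_w = fmap.shape[0] // size, fmap.shape[1] // size
    return np.array([[lp_pool(fmap[i * size:(i + 1) * size, j * size:(j + 1) * size], p)
                      for j in range(out_w)] for i in range(out_h)])


if __name__ == "__main__":
    x = np.arange(16.0).reshape(4, 4)
    fmap = conv2d(x, np.ones((2, 2)), 0.0, stride=2, act=lambda z: np.maximum(z, 0.0))  # ReLU
    print(fmap)                     # [[10. 18.] [42. 50.]]
    print(pool2d(fmap, 2, 1))       # mean pooling: [[30.]]
    print(pool2d(fmap, 2, np.inf))  # max pooling: [[50.]]
```

Increasing $P$ moves the result smoothly from the mean toward the maximum; for the region $[1, 2, 3, 4]$, $P = 50$ already gives about 3.89.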
The fully connected layer is similar to a multilayer perceptron: every neuron in one layer is connected to every neuron in the adjacent layer. The fully connected layer integrates the local feature information extracted by the convolutional and pooling layers and generates classification features that the output layer can process:
$$y^{l} = f\left(w^{l} y^{l-1} + b^{l}\right) \tag{4}$$
where $f(\cdot)$ is the activation function of the fully connected layer; $y^{l}$ is the output of the $l$-th layer; $y^{l-1}$ is both the input of the $l$-th layer and the output of the $(l-1)$-th layer; and $w^{l}$ and $b^{l}$ are the weight and offset of the $l$-th layer, respectively.
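Equation (4) is a single matrix-vector product followed by an activation. The NumPy sketch below flattens pooled feature maps and passes them through two fully connected layers, ending in 10 outputs for the 10 fault types used in the experiments. The layer sizes, the ReLU hidden activation, the softmax output, and the helper names are assumptions for illustration; the text does not specify them.

```python
import numpy as np


def dense(y_prev, w, b, f):
    """Equation (4): y^l = f(w^l y^(l-1) + b^l)."""
    return f(w @ y_prev + b)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def classify(pooled_maps, params):
    """Flatten the pooled feature maps, then apply a hidden layer and an output layer."""
    y = np.concatenate([m.ravel() for m in pooled_maps])
    (w1, b1), (w2, b2) = params
    hidden = dense(y, w1, b1, relu)
    return dense(hidden, w2, b2, softmax)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = [rng.random((4, 4)) for _ in range(8)]  # 8 hypothetical 4 x 4 pooled feature maps
    params = [(rng.normal(scale=0.1, size=(32, 128)), np.zeros(32)),
              (rng.normal(scale=0.1, size=(10, 32)), np.zeros(10))]
    probs = classify(maps, params)
    print(probs.argmax())  # predicted fault-type code, 0 to 9
```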

Text Similarity Measurement Based on CSM
This paper introduces CSM to quantify the similarity between the text to be classified and the historical fault texts. The CSM algorithm originates from the cloud model, where it is used to describe the differences between clouds. In data mining, CSM overcomes the shortcomings of the Euclidean distance, the Dynamic Time Warping (DTW) distance, and the classical mode distance in measuring the similarity of two time series, achieving better measurement accuracy. CSM consists of two parts: the cloud characteristic vector, obtained with the reverse cloud generation algorithm, and the cosine of the angle between cloud vectors.
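For the input fault text description data $A_j = \{a_1, a_2, \ldots, a_N\}$ and $B_k = \{b_1, b_2, \ldots, b_M\}$, where $N$ and $M$ are the data lengths of $A_j$ and $B_k$, the CSM algorithm proceeds as follows:

(1) Calculate the sample mean (expected value) of $A_j$, its first-order central distance, and its sample variance:
$$\bar{A} = \frac{1}{N}\sum_{i=1}^{N} a_i, \qquad \frac{1}{N}\sum_{i=1}^{N}\left|a_i - \bar{A}\right|, \qquad S_A^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(a_i - \bar{A}\right)^2$$
(2) Calculate the expected value $E_A$ of the cloud model:
$$E_A = \bar{A}$$
(3) Calculate the characteristic entropy of $A_j$:
$$E_n = \sqrt{\frac{\pi}{2}} \cdot \frac{1}{N}\sum_{i=1}^{N}\left|a_i - E_A\right|$$
(4) Calculate the super entropy of $A_j$:
$$H_e = \sqrt{S_A^2 - E_n^2}$$
Here $E_A$, $E_n$, and $H_e$ describe the overall characteristics of $A_j$, so the cloud vector of $A_j$ is $V_{A_j} = (E_A, E_n, H_e)$. Similarly, the cloud vector of another data set $B_k$ is $V_{B_k} = (E_B, E_n^B, H_e^B)$. The cosine of the angle between the two cloud vectors is taken as the similarity of the two sequences:
$$sim_{jk} = \cos\left(V_{A_j}, V_{B_k}\right) = \frac{V_{A_j} \cdot V_{B_k}}{\left\|V_{A_j}\right\|\,\left\|V_{B_k}\right\|} \tag{11}$$
Equation (11) shows that $sim_{jj} = sim_{kk} = 1$; that is, the similarity of a cloud vector with itself is 1. Moreover, $sim_{jk} = sim_{kj}$; that is, the similarity is symmetric.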
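The CSM steps above can be sketched in plain Python. The helper names are hypothetical, and one detail is our own assumption: on short or nearly uniform sequences the estimate $S^2 - E_n^2$ can come out slightly negative, so the sketch clamps it at zero before taking the square root. Each sequence must contain at least two values for the sample variance to be defined.

```python
import math


def backward_cloud(samples):
    """Reverse (backward) cloud generator: estimate the cloud vector (Ex, En, He) of a sequence."""
    n = len(samples)
    mean = sum(samples) / n
    first_order = sum(abs(x - mean) for x in samples) / n          # first-order central distance
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)     # sample variance S^2
    ex = mean                                                      # expected value
    en = math.sqrt(math.pi / 2) * first_order                      # characteristic entropy
    he = math.sqrt(max(variance - en ** 2, 0.0))                   # super entropy (clamped, an assumption)
    return (ex, en, he)


def cloud_similarity(a, b):
    """CSM: cosine of the angle between the cloud vectors of sequences a and b."""
    va, vb = backward_cloud(a), backward_cloud(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm


if __name__ == "__main__":
    a = [0.1, 0.4, 0.35, 0.8]        # hypothetical fault text vectors
    b = [0.2, 0.1, 0.5, 0.3, 0.9]
    print(cloud_similarity(a, a))    # 1.0 up to rounding
    print(cloud_similarity(a, b), cloud_similarity(b, a))
```

Because only three summary statistics are compared, the two sequences may have different lengths ($N \ne M$), which the element-wise Euclidean distance cannot handle directly.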

Construction of Prior-Knowledge CNN
Based on the CNN and CSM, this paper proposes a prior-knowledge CNN model to construct the classifier. Its core principle is to use expert prior knowledge to correct the predictions of the CNN. A prediction is corrected when the prediction accuracy of the CNN is lower than the maximum CSM similarity between the text and the expert knowledge base. The prior-knowledge CNN therefore consists of three parts: training the CNN classifier, calculating the CSM text similarity, and correcting the prediction results. The structure of the prior-knowledge CNN model is shown in Figure 4, and its construction mainly includes the following steps:
(1) Firstly, the text data set $D$ is divided into a training set $D_s$ and a test set $D_T$ according to a certain proportion.
(2) Secondly, the training set $D_s$ is used to train the initial CNN classifier, and the test set $D_T$ is used to measure the classification accuracy $Acc$ of the initial CNN classifier.
(3) Thirdly, each fault text vector $i$ in the test set $D_T$ is input into the initial CNN classifier to obtain the initial predicted fault type $FC_i$. $Acc$ and $FC_i$ form the tuple $(Acc, FC_i)$.
(4) Fourthly, the similarity between each fault text vector in the expert fault knowledge base $E$ and the fault text vector $i$ to be classified is calculated, giving the similarity set $S_i = \{Sim_{1i}, Sim_{2i}, \ldots, Sim_{mi}\}$ $(m = |E|)$. The maximum of $S_i$, $Sim_{ji} = \max(S_i)$ $(j \in [1, m])$, is taken; together with the fault type $FS_j$ of the $j$th knowledge-base entry, it forms the tuple $(Sim_{ji}, FS_j)$.
(5) Fifthly, the rule in Equation (12) is applied to $(Acc, FC_i)$ and $(Sim_{ji}, FS_j)$ to obtain the final fault type $F_i$ of the fault text vector $i$:

$$F_i = \begin{cases} FS_j, & Sim_{ji} > Acc \\ FC_i, & Sim_{ji} \le Acc \end{cases} \tag{12}$$

Finally, Steps (3), (4), and (5) are performed for each text in the test set to complete the correction of the initial CNN classifier.
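Steps (4) and (5) reduce to an argmax over the similarity set followed by a single comparison with $Acc$. The sketch below assumes the CSM similarities have already been computed; `correct_prediction` and `correct_test_set` are hypothetical names, and ties ($Sim_{ji} = Acc$) are assumed to keep the CNN prediction, since the text only states that a prediction is corrected when $Acc$ is lower than the maximum similarity.

```python
from typing import List, Sequence, Tuple


def correct_prediction(acc: float, fc_i: int,
                       similarities: Sequence[float],
                       kb_labels: Sequence[int]) -> Tuple[int, float]:
    """Steps (4)-(5): correct one CNN prediction with the expert knowledge base.

    acc:          test accuracy Acc of the initial CNN classifier
    fc_i:         fault type FC_i predicted by the initial CNN for text i
    similarities: similarity set S_i (CSM similarity of text i to each entry of E)
    kb_labels:    fault type FS_j of each entry j of E
    Returns (F_i, Sim_ji).
    """
    if len(similarities) == 0 or len(similarities) != len(kb_labels):
        raise ValueError("need one similarity per knowledge-base entry")
    j = max(range(len(similarities)), key=lambda k: similarities[k])  # argmax of S_i
    sim_ji = similarities[j]
    # Equation (12): use the expert label only if its similarity beats Acc.
    f_i = kb_labels[j] if sim_ji > acc else fc_i
    return f_i, sim_ji


def correct_test_set(acc: float, cnn_predictions: Sequence[int],
                     similarity_rows: Sequence[Sequence[float]],
                     kb_labels: Sequence[int]) -> List[int]:
    """Repeat Steps (3)-(5) for every text in the test set."""
    return [correct_prediction(acc, fc, row, kb_labels)[0]
            for fc, row in zip(cnn_predictions, similarity_rows)]
```

Because the threshold is the single global accuracy $Acc$, a knowledge-base match overrides the CNN only when it is more "trustworthy" than the CNN is on average; a per-class threshold would be a possible variation, but it is not what the text describes.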

Experiments and Result Analysis
To verify the effectiveness of the proposed aircraft fault diagnosis model, validation experiments were carried out on a real aircraft fault text data set consisting of five years of Chinese-language maintenance log data from a civil aircraft. After data cleaning, more than 50,000 aircraft fault texts were obtained, some of which are shown in Table 1. The second column of the table records the aircraft fault description text, and the third column records the corresponding fault type. The data set used in this study covers a total of 10 fault types. To facilitate subsequent processing, we coded the 10 fault types as follows: sensor fault (0), circuit fault (1), equipment ablation (2), resistance fault (3), mechanical fault (4), equipment aging (5), lamp fault (6), indicator fault (7), computer fault (8), and switch fault (9). Following the construction process of the proposed model, the validation experiment mainly includes text data preprocessing, Word2vec text feature extraction, and construction of the prior-knowledge CNN classifier.
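The integer coding of the 10 fault types is a simple lookup table. A minimal sketch follows; the names `FAULT_TYPES`, `FAULT_CODES`, and `encode_labels` are hypothetical, and because the raw logs are in Chinese, the English names here stand in for whatever label strings the data set actually uses.

```python
# Integer codes for the 10 fault types, as listed in the text.
FAULT_TYPES = {
    0: "sensor fault",
    1: "circuit fault",
    2: "equipment ablation",
    3: "resistance fault",
    4: "mechanical fault",
    5: "equipment aging",
    6: "lamp fault",
    7: "indicator fault",
    8: "computer fault",
    9: "switch fault",
}

# Reverse lookup: fault-type name -> integer class label used for training.
FAULT_CODES = {name: code for code, name in FAULT_TYPES.items()}


def encode_labels(names):
    """Map fault-type names to integer labels (raises KeyError on unknown names)."""
    return [FAULT_CODES[name] for name in names]
```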

Text data preprocessing
As the five-year maintenance log data consist of Chinese text, word segmentation must be performed and stop words removed before further processing. We therefore first used Jieba to segment the Chinese fault description text and then removed its stop words. The text data obtained after this preprocessing are shown in Table 2. Compared with the original data in Table 1, the stop words have been removed from the Chinese fault description text, and separators have been added between words.
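A minimal sketch of this preprocessing step, assuming the `jieba` package, whose `jieba.lcut` returns the segmented words as a list. The paper's stop-word list is not given, so `stop_words` is left to the caller; the function name `preprocess` and the space separator are assumptions. The segmenter can be passed in, which also lets the function run without Jieba installed.

```python
from typing import Callable, Iterable, List, Optional


def preprocess(text: str, stop_words: Iterable[str],
               segment: Optional[Callable[[str], List[str]]] = None,
               sep: str = " ") -> str:
    """Segment a Chinese fault description and remove stop words.

    By default Jieba's `jieba.lcut` performs the word segmentation; any
    callable that returns a list of tokens can be passed instead.
    The remaining words are joined with `sep` as the word separator.
    """
    if segment is None:
        import jieba  # third-party package: pip install jieba
        segment = jieba.lcut
    stop = set(stop_words)
    tokens = (tok.strip() for tok in segment(text))
    return sep.join(tok for tok in tokens if tok and tok not in stop)
```

For example, `preprocess(raw_text, stop_words)` segments a raw log entry with Jieba and returns its remaining words separated by spaces, matching the separator-delimited form described for Table 2.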

Word2vec text feature extraction
Because computers cannot process raw text directly, text features must be extracted after preprocessing to transform the text data into a structured, numerical format. Word2vec is used to extract the text features, and the results are shown in Table 3: each aircraft fault text is mapped to a 100-dimensional vector space.
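The text does not state which Word2vec hyperparameters were used or how word vectors are combined into one vector per text. The sketch below therefore assumes gensim 4.x for training (only `vector_size=100` follows the text; `window`, `min_count`, and the other settings are illustrative) and averages the word vectors of a text, which is one common choice; a CNN could instead take the matrix of word vectors as input.

```python
from typing import List, Sequence

import numpy as np


def train_word2vec(tokenized_texts: List[List[str]], dim: int = 100):
    """Train Word2vec with gensim 4.x and return its KeyedVectors (word -> vector).

    Only the 100-dimensional vector size follows the text; the other
    hyperparameters are illustrative assumptions.
    """
    from gensim.models import Word2Vec  # third-party package: pip install gensim
    model = Word2Vec(sentences=tokenized_texts, vector_size=dim,
                     window=5, min_count=1, workers=1, seed=1)
    return model.wv


def text_vector(tokens: Sequence[str], wv, dim: int = 100) -> np.ndarray:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    vecs = [np.asarray(wv[t], dtype=float) for t in tokens if t in wv]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```

With the preprocessed texts split on their separator, `wv = train_word2vec([t.split() for t in texts])` followed by `text_vector(tokens, wv)` yields one 100-dimensional vector per fault text (`texts` and `tokens` are placeholder names).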

Constructing the prior-knowledge CNN classifier
As mentioned above, constructing the prior-knowledge CNN mainly involves three parts: training the CNN classifier, calculating the CSM text similarity, and correcting the prediction results. We first fed the text vectors extracted by Word2vec into the CNN for training and tested the trained CNN on the test set, obtaining $Acc = 0.9623$. The CSM similarity between the fault text vectors in the test set and the expert fault knowledge base was then calculated; the similarity values, which lie between 0 and 1, are shown in Table 4. Finally, the predictions for the test set were corrected by comparing the CNN classification accuracy $Acc$ with the maximum similarity $Sim_{ji}$. Taking the No. 2 text in the test set as an example, its CSM similarity values to the 10 fault types in the expert knowledge base are 0.8154, 0.6126, 0.2278, 0.7386, 0.6260, 0.8790, 0.9900, 0.4981, 0.5860, and 0.6609, with a maximum of 0.9900. As 0.9900 is greater than $Acc$ (0.9623), the fault type of the No. 2 text is corrected to lamp fault (6). These operations were performed on each text in the test set to complete the construction of the prior-knowledge CNN classifier. By comparing the experimental results of Groups C, D, and E in Table 5, it can also be seen that Word2vec indeed improves the performance of the classifier compared with TF-IDF and LDA. Comparing the experimental results of Groups E, F, G, and H further shows that the proposed prior-knowledge CNN outperforms MLP, SVM, and CNN on Acc, $F_1$, and AUC.
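The correction of the No. 2 text can be reproduced from the values quoted above. This sketch assumes, as the example implies, that the 10 similarity values are listed in fault-code order (0 to 9), so the index of the maximum is the corrected fault type code.

```python
# CSM similarities of the No. 2 test text to the 10 fault types
# (values quoted in the text, assumed to be in fault-code order 0-9).
sims = [0.8154, 0.6126, 0.2278, 0.7386, 0.6260,
        0.8790, 0.9900, 0.4981, 0.5860, 0.6609]
acc = 0.9623  # test accuracy Acc of the initial CNN classifier

j = max(range(len(sims)), key=lambda k: sims[k])  # index (fault code) of the maximum
sim_max = sims[j]
corrected = sim_max > acc  # Equation (12): replace the CNN prediction if True
print(j, sim_max, corrected)  # prints: 6 0.99 True
```

Whatever the initial CNN predicted for this text, the rule assigns code 6 (lamp fault), because the best expert match exceeds the CNN's overall accuracy.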
To study the effect of the expert fault knowledge base on the diagnosis of different fault types, this study compares the confusion matrices and ROC curves of the initial CNN classifier and the prior-knowledge CNN classifier for each fault type, as shown in Figure 5. The diagnosis accuracy of the prior-knowledge CNN classifier is higher than that of the initial CNN classifier for every fault type except mechanical fault (4), and the largest improvement is for switch fault (9). This suggests that the switch fault (9) knowledge in the expert fault knowledge base is relatively complete, whereas the mechanical fault (4) knowledge needs to be supplemented. A high-quality expert fault knowledge base is therefore the key to further improving the performance of the proposed aircraft fault diagnosis model based on Word2vec and the prior-knowledge CNN.
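The per-type comparison reads the diagnosis accuracy of each fault type off the diagonal of a confusion matrix. A minimal NumPy sketch, assuming that "accuracy for a fault type" means per-class recall (correct predictions divided by the number of texts of that type); the function names are hypothetical, and the 3-type matrices are made-up toy counts, not the data behind Figure 5.

```python
import numpy as np


def per_class_accuracy(cm) -> np.ndarray:
    """Diagnosis accuracy of each fault type (per-class recall).

    cm[t, p] counts test texts of true type t predicted as type p, so the
    accuracy for type t is cm[t, t] / sum_p cm[t, p] (0 if type t is absent).
    """
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros_like(totals), where=totals > 0)


def accuracy_gain(cm_cnn, cm_prior) -> np.ndarray:
    """Per-type change in accuracy from the initial CNN to the prior-knowledge CNN."""
    return per_class_accuracy(cm_prior) - per_class_accuracy(cm_cnn)


# Hypothetical 3-type confusion matrices (rows: true type, columns: predicted type).
cm_cnn = [[8, 2, 0], [1, 9, 0], [0, 3, 7]]
cm_prior = [[9, 1, 0], [2, 8, 0], [0, 1, 9]]
gain = accuracy_gain(cm_cnn, cm_prior)
print(np.round(gain, 2))                            # per-type change in accuracy
print(int(np.argmax(gain)), int(np.argmin(gain)))   # 2 1 (most improved, most degraded)
```

A negative entry in `gain` flags a fault type, like mechanical fault (4) in the text, whose knowledge-base coverage may need supplementing.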

Conclusions
Compared with similar work [13,14], this study innovates in the following aspects: (1) A new text-driven aircraft fault diagnosis framework based on Word2vec and the prior-knowledge CNN is proposed, which achieves higher fault diagnosis accuracy than previous text-driven aircraft fault diagnosis frameworks. (2) To further improve the accuracy of fault diagnosis, the more effective Word2vec method is used to extract text features instead of the traditional TF-IDF and LDA methods. (3) A novel prior-knowledge CNN is proposed by fusing a CNN and CSM; it improves the performance of the CNN classifier and clearly outperforms the traditional MLP and SVM classifiers. (4) The text-driven aircraft fault diagnosis model developed in this paper can process not only English text but also Chinese text.
In summary, the text-driven fault diagnosis model based on Word2vec and the prior-knowledge CNN proposed in this paper can accurately determine the fault type from the aircraft fault description text, making full use of maintenance log data and providing support for aircraft maintenance. In future work, structured and unstructured data could be fused for fault diagnosis, so that the cause of a fault can be found at the data level and its specific mechanism explained at the mechanism level.

Figure 1. The proposed aircraft fault diagnosis model structure.

( 1 )
Firstly, the text data set D is divided into training set D s and test set D T according to a certain proportion.(2) Second, the training set D s enters the CNN to train the initial CNN classifier, and the test set D T enters the initial CNN classifier to test the classification accuracy Acc of the initial CNN classifier.(3) Thirdly, for any fault text vector i in the test set D T , it is put into the initial CNN

Figure 4. The prior-knowledge CNN model structure.

Figure 5. Diagnosis effect comparison of CNN and prior-knowledge CNN for different fault types.

The lack of effective technical means leads to substantial waste of aircraft fault description text. Therefore, a text-driven fault diagnosis model was developed in this study based on Word2vec, a CNN, and CSM. Word2vec is used to perform text feature extraction, while the CNN and CSM are used to build the prior-knowledge CNN classifier. The main contribution of the proposed prior-knowledge CNN is that it encodes expert fault knowledge through the CSM similarity between the text to be classified and the historical fault texts in the expert fault knowledge base, thereby improving the accuracy of aircraft fault diagnosis. According to the experimental results on five years of Chinese-language maintenance log data from a civil aircraft, we draw the following conclusions: (1) The proposed aircraft fault diagnosis model based on Word2vec and the prior-knowledge CNN reached 0.9742, 0.9740, and 0.9844 in Acc, $F_1$, and AUC, respectively. With an accuracy above 97%, the model can accurately determine the fault type from the fault description text. (2) In this study, Word2vec is a more effective text feature extraction method than TF-IDF and LDA, and it improves the performance of the classifier. (3) The CNN classifier outperforms the MLP and SVM classifiers on Acc, $F_1$, and AUC, and introducing expert fault knowledge into the CNN through CSM further improves the accuracy of fault diagnosis. (4) A high-quality expert fault knowledge base is the key to further improving the performance of the prior-knowledge CNN classifier.

Table 1. Examples of aircraft fault text.

Table 2. Examples of aircraft fault text after preprocessing.

Table 3. Aircraft fault text feature vectors extracted by Word to Vector (Word2vec).

Table 4. CSM similarity between the test set and the expert fault knowledge base.

To verify the superiority of the proposed aircraft fault diagnosis model, our model based on Word2vec and the prior-knowledge CNN was compared with Rodrigues's [13] aircraft fault diagnosis model based on TF-IDF and MLP and with Wang's [14] aircraft fault diagnosis model based on LDA and SVM. Seven control groups and one experimental group were designed. Common classification indicators, including Accuracy (Acc), $F_1$ Score ($F_1$), and Area Under Curve (AUC), were used to evaluate the performance of these classifiers. The results are shown in Table 5. All the classification indicators of the proposed model based on Word2vec and the prior-knowledge CNN are high and better than those of the other five models, which demonstrates the superiority of the aircraft fault diagnosis model proposed in this paper.
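For reference, the three indicators can be computed as below. The text does not state how $F_1$ and AUC are averaged over the 10 fault types, so this NumPy sketch assumes unweighted (macro) averaging and one-vs-rest AUC; scikit-learn's `accuracy_score`, `f1_score(average="macro")`, and `roc_auc_score(multi_class="ovr")` are the usual library equivalents for probability scores. The function names are hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Fraction of texts whose predicted fault type equals the true type."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def binary_auc(is_positive, scores) -> float:
    """AUC = P(random positive scores higher than random negative), ties count 1/2."""
    is_positive = np.asarray(is_positive, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[is_positive], scores[~is_positive]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def macro_auc_ovr(y_true, proba) -> float:
    """One-vs-rest AUC averaged over classes; proba[:, c] scores class c."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    return float(np.mean([binary_auc(y_true == c, proba[:, c])
                          for c in range(proba.shape[1])]))
```

The pairwise AUC is quadratic in the number of samples per class, which is fine for a sketch; for a test set of tens of thousands of texts a rank-based or library implementation is preferable.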

Table 5. Comparison of the classifier evaluation results.