Research Method for Ship Engine Fault Diagnosis Based on Multi-Head Graph Attention Feature Fusion

Abstract: Ship engine fault diagnosis currently suffers from scarce fault data, insufficient label information, and poor diagnostic performance. To address these problems, this paper proposes a fault diagnosis method based on a multi-head graph attention neural network with probabilistic similarity and rank-order similarity (MPGANN). First, the similarity between samples in the ship engine dataset is explored using the probabilistic similarity of t-SNE and the rank-order similarity of Spearman's correlation coefficient to define the neighbor relationships between samples. Appropriate weights are then selected for early fusion of the two graph structures, fusing feature information from two scales. Finally, a graph attention neural network (GANN) incorporating the multi-head attention mechanism performs the fault diagnosis. Comparative experiments on graph construction and algorithm performance are carried out on a simulated ship engine dataset. The results show that MPGANN outperforms the compared methods in accuracy, F1 score, and total elapsed time, with an accuracy of 97.58%, and that the proposed model still performs ship engine fault diagnosis well under unfavorable conditions such as small samples and insufficient label information, which is of practical significance for intelligent ship engine rooms and fault diagnosis.


Introduction
As the main power source for ship propulsion and power generation, ship engines occupy an important position in the field of ships. Because of this importance, an engine failure that is not detected in time can cause huge economic losses or even casualties. Timely monitoring and diagnosis of the engine's working condition can therefore effectively improve the reliability, economy, and safety of ship operation.
With the advance of ship intelligence and the continuous improvement of artificial intelligence algorithms and computing power, more and more researchers are applying artificial intelligence to condition monitoring and fault diagnosis, and fault diagnosis methods for ship engines have developed accordingly. Fault diagnosis of ship engines faces challenges such as data scarcity, multiple fault types, noise interference, and insufficient label information. To ensure safe ship navigation, Zhong et al. [1] introduced deep learning into ship diesel engine fault diagnosis and proposed a fault diagnosis method based on correlation distribution and deep belief networks, which achieved good results. However, this type of method does not consider class imbalance in ship fault data, which causes some defects in model training and application. To address this problem, Hou et al. [2] first used principal component analysis (PCA) to reduce the dimensionality of the data. They then used the sample size optimization algorithm (SSO) to handle data imbalance and introduced a three-dimensional Arnold map into the particle swarm optimization algorithm to improve the support vector machine (SVM), which effectively improved the generalization ability of the classification model. Finally, the model was validated on fault data from the fuel supply system, and the results show that the method alleviates the impact of imbalanced data. Zhong et al. [3] proposed semi-supervised principal component analysis (SSPCA), which fuses labeled and unlabeled samples as an alternative to unsupervised learning to improve diagnostic accuracy, with better robustness to false alarms.
The above models have achieved good results in fault monitoring, but they can only determine whether a fault has occurred, not diagnose its specific cause, so they offer little guidance for subsequent operation and maintenance decisions. In this regard, Ren et al. [4] built a fault tree, transformed it into a Bayesian network, and constructed a Bayesian network diagnostic model of the lubrication system; in the case validation, the model accurately diagnosed the cause of failure. Han et al. [5] introduced a convolutional neural network to detect and classify propulsion faults from time series. Xu et al. [6] fused an artificial neural network (ANN) model, a belief rule-based (BRB) reasoning model, and an evidential reasoning (ER) rule model, used a genetic algorithm to optimize the importance weight of each model to improve the overall performance of the fusion system, and finally diagnosed ship system faults through joint decision-making by the three models. In recent years, algorithmic improvements have achieved significant results in industrial fault diagnosis [7][8][9][10][11]. Samet [12] proposed a new kNN-based classifier (PFS-kNN) that finds the k-nearest neighbors using the Minkowski metric over picture fuzzy soft matrices; validated on UCI medical datasets, PFS-kNN outperformed state-of-the-art kNN-based algorithms. Yang et al. [13] proposed a diagnostic method based on fault sign simulation to address the too-small sample size of faults in diagnosing a particular engine. Agrawal et al. [14] proposed a data-driven predictive health maintenance model that monitors equipment health and takes action before a fault occurs; the model alerts on major defects in the system and can effectively avoid interruptions in the production process.
A graph neural network (GNN) is a neural model that captures graph dependencies through message passing between graph nodes. Zhou et al. [15] divided GNN applications into six directions (natural language processing, computational technology, natural science research, knowledge graphs, combinatorial optimization, and graph generation) and gave a comprehensive and detailed summary of GNN applications.
In recent years, GNN variants such as graph convolutional networks (GCNs) [16] and GANN [17] have achieved breakthroughs in many deep learning tasks. Kipf et al. [18] applied GCNs to fast, scalable semi-supervised node classification in graphs and demonstrated significant superiority on many datasets. Wang et al. [19] proposed a multi-graph GCN fault diagnosis method that fuses two feature embeddings into one combined embedding through a self-attention mechanism, improving classification accuracy and stability on imbalanced datasets. Li et al. [20] proposed a domain adversarial graph convolutional network (DAGCN) that models three types of information (class labels, domain labels, and data structure) in a deep network, realizing unsupervised domain adaptation (UDA) fault diagnosis. Zheng et al. [21] proposed a quantum graph convolutional neural network (QGCN) model, drawing on quantum neural networks and graph convolutional neural networks, which uses parameterized quantum circuits and the computational power of quantum systems to accomplish graph classification, a traditional machine learning task. In all of the above studies, GCN achieved good results in graph classification because it handles non-Euclidean data well. However, GCN aggregates neighbors with fixed weights determined by the graph structure, so it pays insufficient attention to important nodes. Veličković et al. [22] therefore constructed GANN by adding a self-attention layer to traditional GCNs, which assigns different weights to different nodes in the neighborhood and thus removes GCN's inability to weigh the importance of neighbors. On this basis, Yang et al. [23] proposed a full graph attention neural network (FGANN), which also considers the influence of non-neighboring nodes and handles the graph classification task well. Zhang et al. [24] noted that in sentiment classification tasks, traditional models are insensitive to syntactic structure because of the complexity of syntactic dependency relations, and they make no use of external sentiment knowledge. They therefore proposed a graph attention neural network that attends to multiple levels of syntax, semantics, and knowledge by introducing multi-head attention and acquires sentiment knowledge by considering information such as syntax.
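To make the contrast between GCN and GANN aggregation concrete, the symmetric-normalized propagation rule commonly used for GCN (the form used by Kipf et al. [18], a close relative of mean aggregation) can be sketched in NumPy. This is an illustrative sketch, not code from this paper; the function name `gcn_propagate` and the toy path graph are hypothetical. The point it shows is that each neighbor's coefficient depends only on node degrees, so it is fixed by the graph rather than learned.

```python
import numpy as np


def gcn_propagate(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN propagation step, D^-1/2 (A + I) D^-1/2 X W, without the nonlinearity.

    The mixing coefficients depend only on node degrees, so every neighbor's
    contribution is fixed by the graph structure rather than learned.
    """
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)                        # degree including the self-loop
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt     # symmetric normalization
    return norm_adj @ features @ weight


# Hypothetical toy example: path graph 0-1-2, one scalar feature per node, identity weight.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.0], [2.0], [3.0]])
out = gcn_propagate(adj, x, np.eye(1))
```

Node 0 receives 1/2 of its own feature and 1/sqrt(6) of node 1's, regardless of what those features contain; GANN replaces these fixed coefficients with learned attention weights.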
The relationships among the numerous thermal parameters of ship engines are not simply linear but are often very complex and unknown. To address this problem, this paper introduces graph learning theory to explore the geometric structure among the data and proposes the MPGANN ship engine fault diagnosis model. Based on the ship engine fault dataset, the model uses t-SNE and Spearman correlation to construct a probabilistic graph structure and a rank-order graph structure, selects appropriate weights for early fusion of the two graph structures, and finally uses GANN to propagate and extract the combined features. The core idea is to mine the similarity between data at two scales, describing the relationships between the data and the graph structure more comprehensively and providing more useful information to the model. The main contributions of this paper are as follows: (1) The ship engine dataset is transformed into two graph structures at different scales; by extracting neighbor relationships between samples, the two graph structures capture similarity from multiple perspectives, complementing and extending the model's input information. (2) Using fusion weights, the two graph structures are fused into a single graph structure that contains deeper information. (3) The fused graph structure is fed into a GANN with multi-head attention for multi-channel feature extraction, followed by a Softmax layer for fault diagnosis. (4) Experiments on a ship engine thermal parameter dataset verify that the proposed MPGANN outperforms other classical algorithms and achieves higher accuracy.
The remainder of this paper is organized as follows. Section 2 introduces the theory of the MPGANN model in detail. Section 3 presents case studies covering the construction of the proposed model, the setting of hyperparameters, and algorithm performance comparisons. Section 4 concludes the paper and outlines future work.

Data Topology Construction
Because ship engines operate under complex conditions, the various measured data are connected in complex ways. To fully explore the deep connections between the data, this paper uses the conditional probability and the correlation coefficient between the features of different nodes to compute adjacency relationships at two different scales, thereby constructing a probabilistic graph structure and a rank-order similarity graph structure. The two graph structures are then fused into a new graph structure that serves as the network input for fault diagnosis.

Probabilistic Graph Structure
Inspired by the t-SNE method [25], we represent the correlation between data samples by conditional probabilities obtained by transforming the pairwise distances between them. The correlation $P(x_i, x_j)$ serves as the degree of correlation between samples $x_i$ and $x_j$: the larger $P(x_i, x_j)$, the more likely $x_j$ is to be a neighbor of $x_i$.

For a dataset $X = \{x_1, x_2, \ldots, x_n\}$, the conditional probability $p_{j|i}$ between any pair of sample points $x_i$ and $x_j$ is given by Equation (1):

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} \tag{1}$$

Here $p_{i|i}$ and $q_{i|i}$ are set to 0, and $\sigma_i$ is the bandwidth of the Gaussian centered on $x_i$. Because data density varies, no single value of $\sigma_i$ is optimal for all points. The entropy of the distribution $P_i$ increases as $\sigma_i$ grows, so $\sigma_i$ is found by binary search for a specified perplexity $\mathrm{Perp}(P_i)$, which typically lies in the range [5, 50] and is defined as

$$\mathrm{Perp}(P_i) = 2^{H(P_i)} \tag{2}$$

where the information entropy $H(P_i)$ is

$$H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i} \tag{3}$$

In the low-dimensional space, $q_{j|i}$ denotes the conditional probability between the corresponding embedded points $y_i$ and $y_j$, as defined in Equation (4). We use the KL divergence to measure the agreement between $p_{j|i}$ and $q_{j|i}$. A KL divergence of 0 means that $p_{j|i} = q_{j|i}$, i.e., $y_i$ and $y_j$ in the low-dimensional space exactly reproduce the degree of correlation between $x_i$ and $x_j$ in the high-dimensional space. The KL divergence is defined in Equation (5):

$$C = \sum_i \mathrm{KL}(P_i \,\Vert\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} \tag{5}$$

and is minimized by gradient descent (Equation (6)). The degree of correlation $P(x_i, x_j)$ between each pair of samples is then obtained; the larger $P(x_i, x_j)$, the stronger the correlation between $x_i$ and $x_j$. The probabilistic correlation values between all nodes constitute the probabilistic feature matrix $\mathbf{P}$:

$$\mathbf{P} = \begin{bmatrix} P(x_1, x_1) & \cdots & P(x_1, x_n) \\ \vdots & \ddots & \vdots \\ P(x_n, x_1) & \cdots & P(x_n, x_n) \end{bmatrix} \tag{7}$$

Rank-Order Graph Structure

In this paper, the Spearman correlation coefficient [26] is used to derive the rank-order graph structure among nodes. The values being compared are sorted in ascending order, and the positions $x_k'$ and $y_k'$ of $x_k$ and $y_k$ after sorting are called their ranks. The Spearman correlation coefficient between two samples is calculated by Equation (8):

$$\rho = 1 - \frac{6 \sum_{k=1}^{m} d_k^2}{m(m^2 - 1)} \tag{8}$$

where $d_k = x_k' - y_k'$ is the rank difference and $m$ is the number of paired values being ranked (for two samples, the length of their feature vectors). The Spearman correlation coefficients $\rho_{i,j}$ between nodes form the correlation feature matrix $\mathbf{R}$:

$$\mathbf{R} = \begin{bmatrix} \rho_{1,1} & \cdots & \rho_{1,n} \\ \vdots & \ddots & \vdots \\ \rho_{n,1} & \cdots & \rho_{n,n} \end{bmatrix} \tag{9}$$
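The affinity and rank computations above can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: the bisection bounds and tolerance are arbitrary choices, and using the standard t-SNE symmetrization $(p_{j|i} + p_{i|j})/2n$ as a stand-in for $P(x_i, x_j)$ is an assumption, since the paper derives its correlation after minimizing the KL divergence of Equation (5), which is not reproduced here. The Spearman part ranks the feature values within each sample and assumes there are no ties. The toy matrix `x` is hypothetical.

```python
import numpy as np


def conditional_affinities(x, perplexity=30.0, tol=1e-5, max_iter=100):
    """High-dimensional affinities p_{j|i} (Equation (1)).

    For each point, beta_i = 1 / (2 sigma_i^2) is found by bisection in log
    space so that 2**H(P_i) matches the target perplexity (Equations (2)-(3)).
    """
    n = x.shape[0]
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    p = np.zeros((n, n))
    target = np.log2(perplexity)
    for i in range(n):
        d = np.delete(sq_dist[i], i)               # p_{i|i} = 0, so drop j == i
        lo, hi, beta = 1e-10, 1e10, 1.0
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)      # shift by d.min() for stability
            row = w / w.sum()
            entropy = -np.sum(row * np.log2(row + 1e-12))
            if abs(entropy - target) < tol:
                break
            if entropy > target:                   # too flat: narrow the Gaussian
                lo = beta
            else:                                  # too peaked: widen it
                hi = beta
            beta = np.sqrt(lo * hi)                # geometric midpoint
        p[i, np.arange(n) != i] = row
    return p


def symmetric_affinity(p):
    """Standard t-SNE symmetrization (p_{j|i} + p_{i|j}) / 2n, assumed stand-in for P(x_i, x_j)."""
    return (p + p.T) / (2 * p.shape[0])


def ascending_ranks(v):
    """Ranks 1..m of the entries of v in ascending order (assumes no ties)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r


def spearman_matrix(x):
    """rho_{i,j} between the feature vectors of every pair of samples (Equation (8))."""
    m = x.shape[1]
    ranks = np.vstack([ascending_ranks(row) for row in x])
    d = ranks[:, None, :] - ranks[None, :, :]
    return 1.0 - 6.0 * np.sum(d ** 2, axis=-1) / (m * (m ** 2 - 1))


# Hypothetical toy data: two clusters of three samples, three features each.
x = np.array([[0.0, 0.5, 1.0], [0.1, 0.4, 1.1], [0.2, 0.6, 0.9],
              [3.0, 2.0, -1.0], [3.1, 2.1, -1.2], [2.9, 2.2, -0.8]])
P = symmetric_affinity(conditional_affinities(x, perplexity=2.0))
R = spearman_matrix(x)
```

A small perplexity is used here only because the toy set has six points; the perplexity must stay below the number of neighbors for the bisection to have a solution.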

Feature Graph Fusion
Multi-scale feature fusion is used here to fuse the probabilistic and rank-order graph structures constructed above and to fully exploit the intrinsic feature connections between them. Widely used feature fusion methods include early fusion [27], late fusion, and attention-based feature fusion. Attention-based fusion handles multimodal data effectively and is highly adaptable [28,29]. Late fusion captures temporal correlations and interaction effects between data sources well and performs better in time series tasks [30]. However, both methods involve complex model structures, high computational demands, and high training-time cost when the number of samples is small [31]. Therefore, for the discrete, small-sample ship engine fault dataset, this paper chooses early fusion [32] to fuse the multiple graph structures, setting a feature weight for each graph and a neighbor threshold to construct the fused graph structure:

$$N(M_k) = \frac{M_k - \min(M_k)}{\max(M_k) - \min(M_k)}$$

$$F = \sum_{k} w_k \, N(M_k)$$

$$a_{ij} = \begin{cases} 1, & F_{ij} > \theta \\ 0, & \text{otherwise} \end{cases}$$

where $F$ is the fusion feature matrix; $N(M_k)$ is the normalized feature matrix of the $k$-th graph; $M_k$ is its original feature matrix; $w_k$ is its fusion weight; $A$ is the adjacency matrix after multi-graph fusion; and $a_{ij}$ is the element in the $i$-th row and $j$-th column of $A$ after applying the adjacency threshold $\theta$. The larger the element $F_{ij}$, the stronger the similarity between node $i$ and node $j$, so the choice of threshold is the key to defining the adjacency matrix. In this paper, the threshold is determined with quartiles: the computed values are sorted in ascending order and divided into four equal parts, the three split points are denoted Q1, Q2, and Q3, and Q3 is taken as the threshold $\theta$. Each $F_{ij}$ is compared with $\theta$: if $F_{ij}$ is greater than $\theta$, nodes $i$ and $j$ are connected and $a_{ij} = 1$; if it is smaller, nodes $i$ and $j$ are not connected and $a_{ij} = 0$.
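The fusion and quartile-threshold steps above can be sketched as follows. This is a sketch under stated assumptions, not the paper's code: the weighted sum of min-max normalized matrices is one reading of the fusion rule, the quartile is computed over the off-diagonal entries only, values exactly equal to Q3 are treated as unconnected, and the weights 0.5/0.5 and the toy matrix are hypothetical.

```python
import numpy as np


def min_max(m):
    """Scale all entries of a matrix to [0, 1]."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def fuse_graphs(mats, weights):
    """Early fusion: weighted sum of min-max normalized similarity matrices (assumed form)."""
    assert len(mats) == len(weights)
    return sum(w * min_max(m) for w, m in zip(weights, mats))


def q3_adjacency(f):
    """a_ij = 1 when F_ij exceeds the third quartile of the off-diagonal values."""
    n = f.shape[0]
    off_diag = f[~np.eye(n, dtype=bool)]          # ignore self-similarities (assumption)
    theta = np.percentile(off_diag, 75)           # Q3
    a = (f > theta).astype(int)
    np.fill_diagonal(a, 0)
    return a, theta


# Hypothetical 4-node similarity matrix; the second input differs only in scale,
# so after min-max normalization both contribute the same pattern.
m = np.array([[0, 1, 2, 3],
              [1, 0, 4, 5],
              [2, 4, 0, 6],
              [3, 5, 6, 0]], dtype=float)
f = fuse_graphs([m, 2 * m + 1], weights=[0.5, 0.5])
adj, theta = q3_adjacency(f)
```

Normalizing before weighting matters because the probabilistic matrix and the Spearman matrix live on very different scales (small positive probabilities versus values in [-1, 1]).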

Graph Attention Neural Network
Both GANN and GCN studied in this paper are variants of GNN. Unlike GCN, which applies Laplacian smoothing over all directly adjacent nodes of each node [33], GANN uses an attention mechanism to compute edge weight coefficients; that is, the attention mechanism is applied to GCN's neighbor-aggregation operation. Following the three elements of the attention mechanism, namely query, source, and attention value [34], we set the query to the feature vector of the current central node, the source to the feature vectors of all its neighbors, and the attention value to the new feature vector of the central node after aggregation.
Let the feature vector of node $v_i$ in layer $l$ of the graph be $h_i^{(l)} \in \mathbb{R}^{d^{(l)}}$, where $d^{(l)}$ is the feature length of the node. After an aggregation operation with an attention mechanism, each node obtains a new feature vector $h_i^{(l+1)} \in \mathbb{R}^{d^{(l+1)}}$, where $d^{(l+1)}$ is the length of the output feature vector. We call this aggregation operation the Graph Attention Layer (GAL). Taking $v_i$ as the central node, the correlation from a neighboring node $v_j$ to $v_i$ is

$$e_{ij} = a\left(W^{(l)} h_i^{(l)},\, W^{(l)} h_j^{(l)}\right)$$

where $W^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$ is the weight parameter of the node feature transformation in layer $l$. The function $a(\cdot)$ computes the correlation between two nodes; in this paper, a single fully connected layer is chosen:

$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W^{(l)} h_i^{(l)} \,\Vert\, W^{(l)} h_j^{(l)}\right]\right)$$

where the weight vector $\mathbf{a} \in \mathbb{R}^{2d^{(l+1)}}$ is used together with the LeakyReLU activation function with negative slope 0.2, and "$\Vert$" denotes vector concatenation. To distribute the weights better, the computed correlations are normalized with Softmax:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(v_i)} \exp(e_{ik})}$$

The weight coefficient $\alpha_{ij}$ computed above ensures that the aggregation weights of all neighbors of the central node sum to one. Figure 1 illustrates the computation: $h_i$ and $h_j$ are first transformed by $W$ and scored with $\mathbf{a}$, then passed through LeakyReLU, and finally normalized by Softmax to obtain the weight coefficients. The complete formula for the weight coefficients is

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W h_i \,\Vert\, W h_j\right]\right)\right)}{\sum_{k \in N(v_i)} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W h_i \,\Vert\, W h_k\right]\right)\right)}$$

Based on these weight coefficients, the feature vector of each node is updated as

$$h_i^{(l+1)} = \sigma\left(\sum_{j \in N(v_i)} \alpha_{ij} W^{(l)} h_j^{(l)}\right)$$

where $h_i^{(l+1)}$ is the updated feature vector of node $v_i$ in layer $l+1$; $\sigma$ is the activation function (LeakyReLU is commonly used); $N(v_i)$ denotes the neighbors of $v_i$; $\alpha_{ij}$ is the weight coefficient computed above; and $W^{(l)}$ is the weight matrix parameter. Each node's updated feature vector is thus obtained by aggregating information from its neighbors, with the aggregation weights determined by the attention mechanism.
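A single attention head following the equations above can be sketched in NumPy. This is an illustrative sketch rather than the paper's implementation; the names `gat_head` and `leaky_relu` are hypothetical, including each node in its own neighborhood (a self-loop) is an assumption, and the output activation defaults to Tanh (the activation later selected experimentally) instead of LeakyReLU. The concatenated score $\mathbf{a}^{\top}[W h_i \Vert W h_j]$ is computed for all pairs at once by splitting $\mathbf{a}$ into the halves that multiply $W h_i$ and $W h_j$.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)


def gat_head(h, adj, w, a, activation=np.tanh):
    """One graph attention head.

    h: (n, d_in) node features h_i; adj: (n, n) 0/1 adjacency matrix;
    w: (d_out, d_in) shared transform W; a: (2 * d_out,) attention vector.
    Returns the updated features (n, d_out) and the coefficients alpha_ij.
    """
    n = h.shape[0]
    wh = h @ w.T                                   # row j holds W h_j
    d_out = wh.shape[1]
    # a^T [W h_i || W h_j] = a_1 . W h_i + a_2 . W h_j, for all pairs at once
    e = leaky_relu((wh @ a[:d_out])[:, None] + (wh @ a[d_out:])[None, :])
    neighbors = (adj + np.eye(n)) > 0              # N(v_i) plus a self-loop (assumption)
    e = np.where(neighbors, e, -np.inf)            # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)           # stabilize the softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)      # softmax over each neighborhood
    return activation(alpha @ wh), alpha


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)        # hypothetical path graph 0-1-2-3
h = rng.normal(size=(4, 3))
out, alpha = gat_head(h, adj, w=rng.normal(size=(2, 3)), a=rng.normal(size=4))
```

Masking non-neighbors with minus infinity before the softmax is what restricts the sum in the denominator to $N(v_i)$.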
To enhance the expressive capacity of the attention layer, this paper adopts a multi-head attention mechanism similar to that of Vaswani et al. [35]. The usual way to combine the heads is to concatenate the outputs of $H$ mutually independent attention mechanisms computed with the equation above:

$$h_i^{(l+1)} = \Big\Vert_{k=1}^{H} \sigma\left(\sum_{j \in N(v_i)} \alpha_{ij}^{k} W^{k} h_j^{(l)}\right)$$

where $\Vert$ denotes concatenation, $\alpha_{ij}^{k}$ is the weight coefficient computed by the $k$-th attention head, and $W^{k}$ is the corresponding learnable parameter. Because each head has its own weights and parameters, the model can capture diverse patterns and relationships in the data.
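The concatenation across heads can be sketched as below. To stay self-contained, the sketch repeats a compact single head (LeakyReLU scores, softmax over neighbors plus a self-loop, Tanh output); the function names, the random toy graph, and the per-head output width `d_out = 8` are hypothetical, while the 23 input features and H = 16 heads mirror the dataset and the head count reported in the experiments.

```python
import numpy as np


def attention_head(h, adj, w, a, slope=0.2):
    """Compact single head: LeakyReLU scores, softmax over N(v_i) plus self, tanh output."""
    n, k = h.shape[0], w.shape[0]
    wh = h @ w.T
    e = (wh @ a[:k])[:, None] + (wh @ a[k:])[None, :]
    e = np.where(e > 0, e, slope * e)                    # LeakyReLU
    e = np.where(adj + np.eye(n) > 0, e, -np.inf)        # restrict to neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ wh)


def multi_head_layer(h, adj, weights, attn_vectors):
    """Concatenate H independent heads along the feature axis (the || operator)."""
    return np.concatenate(
        [attention_head(h, adj, w, a) for w, a in zip(weights, attn_vectors)], axis=1
    )


rng = np.random.default_rng(1)
n, d_in, d_out, num_heads = 5, 23, 8, 16       # d_out per head is an assumed value
h = rng.normal(size=(n, d_in))
adj = (rng.random((n, n)) > 0.5).astype(float)
adj = np.maximum(adj, adj.T)                    # make the toy graph undirected
weights = [rng.normal(size=(d_out, d_in)) for _ in range(num_heads)]
attn_vectors = [rng.normal(size=2 * d_out) for _ in range(num_heads)]
out = multi_head_layer(h, adj, weights, attn_vectors)
```

With concatenation the layer's output width grows linearly with H (here 16 × 8 = 128), which is one reason larger head counts raise training and inference cost.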

Data Description
The subject of this experiment is a Wärtsilä 9L34DF marine engine. Because real-ship failure experiments are costly, dangerous, and destructive, this paper simulates engine faults with AVL-BOOST (R2019.2) software. The simulation model of the marine engine is shown in Figure 4, where SB is the system boundary; CL is the air filter; TC is the turbocharger; CO is the air cooler; PL is the intake and exhaust manifold; C is the cylinder; MP denotes the measuring points; and the black lines are the connecting pipes. The comparison between the simulated and measured values is given in Table 1. As Table 1 shows, the error between the simulated and measured values of each main performance parameter is below 4% at all loads, and at 100% load every error is within 2%, which is within the acceptable range. This paper therefore uses the model at 100% load for fault simulation and sample collection and simulates five states: injection timing advance (F1), injection timing delay (F2), turbocharger efficiency decline (F3), air cooler efficiency decline (F4), and normal operation (F5). The fault simulation scheme is shown in Table 2. To analyze the relationship between internal engine parameters and faults and to diagnose engine faults, 400 samples of 23 detection-index parameters were collected, 80 for each state. The dataset is detailed in Table 3, and the collected detection indexes are listed in Table 4.

Comparison of Graph Structures
In this section, we compare commonly used correlation coefficients (the Pearson, Kendall, and Spearman correlation coefficients), the correlation graph structures constructed from them, and the probabilistic graph structure in terms of model accuracy and loss, in order to select the graph construction method best suited to the proposed model. The model uses the ReLU activation function, a learning rate of 0.01, 1000 iterations, and 16 attention heads. The training, validation, and test sets are split randomly in a 7:1:2 ratio. Each graph construction method was run 20 times, and the average accuracy over the 20 runs was used as the basis for comparison.
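The 7:1:2 random split of the 400-sample dataset (five states, 80 samples each) can be checked with a short sketch. The function name `random_split`, the seed, and treating the training nodes as the only labeled nodes are assumptions for illustration; the text states only that the split is random.

```python
import numpy as np


def random_split(n_samples, ratios=(0.7, 0.1, 0.2), seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


labels = np.repeat(np.arange(5), 80)           # states F1..F5, 80 samples each -> 400 nodes
train_idx, val_idx, test_idx = random_split(len(labels))
train_mask = np.zeros(len(labels), dtype=bool)
train_mask[train_idx] = True                   # labels visible during training (assumption)
```

All 400 samples remain nodes of the same graph; the split only decides which labels the model sees, which gives 280 training, 40 validation, and 80 test nodes.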
As Figure 5 shows, among the correlation graph structures the Spearman correlation coefficient yields a higher average accuracy than the Pearson and Kendall coefficients, and using the fused graph as the model input further improves the average accuracy. This is because different graph construction methods extract information from different aspects of the data; the fused graph contains information from both aspects and therefore supplies the neural network with more information than a single graph structure. In particular, fusing the probabilistic graph structure with the correlation graph structure built from the Spearman coefficient raises the average accuracy substantially, to 96.36%. Model performance is therefore best when the correlation graph is constructed with the Spearman coefficient, which is used to construct the graphs in all subsequent experiments. Figure 6 shows the graph obtained by fusing the t-SNE probabilistic graph structure with the Spearman rank-order graph structure. Edges indicate that two nodes are neighbors. In a single graph structure the node relationships are only roughly delineated, whereas after fusion the neighbor relationships between nodes are clearly described. After the neighbor relationships are constructed, the labels of some nodes are removed: in Figure 7, colored nodes carry label information and uncolored nodes do not. During graph learning, these unlabeled nodes provide no label information, but they still aggregate information from their neighbors and serve as auxiliary data that help reveal the similarity structure between nodes. Node information is aggregated from neighbors through multi-head attention, and after graph learning the updated nodes are classified so that every sample receives a label. The figure above shows the classification results: although the model is not given all the label information, it still correctly labels the unlabeled nodes.
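The semi-supervised setting described above, where unlabeled nodes participate in aggregation but not in supervision, is typically realized by computing the loss over labeled nodes only. The sketch below assumes a softmax cross-entropy loss, which the text does not specify; the function name and toy values are hypothetical.

```python
import numpy as np


def masked_cross_entropy(logits, labels, labeled_mask):
    """Mean softmax cross-entropy over labeled nodes only.

    Unlabeled nodes still influence their neighbors' features through message
    passing, but their (unknown) labels never enter the loss.
    """
    z = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    idx = np.flatnonzero(labeled_mask)
    return -log_probs[idx, labels[idx]].mean()


logits = np.array([[2.0, 0.1, -1.0],
                   [0.0, 0.0, 0.0],
                   [0.3, 1.5, 0.2],
                   [1.0, 1.0, 3.0]])
labels = np.array([0, 2, 1, 2])                # labels of nodes 2 and 3 are hidden below
mask = np.array([True, True, False, False])
loss = masked_cross_entropy(logits, labels, mask)
predicted = logits.argmax(axis=1)              # every node, labeled or not, gets a class
```

Changing the hidden labels of nodes 2 and 3 leaves the loss unchanged, yet those nodes still receive predicted classes after training.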

Network Parameter Setting
In this subsection, we will delve into the critical hyperparameters of the neural network model for fault diagnosis.These include the activation function, the number of attention heads, and the model learning rate.We maintain the ratio of the training set, validation set, and test set at 7:1:2, conducting a total of 20 experiments.The results from these experiments will be averaged to determine the average accuracy rate, serving as the basis for the final assessment and guiding the selection of the most suitable network structure.

Activation Function
The activation function is an essential part of a neural network because it applies a nonlinear transformation to its input. In this paper, five typical activation functions (Sigmoid, ReLU, ELU, ReLU6, and Tanh) and three recent ones (SupEx [36], αSechSig, and αTanhSig [37]) are compared, with 1000 iterations and a learning rate of 0.01. Figure 8 shows the total elapsed time and the average test-set accuracy of the model with each activation function. Sigmoid gives the lowest accuracy, and αSechSig and ELU take significantly longer than the other functions. Although SupEx and αTanhSig take less total time, the total time is short on this small-sample dataset in any case, so accuracy is used as the main criterion and total elapsed time as a secondary one. With Tanh and ReLU6, the model's classification accuracy on the test set exceeds 0.97, reaching 0.9758 and 0.9727, respectively; Tanh is both more accurate and faster than ReLU6.
Figure 9 shows that with Tanh as the activation function, the model has the highest average accuracy and the smallest spread of results, with most runs at about 0.975. Taken together, the two results indicate that Tanh is the best activation function for the proposed model.

Number of Attention Heads
Multi-head attention lets the model attend to different channels of the input simultaneously, improving its expressiveness and performance. However, more heads also increase model complexity and the computational cost of training and inference, so the optimal number of heads must be determined experimentally. In this experiment, the activation function is Tanh, the head count H ranges over [1, 50], and all other settings are the same as in the previous experiment. Figure 10 compares the average test-set accuracy obtained with different numbers of heads.
Figure 10 shows that the model achieves higher accuracy at head counts of H = 16, H = 22, and H = 30, with the highest accuracy at a learning rate of Lr = 0.01 and H = 16. Because excessively large head counts lengthen running time and raise computational cost, the optimal configuration of the model is Lr = 0.01 and H = 16.

Algorithm Performance Comparison
To verify the effectiveness of the proposed model, the GCN, CNN, SVM, and BPNN algorithms are compared with MPGANN in this subsection. The GCN uses the same neighbor-graph construction method, number of hidden layers, number of hidden neurons, and dropout rate as MPGANN. The CNN has four convolutional layers with kernel sizes 5-3-2-2 and strides 1-1-1-1; each convolutional layer is followed by an average pooling layer with window sizes 2-2-2-2, padding 0, and stride 1. The SVM uses a Gaussian kernel with penalty factor C = 1.0 and gamma set to auto. The BPNN hidden layer has 16 neurons. For all algorithms, the number of iterations is 1000 and the learning rate is 0.01.
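As a quick check on the CNN configuration above, the feature length after each convolution and pooling stage follows from the usual output-length formula. This sketch assumes the 23-dimensional detection vector is fed to the network as a 1-D signal and that the convolutions use no padding; neither detail is stated in the text, and the function names are hypothetical.

```python
def out_len(length, window, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling window."""
    return (length + 2 * padding - window) // stride + 1


def cnn_feature_lengths(input_len=23, kernels=(5, 3, 2, 2), pool=2):
    lengths = []
    n = input_len
    for k in kernels:
        n = out_len(n, k)        # convolution: stride 1, padding 0 (padding assumed)
        n = out_len(n, pool)     # average pooling: window 2, stride 1, padding 0
        lengths.append(n)
    return lengths


print(cnn_feature_lengths())   # [18, 15, 13, 11]
```

Under these assumptions only 11 positions remain after the fourth block, so the CNN works with a very short signal, consistent with the small-sample setting in which it is reported to struggle.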
Accuracy is an overall performance indicator; used alone to evaluate diagnostic algorithms, it can mask how individual classes are classified. This paper therefore also uses precision, recall, and the F1 score, calculated as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP, FP, and FN are the numbers of true positives, false positives, and false negatives. Figure 12 shows the average accuracy, average precision, average recall, and F1 score of each diagnostic algorithm, which makes their performance easier to compare. CNN, a deep learning model, is widely used in mechanical fault diagnosis and has achieved excellent results; however, because this experiment lacks the large amount of training data that CNN requires, it trains poorly and cannot classify correctly. GCN is limited by its information propagation rule, so even with the same topology its diagnostic performance remains poor. In contrast, the proposed MPGANN uses multi-head attention in the aggregation operation, which makes feature aggregation more effective and improves learning efficiency; its average precision, average accuracy, average recall, and F1 score reach 0.978, 0.976, 0.975, and 0.976, respectively.
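The per-class metrics above can be computed directly from the TP, FP, and FN counts. The sketch below assumes that the reported "average" scores are macro averages over the fault classes and that F1 is averaged per class; the paper does not state either choice, and the toy labels are hypothetical.

```python
import numpy as np


def per_class_prf(y_true, y_pred, n_classes):
    """Per-class precision, recall and F1 from TP, FP and FN counts."""
    precision, recall, f1 = np.zeros(n_classes), np.zeros(n_classes), np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision[c] = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall[c] = tp / (tp + fn) if tp + fn > 0 else 0.0
        s = precision[c] + recall[c]
        f1[c] = 2 * precision[c] * recall[c] / s if s > 0 else 0.0
    return precision, recall, f1


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
p, r, f = per_class_prf(y_true, y_pred, n_classes=2)
accuracy = np.mean(y_true == y_pred)
macro = p.mean(), r.mean(), f.mean()     # class-averaged scores (macro averaging assumed)
```

In this toy case accuracy is 0.75 while class 0 has a recall of only 0.5, which is exactly the kind of per-class weakness that accuracy alone can hide.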

Conclusions
In this paper, a new MPGANN model is proposed. In experiments on the ship engine dataset, different graph construction methods were first compared: the fused graph structure enhances expressive ability, and fusing the probabilistic and rank-order graph structures in particular improves diagnostic accuracy substantially. Second, model accuracy differs considerably across activation functions, head counts, and learning rates, which shows that the influence of hyperparameters cannot be neglected. Finally, MPGANN outperforms the compared methods in accuracy, precision, recall, and F1 score. The proposed MPGANN therefore has practical significance for intelligent ship fault diagnosis. The findings of this paper are summarized as follows: (1) The two graph construction methods in MPGANN effectively capture the probabilistic similarity and rank-order similarity between data samples and transform the data into a probabilistic graph structure and a rank-order graph structure. (2) Early fusion with feature weights combines the probabilistic and rank-order graph structures, effectively integrating sample information from different scales. (3) The multi-head attention mechanism performs multi-channel feature screening on the fused graph structure, extracting more relevant feature information and improving the diagnostic performance of the model. (4) The model was validated on the ship engine fault dataset; compared with other models in accuracy, precision, recall, and F1 score, MPGANN performed best, with a diagnostic accuracy of 97.58% and an F1 score of 97.6%.
The proposed diagnostic model fuses features from different aspects of the data and mines the dataset at a deeper level, but it has limitations: for example, the fused features capture only probabilistic similarity and rank-order similarity, and the fusion weights must be set in advance.

Figure 1. Calculation process of the weight coefficients.

Figure 2 shows a schematic diagram of the structure of the proposed MPGANN model, and Figure 3 shows its algorithm flow. First, sensors collect the values of the engine's detection indexes in different states, and the data are normalized. Each sample is then treated as a node; a probabilistic graph structure and a rank-order graph structure are constructed and fused by the multi-graph accumulation method as the input to the multi-head GANN model, so that the model can learn sample features at multiple scales. The model aggregates the features of neighboring nodes and is trained iteratively to maximize the classification accuracy of the predicted labels against the known labels.

Figure 4. Simulation model of the ship engine.

Figure 5. Test accuracy and total elapsed time for each graph structure.

Figure 6. Fusion of the probabilistic graph structure with the rank-order graph structure.

Figure 8. Average accuracy and total elapsed time for different activation functions.

Figure 9. Boxplot of the accuracy of different activation functions.

Figure 10. Heat map of model accuracy at different learning rates and head counts.

Table 1. Comparison between model simulation data and measured data.

Table 2. Engine failure simulation scheme.