Cellular Network Fault Diagnosis Method Based on a Graph Convolutional Neural Network

The efficient and accurate diagnosis of faults in cellular networks is crucial for ensuring smooth and uninterrupted communication services. In this paper, we propose an improved 4G/5G network fault diagnosis with a few effective labeled samples. Our solution is a heterogeneous wireless network fault diagnosis algorithm based on Graph Convolutional Neural Network (GCN). First, the common failure types of 4G/5G networks are analyzed, and then the graph structure is constructed with the data in the network parameter, given data sets as nodes and similarities as edges. GCN is used to extract features from the graph data, complete the classification task for nodes, and finally predict the fault types of cells. A large number of experiments are carried out based on the real data set, which is achieved by driving tests. The results show that, compared with a variety of traditional algorithms, the proposed method can effectively improve the performance of network fault diagnosis with a small number of labeled samples.


Introduction
The growth of mobile data traffic and the increase in service types have led to challenges in mobile communication network capacity. Heterogeneous cellular networks have been widely explored in the literature as a means to address the bandwidth problem and improve system capacity.
However, the expansion and complexity of heterogeneous networks have made the management and maintenance of cellular networks increasingly complex. Traditional network fault detection and diagnosis methods are primarily performed manually, making it difficult to establish the mapping relationship between network symptoms and fault categories. Additionally, these methods require significant human and material resources. Therefore, there is an urgent need to establish a fast and accurate network fault diagnosis model. In response to this need, researchers have conducted in-depth research on network fault diagnosis.
In [1], the authors proposed Adaptive Root Cause Analysis (ARCA), an automatic fault detection and diagnosis solution. This method utilizes Bayesian networks combined with expert domain knowledge to perform efficient and automatic network root cause analysis based on acquired network measurement data.
The paper in [2] introduced a Bayesian network-based network fault diagnosis method for universal mobile communication systems. They constructed an automatic diagnostic model using a naive Bayesian classifier and established a relationship between network failures and root causes.
In [3], the authors analyzed the time evolution of multiple network metrics from the perspective of time dependencies between these metrics. They explored the interdependence between network metrics of the primary cell and neighboring cells in the presence of network failures. By comparing these metrics with stored historical failures, they determined the root cause of the failure.
A comprehensive detection and diagnosis framework was proposed in [4]. The detection phase relied on radio measurements and other performance indicators, comparing them with the normal behavior obtained in the profile. The diagnosis phase utilized past fault cases to identify and understand their impact on different performance metrics.
In recent years, intelligent fault diagnosis has shifted from traditional model-based methods to data-driven approaches. Gomez-Andrades et al. [5] introduced an automatic diagnosis system for LTE (Long-Term Evolution) networks based on unsupervised learning.
Their system employed Self Organizing Maps (SOMs) and the sum of squares of deviations to ensure solution quality through an iterative process. The effectiveness of this diagnosis system was verified using real and simulated LTE data.
The focus in [6] was on fault diagnosis at the air interface. The fault diagnosis model collected user measurement data related to the radio frequency (RF) interface. It then used SOMs to analyze the specific measurements of each user and determine their RF status. The results were aggregated into cell-level indicators, allowing an analysis of the air quality of the entire cell.
In [7], the authors treated the performance tuning process of cellular networks as a reinforcement learning problem and proposed a method to improve the performance of indoor and outdoor environments. To achieve effective diagnostic performance, the authors in [8] combined SoftMax neural networks and support vector machines (SVM) in supervised learning, enabling the algorithm to be implemented effectively with off-the-shelf tools.
Zhang et al. [9] proposed the use of an improved back-propagation (BP) neural network for fault diagnosis in local area networks (LAN). Experimental results confirmed the effectiveness of the improved BP neural network in LAN fault diagnosis.
In [10], a method was proposed to identify the causes of changes in performance indicators by analyzing their correlation with multiple indicators. The diagnosis involved the use of classification/regression trees to predict the membership of event counters in one or more categories of interest for performance indicators.
The authors in [11] introduced a fault-diagnosis model that combines graph neural networks and one-shot learning to address the challenge of fault classification with limited data. The method utilizes a short-time Fourier transform to convert vibration signals into two-dimensional data, enabling feature extraction and small-sample learning.
To tackle the problem of bearing fault detection, a method called Graph Neural Network-Based Bearing Fault Detection (GNNBFD) was proposed in [12]. The method constructs a graph based on sample similarity, utilizes a graph neural network (GNN) to map the features of the samples, and incorporates information from neighboring samples. The mapped samples are then fed into a base detector for fault detection, with the top n samples exhibiting the highest outlier scores being identified as faulty.
While most existing fault diagnosis solutions utilize supervised learning algorithms in machine learning, network fault diagnosis models based on supervised learning have certain shortcomings. On the one hand, during the training process of the diagnostic model, a large amount of unlabeled sample information remains underutilized and goes to waste. On the other hand, although these methods can learn the key performance indicators (KPI) of the network and the network state, they do not consider the similarity association between unlabeled samples and labeled samples, thereby limiting the performance improvement of these models.
Furthermore, obtaining labeled data in actual networks is often time-consuming, and the cost of manual category labeling is prohibitively high, resulting in a scarcity of labeled datasets for existing networks. Manual category labeling refers to the process of assigning specific fault labels to the collected data. This manual effort can involve significant human resources and expertise, which can be expensive and may not be feasible for all organizations or researchers. As a result, this scarcity hampers the development and training of machine learning models that require labeled data to learn and make accurate predictions. Overcoming these obstacles is crucial to improving the effectiveness and efficiency of fault diagnosis systems in networks.
Although we used GCN in our paper, there are other techniques that could have been employed for the network fault diagnosis. The following are some suggestions:

1.
Bayesian Networks: Bayesian networks are probabilistic graphical models that represent variables and their dependencies using directed acyclic graphs. They can be used to model network faults and their relationships, allowing for diagnosis and prediction of faults based on observed data. However, GCN was considered in this paper for the reason that it can leverage parallel processing and optimization techniques, making them scalable and efficient for large-scale network analysis. They can handle millions of nodes and edges efficiently, which is crucial for real-world network fault diagnosis. Bayesian networks, especially when dealing with complex and large networks, can be computationally expensive and may face challenges in scalability.

2.
Support Vector Machines (SVM): SVMs are supervised learning models that can classify data into different categories. They can be trained on labeled network data to detect and diagnose faults based on patterns and features extracted from the network. We chose GCNs because they have the ability to learn end-to-end representations of the network data, automatically extracting relevant features directly from the graph structure. This can be advantageous for fault diagnosis as it avoids the need for manual feature engineering. SVMs, on the other hand, typically rely on handcrafted features to perform classification or anomaly detection, which may require domain expertise and time-consuming feature engineering.
As a branch of deep learning, the Graph Convolutional Neural Network (GCN) leverages graph convolutional layers with strong learning abilities to aggregate information between nodes and their neighboring nodes. This allows the network to learn new feature representations of nodes and form a neural network model by stacking graph convolutional layers, capturing the complex nonlinear relationship between key performance indicator parameters and fault types.
Moreover, when training the GCN model, both labeled and unlabeled data are used, making GCN an unsupervised learning method that reduces the reliance on labeled data.
Simultaneously, GCN utilizes the adjacency matrix of the graph to represent the similarity between sample data, enhancing training speed and improving the model's performance [13].
Based on the above-stated challenges, we sought to come up with a solution that would perform better than the work conducted before.
In this paper, we present a network fault diagnosis method based on Graph Convolutional Networks (GCN). The contribution of our work can be summarized as follows: • Construction of Graph Data: Our method constructs graph data by leveraging the similarity between nodes, capturing the intricate relationships within the network. • Utilization of GCN: The constructed graph data is then fed into GCN, enabling effective network fault diagnosis by leveraging the power of graph-based learning. • High Diagnostic Accuracy: Experimental simulations on the drive test dataset demonstrate the effectiveness of our proposed algorithm. Even with a reduced number of training samples, our method achieves high diagnostic accuracy, outperforming traditional algorithms.

System Parameters
The 5G network scenario considered in this paper is depicted in Figure 1. Within the same cell, there is an overlap between the macro base station and the micro base station. We assume that the network status can be categorized into six types, which include the normal situation, uplink interference, downlink interference, space coverage, air interface failure, and base station failure. Air interface failure results in the failure of data packet transmission, disrupting the normal communication process and causing data congestion within the network.
transmission, disrupting the normal communication process and causing data cong within the network.
In most cases, base station faults are manifested by the incorrect configuration evant parameters. For instance, when the antenna down tilt angle parameter is exces configured, the coverage area becomes smaller than the ideal coverage area, resul fewer users being served by the cell. Another example is the improper configurat handover parameters, leading to mobility issues in a cell where successful handove not be achieved. In this paper, the key performance indicators we selected are shown in Table 1, includes 16 parameter indicators.  In most cases, base station faults are manifested by the incorrect configuration of relevant parameters. For instance, when the antenna down tilt angle parameter is excessively configured, the coverage area becomes smaller than the ideal coverage area, resulting in fewer users being served by the cell. Another example is the improper configuration of handover parameters, leading to mobility issues in a cell where successful handover cannot be achieved.
In this paper, the key performance indicators we selected are shown in Table 1, which includes 16 parameter indicators.

Feature Parameter Selection
The characteristic parameters or attributes used to train the model are referred to as "key performance index" (KPI) parameters. When dealing with a data sample containing numerous feature parameters, it becomes unfavorable for constructing the graph's adjacency matrix, leading to an unrealistic matrix. Additionally, during the fault diagnosis stage, a large number of feature parameters can reduce the model training speed. Therefore, it is essential to obtain the optimal combination of network feature parameters based on their importance, aiming to reduce the dimensionality of input features.
To solve the optimal combination of characteristic parameters, we utilized the extreme gradient boosting algorithm (XGBoost). In each training iteration, XGBoost constructs a new model by descending the gradient of the loss function from the previously built decision tree model. This process gradually reduces the residual between the predicted value and the true value of the previous model. When constructing m decision trees, the objective function of XGBoost is represented by Equation (1): i ) is the loss function, which is used to indicate the difference between the predicted network fault labelŷ (m) i and the real network state label y i . Ω( f m ) is the regularization term, which is used to limit the number of sub-nodes in the decision tree and prevent overfitting in the algorithm process.
where f m represents all m decision trees, T is the number of leaf nodes on the current decision tree, λ is the regularization parameter, γ is the learning rate, and ω j represents the weight size of the j th leaf node on the decision tree.
where, g i and h i indicate that in the previous m-1 round model, the first-order derivative and the second-order derivative of the loss function of the i th sample were obtained, respectively.
The traversal of the samples in the objective function Obj (m) can be converted into the traversal of the leaf nodes on the decision tree, so the objective function Obj (m) can be rewritten as follows: where: where; I j is the set of samples on each leaf node. Assuming that the structure of the model has been determined, the weight ω j on each leaf node of the decision tree can be obtained by derivation of Equation (7) and then equated to zero, that is, Substitute the calculated weight value into the original objective function to obtain the minimum value of the objective function. Obj* is the final objective function; the smaller the value, the closer the prediction result is to the actual result, as shown in Equation (11).
Based on this, we used the XGBoost model in the case of constructing multiple decision tree models. According to the classification results of the samples, the importance scores of each feature parameter can be obtained, and these feature parameters are sorted in descending order of importance scores while looking for the number of characteristic parameters in the optimal combination and finally screening the characteristic parameters from the sorted characteristic parameters. Assuming that the number of feature parameters in the original data set is d, after feature screening, the number of feature parameters is d 0 And in order not to let the higher values of these feature parameters dominate the entire training process, before using these feature parameters, it is necessary to map their values to the interval [0, 1] by a normalization method.
In this paper, the maximum value normalization is performed for each feature parameter, that is, where; x i and x i represent the i th feature before and after normalization, respectively.
Max(x i ) represents the maximum value that occurs in the i th feature attribute.

Generation of Graph Data
The method we proposed needs to map the dimension reduction network fault dataset { (x 1 , y 1 ), . . . , (x l , y l ), (x l+1 , 0), . . . , (x n , 0)} to an undirected graph G = (V, E). The graph consists of two types of elements, namely node set V and edge set E.
In the data set, the class label y i ∈ L, of the labeled sample, and the label set L = {1, 2, . . . , c} contains the class c network fault class label (c = 6).
The first l pieces of data x i (1 ≤ i ≤ l) in the dataset are labeled data with class labels, and the remaining data x u (l + 1 ≤ u ≤ n) are unlabeled data, let their class labels be y u = 0. First, we construct a feature matrix X ∈ R n×d 0 from the dataset.
The feature matrix X is formed by stacking and combining the feature parameter vectors of each sample node. Each row vector X i ∈ R d 0 in the matrix represents the feature parameter vector of a piece of data in the data set after removing the category label.
n is the number of sample nodes (the total number of samples in the network fault parameter dataset).
In order to facilitate the calculation of the cross-entropy loss function for the backward weight update of the GCN, we need a label matrix to represent the label category of the samples in the dataset. Specifically, the class labels of the labeled data in the dataset are first encoded according to the encoding style of Table 2, the class labels of the unlabeled data are represented as zero vectors, and then the label vectors of all data are stacked into a label matrix Y ∈ R n×c .  Again, an adjacency matrix A ∈ R n×n is needed to represent the edge relationship between nodes. First, a metric s i,j (0 ≤ s i,j ≤ 1) based on Euclidean distance is introduced to represent the similarity measure between any two sample nodes.
where δ = 1 is called the Gaussian bandwidth parameter. When x i and x j are more similar to each other, s i,j is larger; otherwise, s i,j is smaller. Then, initialize and set a threshold α, if the similarity measure s i,j between nodes is greater than the threshold α, then the elements in the adjacency matrix Finally, the adjacency matrix A of the required graph data is obtained, and the adjacency matrix is a symmetric matrix.
As for how to reasonably select the value of α, specifically, first initialize the threshold α to 0.95 and then decrease it at intervals of 0.05 (for example, 0.95, 0.90, 0.85, 0.80. . .), according to the accurate diagnosis of the model rate, to evaluate the rationality of the threshold selection. The simulation experiments in Section 5 show that the diagnostic accuracy of the model is optimal when the threshold is 0.80.

Graph Convolutional Neural Networks
In recent years, researchers have introduced graph convolutional neural networks to extend CNN to non-Euclidean data, such as graph data. In order to obtain the implicit representation or label of nodes, the graph convolution operation will aggregate the node feature information of nodes in the graph and their neighboring nodes. In the convolution operation, the relationship between nodes is mined for feature representation to form new node features and update nodes. Finally, the new node features are used to predict the label of nodes. When solving node-level tasks, such as node classification, we only focused on how to learn better local expressions for each node. At this time, the pooling operation for graph-level tasks is not necessary, so we only focused on the construction of convolution operators on the graph [14].
Since the graph data does not have translation invariance, the spectral method in the graph convolutional neural network will first transform the graph signal into the spectral domain through the Fourier transform and then use the convolution in the spectral domain according to the convolution theorem. The inverse fourier transform is converted back to the original node domain, implementing the definition of graph convolution. The convolution of the signal vector x of the node in the graph signal and the convolution kernel g θ in the spectral domain can be expressed as Ug θ U T x, where U is the symmetrically normalized Laplacian matrix L of the graph and the eigenvector after eigen decomposition matrix, L = ULU T . g θ can be thought of as a function of the eigenvalues L of the Laplacian matrix L, i.e., g θ (L).
In order to satisfy the locality of the graph convolutional neural network and reduce the calculation amount of the feature vector decomposition during the original convolution operation, Defferrard et al. [15] pointed out that the truncated expansion of the Chebyshev polynomial to the Kth order can be used to fit the original convolution kernel, and the elements can be scaled to [−1, 1].
where θ k is a vector of Chebyshev coefficients. ∼ L = 2L/λ max − I N is the scaled eigenvalue diagonal matrix, λ max and it's the largest eigenvalue of matrix L, and I N is the identity matrix. Therefore, the convolution of the image signal by the convolution kernel filter can be rewritten as: where ∼ L = U ∼ LU T represents the scaled Laplacian matrix. Equation (17) is the K-order polynomial of the Laplacian matrix L, which means that only the adjacent nodes that are at most K hops away from the current node are considered when information aggregation between nodes is performed in the convolution operation.
Kipf and Welling et al. [16] rewrote the definition of graph convolution as a linear function on the Laplacian matrix through a first-order approximation, that is, K = 2, and proposed the concept of GCN for the first time.
Since the eigenvalues of I N + D − 1 2 AD − 1 2 are in the range [0, 2], repeated use of this expression can lead to the explosion or disappearance of gradients in deep neural networks. To address this problem, Kip F et al. further employed a normalization trick A. Therefore, the convolution operation of the graph convolution layer can finally be expressed as: where, H (l) is the output feature matrix of the l th layer; σ represents the activation function; W (l) is the trainable weight matrix of the l th layer.

Building a Fault Diagnosis Model Based on Graph Convolutional Neural Network
In this paper, we regard network fault diagnosis as a classification task in deep learning, and the graph convolutional neural network is used to classify the nodes of the dataset samples, and the cross-entropy loss function is used as the optimization goal of the graph convolutional neural network.
During the convolution process, GCN obtains the adjacent nodes of the current node according to the adjacency matrix and then aggregates and weighs the feature attributes of the current node itself and the feature attributes of the adjacent nodes so as to obtain a new feature expression of the node. These new features will then pass through the activation function (for example, the rectified linear unit, ReLU), aggregate the attributes of the more distant and near nodes by stacking multiple graph convolution layers to gradually obtain the higher-level features of the node, and use these high-level features to complete the node classification task.
In the training process of the model, with the help of graph data, GCN learns a new feature representation of a node and then updates the weight by backpropagating the cross-entropy loss. The neighboring nodes of the current node are likely to be unlabeled. At this time, GCN uses the feature attribute information of unlabeled samples, so the training set contains labeled and unlabeled data. From this perspective, GCN is semi-supervised learning, so it reduces the use of labeled data so that the model can also achieve good diagnostic accuracy when using small sample sizes for training.
According to the above, using the network fault data set after dimension reduction, the features of these nodes are converted into an n × d 0 -dimensional feature matrix X, and then an n×n-dimensional adjacency matrix A is constructed according to the similarity between nodes. Take X and A as inputs to GCN.
As mentioned earlier, the forward excitation propagation formula defined by GCN is: In Equation (21), σ is the activation function. ∼ A = A + I N , since the adjacency matrix A only contains the connection information of each node and the adjacent nodes in the graph, after adding the unit matrix I N , the graph convolution operation can aggregate the feature attribute information of the node itself and the adjacent nodes. ∼ D is the degree matrix of matrix ∼ A, with the values on the diagonal being the degrees of each node and the rest of the elements being 0. W (l) is a trainable weight matrix in the l th layer, which acts as a filter parameter matrix, and the parameters in it can be updated by back-error propagation.
H (l) is the output feature matrix of the l th graph convolutional layer for the input layer, H (0) is equal to the initial node feature matrix X. Each graph convolution layer in GCN only aggregates the node feature attributes of nodes in the first-order neighborhood of the current central node. Therefore, if you want to aggregate the node feature information of adjacent nodes in the multi-hop neighborhood to obtain higher-level node features, you need to stack a multi-layer graph convolution layer to achieve this.
The output of the graph convolutional neural network is a node feature matrix, Z ∈ Rr n×c where c is the number of predefined network fault classes, i.e., c = 6.
To further illustrate the structure and workflow of GCN, Figure 2 shows a GCN model that contains a total of two graph convolutional layers.
As shown in Figure 2, we first calculateÂ = in Equation (21),Â represents the normalized symmetric adjacency matrix, and can prevent numerical instability in the convolution operation. Since the matrixÂ contains the associated information of each node itself and its adjacent nodes,ÂX can aggregate the feature attributes of each node and the adjacent nodes. Next, a new set of node features ÂXW (0) is obtained by multiplying with the trainable weight matrix W (0) . Finally, an activation function is selected for the new feature matrix to obtain the output feature matrix H (1) of the first graph convolutional layer; that is, the new node feature representation learned by the first graph convolutional layer is given as follows: of the current central node. Therefore, if you want to aggregate the node feature information of adjacent nodes in the multi-hop neighborhood to obtain higher-level node features, you need to stack a multi-layer graph convolution layer to achieve this. The output of the graph convolutional neural network is a node feature matrix, ∈ × where c is the number of predefined network fault classes, i.e., c = 6.
To further illustrate the structure and workflow of GCN, Figure 2 shows a GCN model that contains a total of two graph convolutional layers. As shown in Figure 2, we first calculate = in Equation (21), represents the normalized symmetric adjacency matrix, and can prevent numerical instability in the convolution operation. Since the matrix contains the associated information of each node itself and its adjacent nodes, can aggregate the feature attributes of each node and the adjacent nodes. Next, a new set of node features ÂXW (0) is obtained by multiplying with the trainable weight matrix ( ) . Finally, an activation function is selected for the new feature matrix to obtain the output feature matrix ( ) of the first graph convolutional layer; that is, the new node feature representation learned by the first graph convolutional layer is given as follows: Stacking multiple graph convolutional layers will aggregate the feature attribute information of neighboring nodes in higher-order neighborhoods. Therefore, the output Stacking multiple graph convolutional layers will aggregate the feature attribute information of neighboring nodes in higher-order neighborhoods. Therefore, the output H (l) of the previous graph convolutional layer is used as the input of the second graph convolutional layer. After passing through the second graph convolutional layer, another set of node featuresÂH (1) However, it should be noted that since the GCN in Figure 2 uses only two graph convolutional layers, the output feature matrix of the second graph convolutional layer should have the same size as the label matrix Y; more specifically, this feature vector dimension of the node in the time graph changes to c.
Finally, the feature matrix is input into the SoftMax activation function for calculation, and the final output node feature matrix is: where W (1) is the weight matrix of the second graph convolutional layer. The SoftMax activation function needs to be applied to each row of the feature matrixÂH (1) W (1) . For the matrix Z = [Z 1 , Z 2 , . . . , Z n ], its form is similar to the label matrix Y, and each row vector Z i (1 ≤ i ≤ n) in Z corresponds to the predicted final network fault category of the sample node x i in the original dataset. Specifically, for the row vector At the same time, in the GCN training process, it is necessary to calculate the crossentropy loss function for the labeled sample nodes in the training set and use the gradient descent method to update the weights of the weight matrices in the convolutional layers of each graph through error back propagation.
where l is the number of labeled samples, c is the total number of previously defined fault classes, and Y is the label matrix of the previously defined nodes.

GCN Fault Diagnosis Process
In [17], the authors proposed a novel method for transformer fault diagnosis using a graph convolutional network (GCN) to enhance accuracy. The method combined the advantages of existing techniques, utilizing the adjacency matrix of GCN to represent similarity metrics and employing graph convolutional layers for strong feature extraction and classification. The results demonstrated the superiority of GCN over other methods such as convolutional neural networks, support vector machines, and Siamese networks in terms of diagnostic accuracy for different input features and data volumes.
The training process of GCN mainly includes two processes: forward excitation propagation and backward weight update. For the forward excitation propagation process, the input features of the sample data are processed by the multi-layer graph convolutional layer and transferred to the final SoftMax layer to obtain the output matrix Z, and the sample labels to be predicted are represented by the corresponding row vectors in the matrix Z. For the backward weight update process, the diagnostic results and the actual results are used to calculate the cross-entropy loss function, transfer the error from the output layer to the hidden layer, and then use the gradient descent method to update the weights of the weight matrices of each layer. When the model is trained, the remaining test set is used to evaluate the performance of the GCN. The process of network fault diagnosis algorithm based on GCN is shown by Algorithm 1.

1.
Train and use the model: Fault diagnosis and classification: The fault labels of the previously unlabeled data are obtained in the output node feature matrix.

Experimental Simulation and Performance Evaluation
In practice, the rapid development of drive testing technology makes it easy to collect a large number of network sample data points, but it is time-consuming and laborious for engineers or domain experts to manually label these data points through historical experience, resulting in a low number of labeled samples in the data set. Therefore, the dataset obtained in the actual fault diagnosis often consists of a large number of unlabeled samples and a small number of labeled samples. To simulate this situation, in the following comparative simulation experiments, the training set contains only a few labeled samples.

Dataset Description
In this section, we used the actual dataset collected in the Nanjing Road Test in July 2019. The total number of raw data samples is 5987. In order to keep the balance of data samples for each fault type, we deleted some data samples from the same fault type.
The filtering process was performed according to the proportion of the number of samples in each category to the total number of samples in the dataset. Finally, the normal situation, uplink interference, and downlink interference are reserved, as are space coverage, air interface failures, and base station failures, which are six state types with a large proportion of data samples, including a total of 3258 samples. The specific distribution of the samples is shown in Table 3.

Feature Filtering
In order to reduce the redundant features in the sample data, make the constructed adjacency matrix more reasonable, and improve the training speed of the next GCN model. This section uses the feature screening function of the XGBoost algorithm to preprocess the original data. First, use the previous algorithm formula to give each feature attribute in the original dataset an importance score, and sort all the feature attributes according to the importance score from high to low. The simulation results are shown in Figure 3. ensors 2023, 23, x FOR PEER REVIEW

Dataset Description
In this section, we used the actual dataset collected in the Nanjing Roa 2019. The total number of raw data samples is 5987. In order to keep the b samples for each fault type, we deleted some data samples from the same fa The filtering process was performed according to the proportion of t samples in each category to the total number of samples in the dataset. Final situation, uplink interference, and downlink interference are reserved, as ar age, air interface failures, and base station failures, which are six state type proportion of data samples, including a total of 3258 samples. The specific d the samples is shown in Table 3. Base Station Failure 496

Feature Filtering
In order to reduce the redundant features in the sample data, make th adjacency matrix more reasonable, and improve the training speed of t model. This section uses the feature screening function of the XGBoost algo process the original data. First, use the previous algorithm formula to give attribute in the original dataset an importance score, and sort all the feat according to the importance score from high to low. The simulation results Figure 3. According to the sorted feature attributes, on the basis of ensuring tha of the model does not drop significantly, the first several feature attributes w According to the sorted feature attributes, on the basis of ensuring that the accuracy of the model does not drop significantly, the first several feature attributes with the highest importance scores are selected as the feature attributes of the sample data after dimensionality reduction. The accuracy of the XGBoost model under different numbers of feature attribute combinations is shown in Figure 4. It can be seen that if all the features of the data are retained, the model accuracy rate is the highest; as the selected feature attributes gradually decrease, the model accuracy rate will decrease slightly; when the number of features is reduced to 11 or less, the model accuracy rate will have a sharp drop. Therefore, considering the diagnostic accuracy and the complexity of training, the top 11 features in Figure 3 are selected as the data features after dimensionality reduction.
rs 2023, 23, x FOR PEER REVIEW Therefore, considering the diagnostic accuracy and the complexi features in Figure 3 are selected as the data features after dimens

GCN Parameter Settings
In order to improve the accuracy of network fault diagnos determine the optimal structure and parameters of GCN, which m ber of graph convolutional layers, the size of the filter matrix in ea old α when constructing the adjacency matrix A.
First, the neural network structure of GCN needs to be deter ber of layers of the graph convolution layer. Although the featu higher level of the node can be obtained by setting the multi-la layer, this does not mean that the deeper the depth of the GCN, a GCN model that is too deep will cause an over-smooth prob carried out explanations and experiments in this regard. They p standing of the graph convolutional network (GCN) model and a tal limits. The study revealed that the graph convolution in GCN smoothing, which explains their effectiveness but also raises conc ing with multiple layers. To overcome these limits, the paper p self-training approaches that enhance GCNs' performance in le beled data and eliminate the need for additional validation labels Based on the above, as the number of GCN layers increase

GCN Parameter Settings
In order to improve the accuracy of network fault diagnosis, it is first necessary to determine the optimal structure and parameters of GCN, which mainly include the number of graph convolutional layers, the size of the filter matrix in each layer, and the threshold α when constructing the adjacency matrix A.
First, the neural network structure of GCN needs to be determined, that is, the number of layers of the graph convolution layer. Although the feature representation of the higher level of the node can be obtained by setting the multi-layer graph convolutional layer, this does not mean that the deeper the depth of the GCN, the better the effect, and a GCN model that is too deep will cause an over-smooth problem. The authors in [18] carried out explanations and experiments in this regard. They provided a deeper understanding of the graph convolutional network (GCN) model and addressed its fundamental limits. The study revealed that the graph convolution in GCNs is a form of Laplacian smoothing, which explains their effectiveness but also raises concerns about over-smoothing with multiple layers. To overcome these limits, the paper proposed co-training and self-training approaches that enhance GCNs' performance in learning with limited labeled data and eliminate the need for additional validation labels.
Based on the above, as the number of GCN layers increases, the current node also aggregates the neighboring nodes in farther neighborhoods when performing aggregation operations, and these neighboring nodes are very useful.
It may be the adjacent nodes of other nodes (in the most extreme case, each node is aggregated on the whole graph when aggregating features), resulting in all sample data in the dataset being in the process of forward excitation propagation, the end of the model. The output results tend to be consistent, reducing the classification accuracy.
This section uses two metrics, Accuracy and Macro F1, to measure how good the model is. The accuracy rate can intuitively reflect the accuracy of the model in predicting network failures, and Macro F1 takes into account the precision rate and recall rate of the classification model so that the evaluation of the model's performance is not too one-sided. By using the dataset provided in this paper, the diagnostic accuracy and Macro F1 of the GCN model as a function of the number of graph convolutional layers are shown in Figure 5. When only one layer of graph convolutional layers is used, the model cannot well obtain the nonlinear mapping relationship between the feature attributes of the sample and its corresponding fault category; when two or three layers of graph convolutional layers are used, the diagnostic accuracy is the highest. As the number of graph convolutional layers increases, the diagnostic accuracy not only does not improve but also severely declines. Therefore, in the following simulation experiments, the model uses two layers of graph convolutional layers. Since the input feature dimension of the sample data is 11 and dimension is 6, the size of the convolution filter is chosen to b namely ( ) ∈ × in the first layer of the graph convolution layer of the graph convolutional layer. In ( ) ∈ × , the reaso when passing through each layer of graph convolution, the samp dimension reduction processing of graph node features while agg jacent node feature information. The input layer and the first gr are followed by a dropout layer with a probability of 0.25 to allev lem of the model. The final GCN structure is shown in Table 4. Graph Convolutional Layer 1 Since the input feature dimension of the sample data is 11 and the final output feature dimension is 6, the size of the convolution filter is chosen to be 7 and 6, respectively, namely W (0) ∈ R 11×7 in the first layer of the graph convolutional layer and the second layer of the graph convolutional layer. In W (1) ∈ R 7×6 , the reason for this setting is that when passing through each layer of graph convolution, the sample data can also perform dimension reduction processing of graph node features while aggregating high-order adjacent node feature information. The input layer and the first graph convolutional layer are followed by a dropout layer with a probability of 0.25 to alleviate the overfitting problem of the model. The final GCN structure is shown in Table 4. Finally, consider the optimal value of the threshold α when constructing the adjacency matrix A. If the value of α is set too high, the connection relationship will exist only when the similarity between nodes is very high. In this case, some connected edges between nodes that do exist may be missed, resulting in a generated adjacency matrix that cannot truly reflect each node. The similarity relationship with other nodes reduces the accuracy of the model; if the value of α is too small, the number of edges in the graph will be too large. At this time, the training speed of the model will be very slow, overfitting will occur, and the model cannot classify the samples well. The specific experimental results are shown in Table 5. In the simulation experiment, corresponding adjacency matrices are generated for different values, and then the corresponding diagnostic accuracy is obtained according to the GCN model determined above. After 10 experiments, the average value is obtained, and compared. It can be seen from Table 5 that when the value of α is 0.80, the effect of the model is the best.

Comparative Experimental Analysis
In order to evaluate the effectiveness of the proposed algorithm model, GCN is compared with several other traditional algorithms. Since GCN is a semi-supervised learning method, both labeled and unlabeled data are used when training the model, so when discussing the training set of GCN, only the number of labeled samples in the training set is considered. A total of five sets of comparative experiments are set up in this section, and the number of labeled samples in the training set of each set of experiments is 32, 64, 128, 256, and 512, respectively. Each training set contains sample data of all network state types, and the remaining data is used as a test set to evaluate the performance of the model. Considering the actual situation, the datasets collected in the fault diagnosis process usually consist of a large number of unlabeled samples and a small number of labeled samples; therefore, experiments with more than 512 labeled samples in the training set are no longer performed in this section. Under different training sets, the diagnostic accuracy and Macro F1 of each algorithm are shown in Figures 6 and 7, and the results of each group of experiments are the average of 10 repeated experiments. It can be seen that GCN has higher diagnostic performance than other algorithms under five different training sets. At the same time, by observing the data in Figure 6, it can be found that when the amount of labeled data used for training is too small, the accuracy of all algorithm models is not high due to insufficient labeled information obtained by the model. However, with the increase in training set size, the accuracy of each algorithm model on the test set is significantly improved because the information learned from the training samples is more abundant, which improves the classification ability of the model. Moreover, when the training set contains 512 samples, that is, when the training set samples account for about 16% of the data set samples, the diagnostic accuracy of GCN can reach a high level, while the rest of the algorithms are basically supervised learning methods. Therefore, neither Accuracy nor Macro F1 as compared to GCN are ideal without a large amount of labeled data for model training. This shows that GCN can also achieve good network fault diagnosis accuracy with a small number of training samples. the rest of the algorithms are basically supervised learning meth Accuracy nor Macro F1 as compared to GCN are ideal without a data for model training. This shows that GCN can also achieve g nosis accuracy with a small number of training samples.

Conclusions
In this paper, we propose a network fault diagnosis method based on GCN. Our approach constructs graph data based on the similarity between nodes and finally inputs the graph data into GCN for network fault diagnosis. The validity of the method is verified by the experimental simulation on the drive test dataset using the model and the comparison with the traditional algorithm. Our simulation results show that the proposed algorithm can achieve high diagnostic accuracy even when using fewer samples for training.
However, when new samples need to be added to the dataset for fault diagnosis, the GCN model needs to be retrained because the size of the adjacency matrix A depends on the number of samples. In addition, the matrix operation of GCN is time-consuming, and the storage cost of the computer is large. Therefore, how to use GCN to diagnose network faults efficiently and in real time needs to be further explored.