A Network Traffic Prediction Method for AIOps Based on TDA and Attention GRU

Abstract: Fault early warning is a challenge in the field of operation and maintenance. Given rising standards for accuracy and real-time response, as well as the explosive growth of operation and maintenance data, traditional manual experience and static thresholds can no longer meet production requirements. This research digs into the difficulties of fault early warning and provides targeted solutions in several respects: difficulty in feature extraction, insufficient prediction accuracy, and difficulty in determining alarm thresholds. The TCAG model proposed in this paper creatively combines the spatiotemporal and topological characteristics of specific time-series data for time-series prediction and, from the predicted values, gives a recommended dynamic threshold interval for fault early warning. A comparison experiment on data from a core router of State Grid Ningxia Electric Power Co., Ltd. shows that the combination of topological data analysis (TDA) and a convolutional neural network (CNN) gives the TCAG model superior feature extraction capability, and the support of the attention mechanism improves its prediction accuracy relative to the benchmark models.


Introduction
With the rapid development of machine learning and deep learning, researchers have shifted their attention from traditional automatic operation and maintenance to artificial intelligence for IT operations (AIOps). Major companies, including State Grid, Google, Microsoft, and Baidu, are deploying research in this field.
The term AIOps comes from Gartner [1], who aimed to enable software and service engineers to effectively use artificial intelligence and machine-learning technologies to efficiently build and operate services that are easy to support and maintain. AIOps is of great value: it can effectively ensure high service quality and customer satisfaction, improve engineering efficiency, and reduce operating costs. Its applications include anomaly detection [2][3][4], cluster analysis [5], fault prediction [6][7][8], and cost optimization [9].
Machine learning and deep learning are effective methods for solving classification and regression problems. However, when these methods are applied to AIOps, there are mainly the following challenges: (1) With the gradual deployment of 5G base stations, the data acquisition capacity of various information equipment has been greatly improved, and the amount of operation and maintenance data has increased explosively. How to extract the most concise and effective feature expression from massive operation and maintenance data is a major challenge. (2) Most operation and maintenance data exist in the form of time series. Traditional machine-learning models cannot learn long-term correlation information, while deep-learning models, such as DCNN and BERT, suffer from huge numbers of hyperparameters, long training times, and reliance on substantial GPU resources. Therefore, most current industrial application models are a compromise among model complexity, resource consumption, and time consumption. (3) The traditional automatic operation and maintenance mode is rigid and depends heavily on manual experience. At this stage, AIOps still requires manually set rules in many scenarios. Therefore, how to use algorithmic models to automatically learn such rules and reduce manual intervention is not only the continuing goal of AIOps but also an urgent problem to be solved.
Based on the above challenges, and aiming at the problem of fault early warning in AIOps, this paper proposes an intelligent operation and maintenance architecture. The architecture can predict values for a future period from the historical data of information equipment and set a dynamic threshold interval based on the predicted values, eliminating the dependence on manual experience and overcoming the difficulty traditional threshold-setting methods have with frequent scene changes. Specifically, the architecture first uses a convolutional neural network to extract spatiotemporal features from the time series and then creatively applies topological data analysis to the time series to extract topological features from the data. Finally, a neural network model is trained on the extracted spatiotemporal and topological features to obtain the predicted value and dynamic threshold interval. To deal with long-term dependencies in time-series data, the attention mechanism is applied to the gated recurrent unit (GRU), which is then used as the prediction model. The main contributions of this paper can be summarized as follows.
(1) To solve the problem of fault early warning in AIOps, we propose a novel dynamic threshold-setting mechanism that calculates a threshold interval that automatically adjusts to scene changes based on prediction results and fully accounts for the prediction error caused by increasing step size. (2) Owing to the possible instability, nonlinearity, and long-term dependence in time series, the performance of traditional feature extraction methods is not satisfactory.
To obtain the most concise and effective feature expression of the data, this study uses a convolutional neural network to extract the spatio-temporal features in the data and uses topological data analysis to extract the topological features in the data. As far as the authors are aware, this is the first time topological data analysis has been applied to research in the field of AIOps. (3) To obtain more accurate prediction results and realize more intelligent fault early warning, this paper proposes the TCAG model, which connects the spatiotemporal and topological characteristics of data, trains the GRU neural network, and automatically calculates the dynamic threshold interval according to the prediction results. In addition, this study applied the attention mechanism to the output of the GRU at each time step. Experiments conducted on the core router data of State Grid Ningxia Electric Power Co., Ltd., show that the TCAG model is superior to the existing benchmark model in terms of prediction accuracy and robustness.
In the rest of this article, the second section reviews related work, and the third section proposes a system model for fault early warning. The fourth section introduces the modeling process and comparative experiments, the fifth section presents the experimental results and analysis, and the sixth section concludes the article and discusses future work.

Related Work
Previous research on intelligent operation and maintenance can be divided primarily into two categories: traditional machine-learning methods and deep-learning methods. Methods based on machine learning mainly include support vector machines, decision trees, random forests, and other models. Methods based on deep learning mainly include recurrent neural networks, convolutional neural networks, and residual neural networks.

Methods Based on Machine Learning
Jin et al. [10] proposed a single-index anomaly detection algorithm consisting of an isolation forest (IF) algorithm, a support vector machine (SVM) algorithm, a local outlier factor (LOF) algorithm, and the 3σ principle. The algorithm achieved an average score of 0.8304 on the sample data from the 2020 International AIOps Challenge. Soualhi, Medjaher, and Zerhouni [11] used an SVM algorithm for bearing fault diagnosis and an SVR algorithm for time-series prediction to estimate the remaining service life of bearings and monitor the health status of their key components. Zhao et al. [12] proposed a general framework called "Period". The framework clusters daily sequences using algorithms such as ED, DBSCAN, and K-means to accurately detect the periodic distribution of a sequence; anomaly-detection performance is then improved by adapting to different periodic distributions. Since the performance of a single prediction model on unbalanced datasets is not adequate, Wang et al. [13] use the prediction results of three algorithms (XGBoost classification, LSTM classification, and XGBoost regression) as feature inputs to a stacked ensemble-learning model to obtain robust prediction results. Experimental results show that the proposed stacked ensemble-learning model can accurately predict disk failures 14 to 42 days in advance.
Machine-learning models can adapt to small amounts of data and relatively simple application scenarios. However, with the advent of the 5G era, all kinds of structured and unstructured data are growing explosively. Limited by model complexity, machine-learning models are gradually being stretched beyond their limits in current applications.

Methods Based on Deep Learning
In recent years, with the introduction of residual neural networks and gated recurrent networks, the depth of neural networks has been greatly extended. More network layers and parameters enable models to adapt to more complex scenarios; thus, deep learning has begun to shine in various fields, including intelligent operation and maintenance.
Khalil et al. [14] proposed an early fault prediction method for circuits. This method uses fast Fourier transform (FFT) to obtain fault frequency characteristics, principal component analysis (PCA) to obtain dimensionality reduction data, and convolutional neural network (CNN) to learn and classify faults. Wen et al. [15] proposed a new CNN model based on LeNet-5 for fault diagnosis. By converting the signal into a two-dimensional (2D) image, this method can extract the features of the converted two-dimensional image and eliminate the influence of manual features.
In [16], a combination of a deep belief network, an autoencoder, and LSTM was introduced into the field of renewable-energy power forecasting. Compared to standard MLP and physical prediction models, the deep-learning algorithms showed excellent prediction performance. In [17], a framework based on a long short-term memory (LSTM) recurrent neural network was proposed, which can adapt to the high volatility and uncertainty in time series to accurately predict the power load of a single user. Chen et al. [18] proposed a short-term power-load forecasting model based on a deep residual network, which can integrate domain knowledge and researchers' understanding of the task with the help of different neural network building blocks. Several test cases and comparisons with existing models show that the proposed model provides accurate load forecasting results and has high generalization ability.
To summarize, traditional machine-learning models struggle to handle unstructured data and adapt to massive big-data scenarios. Neural network models need more layers to capture the full picture of the data and must balance fitting capacity against receptive field, and deep neural network models depend on more system resources. Considering these problems, this paper proposes the TCAG combination model.

Topological Data Analysis (TDA)
TDA is an emerging field in complex data analysis that studies special geometric properties that remain unchanged when the shape of a graph is continuously deformed; these are labelled "topological properties" [19]. Through calculations with algebraic tools, the topological properties of data in each spatial dimension can be strictly defined. For example, in two-dimensional space these mainly include the number of points and the degree of connection between the points, whereas in three-dimensional space they mainly include the number of hollow spheres and the degree of connection between the spheres.

Vietoris Rips Complex
TDA uses simplicial complexes to express the shape of the original data. A simplicial complex is composed of one or more simplexes [20]; it can simulate more complex shapes and is easier to handle mathematically and computationally than the original graphics. A multidimensional simplex is shown in Figure 1.


Continuous Homology
Continuous homology is a method used to measure the topological characteristics of shape and function. It can transform data into a simple complex and describe the spatial topology at different spatial resolutions [21,22]. More persistent topologies can be detected on a wider spatial scale and are more representative of the real characteristics of the underlying space. The input of continuous homology is usually a point cloud or function [23], whereas the output depends on the nature of the analysis and is usually composed of a persistence graph or landscape.
The calculation steps of topological features are summarized as follows:
Step 1: Calculate the Euclidean distance matrix D = (d_ij), i, j = 1, …, n, of the point cloud.
Step 2: Construct the birth and death of homology groups for increasing values of λ. For each λ, use the closed ball B_λ(v_i) of v_i with radius λ/2 and the complex K(λ) to compute the homology classes α_{p,k}. If an older topology, α_{p,k1}, and a younger topology, α_{p,k2}, merge into a single α_{p,k} at some value of λ, α_{p,k1} becomes α_{p,k} and α_{p,k2} dies.
Step 3: The persistence diagram is the output: a set of points representing the birth and death relationships of homology groups in the point cloud, expressed as Ω̃ = {(λ_{p,k1}, λ_{p,k2}) : p = 0, 1, …; k = 1, 2, …}. Finally, the birth value λ_{p,k1} is drawn on the x-axis and the death value λ_{p,k2} on the y-axis.
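For 0-dimensional homology, where classes are connected components, the elder-rule merging in Step 2 reduces to union-find over edges sorted by pairwise distance. A minimal sketch of that special case (not the gtda implementation used in the experiments; the helper name is ours):

```python
import numpy as np

def h0_persistence(points):
    """0-dimensional persistence of a Euclidean point cloud.

    Every point is born at lambda = 0; when two components merge at
    scale lambda, the younger one dies (elder rule). Returns a list of
    (birth, death) pairs; the oldest component never dies (death = inf).
    """
    n = len(points)
    # Step 1: Euclidean distance matrix d_ij, i, j = 1..n.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    # Candidate merge events, sorted by the scale at which they occur.
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    pairs = []
    for lam, i, j in edges:  # Step 2: balls of radius lam/2 touch at scale lam.
        ri, rj = find(i), find(j)
        if ri != rj:
            pairs.append((0.0, lam))  # the younger component dies here
            parent[rj] = ri
    pairs.append((0.0, np.inf))  # one component persists forever
    return pairs
```

For n points this produces exactly n birth-death pairs, the points plotted as birth on the x-axis and death on the y-axis in Step 3.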

TCAG Model
Among existing methods, TDA can extract stable and persistent topological features from data and transform the original data into a concise and effective feature expression; however, it may miss important information in the time dimension. A CNN can extract the temporal and spatial characteristics of data and capture important information in the time and space dimensions, but it requires more layers to present a complete picture of the data. Therefore, the TCAG model proposed in this study fully combines the advantages of TDA and CNN to express both topological and spatiotemporal features in the final feature set. Finally, the obtained feature set was used to train the attention GRU neural network, and excellent prediction performance was obtained.
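The attention step can be illustrated independently of the GRU itself: the hidden state emitted at each time step receives a score, the scores are softmax-normalized, and the states are combined by the resulting weights before prediction. A minimal numpy sketch; the scoring vector `w` is a hypothetical placeholder, since the paper does not specify the scoring function:

```python
import numpy as np

def attention_pool(hidden_states, w):
    """Weight per-time-step GRU outputs by softmax-normalized scores.

    hidden_states: (T, d) array, one GRU output per time step.
    w: (d,) scoring vector (in practice learned jointly with the network).
    Returns the attention-weighted context vector of shape (d,).
    """
    scores = hidden_states @ w                      # (T,) raw relevance scores
    scores -= scores.max()                          # stabilize the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()   # attention weights, sum to 1
    return alpha @ hidden_states                    # weighted sum over time
```

Time steps whose states score low are thus down-weighted, which is how low-value information is filtered out of the prediction.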
As shown in Figure 2, the TCAG model consists of four modules. The original data first enter the dimensionality reduction module, where features with high correlation are screened through correlation analysis and the Granger causality test to reduce the data dimension. The dimension-reduced data enter the topological data analysis module, where four types of topological features are obtained through continuous homology: the persistent landscape, persistent entropy, topology vector, and distance matrix. The dimension-reduced data also enter the CNN module, which obtains a panorama of the data in the spatiotemporal dimension through multiple convolution and pooling operations. Finally, the extracted topological features are concatenated with the spatiotemporal features to train the GRU neural network and obtain the final prediction results. In addition, this study applies an attention mechanism to the output of each time step of the GRU to filter out low-value information and obtain more accurate prediction results [24].
For the extraction of topological features, the time-series data are first embedded into three-dimensional point-cloud data through Takens embedding; then, the Vietoris-Rips complex is constructed and persistent homology is carried out. The result of persistent homology is a series of birth and death point pairs in the topological structure. Through analysis and calculation of this point set, the persistent landscape, persistent entropy, and distance matrix can be obtained based on the Euclidean, bottleneck, and Wasserstein distances. The combination of these features is the final output of the module: the topological features of the data.
The detailed design of the CNN module is shown in Figure 3; the original data dimensions are (719 × 6 × n). The module uses filters with a step size of 1 and dimensions of (1 × 3) with "same" padding to perform three convolution operations, using 16, 32, and 64 filters, respectively. A max pooling operation is performed with a step size of 2 and a dimension of (1 × 2). Finally, the tensor is expanded by a flatten operation and connected to two fully connected layers to obtain the final output of the CNN module: the spatiotemporal characteristics of the data.
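The shape bookkeeping of this module can be checked with a small sketch: "same" padding preserves the sequence length through each (1 × 3) convolution, while a (1 × 2) stride-2 max pool roughly halves it. The placement of one pool after each convolution is an assumption on our part (the text does not fully pin it down), and `tcag_cnn_shapes` is a hypothetical helper:

```python
import math

def conv_same_len(length, stride=1):
    """'Same' padding: output length = ceil(length / stride)."""
    return math.ceil(length / stride)

def pool_len(length, size=2, stride=2):
    """Max pooling over windows of `size` with the given stride."""
    return (length - size) // stride + 1

def tcag_cnn_shapes(length=719, filters=(16, 32, 64)):
    """Trace (length, channels) through the conv blocks of Figure 3.

    Assumes one (1 x 2) stride-2 max pool after each (1 x 3) same-padded
    convolution; filter counts 16/32/64 follow the text.
    """
    shapes = []
    for f in filters:
        length = conv_same_len(length)   # conv keeps the length
        length = pool_len(length)        # pooling roughly halves it
        shapes.append((length, f))
    return shapes
```

Under these assumptions the 719-step input shrinks to 359, 179, and finally 89 steps before the flatten and fully connected layers.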
To make better use of the extracted topological and spatiotemporal features and obtain more accurate prediction results, the attention GRU neural network model was used in this study. GRU is currently a popular time-series prediction model, as it uses fewer parameters but can achieve the same prediction performance as LSTM in most scenarios [25,26]. The best prediction effect in this study was obtained with it.
To realize the early warning function of the TCAG framework, a dynamic threshold interval must be given according to the prediction results. Therefore, drawing on confidence intervals and naive forecast methods, we designed a dynamic threshold setting mechanism that fully considers the influence of the confidence level and the prediction step size on the threshold interval. Formula (2) shows the calculation of the dynamic threshold interval.
where TR represents the dynamic threshold interval, ŷ represents the network traffic prediction result, N represents the sample size, x_i represents the i-th sample, u represents the average value of the samples, h is the prediction step, and k represents the confidence factor, which is obtained from the selected confidence level. Table 1 shows the relationship between the confidence level and the confidence factor.
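Formula (2) itself is not legible in this copy; one plausible reading, consistent with the listed symbols and the naive-forecast prediction interval it cites, is TR = ŷ ± k·σ·√h with σ = sqrt(Σ_i (x_i − u)² / N). A sketch under that assumption only (`dynamic_threshold` is a hypothetical helper name):

```python
import math

def dynamic_threshold(y_hat, samples, h, k=1.96):
    """Dynamic threshold interval around a prediction y_hat.

    Assumed form (Formula (2) is garbled in this copy):
        TR = [y_hat - k * sigma * sqrt(h), y_hat + k * sigma * sqrt(h)]
    where sigma is the sample standard deviation of `samples`, h is the
    prediction step, and k is the confidence factor (e.g., 1.96 for 95%).
    The sqrt(h) term widens the band as the horizon grows, mirroring a
    naive-forecast prediction interval.
    """
    n = len(samples)
    u = sum(samples) / n                                    # sample mean
    sigma = math.sqrt(sum((x - u) ** 2 for x in samples) / n)
    half = k * sigma * math.sqrt(h)
    return (y_hat - half, y_hat + half)
```

Under this reading, a four-step-ahead prediction gets a band twice as wide as a one-step-ahead prediction, reflecting the growth of prediction error with step size.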

Dataset Description
The dataset used in this experiment came from the monitoring data of a core router of State Grid Ningxia Electric Power Co., Ltd., spanning 1 May 2020 00:00 to 30 April 2021 23:55. The sampling interval was five minutes, and 96,620 records were used. The prediction target of this experiment was the received traffic of the core router.

Data Preprocessing
For the above dataset, this paper uses correlation analysis for feature selection; the results are shown in Table 2. For topological data analysis, it was necessary to convert the two-dimensional time-series data into three-dimensional point-cloud data. Therefore, after normalizing the data, the Takens embedding function in the gtda package was used for the conversion. The data before and after the conversion are shown in Figure 4. In the 3-D point-cloud map, points with different colors represent different homology groups.
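The Takens (delay) embedding turns a scalar series x_t into points (x_t, x_{t+τ}, x_{t+2τ}) in three-dimensional space. The experiment used the gtda implementation; the following pure-numpy sketch (with a hypothetical helper name) shows the construction itself:

```python
import numpy as np

def takens_embedding(series, dimension=3, delay=1):
    """Delay-embed a 1-D series into a `dimension`-D point cloud.

    Point i is (x_i, x_{i+delay}, ..., x_{i+(dimension-1)*delay}).
    Returns an array of shape (n_points, dimension).
    """
    series = np.asarray(series)
    n_points = len(series) - (dimension - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dimension/delay")
    return np.stack(
        [series[i * delay : i * delay + n_points] for i in range(dimension)],
        axis=1,
    )
```

Each row is one point of the resulting point cloud; in gtda, the delay and dimension can also be selected automatically from the data.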

Evaluating Indicator
To accurately evaluate the experimental effect, the evaluation indexes used in this paper were MSE, MAPE, R 2 , 20% ACC, 15% ACC, and 10% ACC.
where ŷ is the predicted value, y is the actual value, n is the number of samples, and count(A > B) is the number of samples satisfying the condition A > B.
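The metric formulas are not reproduced in this copy. Under their standard definitions, and reading p% ACC as the fraction of samples whose relative error is below p percent (an assumption on our part), they can be sketched as:

```python
import numpy as np

def metrics(y_true, y_pred):
    """MSE, MAPE, R^2, and p% ACC under standard definitions.

    p% ACC is read here as the share of samples with relative error
    |y_hat - y| / |y| below p/100 -- an assumption, since the exact
    formulas are garbled in this copy of the paper.
    """
    y, yh = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = yh - y
    mse = np.mean(err ** 2)                                   # mean squared error
    mape = np.mean(np.abs(err / y)) * 100.0                   # mean abs. % error
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2) # coeff. of determination
    rel = np.abs(err / y)
    acc = {p: np.mean(rel < p / 100.0) for p in (20, 15, 10)} # p% ACC
    return mse, mape, r2, acc
```

Lower MSE and MAPE, and higher R² and p% ACC, indicate better prediction performance.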

Experimental Results and Analysis
To fully demonstrate the advantages of the TCAG model, three aspects are discussed in this section. First, the entire process of topological data analysis is shown in detail. Secondly, comparative experiments with different types of features were conducted; finally, comparative experiments with different models were carried out.

Topology Data Analysis
This experiment used topological data analysis to mine the hidden topological features in the point-cloud data. Through the continuous homology process, homologous topological structures were obtained at different scales. Figure 5 shows four homology-group states during the continuous homology process.
In Figure 5, points with different colors represent different homology groups, and each subgraph is controlled by different hyperparameters. Among the four hyperparameters, n_cubes is the resolution, representing the number of hypercubes in each dimension; its values in chronological order are 30, 15, 10, and 5. n_nodes is the number of nodes, representing the number of topological nodes; its values in chronological order are 600, 250, 125, and 25. In this experiment, the Euclidean distance was used as the distance measure, and the other hyperparameters are eps = 0.1, the maximum distance at which one of two samples is considered to be near the other, and min_samples = 15, the number of samples in the neighborhood required for a point to be regarded as a core point.
The upper-left figure shows the initial state. The distribution of homology groups is relatively discrete, and there is a spiral topology in the center, indicating some periodicity in the original time-series data. There are also many surrounding outliers, indicating large fluctuations in the original time-series data. The upper-right figure shows the state at the second time; the central spiral structure still exists, indicating that the cycle is relatively significant. As the homology develops, an increasing number of old homology groups die and new homology groups emerge. In the bottom-left figure, the homology groups gradually converge to a three-layer structure, corresponding to the evenly distributed three-layer topology in the bottom-right figure and representing the simple, efficient, and easy-to-use topological features extracted from the original data. In addition, the process fully demonstrates the smoothing and noise-reduction functions of TDA. The node distributions corresponding to each state in Figure 5 are shown in Figure 6.
Appl. Sci. 2022, 12, x FOR PEER REVIEW
In Figure 6, columns with different colors represent different node types, with colors corresponding to Figure 5. As shown in Figure 6, the change in node type is divided into two stages. In the first stage, with the continuous increase in resolution, the node types increase, and the observed topology becomes increasingly diversified and deeper; for example, at a resolution of 30, a clear spiral structure is observed. In the second stage, with a continuous decrease in resolution, the node types decrease, and the dimension of the data decreases. Outliers and noise are gradually fused in this stage, weakening or even eliminating their influence on the feature representation; for example, the outliers in the lower-left corner of Figure 5 at a resolution of 10 were eliminated when the resolution was reduced to five. This also demonstrates the powerful capability of TDA for noise reduction.
The result of continuous homology is a series of birth-death point pairs of topological features, visualized in Figure 7. In Figure 7, the blue dots represent the birth and death of the 0th-order homology group, and their persistence represents the dispersion of points in the point cloud: the more blue dots near the diagonal, the more dispersed the point cloud. The red dots indicate birth and death in the first-order homology group; the red dots are far from the diagonal, so this topological property is more significant in characterizing the data. Analysis of the persistence density map in Figure 7 shows that the 0th-order homology group is frequently born and dies in the interval with birth times of (0.015, 0.035), indicating that the topological characteristics of the data at this stage are unstable, with many points on the diagonal of the persistence diagram.
A persistence diagram lets people intuitively perceive the emergence and extinction of topological features, while the persistence landscape can be easily combined with mathematical analysis and neural networks.
Any order of a persistent landscape can be used as a topological characteristic. Low-order persistent landscapes contain information on important topological characteristics, whereas high-order persistent landscapes absorb topological noise. Therefore, selecting the order of the persistent landscape as a feature requires maintaining a delicate balance between losing important signal and introducing excessive noise.
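Concretely, under the standard definition, each birth-death pair (b, d) contributes a tent function max(0, min(t − b, d − t)), and the k-th landscape takes the k-th largest tent value at each point t (so k = 1 is the most persistent signal and large k mostly carries noise). A minimal sketch:

```python
import numpy as np

def landscape(pairs, k, grid):
    """k-th persistence landscape (k = 1 is the largest) on `grid`.

    pairs: iterable of finite (birth, death) pairs.
    Each pair contributes the tent max(0, min(t - b, d - t)); at every
    grid point the k-th largest tent value is kept (0 if fewer tents).
    """
    grid = np.asarray(grid, float)
    tents = np.array(
        [np.maximum(0.0, np.minimum(grid - b, d - grid)) for b, d in pairs]
    )
    if k > len(tents):
        return np.zeros_like(grid)
    # Sort tent values descending at each grid point, take the k-th.
    ordered = -np.sort(-tents, axis=0)
    return ordered[k - 1]
```

Sampling a few low-order landscapes on a fixed grid turns the diagram into a fixed-length vector suitable as neural network input.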

Feature Comparison Experiment
To explore the advantages and disadvantages of combining topological and spatio-temporal features, this study used a GRU model to predict the received-traffic value for the next 24 h, trained in turn on the original features, PCA features, ICA features, TSNE features, and the features extracted by TDA + CNN as the first group of comparative experiments. The experimental results are listed in Table 3, where the best and worst values for each evaluation indicator are marked in bold. Table 3 shows that the model trained with TDA + CNN features is significantly better than the models trained with the other features on MSE, R2, 10% ACC, and the other evaluation indexes, and is close to, but still better than, the models trained with TSNE and ICA features on MAPE and 15% ACC. The model trained with PCA features scored worst on most indicators, showing that PCA is inferior to the other comparison methods for feature dimensionality reduction of time-series data. Notably, the original features also outscored PCA, indicating that PCA loses useful information during dimensionality reduction, which degrades the prediction performance of the model.
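The evaluation indexes used in Table 3 can be sketched as follows; the p% ACC definition here (share of predictions whose relative error stays within p%) is an assumption consistent with how the figures describe points exceeding the 15% ACC range:

```python
import numpy as np

def mse(y, yhat):
    """Mean squared error."""
    return float(np.mean((y - yhat) ** 2))

def r2(y, yhat):
    """Coefficient of determination; negative when the fit is worse
    than predicting the mean."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)

def acc_within(y, yhat, pct):
    """Share of predictions whose relative error is within pct percent
    (assumed definition of the p% ACC indicator)."""
    return float(np.mean(np.abs((y - yhat) / y) <= pct / 100.0))

y    = np.array([100.0, 120.0,  90.0, 110.0])
yhat = np.array([105.0, 114.0,  99.0, 112.0])
err_pct = mape(y, yhat)          # ~5.45: mean of 5%, 5%, 10%, ~1.8%
acc10 = acc_within(y, yhat, 10)  # all four errors are within 10%
```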
From the perspective of the dimension-reduction principle, PCA maps high-dimensional data to a low-dimensional space through linear projection, TSNE gives similar objects a higher probability of being selected as neighbors, and CMPA [27] selects effective features with the help of chaotic sequences. In this paper, spatial features are combined with temporal features, which clearly yields a stronger feature representation.
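As an illustration of the linear-projection principle behind PCA (illustrative only, not the paper's exact preprocessing), a short SVD-based implementation recovers a latent one-dimensional signal embedded linearly in five dimensions:

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto their top-k principal components:
    a linear map obtained from the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # (n_samples, k) linear projection

rng = np.random.default_rng(1)
t = rng.normal(size=(200, 1))                 # hidden 1-D signal
w = rng.normal(size=(1, 5))                   # random linear embedding
X = t @ w + 0.01 * rng.normal(size=(200, 5))  # 5-D observation + noise
Z = pca_project(X, 1)
# The single component recovers the latent signal up to sign and scale.
corr = np.corrcoef(Z[:, 0], t[:, 0])[0, 1]
```

Because the embedding here is linear, PCA succeeds; the point made above is that for the nonlinear, unstable structure of real traffic series, a purely linear projection of this kind discards useful information.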
In conclusion, the TDA + CNN feature training model achieved the best score on most evaluation indexes, which proves the feasibility of applying this method to timeseries prediction and the powerful ability of this method in feature extraction and data dimensionality reduction.

Model Comparison Experiment
To explore the advantages and disadvantages of the TCAG model, this study used the features extracted by TDA + CNN to train the MLP, TCN, LSTM, GRU, and TCAG neural network models for comparative experiments.
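To make the attention GRU component concrete, here is a minimal NumPy sketch of a GRU encoder with additive attention over its hidden states. The gate equations are the standard GRU formulation; the layer sizes, initialization, and attention form are illustrative assumptions, not the paper's exact TCAG architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AttentionGRU:
    """GRU encoder followed by additive attention over all hidden states."""

    def __init__(self, n_in, n_hid, seed=0):
        rng, s = np.random.default_rng(seed), 0.1
        self.Wz, self.Uz = s * rng.normal(size=(n_hid, n_in)), s * rng.normal(size=(n_hid, n_hid))
        self.Wr, self.Ur = s * rng.normal(size=(n_hid, n_in)), s * rng.normal(size=(n_hid, n_hid))
        self.Wh, self.Uh = s * rng.normal(size=(n_hid, n_in)), s * rng.normal(size=(n_hid, n_hid))
        self.Wa = s * rng.normal(size=(n_hid, n_hid))  # attention projection
        self.va = s * rng.normal(size=n_hid)           # attention vector

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)           # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)           # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - z) * h + z * h_tilde

    def forward(self, xs):
        h, hs = np.zeros(self.Uz.shape[0]), []
        for x in xs:                               # encode the sequence
            h = self.step(x, h)
            hs.append(h)
        H = np.stack(hs)                           # (T, n_hid)
        scores = np.tanh(H @ self.Wa.T) @ self.va  # additive attention
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                       # softmax over time steps
        return alpha @ H, alpha                    # context vector, weights

xs = np.random.default_rng(2).normal(size=(24, 8))  # 24 steps, 8 features
ctx, alpha = AttentionGRU(8, 16).forward(xs)
```

The attention weights let the predictor emphasize the historical steps most relevant to the next value instead of relying only on the final hidden state; a prediction head (e.g. a linear layer on `ctx`) would be trained on top.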
In Table 4, the best and worst values for each evaluation indicator are marked in bold. Among the models compared, the MLP model had the smallest R2 value, indicating that the data contain complex changes that the model fits poorly. The R2 score of TCAG is the only value greater than 0, which further reflects the complexity of the data and shows that the TCAG model achieves a better fit than the other benchmark models. The mean absolute percentage error of TCAG was 7.56%, an improvement of 8.96% over the worst-performing MLP; compared with the other benchmark models, TCAG therefore satisfies higher accuracy requirements. As shown in Table 4, TCN, LSTM, and GRU perform similarly on the various indices, falling between MLP and TCAG. In addition, Figure 8 shows that the accuracy of TCAG does not decline as the number of prediction steps increases, indicating that the model generalizes well and learns the historical patterns of the data. Figures 8-12 show that TCAG has the best fitting effect among the five models, with only two of the 24 predicted points exceeding the 15% ACC range. The MLP model has the worst fit, with a large number of predicted values exceeding the 15% ACC range and an error that grows with the prediction step size. The TCN model misestimates the fluctuation range of the value over the step range [4,9] but then returns to a more normal state.
The effect of LSTM is close to that of GRU and second only to the TCAG model. In summary, the TCAG model achieved the best score on all evaluation indicators, which proves that the model is superior to the four benchmark models in prediction accuracy and robustness and confirms the choice of the GRU model as the recurrent component of TCAG.
Finally, the dynamic threshold interval is calculated according to Formula (2), as shown in Figure 13. The blue solid line is the actual value, the red solid line is the predicted value, and the blue shaded area represents the dynamic threshold interval at 95% confidence. The upper and lower boundaries of the area are used as the threshold to trigger the alarm. When the real value exceeds the threshold, it can be determined that there is a 95% probability that the equipment is abnormal, and the alarm should be sent immediately.
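Formula (2) itself is not reproduced in this excerpt; a common way to build such a 95% band, shown here as an assumption rather than the paper's exact formula, is to widen the prediction by 1.96 times the standard deviation of historical prediction residuals:

```python
import numpy as np

def dynamic_threshold(y_pred, residuals, conf=1.96):
    """Dynamic alarm band y_pred +/- conf * sigma, where sigma is
    estimated from historical prediction residuals and conf = 1.96
    corresponds to a 95% band under a normal-error assumption.
    (Illustrative; the paper's Formula (2) may differ in detail.)"""
    sigma = residuals.std(ddof=1)
    return y_pred - conf * sigma, y_pred + conf * sigma

def alarm(y_true, lower, upper):
    """True wherever the observed value leaves the threshold band."""
    return (y_true < lower) | (y_true > upper)

# Residuals from a validation window and a 3-step forecast (made-up numbers).
resid  = np.array([-1.2, 0.8, 0.5, -0.3, 1.1, -0.9, 0.2, -0.4])
y_pred = np.array([50.0, 52.0, 55.0])
lo, hi = dynamic_threshold(y_pred, resid)
flags = alarm(np.array([50.5, 60.0, 55.0]), lo, hi)  # only 60.0 alarms
```

Because the band is recomputed from each new prediction, it adjusts automatically as the traffic pattern changes; sigma can additionally be widened with the prediction step to reflect the error growth over longer horizons noted in the conclusions.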

Conclusions and Prospect
This paper proposes the TCAG model, which uses a convolutional neural network to extract spatio-temporal features, uses topological data analysis to extract topological features, and then combines the two to train an attention GRU neural network model. Finally, according to the prediction results of the GRU model, a dynamic threshold interval is calculated for the early fault warning of information equipment. Based on the experimental results, the following conclusions can be drawn:
(1) The proposed dynamic threshold-setting mechanism solves the problem of early fault warning in AIOps. The mechanism automatically adjusts the threshold interval as the scene changes, based on the prediction results, and fully accounts for the prediction error introduced as the step size grows, which has great practical significance.
(2) Traditional feature extraction methods struggle with the instability and nonlinear structure of time series. To address this, spatio-temporal features and topological features were combined to train the attention GRU neural network model. Compared with the original, PCA, ICA, and TSNE features, the TCAG model extracted the most concise and effective feature representation of the data and significantly improved the prediction performance of the model.
(3) To verify the prediction performance of the TCAG model, comparative experiments were carried out with the MLP, TCN, LSTM, and GRU models. The results show that the TCAG model outperforms the benchmark models on all evaluation indexes and obtains the best prediction accuracy and robustness. It is worth mentioning that its mean absolute percentage error improved by 8.96% compared with that of the MLP model.
In this study, early fault warning for AIOps was studied in depth and achieved ideal results. Future work will explore other directions of AIOps (anomaly monitoring, fault self-healing, etc.) and continue to contribute to improving the degree of intelligence in operation and maintenance.