Dynamic QoS Prediction Algorithm Based on Kalman Filter Modification

With the widespread adoption of service-oriented architectures (SOA), services with the same functionality but the different Quality of Service (QoS) are proliferating, which is challenging the ability of users to build high-quality services. It is often costly for users to evaluate the QoS of all feasible services; therefore, it is necessary to investigate QoS prediction algorithms to help users find services that meet their needs. In this paper, we propose a QoS prediction algorithm called the MFDK model, which is able to fill in historical sparse QoS values by a non-negative matrix decomposition algorithm and predict future QoS values by a deep neural network. In addition, this model uses a Kalman filter algorithm to correct the model prediction values with real-time QoS observations to reduce its prediction error. Through extensive simulation experiments on the WS-DREAM dataset, we analytically validate that the MFDK model has better prediction accuracy compared to the baseline model, and it can maintain good prediction results under different tensor densities and observation densities. We further demonstrate the rationality of our proposed model and its prediction performance through model ablation experiments and parameter tuning experiments.


Background and Motivation
With the widespread use of Service-Oriented Architecture (SOA) in software development efforts, the reusability and interoperability of services have been greatly enhanced while promoting continuous progress in research on service clustering and classification [1][2][3], service recommendation [4][5][6], and service combination [7][8][9]. With the popularity of SOA architectures, the number of available web services has grown exponentially, which has resulted in a large number of functionally identical or similar services in the network. Quality of Service (QoS) is a set of service non-functional evaluation metrics widely used nowadays, through which the merits of network services with the same functions can be easily measured. It has become an important challenge to construct high-quality services that meet the non-functional needs of users from a large number of services with the same function [10].
For example, when using a service or building a combination of services, users can usually select a large number of candidates whose functionality meets the requirements of the user, but it is difficult to visualize whether the QoS properties of these services meet the requirements, which contains two main aspects of the problem.
Objectively, services are usually deployed in cloud servers, and users invoke these services remotely through the network. Therefore, the QoS felt by the client will be affected by both the state of the service itself and the network environment, such as the service operation state, service load, network fluctuation situation, and network congestion, making the QoS a real-time changing value. This is also the reason why different users get different QoS experiences for the same service. Therefore, it is difficult for users to obtain a credible QoS directly.
Subjectively, it is a costly task for users to actively evaluate the QoS value of a service. On the one hand, service providers usually charge for executing service invocations, which can cause significant financial expenses; on the other hand, continuous observation of all compliant services for the purpose of QoS assessment consumes a lot of time and resources. Therefore, it is often difficult for users themselves to evaluate the service QoS by means of service invocations.
In summary, at a high cost, users can usually only invoke a limited number of services with sparse QoS in order to comprehensively evaluate the QoS properties of services and avoid consuming high costs. How to predict the missing QoS becomes the core problem of building high-quality web services today.

Related Works
Prediction problems are studies in which people speculate about the trends that will emerge in the future based on the development patterns of historical things. Research on prediction problems is developing rapidly in a wide range of fields such as epidemiology [11,12], network science [13,14], engineering management [15,16], and cloud computing [17,18]. In the field of service recommendation, current research on QoS prediction problems can be divided into two categories, static QoS prediction methods [19][20][21][22][23] and dynamic QoS prediction methods [24][25][26][27][28][29][30][31].
Static QoS prediction problems often perform QoS prediction under fixed time slices according to the contextual relationships between users and services such as location information and network information. Studies based on collaborative filtering (CF) methods usually fall into this category of problems. CF-based QoS prediction methods can be classified as neighborhood-based and model-based. The neighborhood-based CF algorithm assumes a stable similarity relationship between users or services. The similarity between users can be calculated by using historical QoS values as features and filling in the missing QoS values with historical information between similar users. Shao et al. [19] first used a collaborative filtering approach for QoS prediction. This study mainly constructs the QoS matrix of users and services, obtains a similarity relationship between users, and finally predicts the missing QoS of the target users. Zheng et al. [20], based on the previous work, proposed the method of integrating user similarity and service similarity for QoS prediction, which effectively improves the accuracy of QoS prediction.
Model-based CF algorithms acquire the implicit relationships between users and services by building specific models to predict the desired QoS values. Xia et al. [21] first extracted multi-source features through a combination of matrix decomposition and neural networks and improved the prediction capability of sparse QoS data by deep neural networks for feature learning. Zou et al. [22] proposed a domain-integrated deep matrix decomposition algorithm, which improves the ability to obtain implicit features of users and services through the fusion of deep neural networks and matrix decomposition. Nguyen et al. [23] proposed an attention probability matrix model, which learns latent features by introducing a neural attention network on the basis of probability matrix decomposition and proposes a neural network architecture to learn latent features of services.
However, in the real environment, different users will access the same service at different times. At the same time, the QoS value of the same service will change over time, which will make the QoS value of the service invoked by the user always in dynamic change. Having the ability to predict QoS changes for future time slices would better meet the demand of users for high-quality services.
A dynamic QoS prediction algorithm aims to predict the future QoS development pattern based on historical QoS data features, and this method is now becoming a new research hotspot. The main research directions of current dynamic QoS prediction algo- rithms can be divided into feature engineering-based methods and deep learning-based methods. Feature engineering-based QoS prediction methods are often developed based on time series prediction algorithms [24][25][26]. Yan et al. [24] considered the QoS prediction problem as a time series prediction problem and proposed an SVD-based ARIMA model for predicting multiple QoS values, which effectively improved the prediction accuracy of QoS. Hu et al. [25] proposed a personalized QoS prediction method, which combined a Kalman filter with an ARIMA model to provide the traditional ARIMA model with the ability to obtain feedback and correct the prediction. Keshavarzi et al. [26] proposed an online QoS time series prediction method combining clustering and minimum description length (MDL), which first clusters similar time series, and then the model generator uses MDL to obtain similar features from the time series.
Feature engineering-based methods often require targeted feature extraction methods designed for time series characteristics, which require skilled feature extraction theory and a high level of manual technical experience. In contrast, deep learning-based QoS prediction methods mainly use a deep neural network to obtain time series features, which has a low technical cost and significant effect improvement compared with traditional methods. Currently, dynamic QoS prediction algorithms based on deep learning frequently use previous QoS values as input, record their time series features, and then forecast QoS values at future points [27,28]. Jin et al. [29] divided the QoS prediction process into predictions based on historical time slices and predictions based on current time slices and proposed a two-stage method TWQP, which effectively solved the dynamic QoS prediction problem under different situations. Zhang et al. [30] proposed a multivariate time series QoS prediction approach that uses phase space reconstruction to translate multivariate historical data into a dynamic system and then uses a radial basis function (RBF) neural network modified by the Levenberg-Marquardt (LM) algorithm to execute a dynamic multi-step prediction. Zou et al. [31] provide a GRU-based deep neural network to mine the user and service temporal properties between users and services to predict unknown QoS. Additionally, they suggested an enhanced temporal characterization of users and services. However, there are two problems in the above research: firstly, the traditional deep learning-based QoS prediction algorithm fails to consider the real-time QoS values generated by the user when invoking the service as augmented information to be utilized in the prediction, making it impossible to further improve the model prediction accuracy. Secondly, the traditional deep learning models cannot fill in historical sparse QoS datasets during training, which affects the prediction accuracy of the models. A summary of the related work is shown in Table 1.

Main Contributions
To address the aforementioned issue, we propose MFDK, a three-part dynamic QoS prediction model: first, the user-service-time data is formed as a third-order tensor and decomposed into non-negative Tuckers to fill in the missing values in the tensor. Second, the tensor data is put into a CNN-BiLSTM deep learning model for training, and finally, the model predictions are adjusted by fusing QoS realistic observations through a Kalman filtering algorithm.
The main contributions of this paper are as follows.  Outperforms the 6 baseline models in the article in terms of response time and throughput Jin et al. [29] MulA-LMRBF: A method to input historical QoS data using phase-space reconstruction method, afterwards implementing dynamic multi-step prediction by RBF neural network improved by Levenberg-Marquardt algorithm.
Outperforms the 5 baseline models in the article in terms of response time and throughput Zhang et al. [30] DeepTSQP: A method propose a deep neural network with gated recurrent units (GRU), learning, and mining temporal features among users and services.
Outperforms the 9 baseline models in the article in terms of response time and throughput Zou et al. [31] 2. Preliminaries

Tucker Decomposition
Tucker decomposition is one of the main tensor decomposition methods. The fundamental concept is to approximate the decomposition of the initial tensor as the product of the kernel tensor and the factor matrix. Using the third-order tensor, as an illustration, the decomposition has the form depicted in Figure 1.

Tucker Decomposition
Tucker decomposition is one of the main tensor decomposition methods. The mental concept is to approximate the decomposition of the initial tensor as the pro the kernel tensor and the factor matrix. Using the third-order tensor, as an illustrati decomposition has the form depicted in where  is the approximation tensor of  , Given a tensor X ∈ R I 1 ×I 2 ×···×I N , then the Tucker decomposition process of the tensor X can be expressed as: whereX is the approximation tensor of X , G ∈ R J 1 ×J 2 ×···×J N is the core tensor, and are the factor matrices. Solving the Tucker decomposition problem can be transformed into an optimization problem, as in Equation (2): In this paper, the multiplicative updating algorithm is used to iteratively solve the problem. The objective function equation can be expressed as: where X (n) is the mode-n matrix of tensor X and G (n) is the mode-n expansion matrix of the tensor G. We define A (n) , which collects Kronecker products of mode matrices except A (n) , where ⊗ is the Kronecker product. Then the updated equations for the factor matrices and the core tensor are: ⊗ ) T , and * is the Hadamard products. After performing the update, the model can be controlled to converge by setting the convergence threshold or by setting the number of iterations.

Kalman Filter
The Kalman filter algorithm mainly includes the time update equation and the state update equation, assuming the existence of a linear system with state and observation equations as follows: where x k and x k−1 are the states of the system at time k and k − 1, respectively, u k−1 is the control variable at time k − 1, A and B are the state transfer matrix and the input state transfer matrix, respectively, z k is the observation at time k, H is the transformation matrix from state variable to observation variable, and w k−1 and v k are Gaussian-distributed noise. The time update equation for the Kalman filter is: wherex k is the prior state estimate at time k,x k−1 is the posterior state estimate at time k − 1, P k is the prior estimate covariance at time k, P k−1 is the posterior estimate covariance at time k − 1, and Q is the state noise covariance. The state update equation is: where K k is the Kalman filter gain matrix,x k is the a posteriori state estimate at time k, P k is the a posteriori estimated covariance at time k − 1, and z k is the observed value at time k. R is the observation noise covariance.

Description of the Problem
Before going into detail about the model proposed in this paper, a formal description of the problem solved in this paper is given.
Assuming that there exists a set of I users U = {u 1 , u 2 · · · u i · · · u I }, a set of J services S = s 1 , s 2 · · · s j · · · s J , and a set of K time slices T = {t 1 , t 2 · · · t k · · · t K }, afterward, a third-order tensor Y ∈ R I×J×K can be constructed such that Y ijk , the elements of Y, represent the QoS values generated when the user i invokes the service j at time k. In real environments, the obtained QoS values are usually very sparse, which also makes the QoS tensor Y very sparse. As shown in Figure 2, the problem addressed in this paper is whether the QoS value at moment k + 1 can be predicted under the sparse case of Y. To solve this problem, this paper proposes a dynamic QoS prediction method, MFDK, whose main process is described as follows.

Description of the Problem
Before going into detail about the model proposed in this paper, a formal description of the problem solved in this paper is given.
Assuming that there exists a set of I users , and a set of K time slices can be constructed such that ijk  , the elements of  , represent the QoS values generated when the user i invokes the service j at time k. In real environments, the obtained QoS values are usually very sparse, which also makes the QoS tensor  very sparse. As shown in Figure 2, the problem addressed in this paper is whether the QoS value at moment k + 1 can be predicted under the sparse case of  . To solve this problem, this paper proposes a dynamic QoS prediction method, MFDK, whose main process is described as follows. As seen in Figure 3, the historical QoS data is first converted into a user-service-time third-order tensor and then decomposed into a non-negative Tucker to fill in missing data. QoS predictions are obtained after dividing the full third-order tensor into a training and validation set and feeding it to a deep neural network for training. The final QoS predictions are obtained by combining the predictions produced by the deep neural network with the reality observations through the Kalman filter.   As seen in Figure 3, the historical QoS data is first converted into a user-service-time third-order tensor and then decomposed into a non-negative Tucker to fill in missing data. QoS predictions are obtained after dividing the full third-order tensor into a training and validation set and feeding it to a deep neural network for training. The final QoS predictions are obtained by combining the predictions produced by the deep neural network with the reality observations through the Kalman filter.

Description of the Problem
Before going into detail about the model proposed in this paper, a formal description of the problem solved in this paper is given.
Assuming that there exists a set of I users , and a set of K time slices can be constructed such that ijk  , the elements of  , represent the QoS values generated when the user i invokes the service j at time k. In real environments, the obtained QoS values are usually very sparse, which also makes the QoS tensor  very sparse. As shown in Figure 2, the problem addressed in this paper is whether the QoS value at moment k + 1 can be predicted under the sparse case of  . To solve this problem, this paper proposes a dynamic QoS prediction method, MFDK, whose main process is described as follows. As seen in Figure 3, the historical QoS data is first converted into a user-service-time third-order tensor and then decomposed into a non-negative Tucker to fill in missing data. QoS predictions are obtained after dividing the full third-order tensor into a training and validation set and feeding it to a deep neural network for training. The final QoS predictions are obtained by combining the predictions produced by the deep neural network with the reality observations through the Kalman filter.

Missing Data Filling Based on Non-Negative Tucker Decomposition
The Tucker decomposition algorithm mentioned in Section 2.1 can serve to fill the missing values in the tensor to a certain extent; however, in practice, the missing values filled by Tucker decomposition can be negative, which has no practical significance in a QoS tensor with positive values. To improve the accuracy of the model prediction, this paper adds a non-negativity constraint to the Tucker decomposition, i.e., the missing data is filled by a non-negative Tucker decomposition. Thus, the objective function of the QoS tensor Y ∈ R I×J×K for non-negative Tucker decomposition can be expressed as: whereŶ ijk is the element in the tensor to be predicted. Combined with the equations in Section 2.1, the non-negative Tucker decomposition Algorithm 1 can be summarized as follows: Algorithm 1: Non-negative Tucker decomposition.
Inputs: user-service-time tensor Y, rank on each modal I, J, K Output: Tensor after filling sparse values Y 1: BEGIN 2: Initialize the core tensor G and the factorization matrices A (1) , A (2) , and A (3) 3: REPEAT 4: for n in 3 do: 5: Iteratively update each factor matrix.
: end for 7: Updating the core tensor.
: Tensor iterative update. Y ←Ŷ 10: until the error converges or the maximum number of iterations is reached 11: END

Training Dataset Construction
After filling in the historical data, a CNN-BiLSTM neural network model is constructed in this paper to train on the historical data. Prior to training, the user-service-time tensor needs to be constructed as training data that can be fed into the neural network. To make predictions of QoS values for future time slices, the tensor is expanded into fibers forming along the time dimension, as in Figure 4.
In this paper, the first K− 1 time slices of each vector after unfolding are used as model training data and the Kth step is predicted to achieve QoS prediction for future time slices.

Convolutional Neural Network
In this paper, the local features of QoS time series are extracted by convolution neural network. The advantage of CNN lies in its ability of local feature extraction and parameter sharing. Convolutional kernels are used by the convolutional layer to convolve local regions of the QoS time series in order to create corresponding features and reduce the risk of over-fitting through the parameter sharing of the convolution kernel. The process of convolution kernel convolution operation is as follows: where F t represents the result of the convolution operation of the t-th convolution kernel; ϕ represents the nonlinear activation function; and W t , x t−1 , and b t represent the filter kernel, the input of the CNN layer, and bias term of the t-th convolution kernel.  In this paper, the first 1 K − time slices of each vector after unfolding are used as model training data and the K th step is predicted to achieve QoS prediction for future time slices.

Convolutional Neural Network
In this paper, the local features of QoS time series are extracted by convolution neural network. The advantage of CNN lies in its ability of local feature extraction and parameter sharing. Convolutional kernels are used by the convolutional layer to convolve local regions of the QoS time series in order to create corresponding features and reduce the risk of over-fitting through the parameter sharing of the convolution kernel. The process of convolution kernel convolution operation is as follows: where t F represents the result of the convolution operation of the t-th convolution kernel; ϕ represents the nonlinear activation function; and t W , 1 − t x , and t b represent the filter kernel, the input of the CNN layer, and bias term of the t-th convolution kernel. The output of convolution layer is shown as the connection of all convolution kernel calculation results, which is shown as: where n represents the maximum number of convolution kernels in the convolution layer; output represents the final output of the convolution layer.

Bidirectional Long Short-Term Memory Neural Network
In this paper, a BiLSTM layer is added after the CNN layer to obtain the global characteristics of time series. BiLSTM is a unique RNN structure. It is composed of a two-layer LSTM network, which can obtain sequence features from two directions. The core structure of the LSTM consists of an input gate, a forgetting gate, a memory unit, and an output gate. Input information enters the LSTM unit through the input gate, determines the information to be retained and forgotten through the forgetting gate, and finally outputs the information through the output gate. Through such a unique gate structure, LSTM alleviates the problems of gradient disappearance and gradient explosion in traditional RNN networks. The structure of the LSTM unit is shown in Figure 5. The output of convolution layer is shown as the connection of all convolution kernel calculation results, which is shown as: where n represents the maximum number of convolution kernels in the convolution layer; output represents the final output of the convolution layer.

Bidirectional Long Short-Term Memory Neural Network
In this paper, a BiLSTM layer is added after the CNN layer to obtain the global characteristics of time series. BiLSTM is a unique RNN structure. It is composed of a two-layer LSTM network, which can obtain sequence features from two directions. The core structure of the LSTM consists of an input gate, a forgetting gate, a memory unit, and an output gate. Input information enters the LSTM unit through the input gate, determines the information to be retained and forgotten through the forgetting gate, and finally outputs the information through the output gate. Through such a unique gate structure, LSTM alleviates the problems of gradient disappearance and gradient explosion in traditional RNN networks. The structure of the LSTM unit is shown in Figure 5.
The calculation process of forgetting gate is as follows: where f t is the output of the forgetting gate; x t represents the input feature sequence; and h t−1 represents the output sequence of the previous time. W f is the weight matrix of the forgetting gate; b f represents the offset matrix; and σ is the sigmoid activation function, whose expression is: The calculation process of input gate and memory unit is as follows: The calculation process of forgetting gate is as follows: where t f is the output of the forgetting gate; t x represents the input feature se and 1 t -h represents the output sequence of the previous time. f W is the weight of the forgetting gate; f b represents the offset matrix; and σ is the sigmoid ac function, whose expression is: The calculation process of input gate and memory unit is as follows: Finally, the output information of the LSTM unit is determined through the gate, whose expression is: Finally, the output information of the LSTM unit is determined through the output gate, whose expression is: where o t represents the output of the output gate, which together with C t determines the short-term memory h t of the LSTM unit at time t. The BiLSTM network used in this paper consists of the above LSTM units. Its composition is shown in Figure 6. The BiLSTM network is composed of forward LSTM layer and reverse LSTM layer, whose expression is:  The BiLSTM network is composed of forward LSTM layer and reverse LSTM layer, whose expression is: where → h n is the calculation result of the forward LSTM network, ← h n is the calculation result of the reverse LSTM network, and h t is the final output of the BiLSTM network. In this way, BiLSTM can well obtain the global feature information of time series.

The Overall Structure of The Model
In order to fully obtain the temporal characteristics of historical QoS data, a neural network model based on CNN and BiLSTM is adopted in this paper. Due to its circular structure, a recurrent neural network (RNN) has good advantages in capturing the characteristics of time series. However, the traditional RNN neural network faces the challenge of gradient disappearance and gradient explosion. In order to fully obtain the characteristics of historical QoS data and improve the universality of the network, this paper uses a BiLSTM neural network to obtain the time series characteristics of historical QoS data. The structure of a BiLSTM neural network can obtain features from the forward and reverse of time series, which greatly improves the feature capture ability of the model while alleviating the problems of gradient disappearance and gradient explosion. On this basis, the CNN layer is added to the BiLSTM model to enhance the model's ability to obtain local features of time series.
As shown in Figure 7, since the normalized data will make the model complete convergence faster, the QoS data will be normalized first. In this paper, we use the maximumminimum normalization to operate on the data, and the calculation formula is: where X is the original data in the time series, X min is the minimum value in the time series, X max is the maximum value in the time series, and X norm is the normalized data. After the output of the BiLSTM layer, the model is expanded into a one-dimensional sequence through the flatten layer and a single element is output as the next moment QoS In this paper, the local features in the time series are first extracted using a onedimensional CNN layer. For the features of the QoS time series, this model uses a convolutional layer with a window length of 5 and a total of 64 one-dimensional convolutional kernels to extract features from the time series.
The main purpose of the BiLSTM neural network is to further learn the overall temporal characteristics of the time series context based on the local features acquired by the CNN network. The model uses BiLSTM layers containing 64 LSTM units per layer.
After the output of the BiLSTM layer, the model is expanded into a one-dimensional sequence through the flatten layer and a single element is output as the next moment QoS prediction through the fully connected layer. The Adams optimizer algorithm was used to continuously update the model parameters by monitoring the training error "MAE". The batch size was 32, and the maximum number of training epochs was 200.

Derivation of the Algorithm Formula
The application of deep learning in the time series prediction process is often based on the prediction results of the model itself as the final prediction value, while in a realistic QoS prediction application environment, real-time QoS observations will also be constantly available during the calculation of deep learning prediction values. If the real-time QoS observations can be combined with the prediction values of deep neural networks, it will be possible to improve the prediction accuracy based on the deep neural network prediction. Based on the above ideas, this paper proposes a Kalman filter-based correction algorithm for deep learning prediction values. Based on the Kalman filter formula in Section 1.2, this algorithm treats the predicted values of the deep neural network as the next predicted values of the Kalman filter, and the predicted QoS values are all in elemental form. Then the state equation will be transformable into: where net k is the prediction value of the neural network at moment k − 1 and loss k−1 is the prediction error of the neural network at moment k − 1. To calculate the error of the prediction process and quantify the value of loss k−1 , assume that there exists a time-variable state transfer variable A k−1 such that: x k = net k = A k−1 net k−1 (27) Then Equation (9) can be transformed into: where A k−1 = net k net k−1 ; then, substituting Equations (21) and (22) into the state update equation gives: where QoS k is the observed value at moment k and Knet k is the predicted value at moment k corrected by the Kalman filter. In summary, the Algorithm 2 is summarized as follows.

Optimization Strategies in the Face of Sparse Observations
Algorithm 2 achieves an effective combination of model predictions and real-time QoS observations. However, in practice, real-time QoS observations are also sparse, and missing observations added to the Kalman filtering process may lead to an increase in algorithm error. In this paper, an algorithm optimization strategy is proposed for the case of sparse observations. Algorithm 3 is the optimized algorithm, and the specific contents are as follows

. Data Set and Experimental Environment
This paper conducts experiments on a response time dataset from the WS-DREAMdataset 2 dataset, which is widely used in the field of QoS prediction. It contains response time data generated by 142 users interacting with 4500 web services over 64 time slices. The experiments in this paper construct the data in the dataset as a tensor and then perform missing value filling, expand the tensor fibrillation into a time series vector according to the method in Section 3.3.1, and keep the first 63 time slices in the vector as historical data for training and predicting the outcome of the next time slice. The value of the 64th time slice is taken as the true value. When dividing the dataset, 85% of the vectors were used as the deep learning model training set and the remaining 15% were used as model validation for training.
This experiment is based on the Intel (R) core (TM) i7-10875h CPU at 2.30 GHz. On the Windows platform of the processor, the NVIDIA Geforce GTX 2060 graphics processor and CUDA version 11.1.114 are used for the training process of deep learning. All codes are programmed through the PyCharm platform.

Evaluation Indicators
The mean absolute error (MAE) and root mean square error (RMSE) are used in the experiments in this paper as evaluation metrics in the dynamic QoS prediction process. The MAE and RMSE are defined as: where n is the number of predicted QoS values, and real i and predicted i are the true and predicted values, respectively. The smaller the MAE and RMSE values, the greater the prediction effect of the relevant model.

Model Comparison Experiments
This section compares the MFDK algorithm proposed in this paper with existing time series prediction algorithms and time-aware QoS prediction-based algorithms.
One of them is the ARIMA [24] algorithm (Autoregressive Integrated Moving Average model, ARIMA), which is a classical time series prediction algorithm that performs model prediction by smoothed time series parameters obtained after differencing.
TCN [32] (Temporal Convolutional Network, TCN) is a type of convolutional neural network that uses causal convolution and dilated convolution to enable the convolutional neural network to handle time series prediction problems. It is a new model that outperforms traditional neural networks such as LSTM and RNN in prediction performance.
WSPred [33] is a time series aware QoS prediction algorithm based on tensor decomposition, which predicts missing values by performing a third-order tensor decomposition on the user-service-time tensor.
CLUS [34] clusters the user service time tensor into different clusters through the K-means algorithm and uses the similarity relationship within the cluster to predict QoS.
The PMF [35] (Probabilistic Matrix Factorization) method performs QoS prediction by means of probabilistic matrix factorization.
To test the adaptability of the different methods in a realistic QoS sparse environment, we randomly removed data from the tensor to control the tensor density; e.g., a tensor with a tensor density of 0.1 represents a tensor where we randomly removed 90% of the elements of the original tensor. We finally compared the RMSE and MAE metrics of the different prediction methods for tensor densities equal to 0.1, 0.15, 0.2, 0.25, and 0.3. The final experimental results are shown in Table 2. The experimental results demonstrate that the prediction accuracy of each model generally improves as the tensor density increases. The RMSE and MAE indices of the model proposed in this paper are lower than those of the comparison models under various tensor densities, indicating that the MFDK model has superior QoS prediction capability. Specifically, the ARIMA model has limited capacity to capture sparse time series characteristics and cannot fill in missing data, making it hard to construct an appropriate prediction model for sparse data. The TCN algorithm is a deep learning method that captures the features of time series data excellently. However, the capacity for generalization and prediction of a TCN network trained on sparse data would be considerably diminished. This further demonstrates the significance of data filling for historical data. The PMF, CLUS, and WSPred algorithm models have better prediction capabilities compared to the ARIMA and TCN models, with their RMSE metrics decreasing by averages of 53.2%, 23.5%, and 34.6%, respectively, and their MAE metrics decreasing by averages of 28.6%, 17.3%, and 12.1%, respectively. This is due to the capacity of time-aware QoS prediction algorithms to fill in sparse data. They are better at adapting to sparse data than ARIMA and TCN algorithms. Among these, the WSpred algorithm outperforms PMF and CLUS in terms of prediction accuracy. This is because the WSpred algorithm iteratively predicts missing values in a gradient descent manner using a third-order tensor containing time series relationships, and the prediction accuracy is greater. However, the aforementioned model lacks the combination of real-time QoS observations, and its prediction impact is still insufficient when compared to the MFDK model. To summarize, the MFDK model with sparse data filling and real-time QoS observations has greatly increased QoS prediction accuracy when compared to baseline QoS prediction approaches.

Model Ablation Experiments
The experiments in this part are designed to check the functions of the various components of the model provided in this study to further evaluate the model's rationality and predictive capacity. The Tucker + net and net + Kalman models are created by deleting the Kalman filter correction and non-negative matrix decomposition parts of the MFDK model, respectively, while the net model merely keeps the neural network prediction element of the MFDK model. The outcomes of their experiments are displayed in Figure 8.
Based on the experimental findings, it is evident that as tensor density increases, the overall prediction accuracy of each model improves. Among these, the MFDK model achieves better experimental results, with average reductions of 69.4% and 44.0% in RMSE and Mae indicators, respectively, compared with net. From the experimental results of the Tucker + net model and net + Kalman model, it can be seen that, relative to the net model, the RMSE and MAE of the former decreased by averages of 66.9% and 30.0%, while the RMSE and MAE of the latter decreased by averages of 23.7% and 28.7%, which were less than the former. This is because the non-negative matrix decomposition predicts the missing QoS data, fills the sparsity of the training data, considerably enhances the capacity of the neural network to forecast, and greatly increases the overall prediction accuracy of the model. The experimental findings in this part demonstrate that both non-negative matrix factorization and the Kalman filter have significantly enhanced the neural network's prediction outcomes, proving that the MFDK model has good performance. Tucker + net model and net + Kalman model, it can be seen that, relative to the net model, the RMSE and MAE of the former decreased by averages of 66.9% and 30.0%, while the RMSE and MAE of the latter decreased by averages of 23.7% and 28.7%, which were less than the former. This is because the non-negative matrix decomposition predicts the missing QoS data, fills the sparsity of the training data, considerably enhances the capacity of the neural network to forecast, and greatly increases the overall prediction accuracy of the model. The experimental findings in this part demonstrate that both non-negative matrix factorization and the Kalman filter have significantly enhanced the neural network's prediction outcomes, proving that the MFDK model has good performance.

Effect of Observation Sparsity on Model Performance
In the real environment, the real-time observation value of the QoS value is also sparse. This section mainly tests the adaptability of MFDK model to the sparsity of different observations and its applicability in the real environment. This experiment verifies the prediction performance of the model under different tensor densities when the observation value densities are 1, 0.9, 0.7, 0.5, 0.3, and 0.1. The experimental results are shown in Figure 9 below.
It can be seen from the experimental results that the overall prediction accuracy of the model shows an upward trend with the increase of tensor density, and the prediction accuracy of the model is also rising with the increase of observation density. Specifically, starting from the observation density of 0.1, the RMSE and MAE of the model will decrease by 16.6% on average every time the observation density increases by 0.2. When the observation density is one, that is, the observation value has no sparsity, the prediction effect of the model reaches the best, and the RMSE and MAE are 0.3752 and 0.2736 on average. It can be further concluded from the experiment that the MFDK model can well adapt to the observed values under different sparsities. At the same time, the predicted values of the model can be modified by integrating the observed values with the Kalman filter algorithm, which can effectively improve the prediction accuracy of the model. The lower the sparsity of the observed values, the better the prediction effect of the model.

Effect of Observation Sparsity on Model Performance
In the real environment, the real-time observation value of the QoS value is also sparse. This section mainly tests the adaptability of MFDK model to the sparsity of different observations and its applicability in the real environment. This experiment verifies the prediction performance of the model under different tensor densities when the observation value densities are 1, 0.9, 0.7, 0.5, 0.3, and 0.1. The experimental results are shown in Figure 9 below.
It can be seen from the experimental results that the overall prediction accuracy of the model shows an upward trend with the increase of tensor density, and the prediction accuracy of the model is also rising with the increase of observation density. Specifically, starting from the observation density of 0.1, the RMSE and MAE of the model will decrease by 16.6% on average every time the observation density increases by 0.2. When the observation density is one, that is, the observation value has no sparsity, the prediction effect of the model reaches the best, and the RMSE and MAE are 0.

Effect of Kalman Filter Parameters on Model Performance
The experiment in this section mainly discusses the influence of changes in state noise covariance Q and observation noise covariance R in Kalman filter parameters on the prediction performance of the model. Q and R are the initial parameters in the Kalman filter algorithm, and their values represent the confidence of the Kalman filter to the predicted value of the model and the actual observed value, respectively, thus affecting the correction ability of the predicted value of the model. It should be noted that in the actual calculation process, Q and R do not affect the Kalman filter process alone. According to Formulas (22) and (23), Q and R jointly determine the calculation process of Kalman filter gain. Therefore, this experiment will discuss the change of the prediction ability of the model under the conditions of [0, 20] Q ∈ and [0, 20] R ∈ , when the observed value density is 0.5 and the tensor densities are 0.1, 0.3, and 0.5, respectively. The experimental results are as follows.
As shown in Figure 10, the three-dimensional chart is composed of the x-axis as the state noise covariance Q, the y-axis as the observation noise covariance R, and the z-axis as the model evaluation indicators RMSE and MAE, respectively. From the chart, it can be concluded that the changes of R and Q values have an important impact on the prediction accuracy of the model. Specifically, with the continuous improvement of tensor density, the overall prediction accuracy of the model is also rising. Further, in each group of experiments, when the R value decreases and Q value increases, RMSE and MAE decline as a whole, and the prediction ability of the model increases; This is because with the decrease of R value and the increase of Q value, the gain value of Kalman filter will increase, the ability of Kalman filter to correct the predicted value of the model will be enhanced, and the prediction accuracy will also be improved. From this analysis, we can draw a conclusion: the changes of Kalman filter parameters R and Q have a significant impact on the prediction accuracy of the model. When initializing Kalman filter parameters, increasing Q and decreasing R will further improve the prediction accuracy of the model.

Effect of Kalman Filter Parameters on Model Performance
The experiment in this section mainly discusses the influence of changes in state noise covariance Q and observation noise covariance R in Kalman filter parameters on the prediction performance of the model. Q and R are the initial parameters in the Kalman filter algorithm, and their values represent the confidence of the Kalman filter to the predicted value of the model and the actual observed value, respectively, thus affecting the correction ability of the predicted value of the model. It should be noted that in the actual calculation process, Q and R do not affect the Kalman filter process alone. According to Formulas (22) and (23), Q and R jointly determine the calculation process of Kalman filter gain. Therefore, this experiment will discuss the change of the prediction ability of the model under the conditions of Q ∈ [0, 20] and R ∈ [0, 20], when the observed value density is 0.5 and the tensor densities are 0.1, 0.3, and 0.5, respectively. The experimental results are as follows.
As shown in Figure 10, the three-dimensional chart is composed of the x-axis as the state noise covariance Q, the y-axis as the observation noise covariance R, and the z-axis as the model evaluation indicators RMSE and MAE, respectively. From the chart, it can be concluded that the changes of R and Q values have an important impact on the prediction accuracy of the model. Specifically, with the continuous improvement of tensor density, the overall prediction accuracy of the model is also rising. Further, in each group of experiments, when the R value decreases and Q value increases, RMSE and MAE decline as a whole, and the prediction ability of the model increases; This is because with the decrease of R value and the increase of Q value, the gain value of Kalman filter will increase, the ability of Kalman filter to correct the predicted value of the model will be enhanced, and the prediction accuracy will also be improved. From this analysis, we can draw a conclusion: the changes of Kalman filter parameters R and Q have a significant impact on the prediction accuracy of the model. When initializing Kalman filter parameters, increasing Q and decreasing R will further improve the prediction accuracy of the model.

Conclusions
This study introduces MFDK, a novel dynamic QoS prediction approach. The approach is divided into three sections. To begin, non-negative Tucker decomposition is used to fill in the sparse values in historical data; after that, the historical QoS data is transformed into training data, which is then transferred to the CNN-BiLSTM deep learning model we built for training, and the QoS value of the future time slice is predicted. Finally, the prediction values are adjusted using the real-time QoS method introduced in this research, which combines the real observation values with the Kalman filter to obtain more accurate QoS prediction data. The MFDK model successfully handles the challenges of historical QoS data filling and real-time QoS data fusion via trials. Experiments on the WS-dream dataset revealed that the MFDK model outperforms the classic dynamic QoS prediction approach.
In the future work, the development direction of the MFDK model will be investigated in two aspects: on the one hand, it will continue to improve the filling ability of sparse QoS data. Information such as user geographic location and user-service-similarity relationships can be combined to make the historical data filling results of the MFDK model more accurate; on the other hand, the performance of the deep learning network will be improved so that the ability of the new neural network to capture time series features can be enhanced.