Collaborative Service Selection via Ensemble Learning in Mixed Mobile Network Environments

Mobile service selection is an important but challenging problem in service and mobile computing. Quality of service (QoS) prediction is a critical step in service selection in 5G network environments. Traditional methods, such as collaborative filtering (CF), suffer from a series of defects, such as failing to handle data sparsity. In mobile network environments, abnormal QoS data are likely to result in inferior prediction accuracy. Unfortunately, these problems have not attracted enough attention, especially in mixed mobile network environments with different network configurations, generations, or types. This paper proposes an ensemble learning method for predicting missing QoS values in 5G network environments. It rests on two key principles: a newly proposed similarity computation method for identifying similar neighbors, and an extended ensemble learning model for discovering and filtering fake neighbors from the preliminary neighbors set. Moreover, three prediction models are proposed: two individual models, which utilize similar user neighbors and similar service neighbors, respectively, and one combination model. Experimental results on two real-world datasets show that our approaches produce superior prediction accuracy.


Introduction
With the emergence of the 5G era, various mobile devices and tablets have been widely developed and used in enterprises' marketing activities [1]. Developing mobile services has become an increasingly important way for enterprises to deliver marketing applications that not only meet customers' functional demands but also satisfy their expected non-functional requirements. Since many mobile services have been or are being developed as interfaces for accessing resources in mobile environments, including 4G, 5G, and Wi-Fi, the number of mobile services has increased dramatically. Meanwhile, users look for services that both satisfy their business requirements and provide high quality of service. With a growing number of available services offering similar functions, selecting a suitable candidate service has become an urgent task.
5G networks provide faster speeds and better quality of service (QoS). QoS is a key factor in service selection, in both industry and academia [2]. Recently, QoS-aware service selection has become a critical problem in service and mobile computing [3][4][5]. Such selection rests on the premise that all QoS values are known in advance [6][7][8][9]. In real-world cases, however, this premise rarely holds, since it is impossible to invoke all services at the same time. Therefore, predicting missing QoS values is an indispensable task.
There has been much research into QoS prediction [10][11][12]. For example, collaborative filtering (CF) uses historical QoS records to predict unknown QoS values. Many works demonstrate that CF-based prediction approaches achieve better prediction accuracy [13][14][15] because they use the invocation records of the target user/service to identify similar user/service neighbors and then predict the missing QoS values collaboratively. Prediction accuracy therefore relies largely on the quality of the identified similar users or services. Several similarity computation methods (e.g., the Pearson correlation coefficient (PCC) and cosine similarity) are used to identify similar user or service neighbors, but existing methods tend to overestimate the similarity between two users/services, which lowers prediction accuracy. We also notice that recent related works have paid attention to improving similarity computation. However, in a complex network environment, such as a mixed mobile network in which networks of different types co-exist (for example, 5G, 4G, and Wi-Fi), abnormal QoS values are generated more easily, and some of them are likely to be mixed with normal QoS data. Abnormal QoS data can significantly impair prediction accuracy, yet this key issue is ignored in most current research. Therefore, in this paper we aim to detect abnormal QoS data and reduce their impact on QoS prediction.
To address the above problems, this paper proposes a novel collaborative prediction approach inspired by ensemble learning (i.e., AdaBoost) and oriented to mixed mobile network environments. Our approach improves the similarity computation between two users/services and can also handle abnormal QoS data. We first employ two techniques to discover two sets of similar neighbors, one for users and one for services, based on the historical QoS records; both techniques rely on similar computations and on DBScan clustering [16]. Then, we propose an ensemble learning model extended from AdaBoost that identifies abnormal QoS data and produces clusters of QoS data. AdaBoost, short for adaptive boosting, is a well-known ensemble learning method [17]. The model also generates the probability of belonging to each cluster. Next, we investigate the clusters with the proposed ensemble learning method to identify abnormal users and remove them from the neighbors set. In this way, the final two neighbors sets are formed, and the weight of each similar neighbor in the two sets is determined by its probability of belonging to the corresponding cluster. Following that, we propose two individual collaborative prediction methods using the discovered similar neighbors sets. Finally, we design a combined method to generate the final prediction result. In summary, the main contributions of this paper are as follows:

1. We propose a new neighbors selection method, extended from the DBScan algorithm, that performs well under high data sparsity.

2. We propose an ensemble learning method, extended from AdaBoost, that can identify abnormal QoS data so that false neighbors can be filtered out of the candidate neighbors.

3. We propose two individual collaborative prediction methods, one user-based and one service-based, as well as a combined method that merges the prediction results of the two individual methods.

4. Experimental results on two real-world datasets show that our approaches produce superior prediction accuracy and adapt well to different experimental settings.
This paper is organized as follows: related work is discussed in Section 2; our framework and prediction models are presented in Section 3; experimental results are given in Section 4; and the conclusion and future work are presented in Section 5.

Related Work
Service selection aims to discover high-quality services for target users, and the effectiveness of the prediction method is key to service selection. The collaborative filtering (CF) method has been widely used for QoS prediction [6][7][8][9]. The CF family comprises two groups: neighbor-based methods, which focus on identifying similarity relationships among users or services, and model-based methods, which learn the latent features of users and services as well as the relations between these latent features [18][19][20][21][22][23][24][25][26][27].
There are three types of neighbor-based CF methods: user-based, service-based, and hybrid. Shao et al. [4] standardized QoS values and presented a user-based CF algorithm in which similarity was computed by PCC; their results showed that similar users tended to share similar QoS values. Sun et al. [5] normalized QoS values to the range [0, 1] and used Euclidean distance to compute similarity. Context information, such as network location or geographical location, is closely related to QoS [6], and some researchers have integrated such information into the CF method to achieve more accurate predictions. Liu et al. [14] proposed a network location-based CF method that identified the potential autonomous systems (AS) from the IP addresses of users and services. They assumed that users located near each other had similar network environments and were thus likely to experience similar QoS. Chen et al. [10] constructed a bottom-up hierarchical clustering algorithm that used user geographical locations to mine similar regions and further integrated region similarity into the CF algorithm. Yao et al. [12] proposed a content-based CF algorithm that used description content extracted from WSDL files to mine user preferences for service invocation. Wu et al. [18] proposed a time-aware QoS prediction approach whose main advantage derives from collaborative filtering. This approach first determined the user-service pairs with historical invocation records and then used a CF-based method to predict QoS values. Chen et al. [7] proposed a hybrid model that combined the service-based CF method with latent semantic analysis. Jiang et al. [8] also proposed a CF-based hybrid model that was a linear combination of user-based CF and service-based CF. Zheng et al. [9] constructed a combination model that combined the prediction results of user-based and service-based CF algorithms with a predefined parameter. However, most existing methods ignore abnormal QoS data, even though involving abnormal data can significantly lower prediction accuracy. In addition, many approaches suffer from data sparsity.
Yin et al. [11] presented three prediction models, all of which adopted matrix factorization (MF) and network location-aware neighbor selection. He et al. [21] proposed a geographic location-based hierarchical MF model, in which the user-service invocation matrix was partitioned into several local matrices using the K-means algorithm; the final prediction combined the results produced from the whole matrix and from the local matrices. Tang et al. [22] proposed a network-aware QoS prediction approach that integrated MF with network information, which they used to compute the network distances among users and further identify user neighborhoods. Xu et al. [23] extended the probabilistic matrix factorization (PMF) model with geographical information [28]. Based on the geographical location of the target user, their method learned the user latent feature vector while taking into account the features of similar neighbors. However, model-based methods are vulnerable to data sparsity, which is common in mixed network environments and can easily lead to inaccurate prediction results.
Additionally, Ma et al. [29] unified the modeling of multi-dimensional QoS data via multi-linear algebra tools and tensor analysis for QoS prediction. In [30], Wu et al. proposed CAP, a credibility-aware prediction approach, which employed K-means clustering of QoS values to compute an untrustworthiness index for users and to identify untrustworthy users.
Few existing studies on QoS prediction consider abnormal QoS data, although it is a key issue, especially in mixed network environments. In this paper, we aim to solve the abnormal QoS data problem in QoS prediction.

Collaborative QoS Prediction via Ensemble Learning
Due to varying network conditions and possible changes in user location, abnormal QoS data can be generated even by normal users. If we can identify abnormal QoS data and filter the corresponding false neighbors from the neighbors set, prediction accuracy is likely to improve.
Consider the following scenario: the three users $u_1$, $u_3$, and $u_4$ are the candidate neighbors of $u_2$, and $x_1$ and $x_2$ are the missing values in Table 1. The QoS values of $u_3$ and $u_4$ on $s_2$ are close, but the QoS value of $u_1$ is much greater than those of $u_3$ and $u_4$. Supposing that the real value of $x_1$ is 1.5, if we predict $x_1$ from the known QoS values of $u_1$, $u_3$, and $u_4$, the prediction will deviate significantly from the real value. Thus, the QoS value of $u_1$ is potentially abnormal, and if $u_1$ is removed from the candidate neighbors set of $u_2$, prediction accuracy is likely to improve. On the service side, supposing that the real value of $x_2$ is 8.3, when we predict $x_2$, the smallest QoS value of $s_3$ should be regarded as an abnormal value. It is difficult to manually set a fixed threshold and identify false neighbors by comparing QoS values against it. In this paper, we therefore propose an ensemble learning method, extended from AdaBoost, that takes the frequency feature vector as input. According to the value range of the historical QoS values, we divide the QoS data into several categories. The AdaBoost method generates the probability of each missing QoS value belonging to each category, and we select the top K categories with the largest probabilities. If a neighbor's QoS value does not fall into any of these categories, it is regarded as abnormal, and the corresponding neighbor is removed from the candidate neighbors. Figure 1 illustrates the complete procedure of our approach, which consists of the following three steps:

1. Similar neighbors selection. We use a co-occurrence matrix built from DBScan clustering results to compute the similarity between users. The similarity computation result is used to build the similar neighbors set DC_N(u).

2. Neighbors filtering. We construct a feature vector by combining the frequency vectors of a user and a service for predicting the corresponding missing QoS value. The feature vector is the input of the ensemble learning model, which generates the probability of belonging to each category. After that, we filter false neighbors from DC_N(u) by selecting the top K categories with the highest probabilities.

3. QoS prediction. The filtered neighbors sets are used by the proposed prediction models to predict the missing QoS value.

Similar Neighbors Selection
Similar neighbors selection is a critical step in CF-based QoS prediction. Existing works employ similarity computation methods such as PCC and Euclidean distance to select neighbors [4,9,10]. However, most existing methods suffer from similarity overestimation [4,9], especially under high sparsity. Meanwhile, because abnormal QoS data are common in mixed mobile network environments (for example, a mobile network mixed with 4G, 5G, and Wi-Fi), prediction accuracy can easily decline. To solve this problem, we propose a novel similarity computation method based on DBScan clustering. DBScan is a density-based clustering algorithm that builds the clustering structure from the connectivity among data points: points that are density-connected are grouped into the same cluster. Its behavior is controlled by two parameters, the neighborhood radius ε and the minimum number of samples MinSamples required to form a dense region. Compared to K-means, DBScan does not require the number of clusters to be predefined and can handle clusters of arbitrary shape, whereas K-means usually performs well only on spherical clusters. DBScan also identifies noise explicitly and is more robust to outliers. As a result, the proposed similarity computation method is less affected by abnormal QoS data. Our method involves two phases: co-occurrence matrix construction and similar neighbors selection.
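To make the clustering step concrete, the following is a minimal pure-Python sketch of DBScan applied to the QoS values observed on a single service (one-dimensional data). It follows the standard algorithm: a value with at least `min_samples` values within distance `eps` (itself included) is a core point, clusters grow outward from core points, and values reachable from no core point are labeled -1 (noise). The function name `dbscan_1d` and the example values are illustrative assumptions; a production system would more likely use a library implementation such as scikit-learn's `DBSCAN`.

```python
def dbscan_1d(values, eps, min_samples):
    """DBScan on scalar QoS values. Returns one label per value; -1 marks noise.

    A value is a core point if at least `min_samples` values (itself included)
    lie within distance `eps` of it.
    """
    n = len(values)
    labels = [None] * n  # None = not visited yet

    def region(i):
        # Indices of all values within eps of values[i] (brute force, O(n)).
        return [j for j in range(n) if abs(values[j] - values[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


print(dbscan_1d([1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 20.0], eps=0.3, min_samples=2))
# [0, 0, 0, 1, 1, 1, -1]
```

In the example, the isolated value 20.0 is labeled as noise rather than forced into a cluster, which is the property that makes DBScan less sensitive to abnormal QoS values than K-means.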

Phase 1: Co-Occurrence Matrix Construction
The m × n matrix M represents the invocation relationship between m users and n services. Each entry $q_{i,j}$ of M denotes the QoS value observed when user i invokes service j. For each service, the QoS values of the users who invoked it are clustered with the DBScan algorithm, so that users with similar QoS values fall into the same group; services invoked by the same user are clustered in the same way. $C^f_i$ or $C^f_j$ denotes the f-th cluster of user i or service j, where f is the cluster index.
After clustering the QoS values, we construct a co-occurrence matrix to store the clustering results. The user co-occurrence matrix, denoted A, is an m × m symmetric matrix whose entries are initialized to 0; the entry $a_{i,j}$ of A counts the number of times users $u_i$ and $u_j$ are clustered into the same group. For example, users $u_1$ and $u_3$ are clustered into the same group on two services, $s_0$ and $s_2$, in Figure 1, so $a_{1,3} = a_{3,1} = 2$ in Table 2. A larger entry in A indicates a higher similarity between the two associated users. Table 3 shows an example of a 3 × 8 QoS matrix consisting of three services and eight users. We use the DBScan algorithm to cluster the QoS values of each service; the clustering results are given in Figure 2, and the co-occurrence matrix A constructed from them is shown in Figure 3. The values of $a_{5,8}$ and $a_{8,5}$ are 2 because users $u_5$ and $u_8$ are clustered into the same group in both $C^1_0$ (on $s_0$) and $C^0_2$ (on $s_2$).
-u0 u1 u2 u3 u4 u5 u6 u7 u8 s1 -1.1 2.7 1.2 0.9 2.9 --2.8 s2 -1.
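The construction of A can be sketched in Python as follows. The input is assumed to be, for each service, a mapping from user index to the DBScan cluster label of that user's QoS value (-1 for noise). Treating noise points as never co-occurring is our assumption; the text does not specify how noise is counted. The labels in the example are hypothetical, chosen only so that $a_{1,3} = 2$ and $a_{5,8} = 2$ as in the text; they are not the data of Table 3.

```python
def build_cooccurrence(cluster_labels, n_users):
    """Symmetric n_users x n_users matrix A where A[a][b] counts the services on
    which users a and b fall into the same DBScan cluster.

    cluster_labels: dict service -> dict user -> cluster label (-1 = noise).
    Assumption: noise points (-1) never count as co-occurring.
    """
    A = [[0] * n_users for _ in range(n_users)]  # all entries start at 0
    for labels in cluster_labels.values():
        users = sorted(labels)
        for i, a in enumerate(users):
            for b in users[i + 1:]:
                if labels[a] == labels[b] != -1:
                    A[a][b] += 1
                    A[b][a] += 1  # keep A symmetric
    return A


# Hypothetical per-service cluster labels for users 1, 3, 5, and 8.
labels = {
    "s0": {1: 0, 3: 0, 5: 1, 8: 1},
    "s1": {1: 0, 3: 1, 5: -1, 8: 0},
    "s2": {1: 1, 3: 1, 5: 0, 8: 0},
}
A = build_cooccurrence(labels, n_users=9)
print(A[1][3], A[5][8], A[1][8], A[1][5])  # 2 2 1 0
```

The same function builds the service co-occurrence matrix if the roles of users and services are swapped in the input.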

Phase 2: Neighbors Selection
After constructing the user co-occurrence matrix and the service co-occurrence matrix, we can identify similar neighbors. Similar neighbors selection is a core step in achieving highly accurate QoS prediction, since false similar neighbors are likely to impair prediction accuracy.
Each entry of the co-occurrence matrix represents the similarity between two users. For a target user, we choose the top K most similar neighbors, that is, the K users with the largest co-occurrence counts, to predict the missing value.
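A minimal sketch of this top-K selection over one row of a co-occurrence matrix follows. Excluding users whose count is zero, breaking ties by the smaller user index, and the function name `top_k_neighbors` are illustrative assumptions.

```python
def top_k_neighbors(A, target, k):
    """Up to k users with the largest co-occurrence counts with `target`.

    Assumptions: users with a zero count are never neighbors; ties go to the
    smaller user index.
    """
    candidates = [(A[target][v], v) for v in range(len(A))
                  if v != target and A[target][v] > 0]
    candidates.sort(key=lambda cv: (-cv[0], cv[1]))  # largest count first
    return [v for _, v in candidates[:k]]


A = [
    [0, 2, 1, 0],
    [2, 0, 3, 1],
    [1, 3, 0, 0],
    [0, 1, 0, 0],
]
print(top_k_neighbors(A, 1, 2))  # [2, 0]
```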
The CF algorithm assumes that the invocation behavior of a user is relatively stable. This assumption is reasonable in many cases but not in all of them. Historical QoS records are likely to contain noisy data, and similarity computed on noisy data tends to select improper neighbors. Also, if the preference of the target user or service changes, which can indeed happen, predicting missing QoS values based on an assumed fixed preference is naturally inaccurate.

To fix such issues, we further filter the similar neighbors using the classification results of AdaBoost, which improves the accuracy of similar neighbor selection. Figure 4 shows the framework of the AdaBoost classifier. In this paper, we adopt an ensemble learning method (i.e., the AdaBoost algorithm) with decision trees as weak classifiers to further filter similar neighbors. In each boosting round, a weak classifier is trained on the samples under an updated weight distribution, so that different weak classifiers are formed; the ensemble classifier then aggregates the weighted classification results of the individual classifiers to produce the final result. The detailed explanation of the proposed classifier is given in the following section.
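As an illustration of the boosting loop just described, here is a minimal pure-Python sketch of multi-class AdaBoost in the SAMME formulation, using one-level decision trees (stumps) as the weak classifiers. SAMME itself, the stump depth, the number of rounds, and the softmax that turns weighted votes into class probabilities are our assumptions; the paper's extended AdaBoost may differ in these details. All function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w, classes):
    """Weighted decision stump: split one feature at one threshold and
    predict the weighted-majority class on each side."""
    totals = defaultdict(float)
    for yi, wi in zip(y, w):
        totals[yi] += wi
    overall = max(classes, key=lambda c: totals[c])  # fallback for an empty side
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left, right = defaultdict(float), defaultdict(float)
            for row, yi, wi in zip(X, y, w):
                (left if row[f] <= t else right)[yi] += wi
            lc = max(classes, key=lambda c: left[c]) if left else overall
            rc = max(classes, key=lambda c: right[c]) if right else overall
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (lc if row[f] <= t else rc) != yi)
            if best is None or err < best[0]:
                best = (err, f, t, lc, rc)
    return best


def stump_predict(stump, row):
    _, f, t, lc, rc = stump
    return lc if row[f] <= t else rc


def samme_fit(X, y, n_rounds=10):
    """Multi-class AdaBoost (SAMME) with decision stumps as weak classifiers."""
    classes = sorted(set(y))
    k = len(classes)
    w = [1.0 / len(X)] * len(X)  # start from uniform sample weights
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w, classes)
        raw_err = stump[0] / sum(w)
        if raw_err >= 1 - 1 / k:  # no better than random guessing: stop
            break
        err = max(raw_err, 1e-10)  # guard log(0) for a perfect stump
        alpha = math.log((1 - err) / err) + math.log(k - 1)
        model.append((alpha, stump))
        if raw_err == 0:  # perfect split: later rounds would only repeat it
            break
        # Up-weight misclassified samples so the next stump focuses on them.
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != yi else wi
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return classes, model


def samme_predict_proba(classes, model, row):
    """Softmax over the alpha-weighted votes of the stumps."""
    scores = {c: 0.0 for c in classes}
    for alpha, stump in model:
        scores[stump_predict(stump, row)] += alpha
    scale = max(len(classes) - 1, 1)
    top = max(scores.values())
    exps = {c: math.exp((s - top) / scale) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}
```

In this framework, each training sample would be the frequency feature vector of an observed user-service pair and its label the category of the observed QoS value; `samme_predict_proba` then yields the per-category probabilities from which the top K categories are selected.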

Feature Selection
In this paper, we introduce a new set of features, the frequency feature vector, which better depicts the individual characteristics of a user-service relationship. In a real dataset [9], QoS values lie within a certain range; for example, the response time lies in [0, 20]. By rounding off the response time, we can build 21 discrete categories, each of which is assigned a label from the set {−1, 0, 1, …, 19}. Thus, the rounded-off QoS value of a given service falls into a specific category.
Each entry of a frequency feature matrix is the number of times the QoS values generated by a user's (or a service's) invocations fall into a certain category. Formally, the user frequency feature matrix is defined as U: M × d, and the service frequency feature matrix is defined as S: N × d, where M is the number of users, N is the number of services, and d is the number of categories.
We use the frequency vectors of users and services to construct the feature vector of the target user invoking the target service. For a missing value, the input feature vector is an integer vector with 42 dimensions (21 user features plus 21 service features). Combining user and service features in this way improves the descriptive capability of the input and the classification accuracy.
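The following Python sketch shows one way to build the frequency feature matrices and the 42-dimensional input vector from observed (user, service, QoS) records. The mapping from a QoS value to a category index (round half up, then clip to 0 to 20) is a hypothetical choice for illustration and does not reproduce the label set {−1, 0, 1, …, 19} above; the function names are likewise assumptions.

```python
import math


def category(q, d=21):
    """Hypothetical mapping: round half up, then clip to category index 0..d-1."""
    return min(max(math.floor(q + 0.5), 0), d - 1)


def frequency_matrices(records, n_users, n_services, d=21):
    """records: iterable of (user, service, qos). Returns U (n_users x d), S (n_services x d)."""
    U = [[0] * d for _ in range(n_users)]
    S = [[0] * d for _ in range(n_services)]
    for u, s, q in records:
        c = category(q, d)
        U[u][c] += 1  # how often user u's observed QoS falls into category c
        S[s][c] += 1  # how often service s's observed QoS falls into category c
    return U, S


def feature_vector(U, S, u, s):
    """Input for the target pair (u, s): the user's d counts followed by the service's d counts."""
    return U[u] + S[s]


records = [(0, 0, 0.4), (0, 1, 1.6), (1, 0, 0.2), (1, 1, 19.7)]
U, S = frequency_matrices(records, n_users=2, n_services=2)
print(len(feature_vector(U, S, 0, 1)))  # 42
```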

Frequency Feature Vector
Figures 3 and 5 show the feature vectors of three services and three users. The vertical axis gives the number of times the rounded-off QoS values of a user or service fall into each classification label. Some latent attributes of users and services can be seen from the two figures.
Feature quality has a great impact on the generalization of the learning model. The frequency feature vector of a user-service pair better depicts the relation within the target pair, distinguishing it from other user-service pairs and further improving classification accuracy.
First, consider service b and user b. Almost all of their QoS values fall into the same category, which indicates stable features with little variability; for instance, all QoS values of service b are classified into label 0. It can be further inferred that if the target service or target user is service b or user b, the corresponding category is highly likely to be this category, so the classification result is easy to generate for such users and services. In contrast, the QoS values of service a and service c are distributed over many different categories; service c in particular has low stability and high variability. For such services, the classification result is hard to predict, but fully utilizing the user features can improve classification accuracy. For example, when predicting the missing QoS of user a invoking service c, although service c is unstable, user a has two stable categories, so the classification result is highly likely to be one of these two categories.
Even if both the user and the service are unstable (e.g., the pair of user c and service c), their frequency vector curves are clearly different. In the historical records, different combinations of user and service frequency vectors correspond to different categories, and AdaBoost, which is mainly employed for supervised learning problems, can learn these combination patterns.

Similar Neighbors Filter
We can calculate the probability of the QoS value belonging to each label using the ensemble learning model.Then we select the K labels with the largest probabilities in all labels.The similar neighbors set of a user is constructed as follows: where N(u) is the similar neighbor set of user u and q v,j is the real QoS value of similar user v having invoked service j.The label set is the set of the k labels with largest probabilities of the missing QoS values.N(u) is the subset of N(u), where the members satisfy the condition q v,j ∈ label set.
Here is an example of $\tilde{N}(u)$. Assume that $N(u) = \{v_1, v_2, v_3, v_4\}$, $\mathit{labelset} = \{1, 2, 3\}$, and $q_{v_1,j} = 1.1$, $q_{v_2,j} = 2.1$, $q_{v_3,j} = 3.1$, $q_{v_4,j} = 7.1$. Because $q_{v_4,j} = 7.1$ does not fall into any of the selected labels ($q_{v_4,j} \notin \mathit{labelset}$), we filter the neighbor user $v_4$ out of $N(u)$ and obtain the new similar neighbor set $\tilde{N}(u) = \{v_1, v_2, v_3\}$. Similarly, the similar neighbors of a service are filtered as follows:

$$\tilde{N}(j) = \{\, h \in N(j) \mid q_{u,h} \in \mathit{labelset} \,\},$$

where $N(j)$ is the preliminary similar neighbor set of service $j$ and $q_{u,h}$ is the real QoS value that user $u$ received after invoking the similar service $h$. $\tilde{N}(j)$ is the subset of $N(j)$ whose members satisfy $q_{u,h} \in \mathit{labelset}$.
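The filtering step can be sketched as follows, reproducing the example with $v_1, \dots, v_4$. Two parts are assumptions for illustration only: the hypothetical helper `label_of` maps a QoS value to its label by taking the integer part (consistent with 1.1, 2.1 and 3.1 falling into labels 1, 2 and 3), and the label probabilities are made-up stand-ins for the ensemble model's output.

```python
import math

def top_k_labels(label_probs, k):
    """Return the k labels with the largest predicted probabilities."""
    ranked = sorted(label_probs, key=label_probs.get, reverse=True)
    return set(ranked[:k])

def label_of(q):
    """HYPOTHETICAL discretization: the label is the integer part of the QoS value."""
    return math.floor(q)

def filter_neighbors(neighbor_qos, labelset):
    """Keep only neighbors whose observed QoS value falls into one of the selected labels."""
    return {v for v, q in neighbor_qos.items() if label_of(q) in labelset}

# Made-up label probabilities for the missing value q_{u,j}.
probs = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.10, 5: 0.10, 7: 0.05}
labelset = top_k_labels(probs, 3)                         # labels 1, 2 and 3
neighbors = {"v1": 1.1, "v2": 2.1, "v3": 3.1, "v4": 7.1}  # q_{v,j} for v in N(u)
print(sorted(filter_neighbors(neighbors, labelset)))      # ['v1', 'v2', 'v3']
```

The same function applies unchanged to service neighbors by passing the values $q_{u,h}$ for $h \in N(j)$.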

The Proposed Prediction Methods
We propose a new user-based CF method that uses the probabilities associated with the labels of the QoS values in place of the traditional similarity in existing CF-based methods. Since the ensemble learning model plays an important role in our framework, we name the proposed method User-based CF with Ensemble learning (UCF-E). In the UCF-E prediction, the weight $w_{v,j}$ is the probability that $q_{v,j}$ belongs to the label, and $q_{v,j}$ is the real value that user $v$ received after invoking service $j$.
In a similar way, we propose a new service-based CF method that uses the probabilities of the labels of the missing QoS values, again in place of the traditional similarity. This method is named Service-based CF with Ensemble learning (SCF-E). In its prediction, $w_{u,h}$ is the probability that $q_{u,h}$ belongs to the label, and $q_{u,h}$ is the real value that user $u$ received after invoking service $h$. However, because the service invocation records are highly sparse, neither UCF-E nor SCF-E alone is likely to fully utilize all of the information in the historical QoS records. To further improve prediction accuracy, we propose a hybrid prediction method that combines the prediction results of UCF-E and SCF-E, aiming to take full advantage of the whole QoS data. We name this method Hybrid CF with Ensemble learning (HCF-E), given as follows:

$$\mathrm{HCF\text{-}E}_{u,j} = \theta \cdot \mathrm{UCF\text{-}E}_{u,j} + (1 - \theta) \cdot \mathrm{SCF\text{-}E}_{u,j},$$

where the parameter $\theta$ controls the proportions of the two individual models, and $\mathrm{UCF\text{-}E}_{u,j}$ and $\mathrm{SCF\text{-}E}_{u,j}$ are the prediction values of the two individual models, respectively. In the extreme case of $\theta$ being 0, the hybrid model degrades to SCF-E; if $\theta$ is 1, it degrades to UCF-E.
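A minimal sketch of the three predictors follows. The exact UCF-E and SCF-E formulas are not reproduced in this section, so the probability-weighted average normalized by the sum of the weights is an ASSUMPTION, chosen only because $w$ replaces the traditional similarity weight; the hybrid step is a linear combination controlled by $\theta$, and the neighbor values and weights are hypothetical.

```python
def weighted_prediction(neighbors):
    """ASSUMED form of UCF-E / SCF-E: probability-weighted average of neighbor QoS values.

    neighbors: list of (q, w) pairs, where q is a filtered neighbor's observed QoS
    value and w is the probability that q belongs to the predicted label.
    """
    total_w = sum(w for _, w in neighbors)
    if total_w == 0:
        raise ValueError("no neighbor with positive weight")
    return sum(q * w for q, w in neighbors) / total_w

def hcf_e(ucf_pred, scf_pred, theta=0.8):
    """Hybrid prediction: theta weights UCF-E, (1 - theta) weights SCF-E."""
    return theta * ucf_pred + (1 - theta) * scf_pred

# Hypothetical (q, w) pairs for filtered user neighbors and service neighbors.
ucf = weighted_prediction([(1.1, 0.5), (2.1, 0.3), (3.1, 0.2)])
scf = weighted_prediction([(1.5, 0.6), (2.5, 0.4)])
print(round(ucf, 4), round(scf, 4), round(hcf_e(ucf, scf), 4))  # 1.8 1.9 1.82
```

The default `theta=0.8` follows the value chosen in the sensitivity analysis; setting it to 0 or 1 reproduces SCF-E or UCF-E.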

Dataset and Experiment Setting
The experiments are conducted on WSDream [8], a real-world dataset that consists of 339 users and 5825 services. It has two sub-datasets, which are mainly used to evaluate response time and throughput.
As for the experiment setting, we randomly select part of the QoS records in the dataset as the training set and use all the remaining records as the testing set. Four training set densities are configured: 5%, 10%, 15%, and 20%. A training set density of 15%, for example, means that 15% of the whole data is used for training, while the other 85% is used for testing. Each set of experiments is run 10 times, and the average result is used for evaluation. The experimental results are given in Table 4 (response time dataset) and Table 5 (throughput dataset).
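The density-based split and the 10-run averaging can be sketched as below. The record format `(user, service, qos)`, the per-run seeds, and the `evaluate` callable are hypothetical choices for illustration; any prediction method and metric could be plugged in.

```python
import random

def split_by_density(records, density, seed=None):
    """Randomly split observed (user, service, qos) records.

    density: fraction of all records used for training (e.g., 0.05 to 0.20);
    every remaining record goes to the test set.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(density * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def average_over_runs(records, density, evaluate, runs=10):
    """Repeat the random split `runs` times and average the score from the hypothetical `evaluate`."""
    scores = []
    for run in range(runs):
        train, test = split_by_density(records, density, seed=run)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

# Toy QoS records: 10 users x 10 services.
records = [(u, s, 0.1 * (u + s)) for u in range(10) for s in range(10)]
train, test = split_by_density(records, 0.15, seed=0)
print(len(train), len(test))  # 15 85
```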

Performance Comparison
We implement the following well-known QoS prediction methods as baselines for evaluating the proposed models.

1. UMean: Uses the mean of each user's historical QoS values as the prediction value.

2. IMean: Uses the mean of each service's historical QoS values as the prediction value.

3. UPCC: A user-based collaborative filtering algorithm that uses the historical QoS records of similar users to predict the missing values [12].

4. IPCC: A service-based collaborative filtering algorithm that uses the historical QoS records of similar services to predict the missing values [13].

5. SVD: A matrix factorization model that learns latent factors to mine user latent features and service latent features [20].

6. LBR: Selects similar users with geographical location information and takes advantage of these similar users in matrix factorization [23].

7. NIMF: Contains three prediction models and employs two techniques, matrix factorization and location-aware neighbor selection [5].

8. CAP: Identifies false neighbors and then uses reliable clustering results [24] to predict missing QoS values.
Both the Mean Absolute Error (MAE) and the Normalized Mean Absolute Error (NMAE) are adopted to measure prediction accuracy. Tables 4 and 5 show the results on the response time and throughput datasets, respectively. We make the following observations:

1. All three proposed models (SCF-E, UCF-E, and HCF-E) achieve better prediction accuracy than the compared methods.

2. As the training set density increases, the MAE and NMAE values decrease. Therefore, the more historical QoS records are available, the better the prediction accuracy.

3. UCF-E achieves higher prediction accuracy than SCF-E. This is mainly due to the dataset: there are only 339 users but 5825 services. The larger number of services is likely to introduce less similar service neighbors that act as noise and further reduce the prediction accuracy.
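The two accuracy metrics can be computed as in the sketch below. The section does not define NMAE explicitly, so this assumes the definition common in WSDream-based studies: MAE divided by the mean of the actual QoS values in the test set. The numbers are toy values.

```python
def mae(actual, predicted):
    """Mean absolute error between real and predicted QoS values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def nmae(actual, predicted):
    """ASSUMED definition: MAE normalized by the mean of the real QoS values."""
    return mae(actual, predicted) / (sum(actual) / len(actual))

actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
print(mae(actual, predicted), nmae(actual, predicted))  # 0.5 0.25
```

Normalizing by the mean makes errors comparable across QoS properties with different scales, such as response time and throughput.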
Parameter sensitivity is also studied in the following subsections.

Sensitivity Analysis of Classification Precision
The parameter topKLabel (L) controls the number of candidate labels for the prediction values. For example, when L is 3 and the classification precision is 0.9017, the probability that the prediction value belongs to one of the three most likely labels is 0.9017. The experimental results are shown in Figure 6, with training set densities ranging from 5% to 20%.
When L increases, the classification precision first increases rapidly and then converges. This indicates that a small number of candidate labels is sufficient to predict the label well. Moreover, after convergence the classification precisions under different densities (5% to 20%) are close to each other, which suggests that the proposed method performs stably under high data sparsity.
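The classification precision described above behaves like a top-L accuracy: a test sample counts as a hit when its true label is among the L most probable labels. The sketch below assumes that interpretation and uses made-up probability rows; it also shows why the precision can only grow as L increases.

```python
def top_l_precision(prob_rows, true_labels, L):
    """Fraction of samples whose true label is among the L most probable labels."""
    hits = 0
    for probs, true_label in zip(prob_rows, true_labels):
        top = sorted(probs, key=probs.get, reverse=True)[:L]
        if true_label in top:
            hits += 1
    return hits / len(true_labels)

# Made-up per-sample label probabilities and true labels.
prob_rows = [{0: 0.6, 1: 0.3, 2: 0.1},
             {0: 0.2, 1: 0.5, 2: 0.3},
             {0: 0.1, 1: 0.2, 2: 0.7}]
true_labels = [1, 2, 0]
print([round(top_l_precision(prob_rows, true_labels, L), 4) for L in (1, 2, 3)])
# [0.0, 0.6667, 1.0]
```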

Sensitivity Analysis of θ
The parameter θ controls the weights of the two individual models (UCF-E and SCF-E) in the combination model. We investigate the sensitivity of our method to θ in the range of 0 to 1. The experimental results are shown in Figure 7, with training set densities ranging from 5% to 20%. For all four training set densities, the optimal value of θ lies between 0.7 and 0.9, so we set θ to 0.8 by default. The result indicates that the combination model achieves better performance by utilizing the results of both UCF-E and SCF-E. In addition, the MAE at the lowest data sparsity (density 20%) is clearly lower than that at the highest data sparsity (density 5%), which indicates that collecting more QoS data is an effective way to improve prediction accuracy.
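A sweep over θ like the one behind Figure 7 can be sketched as a simple grid search that keeps the θ with the lowest MAE. The step size of 0.1 and the toy predictions are hypothetical; the toy data are constructed so that θ = 0.8 is optimal.

```python
def best_theta(ucf_preds, scf_preds, actual, step=0.1):
    """Grid-search theta in [0, 1] and return (theta, mae) with the lowest MAE."""
    n_steps = int(round(1 / step))
    best = None
    for i in range(n_steps + 1):
        theta = i / n_steps
        # Hybrid prediction for every test entry at this theta.
        preds = [theta * u + (1 - theta) * s for u, s in zip(ucf_preds, scf_preds)]
        err = sum(abs(p - a) for p, a in zip(preds, actual)) / len(actual)
        if best is None or err < best[1]:
            best = (theta, err)
    return best

# Toy UCF-E and SCF-E predictions and the real values they try to match.
theta, err = best_theta([1.0, 2.0], [2.0, 3.0], [1.2, 2.2])
print(theta, round(err, 6))  # 0.8 0.0
```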

Sensitivity Analysis of topKNeighbors (T)
In this paper, we use the parameter topKNeighbors (T) to control the size of the user or service neighborhood. A smaller T reduces the computational cost and therefore the prediction time. Because the trends of MAE and NMAE are quite similar, we report only the MAE results in Figure 8.
In Figure 8, the MAE value first decreases as T increases and then increases, although the overall change is slight across the whole value range. When T is larger than 6, less similar neighbors are included and reduce the prediction accuracy. The model achieves the best MAE value at T = 6; therefore, we set the default value of T to 6.
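Selecting the neighborhood of size T amounts to keeping the T candidates with the highest similarity. The sketch below takes precomputed similarity scores as input; the scores are hypothetical, since the paper's own similarity computation is defined elsewhere.

```python
import heapq

def top_t_neighbors(similarities, T=6):
    """Return the T candidates with the largest similarity scores, best first."""
    best = heapq.nlargest(T, similarities.items(), key=lambda item: item[1])
    return [candidate for candidate, _ in best]

# Hypothetical similarity scores between the target user and candidate users.
sims = {"v1": 0.91, "v2": 0.40, "v3": 0.75, "v4": 0.62, "v5": 0.88}
print(top_t_neighbors(sims, T=3))  # ['v1', 'v5', 'v3']
```

`heapq.nlargest` runs in O(n log T), so a smaller T keeps selection cheap, in line with the observation that a smaller T reduces prediction time.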

Sensitivity Analysis of ε
The parameter ε controls the neighborhood radius used to identify core objects in DBSCAN clustering. Using the DBSCAN method, we can select the similar user or service neighbors. In this paper, we use a relatively small ε to distinguish different QoS values.
As Figure 9 shows, the MAE value decreases as ε increases from 0.01 to 0.04 and then increases. When ε is larger than 0.04, the connectivity of QoS values begins to relax, which probably brings in some neighbors that are not similar to the target user or service. The model achieves the best MAE value at ε = 0.04; therefore, we set the default value of ε to 0.04.
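How DBSCAN clusters are turned into neighbor sets is described elsewhere in the paper; the sketch below only illustrates the role of ε, using a minimal from-scratch DBSCAN on one-dimensional QoS values. The values and `min_pts` are hypothetical. With a small ε, dissimilar values stay in separate clusters; with a large ε, connectivity relaxes and everything merges into one cluster.

```python
def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN on scalar values; returns a cluster id per point (-1 = noise).

    A point is a core point if at least min_pts values (itself included)
    lie within distance eps of it.
    """
    n = len(values)
    labels = [None] * n
    cluster = -1

    def region(i):
        return [j for j in range(n) if abs(values[j] - values[i]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels

values = [0.10, 0.12, 0.13, 0.50, 0.52, 0.90]  # hypothetical normalized QoS values
print(dbscan_1d(values, eps=0.04, min_pts=2))  # [0, 0, 0, 1, 1, -1]
print(dbscan_1d(values, eps=0.5, min_pts=2))   # [0, 0, 0, 0, 0, 0]
```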

Conclusions and Future Work
This paper proposes a novel collaborative QoS prediction framework built on a new neighbor selection method. The proposed neighbor selection is based on the DBSCAN algorithm and is verified experimentally to be effective, especially under high data sparsity. Our approach also filters false neighbors with the proposed ensemble learning model to generate a high-quality neighbor set. We further propose a hybrid model that utilizes the results of the two individual models. Experimental results on two real-world datasets show that our approaches produce superior prediction accuracy.
Although our approach demonstrates that the ensemble learning model can identify false neighbors, some challenges remain. In future work, we plan to analyze the performance of the proposed method on other QoS properties and to construct a real-world service selection system for mixed mobile networks.

Figure 1. The framework of quality of service (QoS) prediction.

Figure 3. The service frequency vector.

Figure 6. The classification precision with topKLabel (L) increasing in different training set densities.

Figure 7. The estimation error with θ increasing in different training set densities.

Figure 8. The estimation error with topKNeighbors (T) increasing in different training set densities.

Entropy 2017, 19, 358

Figure 9. The estimation error with ε increasing in different training set densities.

Table 1. An example of a QoS matrix.

Table 3. An example of QoS matrix M.

Table 4. Accuracy comparison on the response time dataset (a smaller value means higher accuracy).

Table 5. Accuracy comparison on the throughput dataset (a smaller value means higher accuracy).