An Entropy-Based Kernel Learning Scheme toward Efficient Data Prediction in Cloud-Assisted Network Environments

Abstract: With the recent emergence of wireless sensor networks (WSNs) in the cloud computing environment, it is now possible to monitor and gather physical information through large numbers of sensor nodes to meet the requirements of cloud services. Generally, those sensor nodes collect data and send them to the sink node, where end-users can query all the information and build cloud applications.


Introduction
A wireless sensor network (WSN) consists of many sensor nodes deployed in a certain area, and it deals with collecting, processing, and transmitting monitored data [1,2]. With the collected data, a WSN is able to accurately detect and evaluate events occurring in the monitored field. When WSNs are employed to monitor an environment, the sensor nodes collect different kinds of information, which is sent to the sink node and then used by the end-user. WSNs have been successfully applied to many practical tasks, such as monitoring and early warning, resource management, and so on [3]. Most recently, the ever-increasing use of cloud computing services has placed greater demands on efficient data resource acquisition, reconfiguration, and management in cloud-assisted mobile computing environments [4][5][6][7]. Consequently, cloud computing also provides effective support for the applications of WSNs. In this paradigm, it is now possible to monitor and gather physical information via a WSN, and the collected data can be returned from the cloud to the end-users in accordance with their requests [8]. For instance, the integration of WSNs and mobile cloud computing is attracting growing interest in both academia and industry, where powerful cloud computing techniques are developed to process and analyze the data obtained through WSNs and share the results with mobile end-users [9].
There is considerable interest in cloud-assisted WSNs, particularly in computing strategies that improve the working performance of WSNs. In a WSN, most sensor nodes have limited energy, and it is hard to recharge them given the large number of devices in some applications [10]. Furthermore, because the data sensed over a continuous time period are probably closely related to their adjacent data, redundant transmissions between different nodes consume much energy [11]. Hence, some data prediction schemes have been proposed to reduce transmissions in a WSN. These schemes use prediction algorithms to lessen redundant transmissions and improve fault tolerance, thereby reducing energy consumption and extending the network life cycle [12,13].
For instance, some computational models and algorithms, including the replicated dynamic probabilistic model, the auto-regressive model, and the cluster-based data fusion scheme, were presented to decrease communication effort [14][15][16]. In addition, some popular machine learning approaches, e.g., least squares support vector machine (LSSVM) and extreme learning machine (ELM), were also used to achieve data prediction in a WSN [11,17]. A detailed analysis of the above methods can be found in Section 2.2. Here, it should be noted that these methods have some limitations that keep them from achieving ideal prediction performance.
To avoid those limitations, a novel data prediction scheme employing kernel least mean square (KLMS) and information entropy is proposed in this article to decrease unnecessary transfers in WSNs. This method is named E-KLMS. KLMS is a typical kernel-based learning algorithm that combines the strategies of memory-based iterative learning and error self-correction [18][19][20][21]. For instance, in [21], we applied KLMS to predict missing values in Web QoS data, which is a good application of kernel learning methods. In addition, random errors can be eliminated by calculating entropy weights [22]. With E-KLMS, the computational complexity can accordingly be reduced while achieving high-quality solutions. It is worth noting that, unlike our previous works [11,23], in which much time is spent on training because a training set has to be constructed at every sampling point, the proposed method trains on the samples only once, which saves much time. For these reasons, E-KLMS may perform better in data prediction in WSNs. In the process of data prediction, the two types of nodes (sensor nodes and the sink node) employ the same prediction mechanism to maintain data consistency. The end-users define an acceptable threshold. If the prediction error is less than this threshold, the transmission between the two types of nodes is cancelled, and these nodes regard the predicted value as the real value at the current sampling point. Otherwise, the communication between sink node and sensor node proceeds as normal.
The main contributions of this article are listed below.
(1) The sensed data are pretreated using an entropy technique before data prediction and fusion. Doing so reduces computational errors while decreasing energy consumption, since entropy-based optimization can reduce the size of the data set.
(2) Within the flexible working mechanism that keeps the prediction data synchronous in a WSN, E-KLMS can achieve better prediction performance in terms of training time and computational accuracy compared with other machine learning methods such as ELM and LSSVM.
This article is organized as follows. In Section 2, some related works are analyzed. In Section 3, we present some background. The entropy-based kernel learning scheme is proposed in Section 4. In Section 5, we provide the experiments, and we summarize the conclusions in Section 6.

Data Prediction Approaches
Data prediction is now widely used in various fields such as industrial production, economic analysis, and so on. Generally, data prediction approaches can be divided into two major categories, i.e., statistics-based methods and learning-based methods.
The former mainly include the regression model, the exponential smoothing model, and the autoregressive integrated moving average (ARIMA) model. The regression model is an effective tool for data analysis that reveals the relationship between random variables. However, if there is no correlation between the variables, it cannot produce correct results. The exponential smoothing method is widely employed in time series forecasting, but the smoothing parameter in single and double exponential smoothing models is fixed in conventional practice [24]. In statistics and econometrics, ARIMA is widely used for non-stationary data, and it can be employed to predict values in a time series [25]. It requires a proper initial differencing step to remove the non-stationarity. Nevertheless, the error increases as the prediction horizon is extended.
Considering the limitations of those statistics-based methods, some learning-based methods have been proposed to improve prediction performance. Usually, they are implemented using neural network (NN) based and kernel-based machine learning methods. Error back propagation (BP) is a common method of training NNs, used in conjunction with an optimization method such as gradient descent, but it has low learning efficiency and slow convergence [26]. Support vector machine (SVM) is a significant kernel-based machine learning method built on the structural risk minimization principle rather than the empirical error minimization commonly used in NNs, and thus it has better generalization ability and precision than traditional NNs. However, SVM trains its model periodically, and most of its nonlinear identification is done offline. Moreover, its computational cost is very high [27]. Recently, ELM, which is based on a single-hidden-layer feedforward network (SLFN), was proposed to make the predicted value approximate the actual value while achieving good generalization performance at extremely fast speed [28,29]. However, the prediction accuracy of traditional ELM is not satisfactory in some cases.
Although basic SVM has its limitations, kernel-based methods construct a growing radial basis function network with an increasing memory structure [30]. KLMS is one of the frequently used and effective kernel-based methods. Moreover, in order to reduce random errors, some techniques have been developed to improve the performance of kernel learning methods.

Prediction Schemes in WSN
Among the available approaches, the replicated dynamic probabilistic model was presented in [14] to reduce communications between sensor nodes and the sink node. In this scheme, it is difficult to effectively mine spatial correlations among sensor nodes unless the model can be maintained while cancelling unnecessary communication between sensors. In [15], the authors proposed a data fusion strategy for WSNs with a delay-aware structure, where sensor nodes are divided into many clusters of different sizes and every cluster can communicate with the fusion center. However, it is very difficult to achieve the ideal effect if the minimum available compression ratio is unknown. In [16], the authors applied an auto-regressive modeling technique to WSNs and presented a novel data fusion method. However, the prediction errors are large when dealing with nonlinear data sets under certain circumstances. Subsequently, a quality-aware multisensor data fusion method was presented to overcome false indications caused by temporary loss of data, signal interference, or invalid data [31]. Meanwhile, to improve computational efficiency, a method was developed to compute a weighted average of sensor measurements while letting the data fusion centre work directly with the compressed data streams [32]. However, the model becomes more complicated as the network grows.
Recently, many data prediction schemes have been proposed using advanced machine learning algorithms. In [17], a scheme named GM-LSSVM was designed for WSNs, where the initial prediction is implemented using the grey model (GM) and LSSVM is introduced to reduce the prediction errors. Although the LSSVM-based scheme may achieve a good prediction effect, it is impracticable for some real-time monitoring systems because it is difficult to employ LSSVM to predict nonlinear time series online. In [11], the author combined ELM and GM for data fusion. Unlike GM-LSSVM, this scheme greatly reduces the computational time because the training phase in ELM can be completed very quickly. However, its precision may be worse than that of GM-LSSVM in some applications. Although those schemes improve performance on some metrics, it may be difficult to achieve a good tradeoff between accuracy and computational effort in WSNs [23].

Problem Statement
In cloud-assisted network environments, the data collected from the WSN are returned from the cloud to the end-users in accordance with their requests. Figure 1 shows such an example. Some sensor nodes are deployed in an area to gather various information, and the sensory data are transmitted from the WSN to the cloud. Meanwhile, the cloud services are available to the end-users, who can run applications using the sensory data sent by the cloud. In the cloud-assisted WSN, it is expected that an effective prediction scheme can further improve the performance of the sensor nodes. The prediction process for sensed temperature data in a WSN is illustrated in Figure 2. In this example, the sensor nodes and the sink node are supposed to employ the same prediction strategy to keep the data synchronous. The sensor node sends the first 2500 data points to the sink node. Then both conduct training to find the hidden relationship between inputs and outputs, and predict the following data points. If the prediction error is less than a certain threshold, the predicted value is taken as the actual value and the transmission between sink node and sensor node is cancelled. Otherwise, the sensor node must send its actual sensed data to the sink node; these data are shown as the red points in Figure 2.
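The threshold-controlled exchange described above can be sketched in Python as follows. This is only an illustrative simulation under stated assumptions: the function name `dual_prediction`, the parameter names, and the placeholder predictor `last_value` (which simply repeats the last known value) are hypothetical and are not the prediction model used in this article.

```python
from typing import Callable, List, Sequence, Tuple


def dual_prediction(
    readings: Sequence[float],
    n_train: int,
    epsilon: float,
    predict: Callable[[List[float]], float],
) -> Tuple[List[float], int]:
    """Simulate the prediction loop shared by a sensor node and the sink node.

    Returns the series as seen by the sink node and the number of
    transmissions that occur after the training phase.
    """
    # Training phase: the first n_train readings are always transmitted.
    sink_view = list(readings[:n_train])
    transmissions = 0
    for actual in readings[n_train:]:
        # Both nodes run the same predictor on the same shared history.
        predicted = predict(sink_view)
        if abs(predicted - actual) < epsilon:
            # Error below the threshold: no transmission, keep the prediction.
            sink_view.append(predicted)
        else:
            # Otherwise the sensor node sends its actual reading.
            sink_view.append(actual)
            transmissions += 1
    return sink_view, transmissions


def last_value(history: List[float]) -> float:
    """Hypothetical placeholder predictor: repeat the last known value."""
    return history[-1]


readings = [20.0, 20.1, 20.1, 20.2, 21.0, 21.1]
view, sent = dual_prediction(readings, n_train=2, epsilon=0.2, predict=last_value)
```

In this toy run only the jump to 21.0 exceeds the threshold, so a single reading is transmitted after the training phase. Because both nodes append the same value at every step, their histories stay identical, which is what allows them to produce the same prediction at the next sampling point.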

Kernel Least Mean Square Algorithm
In the past few years, least mean square (LMS) has found wide application in the field of engineering computing, but it is not suitable for nonlinear cases. In view of this, KLMS was proposed to deal with nonlinear data [30]. At iteration i, the kernel function is employed to transform the input u(i) into a high-dimensional feature space F as ψ(u(i)). Applying LMS to the data sequence {ψ(u(i)), γ(u(i))}, we get

$$e(i) = \gamma(u(i)) - \nu(u(i-1))^{T}\,\psi(u(i)) \tag{1}$$

$$\nu(u(i)) = \nu(u(i-1)) + \lambda\, e(i)\, \psi(u(i)) \tag{2}$$

where e(i) is the prediction error at iteration i, ν(u(i)) indicates the estimate of the weight vector in the higher-dimensional space, γ(u(i)) denotes the desired output, and λ denotes the step-size parameter. Generally, we define ν(u(0)) = 0. By repeating Equations (1) and (2) through the iterations, we get

$$\nu(u(i)) = \lambda \sum_{j=1}^{i} e(j)\, \psi(u(j))$$

Thus, the weight estimate is characterized by all the former inputs, and its value depends on the prediction errors and the step-size parameter. After that, given an input u', the output value can be written as

$$\nu(u(i))^{T}\,\psi(u') = \lambda \sum_{j=1}^{i} e(j)\, \psi(u(j))^{T}\,\psi(u')$$

According to the kernel mapping

$$\psi(u)^{T}\,\psi(u') = \kappa(u, u') \tag{5}$$

we can obtain the filter output on the basis of kernel evaluations

$$\nu(u(i))^{T}\,\psi(u') = \lambda \sum_{j=1}^{i} e(j)\, \kappa(u(j), u')$$

where κ denotes the kernel function. The Gaussian kernel can be expressed by

$$\kappa(u, u') = \exp\left(-a\, \lVert u - u' \rVert^{2}\right)$$

Here, a is the kernel parameter. Consider f_i as the mapping between the inputs and outputs at iteration i; the computational procedure of KLMS can then be summarized as follows

$$f_{i-1} = \lambda \sum_{j=1}^{i-1} e(j)\, \kappa(u(j), \cdot)$$

$$f_{i-1}(u(i)) = \lambda \sum_{j=1}^{i-1} e(j)\, \kappa(u(j), u(i))$$

$$e(i) = \gamma(u(i)) - f_{i-1}(u(i))$$

$$f_{i} = f_{i-1} + \lambda\, e(i)\, \kappa(u(i), \cdot)$$

KLMS is shown in Algorithm 1 [30] and a diagram is given in Figure 3. There, **a**(i) represents the coefficient vector in the i-th iteration, **a**(i) = {a_j(i)}, j = 1, ..., i, with a_j(i) = λ e(j), and C(i) represents the center set. With a new input vector u* in the i-th iteration, we can obtain its corresponding output

$$f_{i}(u^{*}) = \sum_{j=1}^{i} a_{j}(i)\, \kappa(u(j), u^{*})$$

Algorithm 1 KLMS Algorithm: select the kernel κ and a proper step parameter λ, i = 1;
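The KLMS procedure described in this section (evaluate the current output from the stored centers, compute the error, then store the new input as a center with coefficient λ times the error) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the Gaussian kernel exp(-a‖u - v‖²), and the class name `KLMS` and its attribute and method names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_kernel(u: Sequence[float], v: Sequence[float], a: float) -> float:
    """Gaussian kernel exp(-a * ||u - v||^2) with kernel parameter a."""
    return math.exp(-a * sum((x - y) ** 2 for x, y in zip(u, v)))


class KLMS:
    """Kernel least mean square filter with a growing center set."""

    def __init__(self, step_size: float, kernel_param: float) -> None:
        self.step_size = step_size        # lambda in the text
        self.kernel_param = kernel_param  # a in the Gaussian kernel
        self.centers: List[Sequence[float]] = []  # center set C(i)
        self.coeffs: List[float] = []             # coefficients a_j(i)

    def predict(self, u: Sequence[float]) -> float:
        """Output as the coefficient-weighted sum of kernel evaluations."""
        return sum(
            coeff * gaussian_kernel(center, u, self.kernel_param)
            for coeff, center in zip(self.coeffs, self.centers)
        )

    def update(self, u: Sequence[float], desired: float) -> float:
        """One iteration: compute the error, then store a new center."""
        error = desired - self.predict(u)
        self.centers.append(u)
        self.coeffs.append(self.step_size * error)
        return error


model = KLMS(step_size=0.5, kernel_param=1.0)
# Presenting the same sample twice: the error shrinks from 1.0 to 0.5.
errors = [model.update([0.0], 1.0) for _ in range(2)]
```

Note that every training sample adds one center, so the cost of a single prediction grows with the number of stored centers; this is one reason why reducing the training set before training can lower the computational effort.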

Information Entropy
Information entropy measures the degree of disorder of information. The larger the information entropy, the less effective information a system contains [33]; conversely, the smaller the information entropy, the more useful information the system contains. Here, the entropy of a variable X is defined as

$$H(X) = -\sum_{x} p(x)\, \log_{b} p(x)$$

where p(x) represents the probability of x, and b is assigned a value of 2 or e. Based on information entropy, the entropy weight method was developed to measure how much information a variable contains. The calculation steps are as follows.
For a data matrix

$$U = [u_{1}, u_{2}, \ldots, u_{n}] = [u_{ij}]_{m \times n}$$

where u_j denotes the j-th input vector, each element is normalized as

$$u^{+}_{ij} = \frac{u_{ij}}{\sum_{i=1}^{m} u_{ij}}$$

Then, a new matrix is obtained as U⁺ = [u⁺_ij]_{m×n}. The information entropy of the j-th input vector u_j is

$$e_{j} = -k \sum_{i=1}^{m} u^{+}_{ij} \ln u^{+}_{ij}$$

Here, k is a constant related to the dimension m of the input vector. Suppose that the system is in complete disorder (i.e., u⁺_ij = 1/m), so that the entropy reaches its maximum value e_j = 1. We have

$$-k \sum_{i=1}^{m} \frac{1}{m} \ln \frac{1}{m} = k \ln m = 1$$

So k = 1/ln m, and then 0 ≤ e_j ≤ 1. For the j-th input vector, the validity coefficient h_j is defined by

$$h_{j} = 1 - e_{j} \tag{17}$$

By applying Equation (17), the entropy weight of each input vector can be defined by

$$w_{j} = \frac{h_{j}}{\sum_{l=1}^{n} h_{l}} \tag{18}$$

It can be seen from the above equations that the larger the entropy weight, the more effective information the input vector contains, which means that the vector is more important to the whole system. Otherwise, the input vector is less important. Specifically, we use this entropy weight to evaluate and optimize the input data in our kernel learning scheme.
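The entropy weight calculation can be sketched in Python as follows. The sketch assumes the usual formulation of the entropy weight method: each input vector is normalized by the sum of its elements, its entropy is e_j = -(1/ln m) Σ p ln p, its validity coefficient is h_j = 1 - e_j, and the weights are the validity coefficients divided by their sum. The function name `entropy_weights`, the requirement of positive entries, and the equal-weight fallback for the degenerate case are assumptions made for this illustration.

```python
import math
from typing import List, Sequence


def entropy_weights(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight of each input vector.

    Assumes every vector has the same dimension m >= 2 and positive entries.
    """
    m = len(vectors[0])
    k = 1.0 / math.log(m)  # makes the entropy of a uniform vector equal to 1
    validity = []
    for u in vectors:
        total = sum(u)
        shares = [x / total for x in u]  # normalized elements of the vector
        e_j = -k * sum(p * math.log(p) for p in shares if p > 0)
        # Validity coefficient, clamped to guard against rounding below zero.
        validity.append(max(0.0, 1.0 - e_j))
    h_sum = sum(validity)
    if h_sum == 0.0:
        # Degenerate case (assumption): all vectors uniform, use equal weights.
        return [1.0 / len(vectors)] * len(vectors)
    return [h / h_sum for h in validity]


# A uniform vector carries no effective information; an uneven one does.
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
```

In this example the first vector is uniform, so its entropy is maximal and its weight is essentially zero, while the second vector receives essentially all of the weight.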

Learning and Prediction in the Proposed Scheme
To address the nonlinear property of the data set, reduce the prediction error, and achieve better prediction performance, we present an entropy-based scheme called E-KLMS that builds on the strong generalization ability of the kernel learning method. In the proposed scheme, the entropy weights of the input vectors are first calculated to measure their importance. Then we remove those input vectors whose entropy weights are less than the average value, together with their corresponding outputs. In this way, the original data set is optimized. Finally, we train the KLMS learning model with the modified training set to find the hidden relationships between inputs and outputs. Using these relationships and the given inputs, the output values can then be predicted.
There are six steps in our proposed method. They are described below.
(1) The default prediction error threshold ε is sent to all working sensor nodes, and all sensor nodes deliver the first n actual values to the sink node; these values are used as the training data.
(2) The two kinds of nodes implement prediction according to the same prediction strategy and data set. From the sequence of the first n data points, the inputs and outputs of the training set are constructed so that each input vector consists of k consecutive data points and its output is the data point that immediately follows them. Here k denotes the dimension of the input vectors.
(3) The entropy weight of each input vector is calculated on the basis of Equations (14)-(18). This gives one weight per input vector, where w_j denotes the entropy weight of u_j.
(4) After comparing the entropy weights of all input vectors with the average entropy weight, we remove those input vectors whose entropy weights are less than the average value, and their corresponding outputs are deleted from the training set at the same time.
(5) With the modified training set, the KLMS learning model is trained. The coefficient vector **a** relating inputs and outputs is then obtained, and it reveals the hidden relationships between inputs and outputs.
(6) The testing input vector is constructed from its previous k data points. Next, the prediction is conducted using Algorithm 1. If the prediction error δ is less than ε, the transmission between the two types of nodes is cancelled and the predicted value is taken as the real value. This step is repeated until all the outputs are predicted.
Finally, the details of our approach E-KLMS are shown in Figure 4.
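The construction and pruning of the training set in the steps above can be sketched in Python as follows. The sketch assumes a sliding-window construction in which k consecutive data points form an input vector and the next data point is its output, in line with the way the testing input vector is built from its previous k data points. The function names and the example weights are hypothetical; in the scheme itself the weights would come from the entropy weight calculation.

```python
from typing import List, Sequence, Tuple


def make_training_set(
    series: Sequence[float], k: int
) -> Tuple[List[List[float]], List[float]]:
    """Sliding window: k consecutive points form an input, the next is its output."""
    count = len(series) - k
    inputs = [list(series[i:i + k]) for i in range(count)]
    outputs = [series[i + k] for i in range(count)]
    return inputs, outputs


def prune_by_average_weight(
    inputs: Sequence[List[float]],
    outputs: Sequence[float],
    weights: Sequence[float],
) -> Tuple[List[List[float]], List[float]]:
    """Drop every (input, output) pair whose weight is below the average weight."""
    average = sum(weights) / len(weights)
    kept = [(u, y) for u, y, w in zip(inputs, outputs, weights) if w >= average]
    return [u for u, _ in kept], [y for _, y in kept]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
inputs, outputs = make_training_set(series, k=2)
# Hypothetical entropy weights, one per input vector (average is about 0.33).
weights = [0.5, 0.3, 0.2]
kept_inputs, kept_outputs = prune_by_average_weight(inputs, outputs, weights)
```

Only the pairs that survive the pruning are passed to the KLMS training step, so the number of stored centers, and with it the cost of each later prediction, shrinks accordingly.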

Analysis of Complexity
We briefly analyze the time complexity of the scheme proposed in this article. E-KLMS spends most of its time on entropy calculation and KLMS model training. For every input vector in the training set, k calculations are conducted to generate the entropy weight, where k denotes the dimension of the input vectors, and the number of input vectors in the training set is n. Hence, the computational complexity of the entropy calculation is O(kn). Similarly, the learning complexity of KLMS is also O(kn) [34,35]. Consequently, it can be concluded that the complexity of E-KLMS is also O(kn).
Moreover, in [34], it was noted that the time complexity of ELM for n samples and k base models is O(kn + k^3). When n is much larger than k, the complexity of ELM is approximately O(kn). That is to say, if the data set is large enough, the time complexity of the methods mentioned above can be considered the same.

Experiment Settings
Our experiments aim to verify the effectiveness of the proposed intelligent data prediction scheme. It should be noted that, for the framework shown in Figure 1, we do not provide separate experiments on data transmission and processing in the cloud computing environment, since they are implemented in the usual way. We use two real data sets in our experiments. As analyzed in Sections 1 and 2, the traditional schemes have some limitations. The computational speed of LSSVM is not fast enough, and ELM is very fast but its precision may be unsatisfactory for some practical tasks. Meanwhile, KLMS may perform worse than LSSVM in some situations. Hence, we carry out experiments comparing our scheme with KLMS, ELM, and LSSVM to validate the computational speed and precision. The LSSVM algorithm mentioned here is realized using the MATLAB Toolbox [36,37].
The real data sets are obtained from the Intel Berkeley lab, and the distribution of sensor nodes is shown in Table 1 [38]. Here, the X and Y coordinates of the sensors are in meters relative to the upper right corner of the lab. These sensor nodes collect different types of data, namely temperature, humidity, and light. We randomly select one sensor node in the experiments to test the performance of the algorithms on predicting its humidity and temperature data. Each data set contains 4000 continuous data values, where the first 3000 data points are used to construct the training set and the following 1000 data points are predicted. As mentioned above, ε denotes the threshold defined by end-users. Therefore, we compare the prediction errors, computational time, and prediction precision under various thresholds ε. Moreover, to achieve the optimal prediction effect, the kernel parameter a and the step-size parameter λ are each chosen from the set {2^-1, 2^-2, 2^-3, ..., 2^-20}.
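The data split and the candidate parameter values described above can be enumerated in Python as follows. This is only a sketch of the setup: the variable and function names are hypothetical, the stand-in series is synthetic, and the criterion used to pick the best pair of parameters is not specified here, so no selection step is shown.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Candidate values 2^-1, 2^-2, ..., 2^-20 for the kernel parameter a
# and for the step-size parameter lambda.
candidates: List[float] = [2.0 ** -p for p in range(1, 21)]

# Every (kernel parameter, step size) pair that could be tried.
grid: List[Tuple[float, float]] = list(product(candidates, candidates))


def split_series(
    series: Sequence[float], n_train: int = 3000
) -> Tuple[Sequence[float], Sequence[float]]:
    """First n_train points for training, the remaining points for prediction."""
    return series[:n_train], series[n_train:]


# Synthetic stand-in for one sensor's 4000 readings (not real data).
series = [float(i) for i in range(4000)]
train, test = split_series(series)
```

With 20 candidates per parameter, an exhaustive search would cover 400 combinations, each requiring one training run on the 3000 training points.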

Metrics
In addition to the successful prediction rate, two metrics, the root mean squared error (RMSE) and the mean absolute error (MAE), are selected to test prediction precision in our experiments. They are defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(\hat{u}_{j} - u_{j}\right)^{2}} \tag{19}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{j=1}^{N} \left|\hat{u}_{j} - u_{j}\right|$$

where û_j is the predicted value, u_j is the actual value, and N is the number of predicted data points. Generally, a larger RMSE or MAE indicates a worse effect. In Equation (19), the squares of the prediction errors are calculated first, their average is then obtained, and finally the square root is taken. Because the errors are squared before averaging, RMSE penalizes large errors more heavily than MAE does, so it is the more informative metric when large errors need to be kept small.
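The metrics can be computed with a few lines of Python, as sketched below. RMSE and MAE follow their standard definitions. The successful prediction rate is not defined by a formula in this section, so the sketch assumes it is the fraction of predicted points whose absolute error is below the threshold ε; the function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error."""
    absolute = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(absolute) / len(absolute)


def successful_prediction_rate(
    predicted: Sequence[float], actual: Sequence[float], epsilon: float
) -> float:
    """Assumed definition: share of points with absolute error below epsilon."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < epsilon)
    return hits / len(actual)


predicted = [1.0, 2.0, 4.0]
actual = [1.0, 2.0, 2.0]
scores = (
    rmse(predicted, actual),
    mae(predicted, actual),
    successful_prediction_rate(predicted, actual, epsilon=0.5),
)
```

In this toy example a single error of 2 among three points gives an MAE of about 0.67 but an RMSE of about 1.15, which illustrates how squaring makes RMSE more sensitive to large errors.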

Analysis for the Implementation of Our Work
WSNs in cloud-assisted network environments have become widespread in many special fields, where data prediction and fusion schemes are developing rapidly. Thus, improving prediction precision and reducing energy consumption are becoming increasingly important. To address the issues of accuracy, calculation effort, and implementation framework, we propose an efficient data prediction scheme combining KLMS and information entropy. The scheme offers several advantages in implementation. Firstly, the synchronization mechanism in this article can greatly improve the fault tolerance of sensor nodes in a WSN. If a sensor node fails at some time, the sink node is still able to predict data values without receiving data from the faulty sensor node, which is really helpful in a cloud computing environment. Secondly, since the sensor nodes have limited power and it is hard to recharge their batteries, reducing energy consumption is urgently needed. Data pretreatment with the entropy technique decreases the size of the data set and the calculation effort, so energy consumption can be greatly reduced. Thirdly, in addition to monitoring systems, the proposed scheme can also be applied to other practical applications involving information interaction in cloud-assisted network environments.

Test Case in Temperature Data Set
Figure 5 depicts the prediction effect for the temperature data set when ε = 0.2. It can be seen that the predicted values of the four schemes are consistent with the real values. Figure 6 compares the successful prediction rates as ε changes. As the threshold increases, E-KLMS always has the highest successful prediction rate. In other words, compared with the other schemes, E-KLMS produces less communication overhead in the WSN. Thus, the goal of saving energy and extending the lifetime of the whole WSN can be achieved.
Figures 7 and 8 show the MAE and RMSE of the four schemes as the threshold ε changes. A smaller MAE or RMSE indicates a better forecast effect. The figures show that E-KLMS has the smallest MAE and RMSE. As the threshold ε increases, the performance of LSSVM comes very close to that of E-KLMS. Meanwhile, the comparison between KLMS and E-KLMS demonstrates the superiority of our proposed scheme. Considering the above results, the overall forecast effect of E-KLMS is the best.
In particular, when ε = 0.2, the prediction errors of E-KLMS are shown in Figure 9. It can be observed that the errors are restricted to a small range.

Test Case in Humidity Data Set
When ε = 0.4, the real humidity values and the values predicted by the four schemes at every data point are shown in Figure 10. It can be observed that the predicted results of the four methods are consistent with the real humidity data.
However, at some sampling points, ELM, LSSVM, and KLMS produce larger errors than E-KLMS. In the proposed method, the data that contain less information, which may have a negative impact on the prediction process, are removed from the training data. Therefore, the prediction effect of E-KLMS is better than that of the other three schemes.
Moreover, Figure 11 shows the successful prediction rate under various thresholds ε. The result of LSSVM is close to that of E-KLMS. ELM has the lowest successful prediction rates and E-KLMS has the highest.
Figures 12 and 13 show the corresponding MAE and RMSE of the four schemes as the threshold ε changes. Similarly, E-KLMS has the smallest MAE and RMSE. When ε = 0.4, Figure 14 shows the prediction errors of E-KLMS.
Furthermore, we compare the overall computational effort of these methods in the data fusion and prediction process. The results are listed in Table 2. We also compare the average computational effort of the four schemes for predicting a single data point. The results are listed in Table 3. It is clear that ELM spends the least time on calculation and that the computational cost of LSSVM is much higher than that of the other schemes. Meanwhile, E-KLMS spends less time on calculation than KLMS. Although E-KLMS takes extra time to compute the entropy weights, its overall computational cost decreases after the training set is optimized. Generally, calculation effort is closely related to energy consumption. If a sensor node spends less time on calculation, its energy consumption is reduced accordingly. Considering both time consumption and prediction precision, our scheme E-KLMS is a better choice in this application field. Even though ELM spends less time on calculation, it is difficult to choose proper design parameters during implementation, so its prediction accuracy may be lower than that of the other methods. Meanwhile, LSSVM has a preferable prediction effect, but its computation takes a very long time. The KLMS learning algorithm maps the data into a high-dimensional feature space and adjusts itself adaptively to the projection. Hence, with every input vector, KLMS is concerned with the function space. Therefore, by combining KLMS with the entropy weight technique, E-KLMS shows its superiority under the proposed computational paradigm.

Conclusions
Generally, in a cloud-assisted network environment, the data sensed over a continuous time period are closely related to their adjacent data, and the storage space and energy of sensor nodes are limited. Hence, data prediction techniques are popular in WSNs for decreasing energy consumption, cancelling unnecessary data transmissions, and extending the network life cycle. A data prediction scheme, E-KLMS, using KLMS and entropy is accordingly developed to deal with the problems analyzed here. The presented scheme employs a forecast technique to maintain data consistency. During the data prediction process, the entropy weights of the training set inputs are calculated first. Then, the inputs whose entropy weights are less than the average value, which may cause relatively large prediction errors, are removed together with their corresponding outputs. Next, KLMS training is conducted to find the hidden relationships between inputs and outputs. On this basis, the following data values can be predicted. E-KLMS can effectively resolve the tradeoff between precision and calculation effort while greatly simplifying the training process. Furthermore, the entropy-based kernel learning method ensures prediction performance by improving accuracy and reducing errors. The experimental results show that our scheme can efficiently decrease redundant transmissions while improving prediction precision. In this way, the energy of sensor nodes is also saved and the fault tolerance is improved. Meanwhile, experimental comparisons between E-KLMS and other traditional schemes are carried out. The prediction accuracy of ELM is not satisfactory, although its computational speed is very fast compared with the other schemes. In contrast, the prediction effect of LSSVM is good, but it spends too much time on computation. The proposed scheme E-KLMS performs best in terms of prediction precision, and its calculation speed is also competitive. Furthermore, in the comparison between E-KLMS and KLMS, both the prediction accuracy and the computation metrics demonstrate the effectiveness of the information entropy used in our scheme. In view of the facts mentioned above, E-KLMS may be a better choice for WSNs in cloud-assisted network environments.

Figure 1. The framework of the cloud-assisted wireless sensor network.

Figure 3. Schematic diagram of the i-th iteration computation of KLMS.

Figure 4. The flowchart of the proposed scheme.

Figure 5. Prediction values of four schemes for the temperature item.

Figure 6. Successful prediction rate of four schemes with different thresholds for the temperature item.

Figure 8. The RMSE of four schemes for the temperature item.

Figure 9. The prediction error of E-KLMS for the temperature item.

Figure 10. Prediction values of four schemes for the humidity item.

Figure 11. Successful prediction rate of four schemes with different thresholds for the humidity item.

Figure 13. The RMSE of four schemes for the humidity item.

Figure 14. The prediction error of E-KLMS for the humidity item.

Table 1. The distribution of the sensor nodes.

Table 2. Computational time for the whole process of data fusion and prediction.

Table 3. Computational time for predicting a single data point.