A Hybrid Prediction Model for Energy-Efficient Data Collection in Wireless Sensor Networks

Abstract: Energy consumption caused by unnecessary data transmission is a significant problem in wireless sensor networks (WSNs). Addressing this problem extends network lifetime and improves the feasibility of WSNs for real-time applications. Energy-efficient data collection is therefore becoming a necessary requirement for WSN applications comprising low-powered sensing devices. In these applications, data clustering and prediction methods that exploit symmetry correlations in the sensor data can reduce the energy that sensor nodes consume for persistent data collection. In this work, a hybrid model based on decision tree (DT), autoregressive integrated moving average (ARIMA), and Kalman filtering (KF) methods is proposed to predict the data sampling requirement of sensor nodes and thereby reduce unnecessary data transmission. To perform data sampling predictions efficiently, clustering and data aggregation at each cluster head are used, mainly to reduce the processing overhead of generating the prediction model. Simulation experiments, comparisons, and performance evaluations conducted in various cases show that the forecasting accuracy of our approach can outperform existing Gaussian and probabilistic models, and that it provides better energy efficiency by reducing the number of packet transmissions.


Introduction
WSNs consist of spatially distributed autonomous sensing devices that monitor physical or environmental conditions. Their applications are wide-ranging, including disaster management, congestion monitoring in smart cities, and ecological supervision. Energy consumption and stability are critical challenges because sensor node batteries have limited capacity and frequent battery replacement is impractical in WSNs. Data acquisition and the transmission of data packets are the factors that most affect energy consumption, mainly because a node needs to acquire all sensor readings continuously and precisely. Such nodes use large amounts of energy for the accurate extraction, aggregation, and transmission of data.
Data prediction can be a logical way to deal with these issues [1], wherein prediction operations are performed using the past data measured by sensors. With this technique, the data measured by a sensor node need not be transmitted continuously [2]. Some existing studies, such as [3,4], use simple techniques to build a predictor that governs the transfer of data from the entire array of sensors to a base station. However, the prediction methods employed in these works might not function appropriately when data values change significantly and continuously. To address this issue, local prediction based on clustering in sensor networks can be an effective approach. A local prediction model is energy efficient because sensor data travel over shorter routing paths. However, clustering-based local prediction faces two challenges. The first is the high cost of training a predictor, which is affected by the trade-off between communication and computation. The second is the dynamic nature of the sensor data, in particular when the prediction models do not work well for a set of less predictable data.
Filtering and clustering methods can be used to improve the spatial and temporal correlation between the sensor data and thereby reduce energy consumption. To that end, we used a self-tuning approach based on Kalman filtering (KF) [5][6][7][8], which shows high potential because it provides unbiased, optimized estimation while minimizing covariance errors. The researchers in [9] aimed at reducing link energy consumption during transmission by reducing the number of hops. Based on this, we developed a novel hybrid model for data sampling prediction of sensor nodes. The energy consumption of nodes can be balanced by adding a new link, which can improve data transmission with low delay.
The main objective of the data prediction model for cluster-based WSNs is to reduce energy consumption, which is affected by radio transmissions, by decreasing the number of transmissions between the sender and the receiver. To that end, the model needs to perform data sampling predictions in the WSN effectively while clustering nodes and aggregating data at each cluster head to reduce overhead.
These key considerations also motivated us to use cluster-based WSNs in this work. The primary contributions of this paper are as follows:
1. We designed a model based on decision tree (DT), autoregressive integrated moving average (ARIMA), and Kalman filtering (KF) methods for data prediction in order to reduce unnecessary data transmissions and, as a result, decrease energy consumption. This model employs a minimal set of sensor nodes for data collection based on intra-cluster prediction and processing of data. In the proposed model, DT is used to filter data associated with each node in order to derive a tree for clustering the sensor data. Additionally, a self-tuning approach based on KF is used to optimize estimation while minimizing covariance errors.
2. We provide a MATLAB simulation-based practical demonstration of the proposed model to measure data packet transmission and energy consumption in sensor nodes for different numbers of distributed sensor nodes in the network.
The remaining sections of this paper are organized as follows: Section 2 reviews the related works. Section 3 presents the preparation techniques and the primary modeling process. Section 4 evaluates the performance of the proposed approach against other related methods, and Section 5 concludes the paper.

Related Works
Data prediction involves developing an abstraction of a phenomenon, such as a model explaining how the data evolve. Data prediction methods can be divided into three major classes: stochastic, time series prediction, and algorithmic techniques. The major disadvantage of the stochastic approach [10] is its high computational cost, which could be too much for sensor devices with limited processing capability and power. Stochastic techniques tend to be more suitable when several robust sensors are available [11]. Time series forecasting methods [12] can offer satisfactory accuracy even with simple techniques (i.e., low-order AR/MA).
Most current research in the WSN field concentrates on stability, energy efficiency, scalability, and improving the operational lifetime of sensing nodes. Many clustering algorithms have been developed for different types of applications [4,13,14]. Clustering techniques are used in WSNs to organize and group nodes and to specify which node in each cluster undertakes intra-cluster and inter-cluster data communications. Clustering can effectively reduce the number of data transmissions, and can therefore reduce power consumption and improve the lifetime of the network. Local prediction of sensor nodes is one example of these applications [15], where the cluster head acts as a sensor node and also keeps the historical data for every sensor node in the cluster.
In [16], a distributed voting algorithm was developed. The sensors' tree structure acts like a tiny, robust computational device at the tree's root, built to overcome the classification issue that arises when using the algorithm. Some methods for solving such failure issues are presented in [3,4,13]. Reference [3] proposed a data prediction model that reduces the load on sensor nodes (SNs) to enhance network lifetime. In the experiments, the proposed model achieved better accuracy and lower energy consumption than traditional data prediction methods such as the linear regression model. In [4], the authors proposed a data reduction mechanism that builds a model on both the edge node and the IoT devices. In [13], the authors proposed a data-conscious energy-saving method based on cluster heads. The data are reduced by a prediction algorithm based on the ARIMA model. In each round, the predicted data are compared with the observed data. If the deviation exceeds a given threshold, the node sends the difference to the cluster head. The cluster head compresses the collected differences and then sends the compressed information to the sink node. However, such prediction mechanisms have some disadvantages, especially for devices such as high-frequency motion sensors during data collection. Two adapted protocols, namely energy-LEACH and multi-hop-LEACH, were proposed in [14]. The multi-hop LEACH protocol was proposed to improve the communication trade-off between the cluster head and the sink. Here, the term sink refers to the cluster heads in the prediction model. Many of the advanced methods, such as those in [11], did not utilize the sensed data exchange until the model was available in the time series forecasting methods. An interesting direction is the adoption of a multi-model technique, as used in [12].
In [15], the authors introduced the basic principle of a flexible method for detecting events with constant delay while reducing energy consumption. The time-division multiple access (TDMA) cycle, which is a channel access method for shared medium networks, is used on the nodes under the same parent node. Therefore, an efficient data prediction technique for multivariate time series is required. The long range (LoRa) communication technology [16] has been proposed to solve the main issues of IoT applications, such as scalability and multiple sensor integration. This architecture contains gateways and sensors, together with a network server through which user applications access data and a server for the application's data.
Prediction-based data-aware clustering (PDC) [17] is also a prediction method that uses data-aware clustering to arrange a steady cluster of nodes. The presented prediction method shows high accuracy and low computational cost. Researchers in [18] implemented the multi-level route-aware clustering (MLRC) method for preserving energy in decentralized clustering protocols. The proposed protocol constructs a cluster and routing tree to decrease the unnecessary generation of routing control packets.
A data prediction transmission scheme based on clustering is proposed in [19]. In this model, the cluster aggregates data from the cluster head nodes and then transmits the data to the base station. However, this model has a limitation in solving the reconstruction problem of the lost data, and this issue leads to high energy consumption.
The correlation-based data collection algorithm was designed in [20]. In this algorithm, the network is divided into two main parts: clusters and sub-clusters. A certain amount of data needs to be collected during data collection. Additionally, the multivariate Gaussian model is used to estimate non-transmitted data.
Based on the available knowledge, there is a lack of an energy-efficient data collection model that not only collects and transmits data efficiently but also reduces both energy consumption and overheads in WSN.

The Proposed Model
In this work, we propose a model for energy-efficient data collection based on decision tree (DT), autoregressive integrated moving average (ARIMA), and Kalman filtering (KF) methods (see Figure 1). In the following sections, we first describe the algorithms and then describe the proposed hybrid model.

The Algorithms Employed
• KF is an algorithm that provides estimates of unknown variables given the measurements observed over time. The Kalman filter estimates the states of linear dynamical systems expressed in state-space form. It has a relatively simple form and requires little computational power.

• DT is a popular classification algorithm that is easy to understand and interpret. The goal of DT is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data.

• ARIMA is an analysis model that uses time series data to predict future trends. It combines an autoregressive model with a moving average model.

The Hybrid Method
This model predicts the data sampling requirements of sensor nodes. In the proposed model, Kalman filters are used to filter the sensor data streams associated with each node. The filtering relies on self-tuning of the error covariance, which adapts quickly to sudden changes in the input signal. The DT algorithm uses the filtered data associated with each node to derive a tree for clustering the sensor data. By combining the decision tree and the Kalman filter, a high-precision prediction algorithm based on an ARIMA-assisted model is proposed. Compared with other machine learning methods such as the deep convolutional network (DCN), the decision tree is intuitive, fast, and computationally efficient, and is suitable for integration into a real-time prediction system. Here, the hierarchical structure of the tree enables the data within each node to be grouped into specific clusters, each of which has its own cluster head. Within this hierarchical structure, neighboring nodes are clustered around a cluster head node. The structure can decrease the communication cost and preserve energy by adaptively using the cluster head node for data collection in the cluster's coverage [21].
As explained above, DT is a classification algorithm that uses information about all the nodes to derive a tree. The cluster heads collect the information about the node, such as ID, residual energy, and position of all nodes in the cluster. The cluster heads store this information in a list form.
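As a minimal sketch of the list that a cluster head keeps, the following Python fragment stores the three fields named above (ID, residual energy, position) for each member. The class name `NodeInfo`, the helper `build_member_list`, the units, and the sample values are all hypothetical; the paper does not specify a concrete data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeInfo:
    """Control information a cluster head keeps for one member node."""
    node_id: int
    residual_energy: float         # joules (the unit is an assumption)
    position: Tuple[float, float]  # (x, y) in metres (the layout is an assumption)


def build_member_list(
    reports: List[Tuple[int, float, Tuple[float, float]]]
) -> List[NodeInfo]:
    """Store each (id, residual energy, position) report as one list entry."""
    return [NodeInfo(node_id, energy, pos) for node_id, energy, pos in reports]


# Hypothetical reports from two member nodes.
members = build_member_list([(1, 0.8, (0.0, 0.0)), (2, 0.5, (3.0, 4.0))])

# Example query: the member with the most residual energy.
richest = max(members, key=lambda n: n.residual_energy)
```

A flat list like this is enough for the base station to scan all candidates when it selects the next cluster heads.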
After collecting this information from all nodes, the DT algorithm for node clustering is performed. The cluster heads then interact with the base station each time in a linear manner; the base station runs the DT algorithm and picks the nodes suited to be the next cluster heads [22]. Temporal correlations between sensor readings appear in the time series comprising the clustered tasks (represented by their centroids) associated with each sensor node. The correlation can be measured using mathematical models, including the linear ARIMA model [23]. Thus, the time series can be described using appropriate mathematical models, and the number of model parameters is usually significantly lower than the length of the whole series. The sensor node can then choose to transmit its data values selectively. Algorithm 1 illustrates the proposed hybrid model, including pseudocode descriptions of the decision tree algorithm, the cluster head procedure, and the cluster node procedure.
The Kalman filter has a relatively simple form and requires little computational power. An effective data filtering structure is needed to remove redundant data at the sensors as well as at the cluster-head nodes, thereby reducing the number of data transmissions. This study proposes a distributed KF that processes noise in the data series to extend the sensor nodes' lifetime by reducing redundant data transmissions and conserving power during continuous data collection. The filtering of data to reduce the error covariance follows the general theory of state estimation. Consequently, the adaptation of the filter to changes is established based on the consensus and its optimal estimator. It is assumed that a linear dynamical system is defined by Equation (1):

$$x(t+1) = F\,x(t) + w(t) \qquad (1)$$

where t represents the time index; x(t) represents the state of the system; the M × M matrix F describes the way the system changes over time; and w(t) represents the process uncertainty, modeled as zero-mean white Gaussian noise with covariance matrix $Q_w$. A network of N sensors, deployed at random positions in the specified areas, monitors the system, and its observations at time t are as follows:

$$y(t) = H\,x(t) + v(t) \qquad (2)$$

The N-dimensional vector y(t) collects the observations of the N sensors; the N × M matrix H captures the time-varying channel fading that affects the observation at every position; and v(t) is the observation noise, modeled as zero-mean Gaussian noise with covariance matrix $Q_v$. Under these linearity and Gaussian assumptions, the Kalman filter is a recursive scheme [24] that provides optimal estimates of the system state. In a distributed scenario, every node i has access only to its own observation $y_i(t)$. As such, every node should calculate the expressions in Equation (3) using only its own observation and the information exchanged with its one-hop neighbors.
Given the observation vector at time t, the filter uses the a priori estimate $\hat{x}^{-}(t)$ to yield the posterior estimate, as shown in the following:

$$\hat{x}(t) = \hat{x}^{-}(t) + G(t)\left[y(t) - H\,\hat{x}^{-}(t)\right] \qquad (3)$$

where G(t) represents the filter gain. In this case, the expression in (1) can be evaluated across all the distributed nodes, and every node can run a local version of the filter. The distributed Kalman filter is effective, adaptable, and robust for stochastic estimation on a WSN that is affected by unreliable connections between network nodes. To deal with missing data, it is assumed that missing values are randomly distributed. Before making predictions, the Kalman filter marginalizes the lost data by merging the estimates of all the active nodes. We denote the missing part of the measurements as $\tilde{z}$ and the estimated part as $\hat{z}$, i.e., $z = [\tilde{z}, \hat{z}]$. When $\tilde{z}$ is not empty, $p(q \mid c, z)$ is marginalized over the missing data $\tilde{z}$ to predict $p(q \mid c, \hat{z})$.
Consequently, the Kalman filter keeps and sends the attributes with missing data as the essential attributes into the decision tree.
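The predict and update cycle described in this section can be sketched for a single node with a scalar state. This is an illustrative simplification, not the distributed filter itself: the function names, the default noise variances `q_w` and `q_v`, and the sample readings are assumptions, and a missing reading is handled here by simply carrying the a priori estimate forward rather than by merging estimates from neighboring nodes.

```python
from typing import List, Optional, Tuple


def kalman_step(x_est: float, p_est: float, y: Optional[float],
                f: float = 1.0, h: float = 1.0,
                q_w: float = 1e-4, q_v: float = 1e-2) -> Tuple[float, float]:
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous posterior estimate and its error variance.
    y: new observation, or None when the reading is missing.
    f, h, q_w, q_v: scalar stand-ins for F, H, Q_w and Q_v (assumed values).
    """
    # Predict: a priori estimate and its error variance.
    x_prior = f * x_est
    p_prior = f * p_est * f + q_w
    if y is None:
        # Missing reading: skip the update and keep the prediction.
        return x_prior, p_prior
    # Update: filter gain G(t), then the posterior estimate.
    gain = p_prior * h / (h * p_prior * h + q_v)
    x_post = x_prior + gain * (y - h * x_prior)
    p_post = (1.0 - gain * h) * p_prior
    return x_post, p_post


def kalman_filter(readings: List[Optional[float]],
                  x0: float = 0.0, p0: float = 1.0) -> List[float]:
    """Filter a stream of readings; None marks a missing value."""
    x, p = x0, p0
    estimates = []
    for y in readings:
        x, p = kalman_step(x, p, y)
        estimates.append(x)
    return estimates


# Hypothetical temperature stream with one missing sample.
filtered = kalman_filter([20.1, 19.9, None, 20.2, 20.0], x0=20.0)
```

With `f = 1.0`, the estimate at the missing sample equals the previous estimate, while its error variance grows by `q_w`, so the next real reading is weighted slightly more heavily.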

Adaptive Update of Clustering by DT
A method for adapting the clustering is needed to track changes in local operations. One option is a total re-clustering; however, this could be quite costly because it requires setting up a clustering map of all the data points for each sensor node. A complete change in cluster membership means that all historical data and models must be rebuilt from the beginning. The DT algorithm presented in this section performs the initial clustering and, subsequently, the dynamic splitting and merging of clusters with low communication cost. In the initial phase, randomly chosen nodes serve as the cluster heads. Upon receiving the control information of every sensor node, the DT algorithm runs to select a new, suitable set of cluster heads from the entire set of sensor nodes. First, it creates a cluster tree from the Euclidean distances between observations in the data. After that, the list of cluster heads is transmitted to every sensor node. Next, each sensor node attaches itself to its cluster head.
Most clustering algorithms could be used in this study; what matters is that an adaptive clustering update is available to track changes in local patterns. A total re-clustering is an option; however, it could be quite costly because it requires setting up a clustering map for all the sensors, and a complete change in cluster membership means that all historical data and models must be rebuilt from the beginning. The DT algorithm presented in this section instead splits and merges clusters dynamically with low communication cost. Clusters of sensor nodes that can be either active or inactive are taken into consideration. Active sensor nodes continuously monitor the attribute x, generating a data value $x_t$ at every time tick t. At least one sensor is responsible for monitoring each point of the area. The connected nodes form the set of active sensors, which is the result of the clustering mechanism. To assist in the computation of intersection points, each node keeps a table containing information about each neighbor (node ID, position, residual energy, status (active/inactive), the number of nodes before and after eliminating the i-th node, and the number of levels before and after eliminating the i-th node) and periodically updates its own current location and status. A sensor node that cannot make local predictions transmits all of its data values to the cluster head, which then calculates the data distribution accordingly. A sensor node that supports local prediction can instead transmit its data values to the cluster head selectively, based on an error bound $\varepsilon > 0$.
In general, the proposed scheme was developed according to the following factors: (i) the distance of a node from the cluster centroid; (ii) the degree of mobility; (iii) the remaining battery power; and (iv) the vulnerability index. The vulnerability index provides an effective way of quantifying the vulnerability of a node in the sensors' tree structure. The vulnerability factor of each node is computed using Equation (4), where $n_i$ is the number of nodes before eliminating the i-th node, $n_j$ is the number of nodes after eliminating the i-th node, $a_i$ is the number of levels before eliminating the i-th node, and $a_j$ is the number of levels after eliminating the i-th node. The BS measures the distance between every node and the cluster centroid. A shorter distance denotes a higher probability of the node becoming a cluster head. Moreover, the higher the battery power, the higher the probability of the node becoming a cluster head. The mobility of the node affects the lifetime of the network significantly. Upon completion of the start-up phase, every sensor node transmits its data to a specific cluster head, which, in turn, broadcasts the members' list to other nodes [25]. The cluster head selection process is repeated at a predetermined interval or when the threshold criteria are met. The selective transmission model is an $\varepsilon$-loss approximation: given the error bound $\varepsilon > 0$, a sensor node transmits the value $x_t$ to the cluster head if $|x_t - \hat{x}_t| > \varepsilon$, where $\hat{x}_t$ represents the predicted data value. If the sensed value is close to the predicted value, there is no need to transmit it. The deviation of the sensed value from the predicted value is an important consideration in measuring data distribution.
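To make the idea of building a cluster tree from Euclidean distances concrete, the sketch below uses plain single-linkage agglomerative merging: the two closest groups are merged repeatedly until k groups remain. This is a stand-in technique chosen for brevity, not the paper's DT algorithm; the function names and the sample node positions are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points: List[Sequence[float]], k: int) -> List[List[int]]:
    """Merge the two closest clusters (single linkage) until k remain.

    Returns the clusters as lists of indices into `points`.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical node positions: three nodes near the origin, two far away.
positions = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.3, 9.8), (0.2, 0.4)]
groups = agglomerate(positions, k=2)
```

The sequence of merges implicitly defines a tree over the nodes; stopping at k groups corresponds to cutting that tree at one level. The pairwise search is quadratic per merge, which is acceptable for a sketch but would need a smarter data structure for large networks.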
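The selective transmission rule can be sketched as follows, writing the error bound as `epsilon`. The node and the cluster head are assumed to run the same predictor on the same shared history, so a reading is sent only when the prediction misses it by more than `epsilon`. The predictor interface, the naive `last_value` predictor, and the sample readings are illustrative assumptions; in the paper the predictor is the hybrid KF, DT, and ARIMA model.

```python
from typing import Callable, List, Tuple


def selective_transmit(
    readings: List[float],
    epsilon: float,
    predict: Callable[[List[float]], float],
) -> Tuple[List[int], List[float]]:
    """Apply the error-bound rule to a stream of readings.

    Returns the indices of the readings that were transmitted and the
    series the cluster head ends up with (sent or predicted values).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sent: List[int] = []
    head_view: List[float] = []
    for t, x_t in enumerate(readings):
        if not head_view:
            # No shared history yet, so the first reading is always sent.
            sent.append(t)
            head_view.append(x_t)
            continue
        x_hat = predict(head_view)
        if abs(x_t - x_hat) > epsilon:
            sent.append(t)             # prediction too far off: transmit
            head_view.append(x_t)
        else:
            head_view.append(x_hat)    # head keeps its own prediction
    return sent, head_view


def last_value(history: List[float]) -> float:
    """Naive predictor used only for this illustration."""
    return history[-1]


readings = [20.0, 20.05, 20.08, 20.4, 20.45, 21.0]
sent, view = selective_transmit(readings, 0.1, last_value)
```

In this example only three of the six readings are transmitted, and every value held by the cluster head stays within `epsilon` of the true reading.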

ARIMA Prediction Model
In this study, the methodology aimed at designing the best possible ARIMA-based model for predicting the energy consumption of sensor nodes during data collection. ARIMA models are univariate, as they use the history of the time series to express how the variable reacts to earlier stochastic variation. ARIMA can be executed through a four-step process after gathering historical data of the relevant parameters. The four steps are: (i) identifying the model; (ii) estimating the parameters; (iii) recognizing the model; and (iv) verifying and predicting the model [26]. A general ARIMA (p, d, q) model describing the time series is expressed as follows:

$$\varphi(B)\,(1 - B)^{d}\,x_t = \theta(B)\,e_t \qquad (5)$$

where $x_t$ and $e_t$ represent the energy consumption and the random error at time t, respectively. B is the backward shift operator defined by $B x_t = x_{t-1}$; d represents the order of differencing; and $\varphi(B)$ and $\theta(B)$ represent the autoregressive (AR) and moving average (MA) operators of orders p and q, respectively, as described in the following:

$$\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2 - \cdots - \varphi_p B^p \qquad (6)$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \qquad (7)$$

where $\varphi_1, \varphi_2, \cdots, \varphi_p$ represent the autoregressive coefficients and $\theta_1, \theta_2, \cdots, \theta_q$ represent the moving average coefficients. The (differenced) time series can be represented using the linear transfer function of the noise series:

$$(1 - B)^{d}\,x_t = \phi(B)\,e_t \qquad (8)$$

where $\phi(B)$ can be computed as $\phi(B) = \theta(B)/\varphi(B)$. Figure 2 shows the flow of the proposed work.
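As a rough illustration of how an ARIMA forecast is produced, the sketch below handles the simplest non-trivial case, ARIMA(1, 1, 0): difference the series once, estimate a single AR coefficient by least squares, and add the predicted difference back to the last value. The order (1, 1, 0), the absence of a constant term, the function names, and the sample series are all assumptions made for brevity; the paper selects its model orders from the data.

```python
from typing import List


def difference(series: List[float]) -> List[float]:
    """First-order differencing (d = 1): w_t = x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(w: List[float]) -> float:
    """Least-squares estimate of phi_1 in w_t = phi_1 * w_{t-1} + e_t."""
    num = sum(a * b for a, b in zip(w, w[1:]))
    den = sum(a * a for a in w[:-1])
    return num / den if den else 0.0


def forecast_next(series: List[float]) -> float:
    """One-step ARIMA(1, 1, 0) forecast.

    x_hat_{t+1} = x_t + phi_1 * (x_t - x_{t-1})
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    w = difference(series)
    phi = fit_ar1(w)
    return series[-1] + phi * w[-1]


# Hypothetical series with a steady upward trend of 0.2 per step.
temps = [20.0, 20.2, 20.4, 20.6, 20.8]
next_temp = forecast_next(temps)
```

For this steadily rising series the estimated coefficient is close to 1, so the forecast continues the trend to roughly 21.0. A full implementation would also fit the MA terms and check the residuals.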

Experiment Evaluation and Analysis
We conducted a series of experiments to compare the performance of the proposed algorithm with alternative techniques. We used the Intel Lab sensor dataset [27] to measure the performance of our prediction method. The data were collected from 54 nodes spread around the laboratory over one month. We filled in missing data values with the averages of the values at different time epochs. We selected all temperature records for one week (15 September to 21 September). Every node has, on average, five nodes within its radio range. Data at each node i were modeled according to $x_t = \alpha_i x_{t-1} + e_t$, where $e_t \sim N(0, 0.01)$ and $\alpha_i \sim N(1, 0.01)$. All nodes were used to generate 2500 data values. Each node was initialized with $\alpha_1 = 1$ and $x_0 = 0$. Distributed density estimation methods based on parametric techniques were chosen for comparison [28][29][30]. The method in [31] stores the last datum sent at both the sink and the sensors. If a data value lies within the error bound, the sensor node does not transmit it. This model has a training phase with a probability density function (pdf) that refers to a set of obtained attributes. The compared methods consist of an advanced robust aggregation technique for extracting statistical information from sensor networks, described in [30], which includes a distributed algorithm to compute a Gaussian mixture model (GMM), and a probabilistic model [32].
For a fair comparison, our model uses a GMM algorithm for the cluster breakup or inter-cluster aggregation stage. Because we propose a hybrid DT-optimized ARIMA model in our system, the two methods are given similar message sizes. The cluster head preserves a circular array of historical data for each cluster member. Since our model assists cluster members rather than cluster heads, we quantify the energy consumption of cluster members. There exist 15 clusters and 45 cluster members on average. We show the sum of the energy consumption of all these 40 cluster members [16]. As energy consumption data are non-stationary, they must first be transformed to remove the non-stationarity. An ARIMA model [ARMA (p, q)] for a time series x involving n cases was specified through forecasting Equations (9) and (10), where $x_t$ indicates the original results and $\hat{x}_t$ represents the forecast data. The m-dimensional vector $\omega_t$ represents unmodified random data with a mean of 0 and covariance matrix R; $\theta = (p, q)$ gives the order of the forecast, for which p is the number of auto-regressive terms and q is the number of lagged prediction errors; and $A_1, \cdots, A_p$ and $B_1, \cdots, B_q$ are the m × m coefficient matrices of the multivariate (MV) ARIMA model. Before implementing the ARIMA model, the stationarity of the process was tested in terms of the AR and MA coefficients. The coefficients of the AR and MA matrices are presented in Table 1. The data represent the optimal values of the AR and MA coefficients. Note that as the measurement error increases, the AR and MA coefficients tend toward zero. According to Table 1, all the AR and MA coefficients were weak. The coefficients should be precisely equal to one. Furthermore, their high standard deviations, between 4.05 and 7.85, were not significant in any of the cases. The heterogeneity of the data distribution resulted in low precision (that is, a relatively high standard deviation).
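The synthetic data model in this paragraph can be sketched as a short generator. Two assumptions are made here and should be checked against the original experiments: the second argument of N(·, 0.01) is read as a variance (so the standard deviation is 0.1), and each node generates its own series of 2500 values. The function name, parameters, and seed handling are hypothetical.

```python
import random
from typing import List


def generate_node_series(n_values: int = 2500,
                         alpha_mean: float = 1.0,
                         alpha_var: float = 0.01,
                         noise_var: float = 0.01,
                         x0: float = 0.0,
                         seed: int = 0) -> List[float]:
    """Generate x_t = alpha_i * x_{t-1} + e_t for a single node.

    alpha_i is drawn once per node from N(alpha_mean, alpha_var) and
    e_t is drawn at every step from N(0, noise_var); both second
    arguments are treated as variances (an assumption).
    """
    rng = random.Random(seed)
    alpha = rng.gauss(alpha_mean, alpha_var ** 0.5)
    x = x0
    series = []
    for _ in range(n_values):
        x = alpha * x + rng.gauss(0.0, noise_var ** 0.5)
        series.append(x)
    return series


# Hypothetical usage: one short series per node, seeded by node index.
node_series = [generate_node_series(100, seed=i) for i in range(3)]
```

Because the coefficient is drawn around 1, a node whose draw exceeds 1 produces a series that grows over time, while a draw below 1 produces one that decays toward the noise level; this variety is what makes some nodes harder to predict than others.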
After testing the AR and MA coefficients, it is concluded that the mixed ARIMA model is the desirable choice.
In ARIMA(p, d, q), p corresponds to the number of auto-regressive terms, q corresponds to the number of lagged forecasting errors, and d corresponds to the number of non-seasonal differences. It is assumed that the random errors ($\omega_t$) are independent and identically distributed with constant variance. To evaluate the model's performance, the main indices, namely average relative error (ARE), root mean square error (RMSE), and mean absolute error (MAE), were measured to reveal the prediction accuracy of the models. The basic ARIMA model parameters are shown in Table 2.

Figure 3 shows how the proposed algorithm works on the member nodes and the cluster head node of cluster 1. The error threshold is set to ±0.1°. The real temperature values and predicted values for 2000 samples are plotted in Figure 3a. Figure 3b is the larger version of Figure 3a for 600 samples from time 600 to 800. Figure 3 reveals the fitting and forecasting results. The optimized hybrid model, based on the Kalman filter and DT-optimized ARIMA model, decreases the forecasting error in all the datasets. This represents an important improvement for energy consumption forecasting. The proposed model has been compared with [13], which used ARIMA for WSN data prediction. In [13], ARIMA was applied to stationary data, meaning that the data series have no trend, vary little around a mean of constant amplitude, and show short-term random patterns that look the same over time.
Traditional performance indices, namely ARE, RMSE, and MAE, are used as measures of prediction accuracy. The number of nodes was varied to examine the performance sensitivity. Note that each one denotes the ratio between the transmission energy consumption and the prediction energy consumption. Figure 4 shows the ARE, RMSE, and MAE for 500, 1000, and 1500 nodes; our hybrid model had the lowest RMSE, MAE, and ARE for every network size. This finding indicates that the hybrid model successfully reduces the error of the predicted values. Combining linear and non-linear methods thus produces accurate results, particularly since both parts of the model exhibited precise and robust forecasting. Because the ARIMA model satisfactorily estimates the linear part of the data, using the DT and KF algorithms for the non-linear parts improves the prediction results further.

A MATLAB simulator was implemented in which the simulation parameters were fixed based on the hardware configuration of MICA2 [31]. For conciseness, only representative results are reported. A real-world sensor dataset, the Intel Lab data [27], was used for the scalability analysis of the algorithms, which is shown in Figure 5. The proposed framework scales better when data aggregation is integrated. This is because the distributed technique can complete local data updates and local prediction, whereas the centralized system incurs a high communication cost to transmit the data to the sink. The cluster head collects data within each cluster. Then, the cluster head completes local prediction on the data distribution.
The members of each cluster execute the prediction and transmit the predicted data to the cluster head. Hence, each cluster head has a clear view of all sensor data in its cluster, and the communication cost decreases significantly. The energy consumed for a fixed number of frames and different head-sizes is compared in this section. We assess the number of packets transmitted by all nodes in the models [27,28].

Figure 6 demonstrates the scalability of the different methods with the size of the network using a real-time dataset [32]. The Gaussian distribution scheme in [28] suffers a high communication cost for sending the data to the sink. Additionally, the probabilistic model described in [24,29] is complicated by high-level computations such as aggregation, which are computationally costly. This is because local and global probabilistic prediction models are constructed at the sensor nodes and the sink, respectively. The comparison shows that our algorithm scales better because it aggregates data. Our model can perform local data updates and local data prediction. Thus, our proposed algorithm performs better, as our data aggregation approach can process raw data at the sensor nodes or at intermediate nodes to decrease packet transmissions and save energy. In this section, the energy consumption for a fixed number of frames and diverse head-sizes is compared.

Figure 7 displays the variation in the energy consumed per node in terms of the number of clusters and the network diameter. The number of clusters and the energy consumed in one round are plotted on the x-axis and y-axis, respectively. The energy consumed in one node is estimated by Equation (9). Energy (E) is the energy of a node at a fixed time. This energy should be sufficient for at least one round.
Each node becomes a member of the headset once during one round and a non-cluster head $\left(\frac{n}{km} - 1\right)$ times. There are k clusters and n nodes. In each iteration of DT, m nodes are selected for each cluster, so km nodes are selected in each iteration as members of the headsets. Covering all n nodes therefore takes $\frac{n}{km}$ iterations, which is the number of iterations needed in one round. Since there are m nodes in a headset, ECH/inter/cluster is shared uniformly among the headset members, so each member bears a 1/m share of it.
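The three accuracy indices can be computed as in the sketch below. The definitions used are the common textbook ones and are an assumption here, since the text does not spell them out: MAE is the mean of the absolute errors, RMSE is the square root of the mean squared error, and ARE is the mean of the absolute errors divided by the absolute actual values. The function names and sample values are illustrative.

```python
import math
from typing import List


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean square error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def are(actual: List[float], predicted: List[float]) -> float:
    """Average relative error over the samples whose actual value is non-zero.

    Assumes that at least one actual value is non-zero.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Hypothetical actual and predicted readings.
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]
scores = (mae(actual, predicted), rmse(actual, predicted), are(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are usually reported together, while ARE makes errors comparable across series with different magnitudes.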
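As a rough illustration of the rotation arithmetic in this paragraph, the sketch below computes the number of iterations in one round for n nodes, k clusters, and headsets of m nodes, and the uniform share of a headset's energy cost borne by each member. The function names and the example figures (1500 nodes, 50 clusters, headsets of 3, and a cost of 0.3 energy units) are hypothetical.

```python
def iterations_per_round(n: int, k: int, m: int) -> float:
    """Iterations needed for all n nodes to serve once: n / (k * m)."""
    if min(n, k, m) <= 0:
        raise ValueError("n, k and m must be positive")
    return n / (k * m)


def per_member_share(e_ch: float, m: int) -> float:
    """Energy each of the m headset members bears under uniform sharing."""
    if m <= 0:
        raise ValueError("m must be positive")
    return e_ch / m


# Hypothetical setting: 1500 nodes, 50 clusters, headsets of 3 nodes.
rounds = iterations_per_round(1500, 50, 3)
non_head_turns = rounds - 1
share = per_member_share(0.3, 3)
```

In this setting each round has 10 iterations, so a node serves in a headset once and acts as an ordinary member in the other 9 iterations.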

The graph demonstrates that energy consumption decreases as the number of clusters increases. The graph illustrates that the optimal number of clusters lies between 35 and 60 for 1500 nodes. There exist 15 clusters and 45 cluster members on average. Additionally, the graph illustrates that the sensor nodes must send data to distant cluster heads when the number of clusters is lower than the optimal range, for example, 20. In contrast, when the number of clusters is larger than the optimal range, there will be more communications with the distant base station.

Figure 8 illustrates the energy dissipation model used by the transmitter and receiver. The energy required during data transmission is computed using the model in [9]. The transmitter dissipates energy to run the radio electronics and the amplifier. The receiver dissipates energy to run the radio electronics only. The transmission time ($T_{trans}$) is the time required to transmit the message through the transmission channel, given by (14).
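A minimal sketch of such an energy dissipation model is given below, assuming the first-order radio model that is widely used in the LEACH literature: the transmitter pays for the electronics plus a distance-dependent amplifier term, and the receiver pays for the electronics only. The constants `E_ELEC` and `EPS_AMP`, the free-space exponent of 2, and the function names are assumptions; the model in [9] and Figure 8 may use different values.

```python
E_ELEC = 50e-9     # J/bit spent by the radio electronics (assumed value)
EPS_AMP = 100e-12  # J/bit/m^2 spent by the transmit amplifier (assumed value)


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy to transmit `bits` over `distance_m` metres.

    The electronics cost scales with the message size, and the
    amplifier cost scales with the squared distance (free-space model).
    """
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits: int) -> float:
    """Energy to receive `bits`: radio electronics only."""
    return E_ELEC * bits


# Hypothetical 2000-bit packet sent over 50 m.
e_tx = tx_energy(2000, 50.0)
e_rx = rx_energy(2000)
```

The quadratic distance term is what makes short intra-cluster hops much cheaper than direct transmission to a distant base station, which is the effect the cluster-count trade-off above relies on.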

Conclusions
In this paper, an energy-efficient data collection model based on clustering and prediction has been proposed. In the clustering phase, sensor nodes form clusters while the cluster heads collect and store the data measured by the sensor nodes. The proposed hybrid prediction model was used to examine the trade-off between communication and prediction. The performance of the model has been evaluated using various numbers of nodes over different periods. The simulation experiments showed that the proposed model significantly outperformed the other related approaches in terms of prediction accuracy and energy efficiency. As a result, it can reduce the energy consumed for data collection in hierarchical networks and extend the network lifetime considerably, even when a high number of clusters is allocated. As future work, a traffic generator, such as a LoRa network operator, can be integrated into the proposed application. We plan to use the LoRa traffic generator to field test the network distribution of the proposed method over a large geographical area and to reduce the initial deployment costs.