A Machine Learning Approach to Solve the Network Overload Problem Caused by IoT Devices Spatially Tracked Indoors

Abstract: Currently, there are billions of connected devices, and the Internet of Things (IoT) has boosted these numbers. In private networks, a few hundred connected devices can cause instability and even data loss in communication. In this article, we propose a machine learning-based modeling approach to solve the network overload caused by continuously monitoring the trajectories of several devices tracked indoors. The proposed modeling was evaluated with over a hundred thousand coordinate locations of objects tracked in three synthetic environments and one real environment. We show that the network overload problem can be solved by increasing the latency in sending data and predicting the intermediate coordinates of the trajectories on the server side with ensemble models, such as Random Forest, and with Artificial Neural Networks, without relevant data loss. We also show that at least thirty intermediate coordinates of the trajectories of tracked objects can be predicted with R² greater than 0.8.


Introduction
Technology has been increasingly present in people's daily lives, and the growth of Internet of Things (IoT) applications is one of the accelerators of this process. Currently, many objects are able to interconnect by transmitting and receiving data from the cloud, enabling communication between people, processes, and environments [1][2][3][4]. This popularization of the IoT is largely due to great technological advances in the area of embedded systems, where countless opportunities arise to be commercially exploited, such as in the context of smart homes, with the innovation of domestic appliances [5], and in the context of personal assistance, with "smart" personal items emerging that contribute to comfort or assist the hearing impaired, for example [2,6]. In 2019 there were around 36 billion IoT devices; with this number growing by 12% annually, a total of 125 billion is expected by 2030 [7].
In the context of monitored objects indoors, companies have offered tracking services that can use numerous technologies such as Bluetooth Low Energy (BLE) [8], Ultra-Wideband [9], gyroscope, and accelerometer [10]. In this way, there can be hundreds or even thousands of objects tracked simultaneously in a single environment, which can lead to an overload of the network where this data transmission is happening [11,12].
A possible solution to reduce the traffic load on the network is to increase the latency of sending data, that is, the interval between consecutive transmissions. For example, if objects send their coordinates across the network every two seconds, the latency could be increased to five-second intervals to provide traffic relief. However, increasing latency has the direct consequence of reducing the accuracy of the paths stored on the server side, which can hinder future analyses that characterize the movement patterns occurring within these environments.
In the literature, it is possible to find numerous articles that aim to interpolate/predict coordinates to discover points on a path [13][14][15][16]. These articles focus on predicting trajectories in open environments, with typically tracked objects being vehicles, which can lose a global positioning system (GPS) signal in tunnels or other areas that restrict satellite communication. Among the most used techniques, we can mention the use of interpolation [17], assuming midpoints or also applications of the Kalman Filter (KF) in conjunction with Constant Turn Rate and Acceleration models [18] as trajectory estimators. The effectiveness of these techniques is especially worth considering because, in the context of predicting trajectories in open environments, there is only the need to predict trajectories at a macro level. For example, it is not a problem if the interpolated path passes over a corner; it is more important to know which streets the vehicle passed through.
Despite the effectiveness of these techniques, in closed environments the demand for understanding micro movement patterns is greater. In the specific context of a supermarket, for example, using linear interpolation (LI) would imply predicted paths crossing aisles, which could confuse the analysis of subsequent trajectories of the tracked objects/people. In this context, the answer to the following questions would not be accurate: Which sections or products did a particular employee, who wears a traceable wristband, pass through? How many square meters were cleaned with a tracked vacuum cleaner?
In this article, we propose a machine learning (ML) model for the problem of interpolating coordinates of tracked objects indoors. We use real data collected by researchers at the University of Guelph [19]. To allow for a more complete evaluation of the proposed modeling, synthetic data were generated for three different environments. We aim to answer the following research questions:
• RQ1: Would the modeling used be able to predict routes that avoid obstacles in closed environments?
• RQ2: How much can latency be increased without loss of performance from ML algorithms?
• RQ3: What is the impact of the amount of data on the performance of ML algorithms?
This article is organized as follows: Section 2 will present some related works and how this article intends to contribute to the theme. Section 3 will show how the synthetic environments were built and how the data generation process took place. In Section 4 the modeling used for the ML models will be presented in detail. In Section 5, the results of this research will be presented and the research questions will be answered. In Section 6, final considerations will be made, considering the results achieved.

Related Work
One of the most used solutions to predict intermediate points is the LI method [20]. This method builds a continuous function from discrete data, connecting two interpolated points [21]. Wu et al. (2020) [22] state that LI is a method that has several variations and is used to solve problems in many areas, such as computer vision [23], digital photography [24], computer graphics [25], and image calibration [26]. Their article presents a method that improves the performance of LI and describes a method to evaluate the quality of the interpolator [22].
In 1960, Kalman developed a recursive solution to the linear filtering problem of discrete data [27]. His method became quite famous and was used in several applications [28][29][30]. With this popularization, the algorithm came to be known as the KF in honor of its creator. Lam et al. (2018) [31] use the KF combined with an ML model, the Optimized Support Vector Machine (O-SVM), to correct coordinates in a sample of collected data. As the acquired data was noisy, the KF was applied as a pre-processing step to smooth it before it was sent to a server. In a second step, on the server, the O-SVM model trained on this clean data was used to correct it. Finally, this method was compared with other known methods: CoreLocation Framework, Open ALTBeacon Standard, Linear Regression, and Non-Linear Regression. The result was an average error lower than that of these other methods. Li et al. (2018) [32] follow the same initial procedure: the KF is used to smooth the data coming from BLE trackers, and then a Back Propagation Neural Network optimized by Particle Swarm Optimization is applied, thus repositioning the coordinate that will be sent to the server. Hirakawa et al. (2018) [33] developed a method based on reinforcement learning to fill in missing coordinates in animal paths in nature. In theory, the GPS should register them every minute, but for various reasons, this did not happen. A reward space was therefore built based on the environmental preferences of the studied species, a seabird. In this method, in order to make the prediction smoother, the paths made between preference points are straight, because these birds can take indirect paths.
Chai et al. (2020) [34] developed a method that uses a Convolutional Neural Network to reconstruct missing parts of seismic data, whether regular or irregular. The presented result surpassed the method based on rank reduction [35]. Similarly, Kang et al. (2019) [36] also worked on data reconstruction, but in time series. River water flow data from 1970 to 2016 were used, and 1586 data reconstructions were performed by using MissForest, an ML algorithm. The algorithm presented satisfactory results (R² > 0.6), being much better than the linear regression model in all analyses performed.
AlHajri et al. (2018) [37] developed a model that uses IoT devices to classify types of indoor environments, such as a laboratory, a narrow corridor, a lobby, and a more open area. In this work, Decision Trees, Support Vector Machine, and k-NN algorithms were used, together with the Channel Transfer Function (CTF) and the Frequency Coherence Function (FCF). They concluded that the best result was obtained with k-NN using CTF and FCF, with an accuracy of 99.3% and a prediction time below 10 µs. Table 1 presents the contributions of the main articles cited in this section.
The main contribution of our article is to solve the problem of network overload caused by numerous IoT devices trying to access the network simultaneously. By predicting paths on the server side using machine learning, it becomes possible to increase the latency of sending data on the client side, reducing the load of data traveling over the network. We also compare the accuracies of LI, some ensemble models, and an Artificial Neural Network (ANN) model [38,39]. This article is an evolution of recent research published in Portuguese [40]. The research evolved by working with three dimensions (environments with more than one floor), using a significantly larger database, using more ML algorithms to validate the results, improving the discussion, and, finally, expanding the related work.

Dataset
In order to carry out this research, data from trajectories in two or three dimensions with a time interval between the coordinates that represent the movement of an object in a closed environment were required. A data generator was created by using the UNITY development engine [41]. This engine was used to model environments and generate object trajectories within them. An entity was created that represents a moving object and another that randomly selects new destinations for that object. At each new destination, the tracked object performs a new optimal path in relation to the distance traveled within the environment. A tracker was attached to return the position of the object tracked every frame per second of the simulator and record its X, Y, and Z coordinates in a file. Three environments were created, namely Environment 1, Environment 2, and Environment 3, with manually positioned obstacles.
We also used a set of real data, obtained from [19]. This dataset was populated by tracking the locomotion of individuals within an office for twenty days. Each tracked individual was uniquely identified, and data from only one person was used, resulting in 17,050 records. To perform the tracking, Raspberry Pi and BLE trackers were used. The data was recorded by using the Beaconpi software [42]. This environment will be called Environment 4 in this article.
The four environments can be seen in Figure 1, and the process of simulating routes indoors can be viewed at https://youtu.be/9gSDB31t3Yc (accessed on 30 March 2022). In Figure 1A-C there are also yellow squares, which represent access stairs from one floor to the other. In the images with a black background, each white dot represents a place where an object was located during its tracking.

Methodology
In order to answer the research questions, five prediction models based on ML algorithms were used: Random Forest (RF) [43], Ada Boost (ADA) [44], Extreme Gradient Boost (XGB or XGBoost) [45], Histogram-Based Gradient Boost (HGB) [46,47], and ANN. In addition, the LI method was used as a baseline against which to compare the interpolations performed by the ML models.
Then, two validation scenarios were performed. The first aimed to observe the effect that varying the latency would have on the quality of the interpolated routes. This was done by varying the number of points to be interpolated between a start and an end point. The second aimed to observe the impact of the number of samples on the quality of the interpolated trajectories. This was done by gradually increasing the number of examples used to train the models.
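As a minimal sketch of the LI baseline mentioned above, the following Python function places d intermediate points along the straight segment between a start and an end point. The function name linear_interpolation and the assumption that the intermediate points are evenly spaced in time are illustrative and are not taken from the published code.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def linear_interpolation(p_i: Point, p_f: Point, d: int) -> List[Point]:
    """Return d points evenly spaced on the straight segment between p_i and p_f.

    Assumption: samples are equally spaced in time, so the n-th intermediate
    point lies at the fraction n / (d + 1) of the way from p_i to p_f.
    """
    points: List[Point] = []
    for n in range(1, d + 1):
        frac = n / (d + 1)
        points.append(tuple(a + frac * (b - a) for a, b in zip(p_i, p_f)))
    return points


# Three intermediate points between (0, 0, 0) and (4, 8, 0).
interpolated = linear_interpolation((0.0, 0.0, 0.0), (4.0, 8.0, 0.0), d=3)
print(interpolated)  # [(1.0, 2.0, 0.0), (2.0, 4.0, 0.0), (3.0, 6.0, 0.0)]
```

Because every predicted point lies on the segment, this baseline cannot bend around an obstacle located between the two end points.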

Feature Modeling
The features were extracted from the data described in Section 3, and all algorithms used in this article use the same modeling. The features used in the modeling are listed below:

1. Starting point of a path (P_i), composed of the X, Y, and Z coordinates;
2. End point of a path (P_f), also composed of the X, Y, and Z coordinates;
3. Relative time at which the end point was recorded (T_f). The value of T_f is calculated as the number of points from P_i to P_f multiplied by the latency. The time of P_i is the reference value, so it is always 0 and does not need to be used as a feature;
4. Relative time at which the intermediate point is to be predicted (T_n), with n indicating the chronological order of the point. For example, to predict only one point between P_i and P_f, there will be a single observation, with T_1. To predict m intermediate points, there will be one observation with T_1, another with T_2, and so on, until the last point to be predicted (T_m).
The targets of each observation are the X, Y, and Z coordinates of each point located between P_i and P_f. Furthermore, in order to build several examples, we used the concept of the sliding window [48], which is illustrated in Figure 2. We also used a parameter, d, that indicates the number of intermediate points between P_i and P_f. In Figure 2, the Z axis is assumed to be invariant for ease of visualization. Each gray circle represents a point (which has X, Y, and Z coordinates and the relative time at which it was recorded), and the intervals use d = 2. All windows, composed of P_i, P_f, and a Target, are of size three, so we have 0 ≤ n ≤ 3. A is the first example, with P_i = P_1, P_f = P_4, and Target = P_2. Then, in B, the Target is shifted. As there are no more possibilities to build an example with this window, because the Target has passed through all the points between P_i and P_f, the window slides to the right, that is, P_i = P_2, P_f = P_5, and Target = P_3; in this way, example C is created.
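The sliding-window construction described above can be sketched in Python as follows. This is an illustrative reading of the text, not the published implementation: the function name build_examples, the feature order, and the choices T_f = (d + 1) × latency and T_n = n × latency (with the time of the starting point fixed at 0) are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def build_examples(
    track: Sequence[Point], d: int, latency: float = 1.0
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build (features, targets) pairs with a sliding window over a trajectory.

    Each feature row is [P_i (x, y, z), P_f (x, y, z), T_f, T_n];
    each target row is the (x, y, z) of the n-th point between P_i and P_f.
    """
    features: List[List[float]] = []
    targets: List[List[float]] = []
    span = d + 1  # index distance between the start and end points
    for start in range(len(track) - span):
        p_i = track[start]
        p_f = track[start + span]
        t_f = span * latency  # assumed relative time of the end point
        for n in range(1, d + 1):  # shift the target through the window
            t_n = n * latency
            features.append([*p_i, *p_f, t_f, t_n])
            targets.append(list(track[start + n]))
    return features, targets


# Five points P1..P5 on a line, with d = 2 as in the example of Figure 2.
track = [(float(k), 0.0, 0.0) for k in range(1, 6)]
X, y = build_examples(track, d=2)
print(len(X))      # 4
print(X[0], y[0])  # example A: P_i = P1, P_f = P4, Target = P2
print(X[2], y[2])  # example C: P_i = P2, P_f = P5, Target = P3
```

With this layout, every model receives the same eight input values per observation and predicts three output values.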

Setting Up and Running ML Algorithms
For each algorithm, a grid search was performed to find optimal hyperparameters. The hyperparameter settings were ranked by the average of the mean absolute error (MAE) values of the X, Y, and Z axes. For the RF, ADA, and HGB algorithms, we use the scikit-learn implementations [49] (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble (accessed on 30 March 2022)). For the XGB algorithm, we use the xgboost module implementation (https://xgboost.readthedocs.io/en/stable/python/python_intro.htm (accessed on 30 March 2022)). Finally, for the ANN algorithm, we use the tensorflow implementation (https://www.tensorflow.org/api_docs/python/tf/keras (accessed on 30 March 2022)) [50].
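The ranking criterion described above, the average of the per-axis MAE values, can be illustrated with a short pure-Python sketch. The helper names mean_axis_mae and rank_candidates, the candidate labels, and the toy predictions are hypothetical and serve only to show how settings would be ordered.

```python
from typing import Dict, List, Sequence


def mean_axis_mae(
    y_true: Sequence[Sequence[float]], y_pred: Sequence[Sequence[float]]
) -> float:
    """Compute the MAE separately for each axis, then average over the axes."""
    n_axes = len(y_true[0])
    per_axis = []
    for axis in range(n_axes):
        errors = [abs(t[axis] - p[axis]) for t, p in zip(y_true, y_pred)]
        per_axis.append(sum(errors) / len(errors))
    return sum(per_axis) / n_axes


def rank_candidates(
    y_true: Sequence[Sequence[float]],
    predictions: Dict[str, Sequence[Sequence[float]]],
) -> List[str]:
    """Order candidate settings from the lowest to the highest mean per-axis MAE."""
    return sorted(predictions, key=lambda name: mean_axis_mae(y_true, predictions[name]))


y_true = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
predictions = {
    "setting_b": [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],  # per-axis MAE: 1, 1, 1
    "setting_a": [[0.0, 0.0, 0.0], [1.0, 1.0, 4.0]],  # per-axis MAE: 0, 0, 1.5
}
ranking = rank_candidates(y_true, predictions)
print(ranking)  # ['setting_a', 'setting_b']
```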
For the RF algorithm [43], we used the RandomForestRegressor, a component implemented by scikit-learn. The hyperparameters max_features = sqrt and min_samples_split = 6 were used. The latter was set in order to reduce excessive memory usage, in exchange for a negligible deterioration in performance.
The ANN used was of the multilayer perceptron type [38,39], implemented with tensorflow [50]. After a search for good settings, the chosen network had four layers: the first with 2000 neurons, the second with 400 neurons, the third with 15 neurons, and the last with 3 neurons. Regarding the activation functions, ReLU was used in the first three layers and hyperbolic tangent in the last one. The optimizer used was Adam with an initial learning rate of 0.0001 (reduced during execution by callbacks).
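A hedged sketch of this architecture with the tf.keras API is given below. The layer sizes, activations, optimizer, and initial learning rate follow the description above; the input size of eight features, the MAE loss, the use of ReduceLROnPlateau as the learning-rate callback, and its factor and patience values are assumptions. Since the output activation is the hyperbolic tangent, the sketch also assumes that the target coordinates are scaled to the interval [-1, 1].

```python
import tensorflow as tf

N_FEATURES = 8  # assumption: P_i (3) + P_f (3) + T_f + T_n

model = tf.keras.Sequential([
    tf.keras.Input(shape=(N_FEATURES,)),
    tf.keras.layers.Dense(2000, activation="relu"),
    tf.keras.layers.Dense(400, activation="relu"),
    tf.keras.layers.Dense(15, activation="relu"),
    # tanh output: assumes targets were scaled to [-1, 1] beforehand
    tf.keras.layers.Dense(3, activation="tanh"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="mae",  # assumption: the text does not state the training loss
)

# Assumption: one possible callback that lowers the learning rate during training.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=5
)

# Training would then look like:
# model.fit(X_train, y_train, validation_split=0.1, callbacks=[reduce_lr])
```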
For more details, an explanation of all the parameters of the algorithms is provided in the Supplementary Materials.
The first four algorithms mentioned were used in conjunction with the MultiOutputRegressor, a component implemented by scikit-learn that allows multiple outputs in the models. With the exception of XGB, which proved to be deterministic, all the others were run ten times due to their stochastic characteristics.
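As an illustration of this setup, the sketch below wraps a RandomForestRegressor in a MultiOutputRegressor, using the two RF hyperparameters reported above. The toy data, n_estimators=20, and random_state=0 are assumptions made only to keep the example small and repeatable.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Toy observations: [P_i (x, y, z), P_f (x, y, z), T_f, T_n] -> intermediate (x, y, z).
X = [
    [float(s), 0.0, 0.0, float(s + 3), 0.0, 0.0, 3.0, float(n)]
    for s in range(30)
    for n in (1, 2)
]
y = [[float(s + n), 0.0, 0.0] for s in range(30) for n in (1, 2)]

base = RandomForestRegressor(
    n_estimators=20,      # assumption: small forest for a quick example
    max_features="sqrt",  # value reported in the text
    min_samples_split=6,  # value reported in the text
    random_state=0,       # assumption: fixed seed for repeatability
)
model = MultiOutputRegressor(base)  # fits one regressor per output axis
model.fit(X, y)

pred = model.predict(X[:2])
print(pred.shape)  # (2, 3)
```

MultiOutputRegressor trains an independent copy of the base estimator for each of the X, Y, and Z targets, so the same wrapper can be reused with the other regressors.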
Regarding the division of the data into training and test sets, a predefined number of records was used for training and a fixed number of 100,000 records, taken from the end of the dataset, was used for testing. Three metrics were collected for each combination of d, environment, and model: the coefficient of determination (R²), the MAE, and the root mean squared error (RMSE).
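The split and the three metrics can be sketched in pure Python as follows. The function names are illustrative, the metrics are shown for a single axis, and taking the training records from the beginning of the dataset is an assumption, since the text only states that the test records come from the end.

```python
import math
from typing import List, Sequence, Tuple


def split_tail(records: Sequence, n_train: int, n_test: int) -> Tuple[list, list]:
    """Use the last n_test records for testing and the first n_train for training."""
    if n_test <= 0 or n_train + n_test > len(records):
        raise ValueError("not enough records for a non-overlapping split")
    return list(records[:n_train]), list(records[-n_test:])


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train, test = split_tail(list(range(10)), n_train=6, n_test=3)
print(train, test)  # [0, 1, 2, 3, 4, 5] [7, 8, 9]

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))  # 0.5 1.0
print(round(r2(y_true, y_pred), 6))               # 0.2
```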
All the code needed to perform the experiments described in this article is available at https://github.com/ddrc1/indoors-prediction-JSAN (accessed on 30 March 2022).

Results
Figure 3 illustrates the results obtained in the first experiment, where the prediction models were executed with data from a real environment. It is possible to observe three sections of a route where LI does not deviate from obstacles, a behavior which was already expected. However, the ML models were able to follow curves, avoiding obstacles, thus answering RQ1. In other words, what is observed is the ability of ML models to learn where the obstacles are in the environment. This learning takes place without the use of any information from the plan of the environment, using only the coordinates of the tracked objects.
Figure 4 illustrates the variation of d = {x | x ∈ N, x ≤ 30}, simulating instances of low and high latency, by using a sample of 50,000 records. It is observed that there is a performance loss as d increases. This is due to the increase in the distance between the interpolated coordinates and, consequently, the greater difficulty of predicting the trajectory taken. However, even for d = 30, the ML models maintain R² > 0.8. This result answers RQ2, showing that it is possible to increase latency up to 30 times while maintaining good predictions. As for the LI method, the values of the metrics worsen considerably as the value of d is increased. This is fundamentally because the method assumes a straight-line path between the two interpolated points. As d increases, it becomes less likely that the path taken was a straight line.
Regarding Figure 4, it is observed that the higher the value of d, the higher the MAE and RMSE values. By definition, MAE and RMSE have the same value when the error is uniform for all examples in the test dataset [51]. In cases where the total error accumulates in a few instances of the test dataset, RMSE will increase further [52]. This is due to the RMSE being more sensitive to outliers. In the case of Environment 1, the peak RMSE value is approximately twice the peak MAE value (MAE ≈ 12 and RMSE ≈ 24). The proportional difference is smaller in the case of Environment 2, with MAE ≈ 28 and RMSE ≈ 48. The greater proportional difference in Environment 1 occurs because the prediction error is concentrated in fewer examples. The distribution of the coordinates tracked in the environments illustrated in Figure 1 helps to explain this greater concentration of error in a few trajectories. In Figure 1A, especially in the lower left corner of the second floor, there are fewer coordinates, which reduces the training data for trajectories in this region and consequently increases the error of the predictions for trajectories that take place in this space. In Figure 1B, the distribution of coordinates is more homogeneous in space, which minimizes the appearance of outliers.
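The relation between MAE and RMSE discussed in the analysis of Figure 4 can be checked with a small numeric sketch. The error values below are invented purely for illustration: both lists have the same MAE, but the RMSE grows when the same total error is concentrated in a single example.

```python
import math
from typing import Sequence


def mae_from_errors(errors: Sequence[float]) -> float:
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_from_errors(errors: Sequence[float]) -> float:
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


uniform = [2.0, 2.0, 2.0, 2.0]       # the same error in every example
concentrated = [0.0, 0.0, 0.0, 8.0]  # the same total error in one example

print(mae_from_errors(uniform), rmse_from_errors(uniform))            # 2.0 2.0
print(mae_from_errors(concentrated), rmse_from_errors(concentrated))  # 2.0 4.0
```

A ratio of RMSE to MAE close to 1 therefore suggests evenly spread errors, while a larger ratio suggests that a few trajectories account for most of the error.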
In Figure 5, the amount of training data was varied from 5000 to 170,000 examples in steps of 15,000. In general, there is a significant improvement in the prediction performance of the ML algorithms until a potential stabilization is reached. The ANN shows greater variation in the values of the metrics (see pink shading) and is also one of the models most sensitive to the amount of data: for MAE, in the three environments, it was one of the models with the worst results when trained with 5000 examples, but it outperformed other models when trained with 170,000 examples. This result answers RQ3.

Conclusions
The main contribution of this article was to solve the problem of network overload caused by a large number of IoT devices simultaneously sending coordinates to the cloud. By modeling this as an ML problem, it was shown that it is possible to predict, with good accuracy, the trajectories performed by objects tracked indoors, and that larger amounts of training data improve the performance of the models. The proposed modeling allows ML algorithms to predict trajectories that avoid obstacles or that pass through doors and corridors. It was also observed that this modeling allows for predictions of up to 30 intermediate coordinates of a trajectory with R² > 0.8.
These predictions raised the possibility of increasing the latency of collecting this data, with the paths taken being predicted on the server side. This would require data to be collected with minimal latency over a short period of time. The collected data can then be used to feed ML models, allowing them to learn how trajectories happen in the monitored environment. It has been shown that the algorithms learned the location of doors, walls, and obstacles, even without any access to the blueprint.
In regard to future work, we highlight the use of ANN architectures recently used in the literature for time series predictions, which can add to our results, especially those using the Attention mechanism [53] and the Transformer [54]. These architectures have excelled in handling sequential data, such as text (translation), audio (speech identification), and time series (prediction). We believe that, in the context of this research, these architectures can contribute to an even more significant improvement in the results obtained.
Supplementary Materials: The following are available at https://www.mdpi.com/article/10.3390/jsan11020029/s1, Table S1. These values correspond to the experiments presented in Figure 4 (Environment 1); Table S2. These values correspond to the experiments presented in Figure 4 (Environment 2); Table S3. These values correspond to the experiments presented in Figure 4 (Environment 3); Table S4. These values correspond to the experiments presented in Figure 4 (Environment 4); Table S5. These values correspond to the experiments presented in Figure 5 (Environment 1); Table S6. These values correspond to the experiments presented in Figure 5 (Environment 2); Table S7. These values correspond to the experiments presented in Figure 5 (Environment 3); Table S8. Tuned hyperparameters of the RandomForestRegressor component, implemented by scikit-learn; Table S9. Tuned hyperparameters of the AdaBoostRegressor component, implemented by scikit-learn (the hyperparameters used by DecisionTreeRegressor, also implemented by scikit-learn, are defined in Table S10); Table S10. Tuned hyperparameters of the DecisionTreeRegressor component, implemented by scikit-learn; Table S11. Tuned hyperparameters of the XGBRegressor component, implemented by xgboost; Table S12. Tuned hyperparameters of the HistGradientBoostingRegressor component, implemented by scikit-learn. Reference [55] is cited in the supplementary materials.

Conflicts of Interest:
The authors declare no conflict of interest.