Machine Learning Analytic-Based Two-Staged Data Management Framework for Internet of Things

In applications of the Internet of Things (IoT), where many devices are connected for a specific purpose, data is continuously collected, communicated, processed, and stored between the nodes. However, all connected nodes have strict constraints, such as battery usage, communication throughput, processing power, processing business, and storage limitations. The high number of constraints and nodes makes the standard methods to regulate them useless. Hence, using machine learning approaches to manage them better is attractive. In this study, a new framework for data management of IoT applications is designed and implemented. The framework is called MLADCF (Machine Learning Analytics-based Data Classification Framework). It is a two-stage framework that combines a regression model and a Hybrid Resource Constrained KNN (HRCKNN). It learns from the analytics of real scenarios of the IoT application. The description of the Framework parameters, the training procedure, and the application in real scenarios are detailed. MLADCF has shown proven efficiency by testing on four different datasets compared to existing approaches. Moreover, it reduced the global energy consumption of the network, leading to an extended battery life of the connected nodes.


Introduction
The Internet of things (IoT) connects a huge number of things to the internet. It comprises complex environments having heterogeneous components. This IoT environment generates enormous data and therefore imposes a demand for storage, processing, and transmission. Since IoT provides many applications using other technologies such as Fog, Edge, and Cloud to help us in our day-to-day life applications, which are not limited to our daily lives, however, they include other more important sectors such as remote patient monitoring, precision agriculture, environmental monitoring, disaster mitigation, and other smart city applications. We expect these applications will increase day by day without any limit. The only limitation we found is in the resources of IoT. The constraints in IoT resources pose many challenges at the network, hardware, and software levels. Since the applications are increasing, resource management at the different levels of IoT systems becomes necessary. These resources include battery life, size, processing power, storage, and bandwidth. Lightweight algorithms and protocols are implemented to acquire, process, and store the data due to the pervasive nature of some IoT applications. An optimized resource management framework is required to utilize such applications' resources efficiently. Due to the increased data generation by IoT devices, the demand for processing and storage also increases, necessitating including some advanced nodes in the IoT network in the form of edge devices, fog nodes, or smart gateways.
The resources in IoT are of two types, i.e., physical and virtual. The physical resources in IoT systems include processing, storage, energy, and bandwidth at the IoT device level and edge level. The virtual resources include algorithms and protocols used for data aggregation and processing at the IoT device level. Encryption and virtualization are added to the algorithms and protocols at the edge level. The development of more specific and lightweight protocols and algorithms is a key challenge in the resource-constraint applications of IoT. Data aggregation and resource management approaches must be refined to optimize such applications. Data management is a key factor that directly impacts an IoT system's resource management. Since the data size, structure, and process to communicate the data is a key resource, and all other resources depend on it. Virtualization can help to optimize the limited available resources to some extent. However, combining this with the classification of the data at the device level and proper resource allocation within the IoT network will help improve resource management.
The solution to today's resource-constrained IoT environment is not to upgrade resources. Therefore, data management is the key to optimizing these resources. Sensor data is unstructured and difficult to analyze and classify. Analyzing all scenarios for these sensors is another critical step. As part of this paper, we examined three scenarios for IoT data sensors. The IoT nodes/sensors in the device layer of the IoT environment are resource-constrained and hence data management at this level is a key attribute of our research work. The second key attribute is the selection of the right Machine Learning (ML) algorithm. There have been numerous solutions to manage data proposed by various researchers. However, none have explained the limitations of implementation at the device level. Nor have they analyzed different scenarios of the explosion of data at the device level.
The main contributions of the paper are: • Analyze and classify IoT data at device and edge levels; • Create data management framework for IoT edge-cloud architecture for resourceconstrained IoT applications; • Design and implement machine learning approach for resource optimization; • Compare the proposed work with the existing approaches.
In order to achieve our objectives, we first created an IoT environment with the help of six IoT devices. This is explained in Section 3.12. The classification of the IoT data on the basis of device configuration was the biggest challenge in this work. To achieve this, we categorize the dataset into two subsets. The first category of the dataset comes from the IoT sensors, and the second category is the primary data about the configuration of the IoT devices. This primary data helps us to create a regression model that sets the basis for data classification at the device level. With the help of this regression model, we designed a data management framework that works with the help of two algorithms at the device and edge level of the IoT environment. We have proposed a model capable of deciding whether to push the data to the above level (edge) or to process the data at the device level. As a result, the overall energy of the (the number of alive nodes) of the IoT network increased by 11.9 percent. This paper presents the two staged data management frameworks for the internet of things with main emphases on machine learning-based modeling and IoT data classification. The rest of the paper is organized as follows: Section 2 gives the background and literature survey of machine learning algorithms in IoT environment, Section 3 presents the proposed methodology (MLADCF), Section 4 gives the results for the proposed system, and finally, Section 5 concludes the paper.

Background and Related Work
The arrival of the Internet and technological advances in recent years have allowed us to be more and more connected with the environment surrounding us. The use of mobile phones and other smart devices has become part of our daily lives, giving rise to a connection between people and an interconnection that encompasses any element present in our environment. The Internet of Things [1], also known by its acronym IoT, arose from P. Larrañaga, C. Bielza, and J. Diaz-Rozo have also improved accuracy using clustering techniques. This experiment focused on enhancing the industrial environment and created datasets by simulating random values from a gaussian distributor mixture. The clustering technique is the most popular technique for creating clusters from random data sets, as seen in papers [10][11][12]. These researchers have used clustering techniques to improve innovative industries, IoT infrastructure, and Wireless Sensor Networks (WSN). When a researcher wants to visualize the data in more dimensions, the Support Vector Machine (SVM) proved to be the most efficient choice among all the techniques. In papers [13][14][15][16][17], the researchers have combined the SVM with other techniques, e.g., KNN, NB, Artificial Neural Network (ANN), etc., for better results. Disaster management is another critical aspect of smart cities. S. El-Tawil, Y. Na, A. Eltawil, and A. Ibrahim have worked on noise reduction in the data sets and have used SVM and KNN techniques. They have proposed the technique for disaster management and have shown the results in their research paper [13]. Ahamed et al. have used ANN, KNN, and SVM techniques for smart city disaster management [15]. The researchers are using combinations of different Machine Learning (ML) techniques to improve day-to-day life, which is technically called smart cities. In [18], Alfian et al. have proposed a methodology for food quality checking using Radio Frequency Identification (RFID). In this experiment, they used KNN, DT, NB, LR, and RF and showed the results of the proposed technique in their research paper. Following their experiment, the paper [19] also used these ML techniques to improve Smart Farming.  Machine Learning techniques have shown us how lives can also be saved by using algorithms efficiently. In paper [8], L. Bononi, F. Montori, and L. Bedogni have used Natural Language Processing (NLP) and other techniques for environmental monitoring. They have proposed a smart city model and discussed the need for machine learning techniques for a better tomorrow. In paper [9], Z. Fu, K. Shu, and H. Zhang have worked on improving accuracy and have used techniques like Linear Regression (LR), Naïve Bayes (NB), K-Nearest Neighbors (K-NN), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF). They have used the data sets of smartwatches used by athletes. The data collection was real-time with the help of 12 players playing ping pong. Similarly, in paper [10], P. Larrañaga, C. Bielza, and J. Diaz-Rozo have also improved accuracy using clustering techniques. This experiment focused on enhancing the industrial environment and created datasets by simulating random values from a gaussian distributor mixture. The clustering technique is the most popular technique for creating clusters from random data sets, as seen in papers [10][11][12]. These researchers have used clustering techniques to improve innovative industries, IoT infrastructure, and Wireless Sensor Networks (WSN). When a researcher wants to visualize the data in more dimensions, the Support Vector Machine (SVM) proved to be the most efficient choice among all the techniques. In papers [13][14][15][16][17], the researchers have combined the SVM with other techniques, e.g., KNN, NB, Artificial Neural Network (ANN), etc., for better results. Disaster management is another Deep Learning (DL) is called a subset of machine learning by many researchers. It plays a crucial role in today's world of data science. The use of this technique has proven to perform better for parameters like accuracy, F1 score, precision, etc. [20][21][22][23][24][25]. The DL technique is currently being used in many fields of data science. It has proven its worth by improving waste management, natural power resources, water quality, feature extraction, smart farming, smart energy, etc. In an IoT environment, resource optimization is very crucial. IoT resources are limited in terms of memory, energy, power/battery, processing power, and bandwidth. In [26], Lee et al. have proposed a technique for IoT infrastructure. They worked on clustering techniques for the problem of data classification. Han et al. have worked on data classification models and have proposed a model for the smart city [27]. A similar model was also proposed by Huang et al., where they used the Bayesian elimination method for data classification [28]. These machine learning algorithms are not limited to a particular domain; instead, they are also used for storage problems, data filtration, data recovery, compression, data accuracy, etc. These problems are also seen in an IoT environment due to the flow of endless and huge amounts of data. Zhang and Wang have proposed a technique for the storage problem in an IoT network. They have completed their experiment in Optimized Network Engineering Tools (OPNET) modeler, and the focus of their experiment was to optimize the IoT network [29].
In [30], Alimjan et al. proposed a hybrid classification technique for remote sensing data. In this paper, they have combined KNN and SVM techniques and given their technique the name 'Hybrid Classification approach' (SV-NN). They have also compared the results with other existing methods and got an accuracy of 91.18% and 94.62% for datasets 1 and 2, respectively. In [31], Muhammad Salad-ud-din et al. have worked on resource-constrained IoT devices for wireless multimedia of things. In this paper, they have shown an improvement in the number of alive nodes with the help of mitigating redundant transmission across the network. The summary of their article is the improvement in the overall energy of the IoT network. Table 1 presents a summary of the work done by other researchers.  In this paper, our focus is on resource optimization for a resource-constrained IoT environment. As we have seen, various hybrid techniques have been used by researchers to improve the system's overall energy in terms of accuracy, F1 score, precision, recall, etc. In this paper, we have proposed our framework for optimizing the IoT framework and have compared our technique with the existing techniques.

Materials and Methods
The proposed framework, "Machine Learning Analytics-based Data Classification Framework" (MLADCF), distributes the processing load to the last nodes of a digital network (sensors in the case of IoT). The use of computing type poses very attractive advantages for IoT solution providers. For example, they allow minimizing latency and preserving network bandwidth, operate reliably, speed up decision-making, capture and protect a large number and types of data, and transfer the data to the most appropriate place for processing, with better analysis of local data. Edge computing technologies have been on the rise for several years, but the reach of IoT technology is accelerating its take-off process. As for the factors driving this change, two stand out: Falling prices for peripheral devices with increasing processing power. Centralized infrastructures support the increasing workload. Our proposed framework MLADCF is a two-staged framework that combines the regression model and hybrid resource-constrained KNN (HRCKNN), giving the system a complete learning approach for future IoT environments. Table 2 presents the list of notations that are used in our proposed framework MLADCF. To implement our framework, let us first consider n number of IoT devices in an IoT environment. Let the matrix A m * n denotes the IoT devices and its Corresponding sensors, as shown below. where 'm' is the number of IoT devices and 'n' is the number of sensors included. Let I k be the element a ij of matrix A.
If a ij = 1 Then go to vector S α 0 Sensor not Present The S α below denotes the vector from the matrix A where β is the number of data Chunks/clusters generated by 'n' sensors. Let d j be the jth element of vector S α and each d j will be a column vector as where γ is the number of packets in the jth data cluster and d j ≤ S α {where d j is a subset of S α }. The equation will allow us to optimize the resources and classify the data packet for the edge node. From the above Equation (4), If Determinant of ∑ γ i=1 d i ≤ S k then the data will be processed at the device. This can be written as: which means the device is not capable of processing the data. Therefore, data will be offloaded and will be pushed to a higher level. Similarly, Equation (5) can also be written as , that means the device is fully capable of processing the data at the kth sensor and will not be forwarded for processing.
, that means the device is not capable of processing the data and data will be forwarded to the edge level.

Proposed Device Level Data Mangement
Thefirst step of our framework is defined in Algorithm 1, where the value of d i will be calculated and compared with the threshold value. The final architecture designed, implemented, and tested is outlined and described in Section 3.9 below. In the first place, the components involved will be specified, and then the three-layer functionality, in general, will be presented. IoT results from the convergence and evolution of ubiquitous or pervasive computing, internet protocols, sensing technologies, and embedded systems. These technologies form an ecosystem where the real and digital worlds are in continuous symbolic interaction.

Scenarios I, II & III
During our implementation, we come across three different scenarios. The first scenario, as shown in Figure 2a, is the scenario I, where data is less in the sense that the node is capable of processing and able to communicate the data without any resource constraints. As shown in Figure 2a, node5 and node8 have sensed the data and can process the data, hence forwarding the data to node7 and node9, respectively. Fog 1, the edge node, receives the sensed data from its nearest nodes, i.e., node2 and node10.

Scenarios I, II & III
During our implementation, we come across three different scenarios. The first scenario, as shown in Figure 2a, is the scenario I, where data is less in the sense that the node is capable of processing and able to communicate the data without any resource constraints. As shown in Figure 2a, node5 and node8 have sensed the data and can process the data, hence forwarding the data to node7 and node9, respectively. Fog 1, the edge node, receives the sensed data from its nearest nodes, i.e., node2 and node10. Fog 2, the edge node, receives the data from its nearby slave nodes. Figure 2b is an example of scenario II, where data is more than the scenario I, and the nodes are not capable of processing the data; therefore, the nodes will break the data into data chunks and push the data for processing the data at nearby nodes and from those nodes the data will be pushed to the fog from the slave nodes. As shown in Figure 2c, node 4 and node 8 are sensing the data but are not capable of processing the data, dividing the data into data chunks, and sending the data chunks to nearby nodes node1, node7 node5, node6 and node 9, node 3 respectively.
Scenario II leads us toward scenario III, which has two different nodes, i.e., homogeneous and heterogeneous nodes. Homogeneous nodes have the same capability in processing power, battery, memory, etc. In heterogeneous nodes, a cluster head has higher capability than sensing nodes. Therefore, we considered heterogeneous nodes to give practicality to our experiment. This scenario III leads us toward data classification, and our proposed framework and adaptive machine learning algorithm come into play. Classification of data is the base for our adaptive ML algorithm, where the IoT nodes will lead Fog 2, the edge node, receives the data from its nearby slave nodes. Figure 2b is an example of scenario II, where data is more than the scenario I, and the nodes are not capable of processing the data; therefore, the nodes will break the data into data chunks and push the data for processing the data at nearby nodes and from those nodes the data will be pushed to the fog from the slave nodes. As shown in Figure 2c, node 4 and node 8 are sensing the data but are not capable of processing the data, dividing the data into data chunks, and sending the data chunks to nearby nodes node1, node7 node5, node6 and node 9, node 3 respectively.
Scenario II leads us toward scenario III, which has two different nodes, i.e., homogeneous and heterogeneous nodes. Homogeneous nodes have the same capability in processing power, battery, memory, etc. In heterogeneous nodes, a cluster head has higher capability than sensing nodes. Therefore, we considered heterogeneous nodes to give practicality to our experiment. This scenario III leads us toward data classification, and our proposed framework and adaptive machine learning algorithm come into play. Classification of data is the base for our adaptive ML algorithm, where the IoT nodes will lead us to the IoT network, how to process, state, and communicate the data so that it can reach the fog level.

Building a Regression Model
Assume that P p is the processing power of an IoT device. As the processing power is directly proportional to the value of effectiveness, this can be written as where Z is the constant value and θ x is the value of effectiveness (VoE).
Let E e be the Energy/Power of an IoT device. And energy is directly proportional to the value of effectiveness. Then the VoE (θ e ) in terms of power/energy can be written as where Y is the constant value. The combination of Equations (8) and (9) can be written as where t 1 is the constant. Let S s be the storage capacity of the Sth IoT device. As the storage is directly proportional to the VoE, then its equation can be written as where X is the constant value derived from the dataset. The combination of Equations (8) and (11) can be written as where, t o is the constant value. The final value of Effectiveness θ f can be written by combining Equation (8), (9) and (11).
where, µ is the constant value. More the value of θ f better will be the option of choosing it for the data processing. Let us assume that the variable d I has the following relationship between Processing Power (P p ), Energy/Battery (E e ) and Storage (S s ).
where d i is the random variable and is the deciding factor in our model for device level. T 0 , T 1 , T 2 , T 3 is the random error coefficient and 'r' being the random error generated. It is understood that the error caused by other random factors cannot be interpreted by processing power (P p ), Energy/Battery (E e ) and Storage (S s ) in d i . Hence, we are adopting

Training Data for the Model
Datasets are normally divided into two subsets, training data and testing data. Typically, training and test datasets are divided into 80:20, 70:30, or 90:10 ratios respectively. The foundation of machine learning is solid training data. The training data is the data that we use to train a model. Knowing the importance of suitable training data for machine learning is vital since it ensures that we have the right kind and amount of data to build a model. To test our model, we need unseen data. We can use this data, which is referred to as testing data, to assess the effectiveness and development of the training of our al-gorithms/models. We can also modify or improve the model for better outcomes. The quality of our machine learning model will be directly proportional to the quality of the data. For this reason, cleaning, debugging, or "data wrangling" tasks consume a significant percentage of data scientists' time. In artificial intelligence or machine learning, training data is inevitable. This process makes machine learning modules accurate, efficient, and fully functional. In this paper, we have included AI training data in terms of training data quality, data collection and licensing, and more. The training data is an important part of the machine learning algorithm. We have trained our algorithm with the help of training data set. We have divided our data set from scenario I, scenario II and scenario III into an 80:20 ratio for training and testing our model. A division of the dataset was made according to time. Training data is required for model development because without training data; the machines would not even know what to understand in the first place. Like an individual trained for his job, a machine needs a body of information to fulfill a specific purpose and deliver corresponding results.

Hybrid Resource Constrained K-Nearest Neighbour Classifier (HRCKNN): Stage 2
The K-Nearest Neighbors (K-NN) algorithm is a classification algorithm that labels each data item based on the label of the data closest to it [68]. For this, the variable k is defined, corresponding to the number of closest neighbors chosen to carry out the classification. Unlike other supervised learning algorithms, K-NN does not learn with the training data, but rather the learning occurs with the test data once the training data has been memorized. These algorithms are known as lazy algorithms, and they allow several problems to be solved simultaneously since the objective function is approximated locally for each element. The algorithm does not learn from a model but instead uses the data to generate an answer only when a prediction is requested. As a disadvantage, the computational cost is very high due to all the training data storage. Therefore, we have proposed a hybrid technique that uses our mathematical model and combines it with the traditional K-NN technique. This hybrid technique is the input for the second algorithm in our proposed framework MLADCF.
In this paper, we have proposed a hybrid K-NN classification approach, namely HRCKNN and MLADCF, which work simultaneously in an IoT environment. This double approach is a solution to the resource-constrained IoT environment. HRCKNN works with our proposed mathematical model MLADCF to tackle the problem of resource-constrained IoT networks. The HRCKNN is the beginning of our proposed model, as we have further classified the data at the edge/fog level. The overview of the HRCKNN is shown in Figure 3.
HRCKNN works in two phases. HRCKNN first trains with the training data and helps the network tackle the problem of resource constraints. The first phase identifies the IoT devices at the device level, which can process the data. This helps the HRCKNN inform the groups of the IoT nodes.
To understand HRCKNN, let us consider the following assumptions. Let 'v' is the query vector. F is the training feature vector where F = ( f , f 2 , f 3 , f 4 . . . . . . . . . . . .. f n ), X is the set of labels with respect to F. C j is the class where j ∈ {1,2,3 . . . . . . . K}. The size of the neighborhood is denoted by λ. Φ denotes the Euclidean distance (ED). Moreover, is the Euclidean distance between the query vector and nearest neighbor. In the first step, a local environment is created by HRCKNN.
In Equation (17), the inclusion of the training set can be written as  HRCKNN works in two phases. HRCKNN first trains with the training data and helps the network tackle the problem of resource constraints. The first phase identifies the IoT devices at the device level, which can process the data. This helps the HRCKNN inform the groups of the IoT nodes.
To understand HRCKNN, let us consider the following assumptions. Let 'v' is the query vector. F is the training feature vector where F = ( , 2 , 3 , 4 …………. n ), X is the set of labels with respect to F. is the class where j ∈ {1,2,3……. K}. The size of the neighborhood is denoted by λ. Φ denotes the Euclidean distance (ED). Moreover, is the Euclidean distance between the query vector and nearest neighbor. In the first step, a local environment is created by HRCKNN.
In Equation (17), the inclusion of the training set can be written as The HRCKNN can calculate the distance between f and ′′ .
where ′ = λ + 1 is the new training set ′′ .The second step of HRCKNN calculates the distances for all the classes, i.e., k local hyperplane. The relative transformation is also adopted to construct the relative space in Equation (19). In the final step, the HRCKNN calculated the distance of k local hyperplane. A local hyperplane is constructed as follows where Φ( (v), v) is the local hyperplane distance, and the solution of the below Equation (21) will give the value of α t . The HRCKNN can calculate the distance between f and F .
where n = λk + 1 is the new training set F". The second step of HRCKNN calculates the distances for all the classes, i.e., k local hyperplane. The relative transformation is also adopted to construct the relative space in Equation (19). In the final step, the HRCKNN calculated the distance of k local hyperplane. A local hyperplane is constructed as follows where Φ(H λ j (v), v) is the local hyperplane distance, and the solution of the below Equation (21) will give the value of α t . U The above equation can also be written as follows where A = (α 1 , α 2 , α 3 . . . . . . . . . α λ ) and the composed matrix of vector f t -f is U, which is an n × λ matrix. The HRCKNN adopts the mathematical model to summarize the local hyperplane distances. Hence creating the groups of the nodes that equally fall in the category of resources constrained and vice versa.

Proposed Cluster Head Data Management
The proposed cluster head algorithm's working is defined in Algorithm 2. This algorithm helped to process the device-level data by considering the edge and source nodes. The algorithm first fetches the timely data from the nodes (line no. 1). After that, The effectiveness of the IoT device (line no. 2) and the size of packets (line no. 3) has been calculated. Once the data packet reached its higher limit then the source node is selected (lines no. [4][5]. Furthermore, the source node sends the data to edge nodes by diving it into equal size chunks via slave nodes (lines no. [6][7][8][9][10][11][12][13][14][15]. After that, data is deleted from source and slave nodes and processing of a collection of data has performed at edge (lines no. [16][17]. At last, the source node data is maintained at the edge node by accumulating the chunks of data (lines no. [18][19].

Proposed Cluster Head Data Management
The proposed cluster head algorithm's working is defined in Algorithm 2. This algorithm helped to process the device-level data by considering the edge and source nodes. The algorithm first fetches the timely data from the nodes (line no. 1). After that, The effectiveness of the IoT device (line no. 2) and the size of packets (line no. 3) has been calculated. Once the data packet reached its higher limit then the source node is selected (lines no. [4][5]. Furthermore, the source node sends the data to edge nodes by diving it into equal size chunks via slave nodes (lines no. [6][7][8][9][10][11][12][13][14][15]. After that, data is deleted from source and slave nodes and processing of a collection of data has performed at edge (lines no. [16][17]. At last, the source node data is maintained at the edge node by accumulating the chunks of data (lines no. [18][19].

Algorithm 2: Cluster Head Algorithm
Input: α t and the value of for each source node and edge node. Output: The processed data of device level. 17. The further processing of data is performed on the ; 18. The chunks of data received from are accumulated on the ; 19. Data is stored for each SRC i and maintained on the edge node, .

Feature Extraction
Before applying any machine learning technique, a fundamental step is selecting the characteristics or attributes that will be used for the training and subsequent application of the intelligence system. The dimension of the data and the importance of its reduction is an important factor to consider today for the storage and processing of the same since it makes training slow. However, many attributes can lead the algorithm to find the best solution. Different techniques allow us to extract and project toward a new data set with fewer attributes and thus obtain metrics very similar to the original set but with a lower cost in terms of computation and storage. It is important to remember that we will lose data quality, although the training is faster in some cases. It is possible that we will not obtain the same precision.
For this reason, it is necessary to study how these techniques affect the different algorithms and see whether it is worth applying a reduction in the number of attributes. In addition, these techniques increase speed, reduce storage space, and allow better visualization of the data. For example, if we reduce to two or three characteristics, we could visualize them in a graph, which would help us better understand how they are distributed.
One of these techniques is feature extraction, which seeks to reduce this data but maintain attributes containing the most relevant information. PCA (Principal Component Analysis) is a well-known used and implemented in the sci-kit-learn package. This linear data transformation technique helps identify patterns based on the correlation between attributes. PCA is responsible for finding the hyperplane closest to the data and projecting them into a new space smaller than the original. The orthogonal axes, which are the main components where these new data are projected, of the new subspace can be interpreted as the directions of the maximum variance, knowing that the new set of features is orthogonal to each other. Vector PCA wants to project a vector with one dimension onto another with a lower dimension. Thus, in the new vector of the searched subspace, the first component will be that attribute with the greatest possible variance, considering that there is no correlation between them; that is, those components that remain in the new are orthogonal to each other. It is appropriate to indicate that PCA is sensitive to the distribution of the data and the scale at which they are found. We will consider this when we analyze how it is possible to reduce the dimension of the data set. We first must have all the data on the same scale to give all the features the same importance and then apply this reduction technique.

System Model
The MLADCF has three layers, i.e., Device layer, Edge-Fog layer, and Cloud layer. The device layer is placed at the bottom of the MLADCF. The Edge-Fog layer is in the middle, and finally, the Cloud layer is placed at the top of the MLADCF. The IoT environment is squeezed in the above 3-level framework, as shown in Figure 4. The device layer at the bottom of the MLADCF is the densest layer among the three layers. Usually, millions of active nodes present are sensing data from the environment.
The nodes in this layer vary in different aspects. There are different sensors in today's world, such as video sensors, motion detectors, smoke detectors, humidity sensors, temperature sensors, proximity sensors, pressure sensors, accelerometers, level sensors, infrared sensors, gas sensors, optical sensors, etc. All these nodes collect data of different data types. In order to perform well in an environment, every datatype needs additional data storage, processing power, bandwidth, power/energy, etc. These device layer nodes have been categorized into the following three categories.

•
Device layer nodes have the capability of processing the data; • Nodes that cannot process the data; • Nodes used as routers/repeaters.
The nodes present at the device level collect the data from the environment. These nodes can be of different types, such as capturing video data, audio data, textual data, etc. The cluster heads connect the data from these nodes. Traditionally the sensed data is captured/sensed followed by the filtration or compression process, and then forwarded to the next level. However, our MLADCF is designed to classify the data at the device level based on the parameters i.e., storage, processing power, and battery/power. The MLADCF will classify the data as the device level so that if a particular data packet can be processed at the device node, it will not be pushed to the Edge/Fog level. Moreover, if the data packet cannot be processed at the device level, it will be pushed to the Edge/Fog level. The middle layer is known as the Fog/Edge layer. This layer contains the Fog nodes. These nodes are lesser in number in comparison to the device level. This layer has better resources in comparison to the device level. These nodes have sufficient battery/power, storage capacity, and processing power for processing the data coming from the device level. However, these nodes also vary in terms of storage, processing power, battery, etc. Likewise, device level here, some nodes can process the data, and some are not capable of processing a huge amount of data coming from the millions of sensors at the device level. Finally, there is the cloud level; this layer has all the required resources for data processing, data analytics, big data, decision-making, etc. Cloud provides services to various levels of the IoT environment.
Sensors 2023, 23, x FOR PEER REVIEW 16 of 31 captured/sensed followed by the filtration or compression process, and then forwarded to the next level. However, our MLADCF is designed to classify the data at the device level based on the parameters i.e., storage, processing power, and battery/power. The MLADCF will classify the data as the device level so that if a particular data packet can be processed at the device node, it will not be pushed to the Edge/Fog level. Moreover, if the data packet cannot be processed at the device level, it will be pushed to the Edge/Fog level. The middle layer is known as the Fog/Edge layer. This layer contains the Fog nodes. These nodes are lesser in number in comparison to the device level. This layer has better resources in comparison to the device level. These nodes have sufficient battery/power, storage capacity, and processing power for processing the data coming from the device level. However, these nodes also vary in terms of storage, processing power, battery, etc. Likewise, device level here, some nodes can process the data, and some are not capable of processing a huge amount of data coming from the millions of sensors at the device level. Finally, there is the cloud level; this layer has all the required resources for data processing, data analytics, big data, decision-making, etc. Cloud provides services to various levels of the IoT environment. Our MLADCF is designed so that the data will be classified and the nodes capable of processing the data will process the data packets, and the rest of the data will be pushed to the upper layer. This method will be incorporated into the device level and the edge/fog layer. This data management method will minimize the delay and utilize the resources efficiently. Furthermore, it will optimize the resource utilization for the coming IoT infrastructure in the future. This can be achieved when we will fix a threshold of the data, also known as the value of effectiveness. The final value of effectiveness θ f is given by Equation (23).
where, µ is the constant value. More the value of θ f better will be the option of choosing it for the data processing. This paradigm provides the means by allowing data to be obtained from billions of devices that can sense, send, and make decisions for the problems identified in the IoT environment. From the state of variables, the structured, unstructured, or semi-structured records can be generated, which have the potential to generate changes from the information and knowledge that can be obtained when the processing mention the challenge to achieving the above due to the heterogeneity and discovery of the sensors, for which they propose solutions such as the creation of a middleware, that allows the connection between the sensors and the cloud to be made transparently.

Proposed Edge Level Storage
Edge computing technology also arrives at artificial intelligence on devices much more feasible. It allows companies to leverage their data series in real-time rather than working with terabytes of data in central repositories in the world real-time cloud. In the next few years or decades, the technology may evolve to find a balance point between the cloud and more powerful distributed edge devices. Software vendors develop specific, more robust, and secure infrastructures and security solutions. Providers will begin to incorporate security solutions for peripheral components into their current service offering to prevent data loss, provide network health diagnostics, and protect against threats. The edge level storage is explained in Algorithm 3. Every there is a cluster of available nearby fog nodes, ; to act as service nodes, ; 8.
N nodes accept the request of SRCi to act as slave nodes, Si1, Si2 … . ; 10.
Sin selected by SRC based on Availability quotient, A where: A = f ( , );
Each of N selected Sin sends the data to the cloud; 14.
Data is collected by cloud and acknowledgment is sent to ; 15. end for; 16. Data is deleted on and ; 17. The further processing of data is performed on the cloud.

Stages in MLADCF
Our proposed framework consists of five stages, as shown in Figure 5. The first stage has n number of sensor nodes present. These nodes senses data from the environment and stores the raw data in the device storage. The cluster head, namely CH as shown in Figure 5 receives the data from these sensors. The working of HRCKNN starts from this stage. The HRCKNN is a combination of our proposed mathematical model and the traditional technique of K-NN. The working of the proposed Algorithm 2 also starts from this stage 1. The proposed algorithm decides whether to push the data to stage 2 or should process the data at stage 1, i.e., the device layer.

Stages in MLADCF
Our proposed framework consists of five stages, as shown in Figure 5. The first stage has n number of sensor nodes present. These nodes senses data from the environment and stores the raw data in the device storage. The cluster head, namely CH 1 as shown in Figure 5 receives the data from these sensors. The working of HRCKNN starts from this stage. The HRCKNN is a combination of our proposed mathematical model and the traditional technique of K-NN. The working of the proposed Algorithm 2 also starts from this stage 1. The proposed algorithm decides whether to push the data to stage 2 or should process the data at stage 1, i.e., the device layer. has n number of sensor nodes present. These nodes senses data from the environment and stores the raw data in the device storage. The cluster head, namely CH 1 as shown in Figure 5 receives the data from these sensors. The working of HRCKNN starts from this stage. The HRCKNN is a combination of our proposed mathematical model and the traditional technique of K-NN. The working of the proposed Algorithm 2 also starts from this stage 1. The proposed algorithm decides whether to push the data to stage 2 or should process the data at stage 1, i.e., the device layer. As the data is pushed to stage 3, which is the fog/edge layer. The fog/edge layer consisted of n number of fog nodes. The proposed Algorithm 3 works at the fog server node and decides whether to push the data to the cloud or to process in this stage.

Experimantal Setup
The first step involves setting up an IoT environment to collect real-time data in various scenarios. We created a wireless sensor network of several sensors and a gateway, as shown in Figure 6a-c. For this IoT environment, we have deployed six sensors in a particular area of agricultural land. Each node has an Magnetostrictive Linear Position Sensors (MTS 420) censor board [69]. This WSN comprises a gateway responsible for communication with the surrounding and distributed sensors using the Zigbee module [70]. The IRIS [71] is programmed according to the censor board. MIB520 [72] is used for the gateway. The communication with the computer is done by the IRIS, which is programmed for the gateway to act as a communicator. We have deployed six sensors, which could cover the full agricultural field. Mote-config is an application used to program the IRIS and MIB 520. As the data is pushed to stage 3, which is the fog/edge layer. The fog/edge layer consisted of n number of fog nodes. The proposed Algorithm 3 works at the fog server node and decides whether to push the data to the cloud or to process in this stage.

Experimantal Setup
The first step involves setting up an IoT environment to collect real-time data in various scenarios. We created a wireless sensor network of several sensors and a gateway, as shown in Figure 6a-c. For this IoT environment, we have deployed six sensors in a particular area of agricultural land. Each node has an Magnetostrictive Linear Position Sensors (MTS 420) censor board [69]. This WSN comprises a gateway responsible for communication with the surrounding and distributed sensors using the Zigbee module [70]. The IRIS [71] is programmed according to the censor board. MIB520 [72] is used for the gateway. The communication with the computer is done by the IRIS, which is programmed for the gateway to act as a communicator. We have deployed six sensors, which could cover the full agricultural field. Mote-config is an application used to program the IRIS and MIB 520.
This mote-config works in the windows operating system and configures the sensors. It also provides a user-friendly interface and allows the user to configure the node ID, RF power, RF channel, and group ID. The nodes can further be enabled over the setup feature on all X mesh-based firmware [73]. High-power and low-power X mesh applications are available for each sensor board. To upload firmware, a program tab is used by using a gateway. The setup of the hardware is as per the below protocol.

•
An ethernet port or USB should connect the gateway and computer; • The gateway should be attached to the IoT motes; • The programming. Next should be done while keeping the motes in power-off mode.
For all other motes, the XMESH file must be uploaded over it. Moreover, for MTS 420 sensor node, the IRS needs to be programmed for the censor board. After programming all the sensors, the sensors were deployed carefully into the agriculture field and connected to the gateway successfully. The topology of the sensors is shown in Figure 2a-c in the above Section 3.3. The advantage of the scenarios is that it covers all the possibilities that could happen during data collection in the smart agricultural activity. The scenarios would cover all the possibilities and failures of the motes if any of the IoT motes were disconnected due to the battery drain. Therefore, the WSN deployed is highly reliable and scalable. The IoT environment is shown in Figure 6a-c has the following sensors in each IoT node that is a temperature sensor, humidity sensor, light sensor, voltage, precipitation, node ID, and location X, Y. The data rate of an IoT node shown in Figure 6c is 250 KBPS and has a range of 500 m: the range and data rate were enough for experimenting in this agriculture field. Postgress database was used for logging data into the sensors' database. To analyze this data and find a prediction model, this data was extracted into a CSV file. The next step involves the cleaning of data in this CSV file. The. CSV file contains anomalous values and redundant values. Therefore, data pre-processing is needed to clean the data. The CSV file also contains high-pitch values and extra columns, which can fail to find the patterns and need to be cleaned. This mote-config works in the windows operating system and configures the senso It also provides a user-friendly interface and allows the user to configure the node ID, power, RF channel, and group ID. The nodes can further be enabled over the setup featu on all X mesh-based firmware [73]. High-power and low-power X mesh applications a available for each sensor board. To upload firmware, a program tab is used by using gateway. The setup of the hardware is as per the below protocol.

•
An ethernet port or USB should connect the gateway and computer; • The gateway should be attached to the IoT motes; • The programming. Next should be done while keeping the motes in power-off mod

Datasets
We have calculated the performance of our model with the help of four data sets that were captured in real-time by creating an IoT environment. The first data set was captured during the daytime with the full battery capacity of the IoT devices. The IoT nodes/sensors were placed in an orchard area for a day. The topology of the sensors is shown in Figure 2a as scenario I. The results for the first data set are shown in Table 3. The second data set was captured by keeping the same IoT sensors. The battery was unchanged during the process, and the storage was also unchanged. While capturing the data, some IoT notes ran out of battery and stopped working. Hence this gives us important information about the practicality of IoT sensors, as these types of situations may arise in an IoT environment. The second data set comprises the data for a lesser number of nodes in comparison with dataset 1. As the nodes ran out of power, the rest of the nodes continued to sense the data from the environment. The results of the second dataset are shown in Table 4. The 3rd dataset falls under the category of scenario II. For this scenario, we replaced the batteries of the IoT sensors with new batteries. In this scenario, we placed IoT sensors so that all the sensors were communicating with each other. The results of the third data set are shown in Table 5. The fourth data set was captured by keeping the sensors as in scenario III, as shown in Figure 2c. In this scenario, all the sensors communicate with a cluster head, giving us the new data set for the experiment. For this data set, the results are shown in Table 6.

Observations
During the data collection in an IoT environment that we have created, it was observed that almost 80% of the energy was consumed during the communication of the data in an IoT network. At the same time, only 20% is utilized for all other activities, including Sensing, processing, storing, etc. Figure 7a shows that the battery took only half an hour, from 70 percent to almost 0 percent. Figure 7b shows that it took nearly 4 h to get the battery from 100 percent to almost 0 percent.

HRCKNN
0.40 82 84 84 92 The fourth data set was captured by keeping the sensors as in scenario III, as shown in Figure 2c. In this scenario, all the sensors communicate with a cluster head, giving us the new data set for the experiment. For this data set, the results are shown in Table 6.

Observations
During the data collection in an IoT environment that we have created, it was observed that almost 80% of the energy was consumed during the communication of the data in an IoT network. At the same time, only 20% is utilized for all other activities, including Sensing, processing, storing, etc. Figure 7a shows that the battery took only half an hour, from 70 percent to almost 0 percent. Figure 7b shows that it took nearly 4 h to get the battery from 100 percent to almost 0 percent. If the distance between the communicating nodes is small, then the energy consumed in data communication is also less. However, if the distance between the communicating nodes is large, then the energy consumed by the nodes is also higher. As in the second scenario, the communication between the nodes was very low compared to the other If the distance between the communicating nodes is small, then the energy consumed in data communication is also less. However, if the distance between the communicating nodes is large, then the energy consumed by the nodes is also higher. As in the second scenario, the communication between the nodes was very low compared to the other scenarios where communication between the nearby nodes was too high. Therefore, triggering the need for an adaptive network.

Metrics for Evaluation of Model Performance
After adjusting the learning algorithm to perform the task, we must measure its efficiency, that is, try to extract some measure that informs us of how well (or poorly) it is doing. As in the cases of supervised and unsupervised learning, the objectives sought are very different; the efficiency of some or other algorithms is also usually defined in very different ways.
The case of supervised learning is the most natural and usual. In this case, we have a set of initial examples on which we perform the learning and from which we know the desired result that our algorithm must return. We want to see if the machine is able, from the trained examples, to generalize the learned behavior so that it is good enough on data not seen a priori, and if so, we say that the machine (model, algorithm) generalizes correctly. Since supervised learning algorithms learn from this data to adjust their internal parameters and return the correct answer, there is little point in measuring the machine's efficiency by bypassing the same data back to it since the information it would give us would be misleadingly optimistic. In this paper, we have evaluated the first stage of our framework with the parameters shown in Appendix A.

Performance Comparison of the Proposed Model
The selection of the algorithms for comparing our proposed algorithm or model is very important. Therefore, we have selected the algorithms commonly used for Data classification in an IoT environment. Therefore, we have selected Logistic Regression (LR), Naive bias (NB), K-Nearest Neighbor (KNN), Decision Trees (DT), Random Forest (RF), and Support Vector Machines (SVM). We have used spyder IDE, DL Libraries, and leveraged machines for the model development and packages in Keras, TensorFlow, and scikit-learn. The experiment was done on a windows-based operating system running python, with an intel i7 processor, 16 GB of RAM, and 1 TB of secondary storage. We have used four data sets. The results for all the datasets are shown below from Tables 3-6. Furthermore, the graphs are plotted for the same as shown in Figure 8a-h.
During the first experiment of DS1 (Data set first), it was observed that NB was the best in the execution time but lacked accuracy. The HRCKNN was average in execution time, but the accuracy of HRCKNN was the best among all the machine learning algorithms. The accuracy was 85 percent, followed by the SVM, which showed the same accuracy. Nevertheless, the SVM lacks execution time badly, i.e., 9.9 s which was the worst among all the algorithms. Table 4 shows the results of the second data set. The execution time was the least in our proposed model at 0.004 s, and it was seen as worst in the case of SVM. The KNN was comparatively better in terms of precision, recall, F1 score, and accuracy. However, its execution time was not good in comparison with HRCKNN. HRCKNN and KNN produced an accuracy of 98 percent. Table 5 used data set 3, and Naive Bias was the fastest among all the algorithms but was worst in accuracy. Similarly, SVM produces an accuracy of 85 percent but was the slowest and took 9.9 s in execution. However, HRCKNN proved to be the best in terms of accuracy, i.e., 85 percent, and was near the best execution time, i.e., 0.036 s.
In Table 6, Naive bias has the best execution time but lacks the accuracy of the data, i.e., only 14 percent. Here HRCKNN has the best accuracy among all the algorithms.
The graphs between the execution time and accuracy are plotted and shown from Figure 8a-d. Moreover, the graphs between accuracy and sensitivity are plotted and shown in Figure 8e-h.
In [30], Alimjan et al. proposed a hybrid classification technique by combining SVM and KNN. In their experiment, they used two data sets and got an accuracy of 91.18% and 94.62%, respectively. In our experiment, we have used four data sets, and the two best results for our experiments accuracy values were 98% and 92%. scenarios where communication between the nearby nodes was too high. Therefore, triggering the need for an adaptive network.

Metrics for Evaluation of Model Performance
After adjusting the learning algorithm to perform the task, we must measure its efficiency, that is, try to extract some measure that informs us of how well (or poorly) it is doing. As in the cases of supervised and unsupervised learning, the objectives sought are very different; the efficiency of some or other algorithms is also usually defined in very different ways.
The case of supervised learning is the most natural and usual. In this case, we have a set of initial examples on which we perform the learning and from which we know the desired result that our algorithm must return. We want to see if the machine is able, from the trained examples, to generalize the learned behavior so that it is good enough on data not seen a priori, and if so, we say that the machine (model, algorithm) generalizes correctly. Since supervised learning algorithms learn from this data to adjust their internal parameters and return the correct answer, there is little point in measuring the machine's efficiency by bypassing the same data back to it since the information it would give us would be misleadingly optimistic. In this paper, we have evaluated the first stage of our framework with the parameters shown in Appendix A.

Performance Comparison of the Proposed Model
The selection of the algorithms for comparing our proposed algorithm or model is very important. Therefore, we have selected the algorithms commonly used for Data classification in an IoT environment. Therefore, we have selected Logistic Regression (LR), Naive bias (NB), K-Nearest Neighbor (KNN), Decision Trees (DT), Random Forest (RF), and Support Vector Machines (SVM). We have used spyder IDE, DL Libraries, and leveraged machines for the model development and packages in Keras, TensorFlow, and scikitlearn. The experiment was done on a windows-based operating system running python, with an intel i7 processor, 16 GB of RAM, and 1 TB of secondary storage. We have used four data sets. The results for all the datasets are shown below from Table 3 to Table 6. Furthermore, the graphs are plotted for the same as shown in Figure 8a   During the first experiment of DS1 (Data set first), it was observed that NB was the best in the execution time but lacked accuracy. The HRCKNN was average in execution time, but the accuracy of HRCKNN was the best among all the machine learning algorithms. The accuracy was 85 percent, followed by the SVM, which showed the same accuracy. Nevertheless, the SVM lacks execution time badly, i.e., 9.9 s which was the worst among all the algorithms. Table 4 shows the results of the second data set. The execution time was the least in our proposed model at 0.004 s, and it was seen as worst in the case of SVM. The KNN was comparatively better in terms of precision, recall, F1 score, and accuracy. However, its execution time was not good in comparison with HRCKNN. HRCKNN and KNN produced an accuracy of 98 percent.

Simulation Parameters
We have tested the MLADCF through simulation, deploying IoT sensors, and creating an IoT environment for live data capturing. The simulation parameters are shown in Table 7.

MLADCF Results for Energy Optimization of the Network
We have included energy, Alive nodes, processing time, and storage to evaluate the performance of our MLADCF, and we have shown the results in the form of graphs. The node's energy during the simulation process is shown in Figure 9. The energy of the node is calculated by equation A5 and equation A6. The results show that the node's energy is better in our MLADCF. In the case of a traditional IoT environment, it was observed that the energy was getting exhausted during the process in comparison to our MLADCF. The result shown in Figure 9 shows an improvement in the node's energy by 11.9%. We have taken all the scenarios in our IoT environment and compared the scenarios causing the energy drop in an IoT node. We have observed an 80% energy drop in scenarios where nodes communicate more than transmitting the data. As every node has a major impact on the overall IoT environment, each node is responsible for the life of the network. If a node is overloaded with incoming data traffic, its energy affects the overall network. The observation from Figure 10 leads us toward the results of our MLADCF. In Figure 10, we can see that the number of alive nodes is more at all the points than in the traditional framework, hence improving the network's life. Moreover, the improvement was calculated to be 24 percent. The increased data traffic from the millions of sensors required more storage capacity. In our proposed frame MLADCF, we have considered the storage problem, and it is observed from Figure 11 that as the nodes start capturing data, the storage used is further As every node has a major impact on the overall IoT environment, each node is responsible for the life of the network. If a node is overloaded with incoming data traffic, its energy affects the overall network. The observation from Figure 10 leads us toward the results of our MLADCF. In Figure 10, we can see that the number of alive nodes is more at all the points than in the traditional framework, hence improving the network's life. Moreover, the improvement was calculated to be 24 percent. As every node has a major impact on the overall IoT environment, each node is responsible for the life of the network. If a node is overloaded with incoming data traffic, its energy affects the overall network. The observation from Figure 10 leads us toward the results of our MLADCF. In Figure 10, we can see that the number of alive nodes is more at all the points than in the traditional framework, hence improving the network's life. Moreover, the improvement was calculated to be 24 percent. The increased data traffic from the millions of sensors required more storage capacity. In our proposed frame MLADCF, we have considered the storage problem, and it is observed from Figure 11 that as the nodes start capturing data, the storage used is further reduced in our MLADCF. The number of secondary nodes for the data processing is in-  The increased data traffic from the millions of sensors required more storage capacity. In our proposed frame MLADCF, we have considered the storage problem, and it is observed from Figure 11 that as the nodes start capturing data, the storage used is further reduced in our MLADCF. The number of secondary nodes for the data processing is increased. The results for the processing time are shown in Figure 12. It is observed from the graph that the processing time decreases if the number of nodes is increased.

Conclusions
The optimization of the resources is important in resource-constrained IoT applications. Implementation of machine learning approaches to meet the expectations of the application is required. Offloading data from IoT devices through the IoT network to the nearest fog node is used in many applications, but the data offloading depends on many factors to optimize resources. The optimization is possible through mechanisms at different layers of the IoT environment. This paper discusses the layered approach for data classification, and we have focused on the overall energy of the IoT network. Based on the device capability, it is decided where to process, store, and communicate the sensed data within the permissible limits of delay in real-time IoT applications. In the proposed framework for the optimization of the resources, the machine learning algorithm is used to decide at which layer of the network and device level within the IoT network, i.e., IoT device, node, cluster node, or fog node, the processing, storage, communication of the data is to

Conclusions
The optimization of the resources is important in resource-constrained IoT applications. Implementation of machine learning approaches to meet the expectations of the application is required. Offloading data from IoT devices through the IoT network to the nearest fog node is used in many applications, but the data offloading depends on many factors to optimize resources. The optimization is possible through mechanisms at different layers of the IoT environment. This paper discusses the layered approach for data classification, and we have focused on the overall energy of the IoT network. Based on the device capability, it is decided where to process, store, and communicate the sensed data within the permissible limits of delay in real-time IoT applications. In the proposed framework for the optimization of the resources, the machine learning algorithm is used to decide at which layer of the network and device level within the IoT network, i.e., IoT device, node, cluster node, or fog node, the processing, storage, communication of the data is to

Conclusions
The optimization of the resources is important in resource-constrained IoT applications. Implementation of machine learning approaches to meet the expectations of the application is required. Offloading data from IoT devices through the IoT network to the nearest fog node is used in many applications, but the data offloading depends on many factors to optimize resources. The optimization is possible through mechanisms at different layers of the IoT environment. This paper discusses the layered approach for data classification, and we have focused on the overall energy of the IoT network. Based on the device capability, it is decided where to process, store, and communicate the sensed data within the permissible limits of delay in real-time IoT applications. In the proposed framework for the optimization of the resources, the machine learning algorithm is used to decide at which layer of the network and device level within the IoT network, i.e., IoT device, node, cluster node, or fog node, the processing, storage, communication of the data is to be taken. The data classification helps to optimize resource utilization within the IoT framework. We have analyzed and classified the IoT data at the device level and the edge level, allowing us to design our proposed framework (MLADCF).
This work has allowed us to understand the current problems IoT systems face and how emerging technologies such as Edge and Fog computing try to solve these problems, reducing the latencies inherent to the network in which these systems operate and the costs associated with cloud services. On the other hand, we cannot understand that deploying this type of system is not easy, so it is not easy to determine where to start on many occasions. We are talking about heterogeneous platforms with a diversity of operating systems, hardware elements from different manufacturers, different solutions in the cloud, and different communication protocols, so it is necessary to have solutions that allow the interoperability of all these elements. We have seen how different scenarios in an IoT environment contribute to IoT standardization by facilitating the adoption of these systems. We have compared our results with the existing approaches and the results of the four data sets captured by these scenarios have shown improvement in terms of accuracy, execution time, precision, recall, and F1-score. Moreover, the experiment shows that the overall energy of the system improved as the number of nodes in the IoT network increased by 11.9 percent. In this regard, it is considered that the objectives initially set have been achieved by the proposed methodology. Future works will investigate the case of privacy preserving for IoT based on the blockchain and homomorphic encryption technologies [74,75] and also the uncertainty modeling for a better decision-making [76,77].
Accuracy measures the percentage of cases that the model has got right. This is one of the most used and favorite metrics among all the researchers. The completeness metric will inform us about the amount that the machine learning model can identify.

F1 Score
The F1 value combines the precision and recall measurements into a single value. This is handy because it makes comparing the combined accuracy and recall performance between various solutions easier. When avoiding both false positives and false negatives is equally important to the problem, a balance between Precision and Sensitivity is needed. In this case, the metric F1 can be used, which is defined as the harmonic mean between these values. F1 Score = 2 × Precision * Recall Precision + Recall (A4) A machine learning classification model can be used to predict the actual class of the data directly or, much more interestingly, predict its probability of belonging to different classes. The latter gives more control over the output, and a custom threshold can be used to interpret the classifier output, which is often more prudent than building a completely new model if the last one has failed.

Performance Evaluation
The comparative analysis is carried out through the following parameters:

Round
The completion of a process is called a round. It starts with the sensing of the data and ends till the data is pushed to the edge level.

Energy
The difference between the total energy of an IoT node and the energy consumed in one round.
where E nr is the remaining energy, E nt is the total energy before the round, and E nc is the energy left after the round. Equation (A5) can also be written as This is called the average energy of the IoT system, where nr is the active nodes.

of 29
Let S o is the storage occupied in an IoT node after the round. It can be calculated by the following equation.
S o = S t −S r (A7)