Autonomous Internet of Things (IoT) Data Reduction Based on Adaptive Threshold

With the development of intelligent IoT applications, vast amounts of data are generated by various volume sensors. These sensor data need to be reduced at the sensor and then reconstructed later to save bandwidth and energy. As the reduced data increase, the reconstructed data become less accurate. Usually, the trade-off between reduction rate and reconstruction accuracy is controlled by the reduction threshold, which is calculated by experiments based on historical data. Considering the dynamic nature of IoT, a fixed threshold cannot balance the reduction rate with the reconstruction accuracy adaptively. Aiming to dynamically balance the reduction rate with the reconstruction accuracy, an autonomous IoT data reduction method based on an adaptive threshold is proposed. During data reduction, concept drift detection is performed to capture IoT dynamic changes and trigger threshold adjustment. During data reconstruction, a data trend is added to improve reconstruction accuracy. The effectiveness of the proposed method is demonstrated by comparing the proposed method with the basic Kalman filtering algorithm, LMS algorithm, and PIP algorithm on stationary and nonstationary datasets. Compared with not applying the adaptive threshold, on average, there is an 11.7% improvement in accuracy for the same reduction rate or a 17.3% improvement in reduction rate for the same accuracy.


Introduction
The rapid advancement in mobile smart hardware has enabled the creation of intelligent IoT applications, which generate a vast amount of sensor data [1]. The amount and geographic distribution of these data make them distinctive. Processing such data requires sending the data to a remote processor, such as a sink node, an edge device, or a cloud center, as sensor devices frequently lack the computing and storage power to do so [2]. These sensor data are often reduced at the sensor and then reconstructed at the data processor to save bandwidth and communication costs.
Research on data collection and reduction in wireless sensor networks (WSNs) aims to reduce IoT nodes' energy consumption by reducing data transmission volume [3]. Data compression, data prediction, and data aggregation are the three main types of data reduction algorithms [4]. Data prediction is the more popular and effective strategy because it can achieve a substantially higher data reduction ratio than the other strategies [5]. Data prediction works by building a data prediction model with the same parameters at both the sensor and the remote processor. The data predicted by the sensor and by the remote processor are therefore identical at any given time, so the sensor only needs to determine whether the predicted data are accurate before uploading. No data transmission is necessary if the difference between the predicted value and the collected value is smaller than the reduction threshold. If not, the remote processor receives the data gathered by the sensor, and the prediction model is updated [4]. As the reduced data increase, the reconstructed data become less accurate. Usually, the trade-off between reduction rate and reconstruction accuracy is controlled by the reduction threshold, which is calculated by experiments based on historical data.
Considering the dynamic nature of IoT, a fixed threshold can rarely maintain the optimal balance, resulting in a lower reduction rate or a lower reconstruction accuracy. If the fixed threshold is smaller than expected, fewer data are reduced and data reconstruction becomes easier. While the reconstruction accuracy is superior, the data reduction rate is compromised, leading to unacceptable energy and bandwidth consumption. On the other hand, if the fixed threshold is excessive, too many data are reduced, and it is difficult to reconstruct accurate data at a remote processor. Experiments in the literature [6] also showed that as the reduction parameter increased, the degree of simplification of a reduced object also increased. Thus, we consider dynamically adjusting the threshold to improve the reduction effect in a dynamic IoT. When IoT data change frequently, the threshold can be lowered to upload additional data for reconstruction. As the data stabilize, the threshold can gradually increase to reduce more unessential data.
Aiming to dynamically balance the reduction rate with the reconstruction accuracy, an autonomous IoT data reduction method based on an adaptive threshold is proposed. The proposed method consists of a data reduction phase and a data reconstruction phase. During the reduction phase, concept drift detection is performed to capture IoT data changes and trigger threshold adjustment. The threshold is adjusted to be lower if concept drift occurs, and higher otherwise. During reconstruction, data trends are introduced to improve reconstruction accuracy. When concept drift detection identifies data changes, a data trend is introduced to replace the fixed linear rate of the Kalman filter for higher data reconstruction accuracy. To verify the applicability of the proposed method, experiments are executed on seven properties from three datasets, including stationary and nonstationary types. Then, a comparative analysis with the basic Kalman filtering algorithm [7], the LMS filter algorithm [8], and the critical+PIP algorithm [9] is conducted. Our main contributions are as follows.

• First, to the best of our knowledge, this is the first scheme to incorporate an adaptive reduction threshold into a data reduction algorithm based on Kalman filtering, which enables autonomous IoT data reduction without the need for cloud.
• Second, aiming to execute reduction threshold adjustment dynamically, concept drift detection is introduced to capture IoT changes.
• Third, we add a data trend in the data reconstruction stage to further improve data reconstruction accuracy.
The rest of the paper is organized as follows. In Section 2, we analyze the lack of autonomy in traditional data reduction and several IoT data reduction algorithms, then give essential background knowledge of Kalman filtering and data reduction based on prediction. The autonomous IoT data reduction algorithm is presented in Section 3 in two steps. Section 4 consists of experimental evaluations of data reduction rate and data reconstruction accuracy on stationary and nonstationary datasets. Finally, Section 5 concludes the paper and provides insights on autonomous edge data reduction.

Related Works
In this section, we first introduce data compression and data aggregation methods in wireless sensor networks and explain why they are less autonomous and less suitable for a dynamic IoT. Second, data reduction methods in IoT are introduced, especially data prediction methods that take a dynamic IoT into account. After that, we summarize the basic process and mathematical basis of using a Kalman filter for data reduction. This process and mathematical notation are used for the rest of this article.

Data Compression
Data compression techniques [10,11], also known as compressive sampling or compressive sensing, are based on the inherent sparsity properties of natural signals and reduce the original signal according to the Shannon-Nyquist theorem. Data compression can significantly reduce the energy consumption used for data acquisition in IoT nodes. For example, Chang et al. [12] applied the mean difference (MD) for filtering data noise and redundant values in their proposed AIoT architecture. Gilles et al. [13] used a compressed sensing approach based on a sub-Nyquist scheme, known as a modulated wideband converter, to solve wideband spectrum sensing. Aniol et al. [14] proposed an algorithm based on linear prediction that can perform both the lossless and near-lossless compression of RF signals.
Their algorithm is coupled with two signal detection methods to determine the presence of relevant signals and apply varying loss levels as needed. In data compression, the amount of reduced data depends on the compression algorithm, and thus, the reduction rate can rarely be adjusted autonomously according to the dynamic IoT. Meanwhile, real-time compression and decompression also put pressure on the storage and computing capabilities of IoT devices.

Data Aggregation
Data aggregation [15] is mainly used at the sink node to regulate sensor sampling frequency and thus optimize energy consumption. It works in two ways. First, it dynamically adjusts the sensor sampling frequency based on the variance between sensor data at a given epoch, which reduces the energy consumption of the sensing unit by preventing the sensor from collecting redundant information. Second, it dynamically adjusts the rate at which features are computed from the original signal. Chen et al. [16] proposed extracting data features based on the fast Fourier transform (FFT) and applying K-means to generate a set of patterns that represent the time-series data, with the aim of reducing real-time bridge vibration data. Wang et al. [17] proposed an energy-efficient load balancing tree-based data aggregation scheme (LB-TBDAS) for grid-based WSNs. In the scheme, the sensing area is partitioned into many cells of a grid, and the treelike path is established by using the minimum spanning tree algorithm. Zhang et al. [18] proposed a lightweight and privacy-friendly data aggregation scheme against abnormal data, in which valid data are correctly aggregated, but abnormal data are filtered out during the aggregation process. Data aggregation emphasizes the task allocation of data reduction and reconstruction at the physical level. Consequently, additional data reduction algorithms are still required at each node. In this paper, the sensor performs data reduction, and the remote processor performs reconstruction.

IoT Data Reduction
To reduce data transferred to an edge node, Bhargava et al. [19] came up with the idea of only storing values that cannot be predicted accurately based on history. According to an analysis of geographical restrictions from Cao et al. [20], only data along the trajectory for local services should be collected. Wang et al. [21] built an RNN through edge-cloud cooperation to perform data prediction on the edge node and to select the data necessary for updating the data prediction model for upload. The edge data were divided into known situations and unknown scenarios by Zhang et al. [22] for learning model updates. Only the recognized unknown situations are sent to the cloud, and other redundant data are discarded.
An in-networking approach is proposed in [7] based on data prediction. The proposed approach consists of data filtering and data fusion layers. The data filtering layer aims to minimize the number of transmissions. At the same time, the data fusion layer fuses the data based on the minimum squared error criterion. Its Kalman filter double-layer architecture is used in this paper as the base model and comparison method. The least mean square (LMS) algorithm is proposed in [8]. The algorithm is based on two decoupled LMS windowed filters of different sizes that are combined convexly. It estimates future readings at both the sink and sensor nodes. Data transmission occurs if the current reading deviates from the estimate by more than a predefined threshold.
Given the dynamic nature of IoT, several existing data prediction approaches focus on dynamic edge resources and sensor hardware for data reduction. Fuzzy redundancy elimination for data deduplication (FREDD) [23] finds that traditional data reduction overlooks the context and dynamics of the network while relying on a fixed threshold to execute data reduction. In FREDD, simple natural language rules represent domain knowledge and expert preferences regarding data duplication boundaries. It is adapted for multiple scenarios, considering both static and mobile devices, with different configurations of hard-separated and soft-separated zones and sensor coverage areas. Data redundancy management for leaf-edges (DRMF) [24] allows for identifying and removing data redundancies in connected environments at the device level. DRMF considers static and mobile edge devices and provides two redundancy detection algorithms, one temporal and one spatiotemporal. Once redundancies are identified, DRMF performs data deduplication, considering the dynamic requirements of data consumers and device resources. Meanwhile, data inaccuracies and unreliability due to sensor dynamics are usually ignored [5]. Thus, data reduction combined with faulty data detection is proposed to enhance data reliability.
The following is a summary of how our approach differs from other data reduction techniques. First, existing data reduction techniques rarely consider the dynamic balance between reduction rate and reconstruction accuracy in a dynamic IoT. Second, we adaptively adjust thresholds for autonomous data reduction using concept drift detection [25]. Finally, current Kalman-filter-based data prediction techniques assume that the IoT data vary linearly [26] because of the low computing capabilities of sensors. They do not use an adaptive data trend [27] to forecast future data.

Kalman Filtering Basics
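To introduce data reduction based on Kalman filtering, we give a brief review of Kalman filtering [26], which contains two steps, named the prediction step and the correction step. The prediction step can be described as

$$x_k^- = A_k x_{k-1} + B_k u_k \tag{1}$$
$$P_k^- = A_k P_{k-1} A_k^T + Q_k \tag{2}$$

where $x_k$ is the estimate of the state at time step $k$, $A_k$ is the state transition matrix, $B_k$ is the control input matrix, $u_k$ is the control input, $P_k$ is the estimate of the covariance matrix of the state estimate, $Q_k$ is the process noise covariance matrix, and the superscript $-$ marks a predicted quantity before correction. The correction step can be described as

$$y_k = z_k - H_k x_k^- \tag{3}$$
$$S_k = H_k P_k^- H_k^T + R_k \tag{4}$$
$$K_k = P_k^- H_k^T S_k^{-1} \tag{5}$$
$$x_k = x_k^- + K_k y_k \tag{6}$$
$$P_k = (I - K_k H_k) P_k^- \tag{7}$$

where $y_k$ is the innovation, $z_k$ is the measurement, $H_k$ is the measurement matrix, $S_k$ is the covariance of the innovation, $K_k$ is the Kalman gain, and $R_k$ is the measurement noise covariance matrix.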
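The prediction and correction steps can be sketched in code for the scalar case, where every matrix reduces to a single number. This is a minimal illustration rather than the implementation used in the paper: the class name `ScalarKalman` and the default noise values for `q` and `r` are assumptions chosen only to make the example runnable.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter; all matrices reduce to scalars."""
    x: float          # state estimate x_k
    p: float = 1.0    # estimate covariance P_k
    a: float = 1.0    # state transition A_k
    h: float = 1.0    # measurement matrix H_k
    q: float = 1e-4   # process noise covariance Q_k (assumed value)
    r: float = 1e-2   # measurement noise covariance R_k (assumed value)

    def predict(self, b: float = 0.0, u: float = 0.0) -> float:
        """Prediction step: propagate the state and its covariance."""
        self.x = self.a * self.x + b * u
        self.p = self.a * self.p * self.a + self.q
        return self.x

    def correct(self, z: float) -> float:
        """Correction step: blend the prediction with the measurement z."""
        y = z - self.h * self.x                  # innovation y_k
        s = self.h * self.p * self.h + self.r    # innovation covariance S_k
        k = self.p * self.h / s                  # Kalman gain K_k
        self.x = self.x + k * y
        self.p = (1.0 - k * self.h) * self.p
        return self.x


# Usage: filter a short series of temperature-like readings.
kf = ScalarKalman(x=20.0)
for z in [20.1, 20.2, 20.15, 20.3]:
    kf.predict()
    kf.correct(z)
```

After the loop, `kf.x` holds the filtered estimate and `kf.p` its covariance, which shrinks as more measurements are absorbed.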
In basic data prediction based on Kalman filtering methods, $z_k$ denotes the real-time data collected by the sensor, and $x_k$ the data predicted by the Kalman filter. Data prediction works by building a data prediction model with the same Kalman filter parameters at both the sensor and the remote processor. The data predicted by the sensor and by the remote processor are therefore identical at any given time, so the sensor only needs to determine whether the predicted data $x_k$ are accurate before uploading. No data transmission is necessary if the difference between the predicted value $x_k$ and the collected value $z_k$ is smaller than the reduction threshold. This difference is

$$e_k = |z_k - x_k| \tag{8}$$

When $e_k$ calculated by Equation (8) is less than $e_{max}$, the error is accepted, and data do not need to be uploaded. The parameter $e_{max}$ determines the accuracy tolerance and reconstruction accuracy. Thus, the value of $e_{max}$ is crucial in balancing the reduction rate with reconstruction accuracy.

Proposed Adaptive Reduction Threshold Data Reduction Method
Aiming to dynamically balance the reduction rate with reconstruction accuracy, we propose an autonomous IoT data reduction method based on an adaptive threshold. The proposed method consists of five modules: sensor data acquisition, concept drift detection, threshold adaptive adjustment, data reduction, and data reconstruction. As shown in Figure 1, the modules are divided between two main components: the sensor and the remote processor. The sensor is responsible for data acquisition and reduction, while the remote processor is responsible for data reconstruction. After acquisition, sensor data are passed to the concept drift detection module, which is responsible for detecting IoT data changes. If concept drift is found, the adaptive threshold adjustment module lowers the threshold $e_{max}$. Otherwise, $e_{max}$ is raised. The resulting threshold is passed to the data reduction module, where the basic Kalman filter is used to execute the reduction. Next, whether to transmit the real data $z_k$ to the remote processor is decided by comparing the prediction error with the threshold. If $x_k$ is similar to $z_k$, there is no need to transmit $z_k$, and the remote processor uses the $x_k$ that it predicts locally with the same parameters as the sensor. Otherwise, $z_k$ is transmitted to the remote processor and assigned to $x_k$ for accurate later prediction. This assignment is an update to the remote processor's Kalman filter, which failed to predict at time $k$ and needs to be updated for later predictions. Without $z_k$, the data reconstruction module forecasts $x_k$ based on the data trend $d_k$ and the Kalman filter. Each algorithm is analyzed in the following subsections.

Adaptive Adjustment for Reduction Threshold Based on Concept Drift Detection
As mentioned above, adaptive threshold adjustment based on concept drift detection is vital for balancing reduction rate and reconstruction accuracy. A detected concept drift indicates a change in the data pattern in a given time window, necessitating a lower data reduction rate to capture more data. Without drift, the data reduction rate gradually increases to filter out irrelevant data.
The Kalman filter assumes that observed data vary linearly [26] and that the linear change rate is constant. Since IoT is dynamic, the linear rate may change over time. The linear rate has likely changed when the absolute value of a cumulative increment over a time window is abnormal. Thus, the cumulative sum (CUSUM) algorithm [28] is employed to detect concept drift. CUSUM is a statistical control method that detects small shifts in the mean value of a process by monitoring it over time. The CUSUM algorithm accumulates and amplifies persistent biases, thus allowing earlier detection of concept drift, such as linear rate changes. Furthermore, the CUSUM algorithm can be exchanged for other concept drift detection methods by merely swapping out the drift detection module.
The algorithm works as follows. When concept drift is detected and the current value of $e_{max}$ exceeds the established $error_{min}$, $e_{max}$ decreases to lower the reduction rate and enhance the reconstruction accuracy. Without concept drift, $e_{max}$ increases for a higher reduction rate. Adjustments of $e_{max}$ are subject to the constraint that it must remain between the specified $error_{min}$ and $error_{max}$. When the values of $error_{max}$ or $error_{min}$ are large, a higher data reduction rate is chosen at the expense of a lower level of reconstruction accuracy, which is suited for sensors with limited processing power. When $error_{max}$ or $error_{min}$ is small, a higher reconstruction accuracy can be guaranteed instead of pursuing a higher data reduction rate. More complex and intelligent decision-making behaviors can then be performed based on the more accurate data. The settings of $error_{max}$ and $error_{min}$ need to be determined through experiments on each dataset. In the experimental section of this paper, the data reduction rate and reconstruction accuracy are compared and analyzed in detail for different threshold values. Meanwhile, the step size of each threshold change depends on the experimental setup and on how quickly the method should adapt to concept drift.
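The adjustment rule can be sketched as follows. This is an illustrative reading of the description above, not the paper's Algorithm 1: the class `CusumDetector`, its `drift` and `limit` parameters, and the choice to run CUSUM on successive increments are assumptions. The step of one tenth of the threshold range follows the setting used later in the experiments.

```python
class CusumDetector:
    """Two-sided CUSUM over increments; parameter values are illustrative."""

    def __init__(self, drift: float, limit: float) -> None:
        self.drift = drift    # slack: smaller deviations are not accumulated
        self.limit = limit    # alarm level for the cumulative sums
        self.pos = 0.0        # accumulated upward deviation
        self.neg = 0.0        # accumulated downward deviation

    def update(self, increment: float) -> bool:
        """Feed one increment; return True when drift is signalled."""
        self.pos = max(0.0, self.pos + increment - self.drift)
        self.neg = max(0.0, self.neg - increment - self.drift)
        if self.pos > self.limit or self.neg > self.limit:
            self.pos = self.neg = 0.0    # restart accumulation after an alarm
            return True
        return False


def adjust_threshold(e_max: float, drift_detected: bool,
                     error_min: float, error_max: float) -> float:
    """Lower e_max on drift, raise it otherwise, clamped to the given range."""
    step = (error_max - error_min) / 10.0
    if drift_detected:
        return max(error_min, e_max - step)
    return min(error_max, e_max + step)


# Usage: a calm stretch raises the threshold, a persistent jump lowers it.
detector = CusumDetector(drift=0.05, limit=0.5)
e_max = 0.05
for increment in [0.0, 0.01, -0.01, 0.4, 0.4]:
    e_max = adjust_threshold(e_max, detector.update(increment), 0.01, 0.1)
```

Keeping the detector behind a single `update` method reflects the point made above that another drift detection method could be substituted without touching the threshold logic.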

Autonomous Data Reduction Algorithm Based on Adaptive Reduction Threshold
Next, we describe how to execute autonomous data reduction with an adaptive threshold. In addition, a mechanism for calculating and uploading the data trend is shown. When the data are initialized, $z_1$ is uploaded and stored in the cache (cachedval). Because $d_k$ is calculated from historical data, the actual value is cached before $d_k$ is calculated. To determine whether concept drift has taken place and to establish the new threshold, Algorithm 1 is performed. If the threshold value has changed, it suggests there may have been a change in the linear rate, in which case $d_k$ should be uploaded instead of $H_k$ to forecast future data. Uploading $d_k$ is not necessary in any other case. The estimated value $x_k$ is then calculated using the Kalman filter, and the gap between the estimated value and the actual value is compared with $e_{max}$. The real value $z_k$ is sent to the remote processor when the difference exceeds $e_{max}$.
Concept drift indicates the possibility of a linear rate change, invalidating the original Kalman filter assumption of a constant linear rate. As a result, $d_k$ should be submitted for prediction instead of $H_k$. When $e_k$ is larger than the threshold, the observation does not match the sensor's predicted value, and it is also challenging to reconstruct. However, such an observation may be an anomaly or a measurement error rather than a concept drift or linear rate change. In this scenario, $H_k$ remains valid to forecast future data.
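The sensor-side procedure can be sketched as a single loop. The sketch below rests on several assumptions that the text does not fix: the trend $d_k$ is taken to be the difference between the current reading and the cached previous one, $e_{max}$ starts at $error_{min}$, the drift detector is passed in as a plain callable, and the Kalman filter is reduced to a scalar with a unit state transition. How $d_k$ enters the sensor's own prediction is not modeled here.

```python
from typing import Callable, List, Tuple


def reduce_stream(readings: List[float],
                  drift_detected: Callable[[float], bool],
                  error_min: float,
                  error_max: float,
                  q: float = 1e-4,
                  r: float = 1e-2) -> List[Tuple[int, str, float]]:
    """Return the messages a sensor would upload, as (index, kind, value)."""
    step = (error_max - error_min) / 10.0
    e_max = error_min                    # assumed starting threshold
    x, p = readings[0], 1.0              # scalar Kalman state and covariance
    cached = readings[0]                 # last actual value, used for the trend
    sent = [(0, "z", readings[0])]       # the first reading is always uploaded
    for k, z in enumerate(readings[1:], start=1):
        d = z - cached                   # assumed trend: first difference
        cached = z
        if drift_detected(d):
            e_max = max(error_min, e_max - step)
            sent.append((k, "d", d))     # drift: upload the trend
        else:
            e_max = min(error_max, e_max + step)
        p = p + q                        # prediction step (unit transition)
        if abs(z - x) > e_max:           # prediction too far off: upload z
            sent.append((k, "z", z))
            gain = p / (p + r)           # correction step with the real value
            x, p = x + gain * (z - x), (1.0 - gain) * p
    return sent


# Usage: a flat signal with one jump, and a detector that never fires.
messages = reduce_stream([1.0, 1.0, 5.0], lambda d: False, 0.1, 1.0)
```

With these inputs only the first reading and the jump are uploaded; the unchanged middle reading is suppressed because the prediction already matches it.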

Data Reconstruction Algorithm Based on Data Trend
After autonomous data reduction, the remote processor does not receive data uploaded from the sensor in every time window. When the data processor receives $z_k$, there is no need for data reconstruction. Nevertheless, when the remote processor fails to receive sensor data, data reconstruction is performed using a Kalman filter assisted by the data trend $d_k$. The data reconstruction procedure is detailed in Algorithm 3. The Kalman filter assumes that the observed data vary linearly. Since IoT is dynamic, nonlinear changes could occur sometimes. Nonlinear Kalman filters, however, are challenging to implement in IoT due to limited computing and storage capacity. As a result, when concept drift detection identifies data changes, we use a data trend to replace the fixed $H_k$ and forecast the future value.
Upon receiving the data $z_k$ from the sensor, the Kalman filter at the remote processor is updated. Data trends are stored to facilitate data reconstruction in subsequent cycles. Conversely, when $z_k$ is not received, the Kalman filter is used to predict $z_k$ based on $x_k$. First, the difference between the Kalman filter's predicted value $x_k$ and the reconstructed data is calculated. If this difference exceeds the specified threshold $e_{max}$, the data trend $d_k$ is used for reconstruction. Otherwise, the result of the Kalman filter is employed as the reconstruction outcome.
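One possible reading of this procedure is sketched below. It is not the paper's Algorithm 3. The function name, the message dictionaries, and the use of a fixed `linear_rate` extrapolation as a stand-in for the Kalman filter output are all assumptions, as is the interpretation that the trend-based value replaces the filter output when the two differ by more than $e_{max}$.

```python
from typing import Dict, List, Optional


def reconstruct(n: int,
                z_msgs: Dict[int, float],
                d_msgs: Dict[int, float],
                e_max: float,
                linear_rate: float = 0.0) -> List[float]:
    """Rebuild n samples from the uploaded values and trends."""
    if 0 not in z_msgs:
        raise ValueError("the first reading must have been uploaded")
    out: List[float] = []
    trend: Optional[float] = None            # latest trend received, if any
    for k in range(n):
        if k in d_msgs:
            trend = d_msgs[k]                # store the trend for later cycles
        if k in z_msgs:
            out.append(z_msgs[k])            # real value received: use it as is
            continue
        filter_guess = out[-1] + linear_rate # stand-in for the filter output
        if trend is not None:
            trend_guess = out[-1] + trend
            if abs(filter_guess - trend_guess) > e_max:
                out.append(trend_guess)      # large disagreement: use the trend
                continue
        out.append(filter_guess)
    return out


# Usage: one uploaded value, then a trend of +0.5 per sample from index 1.
series = reconstruct(4, {0: 1.0}, {1: 0.5}, e_max=0.1)
```

Here the three missing samples are filled by following the stored trend, because a constant extrapolation would differ from it by more than the threshold.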

Datasets and Experiment Setting
For the experiments, three datasets were selected for analysis. The first dataset, Intel Lab data (Bodik P, Hong W, Guestrin C. Intel Lab data. http://db.csail.mit.edu/labdata/labdata.html, 2004), comprises data collected from 54 sensors deployed at Intel Lab from 28 February 2004 to 5 April 2004. Data were collected at one sample every 30 seconds and include temperature, humidity, light, and voltage properties. Experimental comparisons were performed using 6000 temperature, humidity, and light sensor readings from this dataset. The second dataset, the Individual Household Electric Power Consumption dataset (Lichman M, UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, 2013), encompasses 2,075,259 measurements collected from December 2006 to November 2010 in residences in Sceaux, France. The data were acquired at one sample every 60 seconds and include attributes such as voltage, current, and power. Experimental comparisons were performed using 6000 voltage, current, and power readings from this dataset. The third dataset is the Dodgers Loop Sensor dataset (Lichman M, UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, 2013), which contains data collected from 10 April 2005 to 1 October 2005 on the Glendale ramp of the Los Angeles 101 North Freeway. Experimental comparisons were performed using 6000 data points from this dataset.
Upon conducting ADF unit root tests on the above properties, we found p-values of 0.937 and 0.9024 for temperature and humidity in the Intel Lab data, and 0.7437 and 0.7598 for current and power in the Household Power Consumption data. These values were significantly higher than 0.05, leading to the acceptance of the null hypothesis $H_0$ and indicating that the data exhibited stationary patterns. In contrast, ADF test results for the illumination attribute in the Intel Lab data, the voltage attribute in the Household Power Consumption data, and the vehicle count attribute in the Dodgers Loop Sensor data reveal p-values of 6.54 × 10^−16, 1.31 × 10^−12, and 0.0, respectively. These values were close to 0, leading to the rejection of $H_0$ and suggesting that these data exhibited nonstationary patterns, as shown in Table 1.
This paper compared two aspects to evaluate the effectiveness of the proposed method in data reduction: data reduction rate (DRR) and data reconstruction accuracy (DRA). The definition of the data reduction rate is shown in Equation (10), where DRR represents the data reduction rate, AD represents the total amount of data, and RD represents the total amount of remaining data after reduction. The data reconstruction accuracy is inspired by the Jaccard similarity between reconstructed and original data of the same length.
Let $T_1 = [z_1, z_2, \ldots, z_n]$ be the actual collected data, and $T_2 = [r_1, r_2, \ldots, r_n]$ be the reconstructed data. The Jaccard similarity between $T_1$ and $T_2$ is calculated using Equation (11), where DRA represents the data reconstruction accuracy, and $n$ represents the number of reconstructed data. Meanwhile, transmissions of $d_k$ are counted when computing the transmission volume of our method.

$$DRR = \frac{AD - RD}{AD} \times 100 \tag{10}$$
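As a small illustration of Equation (10), the reduction rate can be computed directly from the two counts. The function name and the example figures are hypothetical; only the formula comes from the text, and the trend uploads are added to the remaining data as stated above.

```python
def data_reduction_rate(total: int, sent_values: int, sent_trends: int = 0) -> float:
    """DRR in percent: share of the total data that was not transmitted."""
    if total <= 0:
        raise ValueError("total must be positive")
    remaining = sent_values + sent_trends    # trend uploads count as transmissions
    return (total - remaining) / total * 100.0


# Usage with made-up counts: 6000 samples, 1400 values and 100 trends uploaded.
drr = data_reduction_rate(6000, 1400, 100)   # 75.0
```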

Experiments on Adaptive Reduction Threshold e max
To ensure that the threshold varies within a specific range, the algorithm is executed on the same dataset, and the range of threshold variation is calculated and shown in Tables 2 and 3. The calculated range of threshold variation is also used as a statistical value in subsequent data reduction comparison experiments. Experiments were conducted using the proposed data reduction method on the temperature and humidity attributes of the Intel Lab dataset, as well as the current and power attributes of the Household Power Consumption dataset, all of which show stationary-type variations. The threshold range for the temperature attribute was set between 0.01 and 0.1 °C, with an average adaptive threshold of 0.0598 °C, a median threshold of 0.07 °C, a mode threshold of 0.09 °C, and a Pearson correlation coefficient of −0.432 between the threshold variation process and the temperature attribute. The threshold range for the humidity attribute was set between 0.01% and 0.14%, with an average adaptive threshold of 0.0654%, a median threshold of 0.06%, a mode threshold of 0.01%, and a Pearson correlation coefficient of 0.4882 between the threshold variation process and the humidity attribute. For the current attribute of the Household Power Consumption dataset, the threshold range was set between 0.2 A and 4 A, with an average adaptive threshold of 2.45 A, a median threshold of 3.4 A, a mode threshold of 4 A, and a Pearson correlation coefficient of −0.548 between the threshold variation process and the current attribute. The threshold range for the power attribute was set between 0.25 and 7.2 W, with an average adaptive threshold of 2.26 W, a median threshold of 0.85 W, a mode threshold of 0.25 W, and a Pearson correlation coefficient of 0.5003 between the threshold variation process and the power attribute. The threshold variation for all attributes showed a moderate correlation with the data, demonstrating the effectiveness of the proposed dynamic threshold adjustment mechanism in the data reduction mechanism. The adaptive adjustment mechanism based on concept drift detection can adjust the reduction rate as the data change pattern evolves.
The following two figures depict the threshold variation process of stationary data. Figure 2 corresponds to the Intel Lab dataset, where Figure 2a shows the temperature data change, Figure 2b shows the temperature threshold change, Figure 2c shows the humidity data change, and Figure 2d shows the humidity threshold change. Figure 3 corresponds to the Household Power Consumption dataset, where Figure 3a shows the current data change, Figure 3b shows the current threshold change, Figure 3c shows the power data change, and Figure 3d shows the power threshold change.
In this study, the proposed method for data reduction was also applied to nonstationary data: light intensity from the Intel Lab data, voltage from the Household Power Consumption dataset, and vehicle count from the Dodgers Loop Sensor dataset. The threshold range for the light intensity attribute was set from 0.1 to 0.9 Lux, with an adaptive threshold average of 0.5139 Lux, a median threshold of 0.5 Lux, and a mode threshold of 0.4 Lux. The threshold change process showed a weak negative correlation with the light intensity attribute, with a Pearson correlation coefficient of −0.398. For the voltage attribute, the threshold range was set from 0.21 to 1.5 V, with an adaptive threshold average of 1.02 V, a median threshold of 1 V, and a mode threshold of 1 V. The threshold change process showed a weak positive correlation with the voltage attribute, with a Pearson correlation coefficient of 0.344. For the Dodgers Loop Sensor, the vehicle count threshold range was set from 1 to 7, with an adaptive threshold average of 1.383, a median threshold of 1, and a mode threshold of 1. The threshold change process showed a weak negative correlation with the vehicle count attribute, with a Pearson correlation coefficient of −0.372. When dealing with nonstationary data, the threshold change process is weakly correlated with the data attributes. The data fluctuation is relatively large, resulting in significant differences between adjacent data points. Consequently, the Kalman filter model may fail to predict the next data value accurately, and the error threshold will continue to decrease. The error threshold is maintained at a relatively low level to ensure data accuracy, while the data reduction rate is decreased.
The following figures depict the threshold variation process for nonstationary data. Figure 4a shows the change in light data, Figure 4b shows the corresponding threshold variation, Figure 4c shows the variation in voltage, and Figure 4d shows the voltage threshold variation. Figure 4e illustrates the variation in count data, while Figure 4f presents the corresponding threshold variation.
After $error_{max}$ and $error_{min}$ are set, we take a tenth of the difference between $error_{max}$ and $error_{min}$ as the step size. Each time $e_{max}$ increases or decreases, it changes by one step.

Experiments on Adaptive Reduction Rate and Reconstruction Accuracy
In this section, the effectiveness of the proposed method is validated for both stationary and nonstationary datasets by comparing it with fixed threshold reduction methods, namely the basic Kalman filter and LMS filter reduction methods, as well as with a non-threshold reduction method, the critical+PIP reduction method. The proposed method adjusts the reduction rate dynamically based on the data change pattern, and the reduction threshold changes during the reduction process. Due to various external factors that affect the data, the change patterns of single-dimensional sensor data may differ at different stages, leading to differences in data reduction rate and reconstruction accuracy. Therefore, the minimum, maximum, mean, and mode of the threshold values that are adaptively adjusted by the proposed method on each dataset are taken as the fixed threshold values of the traditional methods for comparison with the basic Kalman filter and LMS filter data reduction methods. The critical+PIP algorithm is a non-threshold data reduction algorithm, and its efficiency is measured by its data reduction rate and data reconstruction accuracy on both stationary and nonstationary datasets.

Experiments on Stationary Attributes Compared with Fixed Threshold
By conducting experiments on temperature and humidity data from the Intel Lab data, it is found that the proposed method has a higher data reduction rate than the basic Kalman and LMS filter data reduction methods with threshold values set by mean, median, and mode. As shown in Tables 4 and 5, when the threshold is set as the maximum value, the reduction rate of the proposed method is only slightly lower than that of the Kalman and LMS filters. Moreover, the data reconstruction accuracy of the proposed method is higher than that of the traditional Kalman and LMS filter data reduction methods with threshold values set by mean and median. Through a comparative experiment on the current and power data in the Household Power Consumption dataset, it is found that for stationary datasets, the data reduction rate of the proposed method is higher than that of the basic Kalman filter with mean or median as the threshold. As shown in Tables 6 and 7, the data reconstruction accuracy is better than that of basic Kalman filtering and LMS filtering under different threshold values. For stationary datasets, as the reduction threshold increases, the data reduction rate of the basic Kalman filter and LMS filter continues to increase, but the data reconstruction accuracy decreases. The data reduction rate and reconstruction accuracy of the traditional Kalman filter are both higher than those of the LMS filter. In contrast, the proposed data reduction algorithm is a dynamic mechanism for controlling the reduction rate, which can adjust the reduction rate dynamically according to the changing patterns of the data. The proposed method achieves higher data reconstruction accuracy when the data reduction rate is equal to that of the traditional Kalman filter.

Experiments on Stationary Attributes Compared with Critical+PIP
The critical+PIP algorithm is a non-threshold-based data reduction algorithm. The efficiency of the proposed algorithm is measured by comparing its data reduction rate and data reconstruction accuracy with those of the critical+PIP algorithm on the temperature and humidity attributes of the Intel Lab data and the current and power attributes of the Household Power Consumption dataset.
The experimental results in Table 8 show that the data reconstruction accuracy of the critical+PIP algorithm is unstable. To observe the difference in data reconstruction accuracy between the two algorithms, the data reduction rate is controlled between 20% and 80%. For the Intel Lab data, the temperature data reconstruction accuracy of the critical+PIP algorithm decreased from 91.28% to 69.73%, and the humidity data reconstruction accuracy decreased from 68.91% to 52.60%. When processing current data, the data reconstruction accuracy of the critical+PIP algorithm decreased from 99.12% to 73.97%, and when processing power data, it decreased from 97.10% to 60.49%. The proposed method achieves higher and more stable data reconstruction accuracy at the same data reduction rate.

Experiments on Nonstationary Attributes Compared with Fixed Threshold
In the following, the proposed method is compared with the fixed threshold methods on the nonstationary attributes. Analysis of the Intel Lab light data indicates that the data values remain unchanged most of the time. As shown in Table 9, the proposed method exhibits a data reduction rate and data reconstruction accuracy similar to those of the other compared methods. The transmission of d_k is counted when computing the transmission cost of the proposed method. Therefore, the effect may not be significant for the light attribute of the Intel Lab data, which may be because there are fewer nonlinear cases and d_k does not need to be transmitted. For the voltage data shown in Table 10 and the count data shown in Table 11, the proposed method achieves a data reduction rate that is 10% lower than that of the traditional Kalman filter with a mean value threshold for nonstationary datasets. However, the data reconstruction accuracy of the proposed method is better than that of the traditional Kalman filter and LMS filter under different threshold conditions. When dealing with nonstationary datasets, the data reduction rate of the traditional Kalman filter and LMS filter continues to increase as the reduction threshold increases. However, the data reconstruction accuracy becomes very low, and sudden changes in data anomalies cannot be observed in a timely manner. When processing nonstationary datasets, the proposed method can automatically reduce the data reduction rate, maintain sensitivity, and sustain high data accuracy.

Experiments on Nonstationary Attributes Compared with Critical+PIP
This subsection measures the efficiency of the proposed algorithm by comparing its data reconstruction accuracy with that of the critical+PIP data reduction method at the same data reduction rate. As shown in Table 12, the data reconstruction accuracy of the critical+PIP algorithm is not stable. For light data, with the data reduction rate controlled from 50% to 80%, the data reconstruction accuracy of critical+PIP decreases from 88.48% to 77.70%, while that of the proposed method decreases from 96.69% to 93.05%. For voltage data, with the data reduction rate controlled from 20% to 80%, the data reconstruction accuracy of critical+PIP decreases from 92.36% to 61.73%, while that of the proposed method decreases from 97.35% to 80.36%. The proposed method therefore has higher and more stable data reconstruction accuracy at the same data reduction rate, and a better data reduction effect. For the remaining nonstationary attribute, the count data, the data reconstruction accuracy of the critical+PIP algorithm decreases from 75% to 64.54%, and that of the proposed method decreases from 83.24% to 67.70%, so the proposed method is again more accurate at the same data reduction rate. Meanwhile, the data reconstruction accuracy of the proposed method is more stable than that of the critical+PIP algorithm on nonstationary datasets.
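The endpoint figures quoted above can be turned into accuracy drops (in percentage points) with a few lines of arithmetic. The sketch below uses only the numbers reported in the text for Table 12. The third attribute is not named in the source sentence; labelling it as count data is an assumption based on it being the remaining nonstationary attribute.

```python
# (accuracy at lowest reduction rate, accuracy at highest reduction rate), in %
reported = {
    "light": {"critical+PIP": (88.48, 77.70), "proposed": (96.69, 93.05)},
    "voltage": {"critical+PIP": (92.36, 61.73), "proposed": (97.35, 80.36)},
    # Attribute not named in the text; assumed here to be the count data.
    "count": {"critical+PIP": (75.00, 64.54), "proposed": (83.24, 67.70)},
}

# Drop in percentage points between the two endpoints, per attribute and method.
drops = {
    attr: {method: round(start - end, 2) for method, (start, end) in methods.items()}
    for attr, methods in reported.items()
}

for attr, by_method in drops.items():
    print(attr, by_method)
```

On these endpoint figures the proposed method stays above critical+PIP at both ends for all three attributes. Its endpoint drop is smaller for light (3.64 against 10.78 points) and voltage (16.99 against 30.63 points), and larger for the third attribute (15.54 against 10.46 points).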

Conclusions
The large amount of data generated by the sensor needs to be reduced at the sensor and subsequently reconstructed to save bandwidth and energy. As the reduced data increase, the reconstructed data become less accurate. The trade-off between reduction rate and reconstruction accuracy is commonly controlled by the reduction threshold, which is calculated by experiments based on historical data. The motivation for this work is that the Kalman filter assumes a static linear rate when removing the influence of noise, while a dynamic IoT may exhibit special cases, such as linear rate change and concept drift. When concept drift occurs, continuing to use the original threshold significantly harms the reduction rate and reconstruction accuracy, and the harm persists for long periods of time. In order to dynamically balance the reduction rate with the reconstruction accuracy, we propose an autonomous IoT data reduction method based on an adaptive threshold. During the data reduction phase, concept drift detection is performed to capture the IoT dynamic changes and trigger threshold adjustment. During the data reconstruction phase, a trend is added to the data to improve the reconstruction accuracy. The effectiveness of the proposed method is demonstrated by comparing it with the basic Kalman filtering algorithm, LMS algorithm, and PIP algorithm on stationary and nonstationary datasets. Compared with not applying the adaptive threshold, on average, we obtain an 11.7% improvement in accuracy for the same reduction rate or a 17.3% improvement in reduction rate for the same accuracy. The proposed approach focuses on addressing ongoing changes autonomously without cloud involvement, rather than short-term fluctuations, such as noise. Beyond the IoT environment, autonomous data reduction is also important for enabling green and efficient data mining through energy and bandwidth saving.

Figure 1. Proposed adaptive reduction threshold data reduction method.

Figure 2. Adaptive threshold chart of Intel Lab data.

Figure 3. Adaptive threshold chart of Household Power Consumption.

Algorithm 1 Adaptive Adjustment Algorithm for Reduction Threshold
Input: current threshold e_max, threshold minimum error_min, threshold maximum error_max
1: while True do
...
else
9: if c < error_max then
10: Raise the current threshold e_max
11: if e_max > error_max then
12: e_max = error_max

In Equation (9), d_k represents the data trend at k and is smoothed with a weight α, which lies in the range [0, 1]. A value of α close to 1 prioritizes the most recent trend. Both the true data value and data trend d_k are transmitted to the remote processor for data reconstruction, as shown in Algorithm 2.

Algorithm 2 Data Reduction Algorithm Based on Adaptive Reduction Threshold
Input: current threshold e_max, sensor reading z_k, data trends d_{k−1}, data cache cachedval
1: while True do
...
... the Kalman filter to calculate estimated value x_k
12: e_k = z_k − x_k
13: if |e_k| > e_max then
14: ...

Algorithm 3 Data Reconstruction Algorithm Based on Kalman Filtering and Data Trend
Input: sensor reading z_k, data trends d_k, data cache cachedval
1: while True do
2: if sensor reading z_k is not None then
...
... filter to calculate estimated value x_k
7: if |e_k| > e_max then
8: z_k = z_{k−1} + d_k
9: else
10: ...
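The recoverable pieces of the algorithm listings (bounding the adjusted threshold, smoothing the data trend with weight α, and filling a missing reading with the previous value plus the trend) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation. The Kalman filter and the drift detector are omitted. The lower bound in `clamp_threshold` mirrors the upper-bound step and is assumed. The smoothing form in `update_trend` is one common exponential form consistent with the description of Equation (9), not a transcription of it. All function names are hypothetical.

```python
def clamp_threshold(e_max, error_min, error_max):
    """Keep an adjusted threshold inside [error_min, error_max]."""
    return max(error_min, min(e_max, error_max))


def update_trend(d_prev, x_k, x_prev, alpha):
    """Assumed smoothing: d_k = alpha * (x_k - x_prev) + (1 - alpha) * d_prev.

    An alpha close to 1 gives most weight to the most recent change.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * (x_k - x_prev) + (1.0 - alpha) * d_prev


def reconstruct(received, trends):
    """Fill suppressed readings with z_k = z_{k-1} + d_k.

    received: readings at the receiver, None where the sensor sent nothing.
    trends:   d_k values aligned with `received` (a simplification; in
              practice the receiver would hold the last transmitted trend).
    """
    if not received or received[0] is None:
        raise ValueError("the first reading must be transmitted")
    out = [received[0]]
    for z_k, d_k in zip(received[1:], trends[1:]):
        out.append(z_k if z_k is not None else out[-1] + d_k)
    return out


print(clamp_threshold(1.4, 0.1, 1.0))
print(update_trend(d_prev=0.5, x_k=12.0, x_prev=11.0, alpha=1.0))
restored = reconstruct([10.0, None, None, 13.0, None], [0.0, 1.0, 1.0, 1.0, 0.5])
print(restored)
```

In the example, the two suppressed readings after 10.0 are rebuilt by adding the trend of 1.0 per step, so the reconstruction follows the rising signal instead of holding the last received value flat.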

Table 2. Threshold setting for stationary datasets.

Table 3. Threshold setting for nonstationary datasets.

Table 4. Experimental results of temperature data.

Table 5. Experimental results of humidity data.

Table 7. Experimental results of power data.

Table 8. Experimental results of stationary attributes compared with critical+PIP.

Table 9. Experimental results of light data.

Table 10. Experimental results of voltage data.

Table 11. Experimental results of count data.

Table 12. Experimental results of nonstationary attributes compared with critical+PIP.