IoT Implementation of Kalman Filter to Improve Accuracy of Air Quality Monitoring and Prediction

Abstract: To obtain high-accuracy measurements, traditional air quality monitoring and prediction systems adopt high-accuracy sensors. However, high-accuracy sensors come at a high cost, so they cannot be widely deployed in Internet of Things (IoT) systems with many sensor nodes. In this paper, we propose a low-cost air quality monitoring and real-time prediction system based on IoT and edge computing, which reduces the dependence of IoT applications on cloud computing. A Raspberry Pi, acting as an edge device with its own computing power, runs the Kalman Filter (KF) algorithm, which improves the accuracy of low-cost sensors by 27% on the edge side. Based on the KF algorithm, our proposed system achieves immediate prediction of the concentrations of six air pollutants, including SO2, NO2 and PM2.5, by combining observations that contain errors. In comparison experiments with three common prediction algorithms (Simple Moving Average, Exponentially Weighted Moving Average and Autoregressive Integrated Moving Average), the KF algorithm obtains the best prediction results, and the root-mean-square error decreases by 68.3% on average. Taken together, the results of the study indicate that our proposed system, combining edge computing and IoT, can be promoted in smart agriculture.


Introduction
The IoT (Internet of Things) is an important part of the new generation of information technology. It refers to a huge network formed by combining various information sensing devices with the Internet [1]. In recent years, the widespread use of IoT terminal equipment has led to a surge of terminal data and connections, requiring a more computationally efficient IoT network architecture to enable timely data analysis and processing. At the same time, IoT services continue to multiply and are widely used in smart agriculture, smart homes, intelligent transportation and other fields. Many special application scenarios, such as security monitoring, real-time road condition information collection and automatic driving, require the network to further reduce the data transmission delay [2]. The processing and computing power of the traditional wireless network architecture is insufficient to support the deep coverage and massive connectivity of the intelligent IoT. Moreover, the cloud computing platform is far from the IoT terminal, which makes it difficult to meet the real-time data requirements of low-latency services [3].
Edge computing [4] provides a new way to resolve the development bottleneck of IoT and is considered to be the key enabler of IoT. Edge computing refers to an open platform that integrates network, computing, storage and application core capabilities at the edge of the network, close to the things or data sources. It provides edge intelligence services to meet the key needs of industry digitalization in agile connectivity, real-time business, data optimization, application intelligence, security and privacy protection. Edge computing nodes are like human nerve endings, which can process simple stimuli by themselves and feed the processed features back to the cloud brain.
Smart agriculture makes the application of IoT technology in traditional agricultural production more "intelligent" by using sensors and software to control agricultural production through mobile or computer platforms. In smart agriculture, establishing a real-time monitoring and prediction system for air quality (AQ) is the most basic and most important solution [5]. The prediction of AQ is based on the analysis of the monitoring data. In other words, the accuracy of the monitoring data affects the accuracy of the prediction to a certain extent [6]. At the same time, the Environment Agency has also specified threshold values for AQ [7]. Once the current AQ exceeds the threshold, people should take appropriate countermeasures. However, an inaccurate prediction leads to decision errors. In order to obtain high-precision monitoring data, many existing AQ monitoring schemes use high-precision sensors. However, high-precision sensors are often accompanied by higher costs. A complete system consists of multiple sets of sensors [8], so there is a trade-off between cost and accuracy. In addition, in a traditional IoT-based AQ monitoring system, the data collected by the sensing layer needs to be uploaded to, analyzed and processed in the cloud computing platform at the network layer [9]. However, in China, most agricultural areas are in remote locations and harsh environments, limited by bandwidth and network connectivity [10]. The timely uploading of monitoring data and the analysis of prediction results cannot be guaranteed, which affects the timeliness of decision-making. Therefore, edge computing with real-time computing power should be considered to solve the bottleneck of traditional cloud computing solutions in agricultural application scenarios.
In order to improve the real-time performance and reduce the cost of the traditional monitoring system, this paper combines edge computing and IoT applications in smart agriculture. At a relatively low hardware cost, the air quality monitoring and prediction system based on the Kalman Filter (KF) algorithm can greatly improve the accuracy of both the monitored and the predicted values. By flexibly arranging inexpensive sensors throughout the monitoring area, the system monitors the concentrations of six air pollutants (SO2, NO2, CO, O3, PM2.5 and PM10) in real time. According to the dynamic characteristics of different air pollutants, the KF algorithm constructs the short-term dynamic prediction model via 100 iterations over the initial sampling data, which contain errors. In this way, the instantaneous prediction of pollutant concentrations is achieved, and the sensor accuracy is improved at the algorithm level on the edge side. This process of correcting the monitoring data to obtain the best predictive value is a local process, which serves as a concrete example of edge computing. The Raspberry Pi (RPi) 3B is a low-power, low-cost, card-sized computer with a built-in quad-core 1.2 GHz 64 bit processor, which has very good computing and processing power. Therefore, this paper chooses the RPi as the carrier of edge computing and the central node of the sensor network. After collecting the monitoring data from the various sensors, the RPi runs the KF algorithm to obtain the best prediction results for the air pollutant concentrations, and then uploads the monitoring data and prediction results to the cloud through the Wi-Fi module. The cloud stores the data and responds to client requests. Through this edge-based, low-cost wireless solution, farmers and agricultural experts can collect more accurate concentrations of air pollutants and analyze the current and subsequent AQ through the client, which provides a more timely, accurate and scientific basis for taking appropriate control measures.

Edge Computing on IoT
The continuous decline in sensor prices and computing costs has changed the architecture of traditional IoT, with more "things" being connected to the Internet, which pushes computing to the edge of the network [11,12]. As more networked devices become available, edge computing [13] is an important change required to make IoT systems more efficient and scalable. It will be used in all walks of life, especially in areas where cloud computing is inefficient. Compared to cloud computing, edge computing shows the following advantages [14]. (1) A focus on real-time, short-cycle data analysis, and better support for big data analysis in cloud applications. (2) Real-time or faster data processing and analysis. Data processing takes place closer to the data sources rather than in an external data center of the cloud, so latency can be reduced. (3) Low cost. Data management solutions for local devices cost less than those for the cloud and data center networks. (4) Higher application runtime efficiency. As latency decreases, applications can run more efficiently and at a faster rate. (5) Reducing the role of the cloud also reduces the likelihood of a single point of failure, and reducing the reliance on the cloud also means that some devices can run offline smoothly.
From autopilot to smart agriculture, edge computing on IoT has been used in many areas. Companies like PointGrab and Gooee partner to provide IoT-enabled lighting solutions with the help of real-time edge computing [15]. Brzoza-Woch et al. present a fog-enabled embedded system for environmental monitoring [16]. Intel partners with AVOB to develop edge-enabled remote control and monitoring for IoT-based smart energy management [17]. Datta et al. propose an IoT architecture for connected vehicles and utilize fog computing as a platform for providing IoT services to connected vehicles [18]. Bakheder et al. use cloudlets for big data analytics in a mobile cloud computing environment [19].
The development of smart industry requires not only the cloud but also ubiquitous edge computing. Whether the concern is the efficiency of IoT applications, time delays or security, edge computing is the key to the wide adoption of the IoT.

Air Quality Monitoring System
With the rise of IoT and the combination of miniaturized sensor devices and wireless technologies, many AQ monitoring solutions now build a remote monitoring system for AQ on the traditional IoT architecture [9]. Composed of various sensor devices, the perception layer identifies various air parameters and collects the data. The network layer, composed of wireless technology, a network management system and a cloud computing platform, transmits and processes the data collected from the perception layer, then makes corresponding decisions according to the current AQ conditions. The application layer presents the relevant information back to the user.
Gómez et al. [20] designed and used an IoT-based, multi-purpose architecture for monitoring environmental variables in urban areas. In their four-tier architecture, the customer service interface handles requests from clients via the HTTP protocol, while the management layer receives the data and stores it in the database. Raipure and Mehetre [21] proposed a large-city pollution monitoring system based on a wireless sensor network. The system uses an AVR (Atmel AVR) ATmega-32 microcontroller to transfer the values from the ADC (Analog to Digital Converter) to the server, and uses a Bluetooth microcontroller to build the communication channel that delivers the gas data to the server. In the agricultural sector, Shinde et al. [22] established an IoT-based monitoring and control system for AQ in greenhouses. Xiaojun et al. [23] proposed a system which can reduce hardware costs to 1/10 of the previous level by replacing monitoring devices that use traditional empirical analysis with sensor networks in IoC (Inversion of Control) technology. Kiruthika and Umamakeswari [24] used the Raspberry Pi to build an IoT-based low-cost air quality monitoring system. As shown in Figure 1, the RPi [25][26][27] acts only as a sensor network node: it collects the monitored data and pushes them to the gateway layer. The gateway layer filters and predicts data, then uploads the results to the cloud layer through the ESP8266 wireless module. The cloud layer analyzes the received data and responds to various requests from the client. Similarly, Jadhav et al. [28] also used the RPi as a bridge between the sensor network and the web server. The above solutions adopt the RPi because of its low cost and card-like size.
Although the existing studies have proposed mature and extensive AQ monitoring schemes, in China, most agricultural areas are in remote locations and harsh environments [10]. Data analysis and processing through a cloud computing solution at the network layer cannot meet the requirements of low latency [7]. In addition, the accuracy of the sensors selected in the existing studies varies, which affects the accuracy of the monitoring data to different degrees, so the scientific validity of the prediction results cannot be guaranteed.

Prediction Model
In recent years, many studies on air quality have focused on predicting air pollutant concentrations and assessing AQ in a given area. There are two main methods for constructing predictive models: traditional statistical methods and methods based on machine learning (ML) or multi-model fusion.
At the statistical method level, Lanzafam et al. [29] propose a model of AQ prediction based on the Simple Moving Average (SMA). The model predicts the concentration of pollutants in the next period or periods by the average of a set of recent actual data. Because of its simple calculation method, it is very suitable for immediate prediction. Donnelly et al. [30] propose the Exponentially Weighted Moving Average (EWMA) model. The model assigns different weighting factors to the pollutant data from different historical periods in the prediction process. Without considering periodicity, the influence of data far from the target period is relatively low. Therefore, the prediction results of the EWMA model are smoother and closer to recent data than those of the SMA. The Autoregressive Integrated Moving Average (ARIMA) [31,32] model is widely used in AQ prediction. The model filters out the non-stationary factors in the original sequence by differencing the data, so that it can obtain better prediction results.
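To make the difference between the two moving-average baselines concrete, the following is a minimal sketch of one-step-ahead SMA and EWMA forecasts. The function names, the window length, the smoothing factor `alpha` and the sample readings are illustrative assumptions, not values taken from the cited studies.

```python
from typing import Sequence


def sma_forecast(history: Sequence[float], window: int) -> float:
    """One-step-ahead forecast: plain mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


def ewma_forecast(history: Sequence[float], alpha: float) -> float:
    """One-step-ahead forecast: exponentially weighted mean.

    Each new observation gets weight `alpha`; older data decay by (1 - alpha).
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


# Hypothetical hourly PM2.5 readings (ug/m3), made up for illustration.
pm25 = [10.0, 12.0, 14.0, 20.0]
sma_next = sma_forecast(pm25, window=3)
ewma_next = ewma_forecast(pm25, alpha=0.5)
```

With these made-up readings the EWMA forecast lies closer to the latest value of 20 than the SMA forecast does, which is the behaviour the paragraph above describes.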
Although the traditional statistical prediction models perform well in terms of interpretability and computational cost, they are limited by their single-feature expression and lack the ability to deal with complex prediction problems such as nonlinear processes. With the development of ML, many studies have chosen to use ML methods or multi-model fusion [33,34] to predict AQ. In the study of Rybarczyk and Zalakeviciute [35], two different decision models based on the J48 decision tree algorithm are constructed to predict the concentration of PM2.5 in two adjacent regions. In the study of Raipure and Mehetre [21], the ID3 decision tree algorithm is applied to calculate the percentage of air pollutants. The algorithm is used to make predictions for specific areas and to provide early warning information for highly polluted areas. Xiaojun et al. [23] propose a multi-input and multi-output AQ prediction model based on an ANN (Artificial Neural Network). Based on the relationship between the current and past 24 h of pollutant concentration, a 24 h prediction network is established. To predict ozone concentrations, Sousa uses multiple linear regression and an ANN based on principal components [36]. Feng Xiao and Li Qi [37] propose an ANN to predict the average daily concentration of PM2.5 two days in advance based on AQ trajectory analysis and wavelet transform.

The Proposed System Architecture
The hardware part of the system is mainly composed of the Raspberry Pi, the sensor network and the Wi-Fi module. The software part is mainly the cloud data storage and the client system. Composed of several types of sensors, the sensor network monitors the concentrations of the air pollutants SO2, NO2, CO, O3, PM2.5 and PM10 in real time. After each periodic sampling is completed, the sensor network sends the pollutant concentration data to the RPi. As the carrier of edge computing, the RPi runs the Kalman Filter algorithm after receiving the data. Then, after several iterations and updates in a very short time, the predicted values of the pollutant concentrations at the next moment are obtained. After completing the prediction, the RPi uploads the data to the cloud through the Wi-Fi module. The cloud stores the data, communicates with the client, and presents the user with information such as the current air quality status and the trend of each pollutant concentration.

Raspberry Pi
Raspberry Pi 3 Model B: The Raspberry Pi (RPi) 3B [38] is a portable and powerful SBC (Single Board Computer), meaning that it runs a full operating system and has sufficient peripherals (memory, CPU, power regulation) to start execution without additional hardware. Its low price of $35 makes it readily accessible. By adding an SD card, it is possible to quickly have a fully working computer running Raspbian, a Debian-based Linux operating system, which is free and optimized for the RPi hardware. The RPi has a built-in quad-core 1.2 GHz 64 bit CPU, which gives it very good computing and processing power. Therefore, in many IoT applications, the RPi has been deployed as an edge node; see, for example, [39][40][41][42].
Based on the processing power and computing characteristics of the RPi, this paper also uses the RPi as the edge computing device and the central node of the sensor network (as shown in Figure 2). The RPi periodically reads parameter data from the various sensors through the corresponding connection pins. After the monitoring data are obtained, the background process is awakened from the sleep state, runs the Kalman Filter algorithm, and uses the monitoring data to update its estimates iteratively (see Section 4 for details). When the calculation is completed, the predicted concentrations of the air pollutants at the next moment are obtained. At this point, the RPi uploads the prediction results and monitoring data to the cloud through the Wi-Fi module, and the cloud stores the data.
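The read, predict and upload cycle described above can be sketched as follows. This is only an outline under stated assumptions: `run_edge_cycle` and its three callables (`read_sensors`, `predict_next`, `upload`) are hypothetical names introduced here, and the real sensor drivers, the Kalman Filter and the cloud API of the system are not reproduced.

```python
from typing import Callable, Dict, List, Tuple

Reading = Dict[str, float]  # pollutant name -> concentration


def run_edge_cycle(
    read_sensors: Callable[[], Reading],
    predict_next: Callable[[str, float], float],
    upload: Callable[[Reading, Reading], None],
) -> Reading:
    """Run one sampling cycle on the edge node.

    1. Read the current observation of every pollutant.
    2. Predict the next value locally (the KF would be plugged in here).
    3. Upload both the raw observations and the predictions.
    """
    observed = read_sensors()
    predicted = {name: predict_next(name, value) for name, value in observed.items()}
    upload(observed, predicted)
    return predicted


# Stand-ins for the hardware and the cloud, for illustration only.
sent: List[Tuple[Reading, Reading]] = []
result = run_edge_cycle(
    read_sensors=lambda: {"PM2.5": 35.0, "CO": 0.8},
    predict_next=lambda name, value: value,  # placeholder, not a real filter
    upload=lambda obs, pred: sent.append((obs, pred)),
)
```

Passing the three steps as callables keeps the cycle independent of the concrete sensor and network code, so the prediction step can be swapped without touching the rest.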

Sensors
(1) ZH03A Laser Dust Sensor: The ZH03A laser dust sensor (Zhengzhou Winsen Electronics Technology Co., Ltd., Zhengzhou, China), with a minimum resolution particle diameter of 1.0 micron, is a versatile, miniaturized module that uses the principle of laser light scattering to detect dust particles present in the air. By using different channels to distinguish the size of the particles, the PM2.5 and PM10 concentration values can be obtained separately. In addition, the ZH03A offers good consistency, stability and real-time response, and provides a rich interface with digital output, PWM (Pulse Width Modulation) output and analog output. The 24 bit data packet is sent to the RPi by UART (Universal Asynchronous Receiver/Transmitter) transmission, and the PM2.5 and PM10 concentrations are obtained by reading the value of the specific bit. The price of the ZH03A is around $11.
(2) SGA-700 Intelligent Gas Sensor: The SGA-700 series of intelligent gas sensor modules (Shenzhen Singoan Electronic Technology Co., Ltd., Shenzhen, China) performs signal amplification, data processing, and temperature and humidity compensation, and offers a smaller size, a lower price and more stable performance. The SGA-700 can directly output voltage signals such as 0.4-2 V, 0-1.6 V, 0-4 V and 0-5 V, and a serial port output is also reserved. The standard signal after processing can be directly collected and uploaded to the control host, the RPi. The price of an SGA-700 series gas sensor is less than $3. The SGA-700-CO, SGA-700-NO2, SGA-700-SO2 and SGA-700-O3 are used to measure CO, NO2, SO2 and O3 concentrations, respectively.
(3) ESP8266 Wireless Module: The ESP8266 (ESPRESSIF SYSTEMS (SHANGHAI) Co., Ltd., Shanghai, China) is a low-power, low-cost, highly integrated Wi-Fi microchip with a full TCP/IP stack, which adds Wi-Fi capabilities to the RPi via a UART serial connection. Once the RPi is connected to the network, it can send the predicted results for each pollutant and the data monitored by the sensors to the cloud.
At present, the price of existing AQ monitoring systems ranges from $750 to $3000, while the total cost of our proposed solution is only around $75, that is, one-tenth or less of the existing price, and each module is easily accessible.

Kalman Filter Algorithm
The traditional statistical prediction model [43] has the advantages of high interpretability and low computational cost. Its prediction principle is based on linearly fitting historical data. Therefore, the predicted result can have high precision when the trend does not change sharply. However, it is no longer applicable when the concentration of an air pollutant is not a stationary sequence. For example, the concentrations of pollutants such as PM2.5, PM10 and SO2 increase suddenly during the morning and evening traffic peaks because of the increase in vehicle exhaust emissions [44]. Models such as SMA, EWMA and ARIMA [29][30][31][32] cannot effectively predict these change points, even though these points have the highest predictive value because they correspond directly to control measures.
In the application scenario of AQ prediction based on IoT devices, the prediction model is limited by the storage capacity and computing power of the device itself. In order to achieve higher prediction accuracy, many ML models not only require a large amount of historical data for training (a storage problem), but also have a high time complexity throughout the training process (a computation problem) [45]. For example, the decision tree model [21,35] can improve the prediction accuracy by increasing the number of training samples and the depth of the tree; however, the time complexity O(N × M × D) grows accordingly (N is the number of samples, M is the vector dimension, and D is the tree depth). The ANN [23,36,37] has a similar problem. Therefore, although ML models have higher prediction accuracy, they cannot meet the edge computing node's requirement for a lightweight model.
To overcome the shortcomings of the above models, the space complexity (storage) and time complexity (computation) of the model should be reduced as much as possible while the prediction accuracy is maintained, so that the model suits the application scenario of edge computing. This paper therefore proposes an AQ prediction model based on the Kalman Filter (KF) algorithm.
The KF is an efficient autoregressive filter model (recursive filter model) [46]. It occupies very little memory and only needs to retain the data of the previous state of the system, rather than a long span of historical data. The actual measured data are used to correct the prediction results, so that the results reflect the real situation as closely as possible. The KF runs very fast, so it is very suitable for solving real-time problems and for edge computing in IoT.
The core idea of the KF is to use a set of state-space equations to represent a dynamic system, and to predict the system state $\hat{x}_{k|k-1}$ (called the prior estimate) at the next moment $k$ from the optimal estimate (prediction) $\hat{x}_{k-1}$ of the system state at time $k-1$. At the same time, the system state at time $k$ is observed, and the observed value $z_k$ is obtained. Due to the observation error, both $z_k$ and $\hat{x}_{k|k-1}$ deviate from the true system state, so the predicted value $\hat{x}_{k|k-1}$ needs to be corrected by the observed value $z_k$. Then the optimal estimate (prediction) $\hat{x}_k$ of the system state at time $k$ is obtained.
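The predict-then-correct cycle can be illustrated with a scalar filter. The sketch below uses the standard textbook form of the Kalman Filter for a one-dimensional state; the function name `kf_step`, the default transition and observation factors of 1 (a random-walk assumption) and the numeric values are assumptions made for illustration, not the model parameters used in this paper.

```python
from typing import Tuple


def kf_step(x_prev: float, p_prev: float, z: float, q: float, r: float,
            f: float = 1.0, h: float = 1.0) -> Tuple[float, float]:
    """One scalar Kalman Filter cycle.

    x_prev, p_prev: optimal estimate and its error variance at time k-1
    z:              observation at time k
    q, r:           process and measurement noise variances (assumed known)
    f, h:           state transition and observation factors
    Returns the optimal estimate and its error variance at time k.
    """
    # Predict: prior estimate and prior error variance.
    x_prior = f * x_prev
    p_prior = f * p_prev * f + q
    # Correct: weigh the observation against the prior using the Kalman gain.
    gain = p_prior * h / (h * p_prior * h + r)
    x_post = x_prior + gain * (z - h * x_prior)
    p_post = (1.0 - gain * h) * p_prior
    return x_post, p_post


# Prior estimate 50 and observation 54 are trusted equally (variances 1 and 1),
# so the corrected estimate lands halfway between them.
x, p = kf_step(x_prev=50.0, p_prev=1.0, z=54.0, q=0.0, r=1.0)
print(x, p)  # 52.0 0.5
```

Only `x` and `p` have to be carried from one cycle to the next, which is why the memory footprint stays constant however long the filter runs.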
The KF algorithm differs from general time series prediction methods [47]. Firstly, there is no need to assume that the error term follows a normal distribution. Secondly, it can estimate the system state from a set of observations that are incomplete (some time points are missing in the time series data) or that contain noise (measurement error). Furthermore, compared with models based on single observations, the KF considers the joint distribution of the observations at different times in the time series and estimates the unknown factors that may affect the system. Therefore, the prediction of the KF is more accurate.
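One common way to handle the incomplete observations mentioned above is to run only the prediction step at a time point with no measurement and to skip the correction. The sketch below shows this for a scalar random-walk model; the function name, the use of `None` to mark a gap and all numeric values are assumptions for illustration and are not taken from this paper.

```python
from typing import List, Optional, Sequence


def kf_filter_with_gaps(observations: Sequence[Optional[float]],
                        x0: float, p0: float, q: float, r: float) -> List[float]:
    """Scalar Kalman Filter over a series in which None marks a missing sample."""
    x, p = x0, p0
    estimates: List[float] = []
    for z in observations:
        # Predict: the random-walk model keeps the state, uncertainty grows by q.
        p = p + q
        if z is not None:
            # Correct only when a measurement is actually available.
            gain = p / (p + r)
            x = x + gain * (z - x)
            p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# The second sample is missing, so the estimate is carried forward unchanged.
est = kf_filter_with_gaps([10.0, None, 12.0], x0=10.0, p0=1.0, q=1.0, r=1.0)
```

Because the error variance keeps growing during a gap, the first measurement after the gap receives a larger gain than it would otherwise.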
In summary, the KF algorithm has the following characteristics:
1. The object studied by the KF algorithm is a stochastic process with sequential data.
2. The goal of filtering is to predict the whole random process, even in the presence of unwanted noise.
3. Unlike the least squares method, the KF does not need to filter out the white noise in the dynamic system or the observation error in the observation data. The statistical characteristics of this noise are used by the model in the prediction process.
4. The KF algorithm is recursive, and state-space equations are used to construct time-domain filters for the prediction of multidimensional random variables (the predicted system state consists of multiple features).
5. Compared to the ARIMA model, the time series data used for prediction can be stationary or non-stationary.
6. The prediction process only considers the process noise, the noise generated by the observation method and the statistical characteristics of the system at the current time point. In addition, the model requires little computation, which makes it very suitable for real-time prediction.
Based on these characteristics, this paper implements the KF algorithm on the edge device (Raspberry Pi) to predict the concentrations of various air pollutants in real time. Although low-cost sensors have certain measurement errors and the model has process noise, the KF can improve the accuracy of the sensors at the algorithm level by combining the error-prone observation data. Moreover, the predicted value for the next moment can be obtained from the data of the previous moment, so that the system can forecast the various pollutants and improve decision-making efficiency. Figure 3 shows how the KF is used in the edge computing environment of this paper.

Basic Dynamic System Model
The mathematical basis of the Kalman Filter algorithm is linear algebra and the Hidden Markov model. A basic dynamic system can be described by the following equation:

$$x_k = F_k x_{k-1} + B_k u_k + w_k \tag{1}$$

That is, each $x_k$ (the signal value) can be evaluated with a linear stochastic equation: $x_k$ is a linear combination of its previous value $x_{k-1}$, a control signal $u_k$ and a process noise $w_k$. The control signal $u_k$ represents an external factor that affects the system; most of the time there is none. $F_k$ is the state transition matrix acting on $x_{k-1}$, and $B_k$ is the control input matrix acting on $u_k$. $w_k$ is the process noise at time $k$, that is, the influence of external uncertainty on the system. We assume that it follows a multivariate normal distribution with zero mean and covariance matrix $Q_k$:

$$w_k \sim N(0, Q_k) \tag{2}$$

At the same time, at time $k$, the sensor's observation $z_k$ of the real state $x_k$ of the system satisfies the following equation:

$$z_k = H_k x_k + v_k \tag{3}$$

The equation states that any measurement $z_k$ (whose accuracy is uncertain) is a linear combination of the signal value $x_k$ and the measurement noise $v_k$. $H_k$ is the observation transfer matrix, which maps the real state space of the dynamic system into the observation space. $v_k$ is the measurement noise, which follows a multivariate normal distribution with zero mean and covariance matrix $R_k$:

$$v_k \sim N(0, R_k) \tag{4}$$

The process noise $w_k$ and the measurement noise $v_k$ are statistically independent.
The basic structure of the KF algorithm can be obtained from the above equations, as shown in Figure 4. Circles represent vectors, squares represent matrices, asterisks represent Gaussian noise, and the dotted square at the lower right of each asterisk represents the covariance matrix of that noise.
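The system model above can be made concrete with a short simulation. The following is a minimal sketch, assuming a scalar state, no control input, and illustrative values for F, H, Q, R and the initial state (none of these numbers come from the paper); the function name `simulate` is hypothetical.

```python
import random

def simulate(n_steps, F=1.0, H=1.0, Q=0.01, R=0.25, x0=1.0, seed=0):
    """Simulate x_k = F*x_{k-1} + w_k and z_k = H*x_k + v_k (scalar case).

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    x = x0
    states, observations = [], []
    for _ in range(n_steps):
        w = rng.gauss(0.0, Q ** 0.5)   # process noise, w_k ~ N(0, Q)
        v = rng.gauss(0.0, R ** 0.5)   # measurement noise, v_k ~ N(0, R)
        x = F * x + w                  # state equation with B_k u_k = 0
        z = H * x + v                  # observation equation
        states.append(x)
        observations.append(z)
    return states, observations

states, observations = simulate(100)
```

The two noise terms are drawn independently, which mirrors the independence assumption between process and measurement noise.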

Kalman Filter Algorithm Implementation
The Kalman Filter algorithm is a recursive filtering model. Therefore, the optimal estimate of the system state at the current time can be obtained from the optimal estimate of the system state at the previous moment and the observation of the system state at the current time.
Firstly, the state of the KF is represented by the following two variables:
• $\hat{x}_{k|k}$ represents the estimate of the system state at time $k$;
• $P_{k|k}$ represents the covariance matrix of the state estimation error at time $k$, which measures the accuracy of the estimate.
The KF estimates a process by using a form of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements. As such, the equations for the KF fall into two groups: time update equations and measurement update equations [48]. The time update equations project the current state and error covariance estimates forward in time to obtain the a priori estimates for the next time step. The measurement update equations are responsible for the feedback, i.e., for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate.
The time update equations can also be thought of as predictor equations, while the measurement update equations can be thought of as corrector equations. Indeed, the final estimation algorithm resembles a predictor-corrector algorithm for solving numerical problems, as shown in Figure 5.

Prediction
The first step of the algorithm is prediction, also known as the time update. The a priori estimate and the covariance matrix of the a priori estimation error at the current time are obtained from the optimal estimate and the covariance matrix of the estimation error of the system state at the previous moment, expressed by the following equations:

$$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k, \qquad P_{k|k-1} = F_k P_{k-1|k-1} F_k^{T} + Q_k \tag{5}$$

where $F_k$, $B_k$ and $Q_k$ are the state transition matrix, the control input matrix, and the covariance matrix of the process noise, respectively. Since the a priori estimate $\hat{x}_{k|k-1}$ is not the optimal estimate at time $k$, it is necessary to correct $\hat{x}_{k|k-1}$ using the observation of the sensors.
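The prediction step can be sketched for the scalar case as follows. This is an illustrative sketch rather than the system's actual code; the function name `predict` and its default arguments are assumptions.

```python
def predict(x_post, P_post, F=1.0, B=0.0, u=0.0, Q=0.0):
    """Time update for a scalar state (hypothetical helper).

    x_post, P_post: a posteriori estimate and error variance from step k-1.
    Returns the a priori estimate and error variance for step k.
    """
    x_prior = F * x_post + B * u      # project the state forward
    P_prior = F * P_post * F + Q      # project the error variance forward
    return x_prior, P_prior
```

With F = 1 and no control input, the state estimate is carried over unchanged while its uncertainty grows by Q.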

Correction
In the correction stage of the algorithm (the measurement update), the following three values are first calculated:

$$\tilde{z}_k = z_k - H_k \hat{x}_{k|k-1}, \qquad \Phi_k = H_k P_{k|k-1} H_k^{T} + R_k, \qquad K_k = P_{k|k-1} H_k^{T} \Phi_k^{-1} \tag{6}$$

In Equation (6), $H_k$ is the observation transfer matrix, which maps the real state space of the dynamic system into the observation space. $R_k$ is the covariance matrix of the observation noise, $\Phi_k$ is the covariance matrix of the observation margin, and $K_k$ is the Kalman gain. $z_k$ is the observation at time $k$, and $\tilde{z}_k$ is the observation margin, which is the difference between the actual observation and the observation predicted from the a priori estimate. These three values are then used to update the filter variables $\hat{x}_{k|k-1}$ and $P_{k|k-1}$ to obtain the optimal estimate $\hat{x}_{k|k}$ and the covariance matrix $P_{k|k}$ of the estimation error of the system state at time $k$:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{z}_k, \qquad P_{k|k} = (I - K_k H_k) P_{k|k-1} \tag{7}$$
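The correction step can likewise be sketched for a scalar state. The function name `correct` and the default values of H and R are assumptions made for illustration only.

```python
def correct(x_prior, P_prior, z, H=1.0, R=1.0):
    """Measurement update for a scalar state (hypothetical helper).

    Returns the a posteriori estimate, its error variance and the gain used.
    """
    residual = z - H * x_prior            # observation margin
    S = H * P_prior * H + R               # variance of the observation margin
    K = P_prior * H / S                   # Kalman gain
    x_post = x_prior + K * residual       # corrected state estimate
    P_post = (1.0 - K * H) * P_prior      # corrected error variance
    return x_post, P_post, K

# A prior of 0.0 with variance 1.0 meets an observation of 2.0 with R = 1.0:
x_post, P_post, K = correct(0.0, 1.0, 2.0, R=1.0)
```

In this example the prior and the observation are equally uncertain, so the gain is 0.5 and the corrected estimate lies halfway between them.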

Setting Parameters
In the Kalman Filter-based air quality prediction model proposed in this paper, it is assumed that the state of each pollutant does not change within an hourly prediction period and that there are no control variables. Therefore, in Equation (5), $F_k$ can be set to an identity matrix and $B_k u_k$ is zero. In Equation (6), $R_k$ is the covariance matrix of the measurement error of the sensors, and $z_k$ is an observation matrix formed from 100 samples of each pollutant concentration taken in the first 10 min of every hour (100 samples are used because repeated experiments showed that the algorithm converges and reaches steady state within 100 iterations).
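Under these settings the filter reduces to a few scalar operations per sample. The sketch below assumes one independent scalar filter per pollutant with F = H = 1; the function name `kalman_1d`, the initial variance P0 and the measurement variance R = 1.0 used in the example are illustrative assumptions, since the paper does not list them here.

```python
def kalman_1d(observations, Q, R, x0, P0=1.0):
    """Scalar Kalman Filter with F = H = 1 and no control input.

    Returns the a posteriori estimate after each observation.
    """
    x, P = x0, P0
    estimates = []
    for z in observations:
        P = P + Q                 # prediction: state unchanged, variance grows by Q
        K = P / (P + R)           # Kalman gain
        x = x + K * (z - x)       # correction with the new observation
        P = (1.0 - K) * P         # updated error variance
        estimates.append(x)
    return estimates

# 100 identical samples of 5.0, starting from a deliberately wrong guess of 0.0
estimates = kalman_1d([5.0] * 100, Q=0.089, R=1.0, x0=0.0)
```

The last element of `estimates` plays the role of the converged value that the system reports for the period.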
In Equation (5), the process noise covariance $Q_k$ is usually difficult to determine, as we typically cannot directly observe the process we are estimating. Statistically speaking, excellent filter performance can be obtained by tuning the filter parameter $Q_k$. The tuning is usually performed off-line, frequently with the help of another (distinct) Kalman Filter, in a process generally referred to as system identification. Under the condition that $Q_k$ is constant, both the estimation error covariance $P_{k|k}$ and the Kalman gain $K_k$ stabilize quickly and then remain constant [49]. In this paper, $Q_k$ is assumed to be a constant value Q, so that a value of Q can be determined for each pollutant. The smaller the Q value, the higher the trust in the prediction model. If Q is 0, only the prediction model is trusted. As Q decreases, the system converges more easily, but once Q has been reduced to a certain extent, decreasing it further may cause the system to start diverging. Conversely, a larger Q value indicates lower trust in the prediction model and, accordingly, higher trust in the measured value. If Q tends to infinity, only the measured value is trusted.
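The role of Q as a trust parameter can be checked numerically by iterating the variance and gain recursions until they settle. This is a minimal sketch for the scalar case with F = H = 1; the function name `steady_state_gain` and the value R = 1.0 are assumptions.

```python
def steady_state_gain(Q, R, iterations=1000, P0=1.0):
    """Iterate the scalar variance/gain recursion and return the final gain."""
    P, K = P0, 0.0
    for _ in range(iterations):
        P_prior = P + Q                   # prediction step
        K = P_prior / (P_prior + R)       # gain for this iteration
        P = (1.0 - K) * P_prior           # correction step
    return K

# Gains for increasing Q with a fixed, assumed R = 1.0
gains = [steady_state_gain(q, 1.0) for q in (0.0, 0.001, 0.01, 0.1, 1.0, 1e6)]
```

A gain close to 0 means new observations barely move the estimate (the model is trusted), while a gain close to 1 means the estimate follows the observations. In this sketch the gain grows monotonically with Q, which matches the qualitative description above.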
This paper uses the grid search method to tune Q. The specific process is as follows.
(1) Select data. The Panyu Middle School (PMS) in Guangzhou, Guangdong Province, China, is an official monitoring site, so we assume that the data published by this site are true and accurate [50]. We select the air pollutant concentration data released by the monitoring site from 0:00 on 12 February 2019 to 23:00 on 15 February 2019 as the actual values (a total of 96 data points in four days). Each data point in the data set corresponds to the concentration of one type of pollutant at one hour. At the same time and place, the sensor network of our system monitors the concentrations of the various pollutants, and these observations are used as the measured values (likewise, a total of 96 data points in four days).
(2) Define search interval. When Q = 0, since the weight given to the observations is very small, the posterior estimate $\hat{x}_k$ is basically not updated during the iterative process, and its trend tends to be flat. Taking the prediction of the CO concentration by the Kalman Filter as an example, Figure 6a shows the iterative convergence process at Q = 0 (a total of 96 iterations). In the figure, the blue line is the output of the KF algorithm in each iteration (the posterior estimate $\hat{x}_k$), the plus signs are the error-containing values observed by the sensors, and the green line is the data of the monitoring site, indicating the true value. When Q = 1, the model trusts the observations, and the update of the posterior estimate $\hat{x}_k$ is more strongly affected by them. Therefore, the trend basically coincides with the measured values, as shown in Figure 6b.
(3) Calculate the predicted value. Q is searched 1000 times in the range (0, 1). In each search, for the current value of Q, the KF algorithm performs 96 iterative updates using the 96 measurements. In this way, all the predicted values of the posterior estimate $\hat{x}_k$ are obtained, from the initial value through the convergence process, i.e.:

$$\hat{x} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{96}) \tag{8}$$

(4) Calculate RMSE. For each value $Q_n$ taken by Q, the RMSE (root-mean-square error) between the predicted values $\hat{x}_k$ and the corresponding true values is calculated:

$$RMSE(x, \hat{x})_{Q_n} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (x_k - \hat{x}_k)^2}, \qquad N = 96 \tag{9}$$

where $x$ is the true value, i.e., the data of the monitoring site, and $\hat{x}$ is the predicted value calculated by the KF. Then, for each Q value $Q_n$, a corresponding $RMSE(x, \hat{x})_{Q_n}$ is output.
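Steps (3) and (4) can be sketched as a simple loop over candidate Q values. The sketch assumes a scalar filter with F = H = 1, an evenly spaced grid strictly inside (0, 1), initialization from the first measurement, and a user-supplied measurement variance R; the names `kalman_1d`, `rmse` and `grid_search_q` are hypothetical.

```python
import math

def kalman_1d(observations, Q, R, x0, P0=1.0):
    """Scalar Kalman Filter with F = H = 1; returns all posterior estimates."""
    x, P = x0, P0
    estimates = []
    for z in observations:
        P = P + Q
        K = P / (P + R)
        x = x + K * (z - x)
        P = (1.0 - K) * P
        estimates.append(x)
    return estimates

def rmse(truth, predicted):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth))

def grid_search_q(measured, truth, R, n_grid=1000):
    """Try n_grid values of Q in (0, 1) and return the best one with all errors."""
    best_q, best_err = None, float("inf")
    errors = []
    for i in range(1, n_grid + 1):
        q = i / (n_grid + 1)                          # candidate strictly inside (0, 1)
        estimates = kalman_1d(measured, q, R, x0=measured[0])
        err = rmse(truth, estimates)
        errors.append((q, err))
        if err < best_err:
            best_q, best_err = q, err
    return best_q, errors
```

Here `measured` would hold the 96 sensor values and `truth` the 96 monitoring-site values for one pollutant; each pollutant is searched separately.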
(5) Normalize RMSE. In step (4), since $\hat{x}$ includes the predicted values from the non-converged stage of the system, the RMSE is relatively large during the parameter tuning process. In addition, the raw RMSE of each type of pollutant has a different range, because the absolute differences between the predicted and true values differ across pollutants. Therefore, RMSE(pollutant), where pollutant is SO2, NO2, CO, O3, PM2.5 or PM10, is rescaled by the commonly used min-max normalization method so that all pollutants share the same range. After normalization, each RMSE(pollutant) lies between 0 and 1, which allows an intuitive comparison. Figure 7 shows the trend of the normalized RMSE of the six types of air pollutants during the Q search. The smaller the RMSE, the fewer non-converged predicted values it reflects, which indicates that the model converges faster and fits better. The corresponding value of Q is then more favorable for predicting that pollutant (different pollutants follow different trends, so the optimal value of Q differs for each type of pollutant; according to the model built in this paper, the more stable the change, the smaller the optimal value of Q).
(6) Best Q. From Figure 7, we obtain the optimal values of the process error Q as follows:

$$Q_{best}(SO_2, NO_2, CO, O_3, PM2.5, PM10) = (0.467, 0.273, 0.089, 0.572, 0.151, 0.133) \tag{10}$$

In the tuning process, when Q is set to $Q_{best}$, the iterative convergence process of the KF prediction for each type of pollutant concentration is shown in Figure 8.
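The normalization and selection of the best Q can be sketched as follows. The function names and the tiny example data are hypothetical; in the paper's setting the inputs would be the 1000 candidate Q values and their RMSE values for one pollutant.

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                          # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def select_best_q(q_values, rmse_values):
    """Return the Q with the smallest normalized RMSE and the normalized curve."""
    normalized = min_max_normalize(rmse_values)
    best_index = min(range(len(normalized)), key=normalized.__getitem__)
    return q_values[best_index], normalized

# Toy example with three candidate Q values
best, curve = select_best_q([0.1, 0.2, 0.3], [4.0, 2.0, 3.0])
```

Because min-max normalization is monotonic, it does not change which Q is best for a given pollutant; it only puts the six curves on a common scale for plotting and comparison.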
It can be seen from the figure that when Q = $Q_{best}$, the KF algorithm converges after about 60 iterations, and the predicted result after convergence reflects the change of the true value more accurately.
(7) Test Q. Taking the predicted CO concentration as an example, we test whether the optimal prediction effect is obtained when Q takes its optimal value. From step (6), Q(CO) = 0.089, and the KF algorithm is used to predict the concentration of CO at 0:00 on 16 February 2019. The convergence process of the algorithm is shown in Figure 9. The plus signs are the observations obtained by the sensors sampling the CO concentration 100 times within 10 min, from 23:00 to 23:10 on 15 February 2019. The blue line shows the KF algorithm combining the error-containing observations over 100 iterations. In this test, the last value of the blue line (the model) after convergence is used as the predicted value at 0:00 on 16 February 2019. The green line is the CO concentration reported by the monitoring site at 0:00 on 16 February 2019, which is the target of this prediction. As can be seen from the figure, the model converges quickly when Q is constant and optimal. Not only is the observation error corrected, but the predicted result (the last point on the blue line) is also very close to the true value after the model converges.
According to Equation (7), the value of the error covariance matrix $P_k$ changes constantly as the KF iterates. When the system enters a steady state, $P_k$ converges to a minimum estimated variance matrix, and the Kalman gain $K_k$ at this time is also optimal. Therefore, when predicting and correcting the CO concentration with the KF algorithm, we can also check the convergence of $P_k$ to judge whether the system has entered the steady state. As shown in Figure 10, $P_k$ basically stabilizes after about 50 iterations, indicating that the model has converged at this point.
Figure 10. The convergence process of the error covariance matrix $P_k$ when the Kalman Filter algorithm predicts the CO concentration at 0:00 on 16 February 2019 with Q(CO) = 0.089.
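The steady-state check on the error covariance described above can be sketched for the scalar case. Q = 0.089 is the CO value reported in the paper; the measurement variance R = 1.0, the initial variance P0 = 1.0, the tolerance and both function names are assumptions, so the iteration count produced by this sketch is not expected to reproduce Figure 10.

```python
def covariance_history(n_iterations, Q, R, P0=1.0):
    """Return the a posteriori error variance after each iteration (F = H = 1)."""
    P, history = P0, []
    for _ in range(n_iterations):
        P_prior = P + Q
        K = P_prior / (P_prior + R)
        P = (1.0 - K) * P_prior
        history.append(P)
    return history

def converged_at(history, tol=1e-4):
    """Smallest 0-based index after which every successive change stays below tol.

    Returns None if the last change in the history is still >= tol.
    """
    last_big = 0
    for k in range(1, len(history)):
        if abs(history[k] - history[k - 1]) >= tol:
            last_big = k
    return last_big if last_big < len(history) - 1 else None

history = covariance_history(100, Q=0.089, R=1.0)
steady_index = converged_at(history)
```

In the scalar case the variance recursion does not depend on the observations, so this check can be run before any sensor data are collected.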

Accuracy Improvement Analysis
One of the differences between the Kalman Filter algorithm and other time series prediction models is that the statistical information of the process noise and the observation error can be used effectively in the prediction process to correct the a priori estimate of the model. Therefore, from the prediction perspective, the KF uses the observation data obtained by the sensor network to improve prediction accuracy. From the monitoring perspective, even with low-cost, low-precision sensors, the KF can correct the error of the measurement data at the algorithm level, which improves the accuracy of the sensors on the edge side.
To verify that the KF algorithm can effectively improve the accuracy of sensor observations, this paper designs a comparison experiment to quantify the error. The experiment compares the predicted value of the KF and the observed value of the sensor, respectively, with the data of the official monitoring site. The main indicators for measuring the error include MSE (mean-square error), RMSE and MAE (mean-absolute error). The experimental results are shown in Table 1. To show more intuitively how much the KF improves accuracy, the percentage declines of MSE, RMSE and MAE are calculated from Table 1, and the results are shown in Table 2. Based on the data in Table 2, the error of the KF prediction is on average 27% lower than that of the sensor measurements, which indicates that the KF algorithm can improve the accuracy of the sensors to some extent.
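The three error indicators and the percentage decline can be computed as in the following sketch. The function names and the example numbers are illustrative only and are not taken from Table 1 or Table 2.

```python
import math

def mse(truth, predicted):
    """Mean-square error."""
    return sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)

def rmse(truth, predicted):
    """Root-mean-square error."""
    return math.sqrt(mse(truth, predicted))

def mae(truth, predicted):
    """Mean-absolute error."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)

def percent_decline(sensor_error, kf_error):
    """Relative reduction of the KF error with respect to the raw sensor error."""
    return 100.0 * (sensor_error - kf_error) / sensor_error

# Illustrative values only: a sensor error of 2.0 reduced to 1.5 by filtering
decline = percent_decline(2.0, 1.5)
```

Each indicator is evaluated twice against the monitoring-site data, once for the raw sensor values and once for the KF output, and the two results are then compared with `percent_decline`.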

Predictive Ability Analysis
Although Section 5.1 shows that the KF algorithm can improve the accuracy of the sensors, it does not explain the advantages of the KF in prediction performance. Most ML-based prediction models are limited by the storage capacity (a large amount of historical data is required for training to obtain good prediction results) and the computing power (model training and prediction are computationally intensive tasks that require powerful components such as CPUs) of edge computing nodes, so they are not suitable for IoT scenarios [45]. Therefore, this paper designs a comparison experiment between the KF algorithm and three other commonly used time series prediction algorithms based on statistical methods: SMA (Simple Moving Average), EWMA (Exponentially Weighted Moving Average) and ARIMA (Autoregressive Integrated Moving Average).
SMA, EWMA, and ARIMA need historical data for training. To provide a common basis for comparison, this paper selects the hourly concentration data of the various air pollutants at PMS from 15 to 16 February 2019, which serve as the training set for the models and as the reference data for the prediction results.
(1) Kalman Filter Prediction. First, the hyperparameters of the KF algorithm are determined as described in Section 4.2.3. Then, in the actual prediction process, the observation data used by the KF in the correction stage are the hourly monitoring data of the air pollutant concentrations at PMS (from 0:00 to 23:00 on 16 February 2019). Each prediction proceeds as follows: (i) during each prediction period, the sensors of the system sample the concentration of each air pollutant 100 times in the first ten minutes; (ii) the KF algorithm is iteratively updated with the sampled data; (iii) the result of the iterative convergence is used as the predicted value of each pollutant concentration for this period.
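The per-period procedure can be sketched as a loop over hourly batches of samples. The sketch assumes a scalar filter with F = H = 1, and it additionally assumes that each period is initialized with the previous period's result and a fixed initial variance; the paper does not state how the filter is initialized between periods. The function names are hypothetical.

```python
def filter_batch(samples, Q, R, x0, P0=1.0):
    """Run the scalar KF over one batch of samples and return the final estimate."""
    x, P = x0, P0
    for z in samples:
        P = P + Q
        K = P / (P + R)
        x = x + K * (z - x)
        P = (1.0 - K) * P
    return x

def forecast_day(hourly_batches, Q, R):
    """Produce one predicted value per period from its batch of sensor samples."""
    predictions = []
    x = hourly_batches[0][0]              # assumed start: first available sample
    for batch in hourly_batches:
        x = filter_batch(batch, Q, R, x)  # assumed warm start from the last result
        predictions.append(x)
    return predictions

# Two toy periods of 100 samples each
predictions = forecast_day([[1.0] * 100, [2.0] * 100], Q=0.1, R=1.0)
```

Only the current batch and two scalars per pollutant need to be kept in memory, which is why this procedure is cheap enough for an edge device.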
(2) SMA Prediction. SMA is a method for predicting the average value for a certain period in the future. The method calculates the arithmetic mean of several past data points and uses it as the predicted value for the next period [51]. SMA can be expressed as:

$$F_t = \frac{A_{t-1} + A_{t-2} + A_{t-3} + \cdots + A_{t-n}}{n}$$

where $F_t$ is the predicted value for the next period, and $n$ is the number of periods of the moving average, generally between 3 and 200. $A_{t-1}$ is the actual value of the previous period, and $A_{t-2}$, $A_{t-3}$ and $A_{t-n}$ are the actual values two, three and $n$ periods earlier, respectively. The value of $n$ is determined by cross-validation [52] on the training set before the SMA prediction. The prediction effect is found to be best when $n = 3$. Therefore, in the actual prediction process, the model input for the first period (predicted value at 0:00 on 16 February 2019) is the three hours of data from the monitoring site from 21:00 to 23:00 on 15 February 2019. The model input for the second period (predicted value at 1:00 on 16 February 2019) is the data from 22:00 to 23:00 on 15 February 2019 and 0:00 on 16 February 2019. By analogy, the predicted values for all 24 h of 16 February 2019 are obtained.
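The rolling use of the SMA described above can be sketched as follows; the function names and the toy numbers are hypothetical.

```python
def sma_forecast(history, n=3):
    """Predict the next value as the mean of the last n actual values."""
    if len(history) < n:
        raise ValueError("need at least n past values")
    return sum(history[-n:]) / n

def rolling_sma(initial, actuals, n=3):
    """One-step-ahead forecasts: each actual value joins the window after its forecast."""
    window = list(initial)
    forecasts = []
    for actual in actuals:
        forecasts.append(sma_forecast(window, n))
        window.append(actual)
    return forecasts

# Three past hours followed by two hours to be predicted
forecasts = rolling_sma([1.0, 2.0, 3.0], [4.0, 5.0])
```

In this toy series the values rise by 1 each hour, yet the forecasts are 2.0 and 3.0 for actual values of 4.0 and 5.0, which illustrates how the averaged window trails a trend.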
(3) EWMA Prediction. EWMA is an improvement on SMA and is a common sequence processing method [53]. It applies a non-uniform weighting to time series data, so that a large amount of data can be used while recent data are weighted more heavily. As the name suggests, the weights are based on the exponential function. It is formulated as follows:

$$EWMA_t = \lambda Y_t + (1 - \lambda) EWMA_{t-1}, \qquad t = 1, 2, \ldots, n$$

where $EWMA_t$ is the estimated value at time $t$, $Y_t$ is the observation at time $t$, $n$ is the number of observations to be monitored including $EWMA_0$, and $0 < \lambda \le 1$ is a constant that determines the depth of memory of the EWMA. The parameter $\lambda$ determines the rate at which "older" data enter into the calculation of the EWMA statistic. A large value of $\lambda$ (closer to 1) gives more weight to recent data and less weight to older data, while a small value of $\lambda$ (closer to 0) gives more weight to older data. In this paper, the value of $\lambda$ is 0.4, and the model input for each period is the same as for the SMA.
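The EWMA recursion can be sketched in a few lines. The value λ = 0.4 follows the paper; taking the first observation as the starting value when none is supplied is an assumption, and the function name `ewma` is hypothetical.

```python
def ewma(observations, lam=0.4, initial=None):
    """Return the EWMA after each observation.

    initial is the starting value EWMA_0; if omitted, the first observation
    is used (an assumption of this sketch).
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must satisfy 0 < lam <= 1")
    level = observations[0] if initial is None else initial
    smoothed = []
    for y in observations:
        level = lam * y + (1.0 - lam) * level   # recent data weighted by lam
        smoothed.append(level)
    return smoothed

# A jump from 10 to 20 is only partly followed in the first step
smoothed = ewma([10.0, 20.0], lam=0.4, initial=10.0)
```

With λ = 0.4 the estimate moves 40% of the way toward each new observation, so a sudden jump is absorbed over several periods rather than at once.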
(4) ARIMA Prediction. The ARIMA model, also known as the differential autoregressive moving average model, transforms a nonstationary time series into a stationary one and learns from historical data the patterns that change over time [54]. After learning, these patterns are used to predict the future. The ARIMA model can be written as ARIMA(p, d, q) and is an extension of the ARMA(p, q) model. In ARIMA(p, d, q), the parameters p, d and q are non-negative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing needed for the time series to become stationary, and q is the order of the moving-average model. ARIMA(p, d, q) can be expressed as:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

where $L$ is the lag operator, $\phi_i$ and $\theta_i$ are the autoregressive and moving-average coefficients, and $\varepsilon_t$ is the error term. Since the parameters of the model need to be determined before the ARIMA prediction, this paper selects the 70 h of monitoring-site data before 0:00 on 16 February 2019 as the training set for the ARIMA model. The specific training process is: (i) determine the minimum differencing order d that transforms the original data into a stationary sequence S; (ii) calculate the autocorrelation and partial autocorrelation coefficients of sequence S to determine the values of p and q; (iii) estimate the coefficients of the resulting ARIMA model.
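The differencing in the first training step, i.e., applying $(1 - L)^d$ to the series, can be sketched as follows. This shows only the differencing operation, not a full ARIMA fit; the function name `difference` and the toy series are hypothetical.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y_t = x_t - x_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

# A quadratic trend becomes constant after two rounds of differencing
once = difference([1, 4, 9, 16], d=1)
twice = difference([1, 4, 9, 16], d=2)
```

Each round of differencing shortens the series by one element, which matters when the training set is as short as 70 hourly values.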
To compare the prediction accuracy of KF, SMA, EWMA, and ARIMA more intuitively, the verification data set is the concentration of the various air pollutants collected by the monitoring site at PMS on 16 February 2019. The same three evaluation indicators, MSE, RMSE, and MAE, are used to measure the performance of each model. The results are shown in Table 3. As illustrated in Table 3, the MSE, RMSE, and MAE of the KF in predicting the concentration of each pollutant are the lowest among the four algorithms. In particular, the average RMSE is reduced by 68.3%. That is to say, the KF algorithm shows the smallest overall prediction error, and its prediction performance is the best.

Predictive Trend Comparison
This section presents the predictive trends of the four algorithms KF, SMA, EWMA and ARIMA by comparing their predicted values with the values published by the official monitoring site. As shown in Figure 11, the predicted values of the four algorithms and the real values at different moments reveal whether the algorithms can accurately reflect the characteristics of the data. The figure shows the 24 h prediction trend for the concentrations of the six pollutants SO2, NO2, CO, O3, PM2.5 and PM10 at PMS on 16 February 2019. The black solid line represents the data collected at PMS and serves as the benchmark for comparison. The dashed lines in four different colors are the values predicted by SMA, EWMA, ARIMA and KF. As illustrated in Figure 11, the KF predictions for the six air pollutant concentrations are the closest to the data of the official monitoring site. The KF shows the best prediction effect, reflecting the characteristics of the real data at different times.
In detail, the blue dotted line reflects the trend of the SMA algorithm for the various pollutants. On the whole, SMA is only effective for flat (horizontal) historical data, such as the period from 2:00 to 5:00 in Figure 11c and from 2:00 to 23:00 in Figure 11e,f. For data with trend or step characteristics, such as Figure 11b,d, the moving average does not always reflect the trend well and shows an obvious lag (hysteresis). Since it is an average, the predicted value always stays at the past level and cannot anticipate higher or lower fluctuations in the future. Because fluctuations in the concentration of the various air pollutants are not always flat, the SMA predictions produce very large deviations.
The orange dotted line reflects the predicted trend of the EWMA algorithm among various pollutants.As can be seen from the figure, the EWMA algorithm is very effective for the processing of horizontal historical data.Although the prediction of trend or step data has been greatly improved compared with SMA, there is still significant hysteresis.The results of each pollutant concentration prediction are still far from the data released by the official monitoring site.
The green dotted line reflects the predicted trend of the ARIMA algorithm for the various pollutants. Overall, compared with SMA and EWMA, ARIMA better reflects the trend of the data, whether for horizontal or step data. However, ARIMA requires the time series to be stationary, or to become stationary after differencing, so it obtains a good prediction effect only over short periods. When the data fluctuate suddenly, as in Figure 11a, where the concentration of SO2 rises abruptly at 6:00 and 8:00, ARIMA still shows an obvious lag, resulting in large errors in the prediction results.
The red dotted line reflects the prediction trend of the KF for various pollutants.It can be seen from the figures that even though the system uses inexpensive sensors with low precision, the correction of the KF algorithm can make the prediction results of various pollutant concentrations very close to the data of the official monitoring site.The KF can well reflect the trend of the data both in step-type data (Figure 11b) and horizontal data (Figure 11e), showing no obvious hysteresis or very large volatility.
Based on the above analysis and comparison results, the KF is very suitable for the air quality monitoring and prediction system proposed in this paper. Once the hyperparameter is determined, the KF does not need to be trained on historical data. The amount of computation in each iteration and update is therefore small enough to be completed quickly on the RPi. By correcting the model with the real-time monitoring data of the sensors, the trend of the various pollutant concentrations can be accurately reflected, and the problem of low sensor accuracy is addressed at the algorithm level.

Client Interface Design
After the KF algorithm completes the prediction of the pollutant concentrations at the next moment, the RPi sends the prediction results and monitoring data to the cloud, which stores the data in the database and returns the results to the client. Users can view the current air quality and the latest 24-h AQI (air quality index) trend in a browser, as shown in Figure 12.

Conclusions
Based on the application of edge computing and IoT in smart agriculture, this paper establishes a low-cost air quality monitoring and prediction system via Raspberry Pi, which is an edge device to run the Kalman Filter algorithm.Compared to the traditional air quality monitoring and prediction system that processes in the cloud, this paper puts the machine learning algorithm on the edge, which can avoid the problem of data transmission delay due to bandwidth and network connection limitations in the agricultural environment, and improves the real-time decision-making.By running the KF algorithm on the RPi, which has strong computing power and is used as the edge device, the immediate predictions of six type of air pollutants such as SO 2 , NO 2 , CO, O 3 , PM2.5 and PM10 are realized.Compared with the other three algorithms SMA, EWMA and ARIMA, it can be seen that even with low-accuracy sensors, the error of the prediction results based on the KF is the smallest.RMSE is also reduced by an average of 68.3%.In addition, compared with the observation data of sensors, the accuracy of the predicted value by the KF algorithm is improved by 27%, which improves the accuracy of sensors from the aspect of the algorithm.Compared to other air quality monitoring equipment, the cost of our proposed solution is reduced by at least 10% with the same observation accuracy.In other words, our proposal avoids the trade-off between cost and accuracy in traditional solutions.
In applying the Kalman Filter algorithm to predict the concentration of various air pollutants, this paper ignores the influence of the external environment on pollutant concentrations. Factors such as factory emissions and wind speed have a direct impact on the current concentration of pollutants. In future work, if such data can be obtained, the accuracy of the model can be further improved. On the system side, the edge computing layer based on the sensor network and the Raspberry Pi is too centralized [55]. When the central RPi breaks down, there is no disaster recovery backup, which results in data loss and long-term paralysis of regional functions until human intervention resolves the fault. At the same time, the monitoring and early warning mechanism of the edge computing layer itself is not yet complete. In summary, these two problems, one in the algorithm and one in the system, need to be addressed before the system is applied in an actual production environment.

Figure 1. Functional model of the proposed system.

Figure 2. The architecture of the air quality monitoring and prediction system.

Figure 4. Basic structure of the Kalman Filter algorithm.

Figure 5. Iterative process of the prediction and correction phases of the Kalman Filter.

Figure 9. The convergence process when the Kalman Filter algorithm predicts the CO concentration at 0:00 on 16 February 2019 with Q(CO) = 0.089.

Figure 10. The convergence process of the error covariance matrix P_k when the Kalman Filter algorithm predicts the CO concentration at 0:00 on 16 February 2019 with Q(CO) = 0.089.

Table 1. Error comparison between the Kalman Filter prediction and sensor observations.

Table 2. Acceleration accuracy of prediction value by the Kalman Filter.

Table 3. Error comparison of each algorithm.