Air Quality Estimation Using Dendritic Neural Regression with Scale-Free Network-Based Differential Evolution

Song, Zhenyu; Tang, Cheng; Qian, Jin; Zhang, Bin; Todo, Yuki

doi:10.3390/atmos12121647


by Zhenyu Song ¹,*, Cheng Tang ², Jin Qian ¹, Bin Zhang ¹ and Yuki Todo ³

¹ College of Computer Science and Technology, Taizhou University, Taizhou 225300, China
² Faculty of Engineering, University of Toyama, Toyama-shi 930-8555, Japan
³ School of Electrical and Computer Engineering, Kanazawa University, Kanazawa-shi 920-1192, Japan
* Author to whom correspondence should be addressed.

Atmosphere 2021, 12(12), 1647; https://doi.org/10.3390/atmos12121647

Submission received: 10 November 2021 / Revised: 3 December 2021 / Accepted: 6 December 2021 / Published: 9 December 2021

(This article belongs to the Section Air Quality)


Abstract:

With the rapid development of the global economy, air pollution, which restricts sustainable development and threatens human health, has become an important focus of environmental governance worldwide. The modeling and reliable prediction of air quality remain substantial challenges because uncertainties residing in emissions data are unknown and the dynamic processes are not well understood. A number of machine learning approaches have been used to predict air quality to help alleviate air pollution, since accurate air quality estimation may result in significant socioeconomic development. From this perspective, a novel air quality estimation approach is proposed, which consists of two components: a newly designed dendritic neural regression (DNR) and a customized scale-free network-based differential evolution (SFDE). The DNR can adaptively utilize spatio-temporal information to capture the nonlinear correlation between observations and air pollutant concentrations. Since the landscape of the weight space in DNR is vast and multimodal, SFDE is used as the optimization algorithm due to its powerful search ability. Extensive experimental results demonstrate that the proposed approach can provide stable and reliable performances in the estimation of both PM₂.₅ and PM₁₀ concentrations, being significantly better than several commonly used machine learning algorithms, such as support vector regression and long short-term memory.

Keywords:

air quality estimation; dendritic neural regression; scale-free network; chaotic time series

1. Introduction

In recent years, due to climate change, industrial production, and population growth, air quality has deteriorated in many parts of the world. This decline has seriously affected economic development and public health. Fortunately, with gradual improvements in air quality monitoring systems, many countries have established multilevel air quality monitoring networks. The air pollutants of concern mainly include gaseous compounds such as carbon monoxide (CO), sulfur dioxide (SO₂), nitrogen oxides (NO_x), and ozone (O₃), as well as particulate matter (PM₂.₅ and PM₁₀) [1,2,3]. In particular, the PM₂.₅ and PM₁₀ concentrations have considerable impacts on human health [4]. Hence, it is crucial to forecast the ambient air quality and air quality index to ensure timely and proper responses to heavily polluted weather and to provide guidance for joint emission reduction measures to reduce regional air pollution. In addition, improving the accuracy of air quality analysis and forecasting can help governments to improve the reliability of environmental management and decision-making, and enable timely and effective prevention and control measures to minimize the harm caused by air pollution to people. However, many factors affect air quality, including environmental factors such as temperature, humidity, and wind speed, and human factors such as traffic conditions and pollution source emissions, which increases the difficulty of accurately predicting air quality. Therefore, it is particularly important to establish an air quality prediction system that can achieve excellent performance.

Traditional air quality prediction methods mainly include numerical prediction and regression statistics [5]. Numerical air quality models are based on atmospheric dynamics and employ the monitoring information from multiple environmental monitoring stations to establish meteorological emission and chemical models, and thus simulate the migration, exchange, diffusion, and emission of pollutants [6]. However, numerical prediction methods depend on complex prior knowledge, use unreliable and limited data, and have various usage constraints [7]. Moreover, the requirements for the input data are relatively strict, rendering it difficult to accurately predict air quality in real time [8]. Thus, it is difficult to simulate the real atmospheric environment. In contrast, the regression statistics-based approach avoids complex theoretical models and instead leverages statistical models to predict air quality based on analyses of historical air quality data. Unlike numerical prediction methods, this approach relies primarily on historical meteorological data and the regular analysis of monitored pollutant concentrations, and uses meteorological forecast products to predict pollutant concentrations [9]. Nevertheless, the complex linear or nonlinear relationships between the various factors affecting both air quality and the concentrations of air pollutants are challenging to describe with a definite mathematical model.

Statistics-based air quality prediction models are relatively simple to implement, as the relationships between pollutant concentrations and meteorological factors are established on the basis of statistics. As air pollutant concentration data are nonlinear and irregular, the above-mentioned methods cannot meet the requirements of practical applications to obtain sufficiently accurate and reliable prediction results. However, while the prediction performances of statistics-based methods need to be improved, these models can be applied to predict the air quality in smaller areas, and thus, can provide a certain theoretical basis for future predictions using machine learning and deep learning models. Currently, with the rapid development and application of the Internet of Things and sensor technologies, atmospheric data collected by various sensors and related data collection equipment in cities provide the necessary sources of data for air quality prediction. Since traditional shallow learning models still encounter bottlenecks in utilizing big data, new air quality prediction methods need the support of data-driven models [10]. Recently, many machine and deep learning approaches, such as decision trees (DTs) [11,12], support vector regression (SVR) [13,14,15], long short-term memory (LSTM) [16,17,18], and random forest models [19,20], have been adapted to air quality forecasting. In addition, a universal and effective deep learning air model was proposed to resolve the interpolation, prediction, and feature analysis of air quality at a fine resolution [21] via the embedded feature selection and semisupervised learning of different layers in a deep learning network. This model utilizes relevant information from unlabelled air quality data to improve the interpolation and prediction performance. Moreover, in 2020, a hybrid deep learning model that combines LSTM and convolutional neural networks was developed to improve the air quality prediction accuracy [22]. 
It can consider the spatial correlation characteristics of air pollutants to achieve high prediction performance. Furthermore, in regard to spatiotemporal correlations, a deep learning model was provided in [23] for daily PM_2.5 concentration forecasting.

Although machine and deep learning models can rapidly forecast air quality with high accuracy, under many complex air quality conditions, feature extraction is not a simple task, as it requires the artificial design of an effective feature set to predict the training results. Therefore, for long-term air quality predictions and addressing the uncertainties and nonlinear problems in prediction systems, employing machine learning prediction models remains challenging. For instance, an LSTM network cannot model and analyze the complex spatial and temporal correlations with air quality, and its nonlinear spatial dependence is an important factor affecting the air quality prediction performance. Nevertheless, with the rise of artificial neural networks (ANNs), which are data-driven and have various advantages, including data adaptation, parameter self-learning, and combined memory, a number of researchers have attempted to apply various neural network models to predict air quality. For example, a hybrid multilayer perceptron (MLP) and linear regression model was developed on the basis of principal component analysis to analyze air pollution [24]. In [25], a feedforward backpropagation neural network (BPNN) and regression model were combined to predict seasonal indoor PM_2.5–10 and PM_2.5 concentrations, and another BPNN-based approach was developed in [26] for regional multi-step-ahead PM_2.5 forecasting. ANNs were likewise used in a highly polluted region to predict the concentrations of all types of pollutants on subsequent days [1]. Moreover, based on a supervised learning neural network, a modified depth-first search method was employed to estimate PM₁₀ concentrations in [27], and Zhang et al. developed an Elman neural network (ENN)-based model for estimating air quality [28]. In addition, fuzzy neural networks [29] and recurrent neural networks [30,31] have been widely used in air quality prediction.

ANNs can reliably and accurately map the correlations between inputs and outputs, and thus have been extensively applied to the prediction of air quality. However, because each ANN model has unique advantages and limitations, it is difficult to select the most suitable model for all air quality time series. In addition, the above-mentioned models focus only on the signal transmission between neurons and ignore the nonlinear relationship in the dendritic structure of each neuron, which has been verified in biological neurons [32]. Moreover, most training algorithms easily fall into local optima and are sensitive to the initial state with gradient descent information [33,34]. To overcome these limitations and consider the calculation efficiency, in this study, a novel dendritic neural regression model (DNR) is proposed to estimate air quality. It is an improved version of our previously proposed dendritic neural model (DNM), which has been successfully applied to morphological hardware realization [35], classification [36,37,38,39,40], and time series prediction [41,42]. Due to the plasticity of dendrites and the nonlinear characteristics of synapses, the DNM can effectively simulate the processes by which biological neurons transmit information and has the capability to fit complex nonlinear functions well. However, since the original DNM network was designed for classification problems, adaptive pruning is needed in the calculation process to improve the calculation efficiency, especially in the processing of high-dimensional data classification; consequently, the computational complexity is excessively high, leading to a long computation time. In the proposed DNR network, we employ a single-branch approach to reduce the computational complexity, and utilize a new weight to control the strength of the branch. 
In addition, because DNR is used to predict air quality, for which the weight space to be trained is vast and complex, a global optimization algorithm with stronger search capabilities is needed to replace the traditional gradient descent-based algorithm. In this study, to further enhance the air quality prediction performance of DNR, a scale-free network-based differential evolution (SFDE) algorithm is proposed to optimize the weight and threshold of DNR [43]. This scale-free local search method can ensure the diversity of individuals and avoid local optima, thereby helping the differential evolution (DE) algorithm reach the global optimum.

Moreover, one-dimensional air quality time series data can be unpredictable and irregular, and some information is hidden in high dimensions. According to Takens’s theory [44], phase space reconstruction (PSR) can extend one-dimensional data into high-dimensional space. If this high-dimensional data space exhibits chaotic characteristics, it is predictable. Therefore, before DNR is implemented for the air quality prediction, first, mutual information (MI) and false nearest neighbors (FNN) methods are utilized to calculate the time delay and embedding dimensions of the dataset. Then, PSR is performed to transform the one-dimensional PM_2.5 and PM₁₀ time series data into predictable multidimensional spatial vector data, and the maximum Lyapunov exponent (MLE) is used to validate the chaotic peculiarity. Finally, the resulting vectors are used as the training samples of DNR, and the trained DNR network is employed to perform the air quality prediction. For a fair comparison, three PM_2.5 and three PM₁₀ concentration datasets from the past two years were selected for experiments to evaluate DNR’s prediction performance, and each group of experiments was run independently 30 times. Our extensive experimental and statistical results confirm that the air quality estimations of the proposed DNR network are superior to those of its competitors. The novelty and primary contributions of this study are as follows:

A single-branch dendritic neural regression-based approach is proposed to estimate air quality.
To enhance the prediction performance of DNR, a customized SFDE algorithm is proposed to optimize DNR.
Extensive experiments demonstrate that DNR can predict the PM₂.₅ and PM₁₀ concentrations more accurately and stably than the existing methods.
Two nonparametric statistical tests further verify that the proposed DNR network is superior to nine of its competitors.

The remainder of this paper is organized as follows: Section 2 mainly introduces the related methods and techniques, including the proposed DNR network, SFDE algorithm, chaotic time series theory, and PSR. The relevant details of the experiments are introduced in Section 3, including a brief description of the experimental data, the experimental setup, the evaluation criteria of the prediction methods, and the experimental results, which are presented and discussed in detail. Section 4 presents a conclusion and summarizes the prospects for future work.

2. Proposed Method

2.1. Dendritic Neural Regression

In this study, a single-branch DNR network is proposed to estimate air quality; it improves on the traditional DNM network to enhance computational efficiency. Owing to the calculation mechanism of its dendritic structure, DNR can properly map the relationship between the input and output. The architecture of DNR is shown in Figure 1. Figure 1 (right) shows the transformation process from DNM to DNR. DNR is composed of four layers: a synaptic layer, a dendritic layer, a membrane layer, and a soma layer. As demonstrated in Figure 1, DNR is a feedforward neural network model in which the signal enters at the synaptic layer and propagates forward, layer by layer, to the output of the soma layer. Figure 1 (left) describes the six connection cases between the synapse structure and the input signal in DNR. The following is a detailed description of DNR.

2.1.1. Synapses

In neurons, synapses are crucial nodes for signal transmission between dendrites or between dendrites and axons. Since the transmission of a signal on a synapse is feedforward and nonlinear in nature and has two states, namely, excitation and inhibition, we can use the sigmoid function to simulate this process. The formula is described as follows:

$$S_{i} = \frac{1}{1 + e^{-k (w_{i} y_{i} - q_{i})}}, \tag{1}$$

where $y_{i}$ is the input value of the sample attribute, the hyperparameter $k$ is a positive integer, and $i$ indexes the feature dimensions of the sample. $w_{i}$ and $q_{i}$ are the weight and threshold of the input connection, respectively; for different prediction tasks, their suitable values can be obtained after training based on optimization algorithms. To simulate the excited and inhibited states of synapses, according to the values of $w_{i}$ and $q_{i}$, Figure 1 (left) indicates that the connection states can be divided into six cases [42]:

$w_{i} < q_{i} < 0$ (for example, $w_{i} = - 1$ and $q_{i} = - 0.5$ ): a high potential input leads to a low potential output, so this state is called an inhibited connection.
$0 < q_{i} < w_{i}$ (for example, $w_{i} = 1$ and $q_{i} = 0.5$ ): a high potential input leads to a high potential output, so this state is called an excited connection.
$q_{i} < 0 < w_{i}$ or $q_{i} < w_{i} < 0$ (for example, $w_{i} = 1$ and $q_{i} = - 0.5$ or $w_{i} = - 1$ and $q_{i} = - 1.5$ ): in these two cases, regardless of the value of the input, the output is always 1, and these two cases are called constant 1 connections.
$0 < w_{i} < q_{i}$ or $w_{i} < 0 < q_{i}$ (for example, $w_{i} = 1$ and $q_{i} = 1.5$ or $w_{i} = - 1$ and $q_{i} = 0.5$ ): these cases are similar to the third case, except that the output is always 0 regardless of the input, so these two cases are called constant 0 connections.

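Each input is processed by the sigmoid function of the synaptic layer. Because constant 0 and constant 1 connections produce outputs that no longer depend on the input, this processing is similar to the pruning operation of a neural network. Based on this operation, the trained DNR network can be further simplified, which reduces the calculation cost and improves the calculation efficiency.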
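
The connection states listed above can be checked numerically. The following is a minimal Python sketch of Equation (1), assuming inputs normalized to [0, 1] and an illustrative steepness k = 10; neither value is given in the text, and the function names `synapse` and `connection_state` are hypothetical.

```python
import math

def synapse(y, w, q, k=10):
    """Eq. (1): sigmoid synapse with weight w, threshold q and steepness k."""
    return 1.0 / (1.0 + math.exp(-k * (w * y - q)))

def connection_state(w, q):
    """Classify (w, q) into one of the four states listed in the text."""
    if 0 < q < w:
        return "excited"
    if w < q < 0:
        return "inhibited"
    if q < 0 < w or q < w < 0:
        return "constant 1"
    if 0 < w < q or w < 0 < q:
        return "constant 0"
    return "boundary"  # equalities are not covered by the six cases

# The example (w, q) pairs are the ones quoted in the text.
examples = [(1, 0.5), (-1, -0.5), (1, -0.5), (-1, -1.5), (1, 1.5), (-1, 0.5)]
for w, q in examples:
    low, high = synapse(0.0, w, q), synapse(1.0, w, q)
    print(f"w={w:>4}, q={q:>4}: {connection_state(w, q):<10} "
          f"S(0)={low:.3f} S(1)={high:.3f}")
```

For the constant connections, the outputs at both ends of the input range stay close to 1 or close to 0, which is what allows such synapses to be treated as fixed after training.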

2.1.2. Dendrites

In DNR, the dendritic layer collects the nonlinear correlated signals transmitted from the synaptic layer, and these nonlinear relationships can be mapped by multiplication [45]. The formula is defined as:

$$B = \prod_{i=1}^{n} S_{i}. \tag{2}$$

2.1.3. Membrane

The membrane layer adjusts the intensity of the output signal of the dendritic layer and transmits the resulting signal to the soma. The calculation formula for this layer is expressed as follows:

$$V = \mu \cdot B, \tag{3}$$

where $\mu$ represents the parameter that adjusts the output intensity of the dendritic layer. In DNM, the intensity of the dendritic signal is not considered ($\mu = 1$), which also distinguishes the original DNM network from DNR. In DNR, for a given regression problem, $\mu$ is also treated as a weight that is trained and continually updated by the optimization algorithm.

2.1.4. Soma

The soma layer represents the last calculation operation of DNR. We use the sigmoid function as the activation function to map the nonlinear relationships. The soma layer output calculation is expressed as follows:

$$O = \frac{1}{1 + e^{-k_{s} (V - \alpha)}}, \tag{4}$$

where $k_{s}$ represents a user-defined positive integer, and the parameter $\alpha$ is the threshold of the sigmoid activation function in the cell body.
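
Taken together, Equations (1) to (4) define a single forward pass. The sketch below chains the four layers in plain Python; the default values k = 5 and k_s = 5 and the function name `dnr_forward` are illustrative assumptions, not settings reported in the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dnr_forward(y, w, q, mu, alpha, k=5.0, k_s=5.0):
    """Single-branch forward pass: synapses -> dendrite -> membrane -> soma."""
    synapses = [sigmoid(k * (wi * yi - qi)) for yi, wi, qi in zip(y, w, q)]  # Eq. (1)
    branch = math.prod(synapses)               # Eq. (2): product over all synapses
    membrane = mu * branch                     # Eq. (3): branch strength mu
    return sigmoid(k_s * (membrane - alpha))   # Eq. (4): soma output

# Two inputs, both on "constant 1" connections (w = 1, q = -0.5).
out = dnr_forward([0.2, 0.8], [1.0, 1.0], [-0.5, -0.5], mu=1.0, alpha=0.5)
print(out)
```

The trainable parameters are the vectors w and q together with the scalar mu, so a model with n inputs has 2n + 1 weights to optimize under these assumptions.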

2.2. Optimization Algorithm

In the calculation of the dendritic layer in DNR, a large number of multiplication operations are employed to map the nonlinear relationships between features, and an adjustment parameter is added to control the branch strength in the membrane layer, making the output of DNR very sensitive to the input of the model. Moreover, the parameter space required to participate in the training is vast and complex. Traditional gradient descent optimization algorithms suffer from some limitations in such a parameter space. Therefore, we adopt the more powerful search-capable DE algorithm to replace it. To further improve the DE search performance, we propose the SFDE algorithm. The following describes the search process of the SFDE algorithm and the parameter settings.

2.2.1. Differential Evolution

The DE algorithm is a novel heuristic intelligent optimization algorithm with a simple and computationally efficient structure inspired by the principle of “survival of the fittest”. The basic idea is to create new individuals through differences between individuals and then select better individuals to enter the next generation. The standard DE implementation process consists of four steps: initialization, mutation, crossover, and selection.

First, the population is initialized with P individuals, which are randomly distributed in the solution space. Then, mutation operations are performed on the population. In each generation, according to the differences between individuals, a mutation vector $V_{i}$ is obtained. The common mutation strategies are as follows:

rand/1

$$V_{i}^{t} = Y_{1}^{t} + F \times (Y_{2}^{t} - Y_{3}^{t}), \tag{5}$$

best/1

$$V_{i}^{t} = Y_{best}^{t} + F \times (Y_{1}^{t} - Y_{2}^{t}), \tag{6}$$

rand/2

$$V_{i}^{t} = Y_{1}^{t} + F \times (Y_{2}^{t} - Y_{3}^{t}) + F \times (Y_{4}^{t} - Y_{5}^{t}), \tag{7}$$

best/2

$$V_{i}^{t} = Y_{best}^{t} + F \times (Y_{1}^{t} - Y_{2}^{t}) + F \times (Y_{3}^{t} - Y_{4}^{t}), \tag{8}$$

where $Y_{best}^{t}$ represents the optimal individual in the $t$-th iteration, $Y_{1}^{t}, \dots, Y_{5}^{t}$ are distinct individuals selected at random from the current population, and $F$ is a scaling factor in (0, 1).

After the mutation operation, each target vector $Y_{i}^{t}$ is crossed with its mutation vector $V_{i}^{t}$ to generate the trial vector $R_{i}^{t} = (r_{i,1}^{t}, r_{i,2}^{t}, \dots, r_{i,D}^{t})$. The operation is defined as follows:

$$r_{i,j}^{t} = \begin{cases} v_{i,j}^{t}, & \text{if } rand_{i,j}[0,1] \leq CR_{i}^{t} \text{ or } j = D, \\ y_{i,j}^{t}, & \text{otherwise}, \end{cases} \tag{9}$$

where $r_{i,j}^{t}$ is the $j$-th component of the $i$-th trial vector in the $t$-th generation, $j$ is a positive integer indexing the dimensions of the $D$-dimensional problem, $CR_{i}^{t}$ is the crossover probability, and $y_{i,j}^{t}$ and $v_{i,j}^{t}$ represent the target vector and the mutation vector, respectively, in the $j$-th dimension.

Finally, based on the current fitness of $R_{i}^{t}$ and $Y_{i}^{t}$, the better individual is selected to enter the next generation of evolution. The selection process formula is as follows:

$$Y_{i}^{t+1} = \begin{cases} R_{i}^{t}, & \text{if } f(R_{i}^{t}) \leq f(Y_{i}^{t}), \\ Y_{i}^{t}, & \text{otherwise}. \end{cases} \tag{10}$$

The above three steps (mutation, crossover, and selection) are repeated in each generation until the termination conditions of the algorithm are met.
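
The four steps can be sketched compactly. The following Python example is an illustrative implementation of the rand strategy of Equation (5) with the crossover of Equation (9) and the selection of Equation (10), minimizing a simple sphere function. Several choices are assumptions made for the sketch: the forced crossover index is drawn at random (a common convention), individuals are replaced in place within a generation, and the bounds, population size, and test function are arbitrary.

```python
import random

def differential_evolution(f, dim, bounds=(-5.0, 5.0), pop_size=20,
                           F=0.7, CR=0.9, generations=200, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialization: individuals drawn uniformly from the search box.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation, Eq. (5): three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
            # Crossover, Eq. (9): one forced dimension always comes from the mutant.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() <= CR or d == forced) else pop[i][d]
                     for d in range(dim)]
            # Selection, Eq. (10): keep the trial vector if it is no worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def sphere(x):
    return sum(v * v for v in x)

best_x, best_f = differential_evolution(sphere, dim=5)
print(best_f)
```

To train DNR with such a loop, f would be the prediction error of the network and each individual would hold the concatenated weights, thresholds, and branch strength.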

2.2.2. Scale-Free Network-Based Differential Evolution Algorithm

Barabási and Albert proposed a scale-free network model, known as the BA network model, when studying the topology of the World Wide Web [46]. This model has two characteristics: node growth and a priority (preferential) connection mechanism. The degree distribution of the BA network conforms to a power law distribution as follows:

$$P(k) \propto k^{-\beta}, \tag{11}$$

where $P(k)$ is the probability that a node has degree $k$ and $\beta \in (2, 3)$ is the scaling factor. Therefore, the BA model can describe many real networks. The network model is constructed in the following steps: (1) Node growth: the network initially has $m_{0}$ nodes, and at each step a new node is added and connected to $m$ existing nodes, where $m \leq m_{0}$. (2) Priority connection: the probability $P_{a}$ of connecting a newly added node to an existing node $a$ is:

$$P_{a} = \frac{k_{a} + 1}{\sum_{b} (k_{b} + 1)}, \tag{12}$$

where $k_{a}$ denotes the degree of the existing node $a$, and the sum in the denominator runs over all existing nodes $b$ with degrees $k_{b}$. After $n$ iterations of the algorithm, a scale-free network with $m \cdot n$ edges and $N$ ($N = m_{0} + n$) nodes is generated. (3) The above operations are repeated until all nodes are connected. In the resulting network, it is easier to establish links between high- and low-degree nodes, while high-degree nodes rarely connect to each other. The scaling factor $\beta$ can also indicate the degree of connection between low- and high-degree nodes. The formula for calculating $\beta$ is defined as:

$$\beta = \frac{I^{-1} \sum_{i} x_{i} y_{i} - \left( I^{-1} \sum_{i} \frac{x_{i} + y_{i}}{2} \right)^{2}}{I^{-1} \sum_{i} \frac{x_{i}^{2} + y_{i}^{2}}{2} - \left( I^{-1} \sum_{i} \frac{x_{i} + y_{i}}{2} \right)^{2}}, \tag{13}$$

where $x_{i}$ and $y_{i}$ represent the degrees of the two nodes joined by the $i$-th link, and $I$ represents the number of links in the network structure. The greater the value of $\beta$, the more connections with high-degree nodes are likely to be generated.
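
The growth and priority connection steps can be illustrated with a short Python sketch that applies Equation (12) directly. It assumes the initial m_0 nodes start without links (the +1 in Equation (12) keeps their selection probability nonzero) and that each new node links to m distinct existing nodes; the function name `build_ba_network` and the parameter values are hypothetical.

```python
import random

def build_ba_network(n_nodes, m0=3, m=2, seed=0):
    """Grow a network node by node with the attachment rule of Eq. (12)."""
    rng = random.Random(seed)
    neighbors = {a: set() for a in range(m0)}   # m0 initial nodes, no links yet
    for new in range(m0, n_nodes):
        existing = list(neighbors)
        targets = set()
        while len(targets) < m:
            # Weight (k_a + 1) for each existing node; already chosen nodes get 0.
            weights = [0 if a in targets else len(neighbors[a]) + 1 for a in existing]
            targets.add(rng.choices(existing, weights=weights, k=1)[0])
        neighbors[new] = set()
        for a in targets:
            neighbors[a].add(new)
            neighbors[new].add(a)
    return neighbors

net = build_ba_network(50, m0=3, m=2)
degrees = sorted((len(v) for v in net.values()), reverse=True)
print(degrees[:5], degrees[-5:])
```

With 47 added nodes and m = 2, the network has 94 links, matching the m·n edge count stated above; a few early nodes typically accumulate most of the links while the majority keep a low degree.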

The proposed local search strategy based on the scale-free network ensures the diversity of the population during the search process of the algorithm, which helps the DE algorithm achieve the goal of global optimization. First, based on the population size, a corresponding scale-free network is generated, and all nodes are numbered in the network. Then, the nodes are ranked depending on the number of links. According to the BA algorithm, fewer high-degree nodes and more low-degree nodes can be obtained in the network at this time. After each iteration of the DE algorithm, the individuals are ranked based on their fitness. Finally, the ranked individuals are placed into the network nodes in the same order, and all the individuals are stored in the nodes from high to low (from best to worst). The structure of the SFDE algorithm is shown in Figure 2. The power law distribution reduces the number of high-degree nodes. Therefore, most individuals are close to being excellent individuals after each update, which improves the quality of individuals and generally accelerates the convergence of the algorithm. The update rule for each individual is as follows:

$$Y_{i}^{t+1} = Y_{i}^{t} + rand(0, 1) \cdot (Y_{i_n}^{t} - Y_{i}^{t}), \tag{14}$$

where $Y_{i}^{t}$ is the weight vector of the $i$-th DE individual in the $t$-th generation and $Y_{i_n}^{t}$ is the individual stored at a node connected to the node of $Y_{i}^{t}$ in the scale-free network. After each DE iteration, a local search is performed in the scale-free network until the algorithm finally converges. In the process of optimizing DNR, the initial scaling factor $F_{0}$ of the SFDE algorithm is set to 0.7, the crossover probability is $CR_{0} = 0.9$, the scale-free network parameter $m_{0}$ is set to 10, and the remaining parameters adopt the default values.
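
One way to read the local search step is sketched below in Python. Individuals are ranked by fitness, nodes are ranked by degree, the best individuals are placed at the highest-degree nodes, and each individual then moves toward a linked individual following Equation (14). The text does not specify how the linked node is chosen or whether a worse candidate is kept, so the random neighbor choice and the greedy acceptance here are assumptions, as are the function name and the small star-shaped network used for the demonstration.

```python
import random

def scale_free_local_search(pop, fit, neighbors, f, rng):
    """One local search pass; expects one network node per individual."""
    # Nodes ordered from high to low degree, individuals from best to worst.
    nodes = sorted(neighbors, key=lambda a: len(neighbors[a]), reverse=True)
    order = sorted(range(len(pop)), key=lambda i: fit[i])
    at_node = dict(zip(nodes, order))       # node -> index of the individual stored there
    new_pop, new_fit = list(pop), list(fit)
    for node, i in at_node.items():
        if not neighbors[node]:
            continue
        j = at_node[rng.choice(sorted(neighbors[node]))]   # assumed: random linked node
        step = rng.random()                                 # rand(0, 1)
        cand = [yi + step * (yn - yi) for yi, yn in zip(pop[i], pop[j])]  # Eq. (14)
        f_cand = f(cand)
        if f_cand <= fit[i]:                                # assumed: greedy acceptance
            new_pop[i], new_fit[i] = cand, f_cand
    return new_pop, new_fit

def sphere(x):
    return sum(v * v for v in x)

rng = random.Random(1)
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}   # node 0 is the single hub
pop = [[3.0, 3.0], [0.1, 0.0], [-2.0, 1.0], [4.0, -4.0]]
fit = [sphere(p) for p in pop]
new_pop, new_fit = scale_free_local_search(pop, fit, star, sphere, rng)
print(fit, new_fit)
```

In this toy case the best individual sits at the hub, so every other individual is pulled toward it, which mirrors the argument that most individuals end up close to good ones after each update.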

2.3. Chaotic Time Series

Before using DNR to perform an air quality prediction, it is necessary to preprocess the original one-dimensional time series data. In this section, we mainly introduce how to map the one-dimensional time series data to the high-dimensional phase space based on the PSR method. Takens showed that mapping one-dimensional data to a high-dimensional space requires the time delay and the embedding dimension to be determined [44]. The maximum Lyapunov exponent (MLE) is used to determine the chaotic characteristics of the new spatial data. Then, the data are analyzed and predicted in the high-dimensional phase space. The methods of calculating the time delay, embedding dimension, and MLE are described below.

2.3.1. Mutual Information

MI measures the dependence between two sets of events. Its theoretical basis is information entropy, and because it captures nonlinear dependence, it is widely used to determine the time delay of PSR. The optimal time delay corresponds to the first minimum value of MI. The calculation formula of information entropy is:

$$H(S) = -\sum_{i=1}^{n} P(s_{i}) \cdot \ln P(s_{i}), \tag{15}$$

where $s_{i}$ represents the time series variable, $P(s_{i})$ is the probability of $s_{i}$, and $H(S)$ is the information entropy. The joint information entropy of two time series is:

$$H(S, Q) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P_{s,q}(s_{i}, q_{j}) \cdot \ln P_{s,q}(s_{i}, q_{j}), \tag{16}$$

where $n$ and $m$ are the lengths of the two time series systems $S$ and $Q$, respectively, which can be denoted $[x_{t}, x_{t+\tau}]$. If $S$ is determined, the MI value can be calculated by:

$$I(Q, S) = H(Q) + H(S) - H(S, Q). \tag{17}$$

Thus, for different time delays $\tau$, the corresponding MI value can be obtained by the following formula:

$$I(\tau) = I(x_{t}, x_{t+\tau}) = H(x_{t}) + H(x_{t+\tau}) - H(x_{t}, x_{t+\tau}). \tag{18}$$

According to Equation (18), the $\tau$ corresponding to the first minimum value of $I(\tau)$ is taken as the optimum time delay.
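
A histogram-based estimate of Equation (18) can be sketched as follows. The probabilities are approximated by bin frequencies; the number of bins, the maximum delay searched, the sine test signal, and the function names are assumptions chosen for illustration rather than settings reported in the text.

```python
import math
from collections import Counter

def mutual_information(series, tau, bins=8):
    """Estimate I(tau) of Eq. (18) from equal-width histogram bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0
    codes = [min(int((v - lo) / width), bins - 1) for v in series]
    a, b = codes[:-tau], codes[tau:]          # x_t and x_{t+tau}
    n = len(a)

    def entropy(symbols):                     # Eq. (15) with empirical frequencies
        return -sum(c / n * math.log(c / n) for c in Counter(symbols).values())

    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def first_minimum_delay(series, max_tau=30, bins=8):
    """Return the first delay at which I(tau) has a local minimum, if any."""
    mi = [mutual_information(series, tau, bins) for tau in range(1, max_tau + 1)]
    for idx in range(1, len(mi) - 1):
        if mi[idx] < mi[idx - 1] and mi[idx] <= mi[idx + 1]:
            return idx + 1                    # delays are counted from 1
    return None

wave = [math.sin(2 * math.pi * t / 40) for t in range(400)]
delay = first_minimum_delay(wave, max_tau=20)
print(delay, mutual_information(wave, 1), mutual_information(wave, 10))
```

The estimate depends on the binning, so on noisy measurements the location of the first minimum is usually checked for a few bin counts before a delay is fixed.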

2.3.2. False Nearest Neighbors

From a geometric point of view, a chaotic time series is formed by projecting the chaotic motion trajectory in the phase space onto a one-dimensional space [47]. The trajectory may be distorted during this projection: two points that are not neighbors in the phase space can appear to be neighbors after projection onto the one-dimensional space, and such points are called FNN. The idea of using the FNN method to obtain the embedding dimension is that as the embedding dimension increases, the chaotic motion trajectory gradually unfolds, the FNN are gradually eliminated, and the chaotic motion trajectory is finally recovered. Therefore, the embedding dimension $m$ at which the FNN phenomenon disappears completely is the most suitable embedding dimension.

Assuming that the nearest neighbor of $Y_{i} = \{y_{i}, y_{i+\tau}, \dots, y_{i+(m-1)\cdot\tau}\}$ in the $m$-dimensional phase space is $Y_{i}^{FNN}$, the distance between the two points is defined as:

$$D_{i}(m) = \| Y_{i} - Y_{i}^{FNN} \|. \tag{19}$$

When the embedding dimension increases by 1, the distance between these two points is:

$$D_{i}^{2}(m+1) = D_{i}^{2}(m) + \| y_{i+m\tau} - y_{i+m\tau}^{FNN} \|^{2}. \tag{20}$$

If $D_{i}(m+1) \gg D_{i}(m)$, these two points are far from each other in the phase space and only appear to be neighbors after projection onto the low-dimensional space; that is, they are FNN. The following ratio can also be utilized to determine whether the two points in the space are false neighbors:

$$T_{m} = \frac{\| y_{i+m\tau} - y_{i+m\tau}^{FNN} \|}{D_{i}(m)}, \tag{21}$$

where $T_{m}$ is compared with a determination threshold whose value is generally chosen in the range [10, 50]. If $T_{m}$ is greater than the threshold, $Y_{i}$ and $Y_{i}^{FNN}$ are considered to be false neighbors.
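
The criterion of Equation (21) can be turned into a fraction of false neighbors per embedding dimension. The sketch below uses a brute-force nearest neighbor search in plain Python; the threshold of 10, the sine test signal, and the function name are illustrative assumptions.

```python
import math

def false_neighbor_fraction(series, m, tau, threshold=10.0):
    """Fraction of points whose nearest neighbour in dimension m is false."""
    n_vec = len(series) - m * tau          # keeps y_{i + m*tau} available
    vectors = [[series[i + j * tau] for j in range(m)] for i in range(n_vec)]
    false_count = valid = 0
    for i in range(n_vec):
        # Nearest neighbour in the m-dimensional space and its distance, Eq. (19).
        dist, nn = min((math.dist(vectors[i], vectors[j]), j)
                       for j in range(n_vec) if j != i)
        if dist == 0.0:
            continue                        # ratio undefined for exact duplicates
        valid += 1
        extra = abs(series[i + m * tau] - series[nn + m * tau])
        if extra / dist > threshold:        # ratio T_m of Eq. (21)
            false_count += 1
    return false_count / valid if valid else 0.0

wave = [math.sin(0.3 * t) for t in range(300)]
frac1 = false_neighbor_fraction(wave, m=1, tau=5)
frac2 = false_neighbor_fraction(wave, m=2, tau=5)
print(frac1, frac2)
```

For this periodic signal, rising and falling parts of the wave overlap in one dimension and separate in two, so the fraction drops sharply once m reaches 2. The embedding dimension would be taken as the smallest m at which the fraction becomes negligible.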

2.3.3. Phase Space Reconstruction

By using the delay $\tau$ and the embedding dimension $m$, the one-dimensional air quality time series $\{y_{1}, y_{2}, \dots, y_{N}\}$ can be mapped to the equivalent phase space based on PSR. This method yields the first vector $Y_{1} = \{y_{1}, y_{1+\tau}, \dots, y_{1+(m-1)\cdot\tau}\}$ in the $m$-dimensional phase space by extracting $m$ data points spaced $\tau$ apart, then shifts the starting index by one and yields the second vector $Y_{2} = \{y_{2}, y_{2+\tau}, \dots, y_{2+(m-1)\cdot\tau}\}$ in the $m$-dimensional phase space. Thus, the set $\{Y_{i} \mid i = 1, 2, \dots, M\}$, with $M = N - (m-1)\cdot\tau - 1$, is formed as follows:

$$Y = \begin{bmatrix} Y_{1} & Y_{2} & \cdots & Y_{M} \end{bmatrix} = \begin{bmatrix} y_{1} & y_{2} & \dots & y_{N-(m-1)\cdot\tau-1} \\ y_{1+\tau} & y_{2+\tau} & \dots & y_{N-(m-2)\cdot\tau-1} \\ \vdots & \vdots & \ddots & \vdots \\ y_{1+(m-1)\cdot\tau} & y_{2+(m-1)\cdot\tau} & \dots & y_{N-1} \end{bmatrix}, \tag{22}$$

where each column $Y_{i}$ is an input of DNR, and the target vector is:

$$Y_{target} = [y_{2+(m-1)\cdot\tau}, y_{3+(m-1)\cdot\tau}, \dots, y_{N}]. \tag{23}$$

According to the above methods, the original air quality time series data can be mapped to the phase space by selecting the appropriate time delay and embedding dimensions. The processes of calculating the time delay and embedding dimensions are shown in Figure 3, and the results of $\tau$ and $m$ are shown in Table 1.
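
The construction of Equations (22) and (23) amounts to slicing the series into delay vectors and pairing each with the next observation. A minimal Python sketch, using zero-based indexing and a hypothetical function name, is:

```python
def phase_space_reconstruction(series, m, tau):
    """Build delay vectors (columns of Eq. (22)) and one-step-ahead targets (Eq. (23))."""
    span = (m - 1) * tau
    count = len(series) - span - 1          # number of input/target pairs
    inputs = [[series[i + j * tau] for j in range(m)] for i in range(count)]
    targets = [series[i + span + 1] for i in range(count)]
    return inputs, targets

# Toy series 1..10 with m = 3 and tau = 2.
inputs, targets = phase_space_reconstruction(list(range(1, 11)), m=3, tau=2)
for x, y in zip(inputs, targets):
    print(x, "->", y)
```

With ten samples, m = 3, and τ = 2, this gives 10 − (3 − 1)·2 − 1 = 5 training pairs, the first being (y₁, y₃, y₅) with target y₆.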

2.3.4. Maximum Lyapunov Exponent

Using the above-mentioned PSR method, one-dimensional data can be mapped to a high-dimensional space. However, before performing the prediction task, the chaotic characteristics of the data need to be verified. The MLE, which is based on the idea that a chaotic system is extremely sensitive to the initial value, is a common tool for determining whether a system exhibits chaotic characteristics. If the MLE is positive, even if the correlation between adjacent data in the initial time series is small, their differences will continue to expand after iteration and finally separate [48]. The steps for calculating the MLE are as follows:

$Y^{t_{0}}$ is taken as the initial point, the nearest point is selected as $Y_{0}^{t_{0}}$, and the distance $L_{0}$ between these two points is calculated. With the continuous motion of these two points, when the distance is greater than the threshold $μ$, the distance $L_{1}$ is defined as:

L_{1} = \| Y^{t_{1}} - Y_{0}^{t_{1}} \| > μ .

(24)

Then, $Y_{0}^{t_{0}}$ is taken as the centre point, another point with the closest distance is selected, and the distance $L_{1}^{'}$ is:

L_{1}^{'} = \| Y^{t_{1}} - Y_{1}^{t_{1}} \| < μ .

(25)

The above steps are repeated for m iterations until all points in the time series are traversed, and the MLE $λ_{\max}$ can be calculated by:

λ_{\max} = \frac{1}{m} \sum_{i = 0}^{m} \ln \frac{L_{i}}{L_{i}^{'}} .

(26)

The MLE describes the degree of separation between two similar initial values over time. Therefore, $λ_{\max} > 0$ indicates that adjacent points will eventually separate, and the trajectory is characterized by local instability. As shown in Table 1, the $λ_{\max}$ values of all six time series exceed 0. Thus, the six time series datasets considered in this study all have chaotic properties [48].
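The idea behind a positive exponent can be illustrated with a much simpler estimate than the threshold-and-replace procedure described above. The sketch below is an assumption-laden simplification, not the method of [48] and not the authors' code: for every reconstructed point it finds the nearest neighbour, follows both points for a fixed number of steps, and averages the logarithm of the distance growth. The name `divergence_rate` and the parameters `horizon` and `min_separation` are hypothetical.

```python
import math


def divergence_rate(vectors, horizon=1, min_separation=1):
    """Mean log growth rate of nearest-neighbour distances over `horizon` steps.

    A positive value means that initially close points separate on average.
    """
    n = len(vectors) - horizon  # points that can still be followed for `horizon` steps
    total, count = 0.0, 0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if abs(i - j) <= min_separation:
                continue  # skip temporally adjacent points
            d = math.dist(vectors[i], vectors[j])
            if 0.0 < d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            continue
        later = math.dist(vectors[i + horizon], vectors[best_j + horizon])
        if later > 0.0:
            total += math.log(later / best_d)
            count += 1
    if count == 0:
        raise ValueError("no usable neighbour pairs found")
    return total / (count * horizon)


# A contracting series y_t = 0.9**t: every distance shrinks by a factor of 0.9 per step.
shrinking = [[0.9 ** t] for t in range(40)]
print(round(divergence_rate(shrinking), 4))  # -0.1054
```

For the contracting toy series the estimate equals ln 0.9, which is negative; a chaotic series is expected to give a positive value instead. The fixed horizon replaces the distance threshold μ of the procedure in the text, so the numbers are not comparable with Table 1.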

3. Experimental Studies

In this section, we primarily introduce the data description and preprocessing, the experimental environment, and related parameter settings. Then, the four indicators employed to evaluate the air quality prediction performance of each method are introduced. Finally, the PM_2.5 and PM₁₀ concentrations predicted by all methods are plotted in graphs for a comparative analysis and discussion. The methodological air quality prediction framework of DNR is shown in Figure 4.

3.1. Data Description and Preprocessing

In this study, daily PM_2.5 and PM₁₀ concentration data from October 2019 to June 2021 in Beijing, China, were selected to verify the performance of our proposed method. All data were obtained from China’s air quality online monitoring and analysis platform (https://www.aqistudy.cn/, accessed on 30 July 2021); the daily concentrations, in μg/m³, were calculated by averaging the hourly data of the main Environmental Protection Station. First, the original PM_2.5 and PM₁₀ data were each divided into three subdatasets with nearly the same number of samples, and each subdataset was then divided into a training set and a test set. To achieve the best experimental results, the proportion of training data differed between the PM_2.5 and PM₁₀ datasets, and the training set was always taken from the top of each subdataset, as shown in Table 2. It should be emphasized that during the model training and PSR steps, the test datasets were not visible to the model and participated only in the testing and verification of the methods.

Since the synaptic input interval of DNR is [0, 1], the original data should be normalized. Normalization was performed not only to improve computational efficiency but also to fully meet the experimental requirements. The normalization formula is as follows:

y_{i}^{'} = \frac{y_{i} - \min(y)}{\max(y) - \min(y)},

(27)

where $y_{i}^{'}$ is the normalized value, and $\max(y)$ and $\min(y)$ are the maximum and minimum values of the original data, respectively. As the soma layer of DNR also adopts a sigmoid function as the activation function, the final output of the model also lies in [0, 1]. In the experiments, to restore the original values, we applied inverse normalization to the output results.
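A minimal sketch of Equation (27) and its inverse is given below. The helper names are illustrative assumptions. The text does not state whether the minimum and maximum are taken over the full series or over the training part only, so the sketch passes them explicitly and either choice can be used.

```python
def fit_min_max(values):
    """Return (minimum, maximum) of the reference data used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return lo, hi


def normalize(values, lo, hi):
    """Eq. (27): map each value to (y - min) / (max - min)."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Inverse of Eq. (27): map model outputs back to concentration units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical concentrations in ug/m^3.
lo, hi = fit_min_max([10.0, 30.0, 50.0])
scaled = normalize([10.0, 30.0, 50.0], lo, hi)
print(scaled)                        # [0.0, 0.5, 1.0]
print(denormalize(scaled, lo, hi))   # [10.0, 30.0, 50.0]
```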

3.2. Experimental Setup

To improve DNR’s prediction accuracy, we proposed an SFDE algorithm to optimize all the weights and thresholds in the model. To verify the feasibility, superiority, and reliability of the proposed prediction model, we selected nine commonly used machine learning models for comparison with the proposed DNR method: the MLP, ENN, and DT methods; SVR with four different kernel functions, namely a linear kernel (SVR + L), a radial basis function (RBF) kernel (SVR + R), a polynomial kernel (SVR + P), and a sigmoid kernel (SVR + S) [49]; an LSTM neural network; and DNR optimized by a BP algorithm (DNR+BP).

Before performing air quality estimation, each model has its own hyperparameters that need to be determined. The changes in the two hyperparameters of DNR have a greater influence on the prediction results than the changes in the other hyperparameters. For a fair comparison, in this study, we utilized a random search method to optimize all the hyperparameters of all the models [50]. The parameter settings of DNR are shown in Table 3, and the settings of the other model parameters are shown in Table 4, where epoch indicates a full iteration and c and $γ$ are parameters of the kernel function in SVRs. To obtain a reliable performance comparison, each prediction task was independently repeated 30 times for each prediction method, and the results are average values of the repeated experiments. All experiments were performed on multiple computers with the same Intel(R) Core(TM) i7-10770k 3.60 GHz CPU with 32 GB of memory, and the simulation software was MATLAB R2020a.

3.3. Evaluation Indicators

To accurately and uniformly compare the performances of all participating air quality prediction methods, we selected four common evaluation indicators, all of which are calculated based on the differences between the predicted and true values: (1) The mean square error (MSE), which is the mean of the squared differences between the estimated and true values; a smaller MSE indicates that the model fits the data more accurately. (2) The root mean square error (RMSE), which is the arithmetic square root of the MSE; it differs from the MSE mainly in its sensitivity to error. (3) The mean absolute error (MAE), which is the mean of the absolute differences and thus better reflects the actual size of the prediction error. (4) The mean absolute percentage error (MAPE), which is similar to the MAE but expresses each error as a relative value; the lower the MAPE is, the better the predictive ability of the model. The corresponding formulas are as follows:

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} \left( \hat{y}_{\mathrm{test}}^{(i)} - y_{\mathrm{test}}^{(i)} \right)^{2},

(28)

\mathrm{RMSE} = \left[ \frac{1}{n} \sum_{i = 1}^{n} \left( \left| \hat{y}_{\mathrm{test}}^{(i)} - y_{\mathrm{test}}^{(i)} \right| \right)^{2} \right]^{1 / 2},

(29)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \hat{y}_{\mathrm{test}}^{(i)} - y_{\mathrm{test}}^{(i)} \right|,

(30)

\mathrm{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \frac{\left| \hat{y}_{\mathrm{test}}^{(i)} - y_{\mathrm{test}}^{(i)} \right|}{\hat{y}_{\mathrm{test}}^{(i)}},

(31)

where $\hat{y}_{\mathrm{test}}^{(i)}$ and $y_{\mathrm{test}}^{(i)}$ represent the model-predicted value and the target of the original data, respectively, and n is the number of samples. It should be noted that all evaluation indicators were calculated based on the prediction set, and the results obtained are the concentrations of air pollutants.
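The four indicators can be computed in a few lines; the sketch below follows Equations (28) to (31) as printed. Note that Equation (31) divides by the predicted value, whereas many references define the MAPE with the observed value in the denominator; the sketch keeps the printed form. The function name `regression_metrics` and the sample numbers are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and MAPE following Eqs. (28)-(31) as printed."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Eq. (31) uses the predicted value in the denominator.
    mape = sum(abs(e) / p for e, p in zip(errors, y_pred)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Hypothetical observed and predicted concentrations.
metrics = regression_metrics([8.0, 20.0], [10.0, 16.0])
print(metrics["MSE"], metrics["MAE"])  # 10.0 3.0
```

Because the RMSE and MAE are expressed in the unit of the data, they should be computed after inverse normalization when concentrations are to be reported.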

3.4. Results and Discussion

In this section, we present the experimental results and comparative analysis in detail. The original one-dimensional daily air pollutant concentration time series data are irregular and random, and thus difficult to predict. Thus, first, the time delay and embedding dimensions of the time series samples were calculated by the MI and FNN approaches, respectively. Then, based on PSR, the time series were constructed into a multidimensional predictable space vector, and the MLE was used to validate its chaos and predictability. Finally, the newly obtained vector data were utilized as the input of the prediction model to predict air quality. For a fair comparison, three daily PM_2.5 datasets and three PM₁₀ datasets from Beijing were selected to avoid errors in the experimental results.

In addition, nine common prediction models were selected as competitors of DNR, and four prediction performance evaluation metrics were used to measure the predictive abilities of the models. Moreover, the stability of the learning algorithm and its influence on the model are discussed and analyzed, and two nonparametric statistical test methods were employed to verify the validity of the final experimental results.

3.4.1. Comparison Study

Before the experiment, the parameters of all methods were set in accordance with Table 3 and Table 4. To intuitively reflect the concentrations of air pollutants, the outputs of the relevant models were treated by inverse normalization. Table 5 summarizes the prediction results of all models based on the six datasets. To reduce the influence of randomness, all the reported values are the averages of 30 independent runs. The optimal results and second-best results of each dataset are presented in red and blue, respectively. Table 5 demonstrates that DNR achieved excellent performance when predicting both PM_2.5 and PM₁₀ concentrations and outperformed the competitor models, which means that DNR can better address daily air quality forecasting problems. Among the four SVR models, SVR + L had the best performance in the PM_2.5(b) task, indicating that the linear kernel function has a good fitting ability in the SVR training process. SVR + P also displayed certain advantages regarding the PM_2.5(b) and PM₁₀(c) datasets. However, ENN did not perform well in air quality forecasting, as its average MAPE value exceeds 2.5, indicating that ENN cannot reliably map the nonlinear relationship between the input and output, and thus cannot accurately predict air quality. In general, SVR + P achieved a prediction effect similar to that of DNR for all six forecasting problems, especially in terms of the MAE, where the difference was very small. In addition, the DT and LSTM models had mediocre performances, and the LSTM network required more training time to fit the data because of the characteristics of its learning mechanism. The other models could perform the prediction tasks efficiently.

Figure 5 shows the curves of the actual and output values of DNR during the training and prediction processes, where the PM_2.5 and PM₁₀ concentrations are colored green and red, respectively, and the gray curve represents the actual values. It is worth emphasizing that the training set was the top 75% or 70% of each dataset for model learning, and the remaining part was the prediction set used to validate the prediction accuracy. Figure 5 demonstrates that DNR can fit the real data reliably. Overall, the predicted values in each prediction problem are consistent with the fluctuations in the observed values. However, there were some defects in the processing of peak and valley (maximum and minimum) values, which deviated from the actual values. As the multiplication operation is used in DNR for dendrites, the soma layer cannot accurately respond to changes in the input when the input suddenly increases at outlier points. In addition, the green curve appears to perform better than the red curve, indicating that the daily PM_2.5 concentration data exhibit stronger chaos, as is also reflected in the MLE values in Table 1, as a larger MLE represents a more predictable chaotic state. As shown in Table 5 (right), the performance indicators of the six datasets were averaged. Although DNR shows certain disadvantages for the PM_2.5(b) dataset, due to its strong nonlinear fitting ability, DNR is still the best model for the overall air quality prediction task. Additionally, to demonstrate the abilities of the prediction models more clearly, the RMSE and MAPE are presented as bar charts in Figure 6, which show that DNR has advantages for both the PM_2.5 and PM₁₀ concentration datasets.

3.4.2. Spatial Stability of the Models

According to Table 5, based on the mean results of the 30 independent experiments, the predictive ability of DNR is significantly better than those of the other models. However, mean values alone do not reflect how stable a model is when the same task is repeated. Stability is therefore another criterion for evaluating the models, and a boxplot can be used to visualize the distribution of all the results for each model. Figure 7 shows the MAE values for all 30 independent runs for all the models; the red horizontal line represents the median of the experimental results, and the rectangle represents the overall MAE range for the corresponding model in the experiment. ENN clearly oscillated significantly on each dataset, thereby showing low stability. MLP, LSTM, DNR+BP, and DNR all show normal offset ranges, indicating that the randomness of the initial value of the optimization algorithm has a certain impact on the prediction performance of the model. In addition, due to their computational mechanisms, the SVR and DT models all demonstrated high stability, as they produced similar experimental results every time.
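The spread of repeated runs can also be summarized numerically. The sketch below computes the usual box plot statistics (median, quartiles, and interquartile range) for one model's per-run errors; it assumes the standard box plot convention in which the box spans the first to third quartile, and the function name `box_summary` and the sample values are hypothetical.

```python
import statistics


def box_summary(run_errors):
    """Five-number summary plus interquartile range for a list of per-run errors."""
    data = sorted(run_errors)
    # "inclusive" interpolates linearly between the observed minimum and maximum.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,
    }


# Hypothetical MAE values from seven runs of one model.
summary = box_summary([7, 1, 3, 5, 2, 6, 4])
print(summary["median"], summary["iqr"])  # 4.0 3.0
```

A small interquartile range indicates that a model returns similar errors from run to run, which is the notion of stability discussed above.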

3.4.3. Effects of Learning Strategies

The results in Table 5 and Figure 5 and Figure 7 show that DNR exhibited strong predictive abilities for both PM_2.5 and PM₁₀ datasets. Due to the unique nonlinear multiplication operation in DNR, this method can reliably fit the nonlinear relationship between the input and output, which is a crucial advantage that other ANNs do not possess. However, during the model training process, the SFDE algorithm plays an important role in optimizing the weights and thresholds in DNR. On high-dimensional error surfaces, the SFDE algorithm can capture relatively better training results based on its powerful search capabilities.

Among the competing models, both ENN and MLP are optimized based on gradient descent, which easily falls into local optima in high-dimensional solution spaces and often converges too quickly due to the quality of the initial value and the complexity of the solution space; as a result, ENN and MLP could not reach the global optimal state. In response to this disadvantage, based on the Adam method, LSTM can be used to enhance the performance to a certain extent by improving the gradient descent search strategy. However, the LSTM training process requires a significant computational cost. To further compare the influence of the search ability of the learning algorithm on the prediction results, a DNR+BP model is proposed, the structure and hyperparameters of which are the same as those of DNR, and the learning algorithm is a gradient descent-based BP algorithm. The bottom of Table 5 indicates that the prediction performance of DNR was better than that of DNR+BP for each dataset, indicating that the SFDE algorithm can effectively help DNR find a more appropriate combination of parameters for predicting air quality in the DNR training stage.

3.4.4. Statistical Analysis

In the experiments, all methods were independently run 30 times for each prediction task. To further validate the effectiveness and superiority of the proposed method, the Friedman statistical test and the Wilcoxon rank-sum test were used to verify all the results of each experiment [51,52]. Both test methods were implemented in KEEL software [53], and the nonparametric statistical test results are shown in Table 6 and Table 7. The Friedman test was used to calculate the ranking of all models on the same prediction problem, with smaller numbers indicating better performance. According to Table 6, DNR obtained the best ranking on all six datasets, especially on PM₁₀(b), for which the DNR rank reached 1.3. We also averaged the ranking results for each method and found that DNR achieved the best ranking for individual datasets and in the average results, which shows that DNR performs air quality prediction excellently. To further assess whether the differences between DNR and the competing models are significant, Wilcoxon rank-sum tests were conducted to obtain p values, with a significance threshold of 0.05 ($5.00 \times 10^{-2}$); if the p value is less than 0.05, DNR has significant superiority over its competitor. As shown in Table 7, most of the p values are much less than 0.05, which means that DNR has a significantly superior predictive ability compared with the competing models. In particular, the minimum p value of $1.82 \times 10^{-12}$ is observed many times, indicating that DNR has notable superiority over most of the other models. However, when comparing DNR with SVR + L, SVR + P, and LSTM, one can see that there are four non-significant differences, which is also consistent with the results of our experiments. In summary, these nonparametric statistical tests reveal that DNR is more suitable for predicting air quality than the other models and can achieve superior performance.
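To make the two procedures concrete, the sketch below computes Friedman-style mean ranks and a two-sided Wilcoxon rank-sum p value using only the Python standard library. It is an illustration under stated assumptions, not the KEEL implementation used in the study: the p value relies on the normal approximation without a tie correction, and all function names and sample numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_mean_ranks(errors_by_model):
    """Mean rank per model; each list holds one error per block (lower is better)."""
    names = list(errors_by_model)
    n_blocks = len(errors_by_model[names[0]])
    totals = dict.fromkeys(names, 0.0)
    for b in range(n_blocks):
        ranks = average_ranks([errors_by_model[name][b] for name in names])
        for name, rank in zip(names, ranks):
            totals[name] += rank
    return {name: totals[name] / n_blocks for name in names}


def rank_sum_p_value(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum p value (normal approximation, no tie correction)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n_a])                      # rank sum of the first sample
    mean = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical errors of three models on three blocks.
print(friedman_mean_ranks({"A": [1.0, 2.0, 1.5], "B": [2.0, 1.0, 2.5], "C": [3.0, 3.0, 3.5]}))
```

In this toy input, model A is ranked first in two of the three blocks and therefore receives the smallest mean rank, which corresponds to the "smaller is better" reading of Table 6.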

4. Conclusions

Estimates of air quality are conducive to environmental monitoring and governance. However, the irregularity of air quality concentration time series data renders it difficult to accurately and stably predict air quality. In this paper, a novel DNR method was proposed to improve the prediction accuracy of daily PM_2.5 and PM₁₀ concentrations. To enhance the performance and robustness of DNR, scale-free network technology is embedded into the DE algorithm to optimize the DNR network. The crucial PSR parameters are calculated by the MI and FNN methods, and their chaotic properties are verified by the MLE. Forecasts were performed by numerous competing models on nearly two years of three daily PM_2.5 and three daily PM₁₀ concentration datasets. Four evaluation metrics and two nonparametric statistical tests were used to evaluate the performance of each competitor. The extensive experimental results confirmed that DNR has superior performance regarding both accuracy and stability in air quality estimation tasks. Therefore, DNR can be considered a highly competitive air quality prediction method. This study focused on the prediction of air quality concentration time series, but did not quantitatively analyze the factors influencing air pollution. Thus, we will combine more attributes related to air pollution (e.g., temperature, weather, and industrial development factors), observation locations, data modeling, and the prediction of multidimensional attributes in our future work.

Author Contributions

Conceptualization, methodology, software, and writing—original draft preparation, Z.S.; investigation, resources, and formal analysis, C.T. and J.Q.; visualization, data curation and validation, J.Q. and B.Z.; supervision, Z.S. and Y.T.; project administration, Z.S. and Y.T.; funding acquisition, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the Nature Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 19KJB520015), the Talent Development Project of Taizhou University (No. TZXY2018QDJJ006), the National Science Foundation for Young Scientists of China (Grant No. 61802274), and the Computer Science and Technology Construction Project of Taizhou University (No. 19YLZYA02).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Agarwal, S.; Sharma, S.; Suresh, R.; Rahman, M.H.; Vranckx, S.; Maiheu, B.; Blyth, L.; Janssen, S.; Gargava, P.; Shukla, V.; et al. Air quality forecasting using artificial neural networks with real time dynamic error correction in highly polluted regions. Sci. Total Environ. 2020, 735, 139454.
2. Xu, Y.; Yang, W.; Wang, J. Air quality early-warning system for cities in China. Atmos. Environ. 2017, 148, 239–257.
3. Cekim, H.O. Forecasting PM₁₀ concentrations using time series models: A case of the most polluted cities in Turkey. Environ. Sci. Pollut. Res. 2020, 27, 25612–25624.
4. Biancofiore, F.; Busilacchio, M.; Verdecchia, M.; Tomassetti, B.; Aruffo, E.; Bianco, S.; Di Carlo, P. Recursive neural network model for analysis and forecast of PM₁₀ and PM_2.5. Atmos. Pollut. Res. 2017, 8, 652–659.
5. Feng, X.; Li, Q.; Zhu, Y.; Hou, J.; Jin, L.; Wang, J. Artificial neural networks forecasting of PM_2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmos. Environ. 2015, 107, 118–128.
6. Baklanov, A.; Mestayer, P.; Clappier, A.; Zilitinkevich, S.; Joffre, S.; Mahura, A.; Nielsen, N. Towards improving the simulation of meteorological fields in urban areas through updated/advanced surface fluxes description. Atmos. Chem. Phys. 2008, 8, 523–543.
7. Stern, R.; Builtjes, P.; Schaap, M.; Timmermans, R.; Vautard, R.; Hodzic, A.; Memmesheimer, M.; Feldmann, H.; Renner, E.; Wolke, R.; et al. A model inter-comparison study focussing on episodes with elevated PM₁₀ concentrations. Atmos. Environ. 2008, 42, 4567–4588.
8. Lv, B.; Cobourn, W.G.; Bai, Y. Development of nonlinear empirical models to forecast daily PM_2.5 and ozone levels in three large Chinese cities. Atmos. Environ. 2016, 147, 209–223.
9. Sahu, R.; Nagal, A.; Dixit, K.K.; Unnibhavi, H.; Mantravadi, S.; Nair, S.; Simmhan, Y.; Mishra, B.; Zele, R.; Sutaria, R.; et al. Robust statistical calibration and characterization of portable low-cost air quality monitoring sensors to quantify real-time O₃ and NO₂ concentrations in diverse environments. Atmos. Meas. Tech. 2021, 14, 37–52.
10. Hsieh, H.-P.; Lin, S.-D.; Zheng, Y. Inferring air quality for station location recommendation based on urban big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 437–446.
11. Wang, Y.; Kong, T. Air quality predictive modeling based on an improved decision tree in a weather-smart grid. IEEE Access 2019, 7, 172892–172901.
12. Gore, R.W.; Deshpande, D.S. An approach for classification of health risks based on air quality levels. In Proceedings of the 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM), Aurangabad, India, 5–6 October 2017; pp. 58–61.
13. Liu, B.; Jin, Y.; Li, C. Analysis and prediction of air quality in Nanjing from autumn 2018 to summer 2019 using PCR–SVR–ARMA combined model. Sci. Rep. 2021, 11, 348.
14. Dun, M.; Xu, Z.; Chen, Y.; Wu, L. Short-term air quality prediction based on fractional grey linear regression and support vector machine. Math. Probl. Eng. 2020, 2020, 8914501.
15. Leong, W.; Kelani, R.; Ahmad, Z. Prediction of air pollution index (API) using support vector machine (SVM). J. Environ. Chem. Eng. 2020, 8, 103208.
16. Yan, R.; Liao, J.; Yang, J.; Sun, W.; Nong, M.; Li, F. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering. Expert Syst. Appl. 2021, 169, 114513.
17. Jin, N.; Zeng, Y.; Yan, K.; Ji, Z. Multivariate air quality forecasting with nested LSTM neural network. IEEE Trans. Ind. Inform. 2021, 17, 8514–8522.
18. Bai, Y.; Zeng, B.; Li, C.; Zhang, J. An ensemble long short-term memory neural network for hourly PM_2.5 concentration forecasting. Chemosphere 2019, 222, 286–294.
19. Font, A.; Tremper, A.H.; Lin, C.; Priestman, M.; Marsh, D.; Woods, M.; Heal, M.R.; Green, D.C. Air quality in enclosed railway stations: Quantifying the impact of diesel trains through deployment of multi-site measurement and random forest modelling. Environ. Pollut. 2020, 262, 114284.
20. Stafoggia, M.; Bellander, T.; Bucci, S.; Davoli, M.; De Hoogh, K.; De’Donato, F.; Gariazzo, C.; Lyapustin, A.; Michelozzi, P.; Renzi, M.; et al. Estimation of daily PM₁₀ and PM_2.5 concentrations in Italy, 2013–2015, using a spatiotemporal land-use random-forest model. Environ. Int. 2019, 124, 170–179.
21. Qi, Z.; Wang, T.; Song, G.; Hu, W.; Li, X.; Zhang, Z. Deep air learning: Interpolation, prediction, and feature analysis of fine-grained air quality. IEEE Trans. Knowl. Data Eng. 2018, 30, 2285–2297.
22. Zhang, Q.; Lam, J.C.; Li, V.O.; Han, Y. Deep-air: A hybrid CNN-LSTM framework for fine-grained air pollution forecast. arXiv 2020, arXiv:2001.11957.
23. Pak, U.; Ma, J.; Ryu, U.; Ryom, K.; Juhyok, U.; Pak, K.; Pak, C. Deep learning-based PM_2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Sci. Total Environ. 2020, 699, 133561.
24. Voukantsis, D.; Karatzas, K.; Kukkonen, J.; Räsänen, T.; Karppinen, A.; Kolehmainen, M. Intercomparison of air quality data using principal component analysis, and forecasting of PM₁₀ and PM_2.5 concentrations using artificial neural networks, in Thessaloniki and Helsinki. Sci. Total Environ. 2011, 409, 1266–1276.
25. Elbayoumi, M.; Ramli, N.A.; Yusof, N.F.F.M. Development and comparison of regression models and feedforward backpropagation neural network models to predict seasonal indoor PM_2.5–10 and PM_2.5 concentrations in naturally ventilated schools. Atmos. Pollut. Res. 2015, 6, 1013–1023.
26. Kow, P.-Y.; Wang, Y.-S.; Zhou, Y.; Kao, I.-F.; Issermann, M.; Chang, L.-C.; Chang, F.-J. Seamless integration of convolutional and backpropagation neural networks for regional multi-step-ahead PM_2.5 forecasting. J. Clean. Prod. 2020, 261, 121285.
27. Photphanloet, C.; Lipikorn, R. PM₁₀ concentration forecast using modified depth-first search and supervised learning neural network. Sci. Total Environ. 2020, 727, 138507.
28. Zhang, L.; Xie, Y.; Chen, A.; Duan, G. A forecasting model based on enhanced Elman neural network for air quality prediction. In Advanced Multimedia and Ubiquitous Engineering; Springer: Berlin/Heidelberg, Germany, 2018; pp. 65–74.
29. Lin, Y.-C.; Lee, S.-J.; Ouyang, C.-S.; Wu, C.-H. Air quality prediction by neuro-fuzzy modeling approach. Appl. Soft Comput. 2020, 86, 105898.
30. Ma, J.; Ding, Y.; Cheng, J.C.; Jiang, F.; Tan, Y.; Gan, V.J.; Wan, Z. Identification of high impact factors of air quality on a national scale using big data and machine learning techniques. J. Clean. Prod. 2020, 244, 118955.
31. Loy-Benitez, J.; Vilela, P.; Li, Q.; Yoo, C. Sequential prediction of quantitative health risk assessment for the fine particulate matter in an underground facility using deep recurrent neural networks. Ecotoxicol. Environ. Saf. 2019, 169, 316–324.
32. London, M.; Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 2005, 28, 503–532.
33. Van Ooyen, A.; Nienhuis, B. Improving the convergence of the backpropagation algorithm. Neural Netw. 1992, 5, 465–471.
34. Weir, M.K. A method for self-determination of adaptive learning rates in back propagation. Neural Netw. 1991, 4, 371–379.
35. Ji, J.; Gao, S.; Cheng, J.; Tang, Z.; Todo, Y. An approximate logic neuron model with a dendritic structure. Neurocomputing 2016, 173, 1775–1783.
36. Tang, C.; Ji, J.; Tang, Y.; Gao, S.; Tang, Z.; Todo, Y. A novel machine learning technique for computer-aided diagnosis. Eng. Appl. Artif. Intell. 2020, 92, 103627.
37. Song, S.; Chen, X.; Song, S.; Todo, Y. A neuron model with dendrite morphology for classification. Electronics 2021, 10, 1062.
38. Song, S.; Chen, X.; Tang, C.; Song, S.; Tang, Z.; Todo, Y. Training an approximate logic dendritic neuron model using social learning particle swarm optimization algorithm. IEEE Access 2019, 7, 141947–141959.
39. Tang, Y.; Ji, J.; Gao, S.; Dai, H.; Yu, Y.; Todo, Y. A pruning neural network model in credit classification analysis. In Computational Intelligence and Neuroscience; Hindawi: London, UK, 2018; p. 9390410.
40. Ji, J.; Song, S.; Tang, Y.; Gao, S.; Tang, Z.; Todo, Y. Approximate logic neuron model trained by states of matter search algorithm. Knowl.-Based Syst. 2019, 163, 120–130.
41. Zhou, T.; Gao, S.; Wang, J.; Chu, C.; Todo, Y.; Tang, Z. Financial time series prediction using a dendritic neuron model. Knowl.-Based Syst. 2016, 105, 214–224.
42. Song, Z.; Tang, Y.; Ji, J.; Todo, Y. Evaluating a dendritic neuron model for wind speed forecasting. Knowl.-Based Syst. 2020, 201, 106052.
43. Das, S.; Suganthan, P.N. Differential evolution: A survey of the state-of-the-art. IEEE Trans. Evol. Comput. 2010, 15, 4–31.
44. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence; Springer: Berlin/Heidelberg, Germany, 1981; pp. 366–381.
45. Gabbiani, F.; Krapp, H.G.; Koch, C.; Laurent, G. Multiplicative computation in a visual neuron sensitive to looming. Nature 2002, 420, 320–324.
46. Albert, R.; Barabási, A.L. Statistical mechanics of complex networks. Rev. Mod. Phys. 2002, 74, 47.
47. Kennel, M.B.; Brown, R.; Abarbanel, H.D. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A 1992, 45, 3403.
48. Wolf, A.; Swift, J.B.; Swinney, H.L.; Vastano, J.A. Determining Lyapunov exponents from a time series. Phys. D Nonlinear Phenom. 1985, 16, 285–317.
49. Chang, C.-C.; Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. TIST 2011, 2, 27.
50. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
51. García, S.; Fernández, A.; Luengo, J.; Herrera, F. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064.
52. Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 2011, 1, 3–18.
53. Alcalá-Fdez, J.; Sanchez, L.; Garcia, S.; del Jesus, M.J.; Ventura, S.; Garrell, J.M.; Herrera, F. KEEL: A software tool to assess evolutionary algorithms for data mining problems. Soft Comput. 2009, 13, 307–318.

Figure 1. Illustration of the proposed DNR network and the six connection states of the synapse structure.

Figure 2. Topology of the scale-free network in the search space.

Figure 3. Process of calculating the time delay and embedding dimensions of the six air quality time series datasets.

Figure 4. Air quality estimation framework based on DNR.

Figure 5. Air quality time series training and estimation results of DNR for each dataset.

Figure 6. Comparison of the RMSE and MAPE of the estimation models based on six datasets.

Figure 7. Box-and-whisker plots for the MAE of all prediction models.

Table 1. The results of the time delay, embedding dimensions, and MLE of the six air quality time series datasets for PSR.

Dataset	Time Delay $τ$	Embedding Dimensions m	MLE $λ_{\max}$	Chaotic
PM_2.5(a)	2	4	0.1269	Yes
PM_2.5(b)	2	3	0.1524	Yes
PM_2.5(c)	2	3	0.1506	Yes
PM₁₀(a)	3	3	0.0683	Yes
PM₁₀(b)	3	5	0.0409	Yes
PM₁₀(c)	5	5	0.0022	Yes
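
As a rough illustration of how the time delay τ and embedding dimension m in Table 1 are used in phase space reconstruction, the sketch below builds delay vectors from a scalar series. The function name `delay_embed` and the toy series are assumptions made for illustration, and the indexing convention (each vector running forward in time from index i) is one common choice that may differ from the formulation used in the paper.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], tau: int, m: int) -> List[List[float]]:
    """Return delay vectors [x[i], x[i + tau], ..., x[i + (m - 1) * tau]]."""
    if tau < 1 or m < 1:
        raise ValueError("tau and m must be positive integers")
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this tau and m")
    return [[series[i + j * tau] for j in range(m)] for i in range(n_vectors)]


# Toy series (hypothetical values, not the Beijing data).
toy = [float(v) for v in range(10)]

# tau = 2 and m = 4 are the PM_2.5(a) settings listed in Table 1.
vectors = delay_embed(toy, tau=2, m=4)
print(len(vectors))  # 10 - (4 - 1) * 2 = 4 vectors
print(vectors[0])    # [0.0, 2.0, 4.0, 6.0]
```

Each reconstructed vector spans (m - 1)τ time steps, so a series of length N yields N - (m - 1)τ vectors under this convention.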

Table 2. Descriptions of the Beijing daily PM_2.5 and PM₁₀ concentrations in the experiments.

Dataset	Range	Partition Ratio	Instance Number
PM_2.5(a)	2019.10–2020.12	75%, 25%	343, 114
PM_2.5(b)	2020.01–2021.03	75%, 25%	342, 113
PM_2.5(c)	2020.04–2021.06	75%, 25%	342, 113
PM₁₀(a)	2019.10–2020.12	70%, 30%	320, 137
PM₁₀(b)	2020.01–2021.03	70%, 30%	318, 137
PM₁₀(c)	2020.04–2021.06	70%, 30%	318, 137

Table 3. Parameter settings of DNR for each air quality dataset.

Parameter	PM_2.5(a)	PM_2.5(b)	PM_2.5(c)	PM₁₀(a)	PM₁₀(b)	PM₁₀(c)
k	15	16	15	22	20	21
$α$	10	9	10	6	6	5
Popsize and epoch	100, 1000	100, 1000	100, 1000	100, 1000	100, 1000	100, 1000

Table 4. Parameter settings of the prediction models for each air quality dataset.

Model	Related Parameters
ENN	Learning rate = 0.01, epoch = 1000
MLP	Hidden layer = 10, learning rate = 0.01, epoch = 1000
SVRs	Cost (c) = 0.5, $γ = 0.2$
DT	Min leaf = 25
LSTM	Hidden units = 200, epoch = 1000
DNR + BP	k and $α$ are the same as DNR, learning rate = 0.05

Table 5. Experimental results based on 30 runs of all the models on the six air quality datasets. The best and second-best results are highlighted in red and blue, respectively.

		PM_2.5(a)	PM_2.5(b)	PM_2.5(c)	PM₁₀(a)	PM₁₀(b)	PM₁₀(c)	Avg.
ENN	MSE	649,021	768,384	27,383	3575	610,219	2,873,219	821,967
	RMSE	359.698	568.409	95.389	51.733	525.458	952.916	425.601
	MAE	79.706	159.879	27.439	29.892	161.304	347.669	134.315
	MAPE	2.530	5.472	0.954	0.676	1.544	4.295	2.578
MLP	MSE	773.638	2016.49	846.855	1236.41	25,439.5	26,261.7	9429.09
	RMSE	27.768	44.520	29.005	35.135	159.492	162.049	76.328
	MAE	22.730	31.375	20.469	27.901	53.371	54.520	35.061
	MAPE	1.412	0.930	1.172	0.694	0.564	0.523	0.883
SVR + L	MSE	659.795	1265.69	637.809	974.774	27,243.1	28,065.4	9807.75
	RMSE	25.686	35.576	25.255	31.221	165.055	167.527	75.054
	MAE	19.991	25.508	16.488	23.857	53.577	54.687	32.351
	MAPE	1.183	0.850	0.831	0.563	0.525	0.515	0.745
SVR + R	MSE	807.986	2255.30	853.840	1236.54	161,070	167,873	55,682.7
	RMSE	28.425	47.490	29.221	35.164	401.335	409.723	158.560
	MAE	22.392	33.173	20.870	27.035	103.474	122.215	54.860
	MAPE	1.309	0.880	1.161	0.620	0.934	1.355	1.043
SVR + P	MSE	657.839	1300.42	631.037	934.281	24,282.4	25,500.0	8884.33
	RMSE	25.648	36.061	25.120	30.566	155.828	159.687	72.152
	MAE	20.194	25.669	16.562	23.072	47.304	49.682	30.414
	MAPE	1.195	0.801	0.879	0.542	0.473	0.425	0.719
SVR + S	MSE	671.895	1368.19	626.755	985.257	27,136.4	28,303.9	9848.73
	RMSE	25.921	36.989	25.035	31.389	164.731	168.238	75.384
	MAE	20.522	26.400	17.044	24.125	53.257	54.630	32.663
	MAPE	1.205	0.839	0.929	0.573	0.518	0.505	0.762
DT	MSE	1074.89	1656.205	991.629	1707.93	24,538.6	26,226.0	9365.88
	RMSE	32.786	40.696	31.490	41.327	156.648	161.944	77.482
	MAE	22.112	28.461	19.469	30.595	50.682	50.813	33.689
	MAPE	1.145	0.813	0.886	0.688	0.556	0.465	0.758
LSTM	MSE	1569.63	4705.71	2244.44	4206.42	32,055.4	32,795.1	12,929.5
	RMSE	39.601	68.582	47.358	64.851	179.040	181.094	96.754
	MAE	28.173	49.997	31.432	53.575	90.685	90.821	57.447
	MAPE	0.714	0.806	0.753	0.904	0.943	0.923	0.839
DNR + BP	MSE	1809.38	5065.46	2041.34	3841.47	31,741.4	32,667.1	12,861.0
	RMSE	42.537	71.172	45.181	61.980	178.161	180.741	96.629
	MAE	31.313	52.732	30.786	50.027	86.038	84.338	55.872
	MAPE	0.828	0.879	0.743	0.800	0.845	0.763	0.810
DNR	MSE	623.101	1299.78	569.916	909.195	23,965.2	25,091.6	8743.14
	RMSE	24.958	36.040	23.871	30.151	154.801	158.403	71.371
	MAE	19.038	25.349	15.891	22.708	46.302	48.945	29.705
	MAPE	0.704	0.803	0.747	0.535	0.462	0.462	0.621
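
The four indicators reported in Table 5 can be computed as in the following sketch, which uses their standard definitions. Treating MAPE as a fraction rather than a percentage is an assumption here, and the function name and sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MSE, RMSE, MAE and MAPE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    # MAPE as a fraction (assumption); undefined if any true value is zero.
    mape = sum(abs(r / t) for r, t in zip(residuals, y_true)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Hypothetical concentrations, not values from the paper.
errors = regression_errors([50.0, 100.0, 200.0], [40.0, 110.0, 180.0])
print(errors)
```

If each indicator is averaged separately over repeated runs, the mean RMSE does not have to equal the square root of the mean MSE, because the square root is taken within each run before averaging.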

Table 6. Resulting nonparametric statistical Friedman tests of all prediction models, with the best and second-best results highlighted in red and blue, respectively.

Ranking	ENN	MLP	SVR + L	SVR + R	SVR + P	SVR + S	DT	LSTM	DNR + BP	DNR
PM_2.5(a)	7.425	6.575	3.825	6.675	4.25	5.175	5.75	5.925	6.5	2.9
PM_2.5(b)	9.575	5.9	2.6	7.025	2.525	4.275	4.75	6.975	8.775	2.6
PM_2.5(c)	6.35	6.4	3.975	6.75	4.9	5.875	5.4	5.8	5.6	3.95
PM₁₀(a)	7.2	6.2	2.975	5.375	1.775	3.975	7.325	9.95	8.95	1.275
PM₁₀(b)	8.4	4.575	5.5	9.4	1.75	4.475	3.6	8.625	7.375	1.3
PM₁₀(c)	9.625	4.35	5.225	9.375	1.8	5.175	2.95	7.9	7.1	1.5
Avg.	8.096	5.667	4.017	7.433	2.833	4.825	4.963	7.529	7.383	2.254
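
The rankings in Table 6 are average ranks of the kind used by the Friedman test: within each run the models are ranked by error (rank 1 is best, ties share the mean rank), and the ranks are then averaged over runs. A minimal sketch follows. The function names and error values are hypothetical, and the exact procedure used in the paper (for example, which error measure is ranked) is not stated in this section.

```python
from typing import Dict, List


def ranks_with_ties(values: List[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def average_ranks(errors_by_model: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each model's per-run rank over all runs."""
    models = list(errors_by_model)
    n_runs = len(errors_by_model[models[0]])
    totals = {name: 0.0 for name in models}
    for run in range(n_runs):
        run_ranks = ranks_with_ties([errors_by_model[name][run] for name in models])
        for name, rank in zip(models, run_ranks):
            totals[name] += rank
    return {name: totals[name] / n_runs for name in models}


# Hypothetical per-run errors for three models over four runs.
toy_errors = {
    "A": [1.0, 2.0, 1.5, 3.0],
    "B": [2.0, 1.0, 1.5, 2.0],
    "C": [3.0, 3.0, 4.0, 1.0],
}
avg = average_ranks(toy_errors)
print(avg)
```

A useful sanity check is that the average ranks of k models sum to k(k + 1)/2. The rows of Table 6 satisfy this for its ten models: the Avg. row, for instance, sums to 55.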

Table 7. Resulting statistical Wilcoxon tests of the prediction models compared with DNR.

	DNR vs.
	ENN	MLP	SVR + L	SVR + R	SVR + P	SVR + S	DT	LSTM	DNR + BP
PM_2.5(a)	$3.68 \times 10^{-7}$	$9.22 \times 10^{-6}$	$1.04 \times 10^{-4}$	$6.09 \times 10^{-8}$	$3.34 \times 10^{-6}$	$1.27 \times 10^{-5}$	$1.38 \times 10^{-7}$	$1.02 \times 10^{-3}$	$2.70 \times 10^{-3}$
PM_2.5(b)	$1.82 \times 10^{-12}$	$1.95 \times 10^{-5}$	$8.89 \times 10^{-1}$	$1.82 \times 10^{-12}$	$6.81 \times 10^{-1}$	$5.27 \times 10^{-7}$	$8.13 \times 10^{-10}$	$2.29 \times 10^{-9}$	$1.81 \times 10^{-12}$
PM_2.5(c)	$1.78 \times 10^{-2}$	$1.30 \times 10^{-2}$	$6.99 \times 10^{-2}$	$5.38 \times 10^{-4}$	$2.59 \times 10^{-2}$	$6.46 \times 10^{-7}$	$4.89 \times 10^{-2}$	$8.17 \times 10^{-2}$	$5.83 \times 10^{-3}$
PM₁₀(a)	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$3.64 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.01 \times 10^{-4}$	$3.64 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$
PM₁₀(b)	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$2.63 \times 10^{-3}$	$1.82 \times 10^{-12}$	$3.48 \times 10^{-8}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$
PM₁₀(c)	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$	$1.04 \times 10^{-7}$	$1.82 \times 10^{-12}$	$1.04 \times 10^{-7}$	$1.82 \times 10^{-12}$	$1.82 \times 10^{-12}$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

