Evolving Hybrid Cascade Neural Network Genetic Algorithm Space–Time Forecasting

Design: At the heart of time series forecasting, if nonlinear and nonstationary data are analyzed using traditional time series models, the results will be biased. At the same time, if machine learning is used without any input from traditional time series analysis, little information can be drawn from the results, because the machine learning model is a black box. Purpose: To better study time series forecasting, we extend the combination of traditional time series analysis and machine learning and propose a hybrid cascade neural network with a metaheuristic genetic algorithm for space–time forecasting. Finding: To demonstrate the utility of the cascade neural network genetic algorithm, we use various training and testing scenarios and extend the simulations by considering the SoftMax, radbas, logsig, and tribas activation functions for space–time forecasting of pollution data. During the simulation, we evaluate the models numerically using the root-mean-square error (RMSE), mean absolute error (MAE), and symmetric mean absolute percentage error (sMAPE), showing that our models provide high accuracy and speed up time-lapse computing.


Introduction
Pollution is the introduction and retention in the regular circulation of the environment of materials, fine particles, biomaterials, and energy, or a process of deterioration or atmospheric change, that has or may have significantly negative effects on human beings or the natural environment. Air pollutants are exhaust gases, particulate matter compounds, solid particulates, and other substances emitted into the air, threatening community health and damaging the environment. Air pollutants can be classified into smog and soot, pollution from contaminated air, greenhouse gas emissions, pollen, and mold.
PM refers to particulate matter, also known as particulate emissions. PM comprises aggregated rigid particles and atmospheric fluid droplets. Some are large enough to be seen with the naked eye, while others are so small that they can only be detected under a microscope.

The network parameters are managed according to the cumulative effect of the training vectors, mostly through error signals. Each error signal is defined as the gap between the desired response and the network's actual response. This adjustment is made step by step so that the neural network rapidly comes to mimic the teacher, with the emulation assumed to be ideal in any mathematical context. In this way, the teacher's knowledge of the setting is transferred to the neural network as thoroughly as possible through training [27]. Once this is achieved, we can dispense with the teacher and let the neural network deal with the environment entirely on its own. Throughout the supervised NN model, input vectors and corresponding target vectors are used to update the parameters; before a function can be approximated, input features must be tied to specific output vectors, and the information to be processed must be properly identified [28,29].
The most famous and typical algorithm for neural network training is error backpropagation, the main principle of which is that the error in the hidden neurons is calculated by propagating the error of the output-layer neurons backwards. The traditional backpropagation algorithm combines a feedforward pass and a learning pass. Vectors or patterns are presented to the input layer in the feedforward pass, and the activation of each neuron $j$ in the hidden layer is measured from its net input $\mathrm{net}_j$, the dot product of the input vector and the neuron's weights, represented in Equation (1):

$$\mathrm{net}_j = \sum_{i=0}^{N_i} w^{I\text{-}H}_{ij}\, x_i, \qquad (1)$$

where $N_i$ is the input vector dimension, and $i$ and $j$ are neuron indices in the input layer and the hidden layer, respectively. The weight between input $i$ and hidden neuron $j$ is $w^{I\text{-}H}_{ij}$. The bias of hidden neuron $j$ is usually absorbed into the sum by assuming $b_j = w^{I\text{-}H}_{0j}$ and $x_0 = 1$. Substituting $\mathrm{net}_j$ into the activation function $\phi_1$ gives $\theta_j = \phi_1(\mathrm{net}_j)$. Each neuron $k$ in the output layer then computes $\mathrm{net}_k$, the dot product of $\theta_j$ and the output-layer weights, represented in Equations (2) and (3):

$$\mathrm{net}_k = \sum_{j=0}^{N_H} w^{H\text{-}O}_{jk}\, \theta_j. \qquad (2,3)$$

Here, $N_H$ is the number of neurons in the hidden layer and $k$ is the index of a neuron in the output layer. The weight between hidden neuron $j$ and output neuron $k$ is $w^{H\text{-}O}_{jk}$. Substituting $\mathrm{net}_k$ into the activation function $\phi_2$ gives the output $y_k = \phi_2(\mathrm{net}_k)$, represented in Equation (4).

Regarding Equations (4) and (5), the entire collection of weights is updated during the learning phase so that $y_k$ approaches the target output $t_k$ by propagating the error $E_r$ of the output-layer neurons backwards. Although a variety of functions are available to evaluate the error, the squared error is commonly used, represented in Equation (7):

$$E_r = \frac{1}{2} \sum_{k} (t_k - y_k)^2. \qquad (7)$$
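The forward pass of Equations (1)–(4) and the squared error of Equation (7) can be sketched as follows. This is a minimal NumPy illustration, assuming logistic sigmoids for both φ1 and φ2 and keeping the biases as separate vectors rather than folded into the weight matrices:

```python
import numpy as np

def logsig(x):
    # Logistic sigmoid, standing in for phi_1 and phi_2.
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_ih, b_h, W_ho, b_o):
    """One forward pass of a single-hidden-layer network.

    net_j  = sum_i w_ij x_i + b_j      (Equation (1), bias separate)
    theta  = phi_1(net_j)
    net_k  = sum_j w_jk theta_j + b_k  (Equations (2)-(3))
    y_k    = phi_2(net_k)              (Equation (4))
    """
    net_j = W_ih @ x + b_h
    theta = logsig(net_j)
    net_k = W_ho @ theta + b_o
    return logsig(net_k)

def squared_error(y, t):
    # Equation (7): E_r = 1/2 * sum_k (t_k - y_k)^2
    return 0.5 * np.sum((t - y) ** 2)
```

With all-zero weights, every hidden and output activation is logsig(0) = 0.5, which makes the arithmetic easy to check by hand.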

Genetic Algorithm
Biological variation and its basic processes were clarified by Darwin's evolutionary theory [30]. Natural selection is fundamental to what is often referred to as the macroscopic understanding of evolution. In an environment that can sustain only a finite number of individuals, and given the basic tendency of populations to multiply, selection is inevitable if the population is not to grow without bound [31,32]. Evolution favors those individuals that compete most successfully for the available resources; in other words, those better suited or adapted to their environment, a notion recognized as survival of the fittest [33].
Selection based on competition is one of the two pillars of the mechanism of evolution. The other main influence is phenotypic variation within populations. The phenotype comprises an individual's physical and behavioral characteristics, which determine its fitness with respect to the surrounding environment. Each individual represents a specific combination of phenotypic characteristics that is evaluated by the environment: if evaluated favorably, these characteristics are passed on to the individual's offspring; otherwise, they are discarded. Charles Darwin's insight was that slight, spontaneous changes in phenotype occur across generations [34–36].
New combinations of phenotypes arise through these mutations and are assessed in turn. This is the fundamental basis of the genetic algorithm: given a population of individuals, environmental constraints lead to natural selection, implemented via roulette-wheel selection, which results in an increase in the fitness of the population. A random collection of candidates is first generated [37]. Depending on their fitness, the best candidates are selected to seed the next generation, with fitness serving as an abstract performance metric [38]. Cross-over and mutation give rise to new offspring that compete, on the basis of their fitness, with old members of the population for a place in the next generation, until an individual of adequate fitness is found or a previously determined computational limit is exceeded [39,40]. Algorithm 1 shows the scheme of the genetic algorithm, which coincides with the generate-and-test family of algorithms. The fitness function constitutes a heuristic estimate of an optimal solution, and the cross-over, mutation, and selection operators guide the search. The genetic algorithm has many characteristics that support this generating and testing of candidate solutions.
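The scheme described above (random initial population, roulette-wheel selection, cross-over, mutation) can be sketched as follows. This is a generic illustration, not the paper's Algorithm 1; the population size, cross-over rate, and mutation parameters are illustrative assumptions:

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=100,
                      cx_rate=0.8, mut_rate=0.1, seed=0):
    """Maximise a non-negative `fitness` over real-valued chromosomes
    via roulette-wheel selection, one-point cross-over, and mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        total = sum(scores) or 1.0

        def roulette():
            # Pick a chromosome with probability proportional to fitness.
            r, acc = rng.uniform(0.0, total), 0.0
            for chromo, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return chromo
            return pop[-1]

        nxt = []
        while len(nxt) < pop_size:
            a, b = roulette(), roulette()
            if rng.random() < cx_rate:        # one-point cross-over
                cut = rng.randrange(1, n_genes)
                a = a[:cut] + b[cut:]
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < mut_rate else g
                     for g in a]              # per-gene Gaussian mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

For example, maximising `1 / (1 + Σ g²)` drives the chromosome toward the origin.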

Cascade Neural Network Genetic Algorithm
Backpropagation training algorithms based on other traditional optimization methods, such as the conjugate gradient and Newton's method, exist in different variants. Plain gradient descent is the simplest and among the slowest; the conjugate gradient algorithm and Newton's method usually speed up convergence [41,42]. In this study, we used genetic algorithms instead. Each neuron weight between the hidden layer and the output layer must be updated, and the weights of the neurons between the input and the hidden layer are adjusted as well [43]. The weight change between the hidden and output layers is specified in Equation (8) with activation function $\phi(x) = \frac{1}{x}$.
The weights between the input and hidden layers were updated as represented in Equation (9).
With backpropagation, the input data are repeatedly presented to the neural network. With each presentation, the output of the neural network is compared to the desired output, and the error is computed. This error is then backpropagated through the neural network and used to adjust the weights such that the error decreases with each iteration; the neural network thus gets closer and closer to producing the desired output, represented in Equation (10).
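A minimal sketch of this iterative error-driven update for a single-hidden-layer network follows. It assumes logistic activations and the squared error of Equation (7); the learning rate and the bias-free layout are illustrative assumptions, not the paper's exact formulation of Equations (8)–(10):

```python
import numpy as np

def backprop_step(x, t, W_ih, W_ho, lr=0.1):
    """One gradient-descent update: output-layer weights (Equation (8) in
    spirit) and input-to-hidden weights (Equation (9) in spirit).
    Returns updated copies of the weight matrices."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    theta = sig(W_ih @ x)                     # hidden activations
    y = sig(W_ho @ theta)                     # network output
    delta_o = (y - t) * y * (1.0 - y)         # output-layer error signal
    delta_h = (W_ho.T @ delta_o) * theta * (1.0 - theta)
    W_ho = W_ho - lr * np.outer(delta_o, theta)
    W_ih = W_ih - lr * np.outer(delta_h, x)
    return W_ih, W_ho
```

Repeating the step on the same sample makes the squared error shrink monotonically for small learning rates, which is exactly the behavior described in Equation (10).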
Algorithm 2 shows the cascade neural network function. In the backpropagation context, each input datum is repeatedly presented to the neural network; at every presentation, the output of the neural network is compared to the desired output and the error is computed. These errors are fed back to the neural network and used to update the weights so that the error decreases with each iteration, while the genetic algorithm produces new generations of candidate networks. The following fragment of Algorithm 2 copies the output biases out of the flat chromosome vector W:

    for i = 1:o
        k = k + 1;
        Wbo(i,1) = W(k);
    end
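The Wbo-copying loop in Algorithm 2 is one piece of decoding a flat GA chromosome into the network's parameter matrices. A sketch of the full decoding step follows; the exact parameter layout is an assumption for illustration, since only the output-bias copy appears in the source:

```python
import numpy as np

def decode_chromosome(W, n_in, n_hidden, n_out):
    """Unpack a flat chromosome W into cascade-network parameters.
    Assumed layout: input->hidden weights, hidden biases, hidden->output
    weights, input->output cascade (skip) weights, output biases (Wbo)."""
    k = 0
    def take(shape):
        nonlocal k
        n = int(np.prod(shape))
        block = np.asarray(W[k:k + n]).reshape(shape)
        k += n
        return block
    Wih = take((n_hidden, n_in))
    bh  = take((n_hidden,))
    Who = take((n_out, n_hidden))
    Wio = take((n_out, n_in))   # cascade connections input->output
    bo  = take((n_out,))        # the Wbo vector of Algorithm 2
    assert k == len(W), "chromosome length must match parameter count"
    return Wih, bh, Who, Wio, bo
```

The GA then only ever sees the flat vector W; the network sees the unpacked matrices.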

Construction of VAR-Cascade
There exist few guidelines for building a neural network model for time series. One approach considers a time series as a nonlinear function of several past observations and random errors. Since air pollution data are known to be nonlinear time series data, we selected this method as a benchmark for forecasting. Equation (11) represents the time series model:

$$z_t = f(z_{t-1}, \ldots, z_{t-m},\, e_{t-1}, \ldots, e_{t-n}) + e_t, \qquad (11)$$

where $f$ is a nonlinear function determined by the neural network, $z_t = (1 - B)^d y_t$, and $d$ represents the order of differencing. The residuals at time $t$ are defined as $e_t$, and $m$ and $n$ are integers. Equation (12) shows that, initially, the VAR model is fitted in order to generate the residuals $e_t$. A neural network is then used to model the nonlinear and linear relations in the residuals and the original data [22,44,45]. Here, $w_{ij}$ ($i = 0, 1, 2, \ldots, p + q$; $j = 1, 2, \ldots, Q$) and $w_j$ ($j = 0, 1, 2, \ldots, Q$) are connection weights, and $p$, $q$, $Q$ are integers that must be determined in the design process of the cascade neural network. The values of $p$ and $q$ are determined by the underlying properties of the data. If the data are purely nonlinear, i.e., consist only of nonlinear structure, then $q$ can be 0, since the Box–Jenkins method is a linear model that cannot capture nonlinear interaction. Suboptimal submodels may be used in a hybrid model, but suboptimality does not change the functional characteristics of the hybrid approach [17,46–48].

The interpretation of time series requires quantifying the vector dynamic response over time shifts. The main feature of this method is to forecast future values using recent values of a variable, often referred to as lagged values [49]. Commonly, the most recent values influence the estimate of a future value most strongly [50,51]. In series data evaluation, a single scalar variable is frequently expressed as a self-regression in which future values are estimated from a weighted sum of pre-set lagged values. In the more general multivariate case, this variable depends on its own previous values as well as the previous values of several other variables [52–54].
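The two-stage hybrid idea (fit a linear model first, then hand its residuals to the neural network) can be sketched as follows. For brevity this uses a univariate AR(p) least-squares fit in place of the full VAR stage; the cascade-network stage is not shown:

```python
import numpy as np

def linear_stage(y, p=1):
    """Stage 1 of the hybrid: fit an AR(p) model by least squares
    (standing in for VAR) and return its coefficients, fitted values,
    and residuals e_t, which become inputs to the NN stage."""
    # Design matrix: intercept plus p lagged columns of y.
    X = np.column_stack([y[p - j - 1:len(y) - j - 1] for j in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    linear_fit = X @ coef
    residuals = target - linear_fit            # e_t
    return coef, linear_fit, residuals
```

On a series that is exactly linear in its own lag (e.g. y_t = y_{t-1} + 1), the residuals vanish and the nonlinear stage has nothing left to model, which is the degenerate case q = 0 discussed above in reverse.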

Study Area
The study areas were Taipei, Hsinchu, Taichung, and Kaohsiung city, with pollution data consisting of nitrogen oxide (NOx), atmospheric PM2.5, atmospheric PM10, and sulfur dioxide (SO2) levels. The locations of these areas were as established by the Taiwan Environmental Protection Administration, Executive Yuan. Table 1 shows statistical summaries of the amounts of air pollution at the four studied locations. The findings demonstrate that Taichung typically has the highest concentrations of PM10, PM2.5, and NOx, while in Kaohsiung, SO2 is the greatest pollutant. Figure 1 shows an overview of the genetic algorithm's training and evaluation phases. Because each type of air pollutant has a different distribution, we trained a model for each dataset using the same model architecture.

The samples for training were split in two, and alternating training and assessment were done on the first half of the samples. After this part was complete, the other half was used for forest training. The first half was further divided into smaller sections called stages. We performed simulations for train:test ratios of 90:10, 80:20, 70:30, 60:40, and 50:50. In the training process, the training samples from each stage were conditioned for all chromosomes, including new chromosomes from the previous level. Before the formation of the new chromosomes, all forests were trained in parallel. After all forests were qualified in the training part, genetic operators were used in the assessment part to calculate fitness values to operate on the genetic pool. This algorithm moved the substitution operator to first place and functioned only when a new chromosome was generated at the previous point.
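The five train:test ratios used in the simulations can be produced with a simple chronological split; time order is preserved (no shuffling) because the data are a time series. A sketch:

```python
def ratio_splits(series, ratios=((90, 10), (80, 20), (70, 30),
                                 (60, 40), (50, 50))):
    """Return {"90:10": (train, test), ...} chronological splits."""
    out = {}
    for train_pct, test_pct in ratios:
        cut = len(series) * train_pct // 100   # integer cut point
        out[f"{train_pct}:{test_pct}"] = (series[:cut], series[cut:])
    return out
```

Each model variant is then trained on the `train` slice and scored on the held-out `test` slice of the same split.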

Air Pollution Forecasting Using VAR-Cascade-GA
Poor air quality in Taiwan has mostly been attributed to household burning, a major source of greenhouse gas emissions. Taiwan's geography is a primary contributor to its environmental problems, resulting in poor dispersion and pollutant locking. Taipei, Taiwan's capital and most populous city, is surrounded by mountains, and the advanced manufacturing facilities along the western and northern coastlines of Taiwan were also built near mountain ranges. In Section 3, we discussed the construction step and the simulation studies. During the input construction stage, we employed the VAR pollution space–time dataset covering Taichung ($Y_1$), Taipei ($Y_2$), Hsinchu ($Y_3$), and Kaohsiung ($Y_4$) in Taiwan. Figure 2 shows that five hidden layers were used to create the model, and the ratio used was selected by assessing the error values of the testing results shown in Table 2. Training and testing results for PM2.5 are shown in Figure 3, for PM10 in Figure 4, for NOx in Figure 5, and for SO2 in Figure 6. In this context, the cascade neural network genetic algorithm model can be used to study nonlinear and nonstationary air pollution data. The metrics used to evaluate the test-set results were the root-mean-square error (RMSE), mean absolute error (MAE), and symmetric mean absolute percentage error (sMAPE) between the actual air pollution values and the predicted values. These metrics are commonly used in regression problems such as our air pollution prediction; the smaller the metric values, the better the model's performance [25]. Note: the best simulation, with low error, is marked (*), and yellow highlighting represents the lowest value for each pollutant, accuracy measurement, and elapsed time. To choose the step size and optimize performance, we used backpropagation and conducted a search along the conjugate or orthogonal path.

We found this to be the easiest way to train moderately sized feedback networks. That said, some matrix multiplication is involved in processing problems such as air pollution over time. The network in this research is very wide, so backpropagation is a suitable choice. When overfitting occurs, the transferability of the model decreases significantly. To suppress overfitting, regularization methods are often used. L1 (L2) regularization adds the sum of the absolute (squared) values of the weights to the loss function, as in Equations (13) and (14):

$$\Gamma_{L1} = \Gamma + \alpha \sum_{i=1}^{N_H} \sum_{j,k} \left| w^{i}_{jk} \right|, \qquad (13)$$

$$\Gamma_{L2} = \Gamma + \alpha \sum_{i=1}^{N_H} \sum_{j,k} \left( w^{i}_{jk} \right)^2, \qquad (14)$$

where $\Gamma$ is the loss function and $w^{i}_{jk}$ indicates the weights in the network. In addition, $\alpha$ is a scaling factor for the summation, $N_H$ denotes the number of layers, and $N_i$ denotes the number of nodes in the $i$th layer.
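The L1 and L2 penalties described by Equations (13) and (14) can be sketched as a one-line addition to any loss. A minimal illustration, where `weights` is the list of per-layer weight arrays:

```python
import numpy as np

def regularized_loss(base_loss, weights, alpha, kind="l2"):
    """Add alpha * sum|w| (L1, Equation (13)) or alpha * sum w^2
    (L2, Equation (14)) over all layer weight arrays to the loss Gamma."""
    penalty = sum(np.sum(np.abs(W)) if kind == "l1" else np.sum(W ** 2)
                  for W in weights)
    return base_loss + alpha * penalty
```

L1 tends to drive individual weights exactly to zero (sparsity), while L2 merely shrinks them; both limit the overfitting mentioned above.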

Does the Activation Function Provide High Accuracy and Speed Up the Time Lapse?
Linear regression models work well for short-term predictions based on daily or weekly measurements in time series forecasting, but they cannot properly handle nonlinearity in the explanatory variables, even less so for long-term predictions from seasonal or annual data series. Various machine learning methodologies have been introduced and used to simulate problems and provide predictions in environmental research, as machine efficiency has evolved rapidly over the last decade. Despite its prominence and outstanding accuracy, critical issues with the artificial neural network are its propensity to overfit the training data and its inconsistency for short training histories. Several strategies for more effective and efficient training of NNs have been recommended; however, these are not simple and can have markedly poor accuracy.

Figure 6. SO2 data training of the CFNN using a genetic algorithm and backpropagation.

After the training and testing comparisons discussed in Section 3.3, we set out to prove the performance of the hybrid cascade neural network genetic algorithm when using other activation functions. Computational capabilities are increasing in the era of big data, high-performance computing, parallel processing, and cloud computing. In line with this, we address whether the activation function can improve accuracy and speed up the time lapse. Over the last decades, machine learning, a branch of artificial intelligence, has gained popularity, and researchers have extended it through various areas of human life. Machine learning is a field of research that employs statistics and computer science concepts to develop mathematical models used to execute tasks such as estimation and inference [55]. These models are collections of mathematical relations between a system's inputs and outputs. A learning process entails estimating the model parameters so that the task can be executed effectively. To improve accuracy, researchers have conducted simulated comparisons using various activation functions. The most popular activation functions are SoftMax, tanh, ReLU, Leaky ReLU, sigmoid, and logsig [56–59].
As stated, an activation function can be defined and applied in an ANN to help the network learn various patterns in the data. By analogy with the neuron-based design seen in human brains, an activation function is essentially responsible for deciding which neuron fires at a given moment [60]. Inside an ANN, the activation function does the same thing: it receives the output signal of the previous cell and transforms it into a form that can be used as input to the next cell. In this simulation, we used logsig in Equation (15), radbas in Equation (16), SoftMax in Equation (17), and tribas in Equation (18):

$$\mathrm{logsig}(x) = \frac{1}{1 + e^{-x}}, \qquad (15)$$

$$\mathrm{radbas}(x) = e^{-x^2}, \qquad (16)$$

$$\mathrm{SoftMax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}, \qquad (17)$$

$$\mathrm{tribas}(x) = \max(0,\, 1 - |x|). \qquad (18)$$

Table 3 shows that the best activation function for PM10 was logsig, that for PM2.5 was SoftMax, that for NOx was radbas, and that for SO2 was tribas. The SoftMax activation function provided a shorter time lapse than the other activation functions. The cascade feedforward neural network model differs only in how the input variables are determined. During the simulation, we constructed the input by vector autoregression, taking as input the lag variables of each predicted variable, in this case the air pollution data at the four locations of Taichung, Taipei, Hsinchu, and Kaohsiung. In the CFNN model for the four locations, neurons are arranged in layers, and the signal passes from the input to the first (input) layer, then to the second (hidden) layer, and finally to the output layer. The general equation for forecasting pollution data in the four locations, represented in Equation (19), was used for prediction purposes in these study areas. Equation (20) shows four input neurons $Y_{t-1}$ (lag 1) and five neurons in the hidden layer of $Z_t$.
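The four activation functions compared in the simulation can be written compactly as follows; the definitions follow the usual toolbox conventions for these function names (logsig, radbas, and tribas are the standard MATLAB Neural Network Toolbox forms):

```python
import numpy as np

def logsig(x):
    # Logistic sigmoid: 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

def radbas(x):
    # Gaussian radial basis: e^(-x^2)
    return np.exp(-x ** 2)

def softmax(x):
    # Normalised exponentials; shifted by max(x) for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def tribas(x):
    # Triangular basis: max(0, 1 - |x|)
    return np.maximum(0.0, 1.0 - np.abs(x))
```

Swapping one of these in for the hidden-layer transfer function is the only change between the model variants compared in Table 3.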
To perform the forecasting, we used Equation (21) for NOx with the radial basis activation function, Equation (22) for PM2.5 with the SoftMax activation function, Equation (20) for PM10 with the logsig activation function, and Equation (23) for SO2 with the tribas activation function. We provide the forecasting results for the next 30 steps in Figure 7. The results show Taichung consistently leading with the highest pollutant levels compared to other cities in Taiwan.
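The three evaluation metrics used throughout (RMSE, MAE, sMAPE) can be sketched as follows; the sMAPE form below is one common definition, and the paper may use a slightly different normalization:

```python
import numpy as np

def rmse(actual, pred):
    # Root-mean-square error.
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mae(actual, pred):
    # Mean absolute error.
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.mean(np.abs(a - p)))

def smape(actual, pred):
    # Symmetric mean absolute percentage error, in percent.
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(100.0 * np.mean(2.0 * np.abs(p - a) / (np.abs(a) + np.abs(p))))
```

All three are zero for a perfect forecast, and smaller is better, matching the comparison rule used for Tables 2 and 3.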


Conclusions
In this paper, we first presented a full review of a cascade neural network with a genetic algorithm as applied to space–time forecasting. Experimental results on an air pollution dataset showed that our hybrid methods provide high accuracy, as evidenced by the RMSE, MAE, and sMAPE values. Owing to its rapid urbanization and industrialization over the last decades, Taiwan faces serious environmental issues, including air pollution. To resolve air quality issues, the government has taken several countermeasures. The attempt to eliminate SO2 and total suspended particulate matter was very effective at a time when ever-increasing numbers of cars threatened city atmospheres with NOx and particulates. A space–time air pollution analysis of the last 10 years of monitoring data clearly showed that, with urban planning and countermeasure policies, air quality has improved. This analysis should be used to inform future policy decisions. The temporal features of air pollution in Taiwan were examined herein; the balance between gaseous pollutants and particulates differs for each location. Nevertheless, PM, SO2, and NOx levels have drastically increased. Future research should examine using VAR-SARIMA, VAR-ARCH, and other traditional time series models as input.