Soft Computing Applications in Air Quality Modeling: Past, Present, and Future

Air quality models simulate atmospheric environment systems and provide increased domain knowledge and reliable forecasting. They provide early warnings to the population and can reduce the required number of measuring stations. Due to the complexity and non-linear behavior associated with air quality data, soft computing models have become popular in air quality modeling (AQM). This study critically investigates and analyses the soft computing models reported for AQM and provides guidelines for future research directions.


Introduction
Air pollutants cause widespread detrimental effects on physical, biological, and economic systems. The prominent and dangerous pollutants are carbon oxides (CO x ), nitrogen oxides (NO x ), sulphur oxides (SO x ), ozone (O 3 ), lead (Pb), respirable suspended particles or particulate matter (PM 2.5 and PM 10 ), and total volatile organic compounds (TVOC) [1][2][3][4]. Exposure to such pollutants causes many diseases, including respiratory diseases, type 2 diabetes, asthma, allergies, and cancer [5][6][7][8][9]. Typically, environmental regulatory agencies regulate atmospheric quality against high air pollutant concentrations to protect human health by minimizing these detrimental effects. Air quality models play a vital role in assessing the quality of the atmosphere, yet several promising soft computing techniques remain under-explored for air quality modeling. For instance, the ELM, GMDH, HMM, and RFP models are rarely observed in the current literature for air quality prediction. Furthermore, different hybrid models can also be explored for AQM. Therefore, considering the importance of the AQM issue throughout the world, this study critically reviews and analyzes the existing soft computing models and provides guidelines for future research directions on the exploration of many soft computing models.
This paper is divided into five main sections. The second section will present the analysis of the input selection approaches for soft computing models in air quality modeling. The third section critically evaluates the proposed machine learning models in the literature. The fourth section investigates the potential soft computing models that are assumed to be suitable for air quality modeling. The fifth section reports the conclusions of this study.

Input and Output Selection Approaches
Researchers throughout the world mostly use meteorological and air pollutant data as inputs when modeling air quality with soft computing techniques. In a few cases, temporal data, traffic data, satellite data, and geographic data have also been employed. However, researchers rarely use direct industrial emission data, power plant emission data, and other types of data. Table 1 provides the components of the different types of data used as inputs for the soft computing techniques in AQM, together with the list of modeled air pollutants. To reduce the volume of the input space by selecting the most dominant variables, many different techniques, including cross-correlation analysis (CCA), principal component analysis (PCA), random forest (RF), learning vector quantization (LVQ), and rough set theory (RST), have been explored in the literature. Among the mentioned techniques, CCA is the simplest for analyzing the relationships among multiple time series; this linear data analysis approach can also be employed to select the most dominant input variables. The soft computing approaches explored for AQM are summarized in Table 2 and are reviewed and analysed thoroughly in the following sections.
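As an illustration of the simplest of these techniques, a lagged cross-correlation screen can rank candidate inputs by their Pearson correlation with the target pollutant over a range of time lags. The sketch below uses synthetic data, and the function names are illustrative, not taken from any of the reviewed studies:

```python
import numpy as np

def lagged_correlations(candidates, target, max_lag=3):
    """Pearson correlation of each candidate input series with the
    target pollutant at lags 0..max_lag; returns (best r, best lag)."""
    scores = {}
    for name, x in candidates.items():
        best_r, best_lag = 0.0, 0
        for lag in range(max_lag + 1):
            # the series shifted back by `lag` steps predicts the current target
            xs, ys = x[:len(x) - lag], target[lag:]
            r = np.corrcoef(xs, ys)[0, 1]
            if abs(r) > abs(best_r):
                best_r, best_lag = r, lag
        scores[name] = (best_r, best_lag)
    return scores

def select_inputs(candidates, target, k=2, max_lag=3):
    """Keep the k variables with the strongest absolute lagged correlation."""
    scores = lagged_correlations(candidates, target, max_lag)
    return sorted(scores, key=lambda n: abs(scores[n][0]), reverse=True)[:k]
```

In practice the candidate set would span the meteorological and pollutant series of Table 1, and k would be chosen by validation.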

Artificial Neural Networks Models
ANNs are very popular in modeling complex and non-linear engineering problems as they offer parallel computation and adapt to external disturbances. In general, they consist of input, hidden, and output layers. The hidden layers process the input variables employing different squashing functions and send the results to the output layer. They are also flexible, less assumption-dependent, and adaptive in modeling environmental issues, making them popular in AQM [86]. Naturally, they have salient advantages over traditional statistical models in air quality forecasting [87]. The tested ANN structures for air pollutant prediction include multilayer perceptron neural networks (MLP-NN), radial basis function neural networks (RBF-NN), square multilayer perceptron neural networks (SMLP-NN), ward neural networks (W-NN), pruned neural networks (P-NN), recursive neural networks (R-NN), general regression neural networks (GR-NN), graph convolutional neural networks (GC-NN), and backpropagation neural networks (BP-NN) (Table 2). This subsection provides critical reviews of neural-network-based approaches in air quality modeling throughout the world.
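To make the layered structure concrete, the following minimal numpy sketch implements a single-hidden-layer MLP with a tanh squashing function, trained by batch gradient descent on mean squared error; the architecture and data are illustrative only and do not correspond to any of the reviewed models:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_mlp(n_in, n_hidden, n_out=1):
    """Small random weights; tanh is the hidden-layer squashing function."""
    return {"W1": rng.normal(scale=0.5, size=(n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(scale=0.5, size=(n_hidden, n_out)),
            "b2": np.zeros(n_out)}

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])        # hidden layer
    return h, h @ p["W2"] + p["b2"]           # linear output layer

def train_step(p, X, y, lr=0.05):
    """One batch gradient-descent step on mean squared error."""
    h, yhat = forward(p, X)
    err = yhat - y
    gW2 = h.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dh = (err @ p["W2"].T) * (1.0 - h ** 2)   # backprop through tanh
    gW1 = X.T @ dh / len(X)
    gb1 = dh.mean(axis=0)
    for k, g in (("W1", gW1), ("b1", gb1), ("W2", gW2), ("b2", gb2)):
        p[k] -= lr * g
    return float(np.mean((forward(p, X)[1] - y) ** 2))
```

Repeatedly calling `train_step` on (input, pollutant) pairs drives the loss down; real AQM models differ mainly in scale, input selection, and training refinements.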
Hrust et al. [20] employed ANN to forecast four air pollutants (NO 2 , O 3 , CO, and PM 10 ) in Zagreb, Croatia as a function of meteorological variables and the concentrations of the respective pollutants. Biancofiore et al. [23] predicted daily average PM 10 and PM 2.5 concentrations 1-day to 3-days ahead in Pescara, Italy employing R-NN using meteorological data and air pollutant (i.e., PM 10 and CO) concentrations. Radojević et al. [21] employed ward and general regression neural networks (W-NN and GR-NN) to estimate the daily average concentrations of SO 2 and NO x in Belgrade, Serbia using meteorological variables and periodic parameters (hour of the day, day of the week, and month of the year) as inputs. Pawlak et al. [22] developed an ANN model to predict the maximum hourly mean of surface O 3 concentrations for the next day in rural and urban areas of central Poland. The model used six input variables: four forecasted meteorological parameters, the recorded O 3 concentration of the previous day, and the month. Corani [57] compared the performance of the FF-NN model with the pruned neural network (P-NN) and lazy learning for predicting O 3 and PM 10 concentrations in Milan, Italy. The author used air pollutant data (O 3 , NO, and NO 2 ) and meteorological data (pressure, temperature, wind speed, solar radiation, rain, and humidity) as the inputs for the prediction of O 3 concentrations. For PM 10 concentration, the predictions used only two air pollutant variables (PM 10 and SO 2 ) and two meteorological variables (temperature and pressure). The lazy learning model outperformed the other models in terms of correlation and mean absolute error.
Anderetta et al. [87] proposed an MLP-NN model with a Bayesian learning scheme to forecast hourly SO 2 concentration levels in the industrial area of Ravenna, Italy, emphasizing the high levels of SO 2 occurring during relatively rare episodes. The authors employed historical meteorological data and SO 2 concentrations of current and previous hours as inputs. Biancofiore et al. [88] modeled hourly O 3 concentration 1-h, 3-h, 6-h, 12-h, 24-h, and 48-h ahead using meteorological data, a photochemical parameter (measured UVA radiation), and air pollutant concentrations (O 3 and NO 2 ). The authors used an R-NN architecture and measured data from central Italy. Some of the modeling exercises have placed emphasis on emission sources. Gualtieri et al. [89] have demonstrated the capability of ANN in predicting short-term hourly PM 10 concentrations in Brescia, Italy. The authors used hourly atmospheric pollutant (NO x , SO 2 , PM 2.5 , and PM 10 ) concentrations, meteorological parameters, and road traffic counts (municipality boundary and city center traffic volumes) as the inputs. Mlakar et al. [90] have illustrated how MLP-NN forecasted SO 2 concentrations for the next interval of 30 min in Šoštanj, Slovenia using wind data, air temperature, actual and historical SO 2 concentrations, emissions from the thermal power plant, and the time of day. Gómez-Sanchis et al. [108] developed MLP-NN models to estimate ambient O 3 concentrations at an agricultural training center in Carcaixent, Spain; the concentrations of the next three days were predicted independently using surface meteorological variables and vehicle emission variables as predictors. Spellman [93] and Moustris et al. [125] developed models using relevant data from multiple locations. Spellman [93] proposed an MLP-NN for estimating summer surface O 3 concentrations at multiple locations.

Kurt et al. [114] developed an air pollution prediction system for one region of the Greater Istanbul Area, Turkey. The authors attempted three different ways to predict the concentrations of three air pollutants (SO 2 , PM 10 , and CO) for the next three days using FF-NN. In the first experiment, the meteorological data (general condition, day and night temperatures, wind speed and direction, pressure, and humidity) were used as inputs. In the second experiment, the concentrations of the second and third days were predicted cumulatively using the previous days' model outputs. The performance of the proposed models improved in the third experiment due to the use of the day of the week as an additional input parameter. Kurt and Oktay [117] updated the prediction model of their previous research [114] by considering spatial features along with the air pollutant data from ten different air quality monitoring stations in Istanbul to forecast the same air pollutant levels three days in advance. The employed input variables were the meteorological data (general condition, day and night temperatures, humidity, pressure, wind speed and direction), periodic data (day of the week and date), and air pollutant concentrations (SO 2 , PM 10 , and CO). The authors concluded that the distance-based geographic model ensured better performance when compared to the non-geographic model. Ozdemir and Taner [123] modeled PM 10 concentrations in Kocaeli, Turkey using ANN and multiple linear regression (MLR) techniques, and achieved better prediction accuracy with the ANN technique than the MLR. They used hourly meteorological data (air temperatures, wind speed and direction, relative humidity, and air pressure) and pollutant levels (monthly average, minimum, maximum, and standard deviation of monthly PM 10 concentrations).
Perez and Reyes [110] forecasted hourly maximum PM 10 concentrations for the next day using MLP-NN in Santiago, Chile as a function of the concentrations of PM 10 measured until 7 p.m. and of measured and forecasted meteorological variables. The meteorological variables were the difference between the maximum and minimum temperature on the present day, the difference between the maximum and minimum forecasted temperature of the next day, and the forecasted meteorological potential of atmospheric pollution. Schornobay-Lui et al. [149] employed MLP-NN and non-linear autoregressive exogenous (NARX) neural networks (NARX-NN) to model short-term (daily) and medium-term (monthly) PM 10 concentrations in São Carlos, Brazil. They used only monthly average meteorological data (temperature, relative humidity, wind speed, and accumulated rainfall) for monthly PM 10 concentration prediction, whereas for predicting daily PM 10 concentration (for the next day) they used PM 10 concentrations along with meteorological data as inputs. Ha et al. [128] employed RBF-NN to predict the 8-h maximum average O 3 concentrations in the Sydney basin in New South Wales, Australia using NO x and VOC emissions, ambient temperature, location coordinates, and topography as inputs. Wahid et al. [118] similarly estimated the 8-h maximum average O 3 concentration in the Sydney basin employing RBF-NN with a comparable set of inputs.
ANN models are widely explored in AQM due to their satisfactory performance. However, the models can be unstable in a few cases as they are highly dependent on data and can easily fall into local optima instead of finding the global optimum [213]. In addition, network training, the amount and quality of training data, and the network parameters (number of hidden layers, transfer function type, number of epochs, number of neurons, and initial weights and biases) significantly influence their performance. A conventional ANN works efficiently when the approximated function is relatively monotonic with only a few dimensions of input features, but may not be efficient in other cases [214]. A monotonic function is either entirely non-increasing or entirely non-decreasing, so its first derivative (which need not be continuous) does not change sign. ANNs experience the greatest difficulty in approximating functions whose input features are not linearly separable [215]. In typical ANN models, all input variables are connected to the neurons of the hidden layer, which may affect the generalization ability [216].

Support Vector Machine Models
Boser et al. [217] introduced the SVM technique for efficient data analysis; it was primarily limited to solving classification problems. With the passage of time, researchers extended the technique to regression problems. In regression problems, the SVM technique maps the available data into a higher-dimensional feature space through various kernel functions, including sigmoidal, polynomial, and radial basis functions, and constructs an optimal geometric hyperplane as the separation (regression) surface there [218,219]. When dealing with regression problems, the SVM is known as support vector regression. However, in this paper, the term SVM is used for both support vector machines and support vector regression. The structural risk minimization principle adopted in SVM seeks to minimize an upper bound on the generalization error, which results in better generalization performance when compared to conventional techniques.
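The kernel mapping can be illustrated with a compact sketch. As a simplifying assumption, kernel ridge regression is used below as a stand-in for full ε-insensitive SVR: it shares the RBF kernel trick but treats every training point as a support vector instead of keeping a sparse subset, which keeps the solver to one linear system:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2): the implicit feature map."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

class KernelRidge:
    """Simplified stand-in for SVR sharing the same kernel trick;
    a true SVR would instead solve the epsilon-insensitive dual problem."""
    def __init__(self, gamma=1.0, lam=1e-2):
        self.gamma, self.lam = gamma, lam
    def fit(self, X, y):
        K = rbf_kernel(X, X, self.gamma)
        self.X = X
        # dual coefficients from the regularized linear system
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        return self
    def predict(self, X):
        return rbf_kernel(X, self.X, self.gamma) @ self.alpha
```

The non-linear fit is obtained without ever computing the high-dimensional features explicitly; only kernel evaluations between data points are needed.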
As in many other engineering fields, SVM has also been employed in AQM. Ortiz-García et al. [27] applied SVM and MLP-NN to predict hourly O 3 concentrations at five stations in the urban area of Madrid, Spain. They explored the influence of four previous O 3 measurements (at a given station and at the neighboring stations) and of meteorological variables on O 3 predictions. Luna et al. [29] employed ANN and SVM to predict O 3 concentrations in Rio de Janeiro, Brazil using available primary pollutants (CO, NO x , and O 3 ) and meteorological data (wind speed, solar irradiation, temperature, and moisture content). The authors found solar irradiation and temperature to be the most dominant inputs and observed better predictability with SVM than ANN. Yang et al. [186] illustrated a space-time SVM to predict hourly PM 2.5 concentrations in Beijing, China and compared the results with space-time ANN, ARIMA, and global SVM. The presented results confirmed the superiority of the space-time SVM over the other techniques. The study employed historical PM 2.5 concentration data and meteorological data as inputs for the tested models. Wang et al. [175] illustrated an adaptive RBF-NN and an improved SVM for predicting RSP concentrations at the Mong Kok Roadside Gaseous Monitoring Station, Hong Kong. The authors used six pollutant (SO 2 , NO x , NO, NO 2 , CO, and RSP) concentrations and five meteorological variables (wind speed and direction, indoor and outdoor temperatures, and solar radiation) as inputs. The authors found better generalization performance with the SVM technique than with the adaptive RBF-NN technique. However, the SVM technique was very slow compared to the RBF-NN technique.
Mehdipour and Memarianfard [180] predicted tropospheric O 3 concentration in Tehran, Iran using SVM as a function of daily maximum pollutant concentrations (SO 2 , NO 2 , CO, PM 2.5 , and PM 10 ) and daily average meteorological data (ambient air temperature, wind speed, and relative humidity). The authors found the RBF-kernel-function-based SVM as the best performing technique.
It is generally perceived that the SVM has higher generalization capability compared to the classical NN model, but the design of the SVM model heavily depends on the proper selection of kernel functions and their parameters. It is also reported that they can be very slow in the training phase [220].

Evolutionary Neural Network and Support Vector Machine Models
Evolutionary computation can be used to tune the weights and biases, to choose the number of hidden layers and neurons, to select the squashing functions, and to generate appropriate architectures for artificial neural networks [221][222][223][224]. These techniques can also be employed to tune the key parameters of the SVM while solving classification or regression problems [25,209,225]. Among the many evolutionary computational techniques, the genetic algorithm (GA), bat algorithm (BA), differential evolution (DE), particle swarm optimization (PSO), gravitational search algorithm (GSA), cuckoo search algorithm (CSA), sine cosine algorithm (SCA), grey wolf optimizer (GWO), and backtracking search algorithm (BSA) are well known for the aforementioned purposes.
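A minimal PSO loop of the kind used to tune SVM or ANN hyperparameters can be sketched as follows. The objective here is a toy validation-error surface, standing in for the cross-validated model error that the reviewed studies actually minimized:

```python
import numpy as np

def pso(objective, bounds, n_particles=20, n_iters=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: minimize `objective` inside
    the box given by bounds = (low, high) arrays."""
    rng = np.random.default_rng(seed)
    low, high = map(np.asarray, bounds)
    dim = len(low)
    x = rng.uniform(low, high, size=(n_particles, dim))  # positions
    v = np.zeros_like(x)                                 # velocities
    pbest = x.copy()                                     # personal bests
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()                 # global best
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + attraction to personal and global bests
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, low, high)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, float(pbest_f.min())
```

In a GA-SVM or PSO-ANN setting, each particle would encode hyperparameters (e.g., kernel width, regularization constant, hidden-layer size) and the objective would retrain and score the model, which is what makes these hybrids computationally expensive.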
Kapageridis and Triantafyllou [194] used GA to optimize the topology and learning parameters of a time-lagged FF-NN. The model predicted the maximum 24-h moving average of PM 10 concentration for the next day at two monitoring stations in northern Greece. The authors used meteorological data (minimum humidity, maximum temperature, the difference between the maximum and minimum temperature, and average wind speed) and the maximum PM 10 concentration of the previous day as inputs. Niska et al. [196] introduced GA for selecting the inputs and designing the topology of an MLP-NN model for forecasting hourly NO 2 concentrations at a busy urban traffic station in Helsinki, Finland. Among the many available input variables, the authors selected NO 2 and O 3 concentrations, sine and cosine of the hour, sine of the weekday, temperature, wind speed and direction, solar elevation, and friction velocity as inputs for the GA-tuned MLP-NN based on experience. Grivas and Chaloulakou [31] modeled hourly PM 10 concentrations employing GA-ANN in Athens, Greece using a combination of meteorological data, air pollutant data (PM 10 concentrations), and temporal data (hour of the day, day of the week, and seasonal index). Lu et al. [192] developed a PSO-tuned ANN approach for modeling air pollutant parameters (CO, NO x , NO 2 , and RSP) for the downtown area of Hong Kong using original pollutant data. Liu et al. [84] illustrated the PSO-SVM technique for daily PM 2.5 grade predictions in Beijing, China as a function of meteorological data (average atmospheric pressure, relative humidity, air temperature, maximum wind speed and direction, and cumulative precipitation) and hourly average air pollutant data (PM 2.5 and PM 10 ). The PSO-SVM displayed better accuracy compared to the GA-SVM, AdaBoost, and ANN models for PM 2.5 grade prediction. Chen et al. [207] employed a PSO-driven SVM technique to predict short-term atmospheric pollutant concentrations at the Temple of Heaven, Beijing, China and found a faster response for PSO-SVM than for GA-SVM. Li et al. [211] employed quantum-behaved particle-swarm-optimization-assisted (QPSO) SVM to predict NO 2 and PM 2.5 concentrations for the next four hours in the Haidian District of Beijing, China. The authors used hourly measurements of five air pollutants (PM 2.5 , NO 2 , CO, O 3 , and SO 2 ) and six meteorological parameters (weather condition, wind speed and direction, temperature, pressure, and relative humidity) as inputs for their model. The authors also found that their QPSO-SVM outperformed PSO-SVM, GA-SVM, and GS-SVM. Evolutionary algorithms can tackle practical problems and achieve better generalization performance, but the tuning process is computationally expensive.

Fuzzy Logic and Neuro-Fuzzy Models
The fuzzy logic model deals with the imprecisions and uncertainties of real-world applications using a set of manually extracted "if-then" rules [226][227][228]. It creates rules governing conceivable relationships between input and output features through a fuzzification process that transforms the input features into membership values. Then, it follows a defuzzification process to infer the quantifiable output according to the rules and the input data [122]. It is simple, flexible, customizable, and can handle problems with imprecise and incomplete data. However, it is tiresome to develop the rules and membership functions. Besides, the analysis can become difficult as the outputs may be interpreted in a number of ways [229]. In response, fuzzy logic models and artificial neural networks can be combined into neuro-fuzzy systems, where the fuzzy logic performs as an inference mechanism under cognitive uncertainty and the neural network contributes learning, adaptation, fault-tolerance, parallelism, and generalization capabilities [97]. As in many other engineering problems, neuro-fuzzy systems have also been employed to model air pollutant concentrations.
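The fuzzification, rule firing, and defuzzification steps can be sketched with a zero-order Sugeno-style system. All membership breakpoints and rule consequents below are invented for illustration and are not calibrated values from any reviewed study:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def predict_pm10(wind, temp):
    """Zero-order Sugeno-style inference: fuzzify, fire the rules with a
    min AND-operator, then defuzzify by a weighted average of consequents.
    Breakpoints and consequents are illustrative assumptions only."""
    wind_low, wind_high = tri(wind, -1, 0, 5), tri(wind, 2, 8, 15)
    temp_low, temp_high = tri(temp, -20, 0, 15), tri(temp, 10, 25, 40)
    rules = [  # (firing strength, crisp consequent in ug/m3)
        (min(wind_low, temp_low), 80.0),    # calm and cold: pollution trapped
        (min(wind_low, temp_high), 50.0),
        (min(wind_high, temp_low), 30.0),
        (min(wind_high, temp_high), 15.0),  # windy and warm: good dispersion
    ]
    num = sum(w * c for w, c in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0
```

In a neuro-fuzzy system such as ANFIS, the breakpoints and consequents hard-coded here would instead be learned from data by the neural component.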
Carnevale et al. [36] have suggested a neuro-fuzzy approach to identify source-receptor models for O 3 and PM 10 concentration prediction in northern Italy by processing the simulations of a deterministic multi-phase modeling system. The authors used daily NO x and VOC emissions as input variables for O 3 modeling, whereas NH 3 , NO x , primary PM 10 , SO x , and VOC emissions were used for PM 10 modeling. Morabito and Versaci [37] demonstrated a fuzzy neural identification technique to forecast short-term hydrocarbon concentrations in local air at Villa San Giovanni, Italy. The authors used measured air pollutant (CO, NO, NO 2 , O 3 , PM 10 , and SO 2 ) data, meteorological data, and vehicle movements as input variables. Hájek and Ole [104] predicted hourly average O 3 concentrations for the next hour at the Pardubice Dukla station, Czech Republic using air pollutant data (NO x , NO, NO 2 , and O 3 ), temporal data (month of the year), and meteorological data of the present hour. They employed several soft computing techniques, including ANN, SVM, and the fuzzy logic model. The presented results confirmed the superiority of the fuzzy logic model over the other models. Yildirim and Bayramoglu [97] developed an adaptive neuro-fuzzy inference system (ANFIS) for the estimation of SO 2 and total suspended particulate matter (TSP) in Zonguldak, Turkey. The authors used meteorological data (wind speed, pressure, precipitation, temperature, solar radiation, and relative humidity) and air pollutant concentrations (SO 2 or TSP) of the previous day as inputs. They found temperature and air pollutant concentration to be the most dominant input variables. Yeganeh et al. [119] estimated ground-level PM 2.5 concentrations using ANFIS, SVM, and BP-NN techniques in Southeast Queensland, Australia. They used satellite-based, meteorological, and geographical data as inputs. The presented results confirmed the superiority of the ANFIS model over the other models. Yeganeh et al. [121] also predicted the monthly mean NO 2 concentration by employing ANFIS in Queensland, Australia. The authors improved the prediction accuracy of their model by using satellite-based and traffic data in conjunction with comprehensive meteorological and geographical data as inputs.
Tanaka et al. [91] adopted a self-organizing fuzzy identification algorithm for modeling CO concentrations for the next hour in a large city in Japan. The authors used CO concentration data, meteorological data (wind speed, temperature, and sunshine), and traffic volume for the previous four hours as inputs. Chung et al. [122] employed a fuzzy inference system (FIS) to predict PM 2.5 and Pb concentrations in mid-southern Taiwan using geographic coordinates and time as inputs. Heo and Kim [95] employed fuzzy logic and neural network models (neuro-fuzzy system) consecutively to forecast the hourly maximum O 3 concentrations for the next day at four monitoring sites in Seoul, Korea. The authors used air pollutants data (CO, NO 2 , O 3 , and SO 2 ) and meteorological data (wind speed and direction, temperature, relative humidity, and solar radiation) as input variables. The prediction accuracy of their developed model was continually verified and augmented through corrective measures. Mishra et al. [113] analyzed the haze episodes and developed a neuro-fuzzy system to forecast PM 2.5 concentrations during haze episodes in the urban area of Delhi, India. The air pollutants (CO, O 3 , NO 2 , SO 2 , and PM 2.5 ) and meteorological parameters (pressure, temperature, wind speed and direction, relative humidity, visibility, and dew point temperature) were used as inputs. The presented results confirmed the superiority of the neuro-fuzzy model over ANN and MLR models. Jain and Khare [102] developed a neuro-fuzzy model for predicting hourly ambient CO concentrations at urban intersections and roadways in Delhi, India. The authors used measured hourly CO data, meteorological data (sunshine hours, wind speed and direction, humidity, temperature, pressure, cloud cover, visibility, stability class, and mixing height), and traffic data (two-wheelers, three-wheelers, diesel-powered, and gasoline-powered vehicles) as inputs. 
The authors adopted the correlation matrix technique, PCA, and fuzzy Delphi method to identify the most dominant input variables.

Deep Learning Models
Deep learning is a subset of machine learning built on artificial neural networks with many hidden layers. It can learn directly from large amounts of raw data with little manual pre-processing. A deep network processes its inputs through the weights and biases of its many hidden layers to produce a prediction, and those weights and biases are adjusted during training to improve the prediction. Deep learning is employed to model both regression and classification problems [230]. The application of deep learning models to air quality modeling has been following an increasing trend.
Zhou et al. [39] proposed a multi-step-ahead air quality forecasting technique employing a deep multi-output long short-term memory neural network model incorporating three deep learning algorithms (i.e., mini-batch gradient descent, dropout neurons, and L2 regularization) using meteorological and air quality data as inputs. The model predicted PM 2.5 , PM 10 , and NO x concentrations simultaneously in Taipei City, Taiwan. Besides, the model overcame the bottleneck and overfitting issues related to the shallow multi-output long short-term memory model. Athira et al. [165] predicted PM 10 concentrations using deep learning techniques (R-NN, long short-term memory, and gated recurrent units) in China and found the gated recurrent unit to be the best performer among the employed techniques. The authors used meteorological data as inputs to the deep learning techniques. Peng et al. [132] illustrated a spatiotemporal deep learning method for hourly average PM 2.5 concentration prediction in Beijing City, China and compared the prediction results with spatiotemporal ANN, ARIMA, and SVM. According to the presented results, the deep learning technique outperformed the other techniques. The authors used air quality data from previous intervals as inputs. Tao et al. [167] proposed a deep-learning-based short-term forecasting model to predict PM 2.5 concentrations at the US Embassy in Beijing, China.
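Whatever the recurrent architecture, these forecasting models first frame the pollutant time series as supervised learning samples via a sliding window; a minimal sketch (the function name and window sizes are illustrative):

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Turn a 1-D time series into (X, y) pairs for horizon-step-ahead
    forecasting: each X row holds the last n_lags observations, and y is
    the value `horizon` steps after the end of that window."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)
```

The 1-h to 48-h-ahead settings in the studies above correspond to different `horizon` values; deep recurrent models consume the same windows, usually with multiple input channels (one per pollutant or meteorological variable).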

Ensemble Models
Ensemble models combine a set of individual methods by means of aggregation rules to solve a given problem and produce a better aggregated solution. Each member method is trained individually and produces its own solution [231][232][233][234]. In general, these models use the generated solutions of the individual techniques as inputs and produce the ultimate solution using a wide range of processing methods, including averaging, voting, boosting, bagging, and stacking [235]. Averaging (for regression problems) and voting (for classification problems) are the most popular due to their simplicity and better interpretability. These models are also known as committees of learning machines.
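The averaging rule can be sketched in a few lines. On synthetic data with independent member errors, the averaged forecast has a lower mean squared error than the average member (guaranteed by the convexity of squared error) and usually lower than the best member; the "models" below are stand-ins, not trained predictors:

```python
import numpy as np

def ensemble_average(predictions):
    """Aggregate member forecasts by simple averaging, the most common
    rule for regression ensembles."""
    return np.mean(predictions, axis=0)

# Synthetic demo: three imperfect "models" = truth plus independent errors.
rng = np.random.default_rng(1)
truth = np.sin(np.linspace(0.0, 6.0, 200))
members = np.stack([truth + rng.normal(scale=0.3, size=truth.shape)
                    for _ in range(3)])

mse = lambda pred: float(np.mean((pred - truth) ** 2))
member_mses = [mse(m) for m in members]
ensemble_mse = mse(ensemble_average(members))
```

The benefit shrinks when member errors are correlated, which is why ensembles in the reviewed studies combine structurally different learners (ANN, SVM, RF, MLR) rather than copies of one model.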
Valput et al. [40] modeled hourly NO 2 concentrations in Madrid, Spain using ensemble neural networks as a function of meteorological data, pollutant data (hourly average NO 2 , NO, CO, and SO 2 ), and traffic data (intensity and road occupation). Maciag et al. [148] proposed a clustering-based ensemble of evolving spiking neural networks to forecast air pollutant (O 3 and PM 10 ) concentrations up to six hours ahead in the Greater London Area. The presented results demonstrated the superiority of the proposed ensemble technique over ARIMA, MLP-NN, and singleton spiking neural network models. Bing et al. [131] modeled daily maximum O 3 concentrations in the metropolitan area of Mexico City, Mexico using four individual soft computing techniques (ANN, SVM, RF, and MLR) and two ensemble techniques (linear and greedy). Among the employed techniques, the linear ensemble model outperformed the others. Feng et al. [158] developed a BP-NN ensemble to forecast daily PM 2.5 concentrations in Southern China using daily assimilated surface meteorological data (including air temperature, relative humidity, pressure, and winds) and daily fire pixel observations as inputs. The authors observed a significant improvement in the air pollutant prediction capability of the employed ensemble technique. Wang and Song [139] designed a deep spatial-temporal ensemble model to improve air quality pollutant (PM 2.5 ) prediction at 35 stations in Beijing, China using historical air pollutant data (CO, NO 2 , SO 2 , O 3 , PM 10 , and PM 2.5 ) and meteorological data as inputs. The authors found that their proposed model outperformed other models, including linear regression and a deep neural network.

Hybrid and Other Models
A combination of multiple techniques is known as a hybrid modeling technique and these are very popular in modeling air pollutant concentrations throughout the world. The authors of [41] employed MLP-NN to forecast the daily mean PM 10 and PM 2.5 concentrations for the next day in Greece and Finland where the input features were selected through the PCA method.
Sousa et al. [42] compared the effectiveness of PCA-based FF-NN with MLR and original-data-based FF-NN to predict hourly O 3 concentrations for the next day in Oporto, Portugal. The authors considered hourly O 3 , NO, and NO 2 concentrations and hourly mean temperature, wind velocity, and relative humidity as the inputs to the predictive models. Cortina-Januchs et al. [81] modeled hourly average PM 10 concentrations for the next day in Salamanca, Mexico using clustering-based ANN as a function of historical pollutant data (PM 10 concentrations) and meteorological data of the current day. They used FCM and FKM clustering approaches to process the input data. Lin et al. [82] predicted three air pollutants (PM 10 , NO x , and NO 2 ) using an immune-algorithm-tuned SVM (IA-SVM) at the Nantou station, Taiwan from their historical measured values. The authors preprocessed the raw data employing logarithmic, scaling, and normalization techniques. They found the logarithmic IA-SVM to be the most effective prediction technique in terms of overall MAPE values. The combination of ANFIS and a weighted extreme learning machine (WELM) model outperformed neuro-genetic, ANFIS, and other models in predicting air pollutant (CO, NO, PM 2.5 , and PM 10 ) concentrations in Datong, Taiwan using measured time series air pollutant data [193]. Cheng et al. [208] employed wavelet decomposition to mine useful information from the measured weather data and used ARIMA, ANN, and SVM to predict PM 2.5 in five Chinese cities. The authors confirmed the superiority of the hybrid approaches over non-hybrid approaches through exhaustive simulations. Bai et al. [183] demonstrated a wavelet-decomposition-based BP-NN technique to estimate daily air pollutant (PM 10 , SO 2 , and NO 2 ) concentrations in the Nan'an District of Chongqing, China. The authors used the wavelet coefficients of previously measured air pollutant concentrations and local meteorological data as inputs.
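The wavelet pre-processing used in these hybrids can be illustrated with a one-level Haar transform, the simplest discrete wavelet: it splits a series into a smooth approximation and a detail (high-frequency) component, each of which can then feed a predictor. The implementation below is a generic sketch, not the transform used in the cited studies:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar wavelet transform: split a signal of even length
    into approximation (low-pass) and detail (high-pass) coefficients."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: perfectly reconstructs the original signal."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x
```

Applying `haar_dwt` recursively to the approximation yields a multi-level decomposition; hybrid AQM models typically forecast each component separately and recombine the forecasts with the inverse transform.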
Apart from the reviewed single and hybrid AQM techniques, other modeling approaches, including GMDH, HMM, and RFP techniques, have also been explored to model air pollutant concentrations. For instance, the authors of [46] employed a GMDH approach to model O 3 concentration in Rub' Al Khali, Saudi Arabia using meteorological data (wind speed, temperature, pressure, and relative humidity) and air pollutant data (NO and NO 2 ). Conversely, Zhang et al. [47] illustrated an HMM with a Gamma distribution to predict the O 3 level in California and Texas, USA as a function of measured air pollutant data (O 3 , NO, and NO 2 ) and measured and predicted meteorological data (temperature, wind speed, relative humidity, and solar radiation). The authors used the wavelet decomposition technique to reduce the input data size by extracting useful features. The authors of [48] proposed an RFP technique to model NO 2 concentration in Wrocław, Poland using hourly meteorological data (air temperature, wind speed and direction, relative humidity, and atmospheric pressure), hourly traffic counts, and temporal data (month of the year and day of the week).

Generalized Overview
The authors of [236] and [237] proved that single-hidden-layer feedforward neural networks are universal approximators of continuous target functions. Similarly, Hammer and Gersmann [238] illustrated the universal function approximation capabilities of the support vector machine. Researchers have investigated different variations of fuzzy logic systems and demonstrated their capabilities as universal function approximators. Ying [239] proved that Takagi-Sugeno (TS) fuzzy systems are universal approximators for multi-input, single-output problems. Therefore, the mentioned soft computing models are theoretically valid candidates for modeling air pollutants. The models attempt to build relations between pollutant levels and information related to other relevant pollutants, pollutant characteristics, meteorological conditions and patterns, and terrain types and characteristics. If multiple models are used for the same air quality modeling problem, the model performance is expected to depend mainly on the selection of appropriate inputs, the size of the datasets, the run-time, the hyper-parameters of the models, and the inherent suitability of each model for the specific problem in multi-dimensional space.
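As a concrete illustration of the TS fuzzy systems mentioned above, the minimal sketch below evaluates a two-rule first-order Takagi-Sugeno system on a single input. The rules, membership functions, and the temperature-to-O 3 mapping are hypothetical, chosen only to show the firing-strength-weighted inference step:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def ts_infer(temp):
    """First-order TS inference: each rule fires with a membership degree,
    and the crisp output is the firing-strength-weighted average of the
    linear rule consequents (illustrative O3-vs-temperature rules)."""
    # Rule 1: IF temp is LOW  THEN o3 = 10 + 0.5 * temp
    # Rule 2: IF temp is HIGH THEN o3 = 30 + 1.2 * temp
    w1 = tri(temp, 0.0, 10.0, 25.0)
    w2 = tri(temp, 15.0, 30.0, 45.0)
    y1 = 10.0 + 0.5 * temp
    y2 = 30.0 + 1.2 * temp
    return (w1 * y1 + w2 * y2) / (w1 + w2)   # weighted-average defuzzification

print(ts_infer(20.0))   # both rules fire equally -> midway output 37.0
```

Training such a system amounts to tuning the membership parameters and consequent coefficients, which is where the learning approaches discussed below come in.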
Therefore, researchers started to model air pollutants using ANN, SVM, and fuzzy logic approaches from the beginning. With the passage of time, evolutionary optimization techniques have come to play vital roles in achieving better generalization performance by tuning the key parameters of the primarily employed soft computing approaches in AQM. Likewise, researchers have moved from fuzzy logic models to neuro-fuzzy and ANFIS models that enhance overall performance. The ensemble models exploit the advantages of the individual models and mitigate their drawbacks (if there are any). Besides, deep learning methods are better candidates than shallow machine learning techniques for handling the massive and varied information received from advanced measurement infrastructures and sensors. Hence, the ensemble models perform better than the individual models, and the deep learning models have been exhibiting superiority, especially for large datasets.
In general, environmental data are complex to model as they are functions of several variables that share a complicated and non-linear relationship with each other [240]. A soft computing model that addresses the limitations of other models and ensures higher prediction accuracy and reliability for a specific pollutant in a specific air quality episode should not be considered effective, accurate, and reliable in all cases. This notion motivates the exploration and exploitation of more techniques in the field of AQM that can vary based on area, pollutant type, and data availability.

Potential Soft Computing Models and Approaches
Among many potential techniques, different variations of artificial neural networks, evolutionary fuzzy and neuro-fuzzy models, ensemble and hybrid models, and knowledge-based models should be further explored. Besides, there is a continuous need for the development of a universal model, as most of the explored models are either site-dependent or pollutant-dependent. This section discusses future research directions and potential soft computing models that can be investigated in air quality modeling throughout the world.

Variations of ANN Models
As can be observed from Section 3, ANN approaches have been widely explored in AQM, and in most cases MLP-NN, BP-NN, RBF-NN, or R-NN were employed. Other available variations of ANN models (GR-NN, GC-NN, P-NN, W-NN, and others) that have successfully demonstrated their capabilities in modeling complex and non-linear problems in other engineering fields have not been explored significantly [11]. Many of them (extreme learning machine, multitasking, probabilistic, time delay, modular, and other hybrid neural networks) are rarely explored. Besides, deep neural network models have received great attention in modeling PM 2.5 concentrations, but other air pollutants have not been modeled as extensively. Therefore, such unexplored and rarely explored variations of the neural networks can be investigated in future works for modeling all types of air pollutant concentrations.

Evolutionary Fuzzy and Neuro-Fuzzy Models
Fuzzy systems are proven tools for modeling complex and non-linear problems in many applications. However, the lack of learning capabilities in fuzzy systems has encouraged researchers to augment their capabilities by hybridizing them with EO techniques [241][242][243][244]. Among the many EO techniques, GA, GWO, CSA, SCA, and PSO are widely used and well-known global search optimization approaches with the ability to explore a large search space for suitable solutions [245][246][247].
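The hybridization idea can be sketched as follows: a toy genetic algorithm (binary tournament selection, Gaussian mutation, and elitism) tunes the centre and width of a single Gaussian membership function against two target membership values. The tuning task and all parameter values are hypothetical; real EO-fuzzy systems optimize many rules and consequent parameters simultaneously:

```python
import math
import random

def evolve(fitness, bounds, pop_size=20, gens=80, sigma=0.5, seed=1):
    """Toy genetic algorithm (minimization): elitism, binary tournament
    selection, and Gaussian mutation clipped to the search bounds."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(gens):
        nxt = [min(pop, key=fitness)]              # carry the best over
        while len(nxt) < pop_size:
            a, b = rng.sample(pop, 2)              # binary tournament
            parent = a if fitness(a) < fitness(b) else b
            nxt.append([min(max(g + rng.gauss(0, sigma), lo), hi)
                        for g, (lo, hi) in zip(parent, bounds)])
        pop = nxt
    return min(pop, key=fitness)

# Hypothetical tuning task: find centre c and width w of one Gaussian
# membership function so it evaluates to 1.0 at x=10 and 0.1 at x=20.
targets = [(10.0, 1.0), (20.0, 0.1)]

def sse(params):
    c, w = params
    return sum((math.exp(-((x - c) / w) ** 2) - t) ** 2 for x, t in targets)

best = evolve(sse, bounds=[(0.0, 30.0), (0.5, 15.0)])
print(round(sse(best), 4))   # small residual error after evolution
```

The same loop applies unchanged when `sse` measures prediction error of a full fuzzy or neuro-fuzzy air quality model over training data.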
Besides, the type-2 fuzzy set is capable of handling more uncertainty than the type-1 fuzzy set and has been successfully applied in a wide range of areas [248][249][250]. Therefore, considering the potential of fuzzy logic approaches, they can be further explored in the field of AQM.

Group Method Data Handling Models and Functional Network Models
Long-term research in the fields of neural networks and advanced statistical methods has contributed to the evolution of an abductory induction mechanism known as GMDH [251]. It automatically synthesizes abductive networks from a database of inputs and outputs with complex and nonlinear relationships. Another extension of the neural network models is the functional network model (FNM) [252], which determines the structure of the network using domain knowledge and data and estimates the unknown neuron functions. Both GMDH and FNM have been explored in many relevant applications [253][254][255]. These rarely explored extensions of the neural networks can be further investigated in AQM.
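A minimal sketch of the GMDH self-organizing step is given below: one layer fits an Ivakhnenko quadratic polynomial neuron to every pair of inputs and keeps the pair with the lowest squared error. The synthetic data are illustrative, and a full GMDH stacks several such layers and selects on a separate validation set:

```python
import numpy as np
from itertools import combinations

def gmdh_layer(X, y):
    """One GMDH layer: fit a quadratic polynomial neuron for every input
    pair and return (error, pair, coefficients) of the best neuron."""
    best = None
    for i, j in combinations(range(X.shape[1]), 2):
        xi, xj = X[:, i], X[:, j]
        # Ivakhnenko polynomial: a0 + a1*xi + a2*xj + a3*xi^2 + a4*xj^2 + a5*xi*xj
        A = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.mean((A @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, (i, j), coef)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                # three candidate inputs
y = 1.0 + 2.0 * X[:, 0] * X[:, 1]            # depends only on inputs 0 and 1
err, pair, _ = gmdh_layer(X, y)
print(pair)                                  # the relevant input pair survives
```

The selection mimics GMDH's abductive pruning: only neurons built from informative input pairs would be propagated to the next layer.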

Case-Based Reasoning and Knowledge-Based Models
Case-based reasoning solves new problems by recalling the experiences and solutions of similar past problems [256]. It deals with a given problem following four steps, namely retrieve, reuse, revise, and retain [257]. Another soft computing technique, the knowledge-based system, attempts to solve problems by giving advice in a domain and utilizing the knowledge provided by a human expert [258]. Researchers have employed both techniques to solve many complex problems [259][260][261]. These techniques can be investigated in AQM, as neither has yet been explored in this field.
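The first two CBR steps can be sketched as a nearest-neighbor lookup over a toy case base; the features, values, and similarity measure here are hypothetical:

```python
import math

# Toy case base of past episodes: (meteorological features, observed PM10).
# Feature order: [temperature in deg C, wind speed in m/s]; values illustrative.
CASES = [([5.0, 1.0], 80.0), ([15.0, 4.0], 35.0), ([25.0, 2.0], 55.0)]

def retrieve(query):
    """Step 1 (retrieve): find the most similar past case by Euclidean
    distance over the feature vector."""
    return min(CASES, key=lambda case: math.dist(query, case[0]))

def reuse(query):
    """Step 2 (reuse): adopt the retrieved case's solution as the estimate.
    (Revise would adapt it to the new situation; retain would store the
    solved case back into the case base.)"""
    return retrieve(query)[1]

print(reuse([6.0, 1.5]))   # nearest case is the cold, calm episode -> 80.0
```

A practical AQM case base would use richer features and a weighted or fuzzy similarity measure, but the retrieve-reuse-revise-retain cycle is the same.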

Ensemble and Hybrid Models
As discussed earlier, ensemble models employ multiple learning techniques in parallel and combine their outputs to produce a better generalization performance. In a real-world situation, they aim to manage the strengths and weaknesses of each model and arrive at the best possible solutions [262]. Recently, such models have gained huge momentum in AQM, but this has been limited to a few specific pollutants (mainly PM 2.5 ). Researchers should invest more time in these attractive tools, as they are likely to become some of the most prominent tools for AQM in the future.
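The combination step can be sketched as a simple output-averaging ensemble; the three member forecasters below are naive stand-ins for the trained ANN/SVM/fuzzy members an actual AQM ensemble would combine:

```python
# Minimal averaging ensemble: combining several one-step forecasters lets
# their individual errors partly cancel.

def persistence(history):             # last observed value
    return history[-1]

def moving_average(history, k=3):     # mean of a short trailing window
    return sum(history[-k:]) / k

def drift(history):                   # last value plus the last change
    return history[-1] + (history[-1] - history[-2])

def ensemble(history, members=(persistence, moving_average, drift)):
    """Combine member forecasts by unweighted averaging."""
    return sum(member(history) for member in members) / len(members)

pm10 = [50.0, 52.0, 56.0, 54.0]       # hypothetical hourly PM10 values
print(round(ensemble(pm10), 2))
```

Weighted averaging or a meta-learner (stacking) over the member outputs is a natural next refinement of the same structure.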

Development of Universal Models
Most of the discussed models are either site-dependent or pollutant-dependent. There is no guarantee that a specific model developed for a specific site will be stable and reliable for another location with different meteorological conditions. Therefore, there is always a need for the development of a universal model for AQM. Besides, the comparison between site-specific models could be an attractive option for future research, as it aids in developing site characterizations. Such research may enable the creation of guidelines for site-specific model development.

Appropriate Input Selection Methods
As discussed in Section 2, several approaches have been reported to reduce the input space by selecting the most dominant input variables. Most of the approaches selected air pollutant and meteorological data as inputs, while a few of them considered other types of data, including temporal, traffic, geographical, and sustainable data. Therefore, the present authors believe that the comparison of such input selection methods considering all available input data types could be an attractive field of research in AQM. Besides, the selection of proper decomposition components for the reduction of data dimensionality could be considered another potential research direction, as the inclusion of many components in the input space may result in model complexity and the accumulation of errors. Moreover, other available data pre-processing and feature extraction techniques employed in relevant fields could also be explored.
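As one example of such input-space reduction, the sketch below applies PCA to standardized synthetic "sensor" data driven by two latent factors and keeps the fewest components explaining 95% of the variance; the data, mixing weights, and threshold are all illustrative:

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Project standardized inputs onto the fewest principal components
    whose cumulative explained variance reaches `var_threshold`."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigval)[::-1]             # sort by descending variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(explained, var_threshold) + 1)
    return Xs @ eigvec[:, :k], k

rng = np.random.default_rng(0)
factors = rng.normal(size=(300, 2))              # two latent drivers
mixing = np.array([[1.0, 0.5, 0.0, 2.0, 1.0],
                   [0.0, 1.0, 1.5, 1.0, 0.5]])
# Five correlated "sensor" channels plus a little measurement noise.
X = factors @ mixing + 0.01 * rng.normal(size=(300, 5))
Z, k = pca_reduce(X)
print(k)   # the two latent factors are retained; the rest is discarded
```

The retained components `Z` would then serve as the reduced input set for a downstream air quality model, in the spirit of the PCA-based studies reviewed in Section 2.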

Conclusions
Soft computing models have become very popular in air quality modeling as they can efficiently model the complexity and non-linearity associated with air quality data. This article critically reviewed and discussed existing soft computing modeling approaches. Among the many available soft computing techniques, the artificial neural networks with variations of structures and the hybrid modeling approaches combining several techniques were widely explored in predicting air pollutant concentrations throughout the world. Other approaches, including support vector machines, evolutionary artificial neural networks and support vector machines, fuzzy logic, and neuro-fuzzy systems, have also been used in air quality modeling for several years. Recently, deep learning and ensemble models have received huge momentum in modeling air pollutant concentrations due to their wide range of advantages over other available techniques. Additionally, this research reviewed and listed all possible input variables for air quality modeling. It also discussed several input selection processes, including cross-correlation analysis, principal component analysis, random forest, learning vector quantization, rough set theory, and wavelet decomposition techniques. Besides, this article sheds light on several data recovery approaches for missing data, including linear interpolation, multivariate imputation by chained equations, and expectation-maximization imputation methods.
Finally, it proposed many advanced, reliable, and self-organizing soft computing models that are rarely explored and/or not explored in the field of air quality modeling. For instance, functional network models, variations of neural network models, evolutionary fuzzy and neuro-fuzzy systems, type-2 fuzzy logic models, group method data handling, case-based reasoning, ensemble and hybrid models, and knowledge-based systems have immense potential for modeling air pollutant concentrations. Moreover, modelers can compare the effectiveness of several input selection processes to find the most suitable one for air quality modeling. Furthermore, they can attempt to build universal models instead of developing site-specific and pollutant-specific models. The authors believe that the findings of this review article will help researchers and decision-makers in determining the suitability and appropriateness of a particular model for a specific modeling context.

Conflicts of Interest:
The authors declare no conflict of interest.