1. Introduction
Air pollution is defined as the presence of elevated concentrations of particles or pollutants in the atmosphere that pose a threat to the health of living organisms or cause physical harm [1]. Specifically, it can be defined as the presence of toxic substances or compounds in the air that exceed the acceptable thresholds set by health organizations. Particulate matter (hereafter referred to as PM), a combination of solid and liquid particles suspended in the air, is a major contributor to air pollution. PM is measured in micrometers and classified as PM10 for particles between 2.5 and 10 μm and PM2.5 for particles smaller than 2.5 μm [2]. Human-caused air pollution stems primarily from fuel consumption, power generation, and the by-products of natural gas, particularly emissions from coal-fired power plants and chemical production [3]. Other sources of PM include vehicle fuel, construction sites, roadworks, factories, and quarries [4].
Prolonged exposure to air pollution has been shown to have harmful effects on cardiovascular health [5], particularly increasing the likelihood of heart failure in older adults or those with pre-existing heart disease [6]. At-risk individuals are advised to minimize their time spent in areas with elevated air pollution levels, such as industrial zones and areas with heavy traffic. During periods of high air pollution, it is recommended to prioritize indoor activities over outdoor activities [7]. To exercise outdoors safely, areas away from traffic, such as forests, should be chosen. Addressing air quality issues is important for global sustainability, and action should be taken when pollution levels are high [8]. Artificial intelligence is a valuable technology that can help solve air pollution problems and is used in various fields such as health, autonomous vehicles, robotics, e-commerce, and security [9,10].
In the literature, various statistical, deterministic, and traditional machine learning methods have been employed for the precise forecasting of air pollutants [11,12,13]. Predicting air quality presents challenges due to its intricate spatial and temporal dimensions. Deep learning models have demonstrated superior performance in addressing complex and nonlinear problems compared to other methods [14]. Consequently, the application of deep learning models in air quality prediction research has increased significantly. Hybrid variants of deep learning models have also been designed to improve the effectiveness of these models.
By integrating the practical components of various models into a unified framework, enhanced predictive outcomes were achieved. The present study employed a hybrid model that integrates the strengths of Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models to forecast PM2.5 levels in India, Milan, and Frankfurt. To the best of our knowledge, this study represents one of the first efforts to use this comprehensive model for PM2.5 forecasting. Although several studies have explored CNN-LSTM architectures for air quality forecasting, our study introduces a novel CNN-RNN hybrid model that differs in architectural simplicity and computational efficiency. Moreover, we apply this model to three real-world datasets from geographically and climatically diverse regions. Unlike prior studies, we provide a comprehensive performance comparison with a wide range of models, highlighting the superior accuracy and generalizability of our approach. To evaluate the effectiveness of this model, we used a variety of deep learning and machine learning models. The CNN-RNN model was compared to machine learning techniques, including k-Nearest Neighbors (kNN), Random Forest (RF), Support Vector Machine (SVM), Decision Trees (DT), and Linear Regression (LR), as well as deep learning methods such as CNN, RNN, Long Short-Term Memory (LSTM), and Multilayer Perceptron (MLP). The results show that the performance of the proposed CNN-RNN model is superior to that of the aforementioned methods.
While various deep learning models have been applied to PM2.5 prediction in the literature, this study offers a novel and practical approach by proposing a lightweight CNN-RNN hybrid architecture. Unlike LSTM-based hybrid models, the use of standard RNN cells significantly reduces computational overhead while maintaining high accuracy. Furthermore, the application and evaluation of this model on three geographically diverse datasets demonstrate its adaptability and effectiveness. The inclusion of high-frequency sensor data from Frankfurt also provides new insights into short-term forecasting under dynamic conditions.
The main contributions of this research paper can be outlined as follows:
- (i)
This study is the first to integrate and comparatively analyze PM2.5 datasets from India, Milan, and Frankfurt using a deep learning-based hybrid model. These datasets vary in temporal resolution, environmental characteristics, and geographic location, offering a unique opportunity to assess the model’s robustness and adaptability across diverse real-world conditions.
- (ii)
In this paper, a novel hybrid deep learning model incorporating CNNs and RNNs is introduced.
- (iii)
The effectiveness of the model is evaluated through comparative analysis with various machine learning algorithms, including kNN, RF, SVM, DT, LR, and MLP, as well as deep learning networks such as CNN, RNN, and LSTM.
- (iv)
In addition, comprehensive pollution datasets from three different regions, India, Milan, and Frankfurt, were adopted.
2. Related Works
Air quality values obtained from sensors can be analyzed using artificial intelligence (AI), which also enables the extraction of relationships and patterns between these values. Air pollution values can be monitored in real time through IoT technologies [15]. Predicting air pollution values using AI technologies is crucial for alerting individuals at risk and taking preventive measures against pollution [16]. In this section, a review of studies in the literature on air pollution prediction is presented.
According to [17], a comparative application of recurrent neural network architectures for PM10 prediction was presented. Experimental studies were conducted with various architectures and parameters for RNN, GRU, and LSTM models. The investigation showed that RNN, LSTM, and GRU had Mean Squared Error (MSE) values of 0.00025, 0.00048, and 0.00023, respectively. Ref. [18] developed a CNN-LSTM hybrid model consisting of CNN and LSTM deep learning models for effective PM2.5 predictions. The Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Pearson correlation coefficient, and Index of Agreement (IA) were utilized in their experimental study, and comparisons were made with the DT, RF, MLP, CNN, and LSTM methods. According to the results obtained, the CNN-LSTM model achieved the highest success rate. Similarly, ref. [19] established a hybrid deep learning model for air quality prediction in which CNN was used for spatial feature extraction and LSTM was used to investigate spatiotemporal dependencies; the outcomes showed that the proposed model has good predictive performance. Furthermore, ref. [20] developed a hybrid deep learning model for air quality prediction based on transfer learning and LSTM. The LSTM was used to learn long-term dependencies, while transfer learning was employed to transfer short-term temporal information to long-term temporal information. The developed model was compared with ARIMA, SVR, GBDT, RNN, GRU, LSTM, and CNN-LSTM, and empirical findings demonstrated that it was superior to the other models examined. Ref. [21] proposed a CNN-LSTM hybrid model for PM2.5 prediction. In that study, nonlinear correlations with space-time correlation analyses were also examined, and the proposed model was compared with MLP and LSTM. According to the experimental results, the proposed model achieved an MAE of 5.21 and a Mean Absolute Percentage Error (MAPE) of 0.08. According to [22], a new model called the weighted long short-term memory extended model (WLSTME) was proposed to predict daily PM2.5. Many temporal and spatial parameters were considered while creating this model, and MLP and LSTM models were also utilized. The findings showed that the WLSTME yielded the best outcome. Ref. [23] developed an LSTM- and SVM-based model for the estimation of the air quality index. Using the developed model, predicted values for PM2.5, NO2, SO2, CO, and ozone were obtained; the model had R2 values of 0.821, 0.092, −0.005, 0.080, and 0.029 in predicting PM2.5, NO2, SO2, CO, and ozone, respectively. Ref. [24] predicted PM2.5 using data from meteorological monitoring stations in China with XGBoost, RF, and fully connected neural network models, and also revealed the effects of features such as wind speed, wind direction, and rainfall on PM2.5 predictions. Ref. [25] established a hybrid model for hourly PM2.5 prediction using CNN and LSTM. The results obtained with the developed model were used to optimize Raspberry Pi devices, and the study findings revealed that these devices achieved lower latency.
Ref. [26] developed an Artificial Neural Network (ANN)-based model to predict air pollutant concentrations based on air traffic at Suvarnabhumi International Airport. The dataset used consisted of ADS-B flight data collected daily between 21:00 and 00:00 throughout 2023 and air pollution measurements (CO, NO2, PM2.5, and PM10) obtained from two monitoring points at the airport. Flight paths were clustered using K-means and Gaussian Mixture Model (GMM) algorithms, and then these patterns and other flight parameters were fed into the ANN model. The model was trained separately for each pollutant, and its performance was evaluated using metrics such as MAE, MSE, and R2. The lowest MSE values obtained were reported as 51.76 for CO, 53.97 for PM2.5, 124.25 for PM10, and 139.67 for NO2.
Ref. [27] conducted a comprehensive analysis using machine learning-based models to assess the health risks associated with air pollution in the Tuticorin district of Tamil Nadu, India. Four different machine learning algorithms (Random Forest, Decision Tree, Gradient Boosting, and AdaBoost) were trained by integrating real-time air quality data (pollutants such as PM2.5, PM10, NO2, SO2, CO, O3, and NH3) with meteorological data for 2021. The accuracy of these models was evaluated using statistical metrics such as MAE, MSE, RMSE, and R2. According to the results, the Random Forest model demonstrated the highest accuracy, achieving 93.3% accuracy for PM2.5 and 92.7% accuracy for PM10.
Ref. [28] developed an ANN model using environmental and meteorological data from Nakhon Si Thammarat Province, Thailand, to predict PM2.5 concentrations. Temperature, relative humidity, wind speed, wind direction, precipitation, CO, NO2, O3, SO2, and PM10 were used as input variables. Model training and testing were performed on data from 2019 to 2021, and different combinations of layers and neurons were tested in the ANN architecture. Model performance was evaluated using metrics such as MAE, MSE, RMSE, and R2; the best result was obtained with an R2 of 0.78.
Ref. [29] developed an integrated machine learning model with the Random Forest algorithm using meteorological variables and air pollution data to predict PM2.5 concentrations in the Bangkok Metropolitan Region. Input parameters such as temperature, relative humidity, wind speed, wind direction, CO, NO2, O3, SO2, and PM10 collected from ten different stations in Bangkok between 2018 and 2020 were used. The model identified the most influential variables by feature importance ranking, and PM10, NO2, and CO were found to have a high impact on PM2.5. Model performance was evaluated using accuracy and error metrics, yielding an R2 of 0.96 and an RMSE of 2.38.
3. Deep Learning-Based Air Quality Prediction
This section introduces the proposed CNN-RNN hybrid model and presents the methodology used for PM2.5 prediction. The model was evaluated using three real-world datasets from India, Milan, and Frankfurt, selected to represent varying environmental and temporal characteristics. A comprehensive comparison was conducted with both classical machine learning models and deep learning architectures. Model performance was assessed using widely accepted error metrics such as MSE, RMSE, MAE, and R2. The detailed modeling workflow, including data pre-processing, model architecture, and evaluation steps, is explained in the following subsections.
3.1. Dataset
This study utilized air quality datasets containing hourly PM2.5 values from India (https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india, accessed on 15 March 2023), Milan (https://www.kaggle.com/datasets/wiseair/air-quality-in-milan-summer-2020, accessed on 15 March 2023), and Frankfurt (https://www.kaggle.com/datasets/avibagul80/air-quality-dataset, accessed on 15 March 2023). These three locations were chosen to represent distinct air quality contexts: India with high PM2.5 concentration variability, Milan as a moderately polluted European city with recent urban sensor deployments, and Frankfurt with granular, high-resolution environmental data. This diversity enables the evaluation of the model's adaptability and predictive consistency across differing pollution profiles and data characteristics. The Indian dataset comprises 36,192 PM2.5 observation values, including time information and PM2.5 attributes.
Figure 1 shows the geographic distribution of the study locations.
The map in Figure 1 highlights the spatial diversity of the datasets used in the study, encompassing South Asia and two distinct European regions with differing environmental, climatic, and urban characteristics.
Figure 2 illustrates the temporal variations in PM2.5 concentrations within the Indian dataset.
A sample of the India dataset is shown in Table 1.
Statistical information about the attributes in the India dataset is shown in Table 2.
The Milan dataset comprises 1399 hourly PM2.5 values recorded between 24 July 2020 and 20 September 2020. The PM2.5 measurements were collected with Wiseair IoT environmental monitoring devices (model Clarity Node-S), which use a photometer to measure laser scattering by airborne particulates. These sensors have a range of 0–1000 µg/m3, a resolution of 0.1 µg/m3, and an accuracy of 10 percent of the measured value under expected operating conditions. Wiseair calibrated and validated the units in cooperation with the Regional Environmental Protection Agency of Lombardy (ARPA Lombardia), and the sensor data were subsequently compared at regular intervals against reference-grade monitoring stations.
Figure 3 illustrates the temporal variations in PM2.5 concentrations within the Milan dataset.
Table 3 shows a sample of the Milan dataset.
Statistical information about the attributes in the Milan dataset is shown in Table 4.
The Frankfurt dataset utilized data from 14 distinct sensors to compile weather information, including time, temperature, pressure, humidity, wind speed, and PM2.5 values. The data were gathered via the OpenSenseMap platform, and the instruments were Luftdaten SDS011 optical particulate sensors (Nova Fitness Co., Ltd., Qingdao, China). These sensors rely on laser scattering technology and provide a resolution of 0.3 µg/m3 over a nominal measurement range of 0 to 999 µg/m3. The accuracy of the PM2.5 measurements under normal atmospheric conditions is reported to be within 15 percent. Sensors in the Riederwald region were regularly verified through co-location with official reference stations of the Hessian Agency for Nature Conservation, Environment and Geology (HLNUG). This dataset has also been utilized in peer-reviewed research; for example, Hua et al. [30] applied the Frankfurt air quality dataset to evaluate various imputation techniques for missing time series data, demonstrating its applicability in advanced data processing scenarios. Specifically, this study focused on data collected from sensors located in the Riederwald area of Frankfurt. The dataset for the Riederwald region comprises 236,062 measurement values recorded at approximately 3 min intervals between 31 December 2018 and 28 February 2020.
Figure 4 illustrates the PM2.5 concentrations in Frankfurt over time.
Table 5 shows a sample of the Frankfurt dataset.
Statistical information about the attributes in the Frankfurt dataset is shown in Table 6.
Although the datasets used in this study were obtained from publicly available Kaggle repositories, they originate from recognized and reliable sources. The Indian dataset is based on measurements reported by the Central Pollution Control Board (CPCB). The Milan dataset was collected using IoT sensors deployed by Wiseair (Milano, Italy), a company working in coordination with environmental institutions in Italy. The Frankfurt dataset originates from the OpenSenseMap initiative, utilizing Luftdaten sensors that are frequently referenced in environmental data research. Prior to model training, all datasets underwent quality control steps, including the imputation of missing values, removal of outliers, and normalization, to ensure reliability and consistency.
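As an illustration of these quality-control steps, the following sketch shows how one of the hourly PM2.5 series might be loaded and cleaned with pandas. The file name, column names, and the 3-sigma outlier rule are assumptions made for illustration; they are not specified in the original dataset documentation.

```python
import pandas as pd

# Illustrative sketch of the quality-control steps described above
# (missing-value imputation and outlier removal); file and column
# names are hypothetical placeholders.
def load_and_clean(path: str, time_col: str = "Timestamp",
                   pm_col: str = "PM2.5") -> pd.Series:
    df = pd.read_csv(path, parse_dates=[time_col])
    series = (df.set_index(time_col)[pm_col]
                .sort_index()
                .astype(float))

    # Impute missing values by time-based interpolation
    series = series.interpolate(method="time").ffill().bfill()

    # Suppress gross outliers (assumed 3-standard-deviation rule)
    mean, std = series.mean(), series.std()
    series = series.clip(lower=mean - 3 * std, upper=mean + 3 * std)
    return series

# Example usage (hypothetical file name):
# pm25_india = load_and_clean("air-quality-data-in-india.csv")
```

Normalization, the final step mentioned above, is applied later in the pre-processing pipeline described in Section 3.3.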
3.2. Baseline Models
- ▪ The Decision Tree (DT) algorithm is employed for estimating discrete-valued target functions, with the learned parameters represented by a decision tree [31]. The decision tree exhibits a hierarchical structure, wherein each node represents an attribute, each branch represents the outcome of an operation, and each leaf represents a class label [32,33].
- ▪ The k-Nearest Neighbors (kNN) algorithm, as described by [34,35], makes predictions for a given sample by considering the distances between the sample and its neighboring data points. This algorithm classifies data based on its proximity to previously observed data points. The kNN method involves selecting a value for k, identifying the k nearest data points, determining the distribution of classes among these neighboring points, and assigning the sample to the most prevalent class. However, a drawback of the kNN algorithm is its requirement for significant memory resources to store the entire dataset for prediction purposes.
- ▪ Linear Regression (LR) is a statistical technique utilized to elucidate the association between two or more variables. Within LR, variables are categorized as either dependent or independent. Notably, the dependent variable in LR is continuous [36], while independent variables may be either categorical or continuous [37].
- ▪ Random Forest (RF) is a machine learning algorithm rooted in decision trees. RF constructs an ensemble of decision trees to form a forest, with the final prediction being a combination of the individual estimations from each tree [38]. Therefore, RF is an ensemble learning method.
- ▪ Support Vector Machines (SVMs) are utilized in the analysis of extensive datasets to discern novel patterns. SVM uses a dataset to discern a hyperplane that effectively segregates it into two distinct classes [39]. SVM, as a machine learning technique, leverages the training dataset to ascertain an optimal hyperplane within an N-dimensional space, with an emphasis on maximizing the separation distance between classified data points [40].
- ▪ The Multilayer Perceptron (MLP) is a type of neural network characterized by its multiple layers, including an input layer, one or more hidden layers, and an output layer. The hidden layers within an MLP can contain varying numbers of nodes, with data from the input layer being forwarded to these hidden layers for processing. Ultimately, the processed data from the hidden layers is propagated to the output layer to produce the final output [41].
- ▪ The Convolutional Neural Network (CNN) is structured with multiple layers that facilitate the connection between input and output data. This network typically executes operations such as object scanning, object clustering, and object similarity detection within images or videos. Following the creation of input data, filtering processes are conducted across diverse layers. These filters are represented through matrix calculations, resulting in the generation of a singular matrix to produce the output data [40,42]. Convolutional layers analyze the data, and a pooling layer reduces its dimensionality; the compatibility of the input with the learned parameters is tested on this reduced representation [43]. The flattening layer prepares the data required to create the output, and the fully connected layer is used for classification.
- ▪ The Recurrent Neural Network (RNN) model is a highly effective deep learning framework designed for processing sequential data inputs, including time series. RNN operates by processing input data at specific time intervals, generating results for each time step [44]. Notably, RNNs exhibit a dependency on both the current input and previously computed output values, enabling them to effectively model non-independent sequences of elements in the input and/or output. RNN has demonstrated considerable success in handling sequential data types such as text, speech, voice, and financial data [45].
- ▪ The Long Short-Term Memory (LSTM) model, a type of Recurrent Neural Network, incorporates input, output, and forget gates that regulate the flow of information within the memory unit [46]. The input gate manages the influx of new data into the memory unit, while the forget gate facilitates the removal of outdated information and the retention of new data. The output gate is responsible for controlling the information deemed as the output of a cell [47].
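As a minimal sketch, the baseline regressors listed above can be instantiated with Scikit-learn as follows. The settings shown are library defaults (with fixed random seeds added for reproducibility), since Section 3.4 states that default configurations were used unless noted otherwise; the training data referenced in the comments come from the sliding-window transformation described in Section 3.3.

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

# Baseline regressors, instantiated with Scikit-learn defaults
baselines = {
    "LR": LinearRegression(),
    "kNN": KNeighborsRegressor(),
    "DT": DecisionTreeRegressor(random_state=42),
    "RF": RandomForestRegressor(random_state=42),
    "SVM": SVR(),
    "MLP": MLPRegressor(max_iter=500, random_state=42),
}

# Each baseline is fitted on the sliding-window features (X_train, y_train)
# produced in Section 3.3 and evaluated on the held-out test set:
# for name, model in baselines.items():
#     model.fit(X_train, y_train)
#     y_pred = model.predict(X_test)
```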
3.3. Data Pre-Processing
In this study, a time series dataset was utilized, with data organized according to a specific time index. To facilitate analysis, the time series data were converted into a supervised learning problem using the sliding window method. This method involves presenting observation data equal to the designated window size as input, with the output being the predicted observation value at the subsequent time step. The selection of a sliding window size of 3 was based on experimental investigations conducted in the study. The sliding window method is shown in Figure 5.
According to the data shown in Figure 5, the observations at time steps t1, t2, and t3 are used as inputs, and the observation at time step t4 is used as the output in the context of the sliding window. Subsequently, the data were normalized using the Min-Max Scaler method. After normalization, the dataset was divided into training, testing, and validation subsets. These specific ratios were chosen experimentally, as optimal prediction accuracy was obtained with a training set comprising 67% of the data and a test set comprising 33% of the data. Additionally, 10% of the training data was reserved for validation and used to optimize the model's hyperparameters. Model parameters were tuned using GridSearchCV from the Scikit-learn library to enhance the predictive performance of the models.
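A minimal sketch of this pre-processing pipeline is given below; `pm25_values` stands for the cleaned one-dimensional PM2.5 array from Section 3.1 and is assumed here for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Sliding window of size 3: inputs (t1, t2, t3) -> output (t4)
def make_windows(series: np.ndarray, window: int = 3):
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # inputs: t1, t2, t3
        y.append(series[i + window])     # output: t4
    return np.array(X), np.array(y)

# Min-Max normalization of the raw PM2.5 series
scaler = MinMaxScaler()
scaled = scaler.fit_transform(pm25_values.reshape(-1, 1)).ravel()

X, y = make_windows(scaled, window=3)

# 67%/33% train/test split; shuffle=False preserves temporal order
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, shuffle=False)
# A further 10% of the training data is held out for validation during fitting.
```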
3.4. Experimental Setup
All models were trained and evaluated using the same data splits. The CNN-RNN model was implemented using TensorFlow version 2.11. The final hyperparameter settings were as follows: learning rate = 0.001, batch size = 32, number of epochs = 100, optimizer = Adam. The RNN used a single recurrent layer with 64 units, and the CNN had a single 1D convolutional layer with 64 filters and a kernel size of 3.
The baseline models (Linear Regression, kNN, Decision Tree, Random Forest, SVR, MLP, and CNN) were implemented using Scikit-learn. For each model, default settings were used unless specified otherwise. Performance was evaluated using standard regression metrics, namely RMSE, MAE, and R2, computed on the test set for all cities. To increase robustness, the experiments were repeated five times, and the average performance is reported.
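For reference, the evaluation metrics used throughout the experiments follow their standard definitions, where $y_i$ denotes the observed PM2.5 value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observations, and $n$ the number of test samples:

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2, &
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},\\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, &
R^2           &= 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}.
\end{aligned}
```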
3.5. Proposed CNN-RNN Hybrid Model
The hybrid prediction model proposed in this study takes hourly PM2.5 values as input and generates predictions for PM2.5 levels in the subsequent hour. The schematic representation of the proposed model can be observed in Figure 6.
The Convolutional Neural Network (CNN) demonstrates effectiveness in automatically extracting features and learning from one-dimensional series data, specifically univariate time series. This study proposes a hybrid model that incorporates a CNN for interpreting sub-sequences fed into a Recurrent Neural Network (RNN). CNN is utilized to extract features from the input data and convert the univariate input into multidimensional arrays through convolution. These multidimensional datasets are subsequently passed to the RNN for prediction purposes.
In the model under consideration, a one-dimensional Convolutional Neural Network (CNN) is employed with a kernel size of 1 and 64 filters to analyze the sub-sequences. Additionally, a max pooling layer is utilized to process the input features, while a dense layer is employed to interpret the features extracted by the convolutional layer of the model. Given that the convolutional and pooling layers produce three-dimensional outputs, a flatten layer is incorporated to convert the feature maps into a one-dimensional vector for input into the RNN.
The proposed CNN-RNN hybrid model was implemented in Python 3.9 using the Keras API with TensorFlow 2.11 as the backend. Hyperparameter optimization was performed using GridSearchCV from the Scikit-learn library version 1.2.1 to identify the best configuration. The tuned hyperparameters included batch size, number of epochs, optimizer type, learning rate, number of convolution filters, kernel size, dropout rate, and RNN units. The final model configuration included a batch size of 64, 100 training epochs, the Adam optimizer with a learning rate of 0.001, 64 convolutional filters with a kernel size of 1, a dropout rate of 0.2, and 64 units in the recurrent layer. These parameters provided the best validation performance across the three datasets.
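A minimal Keras sketch consistent with this configuration is shown below. Because the exact layer ordering (for example, the placement of the dense and flatten layers relative to the recurrent layer) is not fully specified, this should be read as an illustrative approximation rather than the exact published architecture; the window size of 3 follows Section 3.3.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

window_size = 3  # sliding window length from Section 3.3

# Illustrative CNN-RNN hybrid: Conv1D feature extraction followed by a
# SimpleRNN layer, using the reported settings (64 filters, kernel size 1,
# dropout 0.2, 64 recurrent units, Adam with learning rate 0.001).
model = models.Sequential([
    layers.Input(shape=(window_size, 1)),              # univariate PM2.5 windows
    layers.Conv1D(filters=64, kernel_size=1, activation="relu"),
    layers.MaxPooling1D(pool_size=2, padding="same"),   # coarse temporal pooling
    layers.SimpleRNN(64),                                # recurrent interpretation
    layers.Dropout(0.2),
    layers.Dense(1),                                     # next-hour PM2.5 value
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse", metrics=["mae"])
model.summary()
```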
To mitigate overfitting, especially given the relatively small Milan dataset, several regularization techniques were employed. We used early stopping based on validation loss with a patience of 10 epochs to halt training once the model began to overfit. Additionally, a dropout layer with a rate of 0.2 was included after the recurrent layer. Training was monitored using both training and validation loss curves, and the model’s performance on the test set confirmed generalizability. These precautions were essential for the Milan dataset due to its limited sample size.
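The corresponding training step could look like the following sketch, where `model` is the network defined above and `X_train`/`y_train` come from the pre-processing in Section 3.3; restoring the best weights after early stopping is an added convenience, not something stated in the text.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Early stopping on validation loss with patience of 10 epochs,
# using 10% of the training data for validation.
early_stop = EarlyStopping(monitor="val_loss", patience=10,
                           restore_best_weights=True)

history = model.fit(
    X_train[..., None], y_train,   # add a channel axis for the Conv1D input
    validation_split=0.1,
    epochs=100,
    batch_size=64,
    callbacks=[early_stop],
    verbose=0,
)
# history.history["loss"] and history.history["val_loss"] give the
# training and validation loss curves monitored above.
```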
4. Experimental Results
In this research, the hybrid CNN-RNN prediction model was rigorously evaluated against several other machine learning algorithms, including DT, kNN, LR, RF, SVM, MLP, CNN, RNN, and LSTM. The comparative analysis of the outcomes was conducted using metrics such as MSE, RMSE, MAE, and R2. The experimental results for the India dataset are presented in Table 7 and Figure 7.
The experimental findings regarding PM2.5 prediction indicated that the proposed model outperformed the other models in comparison. Following the proposed model, the next most successful models were LSTM, RNN, MLP, SVM, CNN, LR, RF, and kNN, respectively.
Table 8 and Figure 8 show the comparative experimental results for the Milan dataset.
The experimental findings regarding PM2.5 prediction indicated that the proposed model outperformed the other models in comparison. Following the proposed model, the next most successful models were LSTM, RNN, MLP, SVM, CNN, LR, RF, and kNN, respectively.
Table 9 and Figure 9 show the comparative experimental results for the Frankfurt dataset.
The experimental findings regarding PM2.5 prediction indicated that the proposed model outperformed the other models in comparison. Following the proposed model, the next most successful models were LSTM, RNN, MLP, SVM, CNN, LR, RF, and kNN, respectively.
The prediction results of the proposed model for the India, Milan, and Frankfurt datasets are shown in Figure 10.
Figure 10 illustrates that the proposed model effectively anticipated abrupt fluctuations in PM2.5 concentrations. The presence of outliers within the dataset has a significant impact on the accuracy of predictions, underscoring the necessity of incorporating historical observations in the modeling process. Prediction inaccuracies directly influence the outcomes of performance evaluations. The superior performance of the proposed model, as evidenced by lower Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) values compared to alternative models, can be attributed to its adept handling of outliers and dataset fluctuations. The elevated R-squared value of the proposed model can be attributed to its superior fit to the dataset relative to other models.
5. Discussion
This section provides a detailed discussion of the methods employed in this study, focusing on the results obtained and their relevance to addressing the PM2.5 issue. The analysis consisted of comparing traditional machine learning methods with deep learning models, and ultimately, a hybrid CNN-RNN model was designed and discussed.
The superior predictive performance of DT compared to kNN may be attributed to the ability of DT to capture feature interactions, whereas kNN relies on distance-based neighborhood determination. The influence of feature scale on kNN distance measurement further emphasizes the importance of feature interaction in predictive modeling; decision trees capture such interactions automatically, which enhances their predictive capabilities. Similarly, the higher prediction accuracy of LR compared to DT may indicate a dataset with low noise and a large sample size; these factors contribute to the effectiveness of LR in modeling the underlying relationships. Because the dataset lacks categorical features, LR is superior to decision trees. RF is effective when features have different scales; since the features in the dataset are numerical, SVM exhibits superior predictive performance compared to RF.
When considering the outcomes in the context of deep learning methodologies, the presence of numerical attributes within the dataset can explain the superior performance of MLP over RF. If categorical attributes were present in the dataset, those with higher significance would be prioritized during training, potentially leading to neural saturation and hindering the efficiency of the training phase. CNNs have demonstrated efficacy in feature extraction from image data, utilizing vectors and matrices as input and featuring partially connected layers in which nodes are not universally linked. In contrast, MLPs have fully interconnected nodes, which makes them superior to CNNs in modeling univariate time series. The superiority of RNN over DT, kNN, LR, RF, SVM, MLP, and CNN demonstrates that CNNs and RNNs possess distinct structural characteristics: CNNs, as a type of feedforward neural network, utilize filters and pooling layers, whereas RNNs incorporate feedback loops to enhance network performance.
Additionally, CNNs maintain a fixed input and output size, unlike RNNs. CNNs process inputs of a predetermined size and adjust them to the requisite scale, while also considering the confidence level of their predictions. RNNs, on the other hand, allow input and output sizes to vary. The feedback mechanism inherent in RNNs facilitates the retention and re-presentation of past features to the network, thereby enhancing overall performance. LSTM units, a specialized component within RNNs, feature a memory cell capable of retaining information over extended durations, contributing to the network's ability to learn and retain long-term dependencies. A system of gates regulates the input, output, and retention of information within memory, facilitating the learning of extended dependencies.
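For reference, the gating mechanism described above corresponds to the standard LSTM update at time step t (a textbook formulation, not reproduced from the cited works):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate memory)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)}\\
h_t &= o_t \odot \tanh\!\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```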
Additionally, in some recurrent architectures, a distinct set of gates is utilized to manage information flow without separate memory cells and with a reduced number of gates. The integrated design of the proposed model underlies its superior performance compared to the other models under evaluation. This hybrid model, which combines CNN and RNN architectures, leverages the feature extraction capabilities of CNNs, while the RNN leverages its feedback structure to retain and reintroduce past features as input, thereby enhancing the performance of the CNN in feature extraction and the performance of the RNN in the learning and prediction stages. The current study introduces a hybrid model that effectively predicts PM2.5 levels, offering a valuable tool for future research endeavors.
5.1. Limitations
Despite the promising results of the proposed CNN-RNN hybrid model, this study has several limitations. Although the datasets were obtained from reliable sources, they were accessed via Kaggle and not directly from official government APIs. Minor inconsistencies or pre-processing issues beyond our control may exist. While the model was tested on data from three cities with different characteristics, the results may not generalize to rural or highly diverse geographic areas without further validation.
The deep learning model, while effective, requires sufficient data to avoid overfitting. Particularly for the Milan dataset with its limited records, regularization techniques were necessary to maintain generalization. This study focused only on PM2.5 values without including meteorological variables, which are known to influence air pollution levels.
Although the model shows high accuracy offline, its performance in real-time, noisy sensor environments and under delayed-data conditions has not yet been tested. Future work should address these limitations by incorporating more diverse datasets, additional environmental variables, and real-time implementation strategies.
5.2. Computational Complexity Analysis
The integration of CNN and RNN in a hybrid architecture inevitably increases the model’s computational complexity compared to single models. However, this increase is moderate and justifiable. The CNN-RNN model used in this study contains approximately 38,000 trainable parameters, compared to 22,000 in the standalone CNN and 19,000 in the RNN. The average training time per epoch was 2.4 s for CNN-RNN, versus 1.3 s for CNN and 1.1 s for RNN (measured using an NVIDIA RTX 3060 GPU, NVIDIA, Santa Clara, CA, USA). Despite this, the CNN-RNN model consistently outperformed the others in terms of RMSE and R2, indicating that the gain in prediction accuracy offsets the slight increase in computational cost. Furthermore, the model’s architecture remains lightweight compared to deeper Transformer-based networks, making it suitable for real-world deployment scenarios.
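The parameter counts and per-epoch timings reported above can be reproduced with a short measurement sketch such as the following, where `model`, `X_train`, and `y_train` refer to the earlier sketches; the absolute numbers naturally depend on hardware and the exact architecture.

```python
import time
import tensorflow as tf

# Count trainable parameters of the compiled model
n_params = sum(tf.keras.backend.count_params(w)
               for w in model.trainable_weights)
print(f"Trainable parameters: {n_params:,}")

# Measure the average wall-clock time per epoch over a few epochs
n_epochs = 5
start = time.perf_counter()
model.fit(X_train[..., None], y_train,
          epochs=n_epochs, batch_size=64, verbose=0)
elapsed = time.perf_counter() - start
print(f"Average time per epoch: {elapsed / n_epochs:.2f} s")
```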
6. Conclusions
In the contemporary era of rapid development driven by industrialization, air pollution remains a pervasive and prominent threat in many parts of the world. In particular, the presence of PM2.5, an air pollutant, poses a significant risk to the health of individuals and ecosystems exposed to this harmful substance. Inhaling air, which contains gases, water vapor, dust, and chemicals, can cause particles to enter the body. The body's filtration system effectively prevents larger particles from reaching the lungs. However, particles smaller than 2.5 microns, particularly those originating from petroleum fuels such as exhaust emissions, have the potential to bypass the body's filtration mechanisms. PM2.5 particles have been identified as exacerbating or instigating chronic illnesses such as asthma, heart attacks, bronchitis, and various respiratory issues. Consequently, precise forecasting of air quality levels is imperative.
This study introduces a novel CNN-RNN hybrid deep learning model for air quality prediction, which was evaluated using experimental data from the India, Milan, and Frankfurt datasets based on the MSE, RMSE, MAE, and R2 metrics. The model's performance was compared to that of DT, kNN, LR, RF, SVM, CNN, MLP, RNN, and LSTM. Results indicate that the proposed model achieved an MAE of 1.620 and an R2 of 0.995 for the India dataset, an MAE of 2.511 and an R2 of 0.834 for the Milan dataset, and an MAE of 0.516 and an R2 of 0.995 for the Frankfurt dataset. The experimental results indicate that the proposed model demonstrated superior predictive performance compared to the other models examined. This study presents a novel model for accurately and effectively predicting PM2.5 concentrations, providing a valuable reference for researchers and organizations combating air pollution. The high accuracy of this model enables proactive measures to be taken in anticipation of future PM2.5 levels. These preventive policies have the potential to mitigate various adverse effects on individuals. Furthermore, the implementation of alternative production technologies in regions with anticipated high PM2.5 levels can lead to reduced emission values. Additionally, refining predictive models informed by the hybrid model can enhance the accuracy of future forecasts.