Article

Time-Series Forecasting of PM2.5 and PM10 Concentrations Based on the Integration of Surveillance Images

1 School of Geographical Sciences, Fujian Normal University, Fuzhou 350117, China
2 Shanghai Surveying and Mapping Institute, Shanghai 200063, China
3 School of Geography, Nanjing Normal University, Nanjing 210023, China
4 Shanghai Institute of Satellite Engineering, Shanghai 201109, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2025, 25(1), 95; https://doi.org/10.3390/s25010095
Submission received: 20 November 2024 / Revised: 19 December 2024 / Accepted: 22 December 2024 / Published: 27 December 2024
(This article belongs to the Section Environmental Sensing)

Abstract

Accurate and timely air quality forecasting is crucial for mitigating pollution-related hazards and protecting public health. Recently, there has been growing interest in integrating visual data for air quality prediction. However, existing studies still have limitations, such as a focus on coarse-grained classification or single-moment estimation, or a reliance on indirect and unintuitive information from visual images. Here we present a dual-channel deep learning model that integrates surveillance images and multi-source numerical data for air quality forecasting. Our model, which combines a single-channel hybrid network consisting of VGG16 and LSTM (named VGG16-LSTM) with a single-channel Long Short-Term Memory (LSTM) network, efficiently captures detailed spatiotemporal features from surveillance image sequences and temporal features from atmospheric, meteorological, and temporal data, enabling accurate time-series forecasting of PM2.5 and PM10 concentrations. Experiments conducted on the 2021 Shanghai dataset demonstrate that the proposed model significantly outperforms traditional machine learning methods in terms of accuracy and robustness for time-series forecasting, achieving R2 values of 0.9459 and 0.9045 and RMSE values of 4.79 μg/m3 and 11.51 μg/m3 for PM2.5 and PM10, respectively. Furthermore, validation on datasets from two stations in Kaohsiung, Taiwan, using a pretrain–finetune training strategy yields average R2 values of 0.9728 and 0.9365 and average RMSE values of 1.89 μg/m3 and 5.69 μg/m3 for PM2.5 and PM10, respectively, confirming the model’s adaptability across diverse geographical contexts. These findings highlight the potential of integrating surveillance images to enhance air quality prediction, offering an effective supplement to ground-level environmental monitoring. Future work will focus on expanding datasets and optimizing network architectures to further improve forecasting accuracy and computational efficiency, enhancing the model’s scalability for broader regional air quality management.

1. Introduction

Air pollution is widely recognized as a major global environmental health threat, with prolonged exposure linked to significant impacts on human health and life expectancy. According to the World Health Organization, ambient air pollution is responsible for millions of premature deaths annually [1], underscoring the urgent need for effective air quality forecasting. Accurate forecasts enable early warnings, which are essential for mitigating health risks and improving overall quality of life.
Over the years, researchers have developed various air quality forecasting methods, broadly categorized into numerical simulation methods and statistical methods. Numerical simulation models, such as the Community Multiscale Air Quality (CMAQ) model [2,3,4,5], the Nested Air Quality Prediction Modeling System (NAQPMS) [6], and the Weather Research and Forecasting model coupled with chemistry (WRF-Chem) [7,8], are grounded in meteorological principles and simulate pollutant emission, dispersion, and transformation processes through complex physical and chemical mechanisms [9]. While theoretically robust and interpretable, these models are computationally intensive due to their reliance on extensive high-quality input data with fine-grained spatial and temporal resolutions and intricate meteorological and chemical coupling processes. These constraints often restrict their scalability and practicality for real-time urban monitoring applications [10,11,12,13,14].
To overcome the limitations of physical simulation, statistical methods are frequently employed for air quality forecasting. These methods leverage historical data, including meteorological variables, pollutant concentrations, and temporal trends, to statistically infer future atmospheric conditions. Widely used techniques include Multiple Linear Regression (MLR) [15], Auto Regressive Moving Average (ARMA) [16], Random Forest (RF) [17], Support Vector Machines (SVM) [18], Artificial Neural Networks (ANN) [19], and various hybrid approaches [20,21]. Among these, Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks [22,23] have gained attention due to their ability to learn from time-series data and capture dynamic pollutant trends, often achieving promising forecasting performance.
Both traditional physical and statistical models primarily rely on multi-source numerical data, such as pollutant concentrations and meteorological observations. With advancements in computer vision and deep learning, visual data, which can reflect light transmission and scattering properties [24,25], has become a valuable resource for atmospheric inversions. This has driven researchers to explore the integration of visual imagery with numerical data to provide more holistic and accurate forecasts. For instance, Xia et al. [26] proposed a multi-modal deep-learning model that integrates high-spatial-resolution remote-sensing images and time-series air quality data, improving PM2.5 forecasts at multiple stations. Rowley and Karakus [27] designed a multimodal AI network named AQNet to predict the air pollutants NO2, O3, and PM10 by combining multi-spectral Sentinel-2 satellite imagery, low-resolution tropospheric NO2 concentration data from the Sentinel-5P satellite, and tabular ground measurement data. These studies demonstrate the promise of visual data integration. However, satellite remote sensing data, despite its broader spatial coverage, often suffers from temporal delays and reduced real-time applicability due to the inherent limitations of data acquisition and processing.
Surveillance imagery presents a compelling alternative to satellite data, offering continuous and high-temporal resolution monitoring in localized settings. With widespread deployment in urban environments, surveillance cameras can capture dynamic environmental changes in real time. Studies [28,29,30,31] have statistically analyzed various image features such as color, edge, and texture under different air pollution conditions, demonstrating the potential of surveillance images for pollutant estimation. Surveillance-image-based air quality monitoring provides an innovative, real-time, and ground-level measurement technology for air quality management. It can serve as a valuable supplement to traditional ground-based air quality monitoring stations for enhancing the spatial and temporal resolution of environmental observations, further supporting decision-making for pollution control and public health management. However, existing research leveraging surveillance images for air quality prediction remains limited. Most studies focus on the coarse-grained classification of pollutant levels [32,33,34,35] or single-moment pollutant estimation [36,37,38,39] rather than time-series forecasting. Additionally, many rely on indirect information, such as traffic density [40], which is challenging to extract accurately from images captured under severe pollution. These limitations highlight the need for a direct and fine-grained prediction approach that fully exploits the potential of surveillance images for quantitative time-series forecasting.
To address these gaps, this study proposes a dual-channel integrated deep learning model that concurrently processes multi-source numerical data and surveillance images to forecast multi-time PM2.5 and PM10 concentrations. Specifically, a VGG16-LSTM single-channel network is employed to directly extract detailed spatiotemporal visual features from surveillance image sequences, while an LSTM single-channel network captures temporal features from time-series numerical data. By fusing these two distinct features, the dual-channel integrated model is endowed with the capability to handle multi-source heterogeneous data. To validate the effectiveness of the proposed model, experiments were conducted using the 2021 Shanghai dataset and Kaohsiung dataset, comparing it with several commonly used statistical models, along with detailed evaluation experiments. The main contributions of this work are summarized as follows:
  • We develop a dual-channel deep learning network architecture, enabling the efficient integration of surveillance images and numerical data for accurate multi-time quantitative forecasting;
  • The proposed model improves the PM2.5 and PM10 forecasting accuracy by capturing detailed spatiotemporal features from surveillance images, compared to commonly used statistical models;
  • Performance evaluation validates the model’s robustness for long time-series forecasting and its transferability across geographically diverse datasets.
The remainder of this paper is organized as follows: Section 2 details the dataset, data preprocessing methods, and the proposed dual-channel integrated model. Section 3 presents the experimental results, including performance evaluation, comparison with existing methods, and transferability analysis, and discusses the study’s limitations and potential implications. Finally, Section 4 concludes the paper with a summary of findings and directions for future research.

2. Materials and Methods

2.1. Dataset

It is well established that air quality dynamics are influenced by multiple factors, including interactions among pollutants, coupling between successive processes, and effects from meteorological and other environmental conditions [41,42]. Numerous studies [43,44,45] have confirmed the value of incorporating data related to these influencing factors as inputs in air quality predictions. Building on this foundation and considering data accessibility, we incorporate routinely monitored indicators from atmospheric, meteorological, and temporal factors as supplementary inputs, alongside surveillance images for PM2.5 and PM10 concentration forecasting. Following previous research, raw data were collected from various sources associated with the monitoring station in Pudong New Area (Station ID: 1149A, located at 121.533° E, 31.2284° N). The data acquisition locations are shown in Figure 1a. A comprehensive multi-source dataset was compiled, including single-camera images, six conventional air pollutant indicators, seven common meteorological variables, and two temporal parameters, as summarized in Table 1.
Specifically, the temporal factors utilized in this study refer to the month and hour parameters, which were directly extracted from the metadata of other collected data, eliminating the need for separate collection. The other data sources are as follows:
  • Image Data: Camera images were acquired via web scraping from the official website of the Shanghai Municipal Bureau of Ecology and Environment (https://sthj.sh.gov.cn/). A total of 8499 surveillance images were collected, captured hourly from 1 January 2021 to 31 December 2021, with a resolution of 584 × 389 pixels.
  • Atmospheric Data: The detailed measurement of various air pollutants from all stations nationwide can be obtained from the China National Environmental Monitoring Center (http://www.cnemc.cn/). In this study, 8374 valid hourly concentration records of six conventional air pollutants—PM2.5, PM10, SO2, NO2, O3, and CO—were downloaded for the entire year of 2021 from the monitoring station in Pudong New Area.
  • Meteorological Data: Based on research related to the CHAP dataset [46,47], seven meteorological variables were selected as air quality forecasting factors: precipitation (PRE), surface pressure (SP), temperature (TEM), evaporation (ET), relative humidity (RH), wind speed (WS), and wind direction (WD). These hourly meteorological values were primarily collected from the fifth-generation reanalysis data (ERA5, https://cds.climate.copernicus.eu/) (accessed on 6 September 2022) provided by the Copernicus Climate Change Service (C3S), with a spatial resolution of 0.25° × 0.25°.
Missing data and outliers were inevitable due to instrument failures or extreme weather conditions. In our dataset, meteorological and temporal data records were complete, whereas the image data had only a few missing instances, and atmospheric data exhibited a small number of anomalous negative values and missing entries. These anomalies were treated as missing values, leaving a total of 8126 complete and valid hourly records. To generate a sufficient number of samples for model training, missing data were interpolated using different methods depending on the length of the gap and the data type. For pollutant data with gaps of 4 h or less, the average of the two hours before and after the gap was used; for longer gaps, the average of the corresponding time points on the previous and subsequent days was used [48]. For image data with gaps of 4 h or less, images from the previous or subsequent two hours were used, and no interpolation was performed for longer gaps. The interpolation increased the number of complete hourly records from 8126 to 8583. Subsequently, a sliding window approach [49] was employed to generate sequence samples for model training and testing, producing 8365 sequence samples, each spanning 24 h: 12 h of historical data as input followed by 12 h of values to be forecast.
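The pollutant gap-filling rules can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name is hypothetical, and "the average of the two hours before and after the gap" is interpreted here as the mean of the nearest valid hour on each side, which is an assumption. Gaps that cannot be filled by either rule are left missing.

```python
from typing import List, Optional

SHORT_GAP_HOURS = 4   # threshold stated in the text
HOURS_PER_DAY = 24


def fill_pollutant_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries in an hourly series; unfillable entries stay None."""
    filled = list(series)
    n = len(series)
    i = 0
    while i < n:
        if series[i] is not None:
            i += 1
            continue
        # Locate the run of missing values, covering indices [start, end).
        start = i
        while i < n and series[i] is None:
            i += 1
        end = i
        gap_len = end - start
        has_neighbours = start > 0 and end < n
        if gap_len <= SHORT_GAP_HOURS and has_neighbours:
            # Short gap (assumed reading): mean of the nearest valid hour
            # before the gap and the nearest valid hour after it.
            value = (series[start - 1] + series[end]) / 2.0
            for k in range(start, end):
                filled[k] = value
        else:
            # Long gap (or a short gap at the series edge): mean of the
            # same hour on the previous day and on the following day.
            for k in range(start, end):
                prev_k, next_k = k - HOURS_PER_DAY, k + HOURS_PER_DAY
                if prev_k >= 0 and next_k < n:
                    prev_v, next_v = series[prev_k], series[next_k]
                    if prev_v is not None and next_v is not None:
                        filled[k] = (prev_v + next_v) / 2.0
    return filled


print(fill_pollutant_gaps([10.0, None, None, 20.0]))
```

Only original observations are used as sources, so a filled value is never itself derived from another filled value.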

2.2. Dual-Channel Integrated Network

Given the interactions among air pollutants, the influence of meteorological conditions, the temporal dependence of air pollution variation, and visual changes associated with particulate matter fluctuations, this study utilizes multi-source heterogeneous data, including historical pollutant records, meteorological variables, time information, and visual images to forecast future PM2.5 and PM10 trends. Hourly air pollutant, meteorological, and temporal data can be processed as single numerical values, allowing seamless integration into a unified model. However, the image at each time point is represented as a multidimensional array, which differs significantly in structure from numerical data. This structural disparity complicates feature extraction, as a single model may struggle to effectively process both data types without sacrificing critical details. Reducing image data to a single value, for example, would cause substantial information loss. To address this, we designed a dual-channel integrated network framework that independently extracts and processes features from each data type, preserving the unique characteristics of both for improved predictive accuracy.
LSTM-based deep learning sequence models are widely applied in existing studies to forecast future air quality based on historical time-series data [50,51]. Through the gate mechanism, the LSTM model progressively transforms features into a new feature space that captures rich temporal information, facilitating more accurate forecasts. In this study, an LSTM network is similarly implemented as a branch of the dual-channel network to extract the temporal features from numerical data, including pollutant concentrations, meteorological conditions, and time information. The LSTM single-channel network comprises a single LSTM layer with 200 hidden neurons, and its output is then prepared for the next stage of integrated learning.
Our previous work [36] successfully applied the VGG16-LSTM model to image-based air quality estimation: the VGG16 component extracts spatial features from individual surveillance images, and the LSTM layer captures temporal correlations in image sequences, linking visual changes with particulate matter fluctuations over time. This study reuses its feature extraction layers as the second branch of the proposed network. In the VGG16-LSTM single-channel network, we modified the VGG16 architecture by removing its fully connected and output layers, retaining only the feature extraction layers to capture spatial features. These features are then passed to an LSTM layer with 512 neurons, which captures temporal dependencies across the image sequence, generating deep and abstract spatiotemporal features.
To fully exploit the advantages of multi-source heterogeneous data, we fuse the temporal features extracted by the LSTM single-channel network with the spatiotemporal features extracted by the VGG16-LSTM single-channel network at the feature level. This fusion integrates surveillance image data with traditional forecasting factors, providing a holistic approach to PM2.5 and PM10 forecasting. Following feature fusion, two fully connected layers are applied to map the learned features to the output. The first fully connected layer, containing 200 neuron nodes, further refines the fused features into high-level abstract representations. The second fully connected layer serves as the output layer, designed to meet the goal of this study—multi-time forecasting of PM2.5 and PM10 concentrations. The output layer generates a two-dimensional array, where the dimensions correspond to the forecasting horizon and the number of forecasted variables. The number of neurons in the output layer matches the forecasting horizon, and an activation function is applied to model the nonlinear regression relationship between the input features and the forecasted targets.
In summary, the proposed PM2.5 and PM10 forecasting model consists of four key modules, as illustrated in Figure 2. The first is the data fusion module, which merges homogeneous numerical data from different sources into a two-dimensional array with the shape (T1, 15) for supervised learning. The second module is the LSTM single-channel network, which is responsible for extracting temporal features from the homogeneous numerical data and producing an output feature vector of length 200. The third module is the VGG16-LSTM single-channel network, which extracts spatiotemporal features from surveillance image data, formatted as (T1, H, W, 3), generating an output feature vector of length 512. The final module is the feature fusion and forecasting module, which combines the outputs from the two single-channel networks. After fusion, the combined feature vector has a length of 712, and the final model output is produced through two fully connected layers, resulting in an output size of T2 × 2. Here, T1 is the length of the historical window used for forecasting, T2 is the length of the future period to be predicted, and H and W are the height and width of the input images, respectively.
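As a rough check on these dimensions, the following Python sketch traces the tensor shapes through the four modules and counts the parameters of the layers whose sizes are fully specified. T1 = T2 = 12 and the 224 × 224 image size are the values reported for the experiments. Treating the output layer as a single dense layer of T2 × 2 units reshaped to (T2, 2) is an assumption, and all function and variable names are illustrative.

```python
from typing import Dict, Tuple


def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of a standard LSTM layer (four gates, with biases)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


T1, T2 = 12, 12    # history and forecast lengths used in the experiments
N_FEATURES = 15    # 6 pollutants + 7 meteorological + 2 temporal variables
NUM_UNITS = 200    # LSTM single-channel network
IMG_UNITS = 512    # VGG16-LSTM single-channel network
FC_UNITS = 200     # first fully connected layer
N_TARGETS = 2      # PM2.5 and PM10

fused_len = NUM_UNITS + IMG_UNITS
output_units = T2 * N_TARGETS  # assumption: flat output reshaped to (T2, 2)

shapes: Dict[str, Tuple[int, ...]] = {
    "numerical input": (T1, N_FEATURES),
    "image input": (T1, 224, 224, 3),
    "LSTM channel output": (NUM_UNITS,),
    "VGG16-LSTM channel output": (IMG_UNITS,),
    "fused features": (fused_len,),
    "first fully connected layer": (FC_UNITS,),
    "model output": (T2, N_TARGETS),
}

params = {
    "LSTM channel": lstm_params(N_FEATURES, NUM_UNITS),
    "first fully connected layer": dense_params(fused_len, FC_UNITS),
    "output layer": dense_params(FC_UNITS, output_units),
}

for name, shape in shapes.items():
    print(f"{name}: {shape}")
for name, count in params.items():
    print(f"{name}: {count} parameters")
```

The VGG16-LSTM branch is left out of the parameter count because the text does not say how the convolutional feature maps are flattened or pooled before the LSTM layer.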
The dual-channel deep learning network builds upon the existing LSTM and VGG16-LSTM architectures but distinguishes itself by utilizing two branch networks to extract detailed features from multi-source numerical data and surveillance images, effectively preserving the unique characteristics of heterogeneous data. Its multi-step, multi-output design enables long-term quantitative forecasting of PM2.5 and PM10 concentrations, overcoming the limitations of single-moment and coarse-grained estimations. By leveraging the complementary strengths of numerical and visual data, this integrated approach aims to improve forecasting accuracy and demonstrate the potential of surveillance-image-assisted methodologies in advancing air quality measurement.

3. Results and Discussion

3.1. Model Implementation

The models in this study were implemented using Python 3.7 and the TensorFlow deep learning framework on a server configured with an Intel Xeon Silver 4216 CPU @ 2.10 GHz, an NVIDIA GeForce RTX 2080 Ti GPU, and a Windows 10 operating system. Key parameters were carefully initialized to optimize model performance. Similar to the model setup of VGG16-LSTM, mean squared error (MSE) was selected as the loss function to quantify training accuracy, while a learning rate of 0.00001 was chosen to balance model weight updates. A batch size of 8 was used to maintain computational efficiency without overloading memory. To ensure compatibility with the input of VGG16, all images were resized to 224 × 224 pixels, and their pixel values were scaled from a 0–255 range to a 0–1 range to accelerate convergence. The values for pollutant concentrations and meteorological variables were normalized to a 0–1 range based on their respective upper limits (e.g., PM2.5 concentration was capped at 500 [52]).
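The two scaling steps can be illustrated with a short Python sketch. The function names are hypothetical. Only the PM2.5 upper limit of 500 comes from the text, and clipping out-of-range values before dividing is an assumed reading of "capped".

```python
from typing import List

PM25_CAP = 500.0  # upper limit for PM2.5 stated in the text


def scale_pixels(image: List[List[List[int]]]) -> List[List[List[float]]]:
    """Scale 8-bit pixel values (0-255) of an H x W x C image to the 0-1 range."""
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in image]


def normalize_by_cap(value: float, cap: float) -> float:
    """Clip a value to [0, cap] (assumed behaviour) and scale it to the 0-1 range."""
    clipped = min(max(value, 0.0), cap)
    return clipped / cap


tiny_image = [[[0, 255, 51]]]  # a 1 x 1 RGB image, for illustration only
print(scale_pixels(tiny_image))
print(normalize_by_cap(35.0, PM25_CAP))
```

Upper limits for the other pollutants and the meteorological variables would be passed in the same way. They are not listed in this section, so none are hard-coded here.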
Two-fold cross-validation was employed to robustly assess model performance. The dataset was split into two equal parts, with each half alternately serving as the training and testing sets. The average performance across these folds was considered the final prediction result. The proposed model’s predictive accuracy was evaluated using two widely recognized regression metrics: the coefficient of determination (R2) and the root mean squared error (RMSE). The R2 metric, which is at most 1 and typically falls between 0 and 1 for a useful model, indicates how closely the predicted values align with the actual air quality measurements, while RMSE quantifies the average prediction error. A higher R2 and a lower RMSE reflect superior model performance.
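For reference, the two metrics and the two-fold averaging can be written in plain Python as below. This is a sketch under assumptions: the split is taken as two contiguous halves, whereas the text does not say how the halves were formed, and `mean_baseline` is a hypothetical stand-in for a real model, used only to make the example runnable.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (input, target), simplified to scalars


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def two_fold_scores(
    samples: List[Sample],
    fit_predict: Callable[[List[Sample], List[float]], List[float]],
) -> Tuple[float, float]:
    """Average (R2, RMSE) over two folds; each half serves once as the test set."""
    half = len(samples) // 2
    first, second = samples[:half], samples[half:]
    r2_values, rmse_values = [], []
    for train, test in ((first, second), (second, first)):
        y_true = [y for _, y in test]
        y_pred = fit_predict(train, [x for x, _ in test])
        r2_values.append(r2_score(y_true, y_pred))
        rmse_values.append(rmse(y_true, y_pred))
    return sum(r2_values) / 2.0, sum(rmse_values) / 2.0


def mean_baseline(train: List[Sample], test_inputs: List[float]) -> List[float]:
    """Hypothetical model: always predicts the mean target of the training half."""
    mean_y = sum(y for _, y in train) / len(train)
    return [mean_y for _ in test_inputs]


toy = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0), (0.0, 4.0)]
print(two_fold_scores(toy, mean_baseline))
```

On this toy data the baseline scores an R2 well below zero, which shows that R2 is bounded above by 1 but not below by 0.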

3.2. Overall Accuracy

To validate the forecasting performance of the proposed method, the dual-channel integrated model was tested on the pre-processed experimental dataset. The experimental setup is consistent with the description in Section 3.1. Figure 3a,b present scatter density plots of the 12-h forecast results versus actual measured values for PM2.5 and PM10 concentrations. Statistical analysis reveals high R2 values and low RMSE values for both pollutants: 0.9459 and 4.79 µg/m3 for PM2.5, and 0.9045 and 11.51 µg/m3 for PM10. Additionally, the scatter points cluster densely around the 1:1 line, and the fitted slopes are close to 1 (approximately 0.9355 and 0.8876), indicating strong consistency between the predicted and observed values. This confirms the feasibility and effectiveness of the dual-channel integrated deep learning model for PM2.5 and PM10 quantitative forecasting.
However, further examination of Figure 3 shows an underestimation of pollutant levels in high-value regions. This underestimation primarily stems from the uneven distribution of samples, with fewer high-concentration samples available for training, leading to weaker learning capability for high-concentration samples and increased prediction errors. Despite this, the multi-source data fusion-based dual-channel integrated model demonstrates excellent overall performance, achieving high accuracy for PM2.5 and PM10 forecasting.

3.3. Performance Analysis

3.3.1. Comparison with Other Methods

The proposed dual-channel integrated model was compared with widely used traditional statistical methods, including MLR, RF, and SVR, as well as the LSTM model. In the RF algorithm, the number of decision trees is a critical parameter influencing the model; 1000 decision trees were used, following Doreswamy et al. [53]. Following García Nieto et al. [18], a polynomial kernel function was selected for the SVR algorithm. The LSTM model used here corresponds to the LSTM single-channel network constructed in Section 2.2 and incorporates the same fully connected layer and output layer as the proposed dual-channel integrated model. To ensure a fair comparison, all methods were trained and tested on the same data splits of the processed Shanghai dataset, and all time-series numerical inputs were standardized to maintain consistency across models. Since models other than the dual-channel integrated model cannot directly process the multidimensional structure of surveillance images, we calculated the mean pixel value of each image. The averaged and normalized image data were then combined with other numerical data as inputs to these models, which output 12-h time-series PM2.5 and PM10 concentrations. The forecasting performance was evaluated using R2 and RMSE for PM2.5 and PM10 concentrations. The forecasting results of the different methods are presented in Table 2.
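The image reduction used for the baseline models can be sketched as follows. Averaging over all pixels and all three colour channels, and dividing by 255 to normalize, are assumptions; the text only states that a mean pixel value was computed and normalized. The function names are hypothetical.

```python
from typing import List, Sequence


def mean_pixel_value(image: List[List[List[int]]]) -> float:
    """Mean over every pixel and channel of an H x W x C image."""
    total = 0.0
    count = 0
    for row in image:
        for pixel in row:
            for channel in pixel:
                total += channel
                count += 1
    return total / count


def baseline_feature_row(image: List[List[List[int]]],
                         numeric_features: Sequence[float]) -> List[float]:
    """Prepend the normalized image mean to the other numerical inputs."""
    return [mean_pixel_value(image) / 255.0] + list(numeric_features)


# A 1 x 2 RGB image: one black pixel and one white pixel.
tiny_image = [[[0, 0, 0], [255, 255, 255]]]
print(baseline_feature_row(tiny_image, [0.1, 0.2]))
```

Collapsing each image to one scalar discards all spatial structure, which is the information loss that the dual-channel design is meant to avoid.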
Among traditional machine learning algorithms, the RF algorithm achieved moderate forecasting performance, outperforming MLR and SVR in predicting both PM2.5 and PM10 levels. Compared to RF, the LSTM model significantly improved the forecast accuracy for PM2.5 and PM10 by 14.5% and 21.9%, respectively, highlighting the effectiveness of the sequence-based model for time-series forecasting. Notably, the dual-channel integrated model, which builds on the VGG16-LSTM network, achieved the highest prediction accuracy, with further improvements in R2 of 4.7% for PM2.5 and 6.6% for PM10. These results highlight the superiority of the VGG16-LSTM network, which enables the capture of detailed spatiotemporal features from surveillance images that traditional models and the single-channel LSTM approach might miss. The comparison also underscores the value of the dual-channel integrated model that preserves the unique characteristics of multi-source heterogeneous data for enhancing PM2.5 and PM10 quantitative forecasting performance.

3.3.2. Influence of Different Forecasting Factors

To enhance forecasting accuracy, the study integrates various factors, including images, atmospheric data, meteorological data, and temporal data. Figure 4 compares the forecasting results obtained by using different combinations of these factors. The results based on combinations of atmospheric, meteorological, and temporal data (Air, Air + Met, Air + Met + Time) were obtained using the aforementioned LSTM model, while results using all data sources (Air + Met + Time + Image) were derived from the dual-channel model that integrates the LSTM and VGG16-LSTM networks.
The results indicate that the LSTM model combining atmospheric and meteorological data improves the R2 of PM2.5 and PM10 by 4% each while reducing RMSE by 1.22 µg/m3 and 1.55 µg/m3, respectively, compared to that relying solely on atmospheric data. Adding time information further increases R2 by an additional 2.5% and 1.2% and reduces RMSE by 0.76 µg/m3 and 0.5 µg/m3 for PM2.5 and PM10, respectively. Moreover, the dual-channel model, which integrates the VGG16-LSTM network to extract features from surveillance images, achieves an additional average R2 improvement of 6.5% for PM2.5 and PM10 forecasts.
Consistent with previous research, these findings demonstrate the effectiveness of the selected forecasting factors in improving air quality predictions. The inclusion of meteorological and temporal factors markedly enhances prediction accuracy, while the integration of image data, a heterogeneous source, further boosts the forecasting capability of PM2.5 and PM10.

3.3.3. Performance on Different Forecast Time Lags

The study evaluates the performance of the dual-channel integrated model against single-channel models over varying forecast time lags. The results, shown in Figure 5, indicate that the dual-channel model consistently achieves higher R2 values and lower RMSEs at each forecast hour compared to single-channel models. Even for 12-h forecasts, the dual-channel model maintains high accuracy, with R2 values exceeding 0.8 for PM10 and reaching as high as 0.9 for PM2.5. These findings further validate the superiority of the dual-channel model, which significantly outperforms single-channel models in terms of forecast performance.
Figure 5 also shows a general decline in forecast accuracy as the forecast time lag increases for both PM2.5 and PM10, a trend consistent with the decrease in autocorrelation of air pollutants over time [36,54], which impacts forecast effectiveness for longer horizons. Notably, the rate of decline in forecast performance is more pronounced for single-channel models, which is related to the number of forecasting factors involved. In contrast, the dual-channel model demonstrates more stable performance, particularly within the first 10 h, owing to its integration of multi-source data, including surveillance images. The analysis of forecast performance on different time lags underscores the effectiveness and robustness of the proposed dual-channel model on time-series forecasts.
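The link between forecast lag and autocorrelation can be made concrete with a small sample-autocorrelation function. The series below is synthetic (a slowly varying sine wave with a 48-h period), chosen only so that the example is deterministic; it is not measured pollutant data, and the function name is illustrative.

```python
import math
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Synthetic hourly series: 480 h of a smooth signal with a 48-h period.
synthetic = [math.sin(2.0 * math.pi * t / 48.0) for t in range(480)]

for lag in (1, 6, 12):
    print(f"lag {lag:2d} h: autocorrelation = {autocorrelation(synthetic, lag):.3f}")
```

Applied to an observed PM2.5 or PM10 series, the same function would show how quickly the information carried by recent history fades as the lag grows.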

3.3.4. Performance Across Different Forecast Durations

Given the observed decline in forecast accuracy over longer time lags, it is critical to evaluate the feasibility of the dual-channel model for long-term air quality forecasting. To this end, additional experiments were conducted with forecast horizons of 24, 36, 48, 60, and 72 h while keeping the historical sequence length constant. The results are presented in Figure 6.
The dual-channel model demonstrates superior performance in extended time-series forecasts, maintaining high accuracy even when predicting future PM2.5 and PM10 concentrations up to 72 h ahead using only 12 h of historical data. Specifically, the model achieves R2 values of 0.91 and 0.84 for PM2.5 and PM10, respectively, with prediction errors of 6.02 µg/m3 and 15.18 µg/m3. These findings confirm the feasibility of applying the dual-channel deep learning model for long-term air quality forecasting.
However, a decline in R2 values and an increase in RMSE values are observed as the forecast horizon extends (including beyond 72 h), indicating an inverse relationship between forecast accuracy and duration. The longer the forecast duration, the greater the prediction uncertainty. Consequently, in practical applications, it is crucial to balance the trade-off between forecast accuracy and the desired forecast duration, ensuring the model meets the specific requirements of air quality management tasks.

3.3.5. Influence of Data Interpolation

To enhance the continuity of time-series data and generate sufficient samples for model training, we applied interpolation to address missing records in the multi-source dataset. However, interpolating missing data may introduce additional uncertainty or errors in the predictions. To evaluate the impact of interpolation, we conducted a comparative experiment where missing data were either interpolated or removed entirely, and the resulting forecasting performance was analyzed.
After removing records with missing values, the dataset contained 8126 complete records. Due to the discontinuity caused by missing data, the total number of 24-h sequence samples (T1 = 12, T2 = 12) generated by the sliding window method dropped to 5558. In contrast, interpolation increased the dataset to 8583 complete records, resulting in 8365 sequence samples and considerably improving the sample availability for training.
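The reason a few missing hours remove so many samples is that a 24-h window is valid only if every hour inside it is present. The sketch below, with hypothetical function names, lists the valid window start positions in an hourly validity mask and splits one window into its history and target parts.

```python
from typing import List, Sequence, Tuple


def window_starts(valid: Sequence[bool], t1: int = 12, t2: int = 12) -> List[int]:
    """Start indices of all windows of length t1 + t2 that contain no missing hour."""
    window = t1 + t2
    starts: List[int] = []
    run_start = None
    # A trailing False acts as a sentinel that closes the last run.
    for i, ok in enumerate(list(valid) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            run_len = i - run_start
            # A run of length L yields L - window + 1 windows (none if too short).
            starts.extend(range(run_start, run_start + run_len - window + 1))
            run_start = None
    return starts


def split_window(values: Sequence[float], start: int,
                 t1: int = 12, t2: int = 12) -> Tuple[List[float], List[float]]:
    """Return (history, target) for the window beginning at `start`."""
    history = list(values[start:start + t1])
    target = list(values[start + t1:start + t1 + t2])
    return history, target


complete = [True] * 48
one_gap = [True] * 24 + [False] + [True] * 23

print(len(window_starts(complete)))  # 48 contiguous hours
print(len(window_starts(one_gap)))   # same span with a single missing hour
```

In this toy case a single missing hour cuts the number of usable windows from 25 to 1, which mirrors the drop in sequence samples when missing records are deleted.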
The forecasting results for the two approaches, interpolation and deletion, were compared using the 5558 identical sequence samples. Figure 7 presents the scatter density plots illustrating the comparison. Despite the reduced sample size when missing data were removed, our proposed method still achieved relatively high R2 values and low RMSE values for both PM2.5 and PM10 forecasts. Notably, the results on the interpolation-based dataset showed slight improvements over those on the deletion-based dataset, likely benefiting from the increased sample size. This indicates that the interpolation method employed in this study is effective and does not introduce significant uncertainty or bias into the forecasts.
Although the sample size advantage was modest in this particular experiment, the importance of data volume in model training cannot be overstated. A larger sample size is crucial for improving model generalization and performance, particularly in long-term time-series forecasting. Therefore, interpolating missing data is not only beneficial but essential, and selecting an optimal interpolation method remains an important direction for future research.
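As an illustration of gap filling on an hourly series, the sketch below applies linear interpolation between the nearest observed neighbours and leaves long gaps untouched. Both the choice of linear interpolation and the `max_gap` limit are assumptions made for this example; they are not claimed to be the interpolation procedure used in this study.

```python
def interpolate_linear(values, max_gap=3):
    """Fill runs of None by linear interpolation between the neighbouring
    observed values; runs longer than max_gap, or touching either end of
    the series, are left as None."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1               # j becomes the first observed index after the gap
        gap = j - i
        if i > 0 and j < n and gap <= max_gap:
            left, right = filled[i - 1], filled[j]
            for k in range(gap):
                filled[i + k] = left + (right - left) * (k + 1) / (gap + 1)
        i = j
    return filled

# Toy series (hypothetical values): a 2-h gap is filled, a 4-h gap is not.
series = [10.0, None, None, 16.0, None, None, None, None, 30.0]
filled = interpolate_linear(series, max_gap=3)
print(filled)
```

Capping the fillable gap length is one simple way to limit the uncertainty that interpolation adds, at the cost of recovering fewer sequence samples.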

3.3.6. Transferability of the Model

In addition to the experiments conducted on the Shanghai dataset, the transferability of the proposed model was evaluated using datasets from two stations in Kaohsiung, Taiwan: Renwu (Station ID: 49, located at 120.332631° E, 22.689056° N) and Linyuan (Station ID: 52, located at 120.41175° E, 22.4795° N), whose locations are shown in Figure 1b. The image data provided by Hsieh et al. [55] and Kow et al. [56], with a resolution of 1280 × 720 pixels, were supplemented with air pollutant concentration data sourced from the Taiwan EPA website (https://data.moenv.gov.tw/, accessed on 10 May 2022) and meteorological data from ERA5. After data preprocessing, 3462 and 3418 sequence samples were generated for the Renwu and Linyuan stations, respectively. Because air quality conditions and surveillance images are spatially heterogeneous, models trained on data from one location often require retraining and adjustment when applied to data from other regions [57]. To assess adaptability, two approaches were used in this study: (1) Retrain: training the dual-channel integrated model from scratch on the two Kaohsiung station datasets; and (2) Pretrain–finetune: fine-tuning a model pretrained on the Shanghai dataset with the Kaohsiung data.
The predictive performance under both approaches is summarized in Table 3. The dual-channel integrated model demonstrated high accuracy in predicting air pollutant concentrations for both Kaohsiung stations under both training strategies, confirming the effectiveness and transferability of the proposed multi-source data integration method. Notably, the pretrain–finetuned model consistently outperformed the retrained model for both pollutants at both stations. Specifically, the R2 for PM2.5 increased by more than 0.05, and the RMSE decreased by over 1.3 µg/m3; for PM10, the R2 improved by more than 0.04, and the RMSE decreased by over 1.7 µg/m3.
The fine-tuning process enhanced the model’s predictive capacity by effectively leveraging the training data from both regions, thereby improving its adaptability. These findings validate the efficacy of the pretrain–finetune approach and highlight it as a recommended strategy for transferring models to new regions, particularly where local data are limited.
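The improvements quoted above can be recomputed directly from the values in Table 3. The short sketch below does this arithmetic; the dictionary layout and the `improvement` helper are illustrative choices, while the numbers themselves are transcribed from the table.

```python
# Values transcribed from Table 3: (retrain, pretrain-finetune) per station.
table3 = {
    ("PM2.5", "R2"):   {"Renwu": (0.9102, 0.9739), "Linyuan": (0.9208, 0.9717)},
    ("PM2.5", "RMSE"): {"Renwu": (3.37, 1.81),     "Linyuan": (3.29, 1.96)},
    ("PM10", "R2"):    {"Renwu": (0.8912, 0.9384), "Linyuan": (0.8914, 0.9346)},
    ("PM10", "RMSE"):  {"Renwu": (7.19, 5.41),     "Linyuan": (7.68, 5.96)},
}

def improvement(metric, retrain, finetune):
    """Positive value = fine-tuning is better (higher R2, lower RMSE)."""
    return finetune - retrain if metric == "R2" else retrain - finetune

gains = {
    (pollutant, metric, station): improvement(metric, *pair)
    for (pollutant, metric), stations in table3.items()
    for station, pair in stations.items()
}

for key, gain in sorted(gains.items()):
    print(key, round(gain, 4))
```

The smallest gains occur at the Linyuan station for every pollutant and metric, and these are the values that set the "more than" thresholds stated in the text.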

3.4. Limitations of This Study

While this study demonstrates significant advancements in air quality forecasting, several limitations must be acknowledged to provide a comprehensive understanding of its findings and implications. Firstly, the integration of surveillance images with atmospheric and meteorological data introduces challenges related to data availability and privacy. Meteorological and atmospheric pollutant records are generally reliable and accessible, but obtaining surveillance images can be hindered by legal, logistical, or policy constraints, resulting in a limited sample size and restricting broader generalization. Furthermore, the collected data often exhibit spatial discrepancies between different sources, as exemplified by the Shanghai dataset in this study. Specifically, the spatial distance between surveillance cameras and atmospheric or meteorological monitoring stations can lead to data misalignment, potentially introducing noise and reducing predictive accuracy. Despite these drawbacks, such datasets remain valuable and scarce resources for research. Future efforts should prioritize promoting open data sharing to create richer and more diverse datasets. This would enable more comprehensive assessments of the model’s generalizability across different regions and conditions while fostering deeper investigations into the effects of spatial inconsistencies and the development of advanced techniques to address and mitigate such discrepancies.
Secondly, the proposed dual-channel integrated model significantly improves forecasting accuracy by combining surveillance imagery with multi-source numerical data, but this comes at the cost of increased algorithmic complexity and resource demands. Specifically, the model’s parameter count increased approximately 75-fold, and the memory access cost (MAC) rose by nearly 2 million times, requiring up to 300 GB of computer memory to run the dual-channel integrated model. Such computational intensity necessitates high-performance GPUs and longer training durations, which may limit the scalability of this approach in resource-limited environments. Research on model optimization techniques, such as lightweight network architectures or model pruning, could help reduce computational overhead while maintaining accuracy.
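To give a sense of where the additional parameters come from, the sketch below counts the parameters of the convolutional part of a standard VGG16 and of a single LSTM layer. It assumes the standard VGG16 layer configuration and an LSTM with one bias vector per gate; the LSTM input and hidden sizes are hypothetical, so the resulting ratio is not meant to reproduce the 75-fold figure reported here.

```python
# Standard VGG16 convolutional configuration: output channels per 3x3 conv layer
# ("M" marks a max-pooling layer, which has no parameters).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_conv_params(in_channels=3, kernel=3):
    """Weights + biases of the 13 convolutional layers of standard VGG16."""
    total = 0
    for item in VGG16_CFG:
        if item == "M":
            continue
        total += kernel * kernel * in_channels * item + item
        in_channels = item
    return total

def lstm_params(input_dim, hidden):
    """Parameters of one LSTM layer with a single bias vector per gate:
    4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (hidden * (input_dim + hidden) + hidden)

conv_total = vgg16_conv_params()                     # 14,714,688
# Hypothetical numerical channel: 16 input features, 64 hidden units.
lstm_total = lstm_params(input_dim=16, hidden=64)    # 20,736

print(conv_total, lstm_total)
```

Even without fully connected layers, the image backbone holds far more parameters than a small recurrent layer, which is consistent with the resource demands described above and motivates the lightweight alternatives suggested for future work.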
Additionally, the resolution of surveillance images was resized to 224 × 224 pixels for model input in this study to balance computational efficiency and feature extraction. While sufficient for capturing general spatiotemporal patterns, this resolution may limit the model’s ability to detect fine-grained visual details that could enhance prediction accuracy. Higher-resolution images could potentially improve performance by capturing subtle atmospheric variations. Future studies should assess the trade-offs between image resolution, computational cost, and model accuracy to determine optimal configurations.
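The amount of detail discarded by resizing can be estimated from the native image sizes reported in this article. The sketch below assumes that each image is resized directly to 224 × 224 without cropping, which is an assumption of this example rather than a documented preprocessing step; the helper names are illustrative.

```python
def pixel_ratio(src, dst=(224, 224)):
    """How many source pixels map onto each pixel of the resized image."""
    return (src[0] * src[1]) / (dst[0] * dst[1])

def aspect_ratio(size):
    """Width divided by height."""
    return size[0] / size[1]

shanghai = (584, 389)      # native size listed in Table 1
kaohsiung = (1280, 720)    # native size reported for the Kaohsiung images

for name, size in [("Shanghai", shanghai), ("Kaohsiung", kaohsiung)]:
    print(name, round(pixel_ratio(size), 2), round(aspect_ratio(size), 3))
```

Under this assumption, each input pixel summarizes roughly 4.5 native pixels for the Shanghai images and roughly 18 for the Kaohsiung images, and the non-square originals are also compressed horizontally, which is the kind of trade-off a resolution study would need to quantify.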
Finally, the use of interpolation methods to handle missing data inevitably introduces estimation errors, which can propagate through the model and affect prediction outcomes. Although the interpolation techniques employed in this study effectively increased data completeness and sample size, they may oversimplify the underlying temporal and spatial dynamics. More sophisticated approaches, such as machine learning-based imputation or dynamic temporal interpolation, should be explored to minimize biases and enhance data integrity.

4. Conclusions

To address the limitations of existing surveillance-image-based air quality prediction approaches, such as their focus on coarse-grained classification, single-moment estimation, or reliance on indirect and unintuitive information from surveillance images, this study develops a dual-channel deep learning model for time-series forecasting of quantitative air quality. This model efficiently integrates multi-source heterogeneous data while preserving their unique characteristics by leveraging a VGG16-LSTM single-channel network to directly extract spatiotemporal features from surveillance image sequences and an LSTM single-channel network to capture temporal features from atmospheric, meteorological, and temporal data. This design enables the model to achieve accurate multi-time PM2.5 and PM10 concentration forecasts, addressing the challenges of integrating complex, multimodal datasets.
The model’s performance was rigorously validated using the 2021 Shanghai dataset, demonstrating its effectiveness and feasibility. Compared to traditional air quality forecasting methods such as support vector regression and random forest, the dual-channel model significantly outperformed these baselines. The inclusion of surveillance image data not only enhanced forecast accuracy but also improved the robustness of the predictions, underscoring the value of integrating visual data into air quality forecasting. Furthermore, the model showed superior performance across various forecast durations, confirming its potential for long-term forecasting. To assess its adaptability, the model’s transferability was tested on datasets from two stations in Kaohsiung, Taiwan. Through both retraining and pretrain–finetuning approaches, the model demonstrated its ability to generalize across diverse geographical regions.
The proposed dual-channel integrated deep learning model achieved notable performance improvements, but we acknowledge certain limitations. Employing state-of-the-art deep learning architectures could further enhance computational efficiency and forecasting accuracy. Future work will focus on optimizing network designs to reduce resource demands while maintaining or improving performance. Additionally, the current limitation in the availability of image-based air quality datasets restricts this study to single-station forecasts. Expanding the proposed method to larger regions by collecting image data from multiple stations and developing high-accuracy regional forecasting models will be a key focus of future research. These efforts aim to unlock the full potential of visual image data in advancing air quality forecasting on a broader scale.

Author Contributions

Conceptualization, X.W.; Data curation, M.W.; Formal analysis, S.Z.; Funding acquisition, X.L.; Investigation, S.Z.; Methodology, Y.W.; Project administration, Y.W.; Resources, M.W.; Software, S.Z.; Supervision, X.L.; Validation, X.W.; Visualization, M.W.; Writing—original draft, Y.W.; Writing—review and editing, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC), grant number 42471439.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. WHO. Ambient (Outdoor) Air Pollution. 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health (accessed on 24 October 2024).
  2. Binkowski, F.S.; Roselle, S.J. Models-3 Community Multiscale Air Quality (CMAQ) Model Aerosol Component 1. Model Description. J. Geophys. Res.-Atmos. 2003, 108, 4183.
  3. Byun, D.W.; Ching, J.K.S. Science Algorithms of the EPA Models-3 Community Multi-Scale Air Quality (CMAQ) Modeling System; U.S. Environmental Protection Agency: Washington, DC, USA, 1999.
  4. Chen, J.; Lu, J.; Avise, J.C.; DaMassa, J.A.; Kleeman, M.J.; Kaduwela, A.P. Seasonal Modeling of PM2.5 in California’s San Joaquin Valley. Atmos. Environ. 2014, 92, 182–190.
  5. Mebust, M.R.; Eder, B.K.; Binkowski, F.S.; Roselle, S.J. Models-3 Community Multiscale Air Quality (CMAQ) Model Aerosol Component 2. Model Evaluation. J. Geophys. Res.-Atmos. 2003, 108, 4184.
  6. Wang, Z.; Maeda, T.; Hayashi, M.; Hsiao, L.F.; Liu, K.Y. A Nested Air Quality Prediction Modeling System for Urban and Regional Scales: Application for High-Ozone Episode in Taiwan. Water Air Soil Pollut. 2001, 130, 391–396.
  7. Grell, G.A.; Peckham, S.E.; Schmitz, R.; McKeen, S.A.; Frost, G.; Skamarock, W.C.; Eder, B. Fully Coupled “Online” Chemistry within the WRF Model. Atmos. Environ. 2005, 39, 6957–6975.
  8. Saide, P.E.; Carmichael, G.R.; Spak, S.N.; Gallardo, L.; Osses, A.E.; Mena-Carrasco, M.A.; Pagowski, M. Forecasting Urban PM10 and PM2.5 Pollution Episodes in Very Stable Nocturnal Conditions and Complex Terrain Using WRF-Chem CO Tracer Model. Atmos. Environ. 2011, 45, 2769–2780.
  9. Zhang, Y.; Bocquet, M.; Mallet, V.; Seigneur, C.; Baklanov, A. Real-Time Air Quality Forecasting, Part I: History, Techniques, and Current Status. Atmos. Environ. 2012, 60, 632–655.
  10. Daly, A.; Zannetti, P. Air Pollution Modeling – An Overview. In Chapter 2 of Ambient Air Pollution; The Arab School for Science and Technology (ASST): Alexandria, Egypt; The EnviroComp Institute: Half Moon Bay, CA, USA, 2007; pp. 15–28.
  11. Stern, R.; Builtjes, P.; Schaap, M.; Timmermans, R.; Vautard, R.; Hodzic, A.; Memmesheimer, M.; Feldmann, H.; Renner, E.; Wolke, R.; et al. A Model Inter-Comparison Study Focussing on Episodes with Elevated PM10 Concentrations. Atmos. Environ. 2008, 42, 4567–4588.
  12. Vautard, R.; Builtjes, P.H.J.; Thunis, P.; Cuvelier, C.; Bedogni, M.; Bessagnet, B.; Honoré, C.; Moussiopoulos, N.; Pirovano, G.; Schaap, M.; et al. Evaluation and Intercomparison of Ozone and PM10 Simulations by Several Chemistry Transport Models over Four European Cities within the Citydelta Project. Atmos. Environ. 2007, 41, 173–188.
  13. Appel, K.W.; Napelenok, S.L.; Foley, K.M.; Pye, H.O.T.; Hogrefe, C.; Luecken, D.J.; Bash, J.O.; Roselle, S.J.; Pleim, J.E.; Foroutan, H.; et al. Description and Evaluation of the Community Multiscale Air Quality (CMAQ) Modeling System Version 5.1. Geosci. Model Dev. 2017, 10, 1703–1732.
  14. Manders, A.M.M.; Schaap, M.; Hoogerbrugge, R. Testing the Capability of the Chemistry Transport Model LOTOS-EUROS to Forecast PM10 Levels in the Netherlands. Atmos. Environ. 2009, 43, 4050–4059.
  15. Li, C.; Hsu, N.C.; Tsay, S.-C. A Study on the Potential Applications of Satellite Data in Air Quality Monitoring and Forecasting. Atmos. Environ. 2011, 45, 3663–3675.
  16. Ziegel, E.R. Time Series Analysis, Forecasting, and Control. Technometrics 1995, 37, 238–242.
  17. Rubal; Kumar, D. Evolving Differential Evolution Method with Random Forest for Prediction of Air Pollution. Procedia Comput. Sci. 2018, 132, 824–833.
  18. García Nieto, P.J.; Combarro, E.F.; del Coz Díaz, J.J.; Montañés, E. A SVM-Based Regression Model to Study the Air Quality at Local Scale in Oviedo Urban Area (Northern Spain): A Case Study. Appl. Math. Comput. 2013, 219, 8923–8937.
  19. Hooyberghs, J.; Mensink, C.; Dumont, G.; Fierens, F.; Brasseur, O. A Neural Network Forecast for Daily Average PM10 Concentrations in Belgium. Atmos. Environ. 2005, 39, 3279–3289.
  20. Chen, Y.; Shi, R.; Shu, S.; Gao, W. Ensemble and Enhanced PM10 Concentration Forecast Model Based on Stepwise Regression and Wavelet Analysis. Atmos. Environ. 2013, 74, 346–359.
  21. Díaz-Robles, L.A.; Ortega, J.C.; Fu, J.S.; Reed, G.D.; Chow, J.C.; Watson, J.G.; Moncada-Herrera, J.A. A Hybrid ARIMA and Artificial Neural Networks Model to Forecast Particulate Matter in Urban Areas: The Case of Temuco, Chile. Atmos. Environ. 2008, 42, 8331–8340.
  22. Li, X.; Peng, L.; Yao, X.; Cui, S.; Hu, Y.; You, C.; Chi, T. Long Short-Term Memory Neural Network for Air Pollutant Concentration Predictions: Method Development and Evaluation. Environ. Pollut. 2017, 231, 997–1004.
  23. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long Short-Term Memory Neural Network for Traffic Speed Prediction Using Remote Microwave Sensor Data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197.
  24. McCartney, E.J. Optics of the Atmosphere: Scattering by Molecules and Particles; John Wiley & Sons: New York, NY, USA, 1976; pp. 1–42.
  25. Narasimhan, S.G.; Nayar, S.K. Vision and the Atmosphere. Int. J. Comput. Vis. 2002, 48, 233–254.
  26. Xia, H.; Chen, X.; Wang, Z.; Chen, X.; Dong, F. A Multi-Modal Deep-Learning Air Quality Prediction Method Based on Multi-Station Time-Series Data and Remote-Sensing Images: Case Study of Beijing and Tianjin. Entropy 2024, 26, 91.
  27. Rowley, A.; Karakus, O. Predicting Air Quality Via Multimodal AI and Satellite Imagery. Remote Sens. Environ. 2023, 293, 113609.
  28. Liaw, J.J.; Chen, K.Y. Using High-Frequency Information and RH to Estimate AQI Based on SVR. Sensors 2021, 21, 3630.
  29. Liu, C.; Tsow, F.; Zou, Y.; Tao, N. Particle Pollution Estimation Based on Image Analysis. PLoS ONE 2016, 11, e0145955.
  30. Pudasaini, B.; Kanaparthi, M.; Scrimgeour, J.; Banerjee, N.; Mondal, S.; Skufca, J.; Dhaniyala, S. Estimating PM2.5 from Photographs. Atmos. Environ.-X 2020, 5, 100063.
  31. Yang, B.; Chen, Q. PM2.5 Concentration Estimation Based on Image Quality Assessment. In Proceedings of the 2017 4th Asian Conference on Pattern Recognition (ACPR), Nanjing, China, 26–29 November 2017; pp. 676–681.
  32. Chakma, A.; Vizena, B.; Cao, T.T.; Lin, J.; Zhang, J. Image-Based Air Quality Analysis Using Deep Convolutional Neural Network. In Proceedings of the 2017 24th IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3949–3952.
  33. Ma, J.; Li, K.; Han, Y.H.; Yang, J.Y. Image-Based Air Pollution Estimation Using Hybrid Convolutional Neural Network. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 471–476.
  34. Zhang, C.; Yan, J.C.; Li, C.S.; Wu, H.; Bie, R.F. End-to-End Learning for Image-Based Air Quality Level Estimation. Mach. Vis. Appl. 2018, 29, 601–615.
  35. Kalajdjieski, J.; Zdravevski, E.; Corizzo, R.; Lameski, P.; Kalajdziski, S.; Pires, I.M.; Garcia, N.M.; Trajkovik, V. Air Pollution Prediction with Multi-Modal Data and Deep Neural Networks. Remote Sens. 2020, 12, 4142.
  36. Wang, X.; Wang, M.; Liu, X.; Mao, Y.; Chen, Y.; Dai, S. Surveillance-Image-Based Outdoor Air Quality Monitoring. Environ. Sci. Ecotechnol. 2024, 18, 100319.
  37. Wang, X.; Wang, M.; Liu, X.; Zhang, X.; Li, R. A PM2.5 Concentration Estimation Method Based on Multi-Feature Combination of Image Patches. Environ. Res. 2022, 211, 113051.
  38. Yue, G.; Gu, K.; Qiao, J. Effective and Efficient Photo-Based PM2.5 Concentration Estimation. IEEE Trans. Instrum. Meas. 2019, 68, 3962–3971.
  39. Zhang, B.; Geng, Z.; Zhang, H.; Pan, J. Densely Connected Convolutional Networks with Attention Long Short-Term Memory for Estimating PM2.5 Values from Images. J. Clean. Prod. 2022, 333, 130101.
  40. Hameed, S.; Islam, A.; Ahmad, K.; Belhaouari, S.B.; Qadir, J.; Al-Fuqaha, A. Deep Learning Based Multimodal Urban Air Quality Prediction and Traffic Analytics. Sci. Rep. 2023, 13, 22181.
  41. Olsson, P.Q.; Benner, R.L. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change. J. Am. Chem. Soc. 1999, 121, 1423.
  42. Zhang, Q.; He, K.; Huo, H. Policy: Cleaning China’s Air. Nature 2012, 484, 161–162.
  43. Bai, L.; Wang, J.; Ma, X.; Lu, H. Air Pollution Forecasts: An Overview. Int. J. Environ. Res. Public Health 2018, 15, 780.
  44. Carmichael, G.R.; Sandu, A.; Chai, T.; Daescu, D.N.; Constantinescu, E.M.; Tang, Y. Predicting Air Quality: Improvements through Advanced Methods to Integrate Models and Measurements. J. Comput. Phys. 2008, 227, 3540–3571.
  45. Ni, X.Y.; Huang, H.; Du, W.P. Relevance Analysis and Short-Term Prediction of PM Concentrations in Beijing Based on Multi-Source Data. Atmos. Environ. 2017, 150, 146–161.
  46. Wei, J.; Li, Z.Q.; Cribb, M.; Huang, W.; Xue, W.H.; Sun, L.; Guo, J.P.; Peng, Y.R.; Li, J.; Lyapustin, A.; et al. Improved 1 km Resolution PM2.5 Estimates across China Using Enhanced Space-Time Extremely Randomized Trees. Atmos. Chem. Phys. 2020, 20, 3273–3289.
  47. Wei, J.; Li, Z.Q.; Lyapustin, A.; Sun, L.; Peng, Y.R.; Xue, W.H.; Su, T.N.; Cribb, M. Reconstructing 1-km-Resolution High-Quality PM2.5 Data Records from 2000 to 2018 in China: Spatiotemporal Variations and Policy Implications. Remote Sens. Environ. 2021, 252, 112136.
  48. Freeman, B.S.; Taylor, G.; Gharabaghi, B.; Thé, J. Forecasting Air Quality Time Series Using Deep Learning. J. Air Waste Manag. Assoc. 2018, 68, 866–886.
  49. Gilik, A.; Ogrenci, A.S.; Ozmen, A. Air Quality Prediction Using CNN+LSTM-Based Hybrid Deep Learning Architecture. Environ. Sci. Pollut. Res. Int. 2022, 29, 11920–11938.
  50. Chang, Y.S.; Chiao, H.T.; Abimannan, S.; Huang, Y.P.; Tsai, Y.T.; Lin, K.M. An LSTM-Based Aggregated Model for Air Pollution Forecasting. Atmos. Pollut. Res. 2020, 11, 1451–1463.
  51. Zaini, N.; Ahmed, A.N.; Ean, L.W.; Chow, M.F.; Malek, M.A. Forecasting of Fine Particulate Matter Based on LSTM and Optimization Algorithm. J. Clean. Prod. 2023, 427, 139233.
  52. Ministry of Ecology and Environment of the People’s Republic of China. Technical Regulation on Ambient Air Quality Index (On Trial); China Environmental Science Press: Beijing, China, 2012.
  53. Harishkumar, K.S.; Yogesh, K.M.; Gad, I. Forecasting Air Pollution Particulate Matter (PM2.5) Using Machine Learning Regression Models. Procedia Comput. Sci. 2020, 171, 2057–2066.
  54. Ding, C.; Wang, G.; Zhang, X.; Liu, Q.; Liu, X. A Hybrid CNN-LSTM Model for Predicting PM2.5 in Beijing Based on Spatiotemporal Correlation. Environ. Ecol. Stat. 2021, 28, 503–522.
  55. Hsieh, C.-H.; Kuan-Yu, C.; Jiang, M.-Y.; Liaw, J.-J.; Shin, J. Estimation of PM2.5 Concentration Based on Support Vector Regression with Improved Dark Channel Prior and High Frequency Information in Images. IEEE Access 2022, 10, 48486–48498.
  56. Kow, P.Y.; Hsia, I.W.; Chang, L.C.; Chang, F.J. Real-Time Image-Based Air Quality Estimation by Deep Learning Neural Networks. J. Environ. Manag. 2022, 307, 114560.
  57. Luo, Z.Y.; Huang, F.F.; Liu, H. PM2.5 Concentration Estimation Using Convolutional Neural Network and Gradient Boosting Machine. J. Environ. Sci. 2020, 98, 85–93.
Figure 1. Data acquisition location maps. (a) Shanghai dataset: the distance between the surveillance camera location and the air quality monitoring station is 3.51 km, and (b) Kaohsiung dataset: each surveillance camera location and the corresponding air quality monitoring station are co-located at the same site, where the cyan grid represents meteorological data units.
Figure 2. The network architecture of dual-channel integrated model. (a) Data fusion module: Combines multi-source homogeneous numerical data into a unified format for processing; (b) LSTM single-channel network module: Extracts temporal features from atmospheric, meteorological, and temporal data; (c) VGG16-LSTM single-channel network module: Extracts spatiotemporal features from surveillance image sequences; and (d) Feature fusion and forecasting module: Merges features and outputs time-series predictions of PM2.5 and PM10 concentrations.
Figure 3. Scatter density plots of forecast results of two atmospheric particulate matters. (a) PM2.5 and (b) PM10. Black dashed lines denote 1:1 lines, and red solid lines denote best-fit lines from the linear regression. Note: A single ground truth value (x-axis) may correspond to multiple predicted values (y-axis) in the scatter plot because each ground truth value can appear at different forecasted time points in different sample sequences when using the sliding window approach for sample generation.
Figure 4. Comparison of prediction results on two atmospheric particulate matters with different combinations of forecast factors. (a) PM2.5 and (b) PM10.
Figure 5. Comparison of forecast accuracy on two atmospheric particulate matters over different forecast time lags. (a) PM2.5 and (b) PM10.
Figure 6. Comparison of prediction results on two atmospheric particulate matters across different forecast durations. (a) PM2.5 and (b) PM10.
Figure 7. Scatter density plots of forecast results of two atmospheric particulate matter with different data processing approaches for missing values. (a) PM2.5 forecasts with data interpolation, (b) PM10 forecasts with data interpolation, (c) PM2.5 forecasts with data deletion, and (d) PM10 forecasts with data deletion. Black dashed lines denote 1:1 lines, and red solid lines denote best-fit lines from the linear regression.
Table 1. Summary of the multi-source dataset.
| Data Type | Source | Variables | Resolution | Valid Number |
|---|---|---|---|---|
| Image data | Shanghai Municipal Bureau of Ecology and Environment, https://sthj.sh.gov.cn/ | Surveillance images from a single camera | Hourly, 584 × 389 pixels | 8499 |
| Atmospheric data | China National Environmental Monitoring Center, http://www.cnemc.cn/ | PM2.5, PM10, SO2, NO2, O3, CO | Hourly, site-level | 8374 |
| Meteorological data | Copernicus Climate Change Service, https://cds.climate.copernicus.eu/ (accessed on 6 September 2022) | Precipitation, temperature, surface pressure, evaporation, relative humidity, wind speed, wind direction | Hourly, 0.25° × 0.25° | 8760 |
| Temporal data | Metadata from datasets | Month, hour | Hourly | 8760 |
Table 2. Performance comparison of different methods.
| Method | PM2.5 R2 | PM2.5 RMSE (µg/m3) | PM10 R2 | PM10 RMSE (µg/m3) |
|---|---|---|---|---|
| MLR | 0.5863 | 13.26 | 0.4776 | 26.91 |
| RF | 0.7540 | 10.22 | 0.6193 | 22.97 |
| SVR | 0.7055 | 11.18 | 0.5862 | 23.95 |
| LSTM | 0.8990 | 6.55 | 0.8381 | 14.98 |
| Ours | 0.9459 | 4.79 | 0.9045 | 11.51 |
Table 3. Predictions of pretrain–finetuned and retrained models at Renwu and Linyuan stations.
| Pollutants | Metrics | Renwu Station: Retrain | Renwu Station: Pretrain–Finetune | Linyuan Station: Retrain | Linyuan Station: Pretrain–Finetune |
|---|---|---|---|---|---|
| PM2.5 | R2 | 0.9102 | 0.9739 | 0.9208 | 0.9717 |
| PM2.5 | RMSE (µg/m3) | 3.37 | 1.81 | 3.29 | 1.96 |
| PM10 | R2 | 0.8912 | 0.9384 | 0.8914 | 0.9346 |
| PM10 | RMSE (µg/m3) | 7.19 | 5.41 | 7.68 | 5.96 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, Y.; Wang, X.; Wang, M.; Liu, X.; Zhu, S. Time-Series Forecasting of PM2.5 and PM10 Concentrations Based on the Integration of Surveillance Images. Sensors 2025, 25, 95. https://doi.org/10.3390/s25010095
