Spatiotemporal Air Pollution Forecasting in Houston-TX: A Case Study for Ozone Using Deep Graph Neural Networks

: The presence of pollutants in our atmosphere has become one of humanity’s greatest challenges. These pollutants, produced primarily by burning fossil fuels, are detrimental to human health, our climate and agriculture. This work proposes the use of a spatiotemporal graph neural network, designed to forecast ozone concentration based on the GraphSAGE paradigm, to aid in our understanding of the dynamic nature of these pollutants’ production and proliferation in urban areas. This model was trained and tested using data from Houston, Texas, the United States, with varying numbers of time-lags, forecast horizons (1, 3, 6 h ahead), input data and nearby stations. The results show that the proposed GNN-SAGE model successfully recognized spatiotemporal patterns underlying these data, bolstering its forecasting performance when compared with a benchmarking persistence model by 33.7%, 48.7% and 57.1% for 1, 3 and 6 h forecast horizons, respectively. The proposed model produces error levels lower than we could ﬁnd in the existing literature. The conclusions drawn from variable importance SHAP analysis also revealed that when predicting ozone, solar radiation becomes relevant as the forecast time horizon is raised. According to EPA regulation, the model also determined nonattainment conditions for the reference station.


Introduction
In recent years, it has become increasingly apparent that anthropogenic pollutant emissions pose one of the greatest threats to human society and the planet as we know it [1].According to the Statistical Review of World Energy [2], even though total fossil fuel consumption decreased from 490.07 EJ in 2019 to 463.24 EJ in 2020, the harmful effects of gases emitted from fossil fuels still heavily impact human health, the condition of our climate and the economy.A wide body of existing research makes the negative influence of pollutants on the human body abundantly clear.This research highlights a significant contribution to the development of various illnesses across all pathology, be they neurological [1,3], cancerous [4,5], coronary [6], respiratory [7], or even psychological [8], amounting to countless premature deaths [9][10][11].
Hazardous air pollutants also impact the global environment, as previous studies suggest [12][13][14].The accumulation of pollutants in the atmosphere due to human activity boosts the absorption and reemission of energy in the atmosphere, resulting in a trend of gradual increase in global temperature caused by the greenhouse effect [15,16].The economic loss due to the impacts of pollutant gases have also been studied thoroughly, geographically spanning Africa [17], the United States [18], Europe [19,20], India [21], China [22] and beyond.
Clearly, the negative outcomes of pollutant emissions are a major obstacle for health and development across the globe.In order to mitigate the harmful impacts of this pollution, air quality regulations were developed, primarily targeted towards limiting the concentrations of hazardous materials in the atmosphere [23].
Different predictive models were developed to keep track of toxic gas concentration in the atmosphere.Chemical transport models (CTM) rely on meteorological and chemical processes occurring in the atmosphere to simulate pollutant concentrations, offering a broad spatiotemporal coverage [24][25][26].This physics-based approach seeks to establish a mathematical model to assess pollutant concentration, a computationally demanding process [27,28].Other deterministic models, such as Weather Research and Forecasting (WRF) coupled with Community Multi-Scale Air Quality (CMAQ) are viable options for estimating pollution levels.These models have been successfully used to assess pollutant concentrations in different studies [24,29,30].Deterministic models combined with machine learning techniques have also proved to be viable tools for pollution forecasting [28,31,32].
Physics-based models, such as transport models, are computationally expensive, requiring long processing times [27,28,33] and relying on detailed spatial analysis for the correct modeling of urban areas [34].To overcome these drawbacks, researchers investigated machine learning (ML) to model complex spatiotemporal processes.Machine learning can accurately represent non-linear processes and eliminate the need for the explicit programming of each physical process for its implementation [25,28,35,36].Machine learning models have improved computational performance compared to physics-based models [37].
Previous research in this field [38] investigated three different machine learning (ML) models, namely support vector machines (SVM), decision trees (DT) and artificial neural networks (ANN), using both univariate and multivariate data input.The authors proposed forecasting pollutant concentration up to 24 h ahead in Qatar.The research results indicated that the multivariate approach led to better results, reaching the best value for Normalized Root Mean Squared Error (nRMSE) of 6.1% for 1 h in advance ozone estimation.In [39], the authors proposed to use regression trees (RT) to forecast the daily maximum 8 h average ozone concentrations over China.The RT model successfully captured spatiotemporal concentrations of ozone, being more accurate than CTM models and requiring less computational effort, with a coefficient of determination (R 2 ) and Root Mean Squared Error (RMSE) of 0.69 and 26 µg/m 3 , respectively.In another research effort [40], the authors compared eight different ML models to predict future ozone levels in India for 24 h.The models were trained and tested using two distinct approaches: a whole year of data and seasonal parted data.The results showed that amongst the assessed models, XGBoost reached the best results for the yearly training with an R 2 equal to 0.61.The overall models' predictive capacity increased for the seasonal-trained approach, where XGBoost was the best model again, with R 2 equal to 0.75 for forecasting during the winter season, an improvement of 18%.
More recently, deep learning (DL) models were applied to estimate pollutant concentration levels [33].Seng et al. [41] used the long short-term memory (LSTM) model with a multi-output and multi-index of supervised learning to estimate air quality over Beijing, considering the data from the present monitoring station and the ones on its surroundings.The results proved that their proposed model achieved the best results compared to their baseline DL models.Deep learning techniques can also be used with satellite data to estimate pollution levels in the atmosphere, as found in [42].The authors used the Learning Surface Ozone (LESO) framework and deep forests to estimate ozone levels over the United States, Europe and India.The results showed improvement for R 2 of over 30%.In the US and Europe, R 2 achieved values greater than 0.9, while in India, the model had a worse performance reaching an R 2 of 0.39.Combinations of DL models have also been used to forecast atmospheric conditions regarding pollutants.In [43], authors propose using convolutional neural networks (CNN) and LSTM models to predict air quality for Barcelona, Kocaeli and Istanbul.This hybrid approach combines the best characteristics of each model to analyze temporal data by the LSTM and the spatial understanding of data by CNN.Compared with results found in the literature, their CNN-LSTM architecture improved up to 53% prediction for particulate matter, 31% for ozone forecasting, 47% for oxides of nitrogen and 46% for sulphur dioxide, using a multivariate approach.Deep learning models were also implemented along attention mechanisms, allowing the model to focus on important information from the dataset [44].In study [45], a DL model based on CNN and spatiotemporal attention was developed to forecast atmospheric particulate matter (PM 10 and PM 2.5 ) up to 4 h ahead over the Yangtze River Delta airshed in China.The proposed model's performance was superior over the assessed state-of-the-art baseline models, reaching mean improvement up to 15.25% for PM 2.5 and 16.91% for PM 10 , proving to be a reliable framework for pollutant prediction.
Another promising tool for estimating air pollution is based on graph theory.Graphbased models supplement traditional DL models by directly taking into account spatiotemporal characteristics underlying the used information [46,47].In [48], authors combined graph convolutional neural networks (GC) and LSTM models to forecast PM 2.5 concentration in China.The authors compared the GC-LSTM proposed model with other state-ofthe-art approaches, showing that their model outperformed the baselines for all forecast horizons from 1 h up to 72 h ahead.Graph attention networks (GAT) are another possible paradigm to predict future pollutant concentration.In work [49], the authors developed a convolutional graph sequence-to-sequence with an attention model to assess future ozone concentrations over a long time series.The proposed model proved realiable and had superior performance over the benchmarking models considered.An application of GAT together with a recurrent network can be found in research [50].Their authors developed a GAT model that learns the spatial dependencies of the forecast pollutant for different locations.The application of graph-based models proved to achieve better results than previous ML baseline models, improving RMSE over 3.5% compared with the best benchmarking result.The reviewed literature points to the importance of ML, DL and, more recently, graph-based models in developing precise pollutant forecasting tools.
The present work proposes a novel graph-structured approach based on GraphSAGE [51] to predict ozone concentration over the urban area of Houston, Texas.A different number of time-lags and forecast horizons were considered during this study to further understand how these factors impact the model's prediction.To accomplish this, four different approaches were proposed and compared: 1.
In the first approach, the input data consisting only of past ozone concentration data and hour of the day (HoD) and day of the year (DoY) were used to estimate ozone concentration.This limitation yields the highest number of available stations to spatially feed the model.

2.
For the second approach, ozone, NO 2 /NO X and weather information (wind speed, wind direction, outdoor temperature, relative humidity, solar radiation) and HoD/DoY were used to estimate ozone concentration.This is the maximum weather data that we can use, which yields a smaller number of stations but favours some important ozone precursors (chemical and physical).

3.
The third approach used past ozone data, NO 2 /NO X , Volatile organic compounds (VOCs) (propane, isobutane, benzene, toluene, ethylbenzene), weather (wind speed, wind direction, outdoor temperature) and HoD/DoY information to estimate ozone concentration.The VOCs information limited the other input variables and the number of stations, resulting in the smallest set of stations.4.
Finally, the last approach used past ozone data, NO 2 /NO X , weather and HoD/DoY information, aiming for the maximum number of stations aggregated with chemical/physical information.
To improve on the research that was introduced above, this work's main contributions aim at deepening the scientific understanding of ozone forecasting using advanced machine learning methods by: 1.
Developing a new model to forecast air pollution concentrations.2.
Analyzing the inclusion of spatial information to deep learning air quality forecasts.

3.
The importance analysis of input variables for different data configurations over distinct deep learning forecasting horizons.4.
Producing an accurate and reliable model for ozone prediction based on DL and graph theory.
The remainder of this work is organized as follows: the applied methodology is presented in Section 2, followed by the presentation and discussion of results in Section 3. In Section 4, the found results are discussed.In Section 5 a conclusion closes the work.

Houston Database
The data used for training and testing the proposed model were acquired from the Houston metropolitan area, provided by the Texas Commission on Environmental Quality (data acquired from https://www17.tceq.texas.gov/tamis/index.cfm,accessed on 1 December 2022).This site was chosen since it is a major urban area with an important industrial and economic center, hosting large oil refineries and petrochemical industries in the North American region [52,53].Due to its highly populated area and economic activities, this region is often classified as non-attained by the Environmental Protection Agency (EPA) for the US.This means that Houston does not comply with the National Ambient Air Quality Standards (NAAQS) for air pollutant concentrations [52][53][54].For the present work, data from monitoring stations were used.The historical data comprise the period from 2011 to 2019 and has hourly information for concentrations of ozone, NO 2 /NO x , the VOCs propane, isobutane, benzene, toluene and ethylbenzene.This research supplemented the pollutant information with weather data such as wind speed, wind direction, outdoor temperature, relative humidity and solar radiation.A map of the studied region and its stations is presented in Figure 1.The topographic maps containing the resulting stations and their height for each proposed approach are depicted in panes (a) to (d) of Figure 2. It is worth mentioning that most stations are on flat, low-lying terrain.The topographic maps containing the resulting stations and their height for each proposed approach are depicted in panes (a) to (d) of Figure 2. It is worth mentioning that most stations are on flat, low-lying terrain.
Atmosphere 2023, 14, x FOR PEER REVIEW 6 of 27 In Figure 2, the right-side bar indicates the terrain's height, in meters, for the Houston area, whilst the blue squares represent the location of each weather station.The varying number of stations in each pane is caused by the lack of data from some stations, resulting in stations being filtered out to keep the measurements uniform on each approach, keeping the ones containing enough valid information for the model assessment.Based on that, the reference station used for assessing all the proposed forecasting approaches is station 482011035, located in the southern part of Houston, as shown in Figure 2.
Wind speed and direction are the weather variables with the greatest influence on ozone concentration in the atmosphere [55].Thus, it is of interest to understand the wind behavior at the weather stations during the assessed period.The wind characterization In Figure 2, the right-side bar indicates the terrain's height, in meters, for the Houston area, whilst the blue squares represent the location of each weather station.The varying number of stations in each pane is caused by the lack of data from some stations, resulting in stations being filtered out to keep the measurements uniform on each approach, keeping the ones containing enough valid information for the model assessment.Based on that, the reference station used for assessing all the proposed forecasting approaches is station 482011035, located in the southern part of Houston, as shown in Figure 2.
Wind speed and direction are the weather variables with the greatest influence on ozone concentration in the atmosphere [55].Thus, it is of interest to understand the wind behavior at the weather stations during the assessed period.The wind characterization for the stations showed that the wind essentially had a similar trend for each station, being more prominent from a region between east and south.Similar behavior is found for the stations of all the different approaches but number 1, since it does not consider any weather information.

Persistence Model
The persistence model was selected as a baseline model to assess the proposed model's performance.The persistence model is often used in forecasting tasks as benchmarking to compare with novel proposed models and states that the predicted attribute will be the same as the one measured at present [56][57][58].The persistence model performs well for short forecast horizons [56,59,60], offering good baseline comparison.

LASSO Model
The LASSO (Least Absolute Shrinkage and Selection Operator) model [61] performs attribute selection and normalization of linear problems using L1 penalization.For the present work, LASSO was used as a benchmarking model to assess the size of the historical dataset.

GraphSAGE Model
The GraphSAGE (sample and aggregate) model is a framework where nodes are equally aggregated to learn embedding functions to generalize unseen information by training a set of aggregator functions using samples of a constant size set of neighboring nodes [51,62,63].Its architecture allows for the topological structure of different nodes to be learnt, as well as their distribution along the dataset, ultimately leading to the model's generalization [51].The main advantage of graph-based models over the traditional ML models is that they naturally consider spatial relationships between the assessed stations.This is essential in forecasting pollutant concentration, such as ozone, since it relies on both temporal and spatial features, also facilitating the understanding of its base structure [46,47].Figure 3 displays a logic representation for the graph paradigm used in this work, while the applied GraphSAGE structure is depicted in Figure 4.
for the stations showed that the wind essentially had a similar trend for each stati more prominent from a region between east and south.Similar behavior is foun stations of all the different approaches but number 1, since it does not cons weather information.

Persistence Model
The persistence model was selected as a baseline model to assess the p model's performance.The persistence model is often used in forecasting tasks a marking to compare with novel proposed models and states that the predicted will be the same as the one measured at present [56][57][58].The persistence model well for short forecast horizons [56,59,60], offering good baseline comparison.

LASSO Model
The LASSO (Least Absolute Shrinkage and Selection Operator) model [61] attribute selection and normalization of linear problems using L1 penalization present work, LASSO was used as a benchmarking model to assess the size of th ical dataset.

GraphSAGE Model
The GraphSAGE (sample and aggregate) model is a framework where n equally aggregated to learn embedding functions to generalize unseen inform training a set of aggregator functions using samples of a constant size set of nei nodes [51,62,63].Its architecture allows for the topological structure of different be learnt, as well as their distribution along the dataset, ultimately leading to the generalization [51].The main advantage of graph-based models over the tradit models is that they naturally consider spatial relationships between the assessed This is essential in forecasting pollutant concentration, such as ozone, since it both temporal and spatial features, also facilitating the understanding of its base [46,47].Figure 3 displays a logic representation for the graph paradigm used in th while the applied GraphSAGE structure is depicted in Figure 4.    Figure 3 depicts the graph approach using five nearby stations, represented as circles as an example.From the figure, nearby stations send information to the reference study station, located in the middle of the image at the endpoint of the arrow's heads, also sending data to itself.
Figure 4 shows how the input spatiotemporal data from the nodes are fed to the model, passing through successive convolutional, dropout layers and leaky ReLU activation functions.After four repetitions, the output data are then sent to be processed by a dense layer, where they will finally return the predicted value for ozone concentration by its output layer.

Evaluation Metrics
To compare the proposed model performance against baseline models, the metrics RMSE, nRMSE, Forecast Skill and R 2 were used.Their formulation can be found in references [57] for RMSE, nRMSE and Forecast Skill and reference [64] for R 2 .

SHAP Analysis
Although ML models lead to state-of-the-art results in many scientific fields, these models lack explainability, making them hard to be interpreted beyond their results.Addressing this matter, Shapley Additive Explanations (SHAP) was implemented after model training using the SHAP library for Python language (the SHAP library documentation can be read at https://shap.readthedocs.io/en/latest/,accessed on 26 January 2023).The SHAP approach offers a way to locally explain ML models through game theory, by assessing a model's results for each case when a variable is not considered, finding relationships between such variables and ranking them from the most to the least important [65].The SHAP approach has proved to be a versatile tool with multidisciplinary applications, being successfully implemented for regression problems [66], model optimization [67,68], parameter selection [69] and even in the medical field [70,71], helping researchers to fully analyze their results.

Dataset Size
For the model to return meaningful and reliable results, it is necessary to verify if the information contained in the dataset is sufficient for training.To do so, different dataset sizes for training were tested.The results achieved by the GraphSAGE model were compared to the baseline models LASSO and persistence, as shown in Figure 5 (a) for RMSE and (b) for Forecast Skill, respectively.

Evaluation Metrics
To compare the proposed model performance against baseline models, the metrics RMSE, nRMSE, Forecast Skill and R 2 were used.Their formulation can be found in references [57] for RMSE, nRMSE and Forecast Skill and reference [64] for R 2 .

SHAP Analysis
Although ML models lead to state-of-the-art results in many scientific fields, these models lack explainability, making them hard to be interpreted beyond their results.Addressing this matter, Shapley Additive Explanations (SHAP) was implemented after model training using the SHAP library for Python language (the SHAP library documentation can be read at https://shap.readthedocs.io/en/latest/,accessed on 26 January 2023).The SHAP approach offers a way to locally explain ML models through game theory, by assessing a model's results for each case when a variable is not considered, finding relationships between such variables and ranking them from the most to the least important [65].The SHAP approach has proved to be a versatile tool with multidisciplinary applications, being successfully implemented for regression problems [66], model optimization [67,68], parameter selection [69] and even in the medical field [70,71], helping researchers to fully analyze their results.

Dataset Size
For the model to return meaningful and reliable results, it is necessary to verify if the information contained in the dataset is sufficient for training.To do so, different dataset sizes for training were tested.The results achieved by the GraphSAGE model were compared to the baseline models LASSO and persistence, as shown in Figure 5 (a) for RMSE and (b) for Forecast Skill, respectively.In Figure 5, the x-axis represents the number of years used to train the model, ranging from one to nine years.For panel (a), it is visible that increasing the number of training In Figure 5, the x-axis represents the number of years used to train the model, ranging from one to nine years.For panel (a), it is visible that increasing the number of training years has a positive influence over the GraphSAGE model, reaching the optimum value of 4.03 ppb for seven years.The baseline model LASSO showed worse performance, with a larger RMSE of 5.74 ppb for the same training data size.Figure 5b leads to similar results in terms of Forecast Skill, also indicating that both models were able to beat persistence.Once again, the graph-based model outperforms the baseline LASSO model for all assessed configurations, reaching peak results for seven years and Forecast Skill of 30% compared to persistence, whilst LASSO achieved 17%.
From Figure 5, it is also possible to conclude that the data behavior for GraphSAGE shows convergence on its results for eight years of training data.Thus, it is possible to indicate that this dataset size has enough information to provide meaningful results when used by the graph-based model.

Ozone Concentration Prediction for 1 h Forecast Horizon
The results summarized in Figure 6 were achieved for the prediction of ozone level 1 h in advance for different approaches using GraphSAGE.From Figure 5, it is also possible to conclude that the data behavior for GraphSAGE shows convergence on its results for eight years of training data.Thus, it is possible to indicate that this dataset size has enough information to provide meaningful results when used by the graph-based model.

Ozone Concentration Prediction for 1 h Forecast Horizon
The results summarized in Figure 6 were achieved for the prediction of ozone level 1 h in advance for different approaches using GraphSAGE.In Figure 6a, the worst performance for all time-lags but 6 h was reached using approach 1 (which uses only previous ozone concentration information).On the other hand, approach 4 (focused on using the max number of stations) led to more accurate results for ozone prediction, reaching the lowest RMSE value of 3.80 ppb for 4 h time-lags.Approach 2 (similar to approach 4 but using fewer stations and more weather variables) managed to reach very similar results to approach 4, with slightly worse performance with the use of 1 up to 5 h time-lags and the same result for 6 h time-lags.Interestingly, for approach 3 (which uses VOCs information to predict future ozone concentration), this variation showed little improvement over the model's forecasting capacity, surpassing the number of time-lags assessed by approach 1 for 6 h time-lags.
In pane (b), the Forecast Skill values for each approach show that approach 4 indeed leads to more significant improvement for ozone forecasting.Such an approach improved the model's accuracy over the persistence model to a maximum of 34%.Visualizing Figure 6b, it is possible to see that, once again, GraphSAGE using approach 1 had the worst performance and approach 3 worsened for further forecasting horizons.In Figure 6a, the worst performance for all time-lags but 6 h was reached using approach 1 (which uses only previous ozone concentration information).On the other hand, approach 4 (focused on using the max number of stations) led to more accurate results for ozone prediction, reaching the lowest RMSE value of 3.80 ppb for 4 h time-lags.Approach 2 (similar to approach 4 but using fewer stations and more weather variables) managed to reach very similar results to approach 4, with slightly worse performance with the use of 1 up to 5 h time-lags and the same result for 6 h time-lags.Interestingly, for approach 3 (which uses VOCs information to predict future ozone concentration), this variation showed little improvement over the model's forecasting capacity, surpassing the number of time-lags assessed by approach 1 for 6 h time-lags.
In pane (b), the Forecast Skill values for each approach show that approach 4 indeed leads to more significant improvement for ozone forecasting.Such an approach improved the model's accuracy over the persistence model to a maximum of 34%.Visualizing Figure 6b, it is possible to see that, once again, GraphSAGE using approach 1 had the worst performance and approach 3 worsened for further forecasting horizons.
The results provided in Figure 6 indicate that the prediction of future ozone values should not consider only previous values of its concentrations but rather consider external (exogenous) variables such as wind speed, wind direction and nitrogen oxides.However, considering VOCs information led to no significant change in the model's forecasting capacity, but it deteriorated its performance for a longer time-lag value.Figure 7 shows the ozone concentration values forecast by GraphSAGE using approach 4, 4 h time-lags and a validation dataset along an arbitrary time window of the validation dataset.It is also important to point out that each date represents the start of the respective day at 00:00 h.The results provided in Figure 6 indicate that the prediction of future ozone values should not consider only previous values of its concentrations but rather consider external (exogenous) variables such as wind speed, wind direction and nitrogen oxides.However, considering VOCs information led to no significant change in the model's forecasting capacity, but it deteriorated its performance for a longer time-lag value.Figure 7 shows the ozone concentration values forecast by GraphSAGE using approach 4, 4 h time-lags and a validation dataset along an arbitrary time window of the validation dataset.It is also important to point out that each date represents the start of the respective day at 00:00 h.From Figure 7, it is shown that the graph model can identify ozone peaks during the assessed period, an important feature in checking EPA compliance.It also followed closely the actual data trend over the period, indicating good overall performance.This can be better visualized in Figure 8, where the measured and forecast ozone levels, in ppb, are compared with each other.The histograms in Figure 8 represent the frequency of the statistical probabilistic distribution for the assessed values of ozone, in ppb unit.From Figure 7, it is shown that the graph model can identify ozone peaks during the assessed period, an important feature in checking EPA compliance.It also followed closely the actual data trend over the period, indicating good overall performance.This can be better visualized in Figure 8, where the measured and forecast ozone levels, in ppb, are compared with each other.The histograms in Figure 8 represent the frequency of the statistical probabilistic distribution for the assessed values of ozone, in ppb unit.
From Figure 8, it is easier to observe the good agreement between forecast and predicted ozone values using the proposed GraphSAGE DNN model.The aggregated points around the regression line indicate good agreement between forecast and measured values of ozone level, with RMSE and R 2 being 3.8 ppb and 0.95, respectively.From Figure 8, it is easier to observe the good agreement between forecast and predicted ozone values using the proposed GraphSAGE DNN model.The aggregated points around the regression line indicate good agreement between forecast and measured values of ozone level, with RMSE and R 2 being 3.8 ppb and 0.95, respectively.

Ozone Concentration Prediction for 3 h Forecast Horizon
The results for ozone estimation considering a 3 h ahead forecasting horizon are presented next.Figure 9 depicts the results achieved using the proposed GraphSAGE model for the four proposed approaches.

Ozone Concentration Prediction for 3 h Forecast Horizon
The results for ozone estimation considering a 3 h ahead forecasting horizon are presented next.Figure 9 depicts the results achieved using the proposed GraphSAGE model for the four proposed approaches.Figure 9, for all time-lags and approaches 2, 3 and 4, demonstrates that increasing the number of time-lags insignificantly improves the proposed GraphSAGE's performance.Again, approach 1 reached the worst RMSE and Forecast Skill values.Approach 1 improved the model's performance from 1 to 3 h time-lags but stabilized for higher values.Approaches 2 and 4 again led to very similar results, approach 2 being the best one with   9, for all time-lags and approaches 2, 3 and 4, demonstrates that increasing the number of time-lags insignificantly improves the proposed GraphSAGE's performance.Again, approach 1 reached the worst RMSE and Forecast Skill values.Approach 1 improved the model's performance from 1 to 3 h time-lags but stabilized for higher values.Approaches 2 and 4 again led to very similar results, approach 2 being the best one with the lowest RMSE and Forecast Skill values of 6.45 ppb and 49% for 2 time-lags, respectively.Approach 3 once more did not show significative variation for the model's performance, but instead showed a degradation in the model's performance starting from 3 h time-lags.This may indicate that information from VOC data adds noise to the model using this specific approach.Figure 10 presents the regression values using the graph model, comparing them with actual measurements over an arbitrary reference period of the validation dataset.From Figure 10, it is possible to see that the model can still follow the trends on t actual data, but less accurately when compared with Figure 7, for a 1 h ahead forecastin horizon.For this case, some peaks are not properly reproduced, being underestimated the GraphSAGE model; this can be seen for the period between 31 January and 1 Februar This is also reflected in the scatter plot depicted in Figure 11.From Figure 10, it is possible to see that the model can still follow the trends on the actual data, but less accurately when compared with Figure 7, for a 1 h ahead forecasting horizon.For this case, some peaks are not properly reproduced, being underestimated by the GraphSAGE model; this can be seen for the period between 31 January and 1 February.This is also reflected in the scatter plot depicted in Figure 11.
As previously stated, as expected, the model's performance was inferior to the one for the 1 h forecast horizon.Figure 11 shows that for the 3 h ahead horizon, there is still good agreement between real and forecast ozone concentrations, but the points are more dispersed over the plot area.This sparser distribution leads to higher RMSE and lower R 2 , which reached the respective values of 6.45 ppb and 0.84.As previously stated, as expected, the model's performance was inferior to the one for the 1 h forecast horizon.Figure 11 shows that for the 3 h ahead horizon, there is still good agreement between real and forecast ozone concentrations, but the points are more dispersed over the plot area.This sparser distribution leads to higher RMSE and lower R 2 , which reached the respective values of 6.45 ppb and 0.84.

Ozone Concentration Prediction for 6 h Forecast Horizon
The following results regard the case for predicting ozone concentration for 6 h in advance.The effects of the different number of time-lags on the model's prediction for the 6 h forecast horizon are presented in Figure 12

Ozone Concentration Prediction for 6 h Forecast Horizon
The following results regard the case for predicting ozone concentration for 6 h in advance.The effects of the different number of time-lags on the model's prediction for the 6 h forecast horizon are presented in Figure 12  From Figure 12, predicting ozone levels 6 h into the future proved challenging for the model.All approaches, besides number 1, return values for RMSE and Forecast Skill very close together, indicating that the model is not relying on the different features used by each approach, as well as their past values (the model is quite insensible to the number of time-lags).Approach number 2 generated the best results for both assessed metrics, reach- From Figure 12, predicting ozone levels 6 h into the future proved challenging for the model.All approaches, besides number 1, return values for RMSE and Forecast Skill very close together, indicating that the model is not relying on the different features used by each approach, as well as their past values (the model is quite insensible to the number of time-lags).Approach number 2 generated the best results for both assessed metrics, reaching the best values for RMSE and Forecast Skill of 8.09 ppb and 57% for 7 h time-lags, respectively.Approaches 3 and 4 could not provide better results; they degraded the model's accuracy for time-lags greater than 3 h for approach 3 and 6 h for approach 4. The results from approach 1 indicate stable behavior after 3 h time-lags.Figure 13 shows how GraphSAGE forecasting followed the real measured ozone concentration trend.Again, comparing previous results for 1 and 3 h forecasting horizons, the pres case has the weakest performance among the three.For 6 h forecasting horizons, model mismatches the actual measurements, occasionally underestimating or overe mating peaks.Figure 14 displays the scatter plot for predicted values compared with tual measurements concerning the 6 h forecasting horizon.Again, comparing previous results for 1 and 3 h forecasting horizons, the present case has the weakest performance among the three.For 6 h forecasting horizons, the model mismatches the actual measurements, occasionally underestimating or overestimating peaks.Figure 14 displays the scatter plot for predicted values compared with actual measurements concerning the 6 h forecasting horizon.
The worst model's performance for 6 h forecasting horizon results in dispersed points in Figure 14, and the maximum forecast value is essentially 80 ppb.This 'peak shaving' phenomenon is common for longer time horizon forecasting once the model loses the time connection of the data and starts tending to give average values as output.Again, this leads to increased RMSE, 8.08 ppb and reduced R 2 , 0.75.The worst model's performance for 6 h forecasting horizon results in dispersed points in Figure 14, and the maximum forecast value is essentially 80 ppb.This 'peak shaving' phenomenon is common for longer time horizon forecasting once the model loses the time connection of the data and starts tending to give average values as output.Again, this leads to increased RMSE, 8.08 ppb and reduced R 2 , 0.75.

Importance of Different Input Attributes to Ozone Forecasting
Figure 15a-c shows for every studied forecasting horizon how important the attributes are for the GraphSAGE model for each case, accordingly to SHAP analysis considering the best results setup for each forecast horizon.
In Figure 15, the variables are ordered in a descending order from the most influential to the least influential attribute to predict future ozone levels.The right-side bar represents the feature value regarding its correlation to the model's output, where the higher the value the higher is the correlation between the variable and the forecast ozone level.Positive SHAP values indicate that positive values for the assessed variable contribute positively for the model's prediction, while negative values have negative influence over the model's output.
From Figure 15, the most influential attribute for (a) and (b) is the past hour's information of ozone data from the reference station, indicated by "Ozone_L(1)", while in pane (c) the variable "sin(HoD)" takes the lead as the most influential.The importance of the HoD variable for ozone prediction is due to its cyclic nature, reaching maximum values during daytime due to photochemical reactions, and minimum values during nighttime due to deposit and elimination processes [72,73].Data from the previous 1 h information for ozone concentration, e.g., "Ozone_L(1)", have a strong influence over the model's performance, indicating that information from further in the past adds little to future estimations due to its volatile behavior.It is also visible that past ozone information coming from stations other than the reference one, e.g., "Ozone_L(1)_Station(1)", also plays an important role in the model's predictive performance.For all forecasting horizons, ozone concentration from surrounding stations is always present on the top part of the SHAP plot, indicating a strong influence over the model.In Figure 15, the variables are ordered in a descending order from the most influential to the least influential attribute to predict future ozone levels.The right-side bar represents the feature value regarding its correlation to the model's output, where the higher the value the higher is the correlation between the variable and the forecast ozone level.Positive SHAP values indicate that positive values for the assessed variable contribute positively for the model's prediction, while negative values have negative influence over the model's output.
From Figure 15, the most influential attribute for (a) and (b) is the past hour's information of ozone data from the reference station, indicated by "Ozone_L(1)", while in pane (c) the variable "sin(HoD)" takes the lead as the most influential.The importance of the Still, in Figure 15, wind speed is another influential variable on ozone prediction by the GraphSAGE model for all forecasting horizons.For 1, 3 and 6 h horizons, the best results were achieved by approach 4, approach 2 and again approach 2, respectively.For the 1 h forecast, wind speed is the most influential weather variable, according to SHAP analysis.For 3 and 6 h, solar radiation is the most relevant weather variable, indicating that iterations between the Sun's radiation and ozone are more relevant to the model's predictive performance due to the daily photochemical reactions between ozone and solar radiation.This may explain the superior performance of approach 2 over the remaining ones for longer forecasting windows.

Model's Performance for the Determination of Compliance with EPA Regulations
To further assess the proposed GraphSAGE performance, the present work also considered EPA regulations concerning ozone levels in the atmosphere.The EPA states that a region is non-attained when the fourth-highest daily value for the daily maximum 8 h average of a 3 consecutive year average surpasses the threshold of 70 ppb.To verify the station's compliance to EPA regulations through GraphSAGE assessment, one year of data was used instead of the three consecutive years.The motivation behind this change was that more precise and detailed results from the graph model were desired and the three-year period average would lead to the same results as if measured yearly.The results for this analysis are presented in Figure 16, where the dotted red line represents the 70 ppb limit.
that iterations between the Sun's radiation and ozone are more relevant to the model's predictive performance due to the daily photochemical reactions between ozone and solar radiation.This may explain the superior performance of approach 2 over the remaining ones for longer forecasting windows.

Model's Performance for the Determination of Compliance with EPA Regulations
To further assess the proposed GraphSAGE performance, the present work also considered EPA regulations concerning ozone levels in the atmosphere.The EPA states that a region is non-attained when the fourth-highest daily value for the daily maximum 8 h average of a 3 consecutive year average surpasses the threshold of 70 ppb.To verify the station's compliance to EPA regulations through GraphSAGE assessment, one year of data was used instead of the three consecutive years.The motivation behind this change was that more precise and detailed results from the graph model were desired and the three-year period average would lead to the same results as if measured yearly.The results for this analysis are presented in Figure 16, where the dotted red line represents the 70 ppb limit.In Figure 16a, the proposed GraphSAGE model successfully classified the reference station as non-attained, even in the presence of a slight underestimation of ozone In Figure 16a, the proposed GraphSAGE model successfully classified the reference station as non-attained, even in the presence of a slight underestimation of ozone concentrations for every daily maximum.The same result was achieved for the 3 h forecasting horizon in panel (b), where the applied graph model was able to determine the nonattainment condition for the reference station.Lastly, in Figure 16c, the model failed by a tiny amount to correctly attribute the studied station as non-attained for the fourth-highest daily maxima, achieving the value of 69.90 ppb.However, this marginal difference does not mitigate the overall good model performance, since it was able to correctly predict the station's non-attained condition for 1 and 3 h ahead.

Discussion
The GraphSAGE model, developed in this research, proved to be a viable tool for assessing future ozone concentrations in urban areas.The graph structure allows better modeling for ozone, considering both its spatial and temporal characteristics.For the 1 h forecasting horizon, the model achieved the best overall performance.For this case of short-term prediction, approach 4 provided the best results.For such a configuration, the regression line comparing predicted and estimated values showed excellent agreement, with RMSE and R 2 values of 3.8 ppb and 0.95, respectively.Using time-lags greater than 4 h did not improve the models' performance.
The SHAP analysis showed that for this forecast horizon, wind speed acts as an essential variable for the model's performance, alongside spatiotemporal data from the surrounding stations and time/seasonal data such as HoD and DoY.The results achieved from SHAP analysis are in accordance with the ones found in the literature.In work [74], SHAP analysis identified that for ozone estimation DoY played a major role as a variable over their model prediction.However, results for horizons greater than 1 h were improved when approach number 2 was used, indicating that solar radiation starts to play an essential role in the model's prediction, and variables DoY and HoD are not enough to satisfactorily interpret the cyclic photochemical reactions between Sun and ozone [72,73].Concerning weather variables, they have been proved to exert an important influence over ozone levels [75][76][77][78][79], although they vary accordingly to each study.The importance of weather variables may indicate why approach 1 did not achieve satisfactory results when compared against the others, since it lacked weather data.Work [80] attested the importance of VOCs over ozone levels.However, approach 3, which uses VOC data, did not achieve results as good as approaches 2 and 4.This can be explained by data availability for this approach, since VOCs information limited the other input variables and the number of stations, resulting in the smallest set of stations.
According to EPA regulation, the nonattainment condition for the reference station was also tested using the proposed GraphSAGE model.In this sense, one year of data was evaluated considering 1, 3 and 6 h forecasting horizons.For 1 and 3 h ahead, GraphSAGE successfully attested the nonattainment condition over the reference station.For the remaining horizon, it did not identify the nonattainment state for the reference location.Nevertheless, GraphSAGE's performance is not eclipsed by this result alone since the difference between the forecasted and actual values was minimal.
The developed GraphSAGE models proved to be able to reach good predictive results and can determine the future condition of the reference station in advance.To verify their performance within the ozone forecasting field, results drawn from the literature were compiled to be compared.Table 1 summarizes the model's results for RMSE, Forecast Skill and R 2 , while Table 2 compiles results for ozone prediction studies from the literature.Compared with the attention-based models proposed in [81,82], GraphSAGE has proved to reach better RMSE values for all the forecasting horizons, achieving a mean improvement of 39% for the former and over 72% for the later.Still, concerning work [81], the graph-based model did not surpass the reference one considering the coefficient of determination R 2 , which indicates better modeled values by the reference model.This, however, does not mitigate the superior performance for GraphSAGE regarding RMSE.
Comparing this study's results against deep learning approaches in [83][84][85], once again, the graph model proved to be of competitive or superior performance.Although it is not possible to compare GraphSAGE results accordingly to specific seasons, the included attribute of DoY should consider, in part, seasonal variations over ozone concentration.This way, in [83], GraphSAGE proved to capture spatiotemporal relations underlying ozone modeling better, improving ozone prediction by 59.36% over the reference model for winter and spring, and was 65.55% superior for summer and autumn for the 1 h horizon, indicating better generalization by the GraphSAGE model.Comparing our model with reference [84], which is based on LSTM and CTM, for the 6 h forecasting horizon GraphSAGE did not surpass this reference's result.However, the values are still in the same range, proving that the graph model offers competitive results for ozone forecasting.Lastly, comparing the XGBoost-based reference model in [85], RMSE improved by 71% for the assessed horizon.

Conclusions
In the present work, a graph-based model named GraphSAGE was developed to assess future ozone concentration.Using historical data from Houston, Texas, ranging from 2011 to 2019, the model was trained and tested for different dataset configurations, variables, time-lags and forecast horizons.Four different approaches were tested to obtain the best prediction for ozone: the first approach considered past ozone concentration; approach 2 considered NO 2 /NO x and weather information (wind speed, wind direction, outdoor temperature, relative humidity, solar radiation); approach 3 considered VOCs information to assess future ozone concentration; and approach 4 considered the greatest number of weather stations containing NO 2 /NO X , weather and HoD/DoY information.
The resultant values for predicted ozone were assessed using the persistence model as benchmarking, alongside RMSE, R 2 and Forecast Skill metrics.The results showed that for the 1 h horizon, approach 4 led to better results for 4 h time-lags, indicating that further increasing this parameter would rather add noise to the model.For 3 and 6 h horizons, approach 2 returned the best metric values.The respective time-lags for these horizons were 2 and 7 h.The RMSE and R 2 values were 3.80 ppb and 0.95 for the 1 h forecasting horizon, 6.45 ppb and 0.84 for the 3 h horizon and 8.09 ppb and 0.75 for the 6 h forecasting horizon.Compared with the persistence benchmarking model, Forecast Skill showed improvement for GraphSAGE of 33.7%, 48.7% and 57.1% for 1, 3 and 6 h horizons, in that order.The model's performance was also evaluated for its capacity to attest to the nonattainment condition for the reference station.Concerning 1 and 3 h horizons, GraphSAGE successfully predicted the nonattainment state for the reference location.For 6 h ahead forecasting, it failed to correctly predict such a state by a minimal difference, which does not mitigate the overall good model performance.Lastly, the graph-based model also showed good performance when compared with the literature results.It improved ozone forecasting by 72% when compared with attention-based models and 71% when compared with XGBoost-based models.The results place GraphSAGE as the state-of-the-art model for pollutant assessment.

Atmosphere 2023 , 27 Figure 1 .
Figure 1.Map of the Houston area and weather stations.

Figure 1 .
Figure 1.Map of the Houston area and weather stations.

Figure 3 .
Figure 3. Example of logic representation for the graph topology.

Figure 3
Figure3depicts the graph approach using five nearby stations, represented as an example.From the figure, nearby stations send information to the referen station, located in the middle of the image at the endpoint of the arrow's heads, a ing data to itself.

Figure 3 .
Figure 3. Example of logic representation for the graph topology.

Figure 5 .
Figure 5. Influence of different sizes for the training dataset for GraphSAGE model compared with benchmarking models LASSO and persistence in terms of RMSE (a) and Forecast Skill (b).

Figure 5 .
Figure 5. Influence of different sizes for the training dataset for GraphSAGE model compared with benchmarking models LASSO and persistence in terms of RMSE (a) and Forecast Skill (b).

Atmosphere 2023 ,
14, x FOR PEER REVIEW 10 of 27 years has a positive influence over the GraphSAGE model, reaching the optimum value of 4.03 ppb for seven years.The baseline model LASSO showed worse performance, with a larger RMSE of 5.74 ppb for the same training data size.Figure5bleads to similar results in terms of Forecast Skill, also indicating that both models were able to beat persistence.Once again, the graph-based model outperforms the baseline LASSO model for all assessed configurations, reaching peak results for seven years and Forecast Skill of 30% compared to persistence, whilst LASSO achieved 17%.

Figure 6 .
Figure 6.Effect of different number of time-lags on the model's prediction for 1 h forecast horizon in terms of RMSE (a) and Forecast Skill (b).

Figure 6 .
Figure 6.Effect of different number of time-lags on the model's prediction for 1 h forecast horizon in terms of RMSE (a) and Forecast Skill (b).

Figure 7 .
Figure 7.Comparison between forecast ozone values using GraphSAGE and validation dataset and actual measured concentrations for 1 h forecasting horizon.

Figure 7 .
Figure 7.Comparison between forecast ozone values using GraphSAGE and validation dataset and actual measured concentrations for 1 h forecasting horizon.

Figure 8 .
Figure 8. Scatter plot with marginal distribution for GraphSAGE model forecasting ozone 1 h ahead.

Figure 8 .
Figure 8. Scatter plot with marginal distribution for GraphSAGE model forecasting ozone 1 h ahead.

Atmosphere 2023 , 27 Figure 9 .
Figure 9.Effect of the different number of time-lags on the model's prediction for 3 h forecast horizon in terms of RMSE (a) and Forecast Skill (b).

Figure 9 .
Figure 9.Effect of the different number of time-lags on the model's prediction for 3 h forecast horizon in terms of RMSE (a) and Forecast Skill (b).

Figure
Figure 9, for all time-lags and approaches 2, 3 and 4, demonstrates that increasing the number of time-lags insignificantly improves the proposed GraphSAGE's performance.Again, approach 1 reached the worst RMSE and Forecast Skill values.Approach 1 improved the model's performance from 1 to 3 h time-lags but stabilized for higher values.Approaches 2 and 4 again led to very similar results, approach 2 being the best one with the lowest RMSE and Forecast Skill values of 6.45 ppb and 49% for 2 time-lags, respectively.Approach 3 once more did not show significative variation for the model's performance, but instead showed a degradation in the model's performance starting from 3 h time-lags.This may indicate that information from VOC data adds noise to the model using this specific approach.Figure10presents the regression values using the graph model, comparing them with actual measurements over an arbitrary reference period of the validation dataset.

Atmosphere 2023 ,Figure 10 .
Figure 10.Comparison between forecasted ozone values using GraphSAGE for the validation d taset and actual measured concentrations for 3 h forecasting horizon.

Figure 10 .
Figure 10.Comparison between forecasted ozone values using GraphSAGE for the validation dataset and actual measured concentrations for 3 h forecasting horizon.

Figure 11 .
Figure 11.Scatter plot with marginal distribution for GraphSAGE model forecasting ozone 3 h ahead.
in terms of RMSE (a) and Forecast Skill (b).

Figure 11 .
Figure 11.Scatter plot with marginal distribution for GraphSAGE model forecasting ozone 3 h ahead.

27 Figure 12 .
Figure 12.Effect of the different number of time-lags on the model's prediction for 6 h forecast horizon in terms of RMSE (a) and Forecast Skill (b).

Figure 12 .
Figure 12.Effect of the different number of time-lags on the model's prediction for 6 h forecast horizon in terms of RMSE (a) and Forecast Skill (b).

Atmosphere 2023 ,Figure 13 .
Figure 13.Comparison between forecast ozone values using GraphSAGE and validation dat and actual measured concentrations for 6 h forecasting horizon.

Figure 13 .
Figure 13.Comparison between forecast ozone values using GraphSAGE and validation dataset and actual measured concentrations for 6 h forecasting horizon.

Figure 14 .
Figure 14.Scatter plot with marginal distribution for GraphSAGE model forecasting ozone 6 h ahead.

Figure 15 (
Figure15 (panes (a) to (c)) shows for every studied forecasting horizon how important the attributes are for the GraphSAGE model for each case, accordingly to SHAP analysis considering the best results setup for each forecast horizon.

Figure 14 .
Figure 14.Scatter plot with marginal distribution for GraphSAGE model forecasting ozone 6 h ahead.

Figure 16 .
Figure 16.Determination of nonattainment status for reference station for 1 h (a), 3 h (b) and 6 h (c) forecasting horizons.

Figure 16 .
Figure 16.Determination of nonattainment status for reference station for 1 h (a), 3 h (b) and 6 h (c) forecasting horizons.

Table 1 .
Summary of results for ozone prediction using GraphSAGE model.

Table 2 .
Literature values for ozone prediction.