Using Neural Network NO2-Predictions to Understand Air Quality Changes in Urban Areas—A Case Study in Hamburg

Jesemann, Anne-Sophie; Matthias, Volker; Böhner, Jürgen; Bechtel, Benjamin

doi:10.3390/atmos13111929



¹ Department of Physical Geography, Universität Hamburg, 20146 Hamburg, Germany

² Institute of Coastal Environmental Chemistry, Helmholtz Zentrum Hereon, 21502 Geesthacht, Germany

³ Department of Geography, Ruhr-Universität Bochum, 44801 Bochum, Germany

* Authors to whom correspondence should be addressed.

Atmosphere 2022, 13(11), 1929; https://doi.org/10.3390/atmos13111929

Submission received: 30 September 2022 / Revised: 13 November 2022 / Accepted: 17 November 2022 / Published: 19 November 2022

(This article belongs to the Special Issue Air Quality Impacts of Vehicle Emissions)


Abstract

Due to the link between air pollutants and human health, reliable model estimates of hourly pollutant concentrations are of particular interest. Artificial neural networks (ANNs) are powerful modeling tools capable of reproducing the observed variations in pollutants with high accuracy. We present a simple ANN for the city of Hamburg that estimates the hourly NO₂ concentration. The model was trained with a ten-year dataset (2007–2016), tested for the year 2017, and then applied to assess the efficiency of countermeasures against air pollution implemented since 2018. Using meteorological data together with temporal variables that describe the weekday-dependent traffic variability as predictors, the model performed accurately and showed high consistency over the test data. The approach proved to be very efficient in detecting anomalies in the time series. The further the prediction was from the time of the training data, the more the modeled data deviated from the measured data. Using the model, we could detect changes in the time series that did not follow previous trends in the training data. The largest deviation occurred during the COVID-19 lockdown in 2020, when traffic volumes decreased significantly. In conclusion, the ANN-based approach proved suitable for modeling the NO₂ concentrations and allowed for the assessment of the efficiency of policy measures addressing air pollution.

Keywords:

air quality; nitrogen dioxide; artificial neural networks; urban air

1. Introduction

Air quality is a complex, multifactorial system and, at the same time, a highly relevant topic due to its connection to human health. Numerous studies have demonstrated the link between cardiovascular and lung disease and long-term pollutant exposure, specifically to nitrogen dioxide (NO₂) and particulate matter (PM₂.₅ and PM₁₀). According to the European Environment Agency [1], about 55,000 premature deaths in the EU in 2018 could be attributed to the effects of NO₂ exposure on the population. Evaluations of several clinical and epidemiological studies state that there is at least moderate evidence that adverse health effects occur even with short-term pollutant exposure such as exposure below the specified limits [2].

In many large cities, critically high concentrations of pollutants are measured, affecting quality of life and public health. This has also been the case in many German cities. Attention to this problem increased even more after the so-called “diesel scandal”, which involved the manipulation of exhaust gas emissions by the automotive industry. In 2017 and 2018, lawsuits were filed against many German cities for not complying with the legal air quality limits [3]. This has sparked considerable attention to the topic of air quality and its effects on human health in German media and politics.

There is growing research interest in models that can consistently represent air quality in order to identify times of particularly high pollution and to be able to react to short-term changes in air quality, especially in urban areas. The deterministic prediction of pollutants is connected with a number of uncertainties given the complexity of the physical and chemical processes that influence the formation and transport of pollutants in the urban atmosphere [4,5,6].

Therefore, sophisticated machine learning methods are increasingly applied in air quality modeling, outperforming traditional statistical approaches. In a review by Cabaneros et al. [7] examining how many studies have employed artificial neural networks (ANNs) to model pollutants since 2001, a total of 139 studies were found, of which 51 papers have used this method to predict nitrogen oxides, while other studies have focused on modeling different pollutants such as particulate matter (PM), carbon dioxide (CO₂), or ozone (O₃).

ANNs can capture complex, non-linear relationships between meteorological variables and pollutant concentrations and are capable of generalizing from experience gained from training datasets to form functional relationships between variables, even if the nature of the relationships is unknown. Unlike regression analysis, ANNs perform well with a lot of noise in the data [8].

Some of the first successful applications of ANNs for modeling NO₂ concentrations in urban areas were reported in the late 1990s and early 2000s by Gardner and Dorling [9], Kolehmainen et al. [10], and Perez and Trier [11], showing how the proposed approaches outperformed regression-based models. Since then, more recent studies have also presented striking results using neural networks for modeling nitrogen oxides, covering nationwide nitrogen oxide emissions over large time periods (Stamenković et al. [12]) as well as local emissions on an hourly basis [6]. Some studies include multiple air pollutants in the models such as Jiang et al. [13], where a neural network and a heuristic algorithm were combined to develop an early-warning system for five different pollutants.

A key issue in the development of machine learning air quality models is the selection of suitable input parameters. NO₂ levels in urban air are influenced by numerous variables representing the meteorological and pollution source conditions. Some studies have found meteorological variables such as temperature and wind speed, as well as concentrations of other pollutants, to be important predictors [14,15].

Another option is to consider the previously measured concentrations of the desired pollutant as predictors by using the temporal autocorrelation of the relationship between successive values of the same variable. This works particularly well if a forecast is to be made several hours in advance [16,17]. Some studies have combined this approach with a recurrent long short-term memory (LSTM) network to successfully predict NO₂ up to 8 h in advance [18]. Dai et al. [19] used LSTM combined with convolutional neural networks (CNNs). Their model could be applied to predict six different pollutants. These models often show good results but are computationally more costly than simple feed forward networks.

Other studies have used traffic volume data, either from traffic counts or from other models [20,21,22]. Given that traffic is one of the main drivers of high NO₂ concentrations, traffic counts have high predictive power. However, these data are rarely available or not well-resolved in time. He et al. [23] quantified the effects of different predictors on the day-to-day variations in the pollutant concentrations, finding that local meteorological conditions were one of the most important factors. Some studies [10,24] have suggested that good results can be achieved when combining meteorological data with periodic variables such as hour of the day (HOD) or day of the week (DOW). Radojević et al. [25] analyzed the importance of periodic parameters, specifically month of the year (MOY) for the modeling of pollutant concentrations, showing that models based on periodic parameters outperformed models solely based on meteorological predictors.

Many studies applying ANN modeling in air quality have only considered a short time series, in which traffic behavior and thus also the NO₂ concentration is largely homogeneous. This provides little knowledge about how well the models can generalize and whether they are representative for the longer-term pollutant variations at the site where they are applied. Other models that only predict the daily averages are not suitable for use in short-term interventions. There is a lack of simple models capable of estimating the current pollutant concentrations at higher temporal resolution while using easily available input data.

In our study, we attempted to develop a model that could accurately estimate the real-time hourly NO₂ concentration in the city of Hamburg, Germany, using only easily accessible standard meteorology and time data as the input variables. For this, we used an extensive training dataset spanning ten years. Furthermore, we explored the possibilities of using this model to assess the effects of countermeasures to air pollution.

As in most cities, air quality in Hamburg is monitored at a few isolated measuring stations distributed throughout the city. Several monitoring stations have already recorded high NO₂ levels above the legal limits in the past years. For this case study, Stresemannstrasse station in the inner city of Hamburg was used as it is known to be a pollutant hotspot, frequently exceeding the legal thresholds for NO₂ concentrations. This location is particularly relevant because of the high housing density in its immediate vicinity. After the limit value for the NO₂ concentrations had been regularly exceeded and the annual average NO₂ concentrations showed only slowly decreasing values for several years, some countermeasures were taken by municipal authorities of the city of Hamburg to accelerate the reduction in the NO₂ concentration at Stresemannstrasse station. These include a diesel vehicle entry restriction enacted in 2018. The COVID-19 lockdown in the spring of 2020 also affected the local pollution levels, leading to a visible decrease in the pollutant concentrations.

These developments make the station a particularly interesting place to study. In the first step, a NO₂ forecasting model was developed and examined for its general performance in 2017. Afterward, it was used in two experiments:

(i): To assess the effect of the implementation of driving restrictions from 2018 onward;
(ii): To assess the effect of the COVID-19 lockdown in spring of 2020.

2. Materials and Methods

2.1. Study Area

The city of Hamburg is in the north of Germany by the Elbe River, about 100 km southeast of the North Sea. The climate is humid throughout the year and oceanic due to the maritime influence caused by predominant westerlies. Approximately 1.8 million people live in Hamburg.

The street in which the air quality monitoring station used for this study is located is close to the city center. It is a busy main traffic artery connecting the motorway and the city center. At the same time, the residential density along this street is very high. For years, both the annual average concentration measured at this station and the number of days with peak pollution have been above the legal limits. Therefore, this area is of high relevance for local politics. On 31 May 2018, a new law was implemented that banned the passage of many diesel vehicles on this exact street.

2.2. Experimental Design

Our study was divided into model evaluation and two experiments, as seen in the workflow in Figure 1. The first step was the development of a model that estimated the hourly NO₂ concentration at the selected location. ANN models were trained with data from 10 years (2007 to 2016), and then tested for 2017. As part of the model evaluation, we analyzed the prediction over the course of the test year (2017).

In the first experiment, the best performing model was selected to predict the hourly NO₂ values for the years 2018 and 2019. Since specific vehicles were banned from the surrounding street from mid 2018 onward, the goal of this experiment was to assess whether these restrictions had an impact on the NO₂ levels. For this purpose, a comparison was made with a second location in Hamburg, Habichtstrasse station, which is also a traffic station, but where no driving bans have been enacted in its vicinity.

In the second experiment, NO₂ concentrations in the first half of 2020 were predicted, which were affected by substantial COVID-19 related lockdown measures. The deviations between observed and estimated data were then used to analyze how events and actions taken in the past three years have impacted the NO₂ concentrations.

2.3. Model

After pre-testing possible model designs, we selected a feedforward neural network containing an input and an output layer, and in between one hidden layer with four times as many neurons as input features.

We applied dropout regularization during training in the hidden layer to prevent overfitting as suggested in Srivastava et al. [26]. Input units were randomly set to zero with a frequency of 0.2. No values were dropped during inference. For the transfer function in the hidden layer, the ReLU function was used, and for the output layer, a linear function, as these achieved the best results. We used the mean squared error (MSE) as the loss function and optimized the network parameters with the AdaGrad algorithm [27]. In addition, early stopping was employed: the training was interrupted if the validation loss did not improve by at least 0.1 within 10 epochs.
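As an illustration only, the network described above can be sketched in plain NumPy. The function names, the weight initialization, and the use of inverted dropout on the hidden activations are assumptions made for this sketch and are not taken from the study. The early-stopping helper encodes one possible reading of the stopping rule (minimum improvement of 0.1, patience of 10 epochs).

```python
import numpy as np

def init_params(n_features, rng):
    """One hidden layer with four times as many neurons as input features."""
    n_hidden = 4 * n_features
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_features, n_hidden)),  # assumed init
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X, rng=None, drop_rate=0.2, training=False):
    """ReLU hidden layer, dropout on the hidden units, linear output."""
    h = np.maximum(0.0, X @ params["W1"] + params["b1"])
    if training:
        # Inverted dropout (assumption): zero units with probability
        # drop_rate and rescale the survivors by 1 / (1 - drop_rate).
        keep = rng.random(h.shape) >= drop_rate
        h = h * keep / (1.0 - drop_rate)
    # Nothing is dropped at inference time.
    return (h @ params["W2"] + params["b2"]).ravel()

def mse(y_true, y_pred):
    """Mean squared error, used as the loss function."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(diff ** 2))

def should_stop(val_losses, min_delta=0.1, patience=10):
    """True if the best validation loss of the last `patience` epochs is not
    at least `min_delta` lower than the best loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_before - best_recent < min_delta
```

The AdaGrad update itself is omitted here; in practice, a deep learning framework would supply the optimizer, the dropout layer, and the early-stopping logic.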

In addition, we ran a multiple linear regression (MLR) with the same predictors. The performance of both the ANN and the MLR was then compared using the coefficient of determination (R²), the root mean square error (RMSE), and the Index of Agreement (IA). These parameters are commonly used and provide a measure of the quality of the prediction as well as information about the errors. Both RMSE and IA are sensitive to outliers and penalize large deviations. The IA reflects the degree of the prediction error in a standardized way and is therefore well-suited for comparing the model with the results of other studies. The closer the agreement value is to 1, the better the agreement between the model and the observation.
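The three performance measures can be written compactly as follows. This is a sketch under two assumptions that the text does not state explicitly: that R² is computed as one minus the ratio of residual to total sum of squares, and that IA refers to Willmott's index of agreement in its original squared-difference form. The function names are illustrative.

```python
import numpy as np

def r_squared(obs, pred):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.sqrt(np.mean((obs - pred) ** 2))

def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition of IA).

    1 means perfect agreement; squared terms make it sensitive to outliers.
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    o_mean = obs.mean()
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - o_mean) + np.abs(obs - o_mean)) ** 2)
    return 1.0 - num / den
```

For example, observations `[1, 2, 3, 4]` and predictions `[2, 3, 4, 5]` give an RMSE of 1, an R² of 0.2, and an IA of 0.84 under these definitions.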

To ensure the reproducibility of the model’s performance, we trained 100 ensembles of neural networks and calculated the means for each performance parameter as well as the standard deviation.

2.4. Data

The correct choice of input parameters, especially the representation of traffic, plays a significant role in predicting the pollutant concentrations. Many studies have used concentrations of other pollutants such as NO or O₃ to predict NO₂ [15,17]. This is of limited practical use for the intent of this study. More specifically, the trained model represents the status quo and predicts the NO₂ concentration without measures. As shown in Table 1, we mainly used meteorological data as input variables, along with temporal predictors to represent the diurnal, weekly, and seasonal variations in the traffic volumes. Meteorological predictors were selected according to known meteorological influences on the pollutant concentrations and dispersion. Temporal variables were included to learn diurnal, weekly, and seasonal emission characteristics. Similar predictors have been used in various studies [24]. The datasets were compiled from January 2007 up to the end of June 2020.

The data for pollutant concentrations at Stresemannstrasse station as well as for the second air monitoring station, Habichtstrasse, used for the comparison were provided by the Hamburg Air Monitoring Network. Most of the meteorological data for the same period were measured at the weather mast of the Meteorological Institute of the University of Hamburg (UHH) and provided by the institute. The mast is approximately 10 km southeast of Stresemannstrasse station and the observations are not disturbed by surrounding buildings. Where the weather mast data contained too many missing values, meteorological data were instead extracted from the Climate Data Center of the German Weather Service.

At the weather mast, temperature and relative humidity are measured at a height of 2 m, and radiation, wind speed, and wind direction at a height of 10 m.

Upper meteorological parameters such as mixing height and ventilation coefficient have a large impact on NO₂ concentration. Since these data are not routinely available from meteorological observations, vertical temperature gradients were calculated as a proxy, using temperature measured at heights of 50 m, 70 m, 110 m, and 175 m.
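A vertical temperature gradient of this kind is a finite difference of temperature over height. The sketch below assumes that gradients are formed between adjacent measurement levels; the text does not specify which level pairs were used, and the names are illustrative.

```python
import numpy as np

# Measurement heights named in the text, in metres.
HEIGHTS_M = np.array([50.0, 70.0, 110.0, 175.0])

def vertical_temperature_gradients(temps):
    """Temperature change per metre between adjacent levels (assumed pairing).

    temps: array of shape (n_hours, 4), one column per height in HEIGHTS_M.
    Returns an array of shape (n_hours, 3). Positive values indicate that
    temperature increases with height, i.e. an inversion.
    """
    temps = np.asarray(temps, float)
    return np.diff(temps, axis=-1) / np.diff(HEIGHTS_M)
```

A profile of 10.0, 9.8, 9.4, and 8.75 °C at the four levels, for instance, yields a gradient of about -0.01 K per metre in each layer.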

The variable wind direction was transformed using sine and cosine to avoid inconsistency at 0/360°. All data were scaled to a range from 0 to 1 as the neural network can handle this better.
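The two preprocessing steps can be sketched as follows. The function names are illustrative, and the option to pass fixed minimum and maximum values (for example, to reuse the scaling range on later data) is an assumption of this sketch rather than a detail from the study.

```python
import numpy as np

def encode_wind_direction(degrees):
    """Map wind direction to (sin, cos) so that 0° and 360° coincide."""
    rad = np.deg2rad(np.asarray(degrees, float))
    return np.sin(rad), np.cos(rad)

def min_max_scale(x, x_min=None, x_max=None):
    """Scale values to [0, 1]; the range defaults to that of x itself."""
    x = np.asarray(x, float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return (x - x_min) / (x_max - x_min)
```

With this encoding, directions of 359° and 1° end up close together in the input space, whereas the raw values would be almost 360 units apart.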

In order to enable a recognition of temporal patterns in the NO₂ fluctuation, we allowed the model to detect the typical annual, weekly, and diurnal variation of the pollutant. Time variables reflecting different frequencies in the observed data were therefore included considering the month of year (MOY), day of the week (DOW), and hour of the day (HOD). In previous studies, these variables have often been transformed and presented as sinusoidal functions, as we did with wind direction. We chose dummy encoding [28] because there was no natural ordinal relationship between individual time steps.
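As a sketch of this encoding, each time variable can be expanded into binary indicator columns. Whether one reference category per variable was dropped is not stated in the text, so the `drop_first` switch below is an assumption, as are the function names and the index conventions (months 1 to 12, weekdays 0 to 6, hours 0 to 23).

```python
import numpy as np

def dummy_encode(values, n_categories, drop_first=False):
    """Binary indicator columns for integer categories 0..n_categories-1."""
    values = np.asarray(values, dtype=int)
    encoded = np.eye(n_categories)[values]
    # Optionally drop the first category as the reference level.
    return encoded[:, 1:] if drop_first else encoded

def encode_time(month, weekday, hour, drop_first=False):
    """Stack MOY (12), DOW (7), and HOD (24) indicator columns."""
    return np.hstack([
        dummy_encode(np.asarray(month) - 1, 12, drop_first),
        dummy_encode(weekday, 7, drop_first),
        dummy_encode(hour, 24, drop_first),
    ])
```

Without dropping reference categories, this produces 43 binary inputs per hour, exactly three of which are set to 1.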

Additionally, we had to consider that there is already a trend in the average measured NO₂ concentration over the past 10 years, which was observed in many locations worldwide [29]. In Europe, this was caused by EU-wide emissions directives, with stricter emission limits for newer cars. Consequently, fleet modernization results in reduced total emissions. The input dataset covers a period in which most of the cars have been replaced by newer ones and the trend in stricter emission standards is not reflected by the input variables. To compensate for this, another variable was added: a counter that increases linearly over the entire dataset and was also scaled to a range from 0 to 1 at the end. Boxplots of the input variables are provided in Figure 2.
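A counter of this kind could be built as in the sketch below. It assumes that the counter follows elapsed time rather than the row index, so that skipped hours do not distort the linear increase; the text does not specify this detail, and the function name is illustrative.

```python
import numpy as np

def trend_counter(timestamps):
    """Linear counter over the dataset, scaled to the range 0 to 1.

    timestamps: hourly time stamps (strings or datetime64 values).
    The counter is proportional to the time elapsed since the first value
    (assumption), so gaps in the series leave matching gaps in the counter.
    """
    t = np.asarray(timestamps, dtype="datetime64[h]")
    elapsed = (t - t.min()).astype(float)  # hours since the first time stamp
    return elapsed / elapsed.max()
```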

Overall, there were only a few missing values in the dataset, so different seasons and weather conditions were well-represented. For modeling, the missing values were skipped. In the end, more than 100,000 values for each variable could be used for the model.

3. Results

3.1. Model Evaluation

For all evaluation measures, the neural network performed better than the multiple linear regression (MLR), as can be seen in Table 2, reaching an R² of 75% on the training set and 69% on the test set. Notably, the error in the test dataset was smaller than in the training data. This is due to the higher variability in the training dataset.

As shown in Figure 3, the model showed good temporal consistency throughout the time series and could estimate the hourly NO₂ concentration in 2017 quite accurately. However, the forecast was somewhat less accurate in spring and summer compared to the winter half of the year.

The weekly averages in Figure 4 show the typical weekly and diurnal variation pattern of the observed and predicted NO₂ values in the test period. The ranges overlap for the most part, showing that the model is very capable of capturing the diurnal cycle.

Working days have two peaks, one in the morning and one later in the afternoon, corresponding to the hours with high traffic load due to commuter and delivery traffic. In between, the concentration drops slightly. At night, it drops to a minimum as the traffic decreases significantly. The local maxima drop slightly toward the end of the week. On weekends, the morning peak is missing, and only a local maximum is seen in the afternoon.

Figure 4 clearly shows that the model captured the diurnal cycle very accurately and reflected the differences between the morning and afternoon peaks. Differences between the days of the week could also be distinguished.

Since the model was less accurate in the summer months, we also looked at the weekly cycle in individual seasons. In Figure 5, the differences between the summer months (June to August) and the winter months (December to February) are displayed. In winter, the morning and afternoon peaks were quite similar during the week. Here, the forecast and the observation corresponded very well. In summer, the observed afternoon peak was often considerably higher than the modeled value. Overall, the observed concentrations at this station were higher in summer than in winter. This is contrary to the usual expectation: particularly high NO₂ concentrations are likely to occur in winter, when frequent ground inversions impede the vertical air exchange, while lower NO₂ observations in summer can often be explained by an increased photochemical cycle, resulting in higher ozone concentrations. In the case of this traffic station, higher concentrations in summer could be due to an insufficient amount of ozone at this location. The model had no information about the presence of reactants and therefore underestimated these concentrations.

For a better understanding of the variation in the model performance, it is helpful to look at a few shorter time periods. Figure 6 shows the first week of each quarter in 2017, comparing the predicted and the observed values.

As shown in Figure 6a, the diurnal variability caused by traffic peak hours was captured quite accurately. In the absence of detailed information on the current traffic situation or the concentrations of other pollutants in the predictor set, the neural network learns typical diurnal and weekly cycles such as working days and weekends. Therefore, weekdays that deviate from this pattern due to special events such as public holidays show large deviations between the model and observation. This is the case, for example, on Tuesday, 3 October 2017 (Figure 6d), which is a public holiday in Germany. The model assumes a regular weekday traffic pattern instead, resulting in a significant overestimation of the observed concentrations.

While this example reflects certain limitations in operational monitoring and prediction applications, it shows the model’s capability to detect anomalies in the time series. The biggest outlier in the entire test period occurred at the beginning of July 2017, when very high NO₂ concentrations were measured during the night from Thursday to Friday (Figure 6c). These were not expected based on the trained model, underlining the difficulty in predicting exceptional pollution events.

During this period, the G20 summit was held in Hamburg. It was accompanied by large street protests and riots. Thus, the high value is most likely related to various fires set by protesters during the largest riots, which started late in the evening on 6 July in the immediate vicinity of the measuring station. This event outside the usual pattern was not captured by the model but could be detected by a comparison of the prediction and the observation. It confirms that individual events can have a considerable influence on short-term pollutant fluctuations.

To reflect the continuously decreasing NO₂ concentration in the past years, a counter was used as an input variable for the model. To quantify the value added by using the counter, we compared the model residuals for a model trained with a counter and a model trained without a counter.

We defined the model residuals as the observed values minus the predicted values, so that negative residuals indicate an overestimation by the model. Figure 7 shows the four-weekly moving averages of the model residuals (with and without the counter) over the period from early 2017 to the beginning of 2020.
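The residual series and its four-weekly moving average can be computed as sketched below. The sketch assumes a gap-free hourly series, so that four weeks correspond to 4 × 7 × 24 = 672 values; the function names and the use of a simple unweighted window are illustrative choices.

```python
import numpy as np

HOURS_PER_FOUR_WEEKS = 4 * 7 * 24  # 672 hourly values

def residuals(observed, predicted):
    """Observed minus predicted; negative values mean overprediction."""
    return np.asarray(observed, float) - np.asarray(predicted, float)

def moving_average(x, window=HOURS_PER_FOUR_WEEKS):
    """Unweighted moving average over complete windows only."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, float), kernel, mode="valid")
```

Because only complete windows are kept, the smoothed series is `window - 1` values shorter than the input.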

We can deduce that the counter as an input variable is essential to capture the trend over the 10 years prior to 2017. Without the counter, the model failed to accurately predict the NO₂ concentrations in 2017. Over the course of time, however, this is still not sufficient to correctly represent the concentration variability due to changes in traffic. We can identify some periods in which the residuals are considerably higher, and periods in which the difference is smaller. Over time, the overestimation of the prediction increases continuously. This suggests that either several effects cause NO₂ concentrations to decrease, or existing effects do not develop linearly. Apart from the modernization of the vehicle fleet (e.g., due to stricter emission standards), which is represented by the counter, other effects may also be involved. Other studies suggest that climate change policies, aiming to reduce coal and oil consumption, contribute to reductions in NOₓ emissions [30]. These reductions do not necessarily take place in traffic but may also be seen in the measured time series and may be caused by lower energy consumption in nearby buildings.

3.2. Experiment 1

The trend of the overprediction of NO₂ concentrations could also be related to the restrictions on diesel vehicles that came into effect on 1 June 2018. It is not simple to assess this effect, as the tendency to overestimate values could already be observed before the restrictions came into force. We therefore considered data of a second air monitoring station in Hamburg, Habichtstrasse, and used these to compare the development of concentrations over the years to possibly deduce whether the restrictions had an effect.

This station is similar to the first one in terms of its proximity to a street with heavy traffic loads. The difference between these two stations, however, is that no direct actions were taken at Habichtstrasse during the period in question to help reduce car traffic, and no driving restrictions were applied in its vicinity in the past years.

We fitted the model at this location, achieving similar results as at the first location. The performance parameters for 2017 were almost the same with an R² on the test set of 0.7305, RMSE of 16.33, and IA of 0.9140. We then compared the model residuals over the years shown in Figure 8. The vertical line in the graph marks the point in time when driving restrictions were applied on Stresemannstrasse (i.e., June 2018). We considered the comparison until the beginning of 2020 to ensure that, on the one hand, a potential effect of the entry restrictions did not occur with a lag, and on the other hand, possible differences were not temporary or coincidental.

Although the level of the residuals fluctuates, it can be seen that the predicted values of the model were increasingly above the observed values at both locations. The two locations did not appear to display different overall trends. A noticeable difference between the model residuals can be seen toward the end of the year in 2019. This may be explained by a construction site at the location Habichtstrasse, leading to a reduction in traffic at that time. These findings suggest that the trend toward the overestimation of NO₂ concentrations is not due to a single measure such as the restriction on diesel vehicles. It appears rather plausible that the reduction in pollutant concentrations is caused by multiple effects.

3.3. Experiment 2

The course of the COVID-19 outbreak can be traced through the model residuals. In the second experiment, nitrogen dioxide concentrations were calculated for the first half of 2020. This year differed from the previous ones, as it was influenced by the onset of the COVID-19 pandemic. After early cases occurred in Germany in February and March, the first nationwide lockdown came into effect on 22 March 2020.

When looking at the model residuals for each week in 2020 (Figure 9), we can see that the model almost continuously predicted too high NO₂ concentrations for the first 9 weeks. This deviation corresponded to the developments observed in previous years. From the beginning of March (week 10), a slight change in the residuals could be seen, possibly because by this time, many individuals had already reacted and started to maintain a social distance.

Traffic in weeks 10 to 18, in particular from mid-March onward, was heavily impacted by the COVID-19 lockdown, which eliminated a large part of the traffic caused by commuters. As a result, unusually low concentrations were recorded whilst the model assumed business-as-usual conditions and almost consistently overestimated the observed concentrations. From May onward (week 19), however, this effect seemed to gradually recede. This fits with the course of the lockdown, as the first relaxations were enacted in May, six weeks after its implementation.

The total span of the residuals in March and April also increased, suggesting that some episodes of very high concentrations remained. This becomes clearer in Figure 10 when comparing the observed and predicted concentrations using week 13 as an example.

Usually, we would expect two distinct concentration peaks per day: one in the morning and one in the afternoon during rush hour. This is exactly what the model predicted under business-as-usual conditions. In week 13, corresponding to the beginning of lockdown, the observed afternoon peak was reduced while a morning peak remained on the working days. In contrast to the prediction, the morning peak was higher than the afternoon peak. This might be due to delivery traffic, continuing during the lockdown and primarily driving in the morning hours. This morning concentration was estimated rather accurately by the model, but the peak was more pronounced, and the observed diurnal cycle differed.

4. Discussion

Since high NO₂ concentrations have a significant impact on human health, but air monitoring networks are only sparsely distributed and thus rarely capable of capturing the spatiotemporal variations of air pollution, the demand for well-functioning models working with accessible input data is high. In addition, clear recommendations are needed regarding appropriate policy measures against poor air quality.

In this study, we developed a model for one location in the city of Hamburg, estimating the NO₂ levels in real time and using mainly meteorology and time as the input variables. We used that model in two experiments to reconstruct events related to air quality in the past years. We found that the variables used were well-suited to calculate the current NO₂ concentration. By applying the model to different time series, we can infer that it is primarily human behavior causing a change in emissions that has the greatest influence on local NO₂ concentrations. Although weather conditions play an important role in current pollutant concentrations, meteorology alone is not sufficient to cause short-term episodes of high local NO₂ pollution.

Our results support previous findings that neural networks are suitable to describe local pollutant concentrations. The model accurately and reliably predicts concentrations. Its strengths lie in its ability to reflect the usual diurnal, weekly, and annual variation for a wide range of weather conditions very accurately, as it has been trained on a very extensive dataset. It does not require data from other pollutants. It can also be used to track and evaluate past changes in pollutant levels, as we did in our study.

Its limitations lie in the prediction of exceptional or extreme events that are not meteorological, as these events are not represented by the training data. Furthermore, our results suggest that for an accurate prediction, the training data period should be close to the predicted period.

Overall, the model performance was comparable to those of other studies with a similar experimental design (e.g., Kukkonen et al. [24], Radojević et al. [25], and Lee et al. [31]). The latter showed that ANNs can better forecast photochemically active substances like O₃ because of their strong dependence on meteorological conditions, in particular, temperature and radiation. This is in line with the results we achieved for NO₂.

Some studies have achieved better results because they included additional input data that correlate highly with the NO₂ concentration, for instance, other pollutant concentrations [16,17] or data on traffic volumes [20,22]. However, these input data are not as readily available, especially in real time.

In general, every statistical model is location-dependent. It shows different results depending on the area for which it is trained [24]. No previous study on modeling pollutant concentrations with ANNs used Hamburg as a study area. Therefore, differences in performance may also be explained by different climatic conditions, traffic volumes, or pollution levels.

The training of the model was performed with the necessary tools to avoid overfitting, making the model robust and capable of generalizing. Another advantage is that it is trained with data that are easily accessible at many locations. The concept itself could therefore be transferred to other locations. However, the performance of the model at different locations could not be assessed with this case study and requires further research. Additionally, suitable high-resolution air quality and meteorology data must be available for training and validation. Monitoring networks with better spatial coverage are therefore indispensable.

The results also provide clues as to how the model could be improved. Since we observed that deviations mainly occurred on days that did not fit into the usual everyday pattern, it might be helpful to exclude public holidays from the time series and thereby mitigate particularly large outliers. As an alternative, variables could be included to mark specific holidays. Depending on the desired application and the availability of the data, other pollutant concentrations may also be included as input data to make the estimate even more accurate.
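The two options mentioned above (excluding public holidays or marking them with an extra variable) could be sketched as follows. This is only a minimal illustration, not the preprocessing used in the study: the names `EXAMPLE_HOLIDAYS`, `holiday_flag`, and `drop_holidays` are hypothetical, and the holiday list is deliberately incomplete.

```python
from datetime import date, datetime

# Hypothetical, incomplete holiday list for illustration only. A real
# application would need every public holiday in Hamburg for all years
# covered by the training and test data.
EXAMPLE_HOLIDAYS = {
    date(2017, 1, 1),    # New Year's Day
    date(2017, 5, 1),    # Labour Day
    date(2017, 10, 3),   # German Unity Day
    date(2017, 12, 25),  # Christmas Day
}


def holiday_flag(timestamp: datetime, holidays=EXAMPLE_HOLIDAYS) -> int:
    """Option 1: extra binary input, 1 on a listed holiday, else 0."""
    return 1 if timestamp.date() in holidays else 0


def drop_holidays(records, holidays=EXAMPLE_HOLIDAYS):
    """Option 2: remove all hourly records that fall on a listed holiday.

    `records` is an iterable of (timestamp, value) pairs.
    """
    return [(ts, v) for ts, v in records if ts.date() not in holidays]
```

The flag keeps the time series continuous and lets the network learn a separate holiday pattern, whereas dropping the records only removes the outliers from training and evaluation.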

In our study, we used the ANN for now-casting. The basic structure of the model, however, also allows for forecasting with a few modifications. The model performance would have to be re-evaluated in this case, as errors in the forecasted input parameters can amplify the prediction error. González-Enrique et al. [18], Maleki et al. [32], and Alkabbani et al. [33], for example, have shown successful applications of ANNs for forecasting pollutant concentrations. ANNs can thereby serve as a tool for urban planning. For example, temporary traffic regulations could be considered when an episode of particularly high air pollution is anticipated. However, our results confirm that it is generally difficult to predict exceptional pollution events, as they are outliers in the data and are caused by factors external to the observed system. Even the use of a very extensive training dataset, as in our case, is not sufficient to foresee special cases.

As an additional finding, we observed that the model prediction diverged continuously from the observation as the test period moved away from the training period. If the model is to be used for operational monitoring, it would have to be retrained regularly with current data. For our study, however, the focus was on detecting deviations or trends in the NO₂ fluctuations and thereby possibly assessing the effects of the emission reduction measures taken in past years.
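The drift described here can be tracked with a moving average of the difference between prediction and observation, as shown in Figures 7 and 8 with a four-weekly window. The sketch below is an assumption-laden illustration: the function names are hypothetical, the window is trailing rather than centered, and hourly data without gaps are assumed.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per complete window."""
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        if i >= window - 1:
            result.append(running / window)
    return result


def prediction_bias(predicted, observed, window=24 * 28):
    """Smoothed difference (prediction minus observation).

    The default window of 24 * 28 hours corresponds to four weeks of
    hourly values (assumption: no missing hours in either series).
    """
    diffs = [p - o for p, o in zip(predicted, observed)]
    return moving_average(diffs, window)
```

A smoothed bias that stays close to zero suggests the model still matches current conditions. A bias that grows steadily positive means the model increasingly overestimates the observations, which could serve as a trigger for retraining.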

The driving bans for diesel vehicles did not seem to show a significant effect in the overall course of the concentrations. This might be due to the low number of vehicles affected by the ban, or because the law only applied to a section of the road. At the same time, we could see that individual events, even those that have nothing to do with traffic, can cause short episodes of heavy pollution, for example, during the G20 summit in July 2017.

A strong reduction in NO₂ concentrations could be observed in 2020, when a large part of the traffic was eliminated due to the COVID-19 lockdown. The associated effects of the lockdown show that traffic avoidance is one of the most effective measures to prevent air pollution. According to further studies, the lockdown effects varied across different regions but were strongest in urban areas, with reductions of up to 55%. These studies also emphasize that the effects were very small in the long term, as the measures only lasted a few weeks [34].

5. Conclusions

In our case study, we were able to develop a neural network that could properly estimate a large proportion of local NO₂ concentrations using only meteorological and periodic variables as predictors. Based on this model, we can explain and evaluate changes in NO₂ concentrations and assess the impact of human behavior. The model predicted NO₂ concentrations accurately at the station level for a period of more than one year and has potential for further applications if minor adjustments are made. We can therefore conclude that neural networks are promising tools for researching air quality.

To better advise policymakers, urban planners, and public health officials, further studies in this area are needed. Further research could include the extrapolation to an entire station network to explore spatial patterns, and an analysis of the extent to which spatial predictors can be included to derive spatiotemporal patterns. Furthermore, it could be worth examining the importance of individual variables, for example, whether vertical temperature gradients are needed or can be substituted by reanalysis data.

In addition, our study highlights the need for data that are freely accessible and, ideally, highly resolved and cover long periods of time. This emphasizes the importance of crowd-sourcing campaigns and of expanding the air quality monitoring network.

Author Contributions

Conceptualization, A.-S.J. and B.B.; Data curation, A.-S.J.; Formal analysis, A.-S.J. and B.B.; Methodology, A.-S.J., V.M. and B.B.; Project administration, B.B.; Resources, V.M., J.B. and B.B.; Supervision, V.M., J.B. and B.B.; Validation, V.M.; Visualization, A.-S.J.; Writing—original draft, A.-S.J.; Writing—review & editing, V.M., J.B. and B.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available weather data were analyzed in this study. These data can be found here: https://cdc.dwd.de/portal/ (accessed on 18 November 2022). Further weather data were obtained from the Institute of Meteorology, Universität Hamburg, and are available on demand at https://wettermast.uni-hamburg.de/ (accessed on 18 November 2022). Air quality data are available on demand at https://luft.hamburg.de/clp/schadstoffe/clp1/ (accessed on 18 November 2022).

Acknowledgments

The authors acknowledge the Hamburg Luftmessnetz and the Institute of Meteorology for providing air quality and weather data used in the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. European Environment Agency; González Ortiz, A.; Guerreiro, C.; Soares, J. Air Quality in Europe: 2020 Report; EU Publications: Luxembourg, 2020.
2. Latza, U.; Gerdes, S.; Baur, X. Effects of nitrogen dioxide on human health: Systematic review of experimental and epidemiological studies conducted between 2002 and 2006. Int. J. Hyg. Environ. Health 2009, 212, 271–287.
3. Deutsche Umwelthilfe. Right to clean air. Europe Background Paper. 2019. Available online: https://www.duh.de/fileadmin/user_upload/download/Projektinformation/Verkehr/Luftreinhaltung/Right-to-Clean-Air_Europe_Backgroundpaper_EN.pdf (accessed on 30 July 2022).
4. Baklanov, A.; Molina, L.T.; Gauss, M. Megacities, air quality and climate. Atmos. Environ. 2016, 126, 235–249.
5. Canepa, E.; Builtjes, P.J.H. Thoughts on Earth System Modeling: From global to regional scale. Earth-Sci. Rev. 2017, 171, 456–462.
6. Arhami, M.; Kamali, N.; Rajabi, M.M. Predicting hourly air pollutant levels using artificial neural networks coupled with uncertainty analysis by Monte Carlo simulations. Environ. Sci. Pollut. Res. 2013, 20, 4777–4789.
7. Cabaneros, S.M.; Calautit, J.K.; Hughes, B.R. A review of artificial neural network models for ambient air pollution prediction. Environ. Model. Softw. 2019, 119, 285–304.
8. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
9. Gardner, M.; Dorling, S. Neural network modelling and prediction of hourly NOₓ and NO₂ concentrations in urban air in London. Atmos. Environ. 1999, 33, 709–719.
10. Kolehmainen, M.; Martikainen, H.; Ruuskanen, J. Neural networks and periodic components used in air quality forecasting. Atmos. Environ. 2001, 35, 815–825.
11. Perez, P.; Trier, A. Prediction of NO and NO₂ concentrations near a street with heavy traffic in Santiago, Chile. Atmos. Environ. 2001, 35, 1783–1789.
12. Stamenković, L.J.; Antanasijević, D.Z.; Ristić, M.; Perić-Grujić, A.A.; Pocajt, V.V. Prediction of nitrogen oxides emissions at the national level based on optimized artificial neural network model. Air Qual. Atmos. Health 2017, 10, 15–23.
13. Jiang, P.; Li, C.; Li, R.; Yang, H. An innovative hybrid air pollution early-warning system based on pollutants forecasting and Extenics evaluation. Knowl.-Based Syst. 2019, 164, 174–192.
14. Niska, H.; Hiltunen, T.; Karppinen, A.; Ruuskanen, J.; Kolehmainen, M. Evolving the neural network model for forecasting air pollution time series. Eng. Appl. Artif. Intell. 2004, 17, 159–167.
15. Ding, W.; Zhang, J.; Leung, Y. Prediction of air pollutant concentration based on sparse response back-propagation training feedforward neural networks. Environ. Sci. Pollut. Res. 2016, 23, 19481–19494.
16. Liu, H.; Wu, H.; Lv, X.; Ren, Z.; Liu, M.; Li, Y.; Shi, H. An intelligent hybrid model for air pollutant concentrations forecasting case of Beijing in China. Sustain. Cities Soc. 2019, 47, 101471.
17. Cabaneros, S.M.S.; Calautit, J.K.S.; Hughes, B.R. Hybrid artificial neural network models for effective prediction and mitigation of urban roadside NO₂ pollution. Energy Procedia 2017, 142, 3524–3530.
18. González-Enrique, J.; Ruiz-Aguilar, J.J.; Moscoso-López, J.A.; Urda, D.; Deka, L.; Turias, I.J. Artificial neural networks, sequence-to-sequence LSTMs, and exogenous variables as analytical tools for NO₂ (air pollution) forecasting: A case study in the Bay of Algeciras (Spain). Sensors 2021, 21, 1770.
19. Dai, H.; Huang, G.; Wang, J.; Zeng, H.; Zhou, F. Prediction of Air Pollutant Concentration Based on One-Dimensional Multi-Scale CNN-LSTM Considering Spatial-Temporal Characteristics: A Case Study of Xi’an, China. Atmosphere 2021, 12, 1626.
20. Yeganeh, B.; Hewson, M.G.; Clifford, S.; Tavassoli, A.; Knibbs, L.D.; Morawska, L. Estimating the spatiotemporal variation of NO₂ concentration using an adaptive neuro-fuzzy inference system. Environ. Model. Softw. 2018, 100, 222–235.
21. Catalano, M.; Galatioto, F.; Bell, M.; Namdeo, A.; Bergantino, A.S. Improving the prediction of air pollution peak episodes generated by urban transport networks. Environ. Sci. Policy 2016, 60, 69–83.
22. Zito, P.; Chen, H.; Bell, M.C. Predicting real-time roadside CO and NO₂ concentrations using neural networks. IEEE Trans. Intell. Transp. Syst. 2008, 9, 514–522.
23. He, H.; Lu, W.-Z.; Xue, Y. Prediction of particulate matter at street level using artificial neural networks coupling with chaotic particle swarm optimization algorithm. Build. Environ. 2014, 78, 111–117.
24. Kukkonen, J.; Partanen, L.; Karppinen, A.; Ruuskanen, J.; Junninen, H.; Kolehmainen, M.; Niska, H.; Dorling, S.; Chatterton, T.; Foxall, R.; et al. Extensive evaluation of neural network models for the prediction of NO₂ and PM₁₀ concentrations, compared with a deterministic modelling system and measurements in central Helsinki. Atmos. Environ. 2003, 37, 4539–4550.
25. Radojević, D.; Antanasijević, D.; Perić-Grujić, A.; Ristić, M.; Pocajt, V. The significance of periodic parameters for ANN modeling of daily SO₂ and NOₓ concentrations: A case study of Belgrade, Serbia. Atmos. Pollut. Res. 2018, 10, 621–628.
26. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. Available online: http://jmlr.org/papers/v15/srivastava14a.html (accessed on 17 April 2022).
27. Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159.
28. Garavaglia, S.; Sharma, A. A smart guide to dummy variables: Four applications and a macro. In Proceedings of the Northeast SAS Users Group Conference, Murray Hill, NJ, USA; 1998.
29. Fowler, D.; Brimblecombe, P.; Burrows, J.; Heal, M.R.; Grennfelt, P.; Stevenson, D.S.; Jowett, A.; Nemitz, E.; Coyle, M.; Liu, X.; et al. A chronology of global air quality. Philos. Trans. R. Soc. A 2020, 378, 20190314.
30. Rao, S.; Klimont, Z.; Smith, S.J.; Dingenen, R.V.; Dentener, F.; Bouwman, L.; Riahi, K.; Amann, M.; Bodirsky, B.L.; van Vuuren, D.P.; et al. Future air pollution in the shared socio-economic pathways. Glob. Environ. Chang. 2017, 42, 346–358.
31. Lee, C.L.; Wong, Y.J.; Arumugasamy, S.K. Dynamic simulation of airborne pollutant concentrations associated with the effect of climate change in Batu Muda region, Malaysia. Model. Earth Syst. Environ. 2022, 8, 323–338.
32. Maleki, H.; Sorooshian, A.; Goudarzi, G.; Baboli, Z.; Birgani, Y.T.; Rahmati, M. Air pollution prediction by using an artificial neural network model. Clean Technol. Environ. Policy 2019, 21, 1341–1352.
33. Alkabbani, H.; Ramadan, A.; Zhu, Q.Q.; Elkamel, A. An Improved Air Quality Index Machine Learning-Based Forecasting with Multivariate Data Imputation Approach. Atmosphere 2022, 13, 1144.
34. Matthias, V.; Quante, M.; Arndt, J.A.; Badeke, R.; Fink, L.; Petrik, R.; Feldner, J.; Schwarzkopf, D.; Link, E.-M.; Ramacher, M.O.P.; et al. The role of emission reductions and the meteorological situation for air quality improvements during the COVID-19 lockdown period in central Europe. Atmos. Chem. Phys. 2021, 21, 13931–13971.

Figure 1. Workflow including the steps and respective time period.

Figure 2. Boxplots of the numerical input variables.

Figure 3. Observed vs. predicted NO₂ concentrations (µg/m³) in each month of 2017.

Figure 4. Average weekly pattern of the observed and predicted NO₂ values in 2017 with a 95% confidence interval (bootstrapped).

Figure 5. The average weekly pattern of observed NO₂ and predicted NO₂ values with a bootstrapped 95% confidence interval in (a) summer and (b) winter.

Figure 6. Prediction of the model vs. observed values for the first week (Monday to Sunday) of each quarter in 2017.

Figure 7. Moving averages (four-weekly) of the difference between the prediction and observation, each for a model with and without the counter as an input feature.

Figure 8. Moving averages (four-weekly) of difference between prediction and observation, each for the Stresemannstrasse and Habichtstrasse stations.

Figure 9. Model residuals of NO₂ concentrations for each month in 2020.

Figure 10. Comparison of the predicted and observed values in week 13 of 2020, during the COVID-19 lockdown (23 to 30 March).

Table 1. Input variables used for the training of the model.

Variable (Unit)	Source
Air temperature (°C)	UHH
Air pressure (hPa)	UHH
Relative humidity (%)	UHH
Wind direction (degrees)	UHH
Wind speed (m/s)	UHH
Short-wave radiation (W/m²)	UHH
Temperature gradients (ΔT/Δh)	UHH
Cloud cover (okta)	UHH
Precipitation (mm)	CDC
Sunshine duration (min/h)	CDC
Hour of the day, day of the week, month of the year
Counter
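The periodic inputs in Table 1 (hour of the day, day of the week, month of the year) are categorical. One common way to pass them to a neural network is as dummy (one-hot) variables, in the spirit of the guide on dummy variables cited as [28]. The sketch below assumes this encoding; whether it matches the exact encoding used in the study is not confirmed here, and the function names are hypothetical.

```python
from datetime import datetime


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def time_features(timestamp: datetime):
    """Encode hour, weekday, and month as 24 + 7 + 12 = 43 dummy variables.

    Assumption: plain one-hot encoding with no reference category dropped.
    """
    return (
        one_hot(timestamp.hour, 24)          # hour of the day, 0..23
        + one_hot(timestamp.weekday(), 7)    # Monday = 0 .. Sunday = 6
        + one_hot(timestamp.month - 1, 12)   # January = 0 .. December = 11
    )
```

Compared with feeding the hour or weekday as a single number, dummy variables do not impose an artificial ordering, so the network can learn, for example, distinct weekday and weekend traffic patterns.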

Table 2. The mean performance statistics of the selected model with standard deviation compared to multiple linear regression.

Metric	MLR	ANN, mean (SD)
R² train	0.58	0.7528 (0.005)
R² test	0.53	0.6976 (0.004)
RMSE train (µg/m³)	18.98	14.69 (0.154)
RMSE test (µg/m³)	16.59	12.74 (0.086)
IA train	0.85	0.925 (0.002)
IA test	0.85	0.902 (0.002)
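For readers who want to reproduce statistics of the kind reported in Table 2, the following sketch computes R², RMSE, and the index of agreement (IA) from paired observations and predictions. It rests on two assumptions: R² is taken as the coefficient of determination (the study may instead have used the squared correlation coefficient), and IA follows Willmott's original definition. The function names are hypothetical.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean square error, in the units of the input (e.g., µg/m³)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement; 1 indicates a perfect match."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1 - num / den
```

As a quick check, a prediction that always returns the observed mean yields R² = 0 and IA = 0, while a perfect prediction yields R² = 1, RMSE = 0, and IA = 1.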

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
