Explicit Modeling of Meteorological Explanatory Variables in Short-Term Forecasting of Maximum Ozone Concentrations via a Multiple Regression Time Series Framework

: Statistical time series forecasting is a useful tool for predicting air pollutant concentrations in urban areas, especially in emerging economies, where the capacity to implement comprehensive air quality models is limited. In this study, a general multiple regression with seasonal autoregressive moving average errors model was estimated and implemented to forecast maximum ozone concentrations with a short time resolution: overnight, morning, afternoon and evening. In contrast to a number of short-term air quality time series forecasting applications, the model was designed to explicitly include the e ﬀ ects of meteorological variables on the ozone level as exogenous variables. As the application location, the model was constructed with data from ﬁve monitoring stations in the Monterrey Metropolitan Area of Mexico. The results show that, together with structural stochastic components, meteorological parameters have a signiﬁcant contribution for obtaining reliable forecasts. The resulting model is an interpretable, useful and easily implementable model for forecasting ozone maxima. Moreover, it proved to be consistent with the general dynamics of ozone formation and provides a suitable platform for forecasting, showing similar or better performance compared to models in other existing studies.


Introduction
Forecasting is an integral and useful task for managing urban air quality. Since the 1970s, forecasting techniques and tools have been developed in response to the severe pollution episodes that occurred between 1930 and 1960 in diverse parts of the world, particularly in Europe and the United States of America. Empirical approaches and statistical models were the first techniques used to forecast spatio-temporal pollutant concentrations. Afterwards, between 1970 and 1990, 3D air quality models were developed and applied on urban, regional and global scales [1]. Comprehensive 3D photochemical models solve the mathematical equations that describe the chemical and physical dynamics of pollutants in the atmosphere [2]. As inputs, 3D air quality models require a large amount of reliable meteorological, geographical and emissions data. In addition, in order to be implemented, they require high computational capacity as well as specialized knowledge about atmospheric chemical reactions and physical processes. These factors make the setup, execution and operation of these comprehensive models for forecasting pollutant concentrations technically complicated in some urban Atmosphere 2020, 11, 1304 2 of 20 areas, especially those located in countries with emerging economies. Therefore, simpler mathematical and statistical models are still widely used.
Tropospheric ozone (O 3 ) is a greenhouse gas and a criteria air pollutant that often exceeds the air quality standards in many urban areas around the world [3]. It is a secondary pollutant formed by chemical reactions of nitrogen oxides (NOx) (NOx = NO + NO 2 ) and volatile organic compounds (VOCs) in the presence of sunlight. Ozone production is highly dependent on the levels of chemical precursors and meteorological conditions. Several studies have explored the effect of meteorological parameters on O 3 concentrations [4][5][6]. In general, O 3 tends to increase with higher temperatures, which controls the chemical reaction rates associated with its production [7]. In contrast, increased wind speed is usually associated with decreasing O 3 levels due to a dispersion effect. Similarly, increases in relative humidity are related to decreases in O 3 because higher humidity levels are associated with greater cloud abundance and atmospheric instability [5]. Similarly, reductions in solar radiation are usually associated with reductions in O 3 because its formation also depends on photochemical reactions. The complex, nonlinear process of O 3 formation makes the short-term forecasting of O 3 challenging, requiring sophisticated mathematical and statistical approaches.
Existing studies have reported a number of diverse mathematical approaches for forecasting ground-level O 3 and other pollutants. Artificial Neural Networks (ANN) are the most common mathematical models that are used to forecast air pollution. Users of this approach emphasize its effectiveness when dealing with nonlinear systems [8][9][10][11][12][13][14]. However, ANN are commonly called "black box" models because of their limited capacity to provide information for interpreting the effect of the predictor variables on the output; they present generalization issues and are computationally intensive and time consuming compared to statistical models [15,16]. ANN are widely accepted, but they are more popular in applications where model interpretation is of secondary importance [17]. Less common mathematical approaches for forecasting ground-level O 3 include Fuzzy Time Series (FTS) [16,18] and additive models [19].
Statistical models have also been widely used to forecast O 3 concentrations and other criteria air pollutants. The Autoregressive Integrated Moving Average (ARIMA) model [20] is a classical modeling and forecasting technique that is widely used to analyze linear time series data. Multiple studies [21][22][23][24][25][26][27] have found that both ARIMA and ARMA models are reliable and capable of predicting short-term O 3 concentrations. However, most of these studies have not included explicitly the effect of meteorological variables in predicting O 3 concentrations, or the typical temporal resolution is not finer than a daily forecast. This latter characteristic makes some of the designed models inappropriate for use by environmental authorities that require to release O 3 forecasts more than once during the day in order to limit the risk of human exposure to air pollution episodes. The few studies that apply ARIMA models to predict O 3 concentrations with an explicit treatment of meteorological conditions with explanatory variables [25], tend to exclude the physical interpretation of the results. Other ARIMA models that consider heteroscedasticity show improvements in forecasting performance [28,29], but as in the aforementioned studies, they leave out meteorological information. Multiple linear regression (MLR) models are usually applied to assess the effect of meteorological conditions on O 3 concentrations [30][31][32][33], but they are less commonly used to predict O 3 concentrations. Wang [34] found that an ARMA model generally fits slightly better than MLR models when comparing the ability of the models to predict O 3 , CO (carbon monoxide) and NO 2 (nitrogen dioxide) monthly maximum one-hour concentrations.
Studies that have compared statistical models with algorithmic techniques (i.e., ARIMA versus ANN) have not shown a single conclusive result regarding their capability to forecast air pollutants [28]. A frequent practice is to combine algorithmic and statistical models to produce hybrid models to forecast air pollution [15,[35][36][37][38][39]. Overall, hybrid models are capable of accurately and precisely predicting pollutant concentrations. However, when the computational capacity is limited, hybrid models are not recommended.
In general, for the purposes of environmental authorities, forecasting models need to predict pollutant concentrations in real time, several times per day and over short time intervals after Atmosphere 2020, 11, 1304 3 of 20 new data become available. Particularly, in emerging economies, it is important that forecasting models can be executed on platforms with low computational resources and that they are self-contained (without external sources of information other than the measurements available in the air quality monitoring network). In addition, the development of time series models with explanatory variables continues to be an active area of interest, in particular in the environmental sciences [40]. Considering these requirements, in this work, a general Multiple Regression with Seasonal Autoregressive Moving Average (SARMA) errors model was conceived and obtained to forecast maximum O 3 concentrations, with a number of novel design and operational features. The model was implemented in the Monterrey Metropolitan Area (MMA) of Mexico and incorporates explicitly concurrent effects of meteorological variables (as exogenous variables) on the forecast O 3 concentration and effects of past, recent and cyclic O 3 concentrations. This approach enables a physical interpretation of the model. Furthermore, maximum O 3 concentrations are forecast with a short time resolution: overnight, morning, afternoon and evening. In operational terms, the proposed modeling approach allows fast computation with moderate computational resources.

Geographical and Meteorological Characteristics
The MMA is located in the northeastern Mexican state of Nuevo Leon, surrounded by mountains to the south and west and flat terrain to the northeast, with an average altitude of 500 m a.s.l. (Figure 1). The MMA is the third most populated Mexican metropolitan area, comprising around 88% of the population of the State of Nuevo Leon, which corresponds to 4.1 million inhabitants [41]. In addition, it is the largest urban region in the north of Mexico, with around 1150 km 2 of urbanized area and 14 municipalities [42]. The MMA makes the third largest contribution to Mexico's GDP (7.5%, 2017) [43].
Atmosphere 2020, 11, x FOR PEER REVIEW 3 of 21 data become available. Particularly, in emerging economies, it is important that forecasting models can be executed on platforms with low computational resources and that they are self-contained (without external sources of information other than the measurements available in the air quality monitoring network). In addition, the development of time series models with explanatory variables continues to be an active area of interest, in particular in the environmental sciences [40]. Considering these requirements, in this work, a general Multiple Regression with Seasonal Autoregressive Moving Average (SARMA) errors model was conceived and obtained to forecast maximum O3 concentrations, with a number of novel design and operational features. The model was implemented in the Monterrey Metropolitan Area (MMA) of Mexico and incorporates explicitly concurrent effects of meteorological variables (as exogenous variables) on the forecast O3 concentration and effects of past, recent and cyclic O3 concentrations. This approach enables a physical interpretation of the model. Furthermore, maximum O3 concentrations are forecast with a short time resolution: overnight, morning, afternoon and evening. In operational terms, the proposed modeling approach allows fast computation with moderate computational resources.

Geographical and Meteorological Characteristics
The MMA is located in the northeastern Mexican state of Nuevo Leon, surrounded by mountains to the south and west and flat terrain to the northeast, with an average altitude of 500 m a.s.l. (Figure 1). The MMA is the third most populated Mexican metropolitan area, comprising around 88% of the population of the State of Nuevo Leon, which corresponds to 4.1 million inhabitants [41]. In addition, it is the largest urban region in the north of Mexico, with around 1150 km 2 of urbanized area and 14 municipalities [42]. The MMA makes the third largest contribution to Mexico's GDP (7.5%, 2017) [43].
The climate of the MMA is semi-arid, and the meteorological conditions change substantially throughout the year. The annual average temperature is 20 °C, but the monthly average temperature ranges from 5 °C in January to 32 °C between May and August. August and September are the rainy months, and the annual average rainfall is about 650 mm [44].  The climate of the MMA is semi-arid, and the meteorological conditions change substantially throughout the year. The annual average temperature is 20 • C, but the monthly average temperature ranges from 5 • C in January to 32 • C between May and August. August and September are the rainy months, and the annual average rainfall is about 650 mm [44].

Monitoring Network and Data
Within the MMA, the Integral Environmental Monitoring System (SIMA) started operating in November 1992 with five air quality monitoring stations. Subsequently, from 2009 to 2017, eight stations were progressively added to the air quality network. Today, SIMA operates thirteen sites, which have been monitoring tropospheric O 3 , six additional air pollutants (SO 2 , CO, NO 2 , NO, PM 10 , and PM 2.5 ) and seven meteorological variables (temperature, solar radiation, relative humidity, rainfall, atmospheric pressure, wind speed and wind direction). In accordance with EPA EQOA-0880-047, from 1993 to 2003, UV photometric analyzers (Thermo Environmental Instruments Inc. (TEI), model 49) were used to measure O 3 with a precision less than 2 ppbv and a detection limit of 2 ppbv. After May 2003, the TEI model 49 was replaced by model 49C, which has a precision better than 1 ppbv and a detection limit of 1 ppbv [45].
Ozone concentrations and meteorological variables are recorded every minute and summarized as hourly averages. These summaries were provided by SIMA from 2009 to 2016. Only five sites (OBI, GPE, SNB, SNN, STA) were selected to conduct the modeling, based on the following criteria: (i) the sites are the oldest of the SIMA, guaranteeing long data records, (ii) the selected sites had more than 75% data availability of O 3 data and (iii) there was a lower proportion of outliers and inconsistent data compared to other monitoring sites. Table 1 briefly describes each of these selected monitoring sites. . From 1993 to 2014, the air quality monitoring site that presented the largest number of exceedances was STA (on the west side of the basin), followed by the SNB, GPE and OBI sites. This is due to the fact that prevailing winds in this region are from east to west. In addition, the O 3 concentrations have increased by 0.22 ppbv/year, showing the maxima during spring and minima in winter [45]. These conditions indicate that O 3 is an air quality problem that needs to be addressed in the MMA. Previous O 3 studies conducted in the MMA indicate that there is an observable relation between O 3 and certain meteorological variables (solar radiation, temperature and wind direction) [45,47]. However, the methodologies presented in those studies do not explicitly model O 3 as a function of those meteorological variables. Furthermore, Carrillo-Torres et al. [47] found that the seasonal cycles of O 3 are mainly governed by changes in meteorology more than by primary emissions. This implies that understanding the effect of meteorological variables on O 3 concentrations is important for predicting possible pollution episodes in the MMA, as in other urban centers.

Data Processing
Daily six-hour maximum O 3 concentrations were computed four times each day: overnight, morning, afternoon and evening. The time frame according to which the calculations were made is shown in Table 2. Under this arrangement, the computation and availability of forecasts is made at 05:00, 11:00, 17:00 and 23:00 Central Standard Time (CST) and at 06:00, 12:00, 18:00 and 00:00 daylight saving time (DST). Because each time of the day is represented by a single O 3 maximum value, which may occur at any, non-fixed time within the six-hour interval, it is necessary to have a summary measure of the meteorological predictors for such intervals. A weighted average of the meteorological values was computed for each time of day. The weights were determined by the relative frequency of the times for each hour in which the maximum O 3 concentration is observed. Different weights are calculated for each site ( Table 3). The weighted average wind direction was calculated using the vectorial cosine-sine representation. The meteorological and O 3 time series data were pre-processed to remove values outside of the admissible range and inconsistent recordings (special codings and sudden changes of scale).
Missing data were estimated at a one-hour time scale using the Kalman Filter (KF) [48,49]. The KF is an algorithm that is widely used in time series analysis, allowing the computation of estimating functions, forecasts and interpolation of the series. When the filter is used for interpolation, it is usually called the Kalman Smoother (KS). In this setting, the filter interpolates observed values to fill in the missing data (imputation), provided a model for the series. If the series is governed by a Gaussian process, the imputed values are optimal, (i.e., unbiased and with minimum variance). In general, large autoregressive models of varying orders and regressors, such as time of the day, hour or month of the year, depending on the site were fitted on the hourly series, according to the scheme presented in Section 3.2. The KS was implemented in an iterative convergent way, in which an estimated MLR with AR errors model was used to impute the missing values and was then re-estimated for re-imputation and so on, until a convergence criterion was met (<10 −6 for the norm of the differences of parameter estimates between iterations). A similar approach was taken for the meteorological variables.
In particular, the STA and OBI sites exhibited aberrant temperature recordings during 2012, inconsistent with the recordings observed for the other sites. Given the importance of temperature in forecasting O 3 concentrations, the series data for 2012 were removed and estimated employing a regression model with correlated errors that used temperature records from the remaining sites as predictors along with the KS. From 2009 to 2011, solar radiation recordings were not available at the STA, SNB and SNN sites. Additionally, there were large data gaps for SNN in many meteorological variables up to May 2012. The forecasting models for the STA and SNB sites were thus estimated using information only from January 2012 onward, while estimation for the SNN site proceeded using data from June 2012 onward.
The sample used for estimation of the models is given in Table 4, which provides the total number of hourly observations employed for each site and the number of maxima O 3 concentrations per time of day being modeled; dates are also indicated. Summary measures of maximum ozone concentrations are presented in Table 5. There are highly comparable averages (33.10 ppbv to 38.12 ppbv) and dispersion values (18.78 ppbv to 22.35 ppbv standard deviations) among sites. The computations shown here and those described in the following sections, including model fitting and forecast computation, were carried with R software, version 3.2.1. [50] using base packages only. For the construction of scatterplots (Figure 2), the hexbin package was employed.

General Statistical Approach
The employed modeling approach involves making use of meteorological variables together with past ozone concentrations as predictors. We consider the concurrent meteorological summary values (see Section 3.1) for the six-hour interval ozone concentrations being predicted as linear regressors in a multiple regression model. In this setting, the meteorological features are considered exogenous variables and define a mean level for ozone maximum concentrations according to the specific values that the variables assume at each interval. In this sense, the meteorological values are considered as given. Ozone departures from this mean level are treated as random deviations with a temporal structure (i.e., as a stochastic process), which can be exploited to predict current deviations from past values. This is accomplished through the family of S MMA models. The framework combining these considerations is known as the multiple linear regression model with SARMA errors. Seasonal effects, such as daily cycles, may also be included in either the regression part or the structural part of the model.
Six appropriately transformed meteorological variables were used as regression predictors: temperature (TEMP, • C), relative humidity (RH, %), solar radiation (SR, kW/m 2 ), angular wind direction (WD, degrees) and wind speed (WS, km/h). The transformations were computed in order to linearize the relation with maximum O 3 levels (see Section 3.4). Additionally, terrestrial rotation and translation effects were included in the regression model through the inclusion of time of day and month of the year as predictors. The baseline is the average January overnight level; every other month or time of the day effects represent deviations from this baseline. Note that a large number of degrees of freedom is available for the model estimation, as a total of 23 meteorological parameters plus six SARMA parameters were fit to the data based, at least, on 2920 observations ( Table 4).
The predictive structure of the series is shown by the autocorrelation function (ACF) and the partial ACF (PACF) (correlations and partial correlations of present values with lagged values). We call the (partial) autocorrelations functions calculated from the observations the empirical (P)ACF.
These are the main tools that are commonly employed to identify a suitable model for stochastic dependence in ARMA modeling. Moreover, by simulating from an estimated candidate model and calculating its corresponding (P)ACF, a comparison with the empirical P(ACF) provides a valuable guide for assessing a proposed model, as the simulated (P)ACF of plausible models is expected to resemble the empirical (P)ACF of the series being considered.
We use the Box-Jenkins approach [20] for building a time series model, whose general steps are as follows: 1.
Calculation of residuals from a multiple regression model with transformed meteorological predictors.

2.
Identification of a SARMA model for the residuals via the ACF and PACF.

3.
Model fitting, in which meteorological effects and SARMA parameters are simultaneously estimated.

4.
Model diagnostic using Ljung-Box goodness of fit tests [51] and a comparison of empirical (P)ACF with the (P)ACF obtained by simulating from the fitted model.

5.
Model selection based on the previous two steps.
The last three steps of the process are usually made in an iterative manner until a satisfactory model is obtained. Model selection is accomplished by combining Ljung-Box tests and empirical versus simulated (P)ACFs comparisons, aiming for a compromise between both criteria. Ljung-Box tests are a commonly used diagnostic tool that examine the autocorrelations of the model residuals at each lag. The tests check whether at least one autocorrelation at each or previous lags is significantly different from zero. Statistically significant (p-value ≤ 0.05) non-zero autocorrelations are indicative of model inadequacy.

The SARMA Model
A multiple regression model with multiplicative SARMA (p,q)(P,Q) s errors may be formulated as follows: where y t and x t represent the O 3 maxima and associated weighted average values of meteorological variables at time of day t, respectively; ф p (B) = 1 − ф 1 B − . . . − ф p B p , with B p y t = y t-p being an autoregressive operator of order p and θ q (B) = 1 − θ 1 B − . . . − θ q B q being the moving average operator for the regular part, respectively; is the autoregressive operator of order P, with s being the number of periods per season, and is the moving average operator of order Q for the seasonal part, respectively; a t is usually assumed to be normal white noise with variance σ 2 a . For the series being modeled, s is expected to be equal to four (i.e., a daily cycle effect). The effect of meteorological variables on the maximum O 3 concentration is represented by β.
The method of parameter estimation (ф, Φ, θ, Θ, β, σ 2 a ) employed here is maximum likelihood estimation using the exact likelihood. Because of the assumptions made about a t , the resulting probability distribution function for the process y t is Gaussian. Maximum likelihood estimates are the parameter values that maximize (1). Evaluation of the exact likelihood with state-space representations is given in [52] (p. 385); see also [20] (p. 243) for other approaches. For computational details, see [53]. Note that, for a forecast to actually be computed, parameter estimates are needed; we simply denote this forecast byŷ t in the sequel. A great deal of dispersion can be seen in the plots. The locally estimated scatterplot smoothing (LOESS) class of local regression models [54] is used in an ad hoc way as an aid to visualize local trends in the relationship between O 3 maxima and the meteorological variables. The relationships in many cases are non-linear because meteorological variables are linked with direct and indirect effects in O 3 concentrations [55][56][57][58]. The cyclic nature of angular WD explains the particular pattern between O 3 concentrations and WD. Higher O 3 concentrations are recorded when WD takes values between 60 and 120 degrees (north-east, east and south-east directions). This pattern can likely be explained by the fact that there is downwind transport in the MMA of photochemical air masses that come from the industrial sector located in these directions [45].

Meteorological Variables Transformations
Proper modeling of these non-linear relationships through linear regression must make use of transformations of the meteorological variables. In general, graphical displays are employed to suggest suitable transformations. Commonly employed transformations in statistical regression analysis, such as natural logarithmic, polynomial and reciprocal functions, were applied to the meteorological variables. Note that the aim of these transformations is to capture the general underlying relationship between O 3 concentrations and the meteorological variables and not to reproduce the fit given by the local regression, which is used here purely as an indicator of such a relationship. Table 6 presents the formulas of the transformations used for each meteorological variable. Since there is a slight departure form linearity between TEMP and O 3 , a natural logarithmic transformation on TEMP was employed. For the slight curvature shown in the SR, a second-order polynomial was p proposed and found statistically significant (details are provided in Section 4.1). For RH, a third-order polynomial was employed. For WS, we used a reciprocal transformation, to resemble the asymptotic level reached by the dispersion effect of WS on O 3 concentrations. WD has a cyclic relation to O 3 . We chose to re-center at the angle where the peak O 3 concentration occurs and then model the deviation from this angle; as WD deviates from this angle, O 3 concentrations decay.
Atmosphere 2020, 11, 1304 9 of 20 Therefore, we re-center angular WD for each site by computing its sine and cosine coordinates and rotating them counterclockwise by a specified angle; finally, we recover the WD in terms of "new" angles. Table 7 lists the angles selected for the rotation at each site. The other sites showed patterns similar to these, and the same formulas were employed.
Atmosphere 2020, 11, x FOR PEER REVIEW 9 of 21 lists the angles selected for the rotation at each site. The other sites showed patterns similar to these, and the same formulas were employed.   15) x + x 2 + x 3 x + x 2 1/(x + 1)  Observe the non-linear nature of the relationships. Simple natural logarithmic, polynomial and inverse transformations were used to achieve linearity. A re-centering of angular wind direction around 60 degrees was used followed by measuring the absolute distances from this new center. Table 6. Transformations of meteorological representatives except for angular wind direction.

RH SR WS
ln(x + 273. 15) x + x 2 + x 3 x + x 2 1/(x + 1) Angular WD was further transformed using where x is the angular direction in the new coordinates (after rotation), and I is an indicator variable, which takes values 0 or 1 according to whether the variable is in the interval between in parentheses. The formula therefore restricts x values to lie within 0 and 180 as WD departs from the new center.

Performance Measures
Forecast performance measures are usually employed to assess the ability of a given model to forecast the time series. Table 8 lists the definitions and meanings of the measures used in this study; the notationŷ t denotes the one-step-ahead forecast at time of day (six-hour interval) t computed from information up to time t−1 (all relevant previous information). These measures should be close to zero for a good forecasting procedure. Table 8. Performance measures. One-step-ahead forecast at time of day t using all previous relevant information is denoted byŷ t ; y represents the mean of the series, and n is the number of available observations in the series.

Error Measure Formula Description
Root Mean Square Error (RMSE) Computation of these measures is performed using the sample estimation period (see Section 3.1) and two-week left-out sample (observations not used in model estimation), separately. In the latter period, observed meteorological information was used. Comparison of the performances in both periods provides an indication of how well the estimated model captures the data-generating process and thus offers an assessment of its capacity to produce good-quality forecasts.

Model Estimation
A SARMA (3,0) (3,0) 4 single general model was found to account equally well for the maximum O 3 time series structure for every site. Figure 3 shows the empirical (P)ACF computed from the historical observations and the simulated (P)ACF computed from this model for the OBI site series only. The remaining sites behave virtually in the same manner. The plot exhibits a typical exponential decay of ACF for both the regular and the seasonal component together with the corresponding cutoffs of (P)ACF at the first few lags, which is common in autoregressive models. The seasonal period is of order four, representing a day-period effect. No lack of stationarity is evident from the (P)ACF plots.  Parameter estimates of the fitted models are given in Table 9. The estimates of the ARMA model are shown first, and the remaining ones are the estimated regression coefficients. All first-and thirdorder regular autoregressive effects were significant (p 0.05). Furthermore, every effect describing the seasonal component was significant, except for the STA site. The first-order effects in both the regular and seasonal part present the largest magnitudes. Thus, the model recovers the effect of the previous time of day and a daily seasonal effect. Table 9 shows significant increasing effects occurring mainly during springtime (March, April and May), while during the summer there is a decrease in O3 concentrations compared to the baseline, which probably is due to the fact that the highest wind speeds (>10 km h −1 ) are recorded during this time [45], promoting O3 dispersion. In contrast, December presents decreasing deviations in most sites because of reduced TEMP and SR. Other monthly deviations vary from site to site, but they are negligible or not statistically significant. Parameter estimates of the fitted models are given in Table 9. The estimates of the ARMA model are shown first, and the remaining ones are the estimated regression coefficients. All first-and third-order regular autoregressive effects were significant (p ≤ 0.05). Furthermore, every effect describing the seasonal component was significant, except for the STA site. The first-order effects in both the regular and seasonal part present the largest magnitudes. Thus, the model recovers the effect of the previous time of day and a daily seasonal effect. Table 9 shows significant increasing effects occurring mainly during springtime (March, April and May), while during the summer there is a decrease in O 3 concentrations compared to the baseline, which probably is due to the fact that the highest wind speeds (>10 km h −1 ) are recorded during this time [45], promoting O 3 dispersion. In contrast, December presents decreasing deviations in most sites because of reduced TEMP and SR. Other monthly deviations vary from site to site, but they are negligible or not statistically significant.   As for time of day effects, significantly increased deviations in O3 concentrations are observed between 12:00 and 17:00 CST. This is consistent with the enhanced photochemical period, in which O3 generally peaks in the MMA [45], and with the general O3 production dynamics during the day. The decreasing effects from the overnight baseline at any time of day for the OBI site might be explained by variables such as SR, TEMP and RH, whose influence on O3 formation mediates the time of day effect. The influence of meteorological variations is verified by the overall significant effect of their transformations exhibited in Table 9.

Model Diagnostics
A two-way assessment of the model is given by comparing the empirical versus simulated (P)ACF and the usual Ljung-Box tests. A plausible estimated model is expected to reproduce the empirical (P)ACF. This can be verified by simulating from the estimated model and calculating the (P)ACF from the simulated series. Figure 3 shows very similar (P)ACF patterns between the empirical and simulated series for the OBI site at recent previous times of day and discrepancies at larger lags corresponding to seasonal effects. No ARMA model with an increased number of lags for the seasonal part achieved a (P)ACF pattern similar to the empirical P(ACF). The remaining sites exhibited similar patterns.
ACF of residuals and Ljung-Box goodness of fit tests p-values are shown in Figure 4. Each lag represents a six-hour time interval. Larger lags (≥20) show statistically significant ACF, indicating model inadequacy; however, ACF magnitudes at those lags are rather small, representing marginal, 5-7 former days effects on O3 concentrations. The model accounts well for the immediately previous six-hour lags and daily cycles effects. Other models including moving average terms for both the   As for time of day effects, significantly increased deviations in O3 concentrations are observed between 12:00 and 17:00 CST. This is consistent with the enhanced photochemical period, in which O3 generally peaks in the MMA [45], and with the general O3 production dynamics during the day. The decreasing effects from the overnight baseline at any time of day for the OBI site might be explained by variables such as SR, TEMP and RH, whose influence on O3 formation mediates the time of day effect. The influence of meteorological variations is verified by the overall significant effect of their transformations exhibited in Table 9.

Model Diagnostics
A two-way assessment of the model is given by comparing the empirical versus simulated (P)ACF and the usual Ljung-Box tests. A plausible estimated model is expected to reproduce the empirical (P)ACF. This can be verified by simulating from the estimated model and calculating the (P)ACF from the simulated series. Figure 3 shows very similar (P)ACF patterns between the empirical and simulated series for the OBI site at recent previous times of day and discrepancies at larger lags corresponding to seasonal effects. No ARMA model with an increased number of lags for the seasonal part achieved a (P)ACF pattern similar to the empirical P(ACF). The remaining sites exhibited similar patterns.
ACF of residuals and Ljung-Box goodness of fit tests p-values are shown in Figure 4. Each lag represents a six-hour time interval. Larger lags (≥20) show statistically significant ACF, indicating model inadequacy; however, ACF magnitudes at those lags are rather small, representing marginal, 5-7 former days effects on O3 concentrations. The model accounts well for the immediately previous six-hour lags and daily cycles effects. Other models including moving average terms for both the   As for time of day effects, significantly increased deviations in O3 concentrations are observed between 12:00 and 17:00 CST. This is consistent with the enhanced photochemical period, in which O3 generally peaks in the MMA [45], and with the general O3 production dynamics during the day. The decreasing effects from the overnight baseline at any time of day for the OBI site might be explained by variables such as SR, TEMP and RH, whose influence on O3 formation mediates the time of day effect. The influence of meteorological variations is verified by the overall significant effect of their transformations exhibited in Table 9.

Model Diagnostics
A two-way assessment of the model is given by comparing the empirical versus simulated (P)ACF and the usual Ljung-Box tests. A plausible estimated model is expected to reproduce the empirical (P)ACF. This can be verified by simulating from the estimated model and calculating the (P)ACF from the simulated series. Figure 3 shows very similar (P)ACF patterns between the empirical and simulated series for the OBI site at recent previous times of day and discrepancies at larger lags corresponding to seasonal effects. No ARMA model with an increased number of lags for the seasonal part achieved a (P)ACF pattern similar to the empirical P(ACF). The remaining sites exhibited similar patterns.
ACF of residuals and Ljung-Box goodness of fit tests p-values are shown in Figure 4. Each lag represents a six-hour time interval. Larger lags (≥20) show statistically significant ACF, indicating model inadequacy; however, ACF magnitudes at those lags are rather small, representing marginal, 5-7 former days effects on O3 concentrations. The model accounts well for the immediately previous six-hour lags and daily cycles effects. Other models including moving average terms for both the   As for time of day effects, significantly increased deviations in O 3 concentrations are observed between 12:00 and 17:00 CST. This is consistent with the enhanced photochemical period, in which O 3 generally peaks in the MMA [45], and with the general O 3 production dynamics during the day. The decreasing effects from the overnight baseline at any time of day for the OBI site might be explained by variables such as SR, TEMP and RH, whose influence on O 3 formation mediates the time of day effect. The influence of meteorological variations is verified by the overall significant effect of their transformations exhibited in Table 9.

Model Diagnostics
A two-way assessment of the model is given by comparing the empirical versus simulated (P)ACF and the usual Ljung-Box tests. A plausible estimated model is expected to reproduce the empirical (P)ACF. This can be verified by simulating from the estimated model and calculating the (P)ACF from the simulated series. Figure 3 shows very similar (P)ACF patterns between the empirical and simulated series for the OBI site at recent previous times of day and discrepancies at larger lags corresponding to seasonal effects. No ARMA model with an increased number of lags for the seasonal part achieved a (P)ACF pattern similar to the empirical P(ACF). The remaining sites exhibited similar patterns.
ACF of residuals and Ljung-Box goodness of fit tests p-values are shown in Figure 4. Each lag represents a six-hour time interval. Larger lags (≥20) show statistically significant ACF, indicating model inadequacy; however, ACF magnitudes at those lags are rather small, representing marginal, 5-7 former days effects on O 3 concentrations. The model accounts well for the immediately previous six-hour lags and daily cycles effects. Other models including moving average terms for both the regular and the seasonal component were not found to improve the fit. Overall, the set of tests performed appear to support an adequate good model fit.
Atmosphere 2020, 11, x FOR PEER REVIEW 13 of 21 regular and the seasonal component were not found to improve the fit. Overall, the set of tests performed appear to support an adequate good model fit.   Figure S6 in the Supplementary Material shows a further examination of residuals. A lack of constant variance is apparent in every site, as time intervals with different variability were observed. This represents a departure from the statistical model assumptions and deserves further investigation. For the purposes of this paper, we consider that the insights and forecasts provided by the model presented are reasonable, and accounting for variance heterogeneity may be regarded as a refinement of the original model. Thus, it can be said that the model adequately represents series behavior at recent times of the day, in which most of the predictive structure is available, and the remaining predictive structure of distant former days is negligible. Table 10 shows the performance measures assessing the fitted values of the estimated models, i.e., one-step-ahead forecasts computed using the estimation periods shown in Table 4. The fitted values were computed with a natural logarithmic scale, and then, inverse transformation was applied to allow comparison with the original untransformed maximum O 3 level, through the performance measures presented in Section 3.5. Every performance measure was then computed with the original scale (ppbv).  Table 10 shows RMSE values around 10 units, ranging from 9.58 to 10.89. On average, the forecast error ranges from 6.87 to 8.15 in absolute value (MAE). As for the MAPE, it tends to exhibit the largest values, ranging from 25.65 to 32.31. Furthermore, the model tends to overestimate the maximum O 3 concentrations, as every MPE is negative, around 8, with great variability among sites. NME values are around 21 and are similar among sites. Note that the site SNN consistently shows the best performance values, suggesting that this site is where the model fits better. The measures, however, are comparable between sites, and we may conclude that the model performs similarly for all of them, especially regarding the MAE and RMSE. Table 11 shows the performance measures obtained using an out-of-sample period of days following the estimation period. Note that the performance measures are generally quite comparable to those from the estimation period listed in Table 10. During this out-of-sample period, the forecasts tend to overestimate the maximum O 3 concentration, except for STA, which shows a positive MPE value. Figure 5 provides an illustration of the one-step-ahead forecast performance together with 95% prediction intervals in the last two weeks used for estimation and the following two weeks of left-out observations. Comparing both periods, we see similar forecast performances. We might conclude that the forecast performance does not decay when forecasting new observations not previously used for model fitting, and thus the model captures the processes behind the ozone records and provides a suitable framework for forecasting. Table 11. Performance measures over 30 left-out days (120 times of the day). The forecast performance does not decay in the out-of-sample interval (see Table 10).

Comparison with Other Studies
Most of the existing studies that have reported AR(I)MA models generally do not consider meteorological variables for O3 prediction. There are a number of difficulties in comparing the results of this paper with those of other works. The different temporal resolutions used in the analyses, which range from hourly to monthly periods, the out-of-sample period, and the summary statistic that are reported do not allow a direct comparison with our models and forecasts. Thus, caution is needed in comparing and interpreting the performance measures. Kumar et al. [21] forecasted onehour daily maximum O3 concentrations using an ARIMA (1,0,1) model, finding an RMSE of 9.55 and a MAE of 8.36, while Kumar and Jain [23] forecasted daily average O3 concentrations with an ARIMA (0,0,1) model and found an RMSE of 15.1 and a MAE of 10.9. Duenas et al. [22] built an ARIMA (1,0,0)(1,0,1)24 for hourly periods (O3 averages), showing a MAE of 10.62 and 5.96 in urban and rural areas, respectively. In contrast, Beldjillali et al. [26] found a MAE of 2.03 forecasting monthly average O3 concentrations with a simple model AR(1). In the present study, in which four maximum O3

Comparison with Other Studies
Most of the existing studies that have reported AR(I)MA models generally do not consider meteorological variables for O 3 prediction. There are a number of difficulties in comparing the results of this paper with those of other works. The different temporal resolutions used in the analyses, which range from hourly to monthly periods, the out-of-sample period, and the summary statistic that are reported do not allow a direct comparison with our models and forecasts. Thus, caution is needed in comparing and interpreting the performance measures. Kumar et al. [21] forecasted one-hour daily maximum O 3 concentrations using an ARIMA (1,0,1) model, finding an RMSE of 9.55 and a MAE of 8.36, while Kumar and Jain [23] forecasted daily average O 3 concentrations with an ARIMA (0,0,1) model and found an RMSE of 15.1 and a MAE of 10.9. Duenas et al. [22] built an ARIMA (1,0,0)(1,0,1) 24 [21] and Kumar and Jain [23] showed MAPE values of 13.14 (daily maxima) and 25.8 (daily average), respectively. Duenas et al. [22], in the aforementioned work, found an MPE of −11.33 in urban areas. The SARMA models in the present study show MAPEs ranging between 19.50 and 25.65 and MPEs ranging between −3.09 and −14.03, depending on the site.
As mentioned, a direct comparison is difficult, especially regarding absolute measures such as RMSE and MAE; lower RMSE and MAE are expected for an adequate forecasting model at lower temporal resolutions when compared to higher-resolution modeling because of the smaller variability expected at lower resolutions. The present work shows similar RMSE and MAE measures to those presented in the reviewed papers; however, the temporal resolution presented here is higher. Although we may conclude that the models proposed here have better performance, the inherent variability of the data for the particular sites considered in the papers also prevents clear-cut comparisons. Perhaps MPE and MAPE are more comparable since they represent relative measures. In the present paper, MAPE tends to be larger than in the reviewed papers, and MPEs are negative (as the one discussed in the previous paragraph). It is worth noting that the magnitudes of these measures are similar. However, because temporal resolutions have an impact, both measures might increase since O 3 concentrations at overnight times tend to be small. Finally, we conclude that the performance measures presented here are similar or better than the ones reported in other studies, presuming that the inherent variabilities of the studied data are comparable. The increase of MAPE in this paper is expected, as a higher temporal resolution is considered.
The use of 3D numerical air quality models is recommended to forecast air quality pollutants, as they are sophisticated tools that address the non-linearity of the photochemical system and include meteorological, terrain and emissions information to simulate pollutant dispersion [59]. Zhang et al. [1] reviewed the history and current status of 3D real-time air quality forecasting models, showing that they have been mainly used in the US and Europe. In the US, 3D forecasting models applied to predict eight-hour maximum hourly averages showed NME values between 17.0 and 30.4, averaging 23.9, and RSME values ranging from 8.7 to 31.0, averaging 16.0. In comparison, our study, which uses a six-hour resolution, shows NME values ranging from 20.6 to 23.6 and from 21.2 to 25.7 in the estimation and out-of-sample periods, respectively, which are well within the ranges of values and below or close to the average value of the 3D models discussed above. Regarding the RSME, our analysis achieves a maximum value of 14.6, combining both periods, which is less than the average value of the 3D models. Similar results were found in Catalonia in north-east Spain using the MM5/MNEQA/CMAQ modeling forecast system to predict eight-hour maximum O 3 concentrations, with an RSME of 16.1, MPE of 1.1, MAPE of 13.3 and NME of 14.5 [60].
Other 3D air quality studies employed time resolutions less comparable to the one used in the present study. For example, Chai et al. [61] reported monthly and annual average O 3 concentrations over the US applying the Eta/CMAQ system. Curier et al. [62] forecasted daily maximum O 3 concentrations over Europe with the LOTOS-EUROS chemical transport model, and the CALIOPE modeling system (WRF-ARW/HERMES/CMAQ/BSD-DREAM8b) was applied over Spain to forecast one-hour O 3 concentrations [63]. The values of the performance measures presented in these papers are similar to those discussed in the previous paragraph; however, the summaries employed for O 3 concentrations and different time resolutions make detailed comparisons difficult. A common issue in some studies [1,63] is a tendency to overestimate O 3 concentrations, just as the SARMA models presented here. Overall, the statistical models proposed in this study exhibit forecast performance comparable to that of 3D air quality systems, but with the use of low computational resources.

Conclusions
A univariate multiple regression model with meteorological predictors and SARMA errors has proven to be useful for representing the dynamics of O 3 maxima at four times of the day (overnight, morning, afternoon and evening) in five sites of the MMA monitoring network in Mexico. The nonlinear nature of the relationship between O 3 maxima concentrations and the meteorological variables can be easily transformed, making it amenable to linear modeling. A relevant contribution of this paper is the extension of the Box-Jenkins approach, which is commonly employed in the forecasting literature on O 3 concentrations. Typically, this type of research has focused on building "pure" ARIMA models with no inclusion of additional predictors, while an important feature of the ARIMA methodology is its flexibility to incorporate external information into the model. We make use of meteorological predictors and prove their usefulness in understanding and forecasting maximum O 3 concentrations with an adequate level of precision.
The quality of the statistical forecasts presented here can be considered good. Moreover, they are very similar to those obtained by 3D air quality systems and are computable with much less infrastructure, mathematical complexity and computational effort. Comparison with other studies using statistical models is difficult because of the specifics of each analysis; however, our forecasts are quite comparable and can be considered satisfactory. Overall, we conclude that the regression model with SARMA errors presented here recovers the general formation of O 3 dynamics, reflecting the specifics of the MMA and providing good-quality forecasts. There is room for further analytical work within the developed framework. The model constitutes a foundation upon which additional refinements could be implemented to improve forecasting or to extend understanding of O 3 dynamics in the MMA, with special attention to volatility features not addressed by the model. Future research could include the addition of O 3 precursors concentrations as well as the concentration of other criteria pollutants as predictors, as well as other meteorological or anthropogenic drivers.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4433/11/12/1304/s1, Figure S1: Relationship between natural logarithm of O 3 and temperature (TEMP) for four sites in the MMA, Figure S2: Relationship between natural logarithm of O 3 and relative humidity (RH) for four sites in the MMA, Figure S3. Relationship between natural logarithm of O 3 and solar radiation (SR) for four sites in the MMA, Figure S4: Relationship between natural logarithm of O 3 and wind direction (WD) for four sites in the MMA, Figure S5: Relationship between natural logarithm of O3 and wind speed (Ws) for four sites in the MMA, Figure S6