Evaluation of MODIS Aerosol Optical Depth and Surface Data Using an Ensemble Modeling Approach to Assess PM 2.5 Temporal and Spatial Distributions

: The use of statistical models and machine-learning techniques along satellite-derived aerosol optical depth (AOD) is a promising method to estimate ground-level particulate matter with an aerodynamic diameter of ≤ 2.5 µ m (PM 2.5 ), mainly in urban areas with low air quality monitor density. Nevertheless, the relationship between AOD and ground-level PM 2.5 varies spatiotemporally and differences related to spatial domains, temporal schemes, and seasonal variations must be assessed. Here, an ensemble multiple linear regression (EMLR) model and an ensemble neural network (ENN) model were developed to estimate PM 2.5 levels in the Monterrey Metropolitan Area (MMA), the second largest urban center in Mexico. Four AOD-SDSs (Scientiﬁc Datasets) from MODIS Collection 6 were tested using three spatial domains and two temporal schemes. The best model performance was obtained using AOD at 0.55 µ m from MODIS-Aqua at a spatial resolution of 3 km, along meteorological parameters and daily scheme. EMLR yielded a correlation coefﬁcient ( R ) of ~0.57 and a root mean square error ( RMSE ) of ~7.00 µ g m − 3 . ENN performed better than EMLR, with an R of ~0.78 and RMSE of ~5.43 µ g m − 3 . Satellite-derived AOD in combination with meteorology data allowed for the estimation of PM 2.5 distributions in an urban area with low air quality monitor


Introduction
Particulate matter with an aerodynamic diameter of less than or equal to 2.5 µm (PM 2.5 ) is a critical indicator for measuring the air quality in a given area [1,2]. PM 2.5 concentrations are usually monitored using air quality measurement stations with spatial representativeness ranging from 500 m to 4 km in diameter for local, neighborhood scales [3]. However, in many regions of the world (e.g., Latin America), ground-level measurements of PM 2.5 are scarce or non-existent [4], making it difficult to determine the behavior and distribution of aerosols at local and regional scales. Satellite data are potentially useful for deriving indirect estimates of ground PM 2.5 pollution from aerosol optical properties, overcoming the well-known spatial limitation of traditional measurements. Such estimates may be regarded as a complementary cost effective method for PM 2.5 monitoring at local and regional scales in locations with non-existent or without extensive ground-based PM 2.5 monitoring networks [5,6].
spective Analysis for Research and Application model, version 2 (MERRA-2), aerosol components to represent PM2.5 ground concentrations. Therefore, this study contributes to the knowledge and understanding of the performance of MODIS AOD-PM2.5 retrieval algorithms over urban, mixed-use land coverage areas with limited ground-based observations.

Study Area
This study was conducted within the MMA, Mexico, from January 2010 to December 2017. The MMA (25°40′N, 100°20′W) is located in the State of Nuevo León, in Northeast Mexico, approximately 720 km north of Mexico City and approximately 230 km south of the United States border. The MMA is composed of 18 municipalities ( Figure 1) and has an area of 7657 km 2 [26]. It is the largest urban area in northeastern Mexico and the second most populous in the country with 5.7 million inhabitants [27]. The study area is located on the boundary of the province of Sierra Madre Oriental, which is characterized by mountainous relief with alternating mountain ranges and valleys [28]. Vertical drops of over 1400 m occur, with minimum and maximum elevations of 542 m and 1993 m above sea level, respectively. The lower altitude areas correspond to valleys, where the population of the MMA is settled [29].
The study area was analyzed using three spatial domains. Domain 1 (D1) contains ground data from each ground-level monitoring site versus AOD averaged at a resolution of 0.10° × 0.10° (collocated AOD values are contained in this domain); Domain 2 (D2) contains ground data (averaged from the five sites) versus AOD averaged at a resolution of 0.25° × 0.25°; and Domain 3 (D3) contains ground data (averaged from the five sites) versus AOD averaged at a resolution of 0.500° × 0.625°. Domains 1, 2, and 3 represent neighborhood, urban, and regional scales, respectively, as defined by EPA (2008) [3].  The study area was analyzed using three spatial domains. Domain 1 (D1) contains ground data from each ground-level monitoring site versus AOD averaged at a resolution of 0.10 • × 0.10 • (collocated AOD values are contained in this domain); Domain 2 (D2) contains ground data (averaged from the five sites) versus AOD averaged at a resolution of 0.25 • × 0.25 • ; and Domain 3 (D3) contains ground data (averaged from the five sites) versus AOD averaged at a resolution of 0.500 • × 0.625 • . Domains 1, 2, and 3 represent neighborhood, urban, and regional scales, respectively, as defined by EPA (2008) [3].

Ground-Based Air Quality Monitoring and PM 2.5 Pollution in the MMA
The MMA has a network of air quality monitoring stations known locally as the Sistema Integral de Monitoreo Ambiental (SIMA) composed currently of 13 monitoring sites. Criteria air pollutants (O 3 , SO 2 , NO x [NO x = NO + NO 2 ], CO, PM 2.5 , and PM 10 ) and meteorological variables (temperature, relative humidity, wind speed, wind direction, pressure, precipitation, and solar radiation) are continuously monitored, with the data summarized as hourly averages.
The MMA currently has a status of nonattainment with respect to the Mexican Air Quality Standards for PM 2.5 [30,31]. In 2015, the MMA exhibited annual averages in the range of 20-34 µg m −3 for PM 2.5 , exceeding the PM 2.5 national air quality standard of 12 µg m −3 [32]. For 50 days of 2019, the PM 2.5 concentrations exceeded the 24-h average limit of 45 µg m −3 set by the Mexican standard [33]. Seasonal variations in the PM 2.5 concentrations have also been observed, with winter showing the highest concentrations while the lowest concentrations were observed during summer and the first weeks of fall [34].
There are multiple sources of fine PM air pollution in the MMA. Source apportionment studies based on the chemical characterization of ambient air aerosols conducted in the MMA have found that approximately 64% of PM 2.5 comes from direct or precursor vehicle emissions [35][36][37]. Other significant sources include meat-cooking operations and biomass burning. Industrial activities, garbage burning, biogenic emissions, and resuspended dust also contribute to air pollution in the MMA [38].

Ground-Based PM 2.5 and Meteorological Data
The PM 2.5 and meteorological data were retrieved for the period from 2010 to 2017 at five urban ambient air monitoring sites in the SIMA network (the Southeast, Northeast, Downtown, Northwest, and Southwest stations; Figure 1). These sites were chosen because they provided continuous valid PM 2.5 records during the study period. Specifically, the PM 2.5 instruments are based on the beta-ray attenuation method and continuous weighing with a microbalance, providing automatic and continuous measurements and concentration recordings. SIMA instrumentation records PM 2.5 every minute; the records are then validated and stored as 1-h averages. The Mexican standards NOM-035-SEMARNAT-1993 [39] and NOM-156-SEMARNAT-2012 [40] are followed to accomplish the calibration, maintenance procedures, and quality assurance/quality control protocols.

MODIS AOD Data
Four AOD scientific datasets (SDSs) from Aqua MODIS C6 aerosol products (MYD04_3K and MYD04_L2) were evaluated to identify which dataset best described the relationship between AOD and the ground-based data for the estimation of PM 2.5 in the MMA. Table 1 lists the AOD-SDSs used in this study. fine-dominated aerosol models: continental, moderately absorbing, strongly absorbing smoke, and weakly absorbing (urban/industrial) [41]. The DB AOD-SDS from MYD04_L2 (AOD_DB) is a MODIS product retrieved using the DB algorithm at 10 km at nadir. Three surface categories were derived from MODIS DB C6 including arid and semiarid regions, general vegetation, and urban/built-up and transitional regions. The DB algorithm performs aerosol retrieval for non-spherical and absorbing aerosols including (1) dust aerosols, (2) mixtures of dust/smoke aerosols, and (3) strongly absorbing dust [42].
Finally, the Cb AOD-SDS from the MYD04_L2 product (AOD_Cb) is a merged product reported at 10 km. This dataset was designed to provide a single dataset that combines the best of DB and DT into a data product with fewer gaps [43]. The merge procedure is based on NDVI criteria (NDVI > 0.30 = DT; NDVI < 0.20 = DB; 0.2 < NDVI < 0.30 = merge). provides an overview of the methodology followed in this study. The first step involved data acquisition, spatial and temporal collocation, and the testing of 24 MODIS datasets (four AOD-SDSs for three domains and two temporal schemes) using stepwise regression to determine the meteorological variables that influence the AOD-PM 2.5 relationship in the MMA. In the second step, the 24 refined datasets (which included only the influential meteorological variables in the AOD-PM 2.5 relationship) were analyzed using a multiple linear regression (MLR) model to identify which AOD-SDS, domain, and temporal schemes showed the best correlation in the MMA. The resulting dataset was labeled the "best dataset". In the third step, an ensemble model based on MLR was developed to validate the robustness of the best dataset. Then, several neural networks (NNs) were trained and validated to build an ENN model to estimate the PM 2.5 concentrations.

PM 2.5 Concentration and MODIS AOD Spatiotemporal Collocation
The four MODIS AOD datasets and the ground-based data were geo-referenced to geographic latitude/longitude (WGS84) coordinates. To explore spatial differences in the AOD selection, the effect of the box size (the number of pixels in the spatial collocation) around the ground sites on the mean AOD was evaluated. The four AOD-SDSs were tested along with ground-based data using three spatial domains ( Figure 1c) and two temporal averaging schemes (denoted "hourly" and "daily") to identify sample AOD data to develop statistical models that best describe the relationship between PM 2.5 , AOD, and the meteorology.
The hourly scheme recognizes that the Aqua satellite crosses the MMA at 13:30 local time (±55 min). Accordingly, ground-level PM 2.5 and meteorological values were processed by averaging the observations measured from 12:00 to 15:00 local time. This scheme tries to make connections with ground-level air masses similar to those observed by MODIS. In the daily scheme, data were processed by averaging the PM 2.5 and 1-h average meteorological observations reported from midnight to midnight, local time. This scheme is used to represent the PM 2.5 data according to the 24-h Mexican standard NOM-025-SSA1-2014 [33], which requires a threshold of 75% data capture (i.e., 18 hourly records) to consider the data as valid and representative for a given day. Missing days were excluded from our analysis. The average AOD at the satellite overpass time was used to represent the AOD mean values in both schemes, against which the ground-level average observations were contrasted.

MLR
Twenty-four different datasets (AOD from four MODIS SDSs, three spatial domains, and two temporal schemes) were analyzed to identify the best dataset, namely, the dataset that best described the relationship between the satellite-derived AOD and the groundlevel meteorological observations with surface-level PM 2.5 in the MMA. The effect of the proposed spatiotemporal averaging schemes on this relationship was evaluated by assessing the MLR models.

PM2.5 Concentration and MODIS AOD Spatiotemporal Collocation
The four MODIS AOD datasets and the ground-based data were geo-referenced to geographic latitude/longitude (WGS84) coordinates. To explore spatial differences in the AOD selection, the effect of the box size (the number of pixels in the spatial collocation) around the ground sites on the mean AOD was evaluated. The four AOD-SDSs were tested along with ground-based data using three spatial domains ( Figure 1c) and two temporal averaging schemes (denoted "hourly" and "daily") to identify sample AOD data to develop statistical models that best describe the relationship between PM2.5, AOD, and the meteorology.
The hourly scheme recognizes that the Aqua satellite crosses the MMA at 13:30 local time (±55 min). Accordingly, ground-level PM2.5 and meteorological values were processed by averaging the observations measured from 12:00 to 15:00 local time. This scheme tries to make connections with ground-level air masses similar to those observed by MODIS. In the daily scheme, data were processed by averaging the PM2.5 and 1-h average meteorological observations reported from midnight to midnight, local time. This scheme is used to represent the PM2.5 data according to the 24-h Mexican standard NOM-025-SSA1-2014 [33], which requires a threshold of 75% data capture (i.e., 18 hourly records) to To sequentially identify the best subset of independent variables to incorporate into the regression equation, all meteorological variables recorded by SIMA (temperature, relative humidity, wind speed, wind direction, rain, pressure, and solar radiation) along with the AOD data were included in the 24 datasets. The final datasets (refined datasets) were selected using a stepwise regression method to achieve the highest level of estimation accuracy possible, and only those meteorological variables that were statistically significant and most significantly correlated with the dependent variable (PM 2.5 ) were retained. Forward and backward stepwise regression procedures were applied, and a significance level (p-value) of 0.05 was used for the inclusion and exclusion of variables.
The refined datasets were tested to identify the best dataset using MLR. The best dataset was selected according to the following criteria. First, the dataset must have data representativeness (number of data, N). Second, once MLR is performed, the mean and variability (standard deviation) of the estimated model should be similar to the mean and variability of the observed dataset. Finally, the dataset should have a high correlation coefficient (R) and low error estimators (i.e., root mean square error (RMSE) and mean absolute error (MAE)).
Because the model performance can be sensitive to the dataset, the robustness of MLR was tested by performing an independent validation. Once the best dataset was selected, an ensemble model was developed according to the following steps. First, 10 iterations were performed, randomly creating datasets from the best dataset and splitting the data into 70% training and 30% validation datasets. Then, MLR was performed independently for each training dataset. PM 2.5 concentrations in each validation dataset were estimated using the regression coefficients (β i ) obtained in the training dataset. The correlation coefficient and error metrics were compared to assure that they were similar in both datasets (training and validation). The β i coefficients from the training dataset were saved at each iteration. Finally, an EMLR model was built by averaging each β i coefficient from the regression equations at the training datasets. The ensemble model was expected to have similar statistical performance to that of the MLR model using the original best dataset.

ANN
Several thousand multiple back-propagation multilayer perceptron-type ANN models (MLP) were developed, trained, and validated using identical topology and neuron functions to estimate the ground-level PM 2.5 . MLP is one of the most utilized feedforward ANN types to solve non-linear function approximation tasks [44]. The network in a MLP typically contains multiple layers and mainly consists of three layers: the input layer of sensory units, the hidden layer of computation nodes, and the output layer [45]. Figure 3 shows the ANN model applied in this study, in which input factors (AOD and meteorological variables) are used with one hidden layer, leading to one outcome. A sigmoid function was used as the transfer function in the hidden layer, and the Levenberg-Marquardt algorithm [46] was employed for the training process. The NNs used in this study were designed using the "simpler-structure principle" [47], which assumes that the optimal structure of a NN should be fairly simple, with several nodes in one hidden layer. Under this principle, training starts with the simplest structure (only one node in the hidden layer) with the aim of detecting a local optimum on the validation error curve. The procedure is performed again with a new node, and the resulting value is compared to the previous minimum value. This is repeated until the model performance does not improve significantly [48]. The early stopping method [49] was used, and the stopping criteria were determined by a predefined threshold number of training iterations and a cross-validation approach [50]. The procedure described above allows the method to avoid overfitting. The best dataset was used to train, test, and validate NNs for the estimation of PM2.5 ground concentrations over the MMA. For each one of the thousand developed NNs, the best dataset was first randomly divided into training and validation subsets using the hold-out validation method [51]: 70% of the data was used for training the NNs, while the remaining 30% of the data was arranged to test the model performance. Then, the 10 NNs with the best performance (highest R and lowest RMSE and MAE values) in the validation subset were selected. The individual predicted PM2.5 concentrations were combined to build an ENN model using a simple averaging technique. Statistical metrics (R, RMSE, and MAE) were computed for the training and validation datasets in each NN to allow consistent comparisons and to evaluate the overall ENN performance. The NNs used in this study were designed using the "simpler-structure principle" [47], which assumes that the optimal structure of a NN should be fairly simple, with several nodes in one hidden layer. Under this principle, training starts with the simplest structure (only one node in the hidden layer) with the aim of detecting a local optimum on the validation error curve. The procedure is performed again with a new node, and the resulting value is compared to the previous minimum value. This is repeated until the model performance does not improve significantly [48]. The early stopping method [49] was used, and the stopping criteria were determined by a predefined threshold number of training iterations and a cross-validation approach [50]. The procedure described above allows the method to avoid overfitting.

Results
The best dataset was used to train, test, and validate NNs for the estimation of PM 2.5 ground concentrations over the MMA. For each one of the thousand developed NNs, the best dataset was first randomly divided into training and validation subsets using the hold-out validation method [51]: 70% of the data was used for training the NNs, while the remaining 30% of the data was arranged to test the model performance. Then, the 10 NNs with the best performance (highest R and lowest RMSE and MAE values) in the validation subset were selected. The individual predicted PM 2.5 concentrations were combined to build an ENN model using a simple averaging technique. Statistical metrics (R, RMSE, and MAE) were computed for the training and validation datasets in each NN to allow consistent comparisons and to evaluate the overall ENN performance.

Effect of the Domain Size on AOD
Descriptive statistics for the tested AOD-SDSs at different domain resolutions are presented in Table 2 including the number of days (N) that AOD data were available in each domain, the percentage of days (% days) with complete data (AOD data availability matches SIMA meteorological valid data, in total 1156 records), and the AOD minimum, maximum, mean, and standard deviation values. Note that a smaller box size yields a higher AOD mean. Table 2. Descriptive statistics of the four AOD-SDSs averaged in the three tested domains (D1, D2, and D3 as per Figure 1).  AOD_3K was the SDS with the largest data availability, highest variability (standard deviation), and the widest difference between the minimum and maximum AODs for all domains. Because the only differences between the 3-km and 10-km algorithms were the way that the pixels were organized and the number of pixels required to proceed with retrieval after all masking and deselection [52], pixels representing bright or inhomogeneous surfaces (as is the case of the MMA surface) that might be discarded during the sorting and discarding procedure at 10 km were retained in the 3-km retrieval.

Days (N) 1 (%) Days
AOD_DB was the SDS with the least variability between the three domains. The resulting AOD_DB means yielded similar values for the three domains, and the difference between the minimum and maximum AODs was bounded, suggesting a homogeneous aerosol load over the MMA. However, the DB algorithm only performs aerosol retrieval for non-spherical and absorbing dust aerosols [42], which do not completely represent the mixture of aerosols (e.g., PM 2.5 ) found in the MMA atmosphere, where dust represents only 10% of its mass and organic carbon (50%), elemental carbon (5%), and soluble inorganic salts such as nitrates, sulfates, and ammonium (35%), account for the balance [36]. The observed weak ability to report high (maximum) and low (minimum) AODs from the DB SDS (compared to AOD_3K) might increase the possibility of the under-or over-estimation of PM 2.5 values when using this SDS.
While the DT surface scheme based on the identification of dark target pixels, mainly vegetated areas [41], does not fully represent the highly heterogeneous surfaces in the MMA, the urban/industrial aerosol type identified by this algorithm is of particular interest in the MMA. However, the number of available data for AOD_DT and AOD_Cb in Domain 1 (D1; N = 4) did not allow meaningful statistics to be estimated. The bright surface from the urban area led to pixels being identified as inappropriate for dark target aerosol retrieval [52].
In addition, there were no differences between AOD_DT and AOD_Cb for D1 and Domain 2 (D2). This may be because, in the smallest domain, NDVI > 0.30, meaning AOD was retrieved using the DT algorithm according to the NDVI criteria (NDVI > 0.30 = DT; NDVI < 0.20 = DB; 0.2 < NDVI < 0.30 = AOD_Cb), or this may indicate that, in the smallest domain, AOD_DT has better confidence than DB because AOD in AOD_Cb is assigned to the algorithm (DT or DB) that has the higher confidence [41]. The NDVI criteria for the merged product describe the transition zones between the vegetated and non-vegetated areas that are typically present in Domain 3 (D3). In D3, the AOD_Cb data availability was slightly larger than that of AOD_DT. In this domain, possibly because of the heterogeneous surface conditions (urban, rural, and natural), the AOD_Cb SDS reported few extra AOD pixels compared to AOD_DT; however, the descriptive statistics for both datasets were not statistically different.

MLR Models
As a first step, simple linear regressions between PM 2.5 and AOD were performed, yielding R < 0.10 for all domains, underlining the need to incorporate a larger set of predictor variables, as in MLR. A total of 1156 meteorological records were considered valid once the threshold criteria were applied. The stepwise regression procedure showed that the meteorological parameters, along with the AOD data, that best estimate the PM 2.5 concentrations over the MMA include temperature (T), relative humidity (RH), wind speed (WS), and wind direction (WD) (p-value < 0.05 and Mallows' Cp = 7.00): The statistical metrics (based on model fitting) obtained from performing MLR on each of the 24 refined datasets (4 AOD-SDSs, three spatial domains, and two temporal schemes, incorporating only the meteorological parameters indicated in Equation (1)), are presented in Table 3.
In the hourly scheme, Domain 1 represents the spatial collocation to the satellite data on the ground-based monitoring sites, coinciding with the overpass time of the AQUA satellite. AOD data availability in Domain 1 was scarce. The use of the AOD_3K in the MLR (MLR_1) for Domain 1 yielded the greatest variability (Std. Dev. = 13.24 and 9.08, hourly and daily averaging schemes, respectively), the lowest correlation coefficient (R = 0.35 and 0.48, hourly and daily averaging schemes, respectively) and the largest errors for both temporal averaging schemes. Domain 2 and Domain 3 showed similar performance among them. In the hourly averaging scheme, Domain 2 showed better performance than Domain 3, as opposed to the use of daily scheme, in which Domain 3 performed better than Domain 2. The best performance using AOD_3K was obtained using the daily scheme over Domain 3.  When MLR was estimated using AOD_DT (MLR_2), there were no available pixels in D1; therefore, the statistical parameters could not be calculated. In D2, for the hourly and daily schemes, AOD_DT and AOD_Cb showed the best performance of the four SDSs and the three domains. However, the data availability was not sufficient to build a robust predictor model, as the available data only covered 6.58% of the study period. D2 performed better than D3, even though data availability was greater in D3. For D3 in the hourly scheme, AOD_DT showed the best performance. Because AOD_Cb SDS had the same AOD values in D1 and D2 as in AOD_DT, MLR_2 and MLR_4 showed the same statistical performance for both temporal schemes. For MLR_4 in D3, the daily scheme performed better than the hourly scheme.
MLR_3 in D1 showed the worst statistical performance (R = 0.29 and R = 0.44 for the hourly and daily schemes, respectively) of all the evaluated SDSs, where AOD_DB was a weak explanatory variable (p-values of 0.125 and 0.079 for the hourly and daily schemes, respectively). D2, with the hourly scheme, performed better than D3, even though the latter had greater data availability. In contrast, with the daily scheme, D3 did better than D2. Of the evaluated MLRs, MLR_3 presented the best performance over D3 using the daily scheme.
A one-way analysis of variance (ANOVA) revealed significant statistical differences between the four tested AOD-SDSs (p-value < 0.05) in D3; however, the statistical performance of all the MLR models in D3 was similar in terms of the correlation coefficient and the error metrics without taking into consideration the selection of the SDS. D3 provides regional-averaged values and is characterized by a heterogeneous surface and therefore NDVI ( Figure S1) and a mix of aerosols from different sources (i.e., urban, industrial, and agricultural). Because each AOD algorithm is built under different assumptions and is more suitable to specific surfaces and aerosol types, the observed heterogeneity in D3 allows the different algorithms to meet their required conditions for AOD retrieval.

Model Selection and General Seasonal Variation of the Relevant Variables
Due to its high correlation coefficient, low error estimators, and high data availability, the refined dataset MLR_1 in D3 using the daily scheme was selected in this study as the best dataset. The data distribution of the variables in the best dataset is shown in Figure 4. In general, PM 2.5 and the meteorological variables displayed behavior typical of the region [32,53].  The ground-based temperature (Figure 4a) increased from April to August. July and August were the hottest months, with mean daily maximum and minimum temperatures of 33 °C and 25 °C, respectively. December and January were the coldest months according to the mean daily maximum (22 °C) and minimum (3 °C) temperatures. The RH showed peak daily mean values above 75% in July and above 96% on a few days in De- The ground-based temperature (Figure 4a) increased from April to August. July and August were the hottest months, with mean daily maximum and minimum temperatures of 33 • C and 25 • C, respectively. December and January were the coldest months according to the mean daily maximum (22 • C) and minimum (3 • C) temperatures. The RH showed peak daily mean values above 75% in July and above 96% on a few days in December. The predominant wind pattern in the MMA was east to west (Figure 4b). During spring and summer, the WS reached its maximum values of greater than 10 km h −1 . Minimum wind speeds of approximately 3-4 km h −1 occurred in the winter and fall seasons, blowing mainly from the northeast and only blowing from the northwest 10% of the days.
The ground-measured PM 2.5 and AOD_3K showed seasonal differences (Figure 4a). The highest observed daily-average PM 2.5 concentrations occurred in winter (64.0 µg m −3 ), whereas the highest AOD_3K was reported in spring (0.89). In winter, cold weather and the corresponding increase in emissions from combustion processes, low mixing heights, and general stagnant conditions resulted in high surface PM 2.5 ; however, these same meteorological conditions limited the dispersion of fine PM to the free troposphere (above the mixing layer), where there appeared to be low loadings of aerosols as represented by AOD. The latter may also be partially explained by the transit of cold fronts from the north, typical of this season, that help to clean the upper atmosphere (on average, from September to May, 50 cold fronts traverse the region, with the most intense occurring during January and February [54]). Therefore, in winter, the aerosol load integrated over the entire column was low.
The opposite occurs during spring, when the mixing layer height increases substantially; secondary aerosol production is enhanced as a result of higher temperatures and solar radiation, the surface wind intensity increases, and transport is larger (to and from the MMA). Spring is also a period when forest fires occur in the region. In general, spring and early summer are dry periods, increasing dust resuspension. Therefore, aerosol production and transport are enhanced over the entire atmospheric column, which could explain the AOD peak observed in spring. The surface-level PM 2.5 was at its lowest levels in summer; however, high AOD levels were still reported in summer. During this time, the surface wind speed, temperatures, and mixing heights were at their highest values. Accumulation of air pollutants does not occur over the MMA; however, transport effects result in a fair load of aerosols being present throughout the column. Fall is characterized as the rainy season, and therefore, its AOD level is lower than those in spring and summer.

Performance of the Ensemble Models
The regression coefficients (β i ) obtained from each iteration during the EMLR procedure as well as β i of the EMLR and MLR models using the best dataset (as defined in Section 3.3) are shown in Table 4. A one-way ANOVA revealed no statistically significant differences (p-value ≈ 0.97) between the PM 2.5 values estimated from the conventional MLR (95% confidence intervals for the mean: 27.48-28.20) and those estimated using EMLR (95% confidence intervals for the mean: 27. 47-28.19). Therefore, performing MLR using the best dataset is robust and the dataset does not influence the model performance.
Because it is appropriate to use the best dataset to enhance the PM 2.5 estimation, 10 trained and validated NNs were used to construct the ENN. Figure 5 shows the statistical performance of each NN. In the training fittings, the R values ranged between 0.74 and 0.75, whereas in the validation datasets, the R values ranged between 0.68 and 0.71. The differences between the R values of the training fittings and those of the validation datasets ranged between 0.04 and 0.06. The differences in RMSE ranged from −1.18 µg m −3 to 0.02 µg m −3 , and the differences in the MAE ranged from −0.87 µg m −3 to −0.07 µg m −3 . These results suggest that the model was not substantially overfitted. These results suggest that the model was not substantially overfitted.

Spatial and Temporal Variability
The use of collocated sites, a widely employed method for estimating PM2.5 using satellite-derived AOD data (e.g., [55,56]), was not useful for the region of interest, where the MODIS AOD values were irregularly distributed in space and time. In areas with highly heterogeneous surface coverage, as is the case for the MMA, the retrieval frequency of AOD is still a challenge at fine resolutions (D1 and D2). Still, there is potential for the use of the averaged AOD for local and regional PM2.5 estimations (D3).
Moreover, cloud cover can also affect PM2.5 assessments because satellite retrieval methods based on solar reflectance depend on the availability of clear sky, in contrast to ground-based monitors, which measure PM concentrations regardless of the cloud conditions [55]. Even though cloud cover data are not routinely reported for the MMA, previous studies suggest that there is a seasonal cloud cover prevailing pattern as easterly anticyclonic air masses highly influence the MMA and are commonly associated with clear sky conditions [57], primarily during the spring and summer months [53].
Furthermore, there is a spatial and temporal mismatch between the averaged satellite AOD and the averaged ground-based PM2.5 [58]. The correlation between AOD and the ground-based PM2.5 varies with temporal averaging and geographic location, as shown in Shin et al. (2019) [7]. However, the moderate correlation observed in this study may have been influenced by the relatively low AOD data availability. The results indicate that the data availability increases when the size of the domain increases, with D3 providing the best AOD data representativeness. Therefore, the data availability depends on the spatial scheme rather than the temporal scheme. The correlation coefficient and error estimators depend on both the spatial and temporal schemes; however, the correlation coefficients change significantly as a function of the box size (i.e., errors are larger for the smallest domain), and are larger when using the hourly scheme as opposed to the daily scheme. In summary, the correlation between PM2.5 and AOD, to some extent, depends on the spatial and temporal average schemes of MODIS AOD [59].
A one-way ANOVA revealed that there was a statistical difference (p-value < 0.05) between the observed hourly-averaged PM2.5 values (29.39 ± 12.32 μg m −3 ) and the observed daily-averaged PM2.5 values (27.88 ± 8.52 μg m −3 ). It was observed that the 24-h averages used in the daily scheme suppressed the extreme PM2.5 values observed in the hourly scheme. The mean and standard deviation of the MLRs performed for the 24 refined datasets were in the same range as the observed dataset for both the hourly and

Spatial and Temporal Variability
The use of collocated sites, a widely employed method for estimating PM 2.5 using satellite-derived AOD data (e.g., [55,56]), was not useful for the region of interest, where the MODIS AOD values were irregularly distributed in space and time. In areas with highly heterogeneous surface coverage, as is the case for the MMA, the retrieval frequency of AOD is still a challenge at fine resolutions (D1 and D2). Still, there is potential for the use of the averaged AOD for local and regional PM 2.5 estimations (D3).
Moreover, cloud cover can also affect PM 2.5 assessments because satellite retrieval methods based on solar reflectance depend on the availability of clear sky, in contrast to ground-based monitors, which measure PM concentrations regardless of the cloud conditions [55]. Even though cloud cover data are not routinely reported for the MMA, previous studies suggest that there is a seasonal cloud cover prevailing pattern as easterly anticyclonic air masses highly influence the MMA and are commonly associated with clear sky conditions [57], primarily during the spring and summer months [53].
Furthermore, there is a spatial and temporal mismatch between the averaged satellite AOD and the averaged ground-based PM 2.5 [58]. The correlation between AOD and the ground-based PM 2.5 varies with temporal averaging and geographic location, as shown in Shin et al. (2019) [7]. However, the moderate correlation observed in this study may have been influenced by the relatively low AOD data availability. The results indicate that the data availability increases when the size of the domain increases, with D3 providing the best AOD data representativeness. Therefore, the data availability depends on the spatial scheme rather than the temporal scheme. The correlation coefficient and error estimators depend on both the spatial and temporal schemes; however, the correlation coefficients change significantly as a function of the box size (i.e., errors are larger for the smallest domain), and are larger when using the hourly scheme as opposed to the daily scheme. In summary, the correlation between PM 2.5 and AOD, to some extent, depends on the spatial and temporal average schemes of MODIS AOD [59].
A one-way ANOVA revealed that there was a statistical difference (p-value < 0.05) between the observed hourly-averaged PM 2.5 values (29.39 ± 12.32 µg m −3 ) and the observed daily-averaged PM 2.5 values (27.88 ± 8.52 µg m −3 ). It was observed that the 24-h averages used in the daily scheme suppressed the extreme PM 2.5 values observed in the hourly scheme. The mean and standard deviation of the MLRs performed for the 24 refined datasets were in the same range as the observed dataset for both the hourly and daily schemes. Nonetheless, in this study, the use of a daily scheme yielded the best correlation coefficients in contrast to the study of Guo et al. (2017) [59], in which it was concluded that the best model performance was achieved under an hourly scheme.
The best dataset includes 695 days with valid surface and satellite data. The hourly ground-based PM 2.5 concentrations were averaged, displaying a multimodal pattern (Figure 7), with peaks between 12:00 and 14:00 (likely due to air pollutant emissions and secondary formation), between 18:00 and 20:00 (likely due to emissions during the evening traffic rush-hour), and between 23:00 and 01:00 (possibly due to a decrease in the mixing layer height). The minimum averaged concentration occurred between 08:00 and 10:00. Although traffic emissions increase at morning rush hour (7:00-9:00 h), an increment in the average PM 2.5 concentration was observed starting at 10:00, probably due to the formation of secondary organic aerosol (SOA) and other secondary species (i.e., sulfates) enhanced by photochemical activity. For example, in the MMA, SOA can represent as much as 59% of the PM 2.5 , as reported by   [60]. For most hours (except between 08:00 and 10:00), the mean ground-based PM 2.5 of the hourly and daily scheme are within the confidence interval of the ground-based hourly PM 2.5 . This might explain why, even though AOD was reported only at the satellite overpass time (between 12:00 and 15:00), good statistical performance was achieved when using the daily scheme. However, this might not be the case for all regions and datasets. daily schemes. Nonetheless, in this study, the use of a daily scheme yielded the best correlation coefficients in contrast to the study of Guo et al. (2017) [59], in which it was concluded that the best model performance was achieved under an hourly scheme. The best dataset includes 695 days with valid surface and satellite data. The hourly ground-based PM2.5 concentrations were averaged, displaying a multimodal pattern (Figure 7), with peaks between 12:00 and 14:00 (likely due to air pollutant emissions and secondary formation), between 18:00 and 20:00 (likely due to emissions during the evening traffic rush-hour), and between 23:00 and 01:00 (possibly due to a decrease in the mixing layer height). The minimum averaged concentration occurred between 08:00 and 10:00. Although traffic emissions increase at morning rush hour (7:00-9:00 h), an increment in the average PM2.5 concentration was observed starting at 10:00, probably due to the formation of secondary organic aerosol (SOA) and other secondary species (i.e., sulfates) enhanced by photochemical activity. For example, in the MMA, SOA can represent as much as 59% of the PM2.5, as reported by   [60]. For most hours (except between 08:00 and 10:00), the mean ground-based PM2.5 of the hourly and daily scheme are within the confidence interval of the ground-based hourly PM2.5. This might explain why, even though AOD was reported only at the satellite overpass time (between 12:00 and 15:00), good statistical performance was achieved when using the daily scheme. However, this might not be the case for all regions and datasets.

Seasonal Variation in the Model Performance
The EMLR model tended to underestimate the higher ground-based PM2.5 concentrations and overestimate the lower concentrations. However, the use of ENN reduced the under and over-estimation gap. Similar to our findings, Christopher (2009a and2009b) [61,62] and Zaman et al. (2017) [63] found that NN models improved PM estimations when compared to linear models.
Regarding the error metrics, RMSE was larger than MAE, presumably as a result of the influence of high PM2.5 episodes in the MMA that were underestimated by the models such as those that occurred in winter. The meteorological conditions in winter can be

Seasonal Variation in the Model Performance
The EMLR model tended to underestimate the higher ground-based PM 2.5 concentrations and overestimate the lower concentrations. However, the use of ENN reduced the under and over-estimation gap. Similar to our findings, Christopher (2009a and2009b) [61,62] and Zaman et al. (2017) [63] found that NN models improved PM estimations when compared to linear models.
Regarding the error metrics, RMSE was larger than MAE, presumably as a result of the influence of high PM 2.5 episodes in the MMA that were underestimated by the models such as those that occurred in winter. The meteorological conditions in winter can be unfavorable to pollutant dispersion/removal and include frequent occurrences of stagnant weather associated with high pressures, temperature inversion, and diminished precipitation. In addition, the poor air quality in winter is often attributed to large air pollutant emissions resulting from enhanced fossil fuel use for heating purposes [64].
In this study, different seasonal patterns were found between the ground-based PM 2.5 and MODIS AOD correlations, which might arise from their complex relationship as influenced by the aerosol types, aerosol chemical compositions, aerosol size distributions, aerosol vertical profiles, and meteorology [47,[65][66][67].
To further explore the observed seasonal variations in the model performance, the fittings from the EMLR and ENN models were divided into four subsets corresponding to each season: winter (December, January, and February), spring (March, April, and May), summer (June, July, and August), and fall (September, October, and November). The model fittings for both EMLR and ENN exhibited a large degree of seasonal variability (Figure 8). The ENN model (R = 0.80, 0.85, 0.51, and 0.72 in winter, spring, summer, and fall, respectively) performed better than the EMLR model (R = 0.65, 0.67, 0.29, and 0.59 in winter, spring, summer, and fall, respectively) for all seasons. 0.59 in winter, spring, summer, and fall, respectively) for all seasons.
Both the EMLR and ENN models performed better in winter and spring when using seasonal subsets compared to the entire dataset ( Figure 6), as found in Li et al., 2017 [68]. The models achieved their lowest performance in summer. During summer, the AOD-PM2.5 relationship may be influenced by the winds reaching maximum speeds greater than 10 km h −1 (Figure 4b), enhancing the contribution of dust particles. In addition, Mancilla et al. (2019) [32] found that in summer, geological material accounted for 45% of the chemical composition of coarse particles (PMc = PM10 − PM2.5) and 6% of PM2.5, suggesting the presence of dust in the MMA atmosphere. Large amounts of atmospheric dust might result in high AOD values [69]. Mancilla et al. (2019) [32] also reported that (NH4)2SO4 can contribute to as much as 54% of the total mass of PM2.5 in the MMA. According to Gao and Zhang (2014) [70], the extinction coefficient increases with increased SO2 −4 , therefore yielding larger AOD values, as it is the integral of the aerosol extinction coefficient. This also implies that AOD values are influenced not only by aerosol concentrations, but also by aerosol chemical composition.
Estimation of PM2.5 absolute values using AOD is still a challenge and can be uncertain due to meteorological and seasonal conditions, location (e.g., near major roadways or industrial complexes), or aerosol composition. However, the use of AOD for classifying relative values and categories has been demonstrated to be useful in applications based on pollution concentration intervals rather than absolute values (e.g., public health assessment studies, Air Quality Index), and warrants further research [71]. Both the EMLR and ENN models performed better in winter and spring when using seasonal subsets compared to the entire dataset ( Figure 6), as found in Li et al., 2017 [68]. The models achieved their lowest performance in summer. During summer, the AOD-PM 2.5 relationship may be influenced by the winds reaching maximum speeds greater than 10 km h −1 (Figure 4b), enhancing the contribution of dust particles. In addition, Mancilla et al. (2019) [32] found that in summer, geological material accounted for 45% of the chemical composition of coarse particles (PMc = PM 10 − PM 2.5 ) and 6% of PM 2.5 , suggesting the presence of dust in the MMA atmosphere. Large amounts of atmospheric dust might result in high AOD values [69]. Mancilla et al. (2019) [32] also reported that (NH 4 ) 2 SO 4 can contribute to as much as 54% of the total mass of PM 2.5 in the MMA. According to Gao and Zhang (2014) [70], the extinction coefficient increases with increased SO 2 −4 , therefore yielding larger AOD values, as it is the integral of the aerosol extinction coefficient. This also implies that AOD values are influenced not only by aerosol concentrations, but also by aerosol chemical composition.
Estimation of PM 2.5 absolute values using AOD is still a challenge and can be uncertain due to meteorological and seasonal conditions, location (e.g., near major roadways or industrial complexes), or aerosol composition. However, the use of AOD for classifying relative values and categories has been demonstrated to be useful in applications based on pollution concentration intervals rather than absolute values (e.g., public health assessment studies, Air Quality Index), and warrants further research [71].

Contributions in the Context of Latin America
As stated earlier, few studies have been conducted in the Latin American region to establish the relationship between AOD and PM 2.5 . The results obtained here show the feasibility of using averaged satellite-derived AOD for local and regional ground-level PM 2.5 estimations and reinforce the assumption that spatiotemporal variations in the AOD-PM 2.5 relationship need to be understood on a region-by-region basis.  [16] used MODIS MOD09CMA AOD data with a spatial resolution of 0.05 • as a pollution proxy in La Paz, Lima, and Bogota, by qualitatively comparing AOD and PM 10 data.
Téllez-Rojo et al. (2020) [19] used MODIS Multiangle Implementation of Atmospheric Correction (MAIAC) data with a spatial resolution of 1 km, along with interpolated PM 2.5 measurements from Mexico City monitoring network stations, to derive PM 2.5 estimates from a hybrid spatiotemporal model, and reconstruct long-and short-term spatially resolved population exposure to PM 2.5 for epidemiological studies [72]. Their model performance was good even for days without AOD values (cross-validated R 2 = 0.72 and RMSE = 5.42 µg/m 3 ). In addition, Vu et al. (2019) [20] combined AOD 1-km spatial resolution data retrieved using the MAIAC algorithm, meteorological fields from the European Center for Medium-Range Weather Forecasts, parameters from the Weather Research and Forecasting model coupled with chemistry, and land use variables to fit a random forest (RF) model against ground measurements from 16 monitoring stations, with the objective of developing a PM 2.5 exposure model for Lima, Perú. The RF model performance was as follows: cross-validated R 2 = 0.70 and RMSE = 5.95 µg/m 3 , indicating a good fit between the predictors and the ground measurements.
Recently, Santos-Damascena et al. (2021) [73], when performing a multi-country assessment of urban health determinants in Latin America, characterized PM 2.5 levels in 366 cities using satellite-derived estimates. Their study used data from the Atmospheric Composition Analysis Group of Dalhousie University and examined high-resolution AODderived from the MAIAC algorithm as a predictor of ground-level PM concentrations in the Metropolitan Area of Sao Paulo (MASP). Correlations between the ground-level PM 10 concentrations (at the hour of the satellite overpass, diurnal averaging, and daily averaging intervals) and the collocated AODs were found to be weak (ranging from −0.20 to 0.03), in accordance with a previous study by Natali (2008) [74]. Santos-Damascena et al. (2021) [73] concluded that high-resolution AOD data from the current MAIAC version were not well suited to predict ground-level PM concentrations in the MASP. However, to increase confidence in their conclusions, they compared 2012-2017 AOD retrievals from the DT algorithm at 3 km × 3 km (collection 6.1) with those from an AERONET site and obtained strong correlations for the Terra and Aqua retrievals (R = 0.83 and 0.88, respectively) that were comparable to those obtained for the MAIAC retrievals. The findings of Santos Damascena et al. (2021) [73] suggest that AOD does not necessarily correlate with PM 2.5 ; therefore, there can be other factors influencing this relationship.
In northeastern Mexico, only Carmona et al. (2020) [17] has evaluated the ability of MERRA-2 aerosol components to represent PM 2.5 ground concentrations and to develop and validate an ENN model to estimate monthly average PM 2.5 ground concentrations. The results of their comparison between the ENN and ground measurements were as follows: R =~0.90, RMSE = 1.81 µg m −3 , and MAE = 1.31 µg m −3 . Apart from comparing four algorithms, this study differs from that of Carmona et al. in having a better temporal resolution (hourly and daily) and using the local meteorology.

Conclusions
The AOD-PM 2.5 relationship is complex and, as observed in other regions of the world, for northeastern Mexico, it requires a local assessment of different spatiotemporal averaging schemes to obtain the best correlation. Our results also indicate the need to incorporate other surface-level measured meteorological parameters as predictor variables in addition to AOD to better estimate the PM 2.5 levels. Using the AOD_3K MODIS product (based in the DT algorithm), a 24-h average (daily scheme) over D3 (0.500 • × 0.625 • ) was selected as the dataset that best estimated PM 2.5 concentrations in the MMA as a result of its larger data availability and the mixture of aerosols present in this domain. In addition, the DT algorithm is best suited for NDVI > 0.30, which is typical of vegetative coverage, the dominant coverage in D3. This best dataset was used to develop and validate robust EMLR and non-overfitted ENN models.
The EMLR and ENN models developed here were able to estimate the PM 2.5 concentrations relatively well compared to other applications. In general, the models overpredicted the lowest PM 2.5 concentrations and underpredicted the highest concentrations; this was most evident when using EMLR. Overall, ENN showed better statistical performance than EMLR. When the predicting models were constructed by season, better statistical performances were obtained. The best performance was obtained for seasons with calm or relatively calm winds and low to mild temperatures (winter and spring). The worst performance was observed in summer, when the high winds promote long-range transport and dust resuspension is enhanced, influencing the entire aerosol column. As new retrieval algorithms (with regional calibration), collections, and machine-learning techniques become available, improved performances should be attained.
The methodologies and models developed in this study can help assess local and regional distributions of PM 2.5 in the MMA, improving the knowledge and understanding of the performance of MODIS AOD-PM 2.5 retrieval algorithms over urban, mixed-use land coverage areas with limited ground-based observations, address information gaps, help improve understanding of the exposure-response relationships associated with PM 2.5 , and serve as a stepping stone toward designing effective environmental regulations.