Spatial and Temporal Distribution of PM 2.5 Pollution over Northeastern Mexico: Application of MERRA-2 Reanalysis Datasets

: Aerosol and meteorological remote sensing data could be used to assess the distribution of urban and regional ﬁne particulate matter (PM 2.5 ), especially in locations where there are few or no ground-based observations, such as Latin America. The objective of this study is to evaluate the ability of Modern-Era Retrospective Analysis for Research and Application, version 2 (MERRA-2) aerosol components to represent PM 2.5 ground concentrations and to develop and validate an ensemble neural network (ENN) model that uses MERRA-2 aerosol and meteorology products to estimate the monthly average of PM 2.5 ground concentrations in the Monterrey Metropolitan Area (MMA), which is the main urban area in Northeastern Mexico (NEM). The project involves the application of the ENN model to a regional domain that includes not only the MMA but also other municipalities in NEM in the period from January 2010 to December 2014. Aerosol optical depth (AOD), temperature, relative humidity, dust PM 2.5 , sea salt PM 2.5 , black carbon (BC), organic carbon (OC), and sulfate (SO 42 − ) reanalysis data were identiﬁed as factors that signiﬁcantly inﬂuenced PM 2.5 concentrations. The ENN estimated a PM 2.5 monthly mean of 25.62 µ g m − 3 during the entire period. The results of the comparison between the ENN and ground measurements were as follows: correlation coe ﬃ cient R ~ 0.90; root mean square error = 1.81 µ g m − 3 ; mean absolute error = 1.31 µ g m − 3 . Overall, the PM 2.5 levels were higher in winter and spring. The highest PM 2.5 levels were located in the MMA, which is the major source of air pollution throughout this area. The estimated data indicated that PM 2.5 was not distributed uniformly throughout the region but varied both spatially and temporally. These results led to the conclusion that the magnitude of air pollution varies among seasons and regions, and it is correlated with meteorological factors. The methodology developed in this study could be used to identify new monitoring sites and address information gaps.


Introduction
Ambient particulate matter (PM) with aerodynamic diameter less than 2.5 µm (PM 2.5 ) is known to have adverse effects on visibility, ecosystems, and climate change [1]. High levels of airborne particles also present a significant health risk, and their effects on human health are well-documented in the The combination of ground-level concentrations and satellite retrievals into a robust model is not a straightforward task, but machine learning techniques have proven to be a valuable tool in this task. The use of machine learning techniques, including artificial neural networks (ANNs), for air quality applications has been explored for some time [24], and its feasibility has been demonstrated [25]. A neural network uses artificial neurons, which are the smallest units in data processing [26]. The types of ANN include the back-propagation neural network (BPNN) [27,28], multilayer perceptron (MLP) [29,30], radial basis function (RBF) [31,32], and adopted neuro-fuzzy inference systems (ANFIS) [33][34][35]. While most of the available work is aimed at long-term forecasting of criteria pollutants using meteorological variables and source emissions as predictors [36], the use of satellite-derived aerosol optical depth (AOD) variables or reanalysis data for training and testing ANNs is also being investigated with favorable results [25]. Still, the majority of satellite applications have not been tested in developing countries, typically having higher PM 2.5 levels and distinctive emission source profiles [36,37].
The objective of this study is to evaluate the ability of MERRA-2 aerosol products to represent seasonal variations in the PM 2.5 ground concentrations in the Monterrey Metropolitan Area (MMA) and in Northeastern Mexico. The paper's scope includes the development and validation of a neural network model that uses MERRA-2 products to analyze the spatial-temporal distribution of PM 2.5 concentrations over a wide domain that includes not only the MMA but also other municipalities in Northeastern Mexico. To the authors' best knowledge, this study constitutes one of the first applications of reanalysis data (i.e., MERRA-2 aerosol and meteorology products) combined with neural network model for the estimation of PM 2.5 in a Latin American urban area. It is also the first attempt to use MERRA-2 data to estimate PM 2.5 concentrations in the MMA and to analyze the spatial and temporal distributions of PM 2.5 pollution over NEM.

Data and Methods
An overview of the methodology is shown in Figure 1. In general, first, meteorological and aerosol data from ground monitoring stations and MERRA-2 datasets were collected, and spatially and temporally arranged into a new grid. Then, an ENN was developed and validated. The ENN relates ground-based and MERRA-2 data to estimate PM 2.5 within the Monterrey Metropolitan Area (MMA) for a period covering from January 2010 to December 2014. This stage allowed for the evaluation of the ability of MERRA-2 products to represent ground-based PM 2.5 measurements and the identification of the predictor variables to be incorporated to the ENN. Ten trained and cross-validated NNs were employed to construct an ensemble NN for the estimation of PM 2.5 monthly averages using the resulting dataset. Finally, a method that uses the neural networks developed in the first stage and the MERRA-2 reanalysis data was used to estimate PM 2.5 in a gridded domain covering the NEM. Based on these data, monthly pollution maps were generated to analyze the spatial and temporal distributions of PM 2.5 concentration fields. Statistical metrics-linear correlation coefficient R, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE)-were computed to evaluate model performance. Details are provided below.

Study Area
Located in Northeastern Mexico, the MMA covers 7,615 km 2 and is the third largest urban area in Mexico, with a population of 4.7 million people, distributed in 18 municipalities [38]. The MMA concentrates over 40% of the population in this region, and is the second largest industrial center in the country, having many facilities in several productive sectors (e.g., cement, glass, metallurgy, petrochemical, food processing, electric utilities, assembly, etc.) [39].
The average elevation of the urbanized area is around 550 m.a.s.l. although it is surrounded by mountains in three of the four cardinal directions with varying heights that reach 3,400 m.a.s.l., which alters the natural dispersion of airborne particles, thus limiting the air quality in this zone [40]. The distance from Monterrey to the nearest seashore is approximately 315 km. Specifically, this study  In this study, from January 2010 to December 2014, hourly PM2.5 concentrations were retrieved from the SIMA network at five urban air-monitoring sites (Southeast, Northeast, Downtown, Northwest, and Southwest). These sites were selected because they provided the long-term data required for this research. Also, they are the only sites that provide continuous and valid PM2.5 records during the study period. The PM2.5 instruments were based on the beta-ray attenuation method (BAM) and continuous weighing with microbalance, which provided automatic, continuous measurements and concentration recording.

PM2.5 and Meteorological Data from MERRA-2
Reanalysis combines model fields with observations distributed irregularly in space and time in a spatially complete gridded meteorological dataset based on an unchanging model [52]. Reanalysis products are obtained using a fixed data assimilation scheme and a global climate model that ingest all available observations every 6-12 h. The MERRA-2 is based mainly on the Goddard Earth Observing System Model, Version 5 (GEOS-5), which is a weather and climate capable model provided with atmospheric circulation and composition as well as oceanic and land components [17]. MERRA-2 is not only a meteorological reanalysis but also includes the assimilation of aerosol observations based on a version of the GEOS-5 model that is radiatively coupled to the Goddard chemistry, aerosol, radiation, and transport (GOCART) aerosol module. It includes the assimilation of bias-corrected AOD from both MODIS sensors and the Advanced Very High-Resolution Radiometer instruments, non-bias-corrected AOD from the Multiangle Imaging SpectroRadiometer (MISR) over bright surfaces, and AOD from the Aerosol Robotic Network (AERONET) [17,18,53].
MERRA-2 integrates observational data from earth observing satellites (EOS) and the GEOS-5 model to simulate the hydrological cycle of Earth [52] since the beginning of the satellite era [54]. The MERRA-2 system has many of the same basic features as the MERRA system, which were described The MMA frequently exceeds the national air quality standards for ozone (O 3 ), PM 10 , and PM 2.5 [41]. According to the 2016 National Emission Inventory [42], 9,507 tons of PM 2.5 were emitted in the MMA in 2016, 13% of which came from point sources, 2% from non-road mobile sources, 77% from on-road mobile sources, and 7% from area sources. Previous studies on PM 2.5 source apportionment based on a chemical characterization and the use of organic molecular markers have also been conducted in the MMA [43,44]. These studies found that anthropogenic sources were a major contribution to secondary PM 2.5 , where vehicle exhausts accounted for about 64% of the PM 2.5 . Cooking operations and biomass burning were also found to be important sources [45]. Other sources identified in the MMA were industrial activities, garbage burning, biogenic emissions, and resuspended dust [46].
In contrast to other Mexican cities that have improved their air quality, the average concentrations of PM 2.5 in the MMA increased by~5 µg m −3 between 2013 and 2015 [47]. In routine monitoring network, annual averages of PM 2.5 were found in the range of 20-34 µg m −3 , which exceeded the corresponding national air quality standards of 12 µg m −3 (annual average) [48]. Annual seasonal variations in the concentration of PM 2.5 were observed, in which the highest concentration was in winter, and the lowest concentration was in the first half of the fall season [49].

Ground-Based PM 2.5 and Meteorological Data
Since 1993, Nuevo León state authorities have operated a monitoring system for air quality. The Integrated Environmental Monitoring System (SIMA) currently consists of 13 monitoring stations located in the four cardinal directions and in the center of the city (Figure 2). Criteria pollutants (O 3 , SO 2 , NO x , CO, PM 2.5 and PM 10 ) and meteorological variables (i.e., temperature, relative humidity, wind speed, wind direction, pressure, precipitation, and solar radiation) are monitored continuously, and the data are summarized in hourly averages. Calibration, maintenance procedures, and quality assurance/quality control (QA/QC) follow the protocols established in the Mexican standards NOM-035-SEMARNAT-1993 [50] and NOM-156-SEMARNAT-2012 [51].
In this study, from January 2010 to December 2014, hourly PM 2.5 concentrations were retrieved from the SIMA network at five urban air-monitoring sites (Southeast, Northeast, Downtown, Northwest, and Southwest). These sites were selected because they provided the long-term data required for this research. Also, they are the only sites that provide continuous and valid PM 2.5 records during the study period. The PM 2.5 instruments were based on the beta-ray attenuation method (BAM) and continuous weighing with microbalance, which provided automatic, continuous measurements and concentration recording.

PM 2.5 and Meteorological Data from MERRA-2
Reanalysis combines model fields with observations distributed irregularly in space and time in a spatially complete gridded meteorological dataset based on an unchanging model [52]. Reanalysis products are obtained using a fixed data assimilation scheme and a global climate model that ingest all available observations every 6-12 h. The MERRA-2 is based mainly on the Goddard Earth Observing System Model, Version 5 (GEOS-5), which is a weather and climate capable model provided with atmospheric circulation and composition as well as oceanic and land components [17]. MERRA-2 is not only a meteorological reanalysis but also includes the assimilation of aerosol observations based on a version of the GEOS-5 model that is radiatively coupled to the Goddard chemistry, aerosol, radiation, and transport (GOCART) aerosol module. It includes the assimilation of bias-corrected AOD from both MODIS sensors and the Advanced Very High-Resolution Radiometer instruments, non-bias-corrected AOD from the Multiangle Imaging SpectroRadiometer (MISR) over bright surfaces, and AOD from the Aerosol Robotic Network (AERONET) [17,18,53].
MERRA-2 integrates observational data from earth observing satellites (EOS) and the GEOS-5 model to simulate the hydrological cycle of Earth [52] since the beginning of the satellite era [54]. The MERRA-2 system has many of the same basic features as the MERRA system, which were described by Rienecker et al. (2011) [52], but it includes several important updates to its physical parameterizations (e.g., the use of a cubed-gridded sphere, increased re-evaporation of frozen precipitation and cloud condensate, changes to the background gravity wave drag, and an improved relationship between ocean surface roughness and ocean surface stress) and analysis algorithms (e.g., the use of normalized pseudo relative humidity and updated background error statistics). The updates included numerous additional satellite observations both before and after the introduction of NOAA-18 in 2005 by the National Oceanic and Atmospheric Administration (NOAA), the use of version 2.1.3 of the Community Radiative Transfer Model (CRTM) of the assimilation of all satellite radiances, the bias correction of aircraft temperature observations, mass conservation and water balance, observation-corrected precipitation forcing, sea surface temperature (SST), and sea-ice concentration (SIC) boundary conditions based on high-resolution daily products [55].
In this study, air temperature, relative humidity, eastward wind component, and northward wind component were extracted from M2IMNPASM three-dimensional, monthly time-averaged, assimilated meteorological fields (instM_3d_asm_Np) at 850 hPa [56]. The surface wind speed was extracted from M2TMNXFLX two-dimensional, monthly time-averaged, surface flux diagnostics files (tavgM_2d_flx_Nx) [57]. The aerosol components were extracted from M2TMNXAER two-dimensional, monthly time-averaged, aerosol diagnostics files (tavgM_2d_aer_Nx) [58]. All scientific datasets (SDS) used in this research had a spatial resolution of 0.5 • x 0.625 • , as presented in Table 1.

Type
Name Long Name Units

Spatial and Temporal Collocation
MERRA-2 and ground-based PM 2.5 data were georeferenced to geographic latitude/longitude (WGS84) coordinates. Both datasets were collocated in space and time as follows: Temporal collocation was performed by monthly averaging the ground-based observations per monitoring site. Daily averages per site (including daily averaged PM 2.5 concentrations and meteorological data) were calculated according to Mexican standard NOM-025-SSA1-2014 [59], which requires a threshold of 75% data capture (i.e., 18 hourly records) in order to consider that data are valid and representative. Missing values were excluded from the analysis. When daily averages were available in at least four sites, a local daily averaged dataset (PM 2.5 and meteorological data) was estimated. These local-daily averages were then averaged in a monthly basis to obtain a local monthly averaged dataset.
For spatial collocation, a new local grid covering the MMA was created by using the centroids of the native MERRA-2 grid as vertexes (Figure 2), which increased data representation over the MMA. A unique value for each variable in each grid cell was calculated by averaging data values at each cell centroids of the native MERRA-2 grid. Then, the local ground-based monthly averages were assigned to the grid cell contained within the lower and upper bounds at 25.5 • Lat, −100 • Lon and 26 • Lat, −100.125 • Lon, respectively. For the regional application, a domain contained within the lower and upper bounds at 23 • Lat, −101.25 • Lon and 28 • Lat, −98.125 • Lon, respectively, with grid cells at 0.5 • × 0.625 • latitude by longitude spatial resolution, the same as the native MERRA-2 resolution, was defined.

Evaluation of MERRA-2's Ability to Represent Ground Data and Model Selection
AOD was found to have a strong positive relationship with PM 2.5 concentrations observed on the surface [60,61], but there is also evidence that the use of AOD as the only predictor of PM 2.5 is subject to large uncertainties [62]. Several variables, such as temperature, relative humidity, wind speed, wind direction, aerosol type, and height of the boundary layer (PBL), are the most commonly used as model covariates of AOD in estimating PM 2.5 [60]. In this study, MERRA-2 AOD data were compared to ground-based PM 2.5 measurements to determine its suitability as a proxy for PM 2.5 in the MMA. The ability of MERRA-2 products to represent meteorological variables that are routinely measured in the MMA was evaluated, and MERRA-2 PM 2.5 components were included to represent the aerosol type. Nitrates, which can be important components of the total PM 2.5 mass, are not reported by MERRA-2 aerosol products. However, because of AOD assimilation, MERRA-2 total AOD is not affected by the lack of nitrates.
Local monthly averaged PM 2.5 concentrations were compared to MERRA-2 total AOD using simple linear regression. Also, seasonal variations of PM 2.5 concentrations were analyzed using time series. One-way analysis of variance (ANOVA) was used to test differences for temperature and relative humidity between MERRA-2 meteorological SDSs and local monthly averaged meteorology; simple linear regressions were performed to test correlation, too. Wind-roses analysis was performed to evaluate the models' ability to replicate the predominant wind pattern in the MMA. Because PM 2.5 speciation measurements are not available on a routine and continuous basis in the MMA, MERRA-2 PM 2.5 components were compared to previous PM 2.5 speciation studies carried out in the area during specific monitoring campaigns. The variables with no statistical differences in the ANOVA test, with a significant correlation in the linear regression, or with similar temporal or seasonal behavior with respect to ground-based data were retained as predictors. In total, eight predictor variables were selected and PM 2.5 concentration was estimated as a function of AOD, temperature (T), relative humidity (RH), dust PM 2.5 (Dust), sea salt PM 2.5 (SS), black carbon (BC), organic carbon (OC), and sulfate (SO 4 2− ) concentrations (Equation (1)). (1)

Model Development
Several thousand multiple back-propagation MLP type ANN models were developed, trained, and validated using identical topology and neuron functions. Figure 3 depicts the ANN structure applied in this study for every individual NN, in which eight input factors were used, and one hidden layer led to one outcome. A sigmoid function was employed as the transfer function in the hidden layer, and the Levenberg-Marquardt (LM) algorithm [63] was employed in the training. Independent validation to evaluate the predictive performance of the trained NN was carried out using cross-validation [64] along the hold-out method [65]. Hold-out validation is the simplest form of cross-validation. Contrary to other methods, the split procedure is performed by separating the entire dataset, only once, into two independent groups, one for model development/calibration (training dataset) and the other for validation [66]. Then, the ensemble modeling approach was applied as a hybrid system to reduce the variance in predictions and to reduce the generalization error [67] produced by the hold-out method [65].
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 25 (outputs) from the best ten trained and cross-validated NNs were averaged to build an ensemble NN to estimate the local monthly averaged PM2.5 ground concentrations over the MMA. R, RMSE and MAE were used to evaluate the ENN performance, these metrics were computed for training and validation datasets in each NN, allowing consistent comparisons among them. The NNs with the best performance (highest R, and lowest RMSE and MAE) in the validation subset were selected to build an ENN in which the individual predictions of the constituent NNs were combined using a simple averaging technique. To avoid overfitting, an NN architecture was designed using the "simpler-structure principle" [68]. The early stopping method was then applied as the regularization technique [69]. According to the "simpler-structure principle," the optimal structure of a NN should be fairly simple, having several nodes in one hidden layer. In this study, the training was begun with the simplest structure (only one node in the hidden layer) and the local optimum was sought on the validation error curve. The procedure was performed again using a new node, and the resulting value was compared to the previous minimum value. This process was repeated until a structure having fewer nodes in the hidden layer was obtained [70]. The stopping criteria were determined by a predefined threshold number of training iterations, the cross-validation approach [64], and the reduction of root mean squared error (RMSE) in a validation data set [71].
Once the ENN were validated over the MMA, the NN models were used to estimate the PM2.5 over the NEM. The regional distribution of PM2.5 was estimated using only MERRA-2 data as input The NNs were developed using a database containing local monthly averaged PM 2.5 concentrations and all MERRA-2 predictor variables included in Equation (1) for a period of 58 months, which corresponded to SIMA's data availability. The entire database was randomly divided into training and validation datasets: 70% of data was used to train the neural networks, while the remaining set (30% of data) was arranged to test the models' performance. The PM 2.5 predictions (outputs) from the best ten trained and cross-validated NNs were averaged to build an ensemble NN to estimate the local monthly averaged PM 2.5 ground concentrations over the MMA. R, RMSE and MAE were used to evaluate the ENN performance, these metrics were computed for training and validation datasets in each NN, allowing consistent comparisons among them. The NNs with the best performance (highest R, and lowest RMSE and MAE) in the validation subset were selected to build an ENN in which the individual predictions of the constituent NNs were combined using a simple averaging technique.
To avoid overfitting, an NN architecture was designed using the "simpler-structure principle" [68]. The early stopping method was then applied as the regularization technique [69]. According to the "simpler-structure principle," the optimal structure of a NN should be fairly simple, having several nodes in one hidden layer. In this study, the training was begun with the simplest structure (only one node in the hidden layer) and the local optimum was sought on the validation error curve. The procedure was performed again using a new node, and the resulting value was compared to the previous minimum value. This process was repeated until a structure having fewer nodes in the hidden layer was obtained [70]. The stopping criteria were determined by a predefined threshold number of training iterations, the cross-validation approach [64], and the reduction of root mean squared error (RMSE) in a validation data set [71].
Once the ENN were validated over the MMA, the NN models were used to estimate the PM 2.5 over the NEM. The regional distribution of PM 2.5 was estimated using only MERRA-2 data as input into the NN. Monthly averaged PM 2.5 maps were generated for the regional gridded domain.

MERRA-2 and Ground-Based Data: Comparisons
In this study, a simple linear correlation between the local monthly averaged PM 2.5 concentrations and AOD data over the MMA was not statistically significant (p-value < 0.01). However, the PM 2.5 ground-based concentrations exhibited a marked seasonal pattern throughout the entire study period  [72]. The AOD and ground-based PM2.5 concentrations showed distinct seasonal differences: the ground PM2.5 values were higher in winter (≥ 30 μg m −3 ), while AOD was the highest in spring (≥ 0.20). These differences were due to the complex relationship between PM2.5 and AOD, which is influenced by aerosol type, chemical composition particle size, vertical concentration profile, and so on [73][74][75][76][77][78]. Smaller differences between PM2.5 and AOD occurred during the springtime. In spring, the lower relative humidity-compared to other seasons and relatively strong atmospheric dispersion-may have enhanced the correlation between AOD and PM2.5 [79].  The results of the ANOVAs revealed statistically significant differences (p-value < 0.01) when comparing the temperature and relative humidity SDS to monthly averaged ground-based data. The simple linear regression showed the ability of meteorological SDSs to represent the monthly average ground-based temperature (R 2 = 0.92) and relative humidity (R 2 = 0.60) data over the MMA (Figure  5a,b). The wind components from MERRA-2 were not able to represent the predominant wind The AOD and ground-based PM 2.5 concentrations showed distinct seasonal differences: the ground PM 2.5 values were higher in winter (≥30 µg m −3 ), while AOD was the highest in spring (≥0.20). These differences were due to the complex relationship between PM 2.5 and AOD, which is influenced by aerosol type, chemical composition particle size, vertical concentration profile, and so on [73][74][75][76][77][78]. Smaller differences between PM 2.5 and AOD occurred during the springtime. In spring, the lower relative humidity-compared to other seasons and relatively strong atmospheric dispersion-may have enhanced the correlation between AOD and PM 2.5 [79]. Similar meteorological conditions in the MMA were reported by Carrillo Torres et al. (2017) [80] and Mancilla et al. (2015) [43].
The results of the ANOVAs revealed statistically significant differences (p-value < 0.01) when comparing the temperature and relative humidity SDS to monthly averaged ground-based data. The simple linear regression showed the ability of meteorological SDSs to represent the monthly average ground-based temperature (R 2 = 0.92) and relative humidity (R 2 = 0.60) data over the MMA (Figure 5a,b). The wind components from MERRA-2 were not able to represent the predominant wind pattern (i.e., east-to-west direction) and the wind speeds observed in the MMA from 2010-2014, as shown in Figure 5c,d. Thus, the wind components from MERRA-2 were not incorporated into the NN model. According to the data extracted from MERRA-2, during most of the study period, the main components of PM2.5 were sulfates (33%) and OC (29%) (Figure 6  According to the data extracted from MERRA-2, during most of the study period, the main components of PM 2.5 were sulfates (33%) and OC (29%) (Figure 6 [48] revealed that anthropogenic sources were typically associated with high contributions of SO 4 2− , OM, and BC. Secondary inorganic aerosols in the form of ammonium nitrates that were not reported by MERRA-2 were found to be important compounds in the MMA [81]. these months, photochemical activity and transport were predominant over the MMA, promoting the aging of primary emissions into secondary organic aerosol (SOA), which constituted between 59 and 87% of total OC and 32-45% of PM2.5 in the area [43]. Moreover, despite the dissimilarity in spatial resolution, the MERRA-2 BC concentrations were consistently 7.5 times lower than the ground-based BC concentrations from January to April and from August to November. The exception was in the summer months when the MERRA-2 BC concentrations were approximately 13 times lower than those reported by Peralta et al. (2019) [85]. The highest monthly average dust concentrations (1.94-3.62 μg m -3 ) occurred in summer, while the highest monthly sea salt (SS) means were observed in spring (1.96 μg m -3 ), followed by summer (1.96 μg m -3 ). Because the MMA is not a local source of SS, it and dust are wind-dependent components. Therefore, the SS concentrations over the MMA can be attributed mainly to long-range transport. During spring and summer, wind speeds reach maximum values above 10 km h -1 , blowing primarily from the east (E) in the Gulf of Mexico region to the west (W) [86].   [82] suggested that lower solar radiation and lower relative humidity characteristics of winter might lead to the lower photochemical oxidation of SO 2 . This disparity might also be explained by the prevailing meteorological conditions in summer when stronger wind flows from the coastal zone (SE ≥ 80% and WS ≥ 10 km h −1 ) influence the atmospheric stability-mixing processes, and hence the concentration of air pollutants [83]. Based on the results of the present study, the further understanding of these dilution effects or "urban fluid mechanics" [84] is recommended, which would clarify the effects of SO 2 oxidation rates on the residence time of SO 2 and help determine whether dry deposition was relevant in the chemical conversion to SO 4 2− .
The highest MERRA-2 monthly average OC (>4 µg m −3 ) and BC (>0.4 µg m −3 ) concentration occurred from April-June, which was in accordance with findings by Peralta et al. (2019) [85]. During these months, photochemical activity and transport were predominant over the MMA, promoting the aging of primary emissions into secondary organic aerosol (SOA), which constituted between 59 and 87% of total OC and 32-45% of PM 2.5 in the area [43]. Moreover, despite the dissimilarity in spatial resolution, the MERRA-2 BC concentrations were consistently 7.5 times lower than the ground-based BC concentrations from January to April and from August to November. The exception was in the summer months when the MERRA-2 BC concentrations were approximately 13 times lower than those reported by Peralta et al. (2019) [85].
The highest monthly average dust concentrations (1.94-3.62 µg m −3 ) occurred in summer, while the highest monthly sea salt (SS) means were observed in spring (1.96 µg m −3 ), followed by summer (1.96 µg m −3 ). Because the MMA is not a local source of SS, it and dust are wind-dependent components. Therefore, the SS concentrations over the MMA can be attributed mainly to long-range transport. During spring and summer, wind speeds reach maximum values above 10 km h −1 , blowing primarily from the east (E) in the Gulf of Mexico region to the west (W) [86]. Table 2 shows the results of the comparison of the PM 2.5 component concentrations (in µg m −3 ) obtained from MERRA-2 with those reported in previous ground-based studies conducted in the MMA. It must be noted that although the timeline was the same, the temporal and spatial resolutions differed among the data sets. Typically, local monitoring campaigns are conducted to study air quality in specific locations and periods. Here, the components determined in local studies are reported as daily mean concentrations. Conversely, the MERRA-2 aerosol data used in this study were extracted and monthly averaged to represent the air quality over a grid box (Figure 2).  Despite the differences in temporal and spatial resolution, some similarities were observed. SO 4 2− derived by MERRA-2 was consistently~2 times lower than the SO 4 2− reported by local studies, which might suggest not only the contribution of local sources, but also that SO 4 2− could be spatially and temporally homogeneous over the MMA. Additionally, the OC/BC ratios of PM 2.5 derived by MERRA-2 ranged from 3.94 to 6.65, whereas those reported by previous ground-based local studies ranged from 2.38 to 5.40, indicating that MERRA-2 aerosol products are able to produce temporal OC/BC variations similar to those observed in the MMA; that is, seasonal variations in OC/BC could be captured by MERRA-2. OC/BC ratios exceeding 2 may reflect the combined contributions of charcoal combustion, motor-vehicle exhausts, and biomass burning sources, which are common in SOA formation [87], although they also indicate contributions from other primary sources with high OC/BC emission ratios [88].

Model Performance
The model performance of the ENN is presented in Figure 7. The ENN estimated a monthly PM 2.5 mean of 25.62 µg m −3 with the following metrics: R~0.90, RMSE = 1.81 µg m −3 and MAE = 1.31 µg m −3 . The measure of bias showed seasonal dependence, which was reported by other studies that used MERRA-2 aerosol data in other regions of the world (Table 3).  Table 3. Previous studies that used MERRA-2 aerosol data to estimate PM2.5 concentrations and compared them with monthly averaged ground-based concentrations or analyzed seasonal variations.

Study
Region and period* Model or Method* Seasonal performance     [91]. * To be compared in this study.
Ten trained and cross-validated NNs were used to construct the ensemble NN to calculate the monthly PM 2.5 . Figure 8 shows the statistical performance (model fitting) of each NN. In the training fittings, the R values ranged between 0.85 and 0.89, whereas in the cross-validation, the R values ranged between 0.80 and 0.88. The differences between the R values of the training fittings and the R values of cross-validation ranged between −0.02 and 0.05. The differences in the RMSE ranged from −0.60 to 0.16 µg m −3 , and the differences in the MAE ranged from -0.39 to 0.13 µg m −3 . These results suggested that the model was not over-fitted.
Remote Sens. 2020, 12, x FOR PEER REVIEW 15 of 25 Ten trained and cross-validated NNs were used to construct the ensemble NN to calculate the monthly PM2.5. Figure 8 shows the statistical performance (model fitting) of each NN. In the training fittings, the R values ranged between 0.85 and 0.89, whereas in the cross-validation, the R values ranged between 0.80 and 0.88. The differences between the R values of the training fittings and the R values of cross-validation ranged between −0.02 and 0.05. The differences in the RMSE ranged from −0.60 to 0.16 μg m −3 , and the differences in the MAE ranged from -0.39 to 0.13 μg m −3 . These results suggested that the model was not over-fitted.

Regional PM2.5 Distribution
Monthly averaged PM 2.5 maps of the entire domain were generated to analyze the regional distribution of PM 2.5 (Figure 9). The heterogeneity of PM 2.5 concentrations in both space and time was observed. Overall, the PM 2.5 distributions on the annual and seasonal time-scales exhibited a consistent spatial pattern, and the highest PM 2.5 levels were located in the MMA, which was the main source of air pollution within the domain. These results indicated that PM 2.5 was not evenly Remote Sens. 2020, 12, 2286 15 of 24 distributed around the MMA, perhaps because the mixed mountainous-and-valley topography acts as a physical barrier to natural wind circulation and complicates the dispersion of local emissions [82]. However, further research is needed to understand the relationship between air pollutants and urban morphology in the MMA.  The regional distribution of PM 2.5 also revealed seasonal variations. It was observed that in general, PM 2.5 levels were higher in winter and spring. Moreover, the month of May usually shows the largest concentrations (particularly the month of May 2011 and the month of May 2013) in the entire region, mainly caused by PM 2.5 emissions from biomass burning. In winter, concentrations increase due to the rise in emissions from burning fossil fuels, whether for heating [92], transportation, or cooking. However, meteorological conditions, especially those related to high pressure systems, thermal inversions, and low precipitation, also play an important role by inhibiting the dispersion and removal of pollutants [93]. Overall, the concentrations decrease from spring to summer and then increase from fall to winter. It could be concluded that regional air quality is worse in winter and slightly better in summer.

Discussion
Comparison of monthly PM 2.5 means (µg m −3 ) estimated by the ENN developed in this study with the monthly PM 2.5 means from MERRA-2 derived data and monthly PM 2.5 means recorded by SIMA are shown in Figure 10. PM 2.5 from MERRA-2 was calculated using Equation (2) following Buchard et al. (2016) [94], who multiplied the sulfate concentration by a factor of 1.375 because SO 4 is assumed to be primarily present in the form of neutralized ammonium sulfate ((NH 4 ) 2 SO 4 ). OC is multiplied by a factor of 1.4 to estimate particulate organic matter (POM) [95]. This factor varies both spatially and temporally with values between 1.2 and 2.6 [96].  PM 2.5 concentrations were estimated using Equation (2), which included only PM 2.5 components, were on average three times lower than those registered by SIMA. The differences between the observations and estimations were smaller in the spring and summer (by an average of 2.1 and 2.4 times, respectively) than in winter and fall (by an average of 4 and 3 times, respectively). Buchard et al. (2016) [94] evaluated MERRAero data (i.e., a version of aerosol products previously available before the release of MERRA-2 aerosol data) in the United States and found that the use of Equation (2) yielded closer-to-observation PM 2.5 estimations during the summer, while larger discrepancies were observed during the winter. In this research, the PM 2.5 estimations obtained by Equation (2) represented the peak of high concentrations observed in spring 2011, coincident with one of the worst years of drought in the northeastern region, according to the North American Drought Monitor of the NOAA. In comparing MERRA-2 reconstructed PM 2.5 with ground observations one has to acknowledge that the former represents the average over a large volume, while the latter is representative of a spatial coverage of a few kilometers. Nonetheless it is valuable to compare these values as similarities and biases can be identified.
The MMA and the eastern part of the regional domain -particularly, the northeaster grid cells over Texas-showed high PM 2. These correlation coefficients were found to be larger than expected, considering that PM 2.5 values reported at the monitoring sites were on-point measurements, whereas the estimations represented the average of a grid cell. The RMSE in the blind sites might indicate the potential issue of overestimation because the ENN model was trained and calibrated using only ground measurements from the MMA. The low R and high RMSE values obtained using Equation (2) over the MMA indicated that using only PM 2.5 components from MERRA-2 in a simple model is not enough to represent PM 2.5 concentrations over the area.
The PM 2.5 concentrations estimated by the ENN developed in this study showed the same temporal behavior as SIMA ground observations. These findings reinforce the usefulness of the methodology developed here. While the ENN was able to capture PM 2.5 concentrations observed at the Laredo and McAllen sites with moderate accuracy, research is needed to improve the performance of the NNs to estimate the PM 2.5 distribution in the entire region. Remote Sens. 2020, 12, x FOR PEER REVIEW 19 of 25

Conclusions
The ability of MERRA-2 products to represent ground-based PM2.5 measurements in the MMA was evaluated. Seasonal differences between MERRA2-AOD and ground-based PM2.5 concentrations were observed. MERRA-2 meteorological products also yielded relatively well-represented monthly averaged ground-based temperatures and relative humidity but the wind components from MERRA-2 were not able to represent the predominant wind pattern (east-to-west direction) or the wind speeds observed in the MMA. Despite the dissimilarity in spatial and temporal resolutions, temporal patterns from MERRA-2 OC, BC, dust, and SS were consistent with the ground-based trends observed in previous studies carried out in the MMA. However, some variations in sulfate trends were found in summer compared with the rest of the year (MERRA-2 reported its lowest values in summer in contrast to ground-based studies that reported large sulfate concentrations in summer). Further research is recommended to understand the dilution effect and the relationship between air pollutants and urban morphology in the MMA.
The training fittings and cross-validation differences indicated that the model was robust and not over-fitted. The measure of bias showed seasonal dependence (the best performance was in spring; the worst was in winter) as in other regions of the world. Over the entire domain, high concentration levels were observed in spring, which was probably due to biomass burning; however, further research is recommended. Moderate to low concentrations emerged in fall and summer, respectively. The ENN was also able to capture the PM2.5 concentration trends at the Laredo and McAllen sites in Texas. However, because the ENN model was trained and calibrated using only ground measurements from the MMA, the inclusion of all available ground data (e.g., Texan sites) is recommended to avoid overestimation and to enhance the calibration process and enable a better description of regional atmospheric dynamics.

Conclusions
The ability of MERRA-2 products to represent ground-based PM 2.5 measurements in the MMA was evaluated. Seasonal differences between MERRA2-AOD and ground-based PM 2.5 concentrations were observed. MERRA-2 meteorological products also yielded relatively well-represented monthly averaged ground-based temperatures and relative humidity but the wind components from MERRA-2 were not able to represent the predominant wind pattern (east-to-west direction) or the wind speeds observed in the MMA. Despite the dissimilarity in spatial and temporal resolutions, temporal patterns from MERRA-2 OC, BC, dust, and SS were consistent with the ground-based trends observed in previous studies carried out in the MMA. However, some variations in sulfate trends were found in summer compared with the rest of the year (MERRA-2 reported its lowest values in summer in contrast to ground-based studies that reported large sulfate concentrations in summer). Further research is recommended to understand the dilution effect and the relationship between air pollutants and urban morphology in the MMA.
The training fittings and cross-validation differences indicated that the model was robust and not over-fitted. The measure of bias showed seasonal dependence (the best performance was in spring; the worst was in winter) as in other regions of the world. Over the entire domain, high concentration levels were observed in spring, which was probably due to biomass burning; however, further research is recommended. Moderate to low concentrations emerged in fall and summer, respectively. The ENN was also able to capture the PM 2.5 concentration trends at the Laredo and McAllen sites in Texas. However, because the ENN model was trained and calibrated using only ground measurements from the MMA, the inclusion of all available ground data (e.g., Texan sites) is recommended to avoid overestimation and to enhance the calibration process and enable a better description of regional atmospheric dynamics.
The global coverage provided by the reanalysis datasets used in the methodology presented here could be easily replicated and used in the assessment of local and regional air quality applications, especially in regions where none or few PM 2.5 ground-based stations are located. This application could provide a better understanding of the regional PM 2.5 distribution and help improve current control strategies aimed at reducing PM 2.5 pollution in NEM. It is suggested that in another stage of the research, new monitoring campaigns be carried out over the MMA and other sites within the NEM to further validate and compare the results with satellite data (i.e., MODIS and MERRA-2) and to segregate regional databases by season to improve the analysis of seasonal dynamics at the regional scale.