Estimating the Daily NO 2 Concentration with High Spatial Resolution in the Beijing–Tianjin–Hebei Region Using an Ensemble Learning Model

: Nitrogen dioxide (NO 2 ) is an important pollutant related to human activities, which has short-term and long-term effects on human health. An ensemble learning model was constructed and applied to estimate daily NO 2 concentrations in the Beijing–Tianjin–Hebei region between 2010 and 2016. A variety of predictive variables included satellite-based troposphere NO 2 vertical column concentration, meteorology, elevation, gross domestic product (GDP), population, land-use variables, and road network. The ensemble learning model achieved two things: a 0.01 ◦ × 0.01 ◦ grid resolution and the estimation of historical data for the years 2010–2013. The ensemble model showed good performance, whereby the R 2 of tenfold cross-validation was 0.72 and the R 2 of test validation was 0.71. Meteorological hysteretic effects were incorporated into the model, where the one-day lagged boundary layer height contributed the most. The annual NO 2 estimation showed little change from 2010 to 2016. The seasonal NO 2 estimation from highest to lowest occurred in winter, autumn, spring, and summer. In the annual maps and seasonal maps, the NO 2 estimations in the northwest region were lower than those in the southeast region, and there was a heavily polluted band in the south of the Taihang Mountains. In coastal areas, the annual NO 2 estimations were higher than the NO 2 monitored values. The drawback of the model is underestimation at high values and overestimation at low values. This study indicates that the ensemble learning model has excellent performance in the simulation of NO 2 with high spatial and temporal resolution. Furthermore, the research framework in this study can be a generally applied for drawing implications for other regions, especially for other cities in China.


Introduction
As an important air pollutant closely related to human activities, nitrogen dioxide (NO 2 ) mainly derives from agricultural burning, traffic emissions, and industrial emissions [1][2][3]. Environmental epidemiological studies have shown that long-term and shortterm NO 2 exposure can lead to the exacerbation of respiratory and cardio-cerebrovascular diseases, as well as lung function damage, resulting in increased mortality and hospitalization rates [4][5][6][7]. Population exposure levels are usually obtained directly from NO 2 monitoring sites, but monitoring sites are sparse and unevenly distributed, and they cannot effectively characterize NO 2 concentration differences at fine scales. Meanwhile, the lack of historical monitoring data will limit the study of long-term health effects. Therefore, it is necessary to estimate long-term, full-coverage, and high-spatiotemporal-resolution NO 2 concentrations to provide basic data for environmental epidemiological studies on NO 2 exposure. The Beijing-Tianjin-Hebei region is the most serious NO 2 pollution region in China, with high population density and high NO 2 emissions [8]. Since the population has been exposed to high NO 2 levels for a long time, there is a growing need for accurate NO 2 estimation in the Beijing-Tianjin-Hebei region.
Various statistical methods have been used to construct models to calculate exposure to NO 2 concentrations, evolving from linear regression models to nonlinear machine learning models. The essence of linear regression models is that the predictor variables and the dependent variable are linearly related, including land-use regression models [9][10][11][12], linear mixed-effects models [13], and geographically weighted regression models [14,15]. Land-use regression models focus on the spatial variation of pollutants, and they have been applied at spatial scales from global to urban [16]. Land-use models usually have high spatial resolution and low temporal resolution, mostly presented in terms of months, seasons, and years. Linear mixed-effects models focus on the effects of influencing factors on pollutant concentrations in time and tend to have a high temporal resolution of mostly daily and low spatial resolution, mostly 10 km × 10 km. Geographically weighted regression models are similar to linear mixed-effects models. Both land use models and linear mixed-effects models need to reach trade-offs in temporal and spatial resolution, whereby it is difficult to achieve both. In high-spatiotemporal-resolution air pollutant simulation, linear models cannot adequately express the relationship between the predicted variables and the monitored values of pollutant. In recent years, machine learning models have been applied and performed well. The estimation of air pollutants is already available at a temporal resolution of days and spatial resolution of 1 km × 1 km [13,17]. Machine learning models consider that there is a nonlinear relationship between predictor variables and dependent variables, including random forest models [18][19][20], gradient boosting tree models [17,21], and neural network models [17]. Compared with neural network models, random forest models are simple in structure, require less computational resources, and have stronger interpretation abilities. Random forest models provide the importance of variables to select the model predictor variables. The random forest model is also less prone to overfitting and less sensitive to outliers than the gradient boosting tree model. Narrow land-use models only consider land-use types as predictor variables [22]. Broad land-use models also consider meteorological conditions, population density, elevation, road type and length, and coastal distance as predictor variables [23,24]. Some land-use models use satellite data as an important predictor variable [25][26][27]. Since its release in 2004, the tropospheric NO 2 vertical column concentration inversion data provided by the Ozone Monitoring Instrument (OMI) have been widely used due to their high spatiotemporal resolution (13 × 24 km, daily), which enhances the seasonal and monthly performance of land-use models. Predictive variables of typical machine learning models in daily high-spatial-resolution air pollutant simulation (fine particulate matter (PM 2.5 ), O 3 , NO 2 , SO 2 ) studies usually include satellite data, meteorology, population density, elevation, road type and length, and land-use type variables [18,[28][29][30][31][32][33][34]. Currently, there are still relatively few studies that built daily NO 2 models for high spatial simulations of 0.01 • × 0.01 • , and the maximum spatial resolution for daily NO 2 estimations is 0.1 • × 0.1 • in the Beijing-Tianjin-Hebei region [18].
The lack of historical data becomes a bottleneck problem in the long-term NO 2 simulation studies. The NO 2 monitoring network covering the whole country was only completed in China at the end of 2012. Long-term NO 2 exposure levels are usually estimated with NO 2 measurements over several years. In PM 2.5 simulations, Chen et al. used monitoring values between 2013 and 2016 to estimate the lacking historical data from 2010-2012 [34]. Building statistical models with NO 2 monitoring data over several years is a general way of bringing down annual variability and guaranteeing the accuracy of the estimation.
By constructing an ensemble learning model for 2010-2016 in the Beijing-Tianjin-Hebei region, this study aimed to (1) estimate the daily NO 2 concentrations with a high spatial resolution of 0.01 • × 0.01 • , (2) incorporate satellite data, meteorological hysteretic effects, and the influence of human activities into model, (3) simulate the historical NO 2 measurements from 2010 to 2012, and (4) describe the NO 2 maps for 2010-2016. This study fills a gap by providing NO 2 estimation at high spatial and temporal resolution in the Beijing-Tianjin-Hebei region, and it provides the necessary data for an environmental epidemiological study of the short-term and long-term health effects of NO 2 exposure and air quality management on coordinated inter-regional prevention and control.

Study Area
The study area (113.45 • -119.85 • east (E), 36.03 • -42.62 • north (N)) was the Beijing-Tianjin-Hebei region in eastern China, located in the northern part of the North China Plain, with the Yanshan Mountains to the north, the Taihang Mountains to the west, and the Bohai Bay to the east. The topography is complex and diverse, with high altitudes in the northwest and low altitudes in the southeast. The Beijing-Tianjin-Hebei region includes Beijing, Tianjin, and 11 prefecture-level cities in Hebei Province: Hengshui, Xingtai, Handan, Qinhuangdao, Tangshan, Langfang, Baoding, Shijiazhuang, Cangzhou, Zhangjiakou, and Chengde. (Figure 1) With a high population (1.13 × 10 8 people) and large area (2.18 × 10 5 km 2 ), the Beijing-Tianjin-Hebei study area has a developed transportation network and advanced economy; therefore, the NO 2 levels are more than anywhere else across China. To eliminate the boundary effect, the study regional boundary was extended by 0.

Ground-Based NO 2 Measurements
The ground-based daily NO 2 measurements were collected from the China Environmental Monitoring Center (http://www.cnemc.cn/(accessed on 14 February 2021)) from 2010-2016. The study area contained a total of 156 state-controlled monitoring stations. Chemiluminescence was applied to measure the ground NO 2 concentration for the monitoring sites. The mean values of the NO 2 measurements for 2013-2016 at all sites are shown in Figure 1. The annual means and the descriptive statistical analysis of the NO 2 monitoring values between 2013 and 2016 at all sites are shown in Figure S2 and Table S1 (Supplementary Materials).

Satellite Data
The OMI (Ozone Monitoring Instrument) on board the Aura satellite is a trace gas detector that can be used to accurately measure NO 2 . OMI provides daily global data at a local time of around 1:40 p.m. (https://search.earthdata.nasa.gov/ (accessed on 14 February 2021)). In this study, OMI tropospheric NO 2 vertical column concentration level-3 global gridded data (OMI-NO 2 ) from 1 January 2010 to 31 December 2016 were used [35]. The level-3 products are gridded OMI level-2 data averaged over a normalized grid (0.25 • × 0.25 • ), mainly after the quality screening criteria of topographic reflectivity <30%, 0 ≤ solar zenith angle ≤ 85 • , and a few row anomalies. OMI-NO 2 data have missing values mainly due to the influence of clouds or high reflections from the ground. The overall absence rate was 34.46%, with the highest rate in winter (39.44%) and the lowest rate in summer (29.86%). Decision tree models were built to fill the missing values of OMI-NO 2 and the performance of decision tree models is shown in Table S2 (Supplementary Materials).

Meteorological Data
The meteorological data for 2010-2016 were obtained from the ERA-interim (European Centre for Medium-Range Weather Forecasts Re-Analysis-Interim) reanalysis data of the mesoscale weather forecasts from the European Center (http://apps.ecmwf.int/datasets/ data/interim-full-daily/levtype=sfc/ (accessed on 14 February 2021)). The spatial and temporal resolution of the daily near-surface meteorological data is 0.125 • × 0.125 • and 6/12 h. A total of 15 meteorological parameters are available, namely, atmospheric boundary layer height, low cloud cover, surface pressure, 2 m temperature, medium cloud cover, 2 m dew-point temperature, 10 m u wind, 10 m v wind, high cloud cover, downward surface solar radiation, surface evaporation, total precipitation, albedo, snowfall, and snow depth. The meteorological hysteresis effects were incorporated into model construction, whereas the 1 day and the 2 day lags of the meteorological data were also included in the model as important predictor variables.

Elevation Data, Population Data, and Gross Domestic Product (GDP) Data
The elevation, national population, and GDP data were collected from the Environmental Resource Science Data Center, the Chinese Academy of Sciences (RESDC) (http://www.resdc.cn (accessed on 14 February 2021)), in 2010, with a spatial resolution of 1 km × 1 km.

Land-Use Data
Land-use data were obtained from the European Space Agency's Climate Change Initiative website for 2010-2016 (http://maps.elie.ucl.ac.be/CCI/viewer/ (accessed on 14 February 2021)). The spatial resolution of annual land-use data is 300 m. Data were gained from the 300 m Medium-Resolution Imaging Spectroradiometer (MERIS), reduced resolution products (RR), and full resolution products (FR), as well as the French Center for Space Research's Earth Observation System, sensors AVHRR (Advanced Very High Resolution Radiometer) and PROBA-V (Airborne Autonomous Vegetation Program) mounted on NOAA's (National Oceanic and Atmospheric Administration) meteorological satellites.
The extracted land-use data can be divided into three types, namely, farmland coverage, natural vegetation coverage, and urban coverage.

Road Network Data
All road network data were obtained from the Resource and Environmental Science Data Center, the Chinese Academy of Sciences (RESDC) (http://www.resdc.cn (accessed on 14 February 2021)), which utilized the latest navigation map information. The road network consisted of national roads, provincial roads, county roads, township roads, highways, railroads, other vehicular roads, and footpaths. This study used data on country roads, county roads, provincial highways, national highways, railways, highways, and road density as road variables.
The classification, spatial resolution, temporal resolution, study time, and source of the dataset are shown in Table S3 (Supplementary Materials).

Data Processing
The whole study area extended by 0.5 • was divided into a standardized grid with a spatial resolution of 0.01 • × 0.01 • , for a total of 560,180 grid cells. The datasets were resampled to the standardized grids by using the bilinear interpolation method. When calculating land-use coverage, a 0.03 • × 0.03 • grid was used as a buffer for the specified grid to calculate the land-use coverage of a certain type, and the proportion of that type of data was counted. The vector road data length was sliced through the standardized grid, and the final data were extracted to the standardized grid. In light of the latitude values and longitude values corresponding to each point, the data with ground-based NO 2 measurements were extracted as the dataset required for model establishment. Data standardization was adopted in the data preprocessing to make data dimensionless that could be expressed as follows: where X*, X, µ, and σ are the standardized value, the raw value, the mean value, and the standard deviation of the variables on the grid cell. ArcGIS and Python were used for data preprocessing and graphical plotting.

Modeling
Random forest, as a representative algorithm of the bagging method in the ensemble learning approach, is composed of several different and mutually independent decision trees [36]. The principle of ensemble learning methods is to combine weak evaluators into strong evaluators; therefore, ensemble learning methods perform better than a weak evaluator. For example, random forest usually performs better than a single decision tree. A decision tree is a forked structure where each internal node represents an attribute judgment, each branch represents the output of a judgment result, and each leaf node represents a classification result. Random forest generates a new set of training samples by repeatedly sampling k samples at random from the original training sample set N in a putback manner through the self-service method resampling technique, and then generates k regression trees as a function of the self-service sample set. The results of the prediction data are determined by the average of different regression trees. The ground-level NO 2 concentrations were simulated by building an ensemble learning model using the random forest method with the following expressions: where NO 2ij is the ground NO 2 concentration on grid cell j and day i, OMI-NO 2 , MET, POP, ELE, ROD, and LU are the OMI-NO 2 , meteorological variables, population, GDP, elevation, road length and density, and the coverage rate of land-use types on grid cell j, and MET_lag1 and MET_lag2 are the 1 day lag and 2 day lag of the meteorological variables. The maximum depth (the depth of the tree) and n_ estimators (the number of trees) were two important parameters for model adjustment. During model adjustments, the model had high prediction accuracy when the maximum depth and n_ estimators were set to 50 and 200. The error rate of the model was calculated by predicting the out-of-bag samples. An ensemble learning model was developed to estimate NO 2 concentrations in the whole study area for 2010-2016, including the historical years of 2010-2012 without NO 2 measurements. The scikit-learn package in Python was used to construct the model.

Validation
In this study, the tenfold cross-validation method and test set validation method were used to evaluate the model's performance. The entire dataset was randomly divided into a training set and a test set with a ratio of 9:1. Tenfold cross-validation method involves dividing the data in the training set into 10 randomly equal parts, of which nine parts are used for the training of the model and the remaining part is used for the validation of the model; this process is performed 10 times, and the final validation result of the model is averaged over these 10 times. Test set validation is the validation of the model results when the whole training set is used for the training of the model. The indicators of model evaluation included the R 2 value, mean absolute prediction error (MAE), and root-meansquare prediction error (RMSE). The R 2 value was applied to the tenfold cross-validation (cv-R 2 ) and the test validation (test-R 2 ), whereas the MAE and RMSE were only included in the test set evaluation (test-MAE and test-RMSE).

Descriptive Analysis
A total of 59 predictor variables and 160,371 modeled data were available in the NO 2 model construction. The mean value and standard deviation value of NO 2 concentrations at the monitoring stations were 46.80 µg/m 3 and 26.91 µg/m 3 , respectively, which were significantly higher than 40 µg/m 3 . The annual mean guideline of the World Health Organization (WHO) and China's GB 3095-2012 for NO 2 is no more than 40 µg/m 3 [37,38]; therefore, the study area was in a perennial pollution state between 2013 and 2016. The mean value of urban coverage was 81.3%, which meant that most of the locations of the currently collected monitoring data were concentrated in urban areas. The standard deviation value and mean value of OMI-NO 2 were 23.44 × 10 15 molecules/cm 2 and 8.30 × 10 15 molecules/cm 2 . The descriptive statistical analysis of these predictor variables is shown in Table S4 (Supplementary Materials). The variables such as GDP, population, and elevation are visualized in Figure S1 (Supplementary Materials).

Importance Percentage of Predictor Variables
The importance percentages of all predictor variables for the NO 2 model are shown in Table S5 (Supplementary Materials). The summation of all variables was 1. The meteorological lag factor had a very high contribution to the importance of the variables, with the top five being all meteorological variables with 1 day lag and 2 day lag. The 1 day lagged atmospheric boundary layer height was the most important variable, contributing 29.9%. Furthermore, GDP, elevation, OMI-NO 2 , and population density contributed 1.6%, 1.6%, 1.5%, and 1.3%, respectively. However, the contributions of the various types of road data and land-use variables were less than 1%.

Spatial Performance
The performance of the model in different provinces (Hebei), municipalities (Beijing and Tianjin), and prefecture-level cities of Hebei is statistically shown in Table 1. According to the evaluation index test-R 2 , the model had the highest performance in Beijing (0.82), followed by Tianjin (0.77) and finally Hebei (0.72). According to the evaluation indices test-RMSE and test-MAE, the model had the lowest performance in Beijing (11.65 µg/m 3 and 8.45 µg/m 3 ), followed by Tianjin (11.91 µg/m 3 and 8.76 µg/m 3 ) and finally Hebei (15.14 µg/m 3 and 10.33 µg/m 3 ). Therefore, the overall performance of the model was best in Beijing, followed by Tianjin and finally Hebei. In the validation results of 11 prefecturelevel cities, the model simulation results were the best in Xingtai, Hengshui, Shijiazhuang, and Tangshan, with a test-R 2 higher than 0.7. However, the model performed worst in Handan, Cangzhou, and Qinhuangdao, with a test-R 2 lower than 0.5. The model performed better in Beijing and Tianjin than in Hebei because the distribution density of monitoring stations in Beijing and Tianjin is much higher than that in Hebei. Due to the large number of monitoring stations in Beijing, Tianjin, Tangshan, Baoding, Shijiazhuang, Hengshui, and Xingtai, the model performed well in these places. Qinhuangdao and Cangzhou are far from these cities; thus, the model performed poorly in these two cities.

Spatial and Temporal Distribution
The statistical analysis results based on model simulation are shown in Table S6 (Supplementary Materials). The daily NO 2 estimation achieved the spatial resolution of 0.01 • × 0.01 • . Between 2010 and 2016, the annual mean NO 2 estimation ranged from 46.72 µg/m 3 to 47.63 µg/m 3 in the Beijing-Tianjin-Hebei region. Therefore, the annual mean NO 2 estimation from 2010 to 2016 did not vary much and was generally consistent with the monitored values. Figure 4 shows that the NO 2 estimations of the southeast part of the study region were higher than those in the northwest part. There was a heavily polluted band to the south of the Taihang Mountains, where annual NO 2 estimations were higher than 55µg/m 3 , including Beijing, Shijiazhuang, Xingtai, and Handan. High NO 2 estimations also appeared in the coastal areas of Tianjin and Qinhuangdao. The annual NO 2 estimated maps show similarity in spatial distribution to the annual NO 2 monitored maps ( Figure S2, Supplementary Materials).  Figure 5 shows the seasonal distribution of NO2 estimations from 2010 to 2016. In any season, the NO2 estimations in the northwest region were lower than those in the southeast region, and there was a heavily polluted band to the south of the Taihang Mountains, where NO2 estimations were higher than 55 μg/m 3 . There was a similar spatial distribution between the seasonal NO2 estimations and the annual NO2 estimations. In the southeast region, the heavily polluted bands from large to small occurred in winter, autumn, spring, and summer.   Figure 5 shows the seasonal distribution of NO 2 estimations from 2010 to 2016. In any season, the NO 2 estimations in the northwest region were lower than those in the southeast region, and there was a heavily polluted band to the south of the Taihang Mountains, where NO 2 estimations were higher than 55 µg/m 3 . There was a similar spatial distribution between the seasonal NO 2 estimations and the annual NO 2 estimations. In the southeast region, the heavily polluted bands from large to small occurred in winter, autumn, spring, and summer. The winter mean NO 2

Discussion
An ensemble learning model was constructed and applied to estimate daily NO 2 concentrations in the Beijing-Tianjin-Hebei region. This study achieved a spatial resolution of 0.01 • × 0.01 • (~1 km × 1 km) in the study area. The ensemble learning model achieved pretty good performance considering that cv-R 2 and test-R 2 were 0.72 and 0.71, respectively. Meteorological lag variables made great contributions to the model. Taking NO 2 ground monitoring values from 2013 to 2016 as the dependent variable, a pervasive model was constructed to simulate the historical NO 2 measurements between 2010 and 2012.

Comparsion with other NO 2 Models
This study achieved the highest spatial resolution of 0.01 • × 0.01 • (~1 km × 1 km) in the Beijing-Tianjin-Hebei region, and the model performed better than those presented in the Zhan et al. study [18] and Qin et al. studies [14,19] [19]. The NO 2 simulations at high spatial resolution showed a greater range of variation and higher values of RMSE than those studied at low spatial resolution. Similar to the study by Zhan et al. [18], this study was simulated for historical missing data from 2010 to 2012, which is important for long-term health risk studies of NO 2 exposure. Studies based on longer-scale monitoring data would accommodate more data and build more generalized models than studies using only short-term monitoring data to predict historical data. Compared with other NO 2 simulation studies at the spatial resolution of 0.01 • × 0.01 • , our model performed better than the linear mixed-effects model (R 2 = 0.56-0.64) used by de Hoogh et al. in their study across Switzerland [13], but worse than the random forest model (cv-R 2 = 0.79), gradient boosting tree (cv-R 2 = 0.75), neural network model (cv-R 2 = 0.76), and their superposition (cv-R 2 = 0.79) used by Di et al. in their study in the United States [17]. Among the applications of machine learning models, ensemble learning models were used more than other models, especially the random forest model. According to the generation method of the base learner, ensemble learning can be divided into two categories. One is the bagging method, which exploits the independence of the base learners, of which the outstanding representative is the random forest, and the other is the boosting method, which exploits the relevance of the base learners, of which an excellent representative is xgboost [40]. In this study, these two models were compared, where the xgboost model performed with cv-R 2 = 0.70, test-R 2 = 0.64, and RMSE = 16.13 µg/m 3 . The xgboost model was much less computationally intensive compared to the random forest model, but easily overfitted. Compared to neural network models, random forest models were less computationally intensive, simple in structure, and easy to tune and use. The ensemble learning models including random forest and xgboost provided the importance share of all predictor variables.

Predicator Variable Selection
There are three factors that affect NO 2 concentration in the atmosphere: meteorology, topography, and emissions. In terms of meteorology, 15 possibly related meteorological variables were selected. Elevation was chosen as a representation of the terrain. Emission sources were divided into anthropogenic sources and natural sources, including land-use cover types, population, GDP, and road network. Land-use cover types have a significant impact on NO 2 emissions [41]. Li et al. found that land-use cover types were important factors influencing the spatial pattern of NOx emissions in the Beijing-Tianjin-Hebei region except for Beijing [42]. In addition to land cover, there are other influencing factors in Beijing, such as industrial structure and pollution control level. As a result, GDP and population were also selected [43][44][45][46]. The amount of traffic emissions was represented by the type and density of the road network.
The data about the meteorological hysteresis effect were considered as important variables of the model and improved the performance of the model by~0.02 in terms of cv-R 2 and test-R 2 . Meteorological conditions affected the residual layer, which stored some NO 2 and released it later. Therefore, this study used meteorological lagged data to represent the contribution of the residual layer. In the Zhao et al. study, meteorological lag conditions were incorporated into the model construction to estimate PM 2.5 [34].
We found that the topography was also important, which may be due to the complex and variable topography in the Beijing-Tianjin-Hebei region. This model used variables to characterize emissions from human and natural sources, such as population, GDP, land-use types, and road types [43][44][45][46]. Among them, population and GDP had relatively important contributions to the model, but land-use variables and road variables were not as important as we might expect, mainly because the road data did not provide a good portrayal of the daily variation in traffic flow.
The contribution of OMI-NO 2 to the model was 0.015; however, in the Qin et al. study [19] and Zhan et al. study [18], the OMI-NO 2 contribution ranked first by much more than other variables. In the Di et al. study [17], the contribution of OMI-NO 2 was lower than 0.001. Therefore, coarse OMI-NO 2 (0.25 • × 0.25 • ) improved the model performance under a spatial resolution of 0.01 • × 0.01 • , but provided limited promotion. The emission inventories associated with NO 2 were included as predictor variables in the Qin et al. and Zhan et al. studies [18,19], but their contributions to the model were not significant. Furthermore, the original spatiotemporal resolution of the data was 0.25 • × 0.25 • and monthly; thus, the emission inventories were not included in this study. This study did not include year, season, region, latitude, and longitude as predictor variables in the model, and all predictor variables except satellite data were considered as causes affecting pollutants, which were beneficial to model interpretation. We did not exclude variables of low importance, mainly because of the potential contribution of low-importance variables in predicting long-term historical data. In addition, retaining all variables did not significantly increase the difficulty and speed of model training.

Spatial and Temporal Distribution
The NO 2 maps showed significant seasonal and regional differences. The NO 2 estimation was significantly higher in autumn and winter than in spring and summer. The favorable meteorological conditions mainly resulted in low NO 2 concentrations in summer. The prevailing southwest winds brought abundant rainfall in summer, which was conducive to the wet deposition of NO 2 pollutants. At the same time, high temperature and prevailing winds significantly accelerated the chemical reaction of NO 2 and increased the diffusion of pollutants. However, the static and stable weather conditions in winter weakened the atmospheric diffusion, and the low temperature led to fewer chemical reactions. In addition, the enhanced coal consumption and motor vehicle exhaust also led to the serious NO 2 pollution in winter. The area of the ultra-high NO 2 pollution zone (>55 µg/m 3 ) also shrank from spring to summer, then expanded in autumn, and reached its peak in winter.
Regionally, the southeast region's NO 2 estimations were higher than the northwest region's NO 2 estimations, and the difference was more obvious in winter. The reason is that the southeast is plain with flat topography, dense population, and high transportation emissions and industrial emissions, while the northwest is mountainous, with high topography, low population, and an underdeveloped industrial and transportation network. Combining the seasons and regions, in the northwest, the NO 2 concentrations were slightly higher in autumn than in winter, while those in the southeast were opposite, which may be due to the different emission intensity of the pollution sources. Straw burning was an important factor for the high NO 2 concentrations in autumn. In the Zhang et al. study [46], they used TEMIS (Tropospheric Emission Monitoring Internet Service) satellite retrieval data to describe annual NO 2 maps from 2010 to 2016, and two heavily polluted central zones were mentioned: one was Shijiazhuang, Xingtai, and Handan; the other was Beijing, Tianjin, and Tangshan. In our study, the two pollution zones were often connected in winter; however, in the annual simulation results, Tianjin and Tangshan were not reflected. From the NO 2 measurements in the northwest, we found that differences in autumn and winter were narrow for 2013-2016 and NO 2 concentrations in winter were higher than those in autumn in 2016, which might be due to the implementation of the prohibition of straw burning, the development of the industry, and the increase in the number of cars. Although we did not consider the variables related to inland and sea differences, such as coastal distance, as impact factors of the model, the model showed some differences. From the annual maps, the NO 2 concentrations in the coastal areas, especially around Qinhuangdao and Tangshan, showed high values, similar to the monitoring stations.

Advantages and Limitations
This study had several advantages. Firstly, all data for the research are publicly available. All data are free to download except for NO 2 monitoring site data. GDP, population, elevation, and road network data covered the entire Chinese region, making it convenient for research to be transferred to other Chinese cities. OMI-NO 2 and land-use cover data are globally available and easy to use for research in other regions and countries outside of China. Secondly, the random forest machine learning method used in this study has many advantages: (1) the principle is not complicated and easy to understand; (2) the sklearn library provides the relevant packages for random forest; (3) the random forest model only needs to adjust two important parameters, n_estimators and max_depth, to achieve excellent performance; (4) the random forest model requires a small amount of computation. Lastly, the ensemble model achieved two things: (1) a 0.01 • × 0.01 • grid resolution; (2) estimation of the historical data for the years 2010-2013.
On the other hand, this study also had some limitations. Firstly, one of the inherent drawbacks of the model is underestimation at high values and overestimation at low values. These results are shown in Table S8 (Supplementary Materials). The problem of overestimation in this study is more serious than that seen in PM 2.5 studies. Compared with PM 2.5 studies [34], NO 2 changes faster due to its short life, and the overall range of change is not as large as for PM 2.5 . In addition, especially when NO 2 is at a low level, photochemical reactions have a great impact on NO 2 . NO 2 simulations are more difficult due to these reasons. Secondly, the simulation of the model in coastal areas was not accurate due to the differences in meteorological conditions and emission sources between inland and coastal areas. Thirdly, seasonally, the NO 2 estimations in spring and summer were higher than those in autumn and winter, in contrast the monitored values, which may be related to the differences in meteorological conditions and emission sources between inland and coastal areas.
The model performed best in the winter and Beijing. However, the model performed worst in the summer and Hebei. The model performed well in Beijing, Tianjin, Tangshan, Baoding, Shijiazhuang, Hengshui, and Xingtai. However, the model performed poorly in Qinhuangdao and Cangzhou. These differences were due to the difference in the number of training samples in different areas. The monitoring sites are sparse and unevenly distributed, thus influencing the model's performance in different areas. The model performed worst in summer because the number of training samples with low NO 2 monitoring values was lower.

Conclusions
In the study, OMI-NO 2 data, meteorological data, elevation data, national population, GDP data, land-use variables, and road network data were used to establish a well-performing ensemble learning model in the Beijing-Tianjin-Hebei region between 2010 and 2016. The ensemble learning model was then applied to estimate daily NO 2 concentrations with the spatial resolution of 0.01 • × 0.01 • . The long-term NO 2 concentrations with high spatial and temporal resolution provide fundamental data for environmental epidemiological studies. In the future, it is necessary to expand the study area from regional to national for NO 2 simulation studies. At the same time, it is important to include more predictive variables related to NO 2 and adopt predictive variables with higher spatial and temporal resolution, such as meteorological data and satellite data. Since July 2018, the Copernicus Sentinel-5P product onboard the Tropospheric Monitoring Instrument (TROPOMI) has been available with a spatial resolution of 3.5 km × 7 km, thus improving satellite data. ERA-5 provides hourly meteorological reanalysis data with a spatial resolution of 0.1 • × 0.1 • . Anthropogenic activities and meteorological conditions in coastal areas are not consistent with those in inland areas, which require further study. With the rise of distributed computing, ensemble learning methods will be further applied at a larger spatial scale with high spatial and temporal resolution.
Supplementary Materials: The following are available online at https://www.mdpi.com/2072-429 2/13/4/758/s1: Figure S1. The distribution of some parameter data in the Beijing-Tianjin-Hebei region; Figure S2. Study region annual concentrations of ground-level measured NO 2 (µg/m 3 ) for 2013-2016; Figure S3. Study region seasonal mean concentrations of ground-level measured NO 2 (µg/m 3 ) for 2013 to 2016; Table S1. The descriptive statistical analysis of ground NO 2 monitoring stations in the study area; Table S2. Average annual performance and the performance of the model from 2010 to 2016; Table S3. Model variable classification, spatial resolution, temporal resolution, time frame, and source; Table S4. Descriptive statistics of modeling data for all independent variables for 2013-2016 (N = 160,371); Table S5. The feature importance determined by the model for the features in the study area; Table S6. Descriptive statistics of daily NO 2 concentration estimations in the Beijing-Tianjin-Hebei region in 2010-2016; Table S7. Descriptive statistical analysis of the seasonal mean and standard deviation of the spatial NO 2 measurements and NO 2 estimations in the southeastern (Beijing, Tangshan, Baoding, Langfang, Tianjin, Cangzhou, Shijiazhuang, Hengshui, Xingtai, Handan, and Qinhuangdao) and northwestern (Zhangjiakou and Chengde) parts of Beijing, Tianjin, and Hebei from 2013-2016; Table S8. Model performance with different NO 2 concentrations using test set validation; Model S1. Decision tree models to fill the missing values of OMI-NO 2 .