Land-Use Regression Modeling to Estimate NO2 and VOC Concentrations in Pohang City, South Korea

Land-use regression (LUR) has emerged as a promising air pollution modeling technique for obtaining the spatial distribution of air pollutants for epidemiological studies. LUR uses traffic, geographic, and monitoring data to develop regression models and then predicts air pollutant concentrations in the same area. To identify the spatial distribution of nitrogen dioxide (NO2), benzene, toluene, and m-p-xylene, we developed LUR models in Pohang City, one of the largest industrialized areas in Korea. Passive sampling was conducted at 50 locations during two 2-week integrated sampling periods, in September 2010 and March 2011. For LUR model development, predictor variables were calculated from land use, road lengths, point sources, satellite remote sensing, and population density. The mean concentrations of NO2, benzene, toluene, and m-p-xylene, averaged over the two sessions, were 28.4 µg/m³, 2.40 µg/m³, 15.36 µg/m³, and 0.21 µg/m³, respectively. The NO2 model included four independent variables and had a model R² of 0.65. The benzene and m-p-xylene models had the same R² (0.43), whereas the toluene model had a lower R² (0.35). We estimated long-term concentrations of NO2 and VOCs at 167,057 addresses in Pohang. Our models could hold particular promise in epidemiological settings where significant health effects are associated with small-area variations in exposure, and they should encourage further LUR studies in Asia.


Introduction
Land-use regression (LUR) modeling is an accepted exposure assessment methodology, supported by numerous published models at local, country, and continental scales, for predicting the spatial distribution of ambient air pollutants and assessing long-term exposure to traffic-generated air pollution [1]. Regression methods are used to model pollutant concentrations measured at given sites on the basis of variables that characterize their surrounding land use, population density, and traffic patterns, as described elsewhere [2][3][4].
The development of geographic information systems (GIS) has made it possible to predict pollutant concentrations on a fine spatial scale. Because GIS can increasingly provide land-use data, LUR modeling is widely used in community health studies to capture small-scale variability [5].
Using geographic variables such as traffic intensity, proximity to roadways, commercial and government areas, industrial areas and point sources, population, and housing density as independent variables, a statistical model regresses the pollutant concentrations measured at sampling locations (the dependent variable). Air pollution levels are then predicted for any location, such as individual homes, from the parameter estimates of the regression model. Such models have mainly addressed nitrogen dioxide (NO2) and particulate matter (PM10 and PM2.5) [4,6–12].
LUR has also been applied in Asian countries using regulatory monitoring and measured data, in Japan [19], China [12], and Mongolia [20].
Traffic patterns and land use in Pohang, Korea, are highly complex because the world's third-largest steelmaker is located in the city. Estimating the concentrations of air pollutants such as NO2 and VOCs with LUR modeling is therefore needed for epidemiologic studies in Pohang.
This study, based on measured data, could contribute to wider use of LUR models in Korea and other Asian countries. The objectives of this research are to (1) provide the concentrations of NO2 and VOCs using long-term sampling methods, (2) predict the spatial distributions of NO2 and VOCs in the study area with LUR modeling, and (3) identify which air pollutants can be predicted most accurately in Pohang City. These LUR models of NO2 and VOCs should provide a useful exposure assessment tool for epidemiologic studies of the adverse health effects of outdoor air pollution. In this study, we provide long-term NO2 and VOC concentrations for every location in Pohang City, Korea.

Study Area
The study area is Pohang City, North Gyeongsang Province, South Korea, located at 36°01′ N, 129°20′ E. It is bordered by the East Sea to the east and covers an area of 1127 km². In 2009, Pohang had a population of 509,475 and a population density of 451.97 persons/km². Pohang is one of the largest industrialized areas in Korea, and its industry mainly manufactures steel products (Figure 1).
Figure 1 shows the 50 sampling sites in Pohang, as well as the four government-operated regulatory monitoring stations, which are located in residential (2 sites) and industrial (2 sites) areas and are used to monitor emissions in those areas. The regulatory monitoring stations in Pohang are designed to monitor compliance with regulatory standards. Although location-allocation techniques [21] were not used, the sampling sites were chosen according to a number of objective criteria to capture the spatial variability of NO2 and VOCs. To investigate the spatial variability in pollutant concentrations, 35 locations across the city were selected to cover residential areas and 7 to cover areas near major roads. Because the focus was on capturing the spatial pattern of ambient pollution, especially where many residents live, only 3 sites were located near or within the industrial area. Five sites were placed across the city, away from any noticeable local sources such as laneways or industrial areas. Care was taken to include all types of land use, such as road networks, industry, and residential settings.
Atmosphere 2022, 13, 577

Air Sampling and Analysis
Once the sampling sites were selected, NO2 passive samplers (Roshi Kaisha, Ltd., Tokyo, Japan [22]) and organic vapor samplers (3M, Saint Paul, MN, USA) were deployed for two 2-week integrated sampling periods to measure NO2 and VOCs, respectively. The first session covered all 50 locations from 6 to 20 September 2010 (henceforth "fall"), and the second was conducted at the same 50 locations from 15 to 29 March 2011 (henceforth "spring") (Figure 1). Samplers were mounted on lamp posts, utility poles, and street signs at a height of 2.5 m to prevent contamination and vandalism. Of the 50 samples, 4 were collected in duplicate near the regulatory monitoring sites, and 5 field blanks were collected for each pollutant in each session. NO2 collected on the passive sampling badges was analyzed by UV spectrophotometry. Following the recommendation of Lee et al. [23], an overall mass transfer coefficient of 0.10 cm/s was used. Analytical quality for the NO2 badges was controlled using the blanks from each sampling location to account for differences in sealing quality and lag time.
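The step from collected NO2 mass to an airborne concentration is not written out above. A minimal sketch, assuming the usual diffusive-sampler relation C = (m − m_blank) / (k · A · t) with the 0.10 cm/s overall mass transfer coefficient from Lee et al. [23]; the badge collection area and the masses in the example are hypothetical placeholders, not values from this study.

```python
SECONDS_PER_DAY = 86_400
CM3_PER_M3 = 1_000_000


def no2_concentration(mass_ug, blank_ug, area_cm2, days, k_cm_s=0.10):
    """Time-weighted mean NO2 concentration (µg/m³) from a passive badge.

    Assumes C = (m - m_blank) / (k * A * t): k * A is the effective uptake
    rate in cm³/s, so k * A * t is the equivalent sampled air volume in cm³.
    """
    sampled_cm3 = k_cm_s * area_cm2 * days * SECONDS_PER_DAY
    net_ug = max(mass_ug - blank_ug, 0.0)  # field-blank-corrected mass, floored at 0
    return net_ug / sampled_cm3 * CM3_PER_M3


# Hypothetical 2-week exposure; the 2.0 cm² badge area is an assumed value.
print(round(no2_concentration(mass_ug=6.3, blank_ug=0.3, area_cm2=2.0, days=14), 1))
```

Keeping k as a parameter makes it easy to check how sensitive the estimates are to the assumed mass transfer coefficient.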
The 3M passive samplers were extracted with 2.0 mL of solvent, and VOCs were determined by gas chromatography/mass selective detection (Shimadzu Corp., Kyoto, Japan). The results were corrected using laboratory blanks, an internal standard, and recovery. More detailed descriptions of the samplers and their performance can be found elsewhere [24][25][26][27].
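Diffusive VOC badges are usually quantified with a compound-specific sampling rate. The sketch below assumes C = (m − m_blank) / (R · SR · t), where R is the analytical recovery and SR the uptake rate in mL/min; the sampling rate and masses in the example are hypothetical, and the internal-standard correction is assumed to have already been applied when GC/MS peak areas were converted to masses.

```python
ML_PER_M3 = 1_000_000


def voc_concentration(mass_ug, lab_blank_ug, recovery, rate_ml_min, minutes):
    """Blank- and recovery-corrected VOC concentration (µg/m³).

    Assumes C = (m - m_blank) / (recovery * SR * t), with SR in mL/min and
    t in minutes, so SR * t is the equivalent sampled air volume in mL.
    """
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be a fraction in (0, 1]")
    net_ug = max(mass_ug - lab_blank_ug, 0.0) / recovery
    sampled_ml = rate_ml_min * minutes
    return net_ug / sampled_ml * ML_PER_M3


two_weeks_min = 14 * 24 * 60
# Hypothetical toluene example: assumed SR = 30 mL/min and 90 % recovery.
conc = voc_concentration(mass_ug=9.1, lab_blank_ug=0.1, recovery=0.9,
                         rate_ml_min=30.0, minutes=two_weeks_min)
print(f"{conc:.2f} µg/m³")
```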
Three VOCs (benzene, toluene, and m-p-xylene; henceforth "BTX") were chosen for modeling. The number of measurements above the minimum detection limit (MDL) for each pollutant and sampling session is shown in Table 1. All NO2 and BTX values were above the MDL (1 µg/m³ for NO2, 0.01 µg/m³ for BTX) except for 2 benzene and 6 m-p-xylene values across the two sessions. Values below the MDL were replaced with MDL/√2 so that every analyzed sample had a numerical value [28].
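The MDL/√2 substitution rule can be expressed in a few lines. This sketch assumes non-detects arrive either as `None` or as a numeric value below the MDL; the benzene values in the example are hypothetical.

```python
import math


def substitute_below_mdl(values, mdl):
    """Replace non-detects (None or values below the MDL) with MDL/sqrt(2)."""
    fill = mdl / math.sqrt(2)
    return [fill if v is None or v < mdl else v for v in values]


benzene = [2.1, None, 0.004, 3.0]  # hypothetical values in µg/m³
print(substitute_below_mdl(benzene, mdl=0.01))
```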

Temporal Trends and Adjustments
Because measurements were made during two different sampling periods and our primary interest was in long-term (i.e., annual average) concentrations, NO2 data were converted to effective annual averages, as described previously [11]. To account for any bias in the NO2 measurements, we planned to adjust NO2 concentrations using data from the government-operated regulatory monitoring sites (Figure 1). However, because government data for 2010–2011 were not available, we used the average concentrations for 2005–2009 from the regulatory NO2 monitoring data available for four sites from the GyeongSangbukdo Government Public Institute of Health and Environment. We then calculated the ratio of the annual average concentration to the average for the relevant 2-week period (6–20 September 2010 and 15–29 March 2011) and multiplied our measurements by the appropriate ratio to estimate the effective annual average. The two adjusted annual averages for each of the 50 locations were then averaged for LUR modeling. The BTX data could not be adjusted because no BTX data were available from the regulatory monitoring sites; we therefore simply averaged the two session measurements at each location. As a result, the BTX LUR models provide an assessment of relative concentrations across the city.
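The ratio adjustment can be sketched as follows. The text does not say whether the ratio was computed per station and then averaged or from station-pooled means; this sketch assumes the pooled version, and all regulatory values in the example are hypothetical.

```python
from statistics import mean


def adjustment_ratio(annual_means, period_means):
    """Ratio of the long-term annual mean to the mean over the same 2-week
    calendar window, both pooled across regulatory stations (assumption)."""
    return mean(annual_means) / mean(period_means)


def effective_annual_average(fall_ugm3, spring_ugm3, fall_ratio, spring_ratio):
    """Scale each 2-week measurement to an annual equivalent, then average."""
    return mean([fall_ugm3 * fall_ratio, spring_ugm3 * spring_ratio])


# Hypothetical 2005-2009 summaries for four stations (µg/m³).
annual = [38.0, 42.0, 40.0, 40.0]
fall_window = [50.0, 52.0, 48.0, 50.0]    # 6-20 September, multi-year mean
spring_window = [32.0, 30.0, 34.0, 32.0]  # 15-29 March, multi-year mean

r_fall = adjustment_ratio(annual, fall_window)      # 40 / 50
r_spring = adjustment_ratio(annual, spring_window)  # 40 / 32
print(effective_annual_average(30.0, 20.0, r_fall, r_spring))
```

A ratio below 1 shrinks a measurement taken in a seasonally high window, and a ratio above 1 inflates one taken in a seasonally low window.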

GIS Data
We generated variables in 5 categories and 20 subcategories to characterize road length, land use, and population density within circular buffers of different radii around each sampling site (Table 2). All variables in each category were derived from a single spatial dataset in vector format.
The Landsat ETM+ data were acquired from the Global Land Cover Facility (www.landcover.org, accessed on 13 July 2011). ETM+ data have the advantages of global coverage and free access. The scene for Pohang, Korea (path 114/row 35) was captured on 10 September 2007. ETM+ bands 1–5 and 7 were used to calculate the tasseled-cap brightness, greenness, and wetness [29], which were used to classify land cover types in Pohang. To facilitate LUR modeling, all images (30 m × 30 m pixels) were resampled to a spatial resolution of 5 m.
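Road-length predictors of this kind are the total length of road falling inside each circular buffer. In ArcGIS this is a buffer-and-clip operation; as a library-free sketch (assuming projected coordinates in metres and roads stored as polylines), the clipped length can be computed exactly per segment by intersecting it with the circle. The road geometry and radii below are hypothetical.

```python
import math


def segment_length_in_circle(p0, p1, center, radius):
    """Length of segment p0-p1 inside a circle, from the roots of
    |p0 + t*(p1 - p0) - center|^2 = radius^2 clipped to t in [0, 1]."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - center[0], p0[1] - center[1]
    a = dx * dx + dy * dy
    if a == 0.0:
        return 0.0  # degenerate (zero-length) segment
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # line misses or only touches the circle
    root = math.sqrt(disc)
    t_in = max(0.0, (-b - root) / (2.0 * a))
    t_out = min(1.0, (-b + root) / (2.0 * a))
    return max(0.0, t_out - t_in) * math.sqrt(a)


def road_length_in_buffers(site, roads, radii):
    """Total road length inside each buffer radius around one site.

    roads: list of polylines, each a list of (x, y) vertices.
    """
    return {
        r: sum(segment_length_in_circle(p, q, site, r)
               for line in roads for p, q in zip(line, line[1:]))
        for r in radii
    }


# Hypothetical major-road polylines (metres) around a site at the origin.
major_roads = [[(-1000.0, 0.0), (0.0, 0.0), (0.0, 1000.0)],
               [(300.0, 400.0), (900.0, 400.0)]]
print(road_length_in_buffers((0.0, 0.0), major_roads, radii=[50, 500]))
```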
For population density, we converted census polygons to centroids, assigned each centroid the population count of the polygon from which it was derived, and then used Calculate Density in the Spatial Analyst extension of ESRI's ArcGIS 9.3 (ESRI, Redlands, CA, USA) to estimate the density within each search radius (Table 2).
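The text does not state whether a simple or kernel density was used. This sketch assumes a simple (unweighted) density: the population of all centroids within the search radius divided by the buffer area in km². Centroid coordinates and counts are hypothetical.

```python
import math


def population_density(site, centroids, radius_m):
    """Simple point density (persons/km²) around a site.

    centroids: iterable of (x_m, y_m, population) in projected metres.
    """
    total = sum(pop for x, y, pop in centroids
                if math.dist(site, (x, y)) <= radius_m)
    area_km2 = math.pi * (radius_m / 1000.0) ** 2
    return total / area_km2


# Hypothetical census centroids: (x_m, y_m, population)
centroids = [(0.0, 0.0, 100), (300.0, 0.0, 200), (2000.0, 0.0, 1000)]
print(round(population_density((0.0, 0.0), centroids, radius_m=500), 1))
```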
Eleven additional variables were included to describe the geographic location of each site: elevation (ELEV), longitude (X), latitude (Y), distance to the city center (DCC), distances to roads (DMJR, DGEN, and DMNR), distance from the ocean (DOC), and distances to point sources of benzene, toluene, and xylene (DBP, DTP, and DXP). This gave a total of 140 potential predictors.
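Distance predictors reduce to point-to-segment and point-to-point Euclidean distances in projected coordinates. A minimal sketch, assuming roads and the coastline are polylines and the city center and emission sources are points; the geometries in any usage are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.dist(p, a)
    # Parameter of the orthogonal projection of p, clamped to the segment.
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def distance_to_nearest_line(p, polylines):
    """For DMJR/DGEN/DMNR or DOC: minimum distance to any polyline segment."""
    return min(point_segment_distance(p, a, b)
               for line in polylines for a, b in zip(line, line[1:]))


def distance_to_nearest_point(p, points):
    """For DCC or DBP/DTP/DXP: distance to the nearest listed point."""
    return min(math.dist(p, q) for q in points)
```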

Land-Use Regression Model Development
NO2 values were approximately normally distributed, whereas BTX values were log-normally distributed; the BTX models were therefore fitted to natural-log-transformed concentrations. Following the algorithm described in [4], we built models for NO2 and BTX as follows: (1) correlation coefficients between each potential predictor and each pollutant were calculated; (2) predictors were ranked within each subcategory by the absolute value of their correlation; (3) other variables correlated with the highest-ranking variable at a coefficient above 0.6 were eliminated; (4) all remaining variables were entered into a stepwise multiple linear regression; and (5) the models were rerun as necessary so that they retained only variables that contributed at least 1% to the model R² and had coefficients consistent with a priori assumptions (e.g., positive coefficients for industrial land-use variables and negative coefficients for open land use).
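The five selection steps can be sketched in plain Python. This is a simplified illustration, not the exact procedure of [4]: steps 4–5 are approximated by forward selection that adds the predictor giving the largest R² gain, keeping it only if the gain is at least 1% and all coefficient signs match their a priori direction. Predictor names and data in the example are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols(columns, y):
    """OLS with intercept via normal equations; returns (beta, R²)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    my = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def select_predictors(candidates, y, r_max=0.6, min_gain=0.01):
    """candidates: {name: (subcategory, expected_sign, values)}."""
    r_y = {n: pearson(v[2], y) for n, v in candidates.items()}  # step 1
    groups = {}
    for n, (sub, _, _) in candidates.items():
        groups.setdefault(sub, []).append(n)
    pool = []
    for names in groups.values():
        names.sort(key=lambda n: abs(r_y[n]), reverse=True)  # step 2
        top = names[0]
        pool.append(top)
        pool += [n for n in names[1:]  # step 3: drop variables too close to the top one
                 if abs(pearson(candidates[n][2], candidates[top][2])) <= r_max]
    chosen, best_r2 = [], 0.0  # steps 4-5 (forward-selection stand-in)
    while True:
        best = None
        for n in pool:
            if n in chosen:
                continue
            trial = chosen + [n]
            beta, r2 = ols([candidates[t][2] for t in trial], y)
            signs_ok = all(b * candidates[t][1] > 0 for b, t in zip(beta[1:], trial))
            if signs_ok and r2 - best_r2 >= min_gain and (best is None or r2 > best[1]):
                best = (n, r2)
        if best is None:
            return chosen, best_r2
        chosen.append(best[0])
        best_r2 = best[1]


# Hypothetical predictors: y depends on a road variable (+) and open land (-).
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [2 * ai - 3 * bi + e for ai, bi, e in zip(a, b, noise)]
cands = {
    "MJR_500": ("road", +1, a),
    "MJR_1000": ("road", +1, [1.1, 2, 2.9, 4, 5.1, 6, 6.9, 8, 9.1, 10]),
    "OPEN_300": ("open", -1, b),
    "POP_1000": ("population", +1, [5, 3, 8, 1, 9, 2, 7, 4, 6, 10]),
}
print(select_predictors(cands, y))
```

Screening within subcategories first keeps near-duplicate buffer sizes of the same variable (here the two road buffers) from entering the regression together.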

Evaluation of the Developed Models
The models were evaluated using the model R² and the root mean square error (RMSE) from leave-one-out cross-validation (LOOCV). Each model was repeatedly fitted on N − 1 data points, where N is the number of sites with measurements of the pollutant, and the concentration at the excluded site was then predicted. Independent variables retained in the models had to have a significant t-value (p < 0.05) and low collinearity with the other retained variables (variance inflation factor, VIF < 2.0). Surfaces of predicted NO2 and BTX concentrations were created by applying the coefficients of the model equations on a 5 m × 5 m grid. Where the estimates exceeded 120% of the highest measured concentration, grid-cell values were truncated. All geographic variables were calculated with ArcGIS 9.3 (ESRI, Redlands, CA, USA).
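Both evaluation criteria can be sketched without external libraries. This illustration assumes ordinary least squares with an intercept: `loocv_rmse` refits the model N times, each time leaving one site out, and `vif` uses VIF_j = 1 / (1 − R²_j), with R²_j from regressing predictor j on the remaining predictors. Perfectly collinear inputs make the normal equations singular and are not handled.

```python
import math


def fit(columns, y):
    """OLS with intercept (normal equations, Gauss-Jordan); returns beta."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def predict(beta, xs):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], xs))


def loocv_rmse(columns, y):
    """Refit on N-1 sites, predict the held-out site, repeat for every site."""
    sq_errors = []
    for i in range(len(y)):
        keep = [j for j in range(len(y)) if j != i]
        beta = fit([[col[j] for j in keep] for col in columns], [y[j] for j in keep])
        sq_errors.append((y[i] - predict(beta, [col[i] for col in columns])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def vif(columns, j):
    """VIF of predictor j: 1 / (1 - R²_j) = SS_tot / SS_res."""
    target = columns[j]
    others = [c for k, c in enumerate(columns) if k != j]
    beta = fit(others, target)
    mean_t = sum(target) / len(target)
    ss_res = sum((t - predict(beta, [c[i] for c in others])) ** 2
                 for i, t in enumerate(target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return ss_tot / ss_res if ss_res > 0 else math.inf
```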
Statistical analyses were performed using SAS 9.2 for Windows (SAS Institute Inc., Cary, NC, USA). Correlations among the measured air pollutants were assessed with Spearman's rank correlation.
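In SAS, Spearman correlations are available through PROC CORR with the SPEARMAN option. To show what the statistic computes, this Python sketch ranks each variable (tied values receive the average of the ranks they span) and takes the Pearson correlation of the ranks.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```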

Descriptive Statistics
Descriptive statistics for measured and temporally adjusted NO2 and BTX concentrations are shown in Table 3. Measurement precision, calculated from the duplicate samples as the standard deviation of the duplicate differences divided by the square root of 2, was 1.88 µg/m³ for NO2, 0.13 µg/m³ for benzene, 0.37 µg/m³ for toluene, and 0.002 µg/m³ for m-p-xylene. The mean concentrations of NO2, benzene, toluene, and m-p-xylene, averaged over the two sessions, were 28.4 µg/m³ (SD = 7.5), 2.40 µg/m³ (SD = 1.13), 15.36 µg/m³ (SD = 5.20), and 0.21 µg/m³ (SD = 0.13), respectively. The mean NO2 concentration in the fall session was significantly higher than in spring. The adjusted mean NO2 concentration across the 50 locations (28.3 ± 7.5 µg/m³) was lower than the annual average from the four regulatory monitoring sites (41.0 ± 6.2 µg/m³). Mean benzene and toluene concentrations were also significantly higher in fall than in spring, whereas m-p-xylene showed the opposite pattern. Table 4 shows the correlation coefficients between pollutants; benzene was significantly correlated with NO2, toluene, and m-p-xylene.
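The duplicate-based precision can be computed as below. This sketch assumes the sample (n − 1) standard deviation of the within-pair differences; the duplicate values in the example are hypothetical.

```python
import math
from statistics import stdev


def duplicate_precision(pairs):
    """Precision = SD of within-pair differences / sqrt(2).

    pairs: (primary, duplicate) measurements; at least two pairs are needed.
    """
    diffs = [a - b for a, b in pairs]
    return stdev(diffs) / math.sqrt(2)


# Hypothetical co-located NO2 duplicates (µg/m³)
print(round(duplicate_precision([(28.0, 26.5), (35.2, 36.0), (22.1, 19.8), (30.4, 30.9)]), 2))
```

Dividing by √2 converts the spread of a difference between two measurements into the spread of a single measurement, assuming both carry equal, independent error.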

LUR Models
The final LUR models and modeled pollution maps are shown in Table 5 and Figure 2.
Table 5 notes: b See Table 2 for buffer size and definition of each road; c variance inflation factor; d leave-one-out cross-validation; e root mean square error of leave-one-out cross-validation.

Estimation of VOC Concentrations at Locations
By applying the regression equations on a 5 m cell basis to the entire city area, we generated maps of NO2, benzene, toluene, and m-p-xylene concentrations (Figure 2) and then estimated long-term concentrations at 167,057 addresses in Pohang. These addresses contained 154,367 dwellings and 464,710 residents according to 2007 census data. The estimated levels at these addresses (not shown) of NO2 (20.7 µg/m³), benzene (1.46 µg/m³), toluene (12.13 µg/m³), and m-p-xylene (0.17 µg/m³) were lower than the concentrations measured at the sampling sites (Table 3).
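Evaluating a fitted model at a grid cell or address amounts to plugging its predictor values into the regression equation. This sketch assumes a plain exp(·) back-transform for the log-scale BTX models (no retransformation bias correction) and truncation at 120% of the highest measured concentration; the predictor names and coefficients are hypothetical.

```python
import math


def predict_concentration(intercept, coefs, predictors, log_model, max_measured):
    """Apply one LUR equation at a single grid cell or address.

    coefs, predictors: dicts keyed by (hypothetical) predictor names.
    log_model: True for models fitted to ln(concentration).
    """
    value = intercept + sum(b * predictors[name] for name, b in coefs.items())
    if log_model:
        value = math.exp(value)  # back-transform to µg/m³ (no bias correction)
    return min(value, 1.2 * max_measured)  # cap at 120 % of the measured maximum


# Hypothetical benzene model on the log scale, evaluated at one address.
print(predict_concentration(0.2, {"MJR_500": 0.0004, "IND_1000": 0.3},
                            {"MJR_500": 850.0, "IND_1000": 0.5},
                            log_model=True, max_measured=6.1))
```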

Discussion
In this study, we developed LUR models to estimate air pollutant concentrations (NO2 and BTX) in Pohang, Korea, using measurements from 50 sampling locations. NO2 concentrations were estimated reasonably well from the predictor variables, but the BTX models had comparatively low R² values.
As shown in Table 3, the mean measured NO2 concentration was 28.4 µg/m³, ranging from 11.3 µg/m³ to 41.4 µg/m³, which is lower than the annual NO2 average at the Pohang regulatory monitoring stations. To remove bias introduced by seasonal trends in background concentrations [11], the measured NO2 values were corrected with fall and spring adjustment factors, respectively, to obtain the effective annual average.
The correlations between NO2 and the BTX species in this study (Table 4) fell in a somewhat different range from those reported in other studies in the US and Canada [35,36]. Our coefficients were lower and spanned a narrower range (0.31–0.46) than those reported in Toronto, Canada (0.53–0.89) [35]. The higher correlations (0.78–0.99) in [36] resulted from the numerous petrochemical industries in that region, whereas the lower correlations in this study may reflect the complex mix of NO2 and BTX sources from roads, residential areas, and industrial areas. The significant correlation between NO2 and benzene suggests that benzene came partly from traffic; its correlations with toluene and m-p-xylene, which presumably came from industrial areas, suggest an industrial contribution as well.
As shown in Table 5, the R² of our NO2 model was 0.65. The highest NO2 R² values in previous studies in Europe and North America were 0.77 in San Diego, United States [37] and 0.77 in Montreal, Canada [15], followed by 0.76 in Huddersfield and 0.73 in Sheffield, United Kingdom [3], 0.56 in Vancouver, Canada [4], 0.54 in Montreal, Canada [8], and 0.51 in Munich, Germany [10]. In Asia, reported R² values are 0.74 and 0.61 in Tianjin, China [12], 0.54 in Shizuoka, Japan [19], and 0.74 in Ulaanbaatar, Mongolia [20]. Our R² value falls in the middle of this range. Our NO2 model included variables related to industrial areas as well as traffic variables. This result may reflect the greater road lengths in densely populated areas, in the northern part of the city, and in industrial areas.
In Pohang, only four regulatory monitoring stations are deployed, and their purpose is to check compliance with air quality standards. Although NO2 modeling based on regulatory air quality data has produced reasonable R² values [19], such modeling is feasible only when monitoring stations capture background concentrations that are not affected by specific sources of air pollutants. Without a sufficient number of stations able to measure background levels, the only option is to measure NO2 levels directly. We therefore selected sampling sites in residential and industrial areas, including high-traffic areas, as well as some locations to capture background levels, so as to cover the entire city.
The "leave-one-out" procedure was used to evaluate the regression models. It yielded root mean square errors (RMSE) of 4.3 µg/m³ for NO2 and, in natural-log units, 0.24 for ln(benzene), 0.17 for ln(toluene), and 0.54 for ln(m-p-xylene) (Table 5). The RMSE of the NO2 model in this study was similar to those reported in previous studies (4.5–4.7 µg/m³) [39,40].
Although we tried using remote sensing variables (brightness, greenness, and wetness) as predictors in the NO2 and BTX models, none were retained in the final land-use regression models. Since satellite remote sensing data began to be applied in LUR modeling in recent years [16,38,41–43], models based on Landsat ETM+ data have provided a relatively easy and feasible way to improve exposure analysis when highly resolved traffic, road, and land-use data are unavailable.
In our LUR modeling study, by contrast, we measured NO2 and BTX levels directly with passive samplers to overcome the limitations of the regulatory monitoring data described above. The main motivation for our LUR modeling is to provide effective tools for obtaining reliable estimates of air pollutant concentrations for epidemiological studies and risk assessment [4]. The modeling procedure should therefore be chosen according to the spatial variability of the available measured or regulatory monitoring data in order to obtain reliable estimates.
In conclusion, our land-use regression models hold particular promise in epidemiological settings where significant health effects are associated with small-area variations in exposure. While the NO2 model appears to predict well, data limitations prevented us from obtaining all potentially useful predictors for the BTX models. Further studies are needed to design more detailed monitoring and modeling strategies for epidemiological studies of chronic health effects.

Conclusions
For the first time in Korea, we developed LUR models to estimate concentrations of NO2, benzene, toluene, and m-p-xylene. Model performance was comparable to that reported previously in Europe and North America as well as in Asia. A major-road predictor was included in all models, and an industry predictor was also included in the NO2, benzene, and toluene models. These prediction models are a promising tool for assessing individual exposure to traffic- and industry-related air pollution. The results of this study can serve as fundamental data for predictive models used to manage urban air pollution levels.