Estimation of Fine Particulate Matter in Taipei Using Landuse Regression and Bayesian Maximum Entropy Methods

Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical spatial trends of PM concentration based on landuse regression, (b) the spatiotemporal dependence among PM observations, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005 to 2007.


Introduction
Numerous studies over the last two decades indicate that the air quality measure of fine particles (PM2.5, particulate matter with an aerodynamic diameter ≤2.5 µm) can be more indicative of potential threats to human health than the long-used air quality measures of coarse particles, i.e., PM10 (particulate matter with an aerodynamic diameter ≤10 µm) and total suspended particles (TSP). An increase in long-term exposure to PM2.5 is closely associated with increased mortality and diseases, such as lung cancer and cardiopulmonary disease [1][2][3][4]. Although air quality has been monitored throughout the island of Taiwan since 1983, its PM2.5 monitoring network, as in many other countries, did not begin to operate systematically and regularly until August 2005. The lack of long-term PM2.5 measurements prevents epidemiologists from assessing the chronic health effects of long-term exposure to PM2.5. Geostatistical techniques have been applied to estimate the spatiotemporal distributions of PM2.5 before the establishment of PM2.5 monitoring networks [5][6][7][8]. The PM2.5/PM10 ratio is often used as an important indicator to characterize the underlying atmospheric processes within the local environment [7,8]. However, PM2.5/PM10 ratios can vary with time and space, depending on the landuse and emission patterns of the space-time location. For example, these ratios are approximately 0.69 and 0.52, respectively, in the urban and suburban areas of Shanghai (China) [9], about 0.45 among several Asia-Pacific regions (Australia, Hong Kong, Korea, Philippines, Vietnam, and Japan) [10], and range from 0.39 to 0.69 in urban and semi-rural areas of the United States [11]. Previous research provides a summary of PM2.5/PM10 ratios in megacities around the world [12].
Intra-urban ratios also vary significantly in Taipei, with a PM2.5/PM10 ratio of approximately 0.82 around the Bei-tou incinerator [13], 0.68 in high traffic areas, and 0.57 in downtown areas [14].
The spatial and temporal variations of PM2.5, PM10, and other air quality levels in Taiwan are generally high because they are closely associated with local emission patterns and meteorological conditions. Recent developments have focused on quantifying the levels of PM2.5, PM10, and other air quality observations using surrogates of local emissions [15,16]. The landuse regression (LUR) technique has been widely applied to determine the linear relationship between air quality measures and landuse information and to generate air quality maps with high spatial resolution [17][18][19][20][21]. In general, LUR air quality maps can delineate the significant contributions of certain geographical objects, such as highways. However, due to changes in meteorological conditions and limited landuse information, the air quality levels quantified by LUR can vary over time. Therefore, LUR is generally used to quantify the long-term average air quality levels in space [20][21][22][23]. Studies show that landuse information also plays an important role in the variation of the PM2.5/PM10 levels due to traffic and road emissions [24,25]. This is because local landuse patterns influence PM2.5 and PM10 to different degrees. In addition, the temporal variations of PM2.5/PM10 resulting from changes in meteorological conditions can be less significant than those of the direct observations of PM2.5 and PM10. These characteristics make the PM2.5/PM10 ratio a proper surrogate of air quality patterns, one that quantifies the contributions of spatial variations in landuse patterns. However, relatively few studies investigate the relationship between the PM2.5/PM10 ratios and landuse information.
This study investigates the spatiotemporal distribution of PM2.5 across the Taipei area from 2005 to 2007 by integrating PM10 and landuse information. This study uses LUR to establish a quantitative relationship between PM2.5/PM10 and landuse information. The Bayesian maximum entropy (BME) method is then used to assimilate the PM2.5 observations and the secondary information from the LUR analysis. The comparison assesses the improvement in PM2.5 prediction accuracy gained by incorporating the secondary information, i.e., geostatistical estimation using (1) only PM2.5, (2) both PM2.5 and PM10, and (3) PM2.5, PM10, and landuse information.

Study Area
Taipei, including Taipei city and Taipei county, is the largest metropolitan area in Taiwan, and has a vehicle density as high as 6,000 vehicles per km². In addition to traffic emissions, three incineration plants are major sources of pollutants in the area [26].
The Taipei area is bounded by mountains, i.e., Yangming Mountains to the north, Linkou mesa to the west, and a ridge of the Snow Mountains to the southeast. These mountains form the second largest basin of the island (Figure 1). This basin topography increases the concentration level of ambient pollutants and creates a high contrast between the urbanization of the basin floor in Taipei and the surrounding mountain areas.

Ambient Pollutant Data
An island-wide monitoring network operated by the Taiwan Environmental Protection Agency (TWEPA) regularly records ambient pollutants, i.e., criteria pollutants such as PM, ozone, NOx, CO, and SO2 [27], and meteorological variables. There are 18 TWEPA stations within the Taipei metropolitan area, and these stations recorded both PM2.5 and PM10 from 2005 to 2007. Table 1 summarizes the PM2.5 and PM10 statistics. In addition, the Departments of Environmental Protection of the local governments of Taipei city and Taipei county (TPEDEP) have independently collected PM data since 1970 and 1990, respectively. However, only the Taipei city government records PM10 on a daily basis at its eight stations. This study uses the PM2.5 and PM10 data from both central and local governments to estimate the monthly PM2.5/PM10 ratios at every PM station (Figure 2). This study aggregates the PM2.5 and PM10 data into monthly data following the procedure suggested by USEPA [28]. The monthly PM2.5 levels at the TPEDEP stations were estimated by the BME method, as discussed below, with only PM2.5 observations. The estimated monthly PM2.5 and PM10 were then used to obtain the spatiotemporal distribution of PM2.5/PM10 ratios for all stations from 2005 to 2007. The other observed ambient pollutants, i.e., CO, NO2, SO2, and O3, were used as the emission indicators, as discussed below.
Figure 1. The highways, rivers, and topography in the Taipei metropolitan area.
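The monthly aggregation and ratio step described in this section can be sketched in Python as follows. This is only an illustration: the function names and the completeness threshold `min_days` are hypothetical, and the threshold is not taken from the USEPA procedure [28].

```python
from collections import defaultdict
from statistics import mean


def monthly_means(daily, min_days=20):
    """Aggregate daily values into monthly means.

    daily: iterable of (date_string 'YYYY-MM-DD', value or None).
    min_days: hypothetical completeness threshold (an assumption,
    not the criterion used in the paper).
    """
    buckets = defaultdict(list)
    for day, value in daily:
        if value is not None:           # skip missing records
            buckets[day[:7]].append(value)
    # keep only months with enough valid days
    return {m: mean(v) for m, v in buckets.items() if len(v) >= min_days}


def monthly_ratio(pm25_monthly, pm10_monthly):
    """PM2.5/PM10 ratio for months present in both series."""
    return {
        m: pm25_monthly[m] / pm10_monthly[m]
        for m in pm25_monthly
        if m in pm10_monthly and pm10_monthly[m] > 0
    }
```

Months that fail the completeness check are dropped rather than filled, so the ratio is only computed where both pollutants have a monthly value.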

Landuse Data
The National Land Surveying and Mapping Center (Taiwan) conducted a comprehensive landuse survey of the entire Taipei area in 2007. This survey includes nine major classes of land usage: agriculture, forest, traffic, water, buildings, utilities, recreation areas, mining areas, and others, i.e., the transportation data discussed below. Each of the major landuse categories mentioned above includes more detailed classifications [29]. This study analyzes the potential major or minor landuse classes that may have positive or negative effects on the air quality levels. The selection criteria include significant variables identified in previous studies, e.g., roads, and insights from local experts [30], e.g., motorcycles. The selected landuse classes include the areas of farms, forests, railroads, freeways, highways, roads, ports, government institutions, schools, commerce, residences, industry, hospitals, social welfare facilities, public utilities, and parks. Figure 3 shows the spatial distribution of some landuse classes in Taipei. This figure clearly shows that city development is concentrated in the plains of the Taipei basin floor. In addition, this study generates spatiotemporal traffic information by uniformly assigning the recorded number of various registered vehicles [31,32] to the study area based on the road areas identified by the landuse data. The vehicle types in this analysis include motorcycles, buses, passenger cars, and trucks.
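The uniform assignment of registered vehicles to road area can be illustrated with a short Python sketch. The function name and the cell-based data layout are assumptions made for this example; the paper does not specify its implementation.

```python
def assign_vehicles(road_area_by_cell, registered_vehicles):
    """Spread a vehicle count over cells in proportion to road area.

    road_area_by_cell: {cell_id: road area within the cell}
    registered_vehicles: total registered vehicles of one type
    Returns {cell_id: vehicles assigned to the cell}.
    """
    total_area = sum(road_area_by_cell.values())
    if total_area <= 0:
        raise ValueError("total road area must be positive")
    density = registered_vehicles / total_area  # vehicles per unit road area
    return {cell: area * density for cell, area in road_area_by_cell.items()}
```

Under this assumption every unit of road area carries the same number of vehicles, so a cell with three times the road area receives three times the vehicles.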

Methods
This study uses a landuse regression method to determine the relationship between PM2.5/PM10 and local emission-related information. The emission-related information in this study includes non-PM ambient pollutants and landuse data. Local emission-related data are derived by GIS functions that estimate the size or area of selected indicators within the specified spatial buffers. Various spatial ranges of buffers are used for the landuse information surrounding the PM2.5/PM10 data (i.e., 0-50 m, 50-100 m, 100-300 m, 300-500 m, and 500-1,000 m) to capture the different ranges of transport processes produced by different types of emissions. The relationship between the sizes/proximity of local emission-related data and PM2.5/PM10 ratios is assumed to be homogeneous over the entire study area, and can therefore be formulated in a linear form. Multivariate stepwise regression analysis was performed to select the most significant regressors and estimate their associated parameters. Because the selected emission-related variables in the landuse regression model can be highly linearly dependent, this study uses the variance inflation factor (VIF) to identify multicollinearity among the regressors and avoid potentially dubious results from the analysis [33]. This study uses SPSS software for the landuse regression analysis.
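The VIF screening mentioned above can be illustrated outside SPSS with a minimal NumPy sketch. The function name is hypothetical; the calculation follows the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing the j-th regressor on all the others.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of the regressor matrix X (n x k)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other regressors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # perfect collinearity gives an infinite VIF
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)
```

A common rule of thumb treats a VIF above roughly 10 as a sign of problematic collinearity; the cutoff used in the study is not stated in the text.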
The BME method mathematically represents air pollution attributes (i.e., PM measurements and ratios) in terms of spatiotemporal random fields (S/TRF; [34]). Let $X_p = X_{s,t}$ denote an S/TRF of an air pollution attribute, where the vector $p = (s, t)$ denotes a spatiotemporal point ($s$ is the geographical location and $t$ is the time). The S/TRF model is a collection of all physically possible realizations of the attribute to be represented mathematically. The S/TRF model is fully characterized by its probability density function (pdf), $f_{KB}$, where the subscript KB denotes the 'knowledge base' used to construct the pdf. In particular, BME considers a distinction between: (a) the general KB, denoted by G-KB, and (b) the site-specific KB, S-KB. The total KB is denoted as $K = G \cup S$, i.e., it includes both the general and the site-specific KB. The fundamental BME equations are as follows (for technical details, see [35,36]):

$$\int d\chi \, (g - \bar{g}) \, e^{\mu^{T} g} = 0, \qquad \int d\chi \, \xi_S \, e^{\mu^{T} g} - A f_K(p) = 0, \qquad (1)$$

where $g$ is a vector of $g_\alpha$-functions ($\alpha = 1, 2, \ldots$) that stochastically represents the G-KB under consideration (the bar denotes statistical expectation), $\mu$ is a vector of $\mu_\alpha$-coefficients that depends on the space-time coordinates and is associated with $g$ (i.e., $\mu_\alpha$ expresses the relative significance of each $g_\alpha$-function in the composite solution sought), $\xi_S$ represents the S-KB available, $A$ is a normalization parameter, and $f_K$ is the pollutant pdf at each space-time point (the subscript K means that $f_K$ is based on the blending of the core and site-specific KB). The terms $g$ and $\xi_S$ are the inputs in Equation (1). The quantities supplied by the LUR are the ratio estimation and its standard deviation. The multiplication of PM2.5/PM10 ratios and PM10 generates an uncertain spatiotemporal trend of PM2.5.
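To make the role of the two knowledge bases more concrete, the following Python sketch evaluates a BME-type posterior pdf numerically for a deliberately small case. It assumes, for illustration only, that the G-KB consists of a mean vector and covariance matrix (so the prior pdf is Gaussian) and that the S-KB consists of one hard datum and one interval (uniform) soft datum. All function names and numerical values are hypothetical and are not taken from the study.

```python
import numpy as np


def trapezoid(y, x, axis=-1):
    """Trapezoidal integration of y over grid x along the given axis."""
    y = np.moveaxis(np.asarray(y, dtype=float), axis, -1)
    dx = np.diff(x)
    return ((y[..., 1:] + y[..., :-1]) * dx / 2.0).sum(axis=-1)


def gaussian_pdf(points, mean, cov):
    """Multivariate normal density evaluated at points of shape (..., d)."""
    diff = points - mean
    inv = np.linalg.inv(cov)
    d = len(mean)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    expo = -0.5 * np.einsum("...i,ij,...j->...", diff, inv, diff)
    return np.exp(expo) / norm


def bme_posterior(xk_grid, x_hard, soft_lo, soft_hi, mean, cov, n_soft=201):
    """Posterior pdf of X_k given a hard datum and an interval soft datum.

    Variable order in mean/cov: (estimation point, hard point, soft point).
    The prior is integrated over the soft interval and then normalized.
    """
    xk_grid = np.asarray(xk_grid, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    xs = np.linspace(soft_lo, soft_hi, n_soft)
    K, S = np.meshgrid(xk_grid, xs, indexing="ij")
    pts = np.stack([K, np.full_like(K, x_hard), S], axis=-1)
    joint = gaussian_pdf(pts, mean, cov)
    unnorm = trapezoid(joint, xs, axis=1)          # integrate out the soft datum
    return unnorm / trapezoid(unnorm, xk_grid)     # normalization (parameter A)
```

The posterior mean or mode of the returned pdf can then serve as the estimate at the unsampled space-time point, and its spread gives a measure of estimation uncertainty.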
To account for the uncertainty in the ratio estimation and the subsequent trend estimation in space and time, this study uses the BME method for the spatiotemporal estimation of PM2.5 with uniformly distributed PM2.5 residuals whose upper and lower bounds are derived from the intervals of the trend estimations and the PM2.5 observations. In summary, this study applies a two-stage approach to integrate the landuse regression and BME methods for spatiotemporal PM2.5 estimation: landuse regression is used to characterize the spatial variability of PM2.5, i.e., ratios, and BME is then applied to assimilate the uncertainty of the landuse regression and the spatiotemporal dependence for the modeling of the spatiotemporal PM2.5 distribution.
Table 2 lists the variables selected from the emission-related dataset for the LUR model by the stepwise regression method. This table lists the variables by the rank of their significance to the variation of PM2.5/PM10 ratios. Most of the selected variables can elevate the level of PM2.5/PM10 ratios. The road, forest, industrial area, and park landuse patterns have the greatest effect on increasing the PM2.5/PM10 ratios. Most selected ranges of the variables are 500 m-1,000 m. This implies that the level of PM2.5/PM10 represents the general air quality patterns of the area surrounding the monitoring stations rather than the direct emission impact from short distances. The only selected variable that shows the ability to reduce PM2.5/PM10 values is the park landuse pattern, which ranges between 300 m and 500 m. Note that most traffic information is not included in the model due to multicollinearity with the spatial distribution of road area. The exception is the bus volume, which can increase the local ratio level within.
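The first stage of the two-stage approach, in which the LUR ratio estimate and its standard deviation are combined with PM10 to give an uncertain PM2.5 trend and interval bounds for the residuals, can be sketched as follows. The interval half-width multiplier `k` and the clipping of the ratio to [0, 1] are assumptions introduced for this example; the text does not state how the intervals were constructed.

```python
def pm25_trend_interval(ratio_hat, ratio_sd, pm10, k=1.96):
    """Return (lower, central, upper) PM2.5 trend values.

    ratio_hat, ratio_sd: LUR estimate of PM2.5/PM10 and its standard deviation
    pm10: PM10 concentration at the same space-time point
    k: hypothetical multiplier that sets the interval half-width
    """
    lo_ratio = max(ratio_hat - k * ratio_sd, 0.0)   # assumed physical lower limit
    hi_ratio = min(ratio_hat + k * ratio_sd, 1.0)   # assumed physical upper limit
    return lo_ratio * pm10, ratio_hat * pm10, hi_ratio * pm10


def residual_bounds(pm25_obs, trend_lo, trend_hi):
    """Bounds of the residual (observation minus trend) for a uniform soft datum."""
    return pm25_obs - trend_hi, pm25_obs - trend_lo
```

For example, a ratio estimate of 0.6 with a standard deviation of 0.05, a PM10 level of 50, and `k=2` gives a trend interval of about 25 to 35 around a central value of 30.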

Results
The spatiotemporal distribution of monthly PM2.5 can be obtained by multiplying the empirical function of landuse information, i.e., the LUR model, by the PM10 variation in space and time. However, this product does not consider the spatiotemporal dependence among the PM2.5 observations. This study integrates the BME method with LUR to model the high frequency part of the PM2.5 variation in space and time, i.e., the unexplained PM2.5 noise in the LUR model. The high frequency part of the spatiotemporal variation of PM2.5 is characterized by a stationary nested covariance model (see Figure 4).
This study compares the modeling of the spatiotemporal PM2.5 distribution using the kriging method, the LUR method, and the integration of the LUR and BME methods, respectively (Table 3). The kriging estimation is based upon the direct modeling of PM2.5 observations, and ignores their uncertainty [35]. Leave-one-out cross-validation results show that the LUR model outperforms the kriging method in PM2.5 estimation. Furthermore, the BME method can improve the accuracy of PM2.5 estimation in this study. Figure 5 and Figure 6 show the spatial distribution of estimation performance at each PM2.5 observation location by the LUR model and the BME method, respectively. Figure 7 shows the temporal variation of monthly PM2.5 observations and their estimations by the BME method at the four selected locations, i.e., Yungho, Cailiao, Sijhih, and Yangming. The selected locations represent different parts of the Taipei area.
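The leave-one-out cross-validation used for the comparison can be sketched generically in Python. The estimator below is inverse-distance weighting, chosen only as a simple stand-in; it is not kriging, LUR, or BME, and all function names are hypothetical.

```python
import numpy as np


def idw_estimate(coords, values, target, power=2.0):
    """Inverse-distance-weighted estimate at target (stand-in estimator)."""
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d == 0):                      # exact hit: return that value
        return float(values[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * values) / np.sum(w))


def leave_one_out_errors(coords, values, estimator=idw_estimate):
    """Estimate each observation from all the others; return the errors."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i  # drop the i-th observation
        est = estimator(coords[keep], values[keep], coords[i])
        errors.append(est - values[i])
    return np.array(errors)


def summarize(errors):
    """Error statistics of the kind reported in a cross-validation table."""
    return {
        "mean_error": float(errors.mean()),
        "mse": float((errors ** 2).mean()),
        "std": float(errors.std(ddof=1)),
    }
```

Any estimator with the same call signature can be passed as `estimator`, so the same loop can score competing methods on identical held-out points.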

Discussion
This study uses the BME method to integrate the LUR model in the prediction (estimation) of fine particulate matter concentrations across space-time in the Taipei metropolitan area. This implementation of BME theory allows this study to determine attribute distributions in a composite space-time domain without restrictive or unrealistic assumptions (such as linearity, normality, independence, etc.). The general knowledge base of the BME method used to characterize the general pattern of PM2.5 is based on the empirical relationship between landuse information and PM2.5/PM10 ratios given by the LUR model. Many studies [18,20,41] show that LUR is able to produce high-resolution air quality maps and address the effects of each landuse pattern. However, updating a landuse database often requires tremendous effort, making it difficult to keep track of landuse changes over time. To characterize the general pattern of PM2.5 in space and time, the emission-related database in this study includes the variables of non-PM ambient pollutants and traffic information, which change over time and are considered to be highly associated with the level of PM2.5. Because the PM2.5/PM10 ratio is a useful indicator of local emission patterns [7,8], this study determines the ratios based on landuse distribution and some of the LUR model emission information mentioned above. As expected, most of the significant variables in the LUR model of ratios are purely spatial information, i.e., certain landuse patterns within certain distances from the observation locations. This implies that the spatial variation of PM2.5/PM10 ratios exceeds its temporal variation, i.e., the effects of landuse data on PM2.5/PM10 ratios in Taipei are more important than other temporal factors, such as meteorological and seasonal effects. In addition, the monthly ratios mostly characterize the general emission pattern. Therefore, the ranges with the greatest area in each neighborhood appear most frequently in this study.
The factors included in this study are mostly variables that can increase the level of the ratios. Some potential variables that can significantly reduce the level of the ratios are also selected, e.g., forest and park. Among them, the contradictory park effects at different ranges may be due to the common spatial layout of the urban setting in Taipei, in which major parks are commonly located near high-density urbanized areas. Thus, only locations immediately next to parks can benefit from the park's ability to improve air quality. As for areas situated further from the parks, the air quality levels can easily be elevated by other contributing factors. This is partially responsible for the high variability of PM2.5 levels, which can increase significantly based on local emissions and decrease significantly when emission sources are removed.
Covariance analysis shows that PM2.5 exhibits two spatiotemporal interactions with different space-time ranges. These interactions represent the local and long-term transport patterns of fine particulate matter over the Taipei area with two distinct space-time ranges: [11 km, 3 months] and [50 km, 50 months]. The dominant process of PM2.5 distribution is the local transport with spatial and temporal extents of 11 km and 3 months. The spatial extent reflects the size of the highly urbanized areas, while the temporal range shows how seasonal effects play an important role in the concentration level of PM2.5. The variability of the long-term process can result from mass dispersion over the continents, such as dust storms, due to certain meteorological conditions [42][43][44][45]. Figure 5 and Figure 6 show the spatial distributions of performance assessment for the LUR and BME methods. Results show that both analyses obtain similar spatial patterns of accuracy distribution, and perform relatively better in areas with better PM observations. The analysis of the LUR model assumes a homogeneous relationship between landuse information and PM observations across space and time. However, the statistical relationship between landuse and PM concentration may vary from location to location because of distinct causal links between the attributes analyzed. Because the information support is spatially unbalanced, the homogeneous relationship fits the areas with abundant information better. This results in distinct performance differences between the central and boundary areas in the spatial distribution of cross-validation results of the LUR and BME methods, especially in Figure 5. Though the BME method shares the same spatiotemporal patterns as the LUR model, the inclusion of spatiotemporal dependence in the BME method reduces the effects of unbalanced information and improves the estimation accuracy, as Figure 6 shows.
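A nested space-time covariance with the two ranges reported above can be sketched as follows. The functional form (separable exponential structures), the practical-range convention exp(-3h/a), and the sill values are assumptions made for illustration; only the ranges [11 km, 3 months] and [50 km, 50 months] come from the text.

```python
import numpy as np


def nested_covariance(h_km, tau_months,
                      sills=(0.7, 0.3),            # hypothetical sills
                      space_ranges=(11.0, 50.0),   # km, from the text
                      time_ranges=(3.0, 50.0)):    # months, from the text
    """Sum of two separable exponential space-time covariance structures."""
    h = np.abs(np.asarray(h_km, dtype=float))
    tau = np.abs(np.asarray(tau_months, dtype=float))
    total = np.zeros(np.broadcast(h, tau).shape)
    for c, a_s, a_t in zip(sills, space_ranges, time_ranges):
        # each structure decays to about 5% of its sill at its range
        total = total + c * np.exp(-3.0 * h / a_s) * np.exp(-3.0 * tau / a_t)
    return total
```

With these assumed sills the first structure dominates at short lags and is nearly exhausted beyond 11 km or 3 months, leaving the second structure to describe the longer-range dependence.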
Table 3 shows the advantages of integrating landuse information in the spatiotemporal estimation of PM2.5. Cross-validation comparison shows that the LUR model performs better than the kriging method, i.e., the most widely used geostatistical method. The LUR and kriging methods only consider landuse information and spatiotemporal dependence among PM2.5 observations, respectively. Table 3 shows that the BME method achieves the smallest mean square error, standard deviation, and other statistics of PM2.5 estimation errors. Figure 7 compares the temporal distribution of PM2.5 observations and BME estimations for four selected locations. The four locations were selected to represent the East, South, West, and North parts of the city, respectively. Results show that, for all locations, the BME estimations generally achieved good agreement with the PM2.5 observations.

Conclusions
This study discusses the application of spatiotemporal statistics to science-based PM2.5 mapping in Taipei. The main goal of the BME method is to generate PM2.5 maps in a composite space-time domain, in which the core knowledge, in the form of empirical laws from the LUR model, is combined with the informative secondary information derived from landuse data. Results show that the incorporation of multi-sourced soft and hard information through BME analysis and mapping can effectively improve the accuracy of PM2.5 estimation across space-time. This analysis identifies the most influential landuse patterns elevating PM2.5 levels. In addition, the two dominant space-time mechanisms underlying PM2.5 space-time distributions in Taipei are local and long-term transport processes.