Impact of Urban Growth on Air Quality in Indian Cities Using Hierarchical Bayesian Approach

Several studies have found rising ambient particulate matter (PM 2.5 ) concentrations in urban areas across developing countries. For setting mitigation policies source-contribution is needed, which is calculated mostly through computationally intensive chemical transport models or manpower intensive source apportionment studies. Data based approach that use remote sensing datasets can help reduce this challenge, specially in developing countries which lack spatially and temporally dense air quality monitoring networks. Our objective was identifying relative contribution of urban emission sources to monthly PM 2.5 ambient concentrations and assessing whether urban expansion can explain rise of PM 2.5 ambient concentration from 2001 to 2015 in 15 Indian cities. We adapted the Intergovernmental Panel on Climate Change’s (IPCC) emission framework in a land use regression (LUR) model to estimate concentrations by statistically modeling the impact of urban growth on aerosol concentrations with the help of remote sensing datasets. Contribution to concentration from six key sources (residential, industrial, commercial, crop fires, brick kiln and vehicles) was estimated by inverse distance weighting of their emissions in the land-use regression model. A hierarchical Bayesian approach was used to account for the random effects due to the heterogeneous emitting sources in the 15 cities. Long-term ambient PM 2.5 concentration from 2001 to 2015, was represented by a indicator R (varying from 0 to 100), decomposed from MODIS (Moderate Resolution Imaging Spectroradiometer) derived AOD (aerosol optical depth) and angstrom exponent datasets. The model was trained on annual-level spatial land-use distribution and technological advancement data and the monthly-level emission activity of 2001 and 2011 over each location to predict monthly R. The results suggest that above the central portion of a city, concentration due to primary PM 2.5 emission is contributed mostly by residential areas (35.0 ± 11.9%), brick kilns (11.7 ± 5.2%) and industries (4.2 ± 2.8%). The model performed moderately for most cities (median correlation for out of time validation was 0.52), especially when assumed changes in seasonal emissions for each source reflected actual seasonal changes in emissions. The results suggest the need for policies focusing on emissions from residential regions and brick kilns. The relative order of the contributions estimated by this study is consistent with other recent studies and a contribution of up to 42.8 ± 14.1% is attributed to the formation of secondary aerosol, long-range transport and unaccounted sources in surrounding regions. The strength of this approach is to be able to estimate the contribution of urban growth to primary aerosols statistically with a relatively low computation cost compared to the more accurate but computationally expensive chemical transport based models. This remote sensing based approach is especially useful in locations without emission inventory.


Background
Researches have shown that over a long-term time scale, pollutant concentrations simulated from dispersion models are correlated with satellite-sensor retrieved columnar depths. Such correlations have been found between modeled PM 2.5 and satellite-sensor retrieved aerosol optical depth (AOD) [1] as well as between modeled SO 2 ground emissions and satellite vertical column density Itahashi et al. [2]. However this proportionality may not hold true if emission inventory is out-of-date or unrepresentative of ground sources. An example of this was provided by Akimoto et al. [3] who observed increasing tropospheric NO 2 from satellite but decreasing emissions from coal consumption inventory estimates. Similarly de Meij et al. [4] found high spatial agreement of aerosol precursor gases from European inventories (EMEP and AEROCOM) with MODIS sensor based AOD retrievals but also found that large differences in simulated surface concentrations are not reflected well within AOD retrievals. de Meij et al. [4] suggested that highly temporal emissions do not strongly impact aerosol concentrations and they concluded that seasonal temporal variation is more important than daily or weekly temporal distribution. Therefore in developing countries which are undergoing rapid economic development and urban growth the emission inventories may be out of date and pollutant concentration derived solely from emission inventories using chemical transport modeling can be unreliable. It is necessary to thus explore the use of alternative schemes that can consider changes in land-use growth for modeling the concentrations.
Statistical approaches like land use regression (LUR) models present an attractive alternative to the emission based chemical transport models, as they have low computation costs [5] and employ explanatory variables constructed from easily obtainable data [6]. They were first developed in late 1990s's [7,8] mainly for mapping traffic related pollution like NO 2 . Since then LUR along with geographic weighted regression (GWR) models have been used as a computationally cost-effective method for mapping primary and secondary formed pollutants to explain spatio-temporal variation in the concentrations due to geographic factors. Recently interest is surging to develop LUR models for PM 2.5 [9][10][11][12][13] due to the cognizance of its effect on human health [14] and its rising concentration across developing countries [15]. Van Donkelaar et al. [16] showed that using land-use in GWR to observed and simulated AOD and ambient PM 2.5 concentrations results in significant PM 2.5 concentration prediction in locations lacking ground observations. Land-use changes or urban growth itself could be responsible for rising concentration. For example, Zhou et al. [12] undertook observations of 190 Chinese cities in 2014 and found that socio-economic and land-use variables, such as population density, industrial structure and road density increase PM 2.5 concentration level while per capita GDP (gross domestic product) improves air quality. Similarly Lin et al. [10] found PM 2.5 concentration was driven by population, local economic growth and urban area growth. Some researches have also described contribution of the land-use predictor variables in explaining the pollutant concentration [17,18]. These relative contributions can be useful for designing evidence based policies where land-use emission inventories or emission factors have not been established. Currently Indian cities are experiencing degraded air quality due to intense economic development and in the future, air quality management will become more challenging. Yet, literature is scarce about utilizing statistical approaches in India and only recently LUR [19,20] or multiple linear regression [21] have been attempted. This is probably due to limited availability of land-use data for Indian cities, similar to other developing countries. Thus, there is a need to investigate whether areal growth in remote sensing derived land-use classes could explain rising PM 2.5 concentrations. This would also be useful for identifying which land-use type contributes the most to urban PM 2.5 concentrations.

Objective
The goal of this research is to estimate the impact on long-term (2001 to 2015) PM 2.5 concentrations due to urban growth in 15 Indian cities with LUR model using remote sensing and meteorological datasets. The specific objective is to calculate relative contribution of the land-use and vehicular sources to urban fine aerosol concentration and compare the results with non-remote sensing based estimations. The originality of our LUR based model is the adaption of IPCC emission framework which allows explicit accounting of policy changes in the form of urban expansion and technological advancement as well as seasonal variation in emissions. Furthermore, we have used spatially distributed land-use types derived from publicly available satellite datasets which can be updated rapidly with urban growth compared to emission inventory.

Methodology
The flowchart of dataset and methodology employed for estimating emission parameters is shown in Figure 1. Within India, 9 cities with the highest PM 2.5 World Health Organization [22] concentration and the 6 megacities were chosen for this study. They are collectively referred to as 'Tier-2' and 'Tier-1' city respectively,. Tier-1 cities, also known as megacities, have higher per-capita gross domestic product (GDP) and population compared to Tier-2 cities. Correspondingly, Tier-1 cities tend to have more manufacturing industries compared to Tier-2 cities. All the 15 cities are shown geographically in Figure 2. Average population of Tier-1 and Tier-2 cities in 2010 was 13.3 million and 2.3 million respectively while the average per-capita GDP was 1700 USD and 1100 USD respectively.

Conceptual Framework
Kaya identity [23] states: Emission = energy consumption × emission f actor. To estimate anthropogenic emission of pollutants from emission sectors, emission inventories use its modified form [24], where emission factor is defined as emission per unit consumption of energy. Total emissions, E, from emitting source sectors sc and vehicles V is shown in Equation (1) [25,26].
where, C is a sector's energy consumption, EF the unabated emission factor and e f f the fuel efficiency. For vehicle of type V with a count of A V , VKT is its monthly distance traveled in kilometers. Several inventories exist with regards to black carbon, SO 2 and PM 2.5 anthropogenic emissions over Asian countries. The most spatially comprehensive inventory is Regional Emission inventory in ASia (REASv2) [26]. Since many countries do not have updated emission inventories, the emission factors are 'borrowed' from some other country and the energy consumption is estimated statistically. However the borrowed emission factors may not represent the emission sources or source emission strengths that are unique to that country. Within Asia, country level emission factors (EF) are mostly available for Japan, China, Taiwan, South Korea and India [25,26]. In India, several inventories have been prepared at country level [27,28] or for specific districts [29][30][31][32] but only a few have considered primary PM 2.5 emissions. Furthermore, emissions are dependent on seasonal activity of the emission sources and only few inventories [26,33] have explicitly considered this, for example, brick kilns are active only during the dry season from October to June [34]; crop residue burning is prominent during the two harvesting months of Rabi crop (April-May) and Kharif crop (October-November) [35]. In addition, emissions are affected not only by scale of production but also the production technology as it acts along with population and affluence in a multiplicative manner [36]. Although measuring or quantifying the state of technology is itself major research challenge [37], decreasing CO 2 emission intensity can be an indicator of overall status of technological evolution [38]. The World Resource Institute also suggests that pollution intensity of production (defined as ratio of emission and gross national product) can be a possible a technological indicator.
Based on this, carbon emission intensity was used to depict the state of technology in this research. Sector sc was segregated into land-use type LU. We modified Equation (1) to represent emission from any LU in a month mon and year yr as the product of total area under a LU type, its mean monthly per unit area emission in the base year base, technological evolution in yr with respect to the base year base and a coefficient to account for seasonal variation in emissions based on mon. Base year, base, is the reference year from which technological evolution in a year yr can be compared. The key emission sources considered included land-use LU (residential -R, commercial -C, industrial -I, brick kiln -BK, agricultural crop fire -agro) and vehicles V. Total emissions E from the key sources in a month, mon, of year, yr, was formulated as Equation (2).
In Equation (2), A is the number of units of each emission source such that it refers to the total area (in m 2 ) in the case of area sources (e.g., residential, commercial and industries) or total count in the case of point sources (e.g., brick kilns, crop-fires and vehicles). EC is emission coefficient of each emission source type in the base year and can be interpreted as corresponding to mean monthly emission of the emission source. Yearly technological evolution in EC is represented by EI, ratio of annual emission intensity in year yr to that of the base year. SEA mon is a coefficient varying between 0 to 1 to account for relative monthly variation in emissions from LU or V. The seasonal emission activity SEA takes on a value between 0 and 1 for each month. It is assumed to be 1 during the month relative monthly emission activity is maximum and SEA is assumed to be 0 when the emission source is not emitting at all. Thus, emission from each unit of LU type in any month mon of the year yr is a product of its emission coefficient EC in the base year scaled by technological evolution EI in year yr and corrected by seasonal emission activity SEA mon .

Fine Aerosol Concentration Indicator
In this paper air quality is meant to refer PM 2.5 concentrations because cities in India, like many other developing countries, have much higher than acceptable level for PM 2.5 than other pollutants commonly used for judging air quality (e.g., SO 2 , NO 2 , CO and O 3 ). PM 2.5 refers to particles with aerodynamic diameter less than 2.5 µm and is found to be fine mode atmospheric aerosols. PM 2.5 particles are produced by primary emissions or through gas-to-particle conversions. A high PM 2.5 mass concentration changes aerosol parameters causing high aerosol optical depth (AOD) and angstrom exponent (>0.8) values. Due to lack of dense spatio-temporal ground based PM 2.5 measurements, satellite retrieved AOD and angstrom exponent has been used to estimate PM 2.5 [1,39]. The major disadvantage with satellite based retrievals is that they represent columnar values and require other ancillary meteorological information on boundary layer height and relative humidity for estimating surface PM 2.5 . Under suitable conditions, such as a well-mixed boundary layer and low ambient relative humidity, correlation between daily PM 2.5 mass concentrations and satellite AOD can be as high as 0.9 [40].
Gridded AOD and angstrom exponent values at 10 km spatial resolution are available from 'MOD04L2 Collection6' product of MODIS (Moderate Resolution Imaging Spectroradiometer) sensor onboard NASA's Earth Observing System (EOS) Terra satellite. In an earlier paper, Misra et al. [15] developed a scheme (called 'AirRGB') to mathematically decompose AOD and angstrom exponent values into three unitless components for characterizing urban air quality. The decomposition was based on thresholds determined from mean annual AOD and angstrom exponent values from 60 cities around the world. One of the AirRGB components-called R-corresponds to high PM 2.5 concentrations on a scale of 0 to 100. R takes on a value of 100 when a high AOD (0.6) and a high angstrom exponent (0.8) value is retrieved, and R is 0 when either or both AOD and the angstrom exponent are low (0.05). R has been demonstrated to correspond to ambient PM 2.5 concentrations with a high coefficient of agreement (0.97) [15] and was used in this research as an indicator for PM 2.5 concentration. For reference, a binned PM 2.5 concentration between 0 to 25 µg/m 3 corresponds to a mean R of 13.12 and PM 2.5 between 176 to 200 µg/m 3 corresponds to a mean R of 95.47 Misra et al. [15]. The uncertainty in R is estimated as ±15.98 owing to the uncertainty in MODIS based AOD and angstrom exponent retrievals [41].
A similar decomposition scheme was considered in this research to obtain continuous spatio-temporal representation of ambient PM 2.5 concentrations. R was obtained by applying AirRGB scheme to daily MOD04L2 imagery from 2001 to 2015 and their mean monthly composites were used. Indian cities often have a higher AOD and angstrom exponent than what was considered while deriving the original AirRGB thresholds [15]. So, to prevent saturation of R at 100, the original AirRGB threshold for AOD and angstrom exponent was revised from 0.6 and 0.8 to 1.5 and 1.0, respectively. Several daily satellite retrievals were missing due to cloud coverage during the monsoon months (June, July and August). Techniques like spatially sliding windows [42], n-day composites on a rolling basis [43] or pixel coverage threshold monthly mean [44] have been used previously to interpolate missing values. However the final values can vary by as much as 30% and neither technique has been judged to be superior [44]. Furthermore, on monthly and yearly scales, bias towards clear-sky values has not been found to be significant [45]. In this research, missing R values were interpolated temporally by considering rolling mean of 15-day R values. Further discussion on the effect of missing values on the AOD mean is provided in Section 1.1 and Figure S1 in Supplementary Materials. Furthermore, AOD increases when relative humidity increases, due to aerosol hygroscopic growth [46]. Therefore, R was also corrected for hygroscopic growth using the approach suggested by Chin et al. [47], as shown in Equation (3). Where, R obs refers to the R corrected for relative humidity, and β(rh) is the mass extinction coefficient that characterizes hygroscopic aerosol at different relative humidity (rh) values [47] for different aerosol types, such as, sulfate sea-salt, black carbon and organic carbon. Water-soluble aerosols have the highest mass concentration in northern India [48] with sulfates being the most common water soluble aerosol [49]. Consequently, the β(rh) of sulfate aerosol, [47], was applied for hygroscopic correction. The R obs trend over the central pixel of study locations is shown in Figure 3 where an increasing trend can be seen from 2001.

Urban Land-Use and Expansion
Residential, commercial and industrial regions are the key area emission sources in cities. Emission may take place from municipal waste-combustion, residential and commercial cooking, use of diesel based electricity generators, and others [30,50]. Like many developing countries, land-use data is not publicly available for most cities in India. Sritarapipat and Takeuchi [51] proposed the technique of identifying spatial land-use morphology in terms of residential, commercial and industrial areas by using remote sensing derived building heights and nighttime light. This approach was followed to obtain the spatial distribution of emission sources. Open digital surface models at 30 m resolution, AW3D30 [52] and ASTER GDEMv2 [53] were used for deriving the height of built structures [54] for the years 2011 and 2001, respectively. Upsampled nighttime day-night band (DNB) of VIIRS dataset was used to obtain the nighttime light. As AW3D30 and ASTER GDEMv2 are the only two DSM datasets that are publicly available at 30 m resolution currently, urban land-use were prepared only for these snapshot times. Due to this spatial distribution of residential, commercial and industrial could only be identified for 2001 and 2011. The minimum user accuracy of residential and commercial area in these cities being more than 64% and minimum user accuracy for industrial being 72%. To overcome the limited availability of land-use data in studying their long-term contributions, population and per-capita GDP values for each city was used for interpolation to obtain total areas of residential (A R ), commercial (A C ) and industrial (A I ) structures from 2001 to 2015 [55]. This was based on the finding that urban expansion is governed by socio-economic development in the form of population and per capita GDP [56]. District level population and capita GDP values were obtained from statistical data sources [57,58].

Agricultural Fires
The burning of crop residue is a seasonal activity, the emitted particles from which can travel great distances and affect several cities after the harvest of rice or wheat crops in India. Daily fire count from the MODIS thermal anomalies product (MOD14) [59] aboard NASA's Terra satellite was used for indicating crop residue burning [60]. It is available at the daily level with a 1 km spatial resolution since 2001. MOD14 is known to underestimate fire counts in October and November over northern India [35]. The daily fire counts within 300 km of a North Indian city and a South Indian city are compared in Figure 4. In northern India, bimodal distribution peaking during April to May and October to November can be seen clearly owing to wheat-rice double cropping systems. The city with the most fire counts was Ludhiana. Other northern Indian cities like Kanpur or Jaipur had around a tenth of fire counts compared to Ludhiana. On an average cities in northern India also showed the highest incidences of fire occurrence (3 to 20 times compared to those in West or South). Unlike northern Indian cities, peaks are unimodal in southern Indian cities and occur around March-April. In this research, the monthly mean of daily fire counts within 500 km of each study site was considered for this purpose and designated as A agro .

Brick Kiln
Brick kilns are construction-brick making factories that emit high amounts of black carbon, PM 2.5 and SO 2 while they operate during the dry season [34]. Official estimate of the number of brick kilns in India is unavailable and they are assumed to be more than 100,000 [34]. Brick kilns are often found around the periphery of urbanizing cities and can be spotted in high resolution imagery due to their distinct appearance [61]. They were visually identified around the study sites using Google Earth imagery. Approximately 3000 brick kilns were located beyond 40 km and within 70 km radius of New Delhi, mostly along the riverbed. For other cities, the number of brick kilns was comparatively lower but they were also found beyond 20 km and within 60 km radius from the city center. Based on visual interpretation approximate number of the brick kilns, A BK , was assigned as (a) Low: 50 brick kilns, (b) Mid: 50-200 brick kilns and (c) High: 400 brick kilns.

Vehicle Kilometers Traveled (VKT)
Both vehicle population and vehicle kilometers traveled (VKT) affect total vehicular emissions, as formulated in several emission inventories [26,62] and as shown in Equation (1). Vehicle population is available annually for each major city in India from Ministry of Road Transport and Highways [63]. However annual VKT with a consistent calculation approach is available only for Bangalore, New Delhi and Kolkata as 8634 km, 9594 km and 7230 km respectively [64]. VKT for other cities was calculated by following the argument that VKT is influenced by urban sprawl [65,66]. Urban form can be characterized by: normalized built-up density(nBD), normalized landscape shape index(nLSI), normalized largest patch index (nLPI), normalized patch density (nPD), and normalized edge density (nED) [67]. These shape metrics were calculated for the built-up land-cover class by classifying Landsat 8 imagery for the year 2013 for each location. Thereafter correlation of the reported VKT was computed with each urban form metric as shown in Figure 5. Based on the strong inverse correlation between VKT and nLPI (R 2 = 0.95) and nBD (R 2 = 0.88), an empirical equation (Equation (4)) was formulated to estimate VKT for any other city s by using the built-up density of Bangalore city as reference. Thereafter VKT was allocated proportionately to grid level, VKT k , on the basis of length of major road section lying within each cell, k, of the grid. The grid size was chosen as 10 km × 10 km to keep it consistent with grid of R. Openly available road network shapefile [68] was used for this purpose.

Emission Intensity
For India's yearly carbon emission intensity, EI yr statistics are available at a national level for manufacturing, construction, waste and transport sectors from the World Resource Institute through its 'Climate Data Explorer' platform. Technological evolution in year yr was depicted with respect to the base year as 2001, EI(yr) was formulated as a ratio shown in Equation (5). As mentioned earlier, 2001 was considered as the base year and likewise EI(yr) was calculated with respected to the EI base .
Summary of the emission sources used in this research is shown in Table 1.

Seasonal Emission Activity
It is well known that monthly emissions depend on the source seasonal activity [26,32,33], for example, vehicles undergo cold start emission during winters, diesel-based electricity generators are used to offset power-cuts by residences and commercial entities like offices or hospitals during summers, brick kilns operate only during dry season, residential areas burn wood for heating during winter season, and so forth. While crop residue burning activity can be inferred directly from satellite based fire count products, emission activity for other sources is based on consumption based statistics. So far, only Kurokawa et al. [26] and Paliwal et al. [33] have shared the monthly variation of emission activity for India. Kurokawa et al. [26] shared the logical basis of using monthly electricity and energy consumption to characterize seasonal activity for domestic and road emissions over India. Same monthly activity variation was ascribed throughout India in the REASv2, which may not be true owing to diverse climates within the country. SEA for the colder months such as, November, December and January is designated as 1 to indicate that monthly emission activity is maximum during these months. Owing to a lack of seasonal emission activity data for each source, we derived these statistically for summer and rain seasons. They were denoted as s, r respectively as shown in Table 2. The SEA mon values used for LU and V are shown in Table 2. For industry, the SEA A I ,mon was assigned 1 for all the months to indicate no relative change in emissions throughout the year. For brick kilns, the SEA A BK ,mon was 1.0 for dry months (November to June) to indicate the period when they are active and 0.0 during wet months. For crop residue burning, the A agro itself varies to indicate monthly fire counts, hence SEA A agro ,mon is set as 1.0 which assumes that strength of crop fire in similar across all months. However, the variation in emissions from residential and commercial areas is not known. So combinations of s and r were grid searched from (0.1, 0.1) to (0.9, 0.9) with step interval 0.1, and each combination was tried for each location in the model (Section 2.1). That particular combination of s and r was selected as SEA mon which resulted in a high Pearson's correlation between R est and R obs . Table 2. Assumed monthly distribution of seasonal activity parameters (SEA mon ). s, r represent values of seasonal activity for summer and rain season respectively. For any source SEA mon represents ratio of emissions in any month mon compared to its maximum monthly emission.
Gridded monthly averaged surface meteorological data for zonal and meridional velocity and relative humidity (rh) was obtained from the 'NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Monthly Averages' [69] data set. Planetary boundary layer height (pblh) is available from 'NOAA-CIRES 20th Century Reanalysis version 2c Monthly Averages' [70]. These datasets are hosted at NOAA Earth System Research Laboratory [71]. Wind speed (wnd) was derived from the zonal and meridional velocity. The pblh is available only until 2014, due to which the monthly pblh values of 2014 were ascribed for 2015 as well.

Land Use Regression (LUR) Model
Based on the literature discussed earlier, long-term bottom-up PM 2.5 emissions trend corresponds to satellite derived PM 2.5 concentration. This relation was used to statistically link emissions directly to concentration trends. For example, Upadhyay et al. [21] modeled future anthropogenic PM 2.5 concentration from bottom-up emissions and meteorological data was used by to using a statistical multi-linear regression model. This was exploited to adapt the emission framework based Equation (2) to a concentration framework using a land-use regression model (LUR). The satellite-derived R obs corresponds to the concentration from primary emission, R em and the concentration due to secondary PM 2.5 formation and dispersion, EA O . This is shown in Equation (6).
This was implemented over the gridded land-use datasets, where R em k over any grid cell k is defined as the concentration due to primary emission E (E is defined earlier in Equation (2)) from each emitting source j inversely weighed by the distance between the source and k (d k,j ) as shown in Equation (7). The rate of decrease is specified by an exponential power of the distance. Most commonly used exponent is 2 [72][73][74][75]. de Mesnard [76] mathematically derived that the exponent should be chosen based on the atmospheric condition; it should be close to three for unstable atmospheric conditions, approximately two for moderately unstable conditions and between one and two for neutral conditions. As per weather records [69], average wind speed in most Indian cities is between 2 m/s to 4 m/s, qualifying for slightly unstable atmospheric conditions. Considering these, an exponential power of 2 was chosen as inverse distance weight. As Equation (7) is based on concentration, EC is subsequently interpreted as corresponding to concentration due to mean monthly primary emission.
EA O is defined as the concentration on account of secondary PM 2.5 formation and dispersion. EA O consists of the model constant (EC O ) and independent meteorological variables. This is shown in Equation (8). Since secondary PM 2.5 formation is regulated in part by the meteorology, inclusion of significant meteorological variables such as wind speed (wnd), relative humidity (rh) and planetary boundary layer height (pblh) can overcome this disadvantage to some extent [21]. Due to the complicated pathways involved in secondary PM 2.5 formation from the interaction of precursor gases and synoptic conditions, it cannot be modeled explicitly by this approach.
EA O k (mon, yr) = EC O k + β wnd .wnd k (mon, yr) + β rh .rh k (mon, yr) + β pblh .pblh k (mon, yr) In the LUR, R obs is the dependent variable which was regressed against the varying concentration due to primary emission from the key sources (A R , A C , A I , A agro , A B K, A V ) and influence of meteorological variables (wnd, rh, pblh) to estimate the coefficients of concentration due to primary emission (EC R , EC C , EC I , EC agrp , EC BK and EC V ) and coefficients for meteorological variables (EC O , β wnd , β rh and β pblh ) to predict R est , as shown in Equations (6)- (8). We considered seasonal and spatial variation and increase in emission sources from 2001 to 2011. In this research since AirRGB R is available from 2001, it was taken as base year.
Since the built-up area of Tier-1 is much larger than Tier-2 cities, respective domain area was chosen as 30 km × 30 km and 10 km × 10 km. Here domain area refers to the area within which emission sources were considered.

Hierarchical Bayesian Framework for LUR
The emission characteristics may not be same for all the cities due to diverse socio-economic situation, for example, Tier-1 cities like Chennai and Mumbai have a large presence of heavy manufacturing industries around them compared to Tier-2 cities [49]; vehicle emission standards are stricter in Tier-1 than Tier-2 Indian cities [77]. These individual city-level characteristics induce random effects in the LUR model. A better approach would be to pool information from specific locations towards the general population through a hierarchy of the city and the Tier to which it belongs [6]. In presence of hierarchical fixed and random effects of temporal or spatial nature, Bayesian hierarchical model have been used to assess impact of human activities on the concentration of PM 2.5 [6,[78][79][80]. Bayesian models use prior knowledge of parameter distribution along with the likelihood function to calculate posterior distribution of the parameters. A hierarchical model is one in which probability of one parameter can be thought to be dependent on another through a hierarchy and models are suitable for data with multiple levels and describing data from individuals within groups [81]. Such an approach allows estimation of individual parameter probabilities for the smallest analysis unit which is informed by all the other individuals via the estimate of the group-level distribution while the group-level parameters are estimated by joint constrained individual-level parameters [81]. Moreover due to the data limitation of spatial LU in only 2001 and 2011, Bayesian models are well suited for this task as they prevents overfitting and provide unbiased estimates even for small sample sizes [81].
The Equation (6) can then be described in the two stages: local R concentrations are conditional on the distribution of fixed effects in its background concentrations (which includes meteorological effects) as well as local emission from LU classes and vehicles. The random effect is denoted by whether a city c belongs to Tier-1 or Tier-2. In the second stage random effects of a Tier type, m, are modeled as a Gaussian processes with specific mean emission parameter and covariance. In the last stage depending on the chosen hyperparameters, the conditional distribution of covariance function is calculated. The hierarchical approach is helpful to deal with random effects or ambiguous variations arising from the Tier type, m, of a city c. The full hierarchical model to specify EC for each LU type and other emission sources concentration conditional on city c and Tier type m is given as Equations (9) and (10). Parameters for each city are drawn from normal distribution of Tier-level parameter (mean EC LU m , EC V m , β wnd m , β rh m , β pblh m , standard deviation σ LU m , σ V m , σ wnd m , σ rh m , σ pblh m ) which is estimated from the global population normal distribution (mean EC LU , EC V , β wnd , β rh , β pblh ,), as shown in Equation (10).
EC LU m,c , EC V m,c , EC LU m and EC V m were constrained to be positive to ensure valid concentration due to their emissions. β rh m,s and β rh,m was constrained to be positive as concentration increase when relative humidity increases due to formation of secondary particles. β wnd m,s , β pblh m,s , β wnd,m and β pblh,m was constrained to be negative as concentrations decrease with increased wind speed or a well-mixed boundary layer height due to dispersion effects. The model was trained on each city's data of 2001 and 2011 only due to limitation of land-use data availability. This was implemented model using Hamiltonian Monte Carlo algorithm to perform Bayesian inference in Stan software in R language. We ran 6000 iterations with a burn-in of 1000 iterations and 2 chains were set.

Seasonal Emission Activity Parameter
Most northern Indian cities (8 out of 9, except Jaipur) showed similar Pearson's correlation values with respect to combination of s and r. Correlation between R obs and R est increased as (s, r) values increased from (0.2, 0.2 ) to (0.7, 0.5). Another combination of (s, r) values, around (0.1, 0.1), also showed higher correlation between R obs and R est but this was discarded on account of overfitting and insignificant model parameters (<90% confidence interval). On the basis of similar climate characteristics the same value of (s, r), (0.6, 0.4) was used for all the cities in North India. In cities where summer and winter temperatures differ by less than <10 • C, such as Chennai, Bangalore, Kolkata, Hyderabad and Mumbai, higher correlation coefficients between R obs and R est was obtained when the summer value s tended towards 1. This may indicate that summer and winter emissions are similar in location where strong temperature seasonality does not exist. For rainy season SEA value r, the optimum values varied with the city's location. Summary of the (s, r) values chosen for each city is shown in Table 3. Another set of (s, r) was also investigated which was similar to those used in REASv2 emission inventory over India. The results obtained with those parameters are discussed in Tables S1 and S2 in Supplementary Materials.

Model Parameters
Emission coefficient parameter EC for each LU type as well the parameters for meteorological variables at city level and Tier level are shown in Figure 6. It is seen that model constant, EC O in Tier-2 cities (19.15 ± 3.92) is about 1.4 times that of Tier-1 cities (14.90 ± 4.79). Among residential, commercial and industrial regions, potency of concentrations due to emissions is highest for residential areas followed by industrial and commercial areas. EC R is higher for Tier-1 (0.001 ± 0.0004) than Tier-2 (0.0007 ± 0.0003), although when EC R of Jaipur and Ahmedabad is neglected, mean EC R of Tier-2 is higher than Tier-1. For some Tier-1 cities, like Hyderabad and Kolkata, EC I is higher than other cities suggesting industries in this locations are more polluting in nature compared to other Tier-1 cities. Contribution of a unit residential area (0.0006 ± 0.0002) is almost 1.5 times that of a unit industrial area (0.0004 ± 0.0003) in Tier-2 cities. Generally EC I had a higher coefficient of variation (ratio of mean and standard deviation) than EC R suggesting higher variation in industrial emitting sources. EC BK has the highest value among all the emission coefficient. EC BK is higher for Tier-1 (0.86 ± 0.31) than Tier-2 (0.53 ± 0.25). Out of the cities which have more brick kilns, some had much lower EC BK m,s , for example New Delhi. This could imply that the effect of an individual brick kiln on the urban R values is not significant in these cities compared to other emission sources. EC agro is higher in most cities compared to those cities where the problem of residue burning is not pervasive such as Hyderabad and Kolkata. However low EC agro for Ludhiana, which has the most crop residue fires is surprising and may imply that wind direction affects the impact of crop residue fires on the urban concentration. For a vehicle, its EC V is generally similar across the cities but overall it is higher in Tier-2 (8.47 ×10 −9 ± 4.09 × 10 −9 ) than Tier-1 (5.71 × 10 −9 ± 4.21 × 10 −9 ). This points the higher emission per vehicle in Tier-2 cities possibly as a result of vehicle age or emission regulations. Regarding coefficients for meteorological variables, the coefficient of planetary boundary layer height β pblh is negative and similar across the cities (−0.0008 ± 0.0005). The coefficient of wind, β wnd varies across the cities. It has a large absolute value in cities located in north-central Indo-Gangetic plain and those near the sea. β wnd is close to zero for New Delhi which implies a limited ability of the wind to flush out pollutants and lower the concentration. Given the skewness of β wnd towards zero for New Delhi, the wind maybe transporting PM 2.5 due to emission from surrounding regions rather than help the city in lowering the concentrations. This is especially true during the crop residue burning months when winds assist long-range transport of aerosols from burning fields in the north-western India to New Delhi [82]. Coefficient for relative humidity, β rh is higher (0.31) in cities like Bangalore, Agra and Ludhiana which implies that aerosol composition has more hygroscopic aerosol in these cities than other cities (0.16)

Long Term R Prediction and Out-of-Time Validation
The observed R obs and prediction of R est from the model trained on the 2001 and 2011 data is shown in Figure 7. The increasing trend in R obs is well predicted by R est and is on account of urban growth in residential, commercial, industrial ares and vehicle sectors. For some cases, the seasonality is also well captured in R est predictions. As the LUR model was trained only on 2001 and 2011 dataset, the confidence interval is large for those years whose A LU is quite different from that used for training. The overall Pearson's correlations between out-of-time (excluding 2001 and 2011 dataset) R obs and R est is presented in Table 4, where the significance level was Bonferroni corrected to overcome multiple comparison problem. Jaipur and Ahmedabad have low and non-significant correlation. This could be a result of inappropriate SEA which could not capture seasonal variation in emission behavior or the effect of meteorological dispersion is much stronger than local primary emissions. The rationale for suggesting this is that both Jaipur and Ahmedabad had quite different SEA from the cities closest to them, New Delhi and Mumbai respectively. Both Jaipur and Ahmedabad are also on the edge of an arid desert region, where fine crustal particles may also play some role. However this needs further investigation. For other cities the correlation ranges from 0.28 to 0.78 (median correlation is 0.52). The correlation is high (>0.5) for cities already known to have high R obs , e.g., Patna, Kolkata, New Delhi, Lucknow and Kanpur.
Accuracy of the monthly predictions R est was further explored by finding its root mean squared error (RMSE) with R obs , as shown in Figure 8. The errors were not normally distributed throughout the months and were biased towards higher RMSE during the monsoon months of July, August and September for cities in northern India. RMSE in these cities during the monsoon months ranges from 12.9 to 23.9. During monsoon months R values are low not because the emissions are significantly low during this months, but rather there is a strong meteorological influence in the form of wet scavenging due to rain and wind. Also, due to cloudy sky very few retrievals can be made further reducing the validity of monthly mean of daily R values. There are also moderate RMSE values (mean 11.95) for the cooler months of December, January and February in many cities like New Delhi, Kanpur and Delhi. Due to cold weather and sunshine there is an enhanced formation of secondary particles that remain trapped as haze due to shallow boundary layer height. Lowest RMSE is obtained for the month of November (8.45) and April (9.69) suggesting the LUR predictions are most reliable in these months. Furthermore, for cities in Indo-Gangetic plains there is an underestimation in R est in October and December and overestimation in April and May. This is caused by the underestimation of fire counts in October by the MOD14 dataset and an increased secondary PM 2.5 formation in December due to shallow boundary layer and stagnant winds.   Figure 7. Increase in R est m,s due to urban growth is compared with R obs over the selected cities in (a-n). Months of July, August and September usually have missing observations due to cloud-cover.

Source-Wise Relative Contribution
Relative contribution of LU and other sources on the central 10 km×10 km grid cell of cities and its seasonal variation is shown in Figure 9a and Table 5. Since the RMSE between R est and R obs was lowest for November (discussed in previous section), the relative contributions are discussed based on the result for this month. Residential areas have the highest contribution due to primary emission than other LU classes. Contribution of residential areas is higher in Tier-2 cities (39.6 ± 10.7%) than Tier-1 cities (28.0 ± 13.8%). This could be due poor enforcement of regulations regarding waste combustion, fine-dust from construction and road as well as burning biomass for fuel. If brick kilns are present, then they have the highest contribution (11.7 ± 5.2%) after residential areas. Their contribution is higher in Tier-1 cities as those cities are rapidly undergoing urban expansion and require construction materials. Contribution of commercial (7.3 ± 5.1%) and industrial (9.0 ± 5.5%) areas in Tier-1 cities is higher than the contribution of commercial (0.9 ± 0.7%) and industrial (3.6 ± 2.4%) areas in Tier-2 cities. Underestimation of the total areas under industries could have led to lower contribution of industrial sources than what is reported elsewhere. Contribution of vehicles to concentration is also quite low and observed only for the Tier1 cities. Hyderabad has a higher vehicle contribution, however this is on account of the very high vehicle population. It is also possible that spatial variation in concentration due to emission from industrial point sources and road network is not captured distinctly within the 10 km resolution of R pixels. Availability of retrievals at higher resolution from satellite such as Sentinel-5P [83] may overcome this limitation. Crop fires contribute negligibly to R est in most cities except New Delhi (9.4 ± 2.2%) and Ludhiana (4.4 ± 2.3%) which are in close proximity to such farm fields. This post-monsoon agricultural fire contribution is likely an underestimation due to the known issue of low fire counts reported in MOD14 product during post-monsoon season [60].
A significant portion of relative contributions is accorded to other sources and secondary aerosol formation as seen by a large estimation of EA O in Tier-1 cities (38.8 ± 16.5%) as well as Tier-2 cities (45.5 ± 12.5%). High contribution EA O to R could be on account of secondary PM 2.5 formation which is a result of aerosol chemistry and meteorology. During high pollution episodes, the relative contribution of secondary PM 2.5 is high [84,85]. Stagnant meteorology, for example, low wind speed, high humidity and shallow boundary layer, also enhance secondary PM 2.5 formation [84]. This includes complicated transformation pathways like oxidation of precursor gases like SO 2 , NO x and NH 3 to aerosols, that cannot be resolved with our data-based approach which relies on statistically linking emissions with concentrations. Some cities also have a high contribution of EA O due to proximity to sources like power and steel plants as well coal mines. Using bottom-up emission inventories, Guttikunda et al. [86] recently concluded that the meteorological transport of pollutants from surrounding peri-urban and rural areas in many cities makes it difficult to segregate contributions based on the to urban PM 2.5 alone. They also found that outside contribution is higher in Tier-2 cities and ranges between 30% to 40%. In Figure 9, overall emissions from residential areas contributed the most to R in urban areas (35.0 ± 11.9%), followed by brick kilns (11.7 ± 5.2%) and industries (4.2 ± 2.8%) over central part. The contribution of other sources is around 41.6% in Tier-1 and 45.5% in Tier-2 cities.
For monthly changes in relative contributions, an example for New Delhi for the year 2015 is presented in Figure 9b. Monthly changes in relative contributions are regulated by the SEA parameter, agricultural fire counts and meteorology. It is seen that contribution of agricultural fire is highest during May, October and November. Checking relative contributions for the month of November across the years shows that contribution of agricultural fires has increased from the 2001-2005 period (2.5%) to the 2010-2015 period (5.2%). This is due to recent groundwater related policy that suggests delay in sowing of the crop [87]. It is also seen that contribution of residential areas has increased from 2001 (29.7 ± 8.0%) to 2015 (40.0 ± 10.9%). Since we assumed a constant count of brick kilns for all the years it appears that the contribution of brick kilns is decreasing from 19.0% to 13.1% which may be incorrect. Nonetheless it does suggest that brick kiln as a significant contributor to R est values in New Delhi. Based on the decrease in contribution of EA O from 2001 to 2015, it can be said that contribution can be explained more by the sources within the city in recent years. Table 5. Comparison of relative contribution to R est over the central grid cell from concentration due to primary emission from LU classes and other (EA O ) is presented for for all the cities for the year 2015 and month November. The contribution is represented by residential (A R ), commercial (A C ), industrial (A I ), agriculture related fires (A agro ), brick kilns (A BK ), vehicles (A V ). Contribution to R est due to formation of secondary particles, meteorological transport and unaccounted sources is collectively labeled as 'others'.

City
A

Comparison with Other Studies
The relative contribution results from our LUR based approach were also compared with those from some recent studies. There have been dispersion modeling (DM), source apportionment (SA), bottom-up emission inventory (EI), receptor modeling (RM) based studies in Indian cities to identify the contribution of the emitters to the urban PM 2.5 concentration. On an overall country-wide scale Venkataraman et al. [88] have stated that about 60% of India's mean population-weighted PM 2.5 concentrations come from anthropogenic source sectors, like residential biomass combustion, industrial coal combustion, while the remainder are from 'other' sources, like transportation, brick production, windblown dust and extra-regional sources. Secondary PM 2.5 is known to contribute upto 42 ± 10% in winter and 23 ± 6% in summer [89]. Along with fine dust, contribution of secondary particles can rise upto 52% in winter and 65% in summer [89]. Recent research by Guo et al. [90] over few Indian cities also found that residential sources are dominant contributor to primary particulate matter; contributing 67% and 44% in Lucknow and New Delhi respectively. In comparison, our estimate of residential contribution to concentration due to primary emissions is 78.9 ± 8.3% and 54.5 ± 10.3%. Resulting concentration from primary emissions was also compared with PM 2.5 emissions from REASv2 inventory for New Delhi. Among urban emitters, REASv2 distinguishes PM 2.5 emissions between domestic, industrial and vehicles. Therefore relative contribution of these REASv2 based emissions was compared with our model's residential, total industrial and brick kiln emissions and vehicle emissions. When trends were compared for any particular month across 8 years, the correlation was quite high (>0.9). An example for the month of January is shown in Figure 10. Summary of contribution from other studies is shown in Table 6. Several studies have reported construction dust separately from residential emissions. Since our model does not identify construction dust separately we combined this with residential when making comparisons. For New Delhi residential contributions others have estimated 34% [91], 46% [91], 27% [91] and 31% [32] which is similar to our estimate of 37.9 ± 10.3%. Our estimates of contribution of EA O is 30.5 ± 6.9 similar to 30% by ARAI and TERI [91]. Guttikunda and Calori [32] also found contribution of brick kilns to be 15% similar to 13.1 ± 4.5% by us. For agricultural fires the estimate of 4% [91] is near to 9.4 ± 2.2% by us. Our estimates for Chennai are close with both [92] and [86] when accounting for the uncertainties. Contribution of residential and others by Central Pollution Control Board [92] was 24% and 39% and by Guttikunda et al. [86] was 42% and 15% compared to 38.7 ± 17.0% and 38.6 ± 24.0% by our estimate in Chennai. For Kanpur, Guttikunda et al. [86] estimated contribution of residential areas and others as 42% and 22% similar to 43.3 ± 11.0% and 39.3 ± 11.1 by us. For Raipur, Guttikunda et al. [86] estimated contribution of residential areas and brick kilns as 18% and 3% compared to 22.5 ± 10.6 and 8.7 ± 5.5%. The definition of the emitting sources has varied from study to study when the researchers have created their own emission inventory or method and this study is no exception. This presents challenges in comparing our results to other studies using well-established methods. It appears our estimates of contribution are mostly consistent for residential areas and there is slight agreement for brick kilns and agricultural fire contributions.

Ref. City
Year

Limitations
The model relies on SEA for estimating seasonal variation in R em . As only that set of SEA were chosen that maximize correlation between R obs and R est , endogeneity may be induced in the model. However currently there is no approach to estimate SEA using open gridded datasets. Monthly changes in the nighttime light datasets could inform variation in intensity of human-level emission activities [95,96]. The relative contributions estimated (Section 3.3) by our approach correspond to contribution to concentration due to primary emission of PM 2.5 . Although this is a limitation during the months of high PM 2.5 concentration when the total concentration is being regulated by secondary PM 2.5 formation, framing control policies for primary emission is easier than secondary formation and control of primary emission may also offer co-benefits in reducing precursor gases [94].
The accuracy of the relative contributions calculated by our approach is limited by the statistical approach as well as datasets that were used. In the LUR model as precursor gas emissions were not explicitly considered, secondary PM 2.5 formation is not accounted for the model. So at high R values, our model's prediction accuracy diminishes. This is especially true for the month of December in northern Indian cities when R est predictions are lower than the R obs as the R obs is being contributed by secondary formation. Also a linear dependence is assumed between R and independent variables [5,21]. Yet complex chemical and physical interactions between the variables may take place that cannot be resolved by LUR, as others have pointed out. A common situation is that high summer atmospheric temperatures enhances formation secondary PM 2.5 while leading to an elevated boundary layer height that increases dispersion of PM 2.5 . Further research is needed over whether a combination of land use regression model and chemical transport model would work as well over the highly polluted Indian cities as the performance over West Europe [97] and United States [98]. Regarding the data that was used for training the LUR model, uncertainties are present in R as well as land-use area estimations. R has an uncertainty of about ±16, which is lower than the range of monthly mean values R but not reliable when investigating smaller R variations at daily scale or months with similar meteorological conditions. The impact of missing observations during the cloudy months lead to a high RMSE, thus affecting the prediction accuracy. Impact of the missing observations was further explored in Table S3 of Supplementary Materials and suggests that better interpolation techniques may be useful. At the same time, accuracy of land-use classifications obtained for residential, commercial and industrial areas is not consistent across the cities, varying between 64% and 95% and being low especially for identification for industrial areas. This could have led to the identification of relative contribution of industrial emission being lower compared to other researches. Due to availability of spatial training data for only 2001 and 2011, the models implicitly assumes linear urban growth.
However non-uniform annual urban land-use expansion could take place in the intervening years under influence of population growth and per capita GDP growth. This is briefly explored in Table S4 of Supplementary Materials and suggests there may be slight variation in the contributions. Furthermore, it was assumed that all members belonging to an emitting sector have similar emission characteristics. This may be true for residential and commercial areas to some extent but industrial emissions vary greatly among themselves. For example, certain industrial structures that are involved as warehousing do not cause significant emissions. Further research must be taken to categorically differentiate high polluting industries from the low polluting industries on the basis of chimney size [99] or other characteristics. We used approximate counts of brick kilns based on visual interpretation. Work is under progress to prepare a brick kiln location dataset that can be used in LUR and CTM models. Fire counts obtained from the MOD14 based product are also known to underestimate counts during the crop residue burning due to a combination of crop residue characteristics, burning time and fire intensity. Recently launched VIIRS based fire count has been shown to overcome this issue to a large extent and further research is needed to improve retrospective MOD14 fire counts by using VIIRS based product.

Policy Implications
Implications for policy can be derived based on results regarding relative contribution of each LU type in Tier-1 and Tier-2. In India, most discussion of air quality policies is focused around Tier-1 cities [86]. However anthropogenic fine particle concentration is rising in other cities as well. Thus, Central Pollution Control Board of India may need emission policies in Tier-1 and Tier-2 based on separate priorities as the composition of relative contributions is different. Furthermore, current policies are biased towards regulating emission factors for vehicles, when it should also consider the vehicle kilometer traveled due to urban expansion, for example, we found a negative correlation of vehicle kilometer traveled with built-up density and largest patch index. With tighter vehicle emission norms and announcement of 30% 'electric vehicle policy' by 2030 in India [100], their relative contribution may reduce in the future. Emissions from residential areas must be investigated and regulated strictly as they have the maximum contribution to R in the urban areas. Recent attention has revealed the strongly adverse impact of open waste burning, construction, road-dust, traditional cooking methods and so forth, from residential areas. As these emissions are related to socioeconomic development in the form of urban expansion, policy is needed to expedite technological evolution such that PM 2.5 emissions may be decoupled from urban expansion. Since the model presented in this paper identifies emissions from per unit area of land-use class, it provides an approach for desired technological improvement to ensure required particulate emission. Brick kilns also contribute heavily to R, which implies implementation failure of current policy in ensuring their upgradation. Constructions technologies that reduces reliance on polluting brick kilns must also be explored. This is especially important for cities that have a large area and are expected to see more urbanization.

Conclusions
The goal of this research was to estimate the impact on long-term (2001 to 2015) PM 2.5 concentrations due to urban growth in 15 Indian cities with land-use regression model using remote sensing and meteorological datasets. The specific objective was to calculate relative contribution of the land-use and vehicular sources to urban fine aerosol concentration and compare the results with non-remote sensing based estimations. Ambient PM 2.5 concentration from 2001 to 2015 was indicated by a MODIS based product called R, created in a previous research. The experimental setup was modeled as a land use regression model with inverse distance weighting in a hierarchical Bayesian framework with Tier of a city as random effect. Land-use emission sources considered were residential, commercial and industrial units, agricultural crop fires and brick kilns. Land use regression could successfully predict rising R due to urban growth showing a correlation higher than 0.5 in 7 out of the 15 cities considered. We provide evidence that above the central portion of a city, concentration due to primary PM 2.5 emission is contributed mostly by residential areas (35.0 ± 11.9%), brick kilns (11.7 ± 5.2%) and industries (4.2 ± 2.8%). However their contribution differed between Tier-1 and Tier-2 cities which implies current policies must consider a city's socioeconomic growth status while designing policy measures. Scattered urban form of Tier-1 cities may cause high vehicle kilometers traveled leading to higher vehicular contribution in Tier-1. Parameters of annual seasonal emission activity and underestimation in MODIS fire count affected correlation with long-term trend. Trend of monthly comparison of R value estimated by the model showed high correlation with REASv2 PM 2.5 emissions (∼0.9). Comparisons with recent researches confirmed similarity of contribution estimated from residential areas and to some extent from brick kilns and agricultural fires. Upto 42.8 ± 14.1% contribution is attributed to formation of secondary aerosol, long-range transport and unaccounted sources in surrounding regions especially for Tier-2 cities. This is similar to findings from some recent studies. Due to their high contribution, it is suggested that policymakers must consider regulations for emissions from residential areas and brick kilns.
Supplementary Materials: The following are available at http://www.mdpi.com/2073-4433/10/9/517/s1, Figure S1: MODIS retrievals are missing during months of June, July and August due to which monthly mean of MODIS AOD is much lower than AERONET AOD over the Kanpur city, India (a). In (b) the difference mean monthly AOD from AERONET and MODIS show difference is maximum during the Monsoon months, Table S1: Assumed distribution of seasonal emission activity parameter (SEA mon ). For any source SEA mon represents ratio of emissions in any month mon compared to its maximum monthly emission, Table S2: Pearson correlation between coefficient obtained R est and R obs when the SEA was set according to Table 2, Table S3: Pearson correlation between coefficient obtained R est and R obs to compare predictions from the model trained on interpolated R values for rainy months, and the model trained on non-interpolated and rainy months discarded R values, Table S4: Acknowledgments: This research was partially supported by supported by the Monbukagakusho (MEXT) scholarship from the Government of Japan. NCEP Reanalysis and 20th Century Reanalysis V2c data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at https://www.esrl.noaa.gov/psd/. We are also indebted to the two anonymous reviewers for their constructive comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations and symbols are used in this manuscript: