A Satellite-Based High-Resolution (1-km) Ambient PM 2.5 Database for India over Two Decades (2000–2019): Applications for Air Quality Management

: Fine particulate matter (PM 2.5 ) is a major criteria pollutant a ﬀ ecting the environment, health and climate. In India where ground-based measurements of PM 2.5 is scarce, it is important to have a long-term database at a high spatial resolution for an e ﬃ cient air quality management plan. Here we develop and present a high-resolution (1-km) ambient PM 2.5 database spanning two decades (2000–2019) for India. We convert aerosol optical depth from Moderate Resolution Imaging Spectroradiometer (MODIS) retrieved by Multiangle Implementation of Atmospheric Correction (MAIAC) algorithm to surface PM 2.5 using a dynamic scaling factor from Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2) data. The satellite-derived daily (24-h average) and annual PM 2.5 show a R 2 of 0.8 and 0.97 and root mean square error of 25.7 and 7.2 µ g / m 3 , respectively against surface measurements from the Central Pollution Control Board India network. Population-weighted 20-year averaged PM 2.5 over India is 57.3 µ g / m 3 (5–95 percentile ranges: 16.8–86.9) with a larger increase observed in the present decade (2010–2019) than in the previous decade (2000 to 2009). Poor air quality across the urban–rural transact suggests that this is a regional scale problem, a fact that is often neglected. The database is freely disseminated through a web portal ‘satellite-based application for air quality monitoring and management at a national scale’ (SAANS) for air quality management, epidemiological research and mass awareness.


Introduction
Exposure to ambient fine particulate matter (PM 2.5 ) is one of the leading causes of health burden in India [1,2]. The rising ambient PM 2.5 concentration [3,4] and its staggering health burden [5,6] led the Government of India to launch the National Clean Air Program (NCAP) in early 2018. Though the NCAP addressed air pollution as a national scale problem, its focus on the urban centres essentially fails to recognize the air quality status in the rural areas. This is reflected in the ground-based monitoring network maintained by the Central Pollution Control Board (CPCB) with all of the 230+ continuous and We assume that the spatial heterogeneity in PM2.5 within a coarse MERRA-2 grid (used to derive η) can be captured by the MAIAC AOD data at 1-km resolution and will not be affected much by the interpolation of η. We compare ( Figure 2  First, we process level 2 AOD data retrieved by the Moderate Resolution Imaging Spectroradiometer (MODIS) using the Multiangle Implementation of Atmospheric Correction (MAIAC) algorithm at 1-km × 1-km resolution for each day (i) from 26 February 2000, to 31 December 2019. MAIAC provides global AOD retrievals over dark and bright surfaces using an explicit surface reflectance model and it features an improved cloud detection scheme, a general lack of bias in the urban areas and a better spatial coverage relative to the deep-blue or dark-target approach [21]. In this study, we have used the combined Terra and Aqua AOD product (MCD19A2) provided by the MODIS Remote Sens. 2020, 12, 3872 4 of 22 science team. The combined product enhances the spatial and temporal coverage and provides a more representative AOD during 10:00 AM to 2:00 PM local time [21]. MAIAC AOD validation over South Asia revealed that it has a better accuracy than the deep blue and dark-target AOD products [22]. We also examine the product over India ( Figure A2) and find that MODIS-MAIAC AOD at 550 nm shows a statistically significant (p < 0.05) correlation and root mean square error (RMSE) of 0.13 with AOD from AERONET [23] sites in India. MAIAC AOD is provided at 550 nm wavelength, therefore for a proper comparison, we estimate AERONET AOD at 550 nm wavelength from the spectral AOD measurements and Angstrom Exponent (α) at 440-870 nm wavelength following: AOD 550,AERO = AOD 500,AERO × 500 550 When the MAIAC-AOD tiles are merged, it shows a high variance along its swath edge. To minimize the edge effect across the swaths, we use the Savitzky-Golay filter [24] with a frame length of five pixels across the X-and Y-directions of the target pixel.
In the second step, we analyze aerosol products of MERRA-2 available at 0.5 • × 0.625 • [25]. MERRA-2 PM 2.5 is estimated as: PM 2.5,model = Dust 2.5,model + SS 2.5,model + BC model + OC model × 1.6 + SO 4 2− model × 1.375 (2) where Dust 2.5,model and SS 2.5,model are dust and sea-salt masses in size bins smaller than 2.5 µm, BC model is black carbon, OC model is organic carbon and SO 4 2− model is the sulfate. OC is multiplied by a factor of 1.6 to estimate total organic matter [26]. Sulfate in the MERRA-2 dataset is present in neutralized (NH 4 ) 2 SO 4 form, so a factor of 1.375 is used [25]. MERRA-2 PM 2.5 is underestimated over the Indian region [27]. We therefore calibrate MERRA-2 hourly PM 2.5 with the coincident hourly PM 2.5 from 120 sites in the CPCB network that provide multi-year data from 2009 to 2019 ( Figure A1). These CPCB stations use an automatic air quality monitoring system, where quality-control procedures are performed routinely to remove any unreliable, low-quality and invalid observations arising from instrument malfunction and electric power outage [28]. We note that the length of observations differs from site to site. To ensure enough samples, we use all quality-controlled data. We could not use the data from the manual monitoring sites because the robustness of the quality and the days when they are sampled are not consistent.
We train our calibration model for the 55 MERRA-2 grids having at least one CPCB site and develop a percentile-based calibration factor [3]. For every 10 percentile ranges, we estimate the ratio of surface PM 2.5 and MERRA-2-derived PM 2.5 for each site. Using the site-specific calibration factors representative for each month, we tune the MERRA-2 hourly PM 2.5 data in these grids close to the observed values and use the nearest neighbor algorithm to adjust the bias for the grids devoid of any CPCB site. Using this calibrated PM 2.5 and hourly AOD (at 550 nm) from MERRA-2, we estimate η for every day (i) and every grid (x and y) as: We find that MERRA-2 and AERONET AOD show a statistically significant (p < 0.05 for N = 4546) R 2 of 0.71 ( Figure A3) with a RMSE similar to that of MAIAC AOD. Therefore, we interpret that calibrating the MERRA-2 PM 2.5 is sufficient to improve the calibration of η and apply on satellite AOD.
In the third step, we interpolate η to finer resolution using spline interpolation to match the resolution of the AOD product (1-km × 1-km). The spatial patterns of interpolated η for every month are shown in Figure A4. Wherever η values are high (>160), most of the particles within the column stay close to the surface, resulting in high PM 2.5 due to a stable boundary layer (e.g., winter months). Whenever the atmospheric condition is conducive for dispersion (e.g., in summer months), particles are raised above the boundary layer and hence although PM 2.5 remains moderate (between 100 to 160), Remote Sens. 2020, 12, 3872 5 of 22 AOD remains high (i.e., moderate η values). During the monsoon, both AOD and PM 2.5 remain low and the high convective strength does not contain particles closer to the surface. As a result, η is found to be low (<100). We convert MAIAC AOD for the day i during the satellite overpass time (h) to PM 2.5 at the same resolution using the η values from Equation (2) as: We call this PM 2.5 as the instantaneous PM 2.5 because the satellites carrying the MODIS sensor cross the Indian region between 10:00 AM and 2:00 PM local time, and therefore, the satellite-derived PM 2.5 does not represent the 24-h cycle.
We assume that the spatial heterogeneity in PM 2.5 within a coarse MERRA-2 grid (used to derive η) can be captured by the MAIAC AOD data at 1-km resolution and will not be affected much by the interpolation of η. We compare ( Figure 2) the interpolated η from MERRA-2 with that derived directly for the grids having at least one CPCB site by taking the ratio of PM 2.5 (measured from the ground) and AOD (from MAIAC). Though most of the data points lie within 1:2 and 2:1 lines, the MERRA-2 η shows slightly low bias with respect to the in-situ data with a correlation coefficient that is statistically significant (p < 0.05) and a RMSE of 66.8 (that corresponds to an error of 20 µg m −3 in retrieved PM 2.5 for an AOD of 0.3). To minimize this bias in satellite-derived instantaneous PM 2.5 due to the interpolation of η to a finer resolution, in the fourth step, we perform the second calibration. We estimate the calibration factors for each 10 percentile ranges as the ratio of PM 2.5 measured at the surface during the satellite overpass time and satellite-derived instantaneous PM 2.5 . Using the site-specific calibration factors representative for each month, we tune the satellite-derived instantaneous PM 2.5 data in these grids closer to the observed values and use the nearest neighbor algorithm to adjust the bias for the grids devoid of any CPCB site.
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 28 directly for the grids having at least one CPCB site by taking the ratio of PM2.5 (measured from the ground) and AOD (from MAIAC). Though most of the data points lie within 1:2 and 2:1 lines, the MERRA-2  shows slightly low bias with respect to the in-situ data with a correlation coefficient that is statistically significant (p < 0.05) and a RMSE of 66.8 (that corresponds to an error of 20 g m -3 in retrieved PM2.5 for an AOD of 0.3). To minimize this bias in satellite-derived instantaneous PM2.5 due to the interpolation of  to a finer resolution, in the fourth step, we perform the second calibration. We estimate the calibration factors for each 10 percentile ranges as the ratio of PM2.5 measured at the surface during the satellite overpass time and satellite-derived instantaneous PM2.5. Using the site-specific calibration factors representative for each month, we tune the satellite-derived instantaneous PM2.5 data in these grids closer to the observed values and use the nearest neighbor algorithm to adjust the bias for the grids devoid of any CPCB site.
In the fifth step, we estimate the diurnal scaling factor (DSF) of each grid (x, y) for the conversion of calibrated satellite-derived PM2.5 during the satellite overpass time (h) to the 24-hr average of each day (i) using MERRA-2 data as: The spatial patterns of the mean monthly diurnal scaling factors are shown in Figure A5. We find that the PM2.5 concentrations during the satellite overpass time are lower than the 24-hr average (hence the ratio is >1) almost everywhere in every month, except over the Western Ghats during Jul and Aug and parts of Central India in May. We also compare the diurnal scaling factor derived from MERRA-2 with that from CPCB data ( Figure 2). We find that the MERRA-2 diurnal scaling factor shows a statistically significant (p < 0.05) correlation with CPCB diurnal scaling factor, but with a low bias and a RMSE of 0.3. It implies that the retrieved 24-hr PM2.5 concentration is likely to be underestimated (as compared to the reference monitoring) if the diurnal scaling factors are underestimated.
In the sixth and final step, we convert the satellite-derived instantaneous PM2.5 to 24-hr average PM2.5 (our daily product) using DSFi,x,y,h,model as: We find that our daily (i.e., 24-hr average) PM2.5 product has spatial gaps due to the cloud cover In the fifth step, we estimate the diurnal scaling factor (DSF) of each grid (x, y) for the conversion of calibrated satellite-derived PM 2.5 during the satellite overpass time (h) to the 24-h average of each day (i) using MERRA-2 data as: The spatial patterns of the mean monthly diurnal scaling factors are shown in Figure A5. We find that the PM 2.5 concentrations during the satellite overpass time are lower than the 24-h average (hence Remote Sens. 2020, 12, 3872 6 of 22 the ratio is >1) almost everywhere in every month, except over the Western Ghats during July and August and parts of Central India in May. We also compare the diurnal scaling factor derived from MERRA-2 with that from CPCB data ( Figure 2). We find that the MERRA-2 diurnal scaling factor shows a statistically significant (p < 0.05) correlation with CPCB diurnal scaling factor, but with a low bias and a RMSE of 0.3. It implies that the retrieved 24-h PM 2.5 concentration is likely to be underestimated (as compared to the reference monitoring) if the diurnal scaling factors are underestimated.
In the sixth and final step, we convert the satellite-derived instantaneous PM 2.5 to 24-h average PM 2.5 (our daily product) using DSF i,x,y,h,model as: We find that our daily (i.e., 24-h average) PM 2.5 product has spatial gaps due to the cloud cover and satellite-retrieval issues. This gap is filled when we average the daily PM 2.5 to generate the monthly and subsequently the annual PM 2.5 product. All the products are developed for the entire duration (26 February 2000-31 December 2019).

Comparison of Satellite-Derived and Ground-Based Daily and Annual PM 2.5
For cross-validation, we train our two-stage calibration model with 70% of the surface measurements randomly chosen from 120 CPCB sites. The remaining 30% of the data are used for validation. We find that the calibration improves the R 2 of satellite-derived instantaneous PM 2.5 and surface measurements during the overpass from 0.51 to 0.67. Since our main products are the daily (24-h average) and annual satellite-derived PM 2.5 , we further compare these two against surface measurements from the CPCB network ( Figure 3). Note that no further calibration is carried out after we estimate the daily PM 2.5 in the sixth and final step from the calibrated instantaneous PM 2.5 . The slope (0.98) of the regression line and the intercept (2.6 µg/m 3 ) of the daily satellite-derived and surface measured PM 2.5 are close to the ideal values. The regression statistics are statistically significant at 95% CI following student's t-test (p < 0.05). Less than 0.5% of data points (out of the total number of samples = 34324) lie outside the 1:2 and 2:1 line. The cases (<0.3%) where surface measurements are more than double the satellite-based PM 2.5 are confined to the Delhi NCR when the satellite fails to capture the extreme pollution events [13]. In the other cases (<0.2%) where the surface measurements are much lower than the satellite-derived PM 2.5 , the satellite data are overfitted. Most of the epidemiological studies [4] are carried out with annual PM 2.5 exposure and the NCAP also aims to reduce the annual PM 2.5 concentration. Our annual PM 2.5 product shows a RMSE of 7.2 µg/m 3 and R 2 of 0.97 with the slope and the intercept similar to those of the daily product.
To understand the behavior of the error in the retrieved PM 2.5 dataset across a wide range of PM 2.5 , we plot the retrieval bias (which is PM 2.5 from the CPCB sites-PM 2.5,sat ) as a function of PM 2.5 from the CPCB sites ( Figure 4). The median bias remains lower than 10 µg/m 3 (<5%) up to a PM 2.5 level of 200 µg/m 3 , beyond which the underestimation in PM 2.5,sat starts to increase. Ground-based measurements reveal that 24-h PM 2.5 concentration in India usually remains below 200 µg/m 3 in most of the days [28]. PM 2.5 concentration exceeds this range during the peak pollution season for a few days, that too, mostly in the Delhi NCR, which the satellite-derived data underestimate. Retrieval of these extreme cases is challenging in urban areas [13]. Further, we segregate the entire dataset of our daily product into various seasons ( Figure A6) and various geographic regions ( Figure A7) to understand if there is any systematic seasonal or regional bias in the dataset. Seasonally, we get the highest R 2 during the winter (DJF) season followed by the post-monsoon (ON) season with comparable RMSE, when the PM 2.5 level remains high (as shown in Section 3.2). In the other seasons, the RMSEs are lower and though R 2 values are slightly lower they are statistically significant (p < 0.05). We note that there are no ground-based monitoring sites in North and Northeast India. However, comparable regression statistics across the various geographical regions covering a diverse land use attest to the robustness and applicability of the dataset for air quality management. As the ground-based network Remote Sens. 2020, 12, 3872 7 of 22 is being expanded including the rural areas under the NCAP, we expect further improvement in the product in future. slope (0.98) of the regression line and the intercept (2.6 g/m ) of the daily satellite-derived and surface measured PM2.5 are close to the ideal values. The regression statistics are statistically significant at 95% CI following student's t-test (p < 0.05). Less than 0.5% of data points (out of the total number of samples = 34324) lie outside the 1:2 and 2:1 line. The cases (<0.3%) where surface measurements are more than double the satellite-based PM2.5 are confined to the Delhi NCR when the satellite fails to capture the extreme pollution events [13]. In the other cases (<0.2%) where the surface measurements are much lower than the satellite-derived PM2.5, the satellite data are overfitted. Most of the epidemiological studies [4] are carried out with annual PM2.5 exposure and the NCAP also aims to reduce the annual PM2.5 concentration. Our annual PM2.5 product shows a RMSE of 7.2 g/m 3 and R 2 of 0.97 with the slope and the intercept similar to those of the daily product. To understand the behavior of the error in the retrieved PM2.5 dataset across a wide range of PM2.5, we plot the retrieval bias (which is PM2.5 from the CPCB sites-PM2.5,sat) as a function of PM2.5 from the CPCB sites ( Figure 4). The median bias remains lower than 10 g/m 3 (<5%) up to a PM2.5 level of 200 g/m 3 , beyond which the underestimation in PM2.5,sat starts to increase. Ground-based measurements reveal that 24-hr PM2.5 concentration in India usually remains below 200 g/m 3 in most of the days [28]. PM2.5 concentration exceeds this range during the peak pollution season for a few days, that too, mostly in the Delhi NCR, which the satellite-derived data underestimate. Retrieval of these extreme cases is challenging in urban areas [13]. Further, we segregate the entire dataset of our daily product into various seasons ( Figure A6) and various geographic regions ( Figure A7) to understand if there is any systematic seasonal or regional bias in the dataset. Seasonally, we get the We note that there are no ground-based monitoring sites in North and Northeast India. However, comparable regression statistics across the various geographical regions covering a diverse land use attest to the robustness and applicability of the dataset for air quality management. As the groundbased network is being expanded including the rural areas under the NCAP, we expect further improvement in the product in future.

Analysis of PM2.5 Trends and Meteorological Parameters
We examine the PM2.5 trends in two different ways. First, we estimate the linear trends in annual PM2.5 within the last decade (2000 to 2009) and the present decade (2010 to 2019). Secondly, we estimate the mean seasonal PM2.5 over the entire duration for the winter (Dec-Feb), pre-monsoon (Mar-May), monsoon (Jun-Sep) and post-monsoon (Oct-Nov) seasons. The seasonal variations are examined in terms of the anomaly (i.e., mean seasonal PM2.5-mean annual PM2.5) related to the annual concentration. The mean seasonal values are then used to estimate the linear trends in each season. The grids that are statistically significant (p<0.05) following the student's t-test are marked.
We also analyze the planetary boundary layer (PBL) height and wind speed from the MERRA-2 reanalysis data. PBL height is inversely proportional to PM2.5 as the particles get trapped if PBL

Analysis of PM 2.5 Trends and Meteorological Parameters
We examine the PM 2.5 trends in two different ways. First, we estimate the linear trends in annual PM 2.5 within the last decade (2000 to 2009) and the present decade (2010 to 2019). Secondly, we estimate the mean seasonal PM 2.5 over the entire duration for the winter (December-February), pre-monsoon (March-May), monsoon (June-September) and post-monsoon (October-November) seasons. The seasonal variations are examined in terms of the anomaly (i.e., mean seasonal PM 2.5 -mean Remote Sens. 2020, 12, 3872 8 of 22 annual PM 2.5 ) related to the annual concentration. The mean seasonal values are then used to estimate the linear trends in each season. The grids that are statistically significant (p < 0.05) following the student's t-test are marked.
We also analyze the planetary boundary layer (PBL) height and wind speed from the MERRA-2 reanalysis data. PBL height is inversely proportional to PM 2.5 as the particles get trapped if PBL height is low. Similarly, wind speed is also inversely proportional to PM 2.5 as stronger wind will allow particles to disperse easily. The combined effect of these two important meteorological factors on the PM 2.5 concentration is represented by the ventilation coefficient (VC), which is simply the product of PBL height and wind speed [29]. High VC implies a favorable condition for the dispersion, while low VC implies the condition as unfavorable. Therefore, the spatial and seasonal variations in PM 2.5 can be partially attributed to the changes in VC. We estimate the seasonal anomaly in VC with respect to the annual mean to understand the observed seasonal changes in PM 2.5 and the seasonal trends in VC to understand the seasonal trends.

Exposure Attribution
Population-weighted PM 2.5 exposure is estimated using the population data from the Indian Census. To separate the urban and rural PM 2.5 , we analyze the Global Human Settlement Layer (GHSL) data [30]. This dataset was developed as part of a European Union project using 40-years of Landsat imagery that tracked the land use and land cover changes globally. Various other geospatial data (e.g., global cover of the artificial surface, open street maps, global urban extents and population distribution) are integrated with the land use data to identify each 1-km × 1-km grid as one of the four classes-high-density urban (at least 1500 per km 2 population density), low-density urban (300-1500 per km 2 population density), rural (<300 per km 2 population density) and no-settlement (no permanent human occupancy). The GHSL provides the information in four distinct years-1975, 1990, 2000 and 2015. Here, we combine the high-density and low-density urban grids into a single urban class. Since the satellite-derived PM 2.5 is available from 26 February 2000, we consider the urban and rural PM 2.5 in 2001 as a baseline and estimate the changes in 2015. We assume that the human settlement pattern did not change much from 2000 to 2001 to affect our results.
The state/union territory (UT)-averaged urban and rural settlement fractions in India for the year 2000 and 2015 are shown in Figure A8. Overall, both the urban and rural settlement area has increased in India in a varying proportion to accommodate the growing population. We match each 1-km × 1-km grid of satellite-based PM 2.5 with the GHSL data and estimate the urban and rural PM 2.5 population-weighted exposure in each state and UT for the years 2001 and 2015. We compare and report the statistics of rural and urban exposure along with the changes between 2001 and 2015 (Table A1). For the geographical locations of the states, UTs and the regions, see Figure A1.

Results
This section is divided into four subsections. First, we present the spatial pattern of PM 2.5 concentration over India in Section 3.1, followed by the seasonal changes in Section 3.2, the trend analysis in Section 3.3 and the urban-rural divide in PM 2.5 exposure in Section 3.4.

Spatial Pattern in PM 2.5 Concentration over India
The spatial distribution of PM 2.5 at the annual scale shown in Figure 5a mimics the spatial pattern observed by satellite-derived AOD (Figure 5b). Four points are notable in this figure. First, ambient PM 2.5 exceeds the annual NAAQS of 40 µg/m 3 in every state except the states of Jammu and Kashmir (including the new Ladakh UT), Himachal Pradesh, Sikkim, Arunachal Pradesh, Manipur and Nagaland (see Figure A2 for the geographical locations), where the population is sparse and a large part is covered by mountains. As of 2019, we find that 99.5% of the Indian districts (second administrative levels) do not meet the World Health Organization (WHO)-air quality guideline (AQG) of 10 µg/m 3 . Second, the PM 2.5 level in the Indo-Gangetic Plain (IGP) and the western arid region is more than Remote Sens. 2020, 12, 3872 9 of 22 double the annual NAAQS. The IGP is a low-lying fertile alluvial plain bounded by the Himalayas in the north and central Indian highlands in the south. Due to its fertility, it is densely populated with a population more than 700 million. Continuous emissions of primary PM 2.5 and secondary precursor gases (that contribute to PM 2.5 eventually) from a range of anthropogenic activities (e.g., household solid fuel use, power plants, industries, open biomass and solid-waste burning, vehicles, brick kilns, diesel generator sets, construction activities, etc.) coupled with unfavorable topography and meteorology lead to a massive PM 2.5 buildup. This PM 2.5 does not disperse away towards the north or south (bounded by the mountains); rather it oscillates east-west by the seasonal winds [31]. The only pathway of the pollution dispersion is through the Gangetic West Bengal towards the Bay of Bengal. The IGP, therefore, has become a giant valley trapped with high annual PM 2.5 that persists throughout the year. Third, the PM 2.5 shows a north (high)-south (low) gradient, which, to some extent, mimics the population distribution (and therefore the anthropogenic source distribution). The only exception is the western arid region, which is sparsely populated but highly polluted because of the large contribution of desert dust raised by wind [3]. Fourth, PM 2.5 is not proportionally (as compared to the IGP states) high over the states of Odisha, Telangana and Andhra Pradesh where AOD is high. In these regions, the condition for dispersion is favourable for February-October (as shown by low η values in Figure A4). We note that a large part of the IGP and many other states where the ambient PM 2.5 is high are rural. We discuss the urban-rural contrast in PM 2.5 exposure separately.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 28 gradient, which, to some extent, mimics the population distribution (and therefore the anthropogenic source distribution). The only exception is the western arid region, which is sparsely populated but highly polluted because of the large contribution of desert dust raised by wind [3]. Fourth, PM2.5 is not proportionally (as compared to the IGP states) high over the states of Odisha, Telangana and Andhra Pradesh where AOD is high. In these regions, the condition for dispersion is favourable for Feb-Oct (as shown by low  values in Figure A4). We note that a large part of the IGP and many other states where the ambient PM2.5 is high are rural. We discuss the urban-rural contrast in PM2.5 exposure separately.  Figure 6 shows the seasonal anomaly in PM2.5 relative to the annual average PM2.5 distribution. PM2.5 is the highest in the winter season across the country except for the high-altitude regions in the north and the western arid region. This wintertime enhancement (a positive anomaly ranging from 5 to 40 g/m 3 relative to the annual average) in PM2.5 has been attributed to the additional emission from households (especially in the colder places due to space and water heating) and a stable atmosphere under calm conditions [12]. In the western arid region, dust activity remains at a minimum during the winter and in the high-altitude states of Jammu and Kashmir, Himachal Pradesh and Uttarakhand, major commercial activities remain closed due to extreme cold. Therefore, PM2.5 shows a negative anomaly relative to the annual average. In the pre-monsoon season, the PBL expands with a rise in the surface temperature and wind speed increases, allowing PM2.5 to be dispersed. As a result, PM2.5 concentration decreases over the highly polluted IGP. However, this impact is partially compensated by the additional dust load in west India and emissions from seasonal open biomass burning over a large part of the northeast and peninsular India [3].

Seasonal Anomaly in PM2.5 Concentration
PM2.5 decreases in the monsoon season (as shown by a negative anomaly ranging from -5 to -35 g/m 3 relative to the annual average) substantially as the particles are washed out by monsoon rain. The largest reduction is observed over the eastern IGP and along the west coast of India. In this season, PM2.5 level remains lower than 40 g/m 3 over entire India except for the arid region in the west (where the monsoon rain is scanty) and the western IGP including Delhi NCR (where the emission strength is so high that the aerosol recovery overcompensates the loss due to washout [32]). However, we note that PM2. 5 Figure 6 shows the seasonal anomaly in PM 2.5 relative to the annual average PM 2.5 distribution. PM 2.5 is the highest in the winter season across the country except for the high-altitude regions in the north and the western arid region. This wintertime enhancement (a positive anomaly ranging from 5 to 40 µg/m 3 relative to the annual average) in PM 2.5 has been attributed to the additional emission from households (especially in the colder places due to space and water heating) and a stable atmosphere under calm conditions [12]. In the western arid region, dust activity remains at a minimum during the winter and in the high-altitude states of Jammu and Kashmir, Himachal Pradesh and Uttarakhand, major commercial activities remain closed due to extreme cold. Therefore, PM 2.5 shows a negative anomaly relative to the annual average. In the pre-monsoon season, the PBL expands with a rise in the surface temperature and wind speed increases, allowing PM 2.5 to be dispersed. As a result, PM 2.5 concentration decreases over the highly polluted IGP. However, this impact is partially compensated by the additional dust load in west India and emissions from seasonal open biomass burning over a large part of the northeast and peninsular India [3].

Seasonal Anomaly in PM 2.5 Concentration
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 28 dropping with the monsoon retreat, especially in the north (including the IGP), northeast, west and central India in the post-monsoon season. In addition, the open biomass burning is prevalent across the country, more so in the western and central IGP, in this season, which adds to the regional PM2.5 buildup due to a lower average PBL height [13].

Trends in PM2.5 Concentration
We next present the rate of changes in annual PM2.5 concentration (i.e., the annualized rate of changes) in the last and the current decade (top panel in Figure 7). The state-level statistics are shown in Table A1 (in the Appendix). During the last decade, PM2.5 over India shows a significant (p<0.05) increase (by >1g/m 3 per year) over the states of Jharkhand, Chhattisgarh, Odisha, Telangana, Andhra Pradesh, Tamil Nadu, Kerala, and parts of Karnataka, Maharashtra and north-east India, while it decreases (not significantly though) over the high-altitude states of Jammu and Kashmir and Himachal Pradesh, and the Indian desert. In the current decade, the increase is found to be significant (p<0.05) over the states of West Bengal, Odisha, Telangana, Maharashtra, and parts of Gujarat, Karnataka, Bihar, Uttar Pradesh, Madhya Pradesh and Uttarakhand. The decline continues over the eastern part of Jammu and Kashmir (now Ladakh UT) and parts of Rajasthan.
Emission data of primary PM2.5, BC and OC and secondary gaseous precursors (e.g., SO2, NO2 and volatile organics) from the Evaluating the Climate and Air Quality Impact of Short-lived Pollutants (ECLIPSE) emission inventory [33] suggest that the emissions from anthropogenic sources increased steadily over the last two decades everywhere in India with a larger increase in the eastern and central part of India dominated by mining activities and related industries and thermal power plants [34]. However, since we do not have continuous data, we can only qualitatively attribute the observed decadal trends in PM2.5 to the rising emissions. The decadal changes in the VC (bottom panel in Figure 7) suggest that the meteorological condition became increasingly unfavorable in the last decade. This, coupled with the rising emissions, led to the observed increase in PM2.5 in eastern and peninsular India. In the current decade, the VC does not show a significant trend. In fact, it has slightly increased (although not significantly) over western and central India, where we find a decrease in PM2.5. However, in southeastern Maharashtra, the PM2.5 increased despite an increase in VC. We only speculate that, perhaps, the meteorological impact is overcompensated by the impact of rising emissions and regional transport in this region. For more quantitative interpretation, simulations from a chemical transport model are required and are beyond the scope of this work. PM 2.5 decreases in the monsoon season (as shown by a negative anomaly ranging from −5 to −35 µg/m 3 relative to the annual average) substantially as the particles are washed out by monsoon rain. The largest reduction is observed over the eastern IGP and along the west coast of India. In this season, PM 2.5 level remains lower than 40 µg/m 3 over entire India except for the arid region in the west (where the monsoon rain is scanty) and the western IGP including Delhi NCR (where the emission strength is so high that the aerosol recovery overcompensates the loss due to washout [32]). However, we note that PM 2.5 does not meet the 24-h WHO-AQG (25 µg/m 3 ) on most of the days in most parts of the country even in this season, suggesting the severity of the problem. The temperature starts dropping with the monsoon retreat, especially in the north (including the IGP), northeast, west and central India in the post-monsoon season. In addition, the open biomass burning is prevalent across the country, more so in the western and central IGP, in this season, which adds to the regional PM 2.5 buildup due to a lower average PBL height [13].

Trends in PM 2.5 Concentration
We next present the rate of changes in annual PM 2.5 concentration (i.e., the annualized rate of changes) in the last and the current decade (top panel in Figure 7). The state-level statistics are shown in Table A1 (in the Appendix A). During the last decade, PM 2.5 over India shows a significant (p < 0.05) increase (by >1 µg/m 3 per year) over the states of Jharkhand, Chhattisgarh, Odisha, Telangana, Andhra Pradesh, Tamil Nadu, Kerala, and parts of Karnataka, Maharashtra and north-east India, while it decreases (not significantly though) over the high-altitude states of Jammu and Kashmir and Himachal Pradesh, and the Indian desert. In the current decade, the increase is found to be significant (p < 0.05) over the states of West Bengal, Odisha, Telangana, Maharashtra, and parts of Gujarat, Karnataka, Bihar, Uttar Pradesh, Madhya Pradesh and Uttarakhand. The decline continues over the eastern part of Jammu and Kashmir (now Ladakh UT) and parts of Rajasthan.
Emission data of primary PM 2.5 , BC and OC and secondary gaseous precursors (e.g., SO 2 , NO 2 and volatile organics) from the Evaluating the Climate and Air Quality Impact of Short-lived Pollutants (ECLIPSE) emission inventory [33] suggest that the emissions from anthropogenic sources increased steadily over the last two decades everywhere in India with a larger increase in the eastern and central part of India dominated by mining activities and related industries and thermal power plants [34]. However, since we do not have continuous data, we can only qualitatively attribute the observed decadal trends in PM 2.5 to the rising emissions. The decadal changes in the VC (bottom panel in Figure 7) suggest that the meteorological condition became increasingly unfavorable in the last decade. This, coupled with the rising emissions, led to the observed increase in PM 2.5 in eastern and peninsular India. In the current decade, the VC does not show a significant trend. In fact, it has slightly increased (although not significantly) over western and central India, where we find a decrease in PM 2.5 . However, in southeastern Maharashtra, the PM 2.5 increased despite an increase in VC. We only speculate that, perhaps, the meteorological impact is overcompensated by the impact of rising emissions and regional transport in this region. For more quantitative interpretation, simulations from a chemical transport model are required and are beyond the scope of this work. The annualized rate of changes at the seasonal scale is displayed in Figure 8. We note several key features. First, ambient PM2.5 shows a significant increase (p<0.05) over almost the entire country in the post-monsoon and winter seasons, except over the arid regions in the west and high-altitude regions in the north and northeast. The largest trends (>2g/m 3 per year) are observed over the IGP, eastern and southeast India (along the east coast), large parts of peninsular India and the state of Gujarat. In the pre-monsoon season, PM2.5 increases over east, northeast and peninsular India, which are affected by open biomass burning [35]. The decreasing trend over west and northwest India is perhaps attributed to the declining dust activity [36]. In the monsoon season when PM2.5 generally remains low (Figure 6), no apparent trend is observed, except over some patches in the west and central IGP. In terms of major emission sources, open biomass burning is a seasonal source and is observed in the pre-monsoon (after the wheat cultivation) and the post-monsoon (after the rice cultivation) seasons. Studies have suggested that the post-monsoon burning has increased post-2009 in the states of Punjab and Haryana [13] while the pre-monsoon burning marginally increased all over the country [37]. Since the PM2.5 shows an increasing trend over the peninsular and east India in the three seasons, the largest trend in annual PM2.5 is observed in these regions (Figure 9a). The trend over the IGP, the most polluted region in India (and one of the top polluted regions in the world), is governed mainly by the rising PM2.5 during the post-monsoon to winter seasons. Overall, the trends are higher over eastern and peninsular India (>1.6% per year) where the number of hazy days has been increasing at a faster rate than over the IGP [38] where the annual PM2.5 is the highest but the rate of increase is <1.2% per year (Figure 9b). The annualized rate of changes at the seasonal scale is displayed in Figure 8. We note several key features. First, ambient PM 2.5 shows a significant increase (p < 0.05) over almost the entire country in the post-monsoon and winter seasons, except over the arid regions in the west and high-altitude regions in the north and northeast. The largest trends (>2 µg/m 3 per year) are observed over the IGP, eastern and southeast India (along the east coast), large parts of peninsular India and the state of Gujarat. In the pre-monsoon season, PM 2.5 increases over east, northeast and peninsular India, which are affected by open biomass burning [35]. The decreasing trend over west and northwest India is perhaps attributed to the declining dust activity [36]. In the monsoon season when PM 2.5 generally remains low (Figure 6), no apparent trend is observed, except over some patches in the west and central IGP. In terms of major emission sources, open biomass burning is a seasonal source and is observed in the pre-monsoon (after the wheat cultivation) and the post-monsoon (after the rice cultivation) seasons. Studies have suggested that the post-monsoon burning has increased post-2009 in the states of Punjab and Haryana [13] while the pre-monsoon burning marginally increased all over the country [37]. Since the PM 2.5 shows an increasing trend over the peninsular and east India in the three seasons, the largest trend in annual PM 2.5 is observed in these regions (Figure 9a). The trend over the IGP, the most polluted region in India (and one of the top polluted regions in the world), is governed mainly by the rising PM 2.5 during the post-monsoon to winter seasons. Overall, the trends are higher over eastern and peninsular India (>1.6% per year) where the number of hazy days has been increasing at a faster rate than over the IGP [38] where the annual PM 2.5 is the highest but the rate of increase is <1.2% per year (Figure 9b).

Urban vs Rural PM2.5
Unlike the developed countries where PM2.5 is considered to be an urban problem, we observe that high PM2.5 cuts across the urban-rural transect. We therefore present comparative statistics of urban vs. rural population-weighted PM2.5 exposure in Figure 10 for the year 2001 and the stateaveraged changes in urban and rural population-weighted exposure from 2001 to 2015 in Figure 11. We observe that the urban PM2.5 exposure in Delhi increased by 10.9% from 82.2 (5 th -95th percentile ranges: 27.8-168.9)g/m 3 in 2001 to 91.3 (33.7-190.7) g/m 3 in 2015. During the same period, the rural PM2.5 exposure increased by 11.9% from 81.1 (27.8-163.4) g/m 3 to 90.7 (32.5-192.5) g/m 3 . We point out that though the urban and rural exposure is comparable, the urban area (80%) is disproportionately higher in Delhi than the rural area (10%). The remaining 10% area does not have any permanent human settlement and therefore can be considered as the background. Several key features are now presented. First, population-weighted PM2.5 exposure increased in all the states/UTs from 2001 to 2015 (Table A1)

Urban vs Rural PM2.5
Unlike the developed countries where PM2.5 is considered to be an urban problem, we observe that high PM2.5 cuts across the urban-rural transect. We therefore present comparative statistics of urban vs. rural population-weighted PM2.5 exposure in Figure 10 for the year 2001 and the stateaveraged changes in urban and rural population-weighted exposure from 2001 to 2015 in Figure 11. We observe that the urban PM2.5 exposure in Delhi increased by 10.9% from 82.2 (5 th -95th percentile ranges: 27.8-168.9)g/m 3 in 2001 to 91.3 (33.7-190.7) g/m 3 in 2015. During the same period, the rural PM2.5 exposure increased by 11.9% from 81.1 (27.8-163.4) g/m 3 to 90.7 (32.5-192.5) g/m 3 . We point out that though the urban and rural exposure is comparable, the urban area (80%) is disproportionately higher in Delhi than the rural area (10%). The remaining 10% area does not have any permanent human settlement and therefore can be considered as the background. Several key features are now presented. First, population-weighted PM2.5 exposure increased in all the states/UTs from 2001 to 2015 (Table A1)

Urban vs. Rural PM 2.5
Unlike the developed countries where PM 2.5 is considered to be an urban problem, we observe that high PM 2.5 cuts across the urban-rural transect. We therefore present comparative statistics of urban vs. rural population-weighted PM 2.5 exposure in Figure 10 for the year 2001 and the state-averaged changes in urban and rural population-weighted exposure from 2001 to 2015 in Figure 11. We observe that the urban PM 2.5 exposure in Delhi increased by 10.9% from 82.2 (5-95 percentile ranges: 27.8-168.9) µg/m 3 in 2001 to 91.3 (33.7-190.7) µg/m 3 in 2015. During the same period, the rural PM 2.5 exposure increased by 11.9% from 81.1 (27.8-163.4) µg/m 3 to 90.7 (32.5-192.5) µg/m 3 . We point out that though the urban and rural exposure is comparable, the urban area (80%) is disproportionately higher in Delhi than the rural area (10%). The remaining 10% area does not have any permanent human settlement and therefore can be considered as the background. Several key features are now presented. First, population-weighted PM 2.5 exposure increased in all the states/UTs from 2001 to 2015 (Table A1) Figure 9).    Figure 9).  Figure 9).

Discussion
In this work, we develop and present a 20-year ambient PM 2.5 database for India at a high (1-km) spatial resolution. The data are disseminated freely through a web portal 'satellite-based application of air quality monitoring and management at a national scale', SAANS (www.saans.co.in) for use in air quality management, epidemiological studies and creating awareness amongst the citizens, especially from the states/UTs where the ground-based measurements are unavailable or scanty. Our work adds to the recent efforts of retrieving PM 2.5 at high resolution [19,22] for an improved exposure assessment. We note the following issues for the proper interpretation of our database. First, we could not calibrate the scaling factor with ground-based data before 2009 and assume the calibration factors would be the same in this period. Second, the evaluation of the data is restricted to the urban centres as rural air quality monitoring from the surface does not exist in India. In future, when the surface network will be expanded to the rural area, the true error in satellite-based PM 2.5 can be identified. We discuss several important implications and potential applications of our database.
High PM 2.5 in the rural area is not surprising as a large fraction of the population still relies on solid fuel for domestic use (cooking, heating and lighting) [4]. These emissions do not remain confined with the household and filtrate out to pollute ambient air. Household sources are found to be the largest contributor to ambient PM 2.5 in India [39][40][41][42]. This implies that poor air quality in India is not an urban-centric problem; rather it is a regional scale problem. Therefore, India requires a regional scale management strategy that transcends urban boundaries and focuses on regional airsheds. The NCAP focuses on 122 non-attainment cities. Many cities/towns in India do not have any ground-based measurements and hence whether they are non-attainment could not be determined during the early phase of the NCAP. Using our database, we found that 436 cities/towns with a population more than 100,000 (as per the 2011 Indian Census) exceed the NAAQS in 2019. We recommend setting up ground-based monitoring in these cities/towns on a priority basis.
The Government of India launched a program-Pradhan Mantri Ujjwala Yojana (PMUY, the Prime Minister's program of clean household fuel)-in 2014 to empower rural women by promoting clean cooking fuel (LPG) in the rural areas. This policy is highly important as mitigating emissions completely from the household sources can potentially help India achieve the NAAQS [43]. As the PMUY is rolled out, it lacks a mechanism to track its progress. Since the household sources contribute more than 50% to ambient PM 2.5 in the rural areas [44], successful implementation of PMUY with sustained usage should arrest or even reverse the increasing trend in rural PM 2.5 in recent years.
The high-resolution database will enable track the local hotspots within a city, especially where a single or no ground-based monitoring sites exist. It also will facilitate identification of the representative sites for the expansion of the CPCB network under the NCAP in the coming years.
In India, the epidemiological studies are either time-series (as summarized in [12]) or by design establishing the association, not causality [45], or the acute exposure impact on health outcomes like birthweight [46]. For the chronic exposure impacts on mortality and various health outcomes, we still rely on the GBD framework [1,2,4] that does not include any cohort study from India on ambient PM 2.5 exposure. Our database will be highly useful to fill this important gap by planning retrospective cohorts with the existing health data and generating India-specific exposure-response functions.

Conclusions
Using a novel high-resolution (1-km) ambient PM 2.5 database, we examine the trends in PM 2.5 concentrations in India over two decades (2000-2019). Our key conclusions are: (1) the urban and rural ambient PM 2.5 exposure increased by an almost similar margin from 2001 to 2015; (2) particulate air quality in India is a regional scale problem and needs a coordinated clean air action plan addressing the urban and rural sources simultaneously; and (3) mitigating emissions during October-February in the north and east India and December-May in peninsular India would arrest the rising annual PM 2.5 trend.