In Europe, exposure to particulate matter is regulated by PM-derived standards (Directive 2008/50/EC). This regulation only distinguishes particles by broad size categories (PM10, etc.) and is not source specific. The soot fraction, black carbon (BC), is the part of the PM directly related to combustion processes. Recent evidence, summarized by the World Health Organization, documents the relevance of BC for evaluating traffic-related health effects [1
]. Large personal exposure measurement campaigns prove the relevance of the in-traffic exposure contribution [2
]. Further research into health effects is hampered by the difficulty of measuring or modeling the actual personal exposure to BC. An important reason for this is the strong spatial variability of BC compared to PM10
]. Most of the experiments focus on quantifying the differences between different commuting modes [5
]. General statistics by road type are provided by Dons and colleagues, showing relevant differences by road class, period of the day and period of the week [12
]. Outdoor concentrations are reported to be very dynamic along the individual routes. An extensive study on black carbon exposure for bicyclists showed local changes of a factor 15–20 only due to changing local traffic dynamics and instantaneous meteorology [13
]. Especially in the immediate vicinity of traffic lights and complex traffic situations, high exposure levels were found. Other authors reported similar effects based on road classification for different air pollutants [12
When moving from outdoor to in-vehicle concentrations, the variability of the in-vehicle concentrations increases significantly. The influence of the ventilation settings and influence of the speed of the vehicle on the ventilation was addressed by several authors [15
]. Two important conclusions summarize these studies. Firstly, in “outdoor air ventilation” settings, the outdoor concentration changes are registered very fast inside the vehicle. Lags between 30 and 60 s are detected [15
]. Secondly, the strongest component of the personal exposure inside the vehicle is the outside concentration due to the tail-pipe emissions of the preceding vehicles [16
]. This feature illustrates the complexity of modeling and predicting the real-life personal exposure to traffic-related pollutants inside the vehicles. Several attempts are available in literature. The first instantaneous in-vehicle personal exposure models have been published [20
]. The exposure models are based on traffic counts, road types, the number of lanes, speed of the traffic and a set of meteorological parameters. The largest study is based on about 300 km of data on a predefined route during a period of six weeks [20
]. The authors compare linear models with generalized additive models (GAMs) for in-vehicle exposure and conclude that the GAMs capture more of the non-linear features in the highly diverse set of exposure descriptors. The authors also mention the lack of instantaneous traffic data and tested the use of more general traffic parameters, such as the annual average weekday traffic (AAWT). Predicting instantaneous in-vehicle particulate matter exposure is not very successful at this point due to the interactions between meteorological conditions, local traffic dynamics, ventilation settings, vehicle properties, etc. Other authors build further on these results by excluding the variability due to the ventilation settings [22
]. By excluding the variability of the ventilation, these results do not include the correlation between driver-defined ventilation settings and the meteorological conditions. This important and relevant component of the meteorological and seasonal variability is not included due to this design restriction. This approach cannot disentangle the effects of fleet composition, traffic dynamics and meteorological interactions between emission, dispersion and changes in the ventilation settings. Several attempts were made to use data science methods to improve the modeling of in-vehicle exposure. One of the main scientific limitations of these methods is the ‘black box’ nature of data science techniques [23
]. Paas and colleagues (2017) present an artificial neural network (ANN) approach for urban near-road PM10, but had to remove the high wind conditions to reach a valid ANN model [23
Lioy and Smith summarized the major challenges for the future of exposure science [24
]. Their main conclusion is that restrictions in the experimental designs reduce the applicability of the resulting models in health effects research. In our previous work, a new methodology was proposed to explicitly include all variability in the personal exposure models and, by doing so, provide actual real-life route-sensitive and micro-environment-specific exposure assessments [25
]. The in-vehicle exposure model for black carbon was used in [25
] as an example case of the methodology. In this publication, we focus on the details of the model selection and model features. In that process, noise maps are used as a low-effort but widely available alternative traffic attribute for land use regression. Section 2 addresses the measurement campaign, the methodology and the definition of the covariates. Section 3 presents the data exploration and the models. In Section 4, an external validation is described, and the discrepancies are investigated in detail. The results are discussed in Section 5.
3. Data Exploration and Models
3.1. Summary Statistics and Lag Investigation
The physical features of the ventilation of the vehicle result in a lag between outdoor and in-vehicle concentrations. In a first step of the data exploration, this lag has to be explored and quantified. To achieve this, the data were modelled with a set of potentially relevant lags. The model based on an accumulated lag of 60 s showed the highest deviance explained (Table 2). The accumulated lag of 60 s was calculated as the average of six 10-s values at and after the spatially-evaluated timestamp (referred to as LAG60). The LAG0 model is the weakest model, expressing the fact that the local features do not influence the in-vehicle concentrations immediately. LAG120 is weaker than LAG60 and expresses a reduced correlation with prior traffic conditions. The black carbon concentrations at a specific moment in time in front of the vehicle affected the in-vehicle exposure within the next minute. This matches the available information in the literature [15
The average in-vehicle exposure is 5644 ng/m3. The data were clipped to a minimum value of 100 ng/m3 (the lower measurement threshold of the µ-aethalometer) and at 100,000 ng/m3 as a maximum value. These minimum and maximum values also accommodate the use of the logarithm of the BC exposure in the GAM models. The 10–90% percentiles in 10% steps are: 912, 1668, 2441, 3258, 4146 (median), 5359, 6786, 8746 and 12,187 ng/m3. These values will not be compared to other datasets since this measurement campaign did not aim to achieve an unbiased dataset. The data are, explicitly, only used to investigate and model the short-term variability of the in-vehicle black carbon exposure.
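As an illustration of the preprocessing described above, the following minimal sketch computes a LAG60-style value and the clipped logarithm. The function names, the order of operations (clipping before averaging) and the use of the natural logarithm are assumptions made for this example; only the six-value window and the clipping bounds are taken from the text.

```python
import math

# Clipping bounds in ng/m3, as stated in the text.
BC_MIN, BC_MAX = 100.0, 100_000.0


def clip_bc(bc):
    """Clip a BC value to the stated measurement range."""
    return min(max(bc, BC_MIN), BC_MAX)


def lag60(bc_10s, i, n=6):
    """Average of n consecutive 10-s BC values at and after index i.

    Returns None when fewer than n values remain (assumed handling).
    """
    window = bc_10s[i:i + n]
    if len(window) < n:
        return None
    return sum(window) / n


def log_bc(bc):
    """Natural logarithm of the clipped BC value (log base is an assumption)."""
    return math.log(clip_bc(bc))


# Hypothetical 10-s series in ng/m3, for illustration only.
series = [50.0, 1200.0, 3000.0, 4500.0, 250000.0, 800.0, 900.0]
clipped = [clip_bc(v) for v in series]
lag_at_1 = lag60(clipped, 1)
```

Clipping before taking the logarithm keeps the transformed values finite and bounded, which is the practical reason the text gives for the minimum and maximum values.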
The exposure distribution is strongly skewed (skewness = 3.6), which reduces the capability of the GAM model to predict the less frequent high exposure episodes. The physical properties of the black carbon measurements result in small residuals for low BC values. By applying a weighting function (WBC) for both low and high BC values, the model becomes more sensitive to the low and high exposure values. The resulting variant of the model, BC_LAG60_WBC, shows a higher deviance explained than the BC_LAG60 model (last row in Table 2). The model fit strength was tested by evaluating the total and average trip exposure prediction versus the measured BC trip exposure, aggregated by trip (Figure 1).
The total trip exposure prediction is strong (r² = 0.89, slope = 0.88); for the average trip exposure, the fit is somewhat lower (r² = 0.73, slope = 0.63). The automatic use of criteria such as AIC for selecting models has been reported previously to be potentially misleading [30]. The reduction of AIC for BC_LAG60_WBC is compensated in the model fit evaluation by improving the slope for the average trip exposure from 0.50 to 0.63 and the deviance explained from 39.1% to 46.9%. All models in the next sections will be based on the LAG60 exposure dataset with weight WBC.
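The trip-level fit evaluation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout, the function names and the regression direction (predicted on measured) are assumptions.

```python
from collections import defaultdict


def aggregate_by_trip(records):
    """Aggregate 10-s records of (trip_id, measured, predicted) per trip.

    Returns {trip_id: (measured_total, predicted_total,
                       measured_mean, predicted_mean)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for trip_id, measured, predicted in records:
        acc = sums[trip_id]
        acc[0] += measured
        acc[1] += predicted
        acc[2] += 1
    return {t: (m, p, m / n, p / n) for t, (m, p, n) in sums.items()}


def slope_and_r2(x, y):
    """Ordinary least squares slope of y on x and squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Hypothetical records, for illustration only.
records = [("a", 1.0, 2.0), ("a", 3.0, 4.0), ("b", 10.0, 10.0)]
trips = aggregate_by_trip(records)
```

Trip totals scale with trip duration, so they tend to correlate more strongly than trip averages; this is one plausible reading of why the total-trip fit reported above is stronger than the average-trip fit.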
3.2. Non-Linear In-Vehicle Exposure Characteristics
In this section, we present the non-linear aspects of all covariates. The aim is to illustrate the variability in the measurements and map the specific non-linear characteristics of all the covariates to the potential origin of the in-vehicle exposure variation (see Figure 2). The hourly traffic counts and noise attribution are presented within a single model to illustrate the relative behavior and strength. Other interactions between other sets of covariates exist, as well, but they are only described in a qualitative manner. The diurnal pattern of log (BC) is fitted by using the hour of the day as a covariate, referred to as HourOfDay. The GAM summary statistics are available in Supplementary Data Table S1.
The relative speed (actual speed divided by the local speed limit) shows a maximum below 0.5. The peak can be related to dynamic traffic with starts and stops and/or congested traffic. Short distances between vehicles in such traffic conditions result in higher in-vehicle concentrations. The drop-off at low relative speed indicates actual stops or congestion with idling vehicles. This idling effect was also visible for bicyclists in Bangalore, India [33
]. The acceleration is the weakest covariate and is numerically highly sensitive to the quality of the GPS readings, but behaves as expected: increased levels when accelerating and decreased levels with moderate deceleration. Strong deceleration, related to actual stops in traffic, shows high variability, which can be due to the short distance to the source when waiting and idling in a queue at a traffic light. The bulk of the data show little to no acceleration, and this is reflected in the very low strength of the covariate. The actual speed shows higher values at low speeds and lower values at high speeds. The increased levels at low speed can originate in congested traffic as well. The reduction of in-vehicle exposure at high speeds could be linked to the higher efficiency of the ventilation systems at higher speeds as mentioned by Xu [15
]. Higher speeds also occur in free flow traffic with higher distances between the vehicles. Other interactions with other covariates are evident; the actual speed does for example also relate to the type of road travelled. The absolute speed is lower in strength compared to relative speed, expressing the complex interactions between vehicle speed and traffic dynamics.
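The two GPS-derived covariates discussed above can be sketched as follows. The relative speed follows the definition in the text (actual speed divided by the local speed limit); deriving the acceleration as a forward difference over the 10-s sampling interval is an assumption of this example, as are the function names and units.

```python
def relative_speed(speed_kmh, limit_kmh):
    """Actual speed divided by the local speed limit."""
    return speed_kmh / limit_kmh


def accelerations(speeds_kmh, dt_s=10.0):
    """Forward-difference acceleration in m/s^2 between consecutive samples.

    dt_s is the sampling interval in seconds (10 s assumed here).
    """
    speeds_ms = [v / 3.6 for v in speeds_kmh]
    return [(b - a) / dt_s for a, b in zip(speeds_ms, speeds_ms[1:])]


# Hypothetical GPS speeds in km/h, for illustration only.
acc = accelerations([0.0, 36.0, 36.0])
```

A finite difference amplifies noise in the GPS speed readings, which is consistent with the remark that the acceleration covariate is numerically sensitive to GPS quality.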
The in-vehicle exposure decreases with high wind speeds as expected [20
]. Wind speed is by far the strongest covariate. The temperature shows a distinctive and complex pattern. Very low temperatures and moderate temperatures result in high exposure, and low and very high temperatures result in lower exposure. The high exposure at very low temperatures could relate to cold periods with high background levels, a stable atmosphere and/or increased vehicle emissions due to cold starts. The high exposure for moderate temperatures and the lower exposure for the highest temperatures can be linked to the changing ventilation settings. At moderate temperatures, fresh outdoor air is enough for cooling and refreshing the vehicle interior; at higher temperatures, air conditioning is turned on, changing the air flow drastically, while the filter removes particles [18
]. Relative humidity shows increased levels at high values. This potentially links to the light scattering properties of water-saturated BC particles, which can trigger an increased response in the aethalometer [35
]. The in-vehicle exposure increases with background concentrations, but for high background concentrations, saturation is occurring.
The large-scale spatial covariate introduced in the model through the PM10 map has an interesting feature. For low values, the covariate is not significant, but at high levels, typically near the major cities, it adds value to the model, expressing higher in-vehicle concentrations in larger cities. This links to the higher density of roads and traffic in and around the cities and can be expressed through several mechanisms (higher urban background, increased traffic congestion, etc.). The street canyon index correlates with the PM10 map, but adds additional spatial detail.
The two traffic-related data sources LDEN and weighted traffic Trafwgt are similar in strength despite the fact that LDEN is only a spatial covariate and Trafwgt is a spatiotemporal covariate. The HourOfDay covariate shows higher concentrations in the morning rush hour compared to the rush hour in the evening, matching the well-known diurnal pattern. This pattern captures the increased emission related to the modified traffic dynamics during rush hours. The HourOfDay covariate is discontinuous because no trips were performed during the night. In the evening, the traffic volumes and exposures are low. In the early morning, traffic is already significant. The data therefore include long-distance commutes along highways before rush hour. This pattern can also express the effect of the stable atmosphere on ambient concentrations. The traffic-related covariates are investigated in detail in Section 3.3.
3.3. Comparing Traffic-Related Data Sources
This section evaluates the strength of the traffic-related data sources and investigates how they relate to the diurnal pattern of the in-vehicle BC exposure. The main focus is on the contrast between traffic covariates including a diurnal pattern and traffic covariates with total daily traffic only. Adding a HourOfDay covariate will enable the models to adjust the traffic data to the in-vehicle exposure pattern and will account for the non-linear aspects between traffic and exposure. The meteorological aspects (wind speed and temperature), background concentration and the street canyon index are, as the strongest components in the BC_LAG60_WBC model, kept as fixed covariates in these model variants. The relative changes in the models with or without the HourOfDay covariate reveal the non-linear behavior between the diurnal patterns of traffic data, hour of the day and the in-vehicle BC exposure. The exercise is performed for the direct traffic attribution (in weighted number of vehicles) and the alternative approach through the noise covariates (LDEN). In Table 3, the different variants of the GAM models are defined and the matching F-values are shown. Higher F-values express an increased relative strength of the covariate compared to the other covariates within a single model. The models are sorted by deviance explained and AIC. The models and covariates including a diurnal pattern are indicated with †. The simulations were performed for two sets of models. The first set includes the GPS-based relative speed and the acceleration (prefix BC), and the second set does not include the GPS information (prefix BCR).
The models including the GPS information are the strongest, and within those models, the noise-based models are the strongest. The BC_LDAYWH model has a similar evaluation as BC_LDENWH, but the relative importance of the HourOfDay covariate does not behave as expected. The F-values of the BC_LDAYWH model are higher compared to BC_LDENWH. Since Lday,hour includes a diurnal pattern, less influence of the HourOfDay is expected (lower F-value). This indicates that a higher adjustment of the HourOfDay spline is required to achieve the same model quality. For the models excluding the GPS information, a similar pattern emerges, confirming the mismatch between the hourly traffic covariates and the in-vehicle BC exposure. Improving the temporal resolution of the traffic data does not automatically result in stronger models. The in-vehicle exposure is a complex non-linear function of the traffic. The spline of the HourOfDay covariate fits that complex relation.
As the final model, the LDEN noise map-based model BCR_LDENWH is selected. With a low number of covariates, not requiring the GPS information, it is the most generally applicable model in Table 3. The splines of the BCR_LDENWH model are shown in Figure 3. The trip fit evaluation of the BCR_LDENWH model is shown in Figure 4. The correlations and slopes are slightly reduced compared to BC_LAG60_WBC. The six covariates in BCR_LDENWH predict the trip exposure properly. The Pearson and Spearman correlations are 0.89 and 0.89 for the trip total fit and 0.70 and 0.69 for the trip average fit.
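For readers less familiar with the two correlation measures reported above, the following minimal sketch shows how they differ. It is a generic standard-library implementation, not the authors' code, and the example data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical monotonic but non-linear relation, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 4.0, 9.0, 16.0]
```

Because the Spearman correlation only depends on the ordering, similar Pearson and Spearman values, as reported for BCR_LDENWH, suggest that the trip fit is not driven by a few extreme trips.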
4. External Validation
4.1. Properties of the External Citizen Science Campaign
A large citizen science campaign for BC exposure was performed in the region of Flanders by Dons and colleagues [2
]. The in-vehicle dataset of Dons et al., referred to as external data (EXD), was used as an independent data source to validate the µLUR model. The main differences between the two measurement datasets are:
Temporal resolution: 10 s for the µLUR model versus 5-min resolution for EXD
Year of sampling: 2013 for µLUR, 2010–2011 for EXD
Season: an all-year, seasonally balanced campaign for µLUR versus an unbalanced combination of a summer campaign (six households) and a winter campaign (19 households) for EXD.
The most important similarity between the two campaigns is the availability of GPS data at a similar resolution. The map-matching post-processing was also applied to the GPS data of the external citizen science campaign. The Q1, median, mean and Q3 are 3805, 6147, 7187 and 9508 ng/m3 for the validation data (EXD) and 2040, 4146, 5644 and 7637 ng/m3 for the µLUR data. The summary statistics of the two BC campaigns differ significantly (p < 2.2 × 10^−16). The difference in the lowest values was expected due to the different temporal resolutions of the two measurement campaigns.
4.2. Validation Data Workflow
The µLUR methodology extracts spatiotemporal features from participatory campaigns and applies the micro-environment-specific model to any external mobile population [25
]. In this case, the in-vehicle trips of the external data are the micro-environment-specific activities, and they contain the recorded GPS positions and black carbon measurements. The GPS data were pre-processed to match the processing of the µLUR dataset (map matching to the road network). The temporal resolution of the black carbon data series of the external data was five minutes, which precludes evaluating the µLUR prediction at the 10-s resolution of the µLUR model. For this reason, the validation was performed at the level of the individual trips. Trips with a duration of less than 15 min were excluded from the validation data. The validation is sensitive to all meteorological influences, but the short-term variation within the trip cannot be evaluated.
4.3. External Validation
Figure 5 illustrates the external validation. The correlations are strong for the total trip prediction (Pearson 0.82 and Spearman 0.79) and reasonable for the average trip prediction (Pearson 0.50 and Spearman 0.48). The Q1, median, mean and Q3 of the relative trip fit are 32%, 49%, 67% and 74%, expressing a strong underestimation of the exposure measured in the EXD. The median is underestimated by 51%, the mean value by 33%.
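The relation between the relative trip fit and the reported underestimation can be made explicit with a short sketch. It assumes that the relative trip fit is the ratio of predicted to measured trip exposure; this definition and the function names are assumptions for illustration, chosen because they reproduce the reported 51% and 33%.

```python
import statistics


def relative_fit(predicted, measured):
    """Per-trip ratio of predicted to measured exposure (assumed definition)."""
    return [p / m for p, m in zip(predicted, measured)]


def underestimation(fit_value):
    """Fraction by which the model underestimates, given a relative fit."""
    return 1.0 - fit_value


# Values reported in the text: median fit 49%, mean fit 67%.
median_gap = underestimation(0.49)
mean_gap = underestimation(0.67)

# Hypothetical trips, for illustration only.
fits = relative_fit([50.0, 30.0], [100.0, 40.0])
median_fit = statistics.median(fits)
```

Under this definition, a median relative fit of 49% corresponds to an underestimation of 51%, and a mean relative fit of 67% to an underestimation of 33%.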
4.4. Investigating the Discrepancy
The correlation of the external validation is strong, but the µLUR fails to predict the absolute levels. This implies that the model captures the temporal variability very well, including the meteorological influences and the spatiotemporal variation of the local traffic influences. The first candidate explanation for the discrepancy in absolute levels is a potential difference in the use of the ventilation system between the two groups of participants. There were no restrictions on ventilation use in either citizen science campaign. Ventilation could bias the datasets due to the lower number of participants in the µLUR campaign, but no objective arguments can be formulated to relate this discrepancy to the ventilation use of the population sample. Ventilation is most likely not the origin of the discrepancy.
The second potential candidate is an actual decline of the emissions of black carbon. The two participatory campaigns are separated by three to four years. In 2009, more stringent EU legislation (Euro 5) reduced particulate matter emission limits for diesel vehicles from 0.025 g/km down to 0.005 g/km. The new legislation could hardly have influenced the fleet composition at the time of the external campaign in 2010 and 2011. To quantify the potential change in the vehicle fleet emission, two options are available.
The first option is to estimate the potential effect of the vehicle fleet composition on the emission of PM by 2013 based on the emission standard. The potential decay is simulated based on the changing composition of the vehicle fleet. Belgium has a relatively fast renewal of the fleet, and vehicles are typically replaced after four to five years. In 2013, 30% of the vehicle fleet was already compliant with Euro 5, resulting in a reduction of the fleet emission by 33%. This matches the mean discrepancy in the external validation.
The second option is to investigate the evolution of black carbon concentrations measured by the official air pollution monitors. Black carbon monitoring started in Flanders in 2010 in five locations near the city of Antwerpen with the main focus on industrial sources. In Antwerpen-Linkeroever, a background location was chosen (also used in the models). One measurement location was chosen close to a major road inside the city (Borgerhout background) at approximately 30 m from the roadside. At these measurement points, the average black carbon concentration dropped by 19% and 17%, respectively, between 2010 and 2013. This mean reduction does not fully explain the discrepancy in the external validation, but traffic is not the only source of black carbon concentrations. It is interesting to attempt to assess the in-traffic reduction at a higher spatial and temporal resolution. In 2012, an additional monitor was positioned by the Flemish Environmental Agency at 10 m from the main road near the Borgerhout street-side location. This illustrates that the Flemish Environmental Agency was already aware of the potentially strong distance to source effects of the black carbon concentrations. These three monitors were used to investigate the long-term evolution of the traffic-related black carbon exposure. In Figure 6, these data series are presented in diurnal patterns by year, with a boxplot for each half hour of the day (the temporal resolution of the monitoring network).
On each chart, two trend lines are shown. The red line is matched to the Q3 levels of the morning rush hour (Q3rush); the green curve is mapped to the lowest Q1 levels during the last hours of the night (Q1ngt). The slopes visualize the decrease of the black carbon concentrations for different indicators within the diurnal pattern of the black carbon. The results are summarized in Table S2 in the Supplementary Data. The yearly average reduction during the night-time hours is higher compared to the all-day levels, and the relative difference in Q1ngt increases when the measurement location is closer to the nearest road (from 20% to 68% when the distance to the road drops from 30 m down to 10 m). The evaluation on Q3rush shows a similar pattern, but the difference between the background location and the in-city background location is much smaller (a relative drop of 12%). The relative reduction at the location closest to the road during rush hour is more than 90%. This illustrates the huge spatial variability of the BC concentrations and very strong distance to source effects. It can be expected that this effect increases even further when closing in on the traffic lanes. The consequences for the in-vehicle exposure are considerable. The overall reduction during a trip inside the vehicle is the result of a complex sample of this spatiotemporal variability, sensitive to the instantaneous meteorology, background concentrations and local traffic and traffic dynamics. The yearly lowest reduction for background locations was quantified through the Q1ngt indicator (−8%) and reaches values of −10% during morning rush hour. A similar evaluation on Q3 provides a typical range for the expected reduction along the trips. The yearly reduction of Q3 for dense traffic locations was quantified through the Q3rush indicator (−10.6%). Over a three-and-a-half year period, this resulted in reductions of up to 32% in the in-city street-side location. This last value is most likely an underestimation of the actual concentration reduction near the vehicle intake when travelling in dense traffic during rush hour.
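The step from a yearly reduction to the multi-year reduction quoted above can be checked with a short calculation. The sketch below assumes that the yearly relative reduction compounds from year to year; this compounding assumption and the function name are ours, but it reproduces the reported value of about 32% for a yearly reduction of 10.6% over three and a half years.

```python
def cumulative_reduction(annual_rate, years):
    """Total relative reduction after compounding a yearly relative reduction."""
    return 1.0 - (1.0 - annual_rate) ** years


# Yearly reductions from the text: Q3rush (10.6%) and Q1ngt (8%).
rush_reduction = cumulative_reduction(0.106, 3.5)   # about 0.32
night_reduction = cumulative_reduction(0.08, 3.5)   # about 0.25
```

Under the same assumption, the background night-time indicator (−8% per year) accumulates to roughly 25% over the same period, which gives a lower bound for the range discussed in the text.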
The participants traveled mainly during rush hour, and the discrepancy of the external validation fits between the values Q1ngt and Q3rush (median minus 51% and mean minus 33%). The reduction of 50% in the median is plausible near the air-intake of the vehicles. The discrepancy in the external validation can be explained by the changes visible in the diurnal patterns of the official measurement stations of the Flemish Government. These two independent sets of information support the conclusion that the discrepancy on the median and mean can be explained by the changes of the PM emission of the vehicle fleet.
The spatiotemporal aspects of an in-vehicle black carbon exposure participatory campaign were successfully modeled using detailed spatiotemporal attribution. The measurements were attributed with meteorological data and different types of traffic-related data, including noise maps as a proxy for the traffic data. The non-linear aspects of the covariates were captured with generalized additive models (GAMs). The traffic dynamics and diurnal traffic patterns did not resolve the diurnal patterns of the in-vehicle exposure. The noise maps were not necessary to provide a solution, but have important advantages over the raw traffic data. The strongest in-vehicle exposure model was based on six parameters with LDEN in combination with a fitted diurnal pattern resolving the diurnal exposure pattern. Noise maps are therefore a valid proxy for air pollution without actual knowledge of the local traffic dynamics. Noise maps are widely available and can increase efficiency and stability in long-term land use regression approaches.
The external validation with a three- to four-year-old external participatory campaign showed a severe underestimation, which could be attributed to the introduction of Euro 5 emission standards. Data science techniques can provide successful prediction models without the need for disentangling the complex interactions of the underlying parameters. The µLUR methodology has the potential to attribute epidemiological databases and provide high quality policy support without requiring full analytical solutions for the in-vehicle air pollution exposure.