NO2, BC and PM Exposure of Participants in the Polluscope Autumn 2019 Campaign in the Paris Region

The Polluscope project aims to better understand personal exposure to air pollutants in the Paris region. This article is based on one campaign from the project, which was conducted in the autumn of 2019 and involved 63 participants equipped with portable sensors (measuring NO2, BC and PM) for one week. After a phase of data curation, analyses were performed on the results from all participants, as well as on individual participants’ data for case studies. A machine learning algorithm was used to allocate the data to different environments (e.g., transportation, indoor, home, office, and outdoor). The results of the campaign showed that the participants’ exposure to air pollutants depended strongly on their lifestyle and on the sources of pollution that may be present in the vicinity. Individuals’ use of transportation was found to be associated with higher levels of pollutants, even when the time spent on transport was relatively short. In contrast, homes and offices were the environments with the lowest concentrations of pollutants. However, some indoor activities (e.g., cooking) also showed high levels of pollution over a relatively short period.


Introduction
Air pollution has become a significant public health concern today, responsible for approximately 50,000 deaths per year in France [1,2]. Exposure to air pollutants can be chronic or acute and lead to serious health problems, such as rhinitis and asthma [3][4][5]. Such adverse effects can occur at lower concentrations than previously determined, demonstrated by the new thresholds recommended by the World Health Organization (WHO) ambient air quality guidelines [5,6]: 10 µg·m −3 per year and 25 µg·m −3 over 24 h for nitrogen dioxide (NO 2 ), 5 µg·m −3 per year and 15 µg·m −3 over 24 h for particulate matter ≤2.5 µm (PM 2.5 ) and 15 µg·m −3 per year and 45 µg·m −3 over 24 h for particulate matter ≤10 µm (PM 10 ) [6].
Monitoring the concentrations of the various regulated or tracked pollutants in outdoor air is generally carried out by air quality networks through ambient measurements at fixed stations, which allows concentrations to be measured with high accuracy [6,7]. Although these measurements are used for personal exposure assessments (PEs), they are not able to correctly represent the indoor pollutant concentration variability that populations are exposed to [8]. With the development of portable sensors, it has become possible to measure the concentrations of pollutants to which a person is exposed to throughout the their schedule logbook. In addition, most participants in the campaign were volunteers from partner institutions, i.e., with a scientific background, who agreed to test the protocol during the campaign.
Based on the lessons learned from previous work, we designed an ambitious protocol for new campaigns in order to go beyond the limitations mentioned above [10]. More precisely, this research was based on data collected with the collaboration of 63 volunteer citizens over five weeks. These citizens were recruited through a call for participatory science. The issue of incomplete schedules was overcome using a machine-learning tool that we developed. Additional experiments were conducted to better characterize specific environments. The primary goal of this study was to refine the quantification, and improve the understanding, of the PE of citizens in the Île-de-France region (IDF).

Presentation of the Campaign
The Polluscope campaign took place in the autumn of 2019 in IDF. It lasted five weeks, from mid-October to mid-December, and involved 63 volunteers. The volunteers wore two or three sensors in a backpack, depending on the availability of sensors. In addition, they had a tablet at their disposal to record their different activities. The participants wore the sensors for one week. Between each week of measurements, there was a week break so that the sensors could be checked, and the data retrieved.
The AE51 (AethLabs, San Francisco, CA, USA) was used to measure BC; its measurement principle is based on infrared absorption on a filter [31]. The deposit on the filter is measured with a light-emitting diode (LED) at 880 nm and a photodiode detector. The absorbance of the deposit is compared to that of a blank portion of the filter, which serves as a reference. Both measurements are made with the same time step. During the campaign, only six AE51 units were used because of their relatively high cost. The AE51 was placed in the backpack; the air was drawn through a short sampling line at a flow rate of 150 mL/min.
The Cairsens sensors (Envea Cairpol Microsensors, Poissy, France) were used to measure NO 2 [32]. The measurement principle of this sensor is based on electrochemistry [32]. It has a miniaturized measuring cell composed of three electrodes: the anode, the cathode, and the reference electrode [32]. The electrical signal generated between the anode and the cathode is proportional to the concentration. Fifteen sensors were used during the campaign; the sensor had a dynamic air-sampling system, and it was hung outside the backpack.
The Canarin II was used to measure PM1, PM2.5, and PM10; its measurement principle is based on laser light scattering [33]. This sensor is the result of a joint project between the Laboratoire d'Informatique de Paris 6 located at the Université Pierre et Marie Curie in Paris, the Asian Institute of Technology in Bangkok, and the Macao Polytechnic Institute in Macao, China [33]. A laser irradiates the suspended particles, the scattered light is collected at a given angle, and the curve of its variation over time is obtained. The equivalent particle diameter and the number of particles per unit volume in each size range can then be calculated. This sensor is equipped with a GPS and a Wi-Fi interface to send the data. Fifteen sensors were used during the campaign, and, like the Cairsens, they were attached to the outside of the backpack. A more detailed description of the sensors is given by Languille et al. (2020) [34]. The time resolution was set to one minute for all sensors, which provided 3,504,418 measurement points.

Sensor Qualification Tests
The first phase (2017-2018) of the Polluscope project was devoted to selecting and qualifying sensors. All these tests are described by Languille et al. (2020), and therefore only a very brief summary is given here [34]. The first test performed was the fixed measurement test, for which the various sensors were placed at the SIRTA site. This site is part of the European research infrastructure ACTRIS (Aerosols, Clouds, and Trace Gases Research Infrastructure). It is a peri-urban monitoring station located about 20 km southwest of Paris. The sensors were fixed at the entrance of the SIRTA site at a height of 2.50 m, in the open air under a shelter so that they were not exposed to rain. The sensors were permanently connected so that there would be no data loss due to battery failure. The test lasted one week in order to obtain a sufficiently large dataset and a good representation of the sensors' capabilities [34]. The sensors were then tested in a controlled atmosphere and in mobility conditions [34].
After the selection tests (2017) and the purchase of all sensors (2018), several qualification tests were performed. Here, we show the results from the 2019 qualification (the closest to our measurement campaign period), whereas the qualification results presented by Languille et al. (2022) were performed in 2018 [10]. The Canarin II and the AE51 sensors were qualified at the SIRTA-ACTRIS site with the same protocol as the fixed measurement test performed for the selection test. The Cairsens was qualified by Airparif at a measurement station classified as a "traffic proximity" station because the NO 2 concentrations at the SIRTA site were too low and often below the detection limit of the Cairsens sensors (20 ppb). The concentrations measured by the sensors were then compared to the measurements of reference instruments. The Cairsens was compared to the 42i analyzer (Thermo Fisher Scientific, Waltham, MA, USA); for the Canarin the PM 1 was compared to a TEOM PM 1 1400 coupled to an FDMS 850 module (Thermo Fisher Scientific, Waltham, MA, USA), the PM 2.5 to a TEOM PM 2.5 1400 coupled to an FDMS 850 module (Thermo Fisher Scientific, Waltham, MA, USA) and the PM 10 to a TEOM PM 10 1400 coupled to an FDMS 850 module (Thermo Fisher Scientific, Waltham, MA, USA). In addition, the Canarin data were compared to the data from FIDAS 200 (PALAS, Karlsruhe, Germany), which measures all PM. The AE51 sensors were compared to the Aethalometer Model AE33-7 (MAGEE Scientific, Berkeley, CA, USA). All reference instruments had a time step of 1 min, except for the TEOM PM 1 1400 coupled to an FDMS 850 module and the TEOM PM 2.5 1400 coupled to an FDMS 850 module which had a time step of 15 min, while the TEOM PM 10 1400 coupled to an FDMS 850 module had a time step of 5 min; therefore, the data from the sensors had to be averaged for comparison. Most of the reference instruments are described in more detail by Petit et al. (2015) [35].
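The averaging of the 1 min sensor data to the coarser time step of a reference instrument can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name `block_average`, the hour-aligned windows, and the sample readings are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def block_average(samples, step_minutes):
    """Average (timestamp, value) pairs over fixed windows of step_minutes.

    Assumption: windows are aligned on the hour, so step_minutes should
    divide 60 (e.g., 5 or 15). Results are keyed by window start time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to the start of its window.
        floored_minute = (ts.minute // step_minutes) * step_minutes
        start = ts.replace(minute=floored_minute, second=0, microsecond=0)
        buckets[start].append(value)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


# Hypothetical 1 min PM2.5 readings covering 30 minutes.
t0 = datetime(2019, 11, 4, 8, 0)
readings = [(t0 + timedelta(minutes=i), 10.0 if i < 15 else 20.0)
            for i in range(30)]

# Two 15 min averages, comparable with a 15 min reference instrument.
averaged = block_average(readings, 15)
```

The same function with `step_minutes=5` would give averages comparable with a 5 min reference time step.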
After these various tests, the evaluation algorithm developed by Fishbain et al. (2017) was used to derive a performance index (IPI) for each sensor. The closer the index is to 1, the more reliable the sensor is (Table 1). These indices are calculated according to seven different metrics: root-mean-square error (RMSE), three correlation coefficients (Pearson, Kendall, Spearman), the ratio of recorded over missing data, match score, and low-frequency energy (LFE). An average value is given for each type of sensor for the 2019 qualification campaign (Tables S1-S5). The results of the Pearson correlation coefficients (Table 1) showed that for Canarin the uncertainty was about 10% for PM1 and PM2.5, and 20% for PM10 (Tables S1-S3). For AE51, the uncertainty was also about 20% (Table S4), whereas for the Cairsens the uncertainty was more significant, around 35% (Table S5). Although relatively low for portable sensors, these uncertainties were estimated for fixed measurements in outdoor air. They may be higher when measurements are performed in mobility and in environments such as railway stations (see above). Another source of error could come from the influence of humidity on PM measurements, as no correction was made on PM, even at high humidity. Nevertheless, it is noteworthy that the median and P95 values of relative humidity were similar for the campaign and qualification phase (medians of 37.3% and 34.5%; and P95 of 53.9% and 48.2%, respectively). The Canarin II has a time resolution of about 1 min and works on the principle of an optical counter. This can lead to artifacts in mass concentration measurements. The conversion of particle numbers to mass involves a density factor, which may be more or less accurate depending on the chemical nature of the aerosols. In particular, PM is known to be more heavily loaded with metals in underground railway stations, leading to a higher particle density [36].
For PM2.5, a density between 2.2 and 3.1 has been estimated in underground stations, whereas the density is usually between 1 and 2.3 in ambient air [36,37]. Therefore, a correction factor of ~2 should be applied to Canarin PM2.5 measurements in underground stations (a higher correction factor should be applied for PM10). However, no correction was made because it was difficult to determine the precise times when participants were in these environments (e.g., underground railway stations). Nevertheless, participants spent a total of 11.5 h in the subway (which was only 0.78% of the total time), and of this time, only a tiny fraction was spent in the station itself. The time spent in the subway was relatively short because most of the participants lived in Versailles, and the subway was mostly used by the few participants who worked in the city of Paris.

Data Analysis
The data analysis was performed in three main steps: pre-processing, environment assignment, and validation.

Pre-Processing Phase
Portable sensor measurements are often noisy and may contain outliers, which inevitably biases the analysis of raw data. The pre-processing step aims to detect and eliminate as many artifacts as possible. We implemented an algorithm to detect and remove such artifacts by adapting the peak detection approach [38]. The algorithm detects a sudden peak in the values of a time series, i.e., a value that is exceptionally high compared to the average of the preceding and following values within a time window. Precisely, $s_i$ at time $i$ is a peak if $s_i > f \cdot \operatorname{mean}(\{ s_j \mid i-k \le j \le i+k,\ j \ne i \})$, where $f$ is a given factor and $k$ determines the window size around $s_i$. We empirically set $k$ and $f$ to 2, corresponding to a time window of five minutes and to peaks more than twice the mean value. In addition, pre-selected threshold values were implemented in the algorithm so that out-of-range values were removed (see Table S6 and the corresponding supplement). Some other rule-based processing was performed. The PM data showed some inconsistencies, so a condition was added to enforce PM1 ≤ PM2.5 ≤ PM10. The data that did not meet this condition were discarded. The data were also verified visually by generating graphs by week or day. This further control showed that the first minutes of data given by the Cairsens could be out of line. This was probably due to the device's warm-up, so the first three minutes of data were removed for the Cairsens sensors. During this phase, about 10% of the data (with a maximum of 11% for PM10) were deleted.
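The peak criterion and the PM ordering rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the truncation of the window at the ends of the series, and the sample values are hypothetical.

```python
def is_peak(series, i, k=2, f=2.0):
    """True if series[i] exceeds f times the mean of its neighbours.

    Neighbours are the values at indices i-k..i+k, excluding i itself.
    Assumption: near the ends of the series the window is simply truncated.
    """
    lo, hi = max(0, i - k), min(len(series) - 1, i + k)
    neighbours = [series[j] for j in range(lo, hi + 1) if j != i]
    if not neighbours:
        return False
    return series[i] > f * (sum(neighbours) / len(neighbours))


def remove_peaks(series, k=2, f=2.0):
    """Replace detected peaks with None; every point is tested on the raw series."""
    return [None if is_peak(series, i, k, f) else v
            for i, v in enumerate(series)]


def pm_consistent(pm1, pm25, pm10):
    """Physical ordering rule: PM1 <= PM2.5 <= PM10."""
    return pm1 <= pm25 <= pm10


# Hypothetical 1 min NO2 values with one isolated spike.
no2 = [12.0, 13.0, 11.0, 95.0, 12.0, 14.0, 13.0]
cleaned = remove_peaks(no2)
```

With k = 2 and f = 2, the spike of 95 is compared with the mean of its four neighbours (12.5) and is flagged, while the values next to it are kept even though their own neighbourhood mean is inflated by the spike.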
Unlike the data from the other two sensors, the AE51 data contained some negative values. We chose not to discard the small negative values in order to keep track of the measurement variability. The limit of detection (LOD) of the sensor was evaluated at about 1500 ng·m−3, corresponding to three times the standard deviation of the signal measured during a period corresponding to background levels. We set the negative threshold of the instrument at −1500 ng·m−3 in relation to the LOD.
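As a brief illustration of this rule, the sketch below derives an LOD as three times the standard deviation of a background signal and keeps BC values above the negative threshold. The function names and the sample values are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics


def detection_limit(background):
    """LOD as three times the standard deviation of a background signal."""
    return 3 * statistics.stdev(background)


def keep_bc(values, lod):
    """Keep BC values greater than or equal to the negative threshold -LOD."""
    return [v for v in values if v >= -lod]


# Hypothetical background signal (ng/m3) and BC readings (ng/m3).
lod = detection_limit([-500.0, 0.0, 500.0])
kept = keep_bc([-2000.0, -300.0, 0.0, 800.0], lod)
```

Small negative readings such as −300 ng·m−3 are retained, which preserves the measurement variability around zero, while values below −LOD are dropped.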
GPS data are also imprecise due to noise in GPS signals. We implemented a basic GPS data denoising by calculating the speed of motion between two consecutive points. If it exceeds a certain threshold, the second point is removed (we used a threshold of 130 km/h, as the campaign was conducted in urban and suburban areas). Once the data were denoised, we computed the mean speed per minute and used the generated speed time series in the next step. Subsequently, the subset of GPS coordinates that matched the retained sensor measurements was added, which drastically reduced the data volume because of the difference between the GPS sampling rate (almost 1 Hz) and the per-minute sampling of the other sensors.
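The speed-based denoising can be sketched as follows. This is an illustrative sketch and not the project's code: the haversine distance, the comparison of each point with the last retained point (rather than with the raw previous point), and the sample track are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def denoise(track, max_kmh=130.0):
    """Drop points implying a speed above max_kmh.

    track is a list of (t_seconds, lat, lon) sorted by time.
    Assumption: each point is compared with the last point that was kept.
    """
    kept = []
    for point in track:
        if kept:
            t_prev, lat_prev, lon_prev = kept[-1]
            dt_hours = (point[0] - t_prev) / 3600.0
            if dt_hours <= 0:
                continue  # duplicate or out-of-order timestamp
            distance = haversine_km((lat_prev, lon_prev), point[1:])
            if distance / dt_hours > max_kmh:
                continue  # implausible jump: remove the second point
        kept.append(point)
    return kept


# Hypothetical 1 Hz track near Versailles with one GPS jump at t = 2 s.
track = [(0, 48.8000, 2.1300), (1, 48.8001, 2.1300),
         (2, 48.9000, 2.1300), (3, 48.8002, 2.1300)]
clean_track = denoise(track)
```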
The pre-processing phase ended with the fusion of the data into a single table containing all the time series. Note that a row is kept in this table whenever it has at least one sensor measurement. Labels are added to the dataset for training the machine learning model. Since home and office generate many more samples in the table than outdoor and transport activities, the classes are imbalanced, and classifiers trained on an imbalanced dataset tend to be biased towards the majority class. Class-balancing techniques are therefore needed. We used a combination of under-sampling of the majority classes and over-sampling of the minority classes based on a data augmentation algorithm. Precisely, we applied the synthetic minority oversampling technique (SMOTE), which under-samples the majority class and over-samples the minority one by randomly generating new samples close to the border of the minority class data (Figures S1 and S2) [39]. Then, we applied the time series generative adversarial network (TimeGAN) to generate more diverse, realistic time series while considering the temporal characteristics of the data in the minority class [40]. Generative adversarial networks (GANs) have shown promising performance on various types of data, including time series [41].
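The over-sampling half of SMOTE can be illustrated with the basic interpolation scheme: a synthetic sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The sketch below is only that basic scheme under stated assumptions; it does not under-sample the majority class, does not reproduce TimeGAN, and the function name, the parameters `k` and `seed`, and the sample features are hypothetical.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    minority is a list of equal-length numeric tuples (at least two samples).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class ("transport") feature vectors: (speed, NO2).
transport = [(1.0, 30.0), (2.0, 35.0), (1.5, 32.0), (3.0, 40.0)]
new_samples = smote_oversample(transport, n_new=6)
```

Because each synthetic sample is a convex combination of two real minority samples, it always stays within the range spanned by the minority data.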

Environment Assignment
PE strongly depends on the environment. For this reason, there is great interest in making exposure analysis context aware. However, context annotation is the most complicated information to collect in a real-life application setting since only a few participants thoroughly reported their activities during the campaign. Therefore, there is a great interest in automatically detecting the context without burdening the participants.
Furthermore, the plots of data collected during the preliminary tests exhibit patterns specific to some environments. Based on this observation, one may consider the time series of sensor data as predictors of the environment. Therefore, we designed a machine learning algorithm and trained a model using a manually annotated dataset with the respective environment. Then, we applied this model to assign the environment to the sensor measurements of the other participants. This leads to solving a classification problem on time series. The overall process is schematized in Figure 1.
In our machine learning algorithm, the model takes the available measurements (i.e., "temperature", "humidity", "PM1", "PM2.5", "PM10", "NO2", "BC", and "speed") as inputs and outputs the environment (context) of the participant.
To build the model, we adopted multi-view learning, which consists of two stages: a first-level learner is trained on each view (here, each time series) separately, then a meta-learner is trained on the concatenation of the prediction outputs (both the predicted class and its probability) of the first-level learners [42]. A concrete example is given in Table 2 below. Thus, the meta-learner predicts the environment by combining the results from the first-level learners, which enhances the global accuracy of the classification. We trained our multi-view learning model and tested it using a part of the Polluscope data collected in a previous campaign on the RECORD cohort [43]. These data were carefully checked using a dedicated interactive tool, manual verification, and corrections. They therefore provide a reliable ground truth for the training and evaluation of our results. The model reached 91% accuracy on the testing set.
For more details, see El Hafyani et al. (2022) [44].
Table 2. Input raw data of the meta-learner. This example shows the output of the first-level learners on four views (temperature, humidity, speed, and NO2), the predicted environment, and the probability. For instance, the first-level learner predicted "transport" with a probability of 0.6.
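The structure of the meta-learner input can be sketched as follows: the predicted class and probability of each first-level learner are concatenated into one row. In this sketch, the probability-weighted vote is only a hypothetical stand-in for the trained meta-learner of the study, and all values other than the "transport" prediction with probability 0.6 are invented for illustration.

```python
from collections import defaultdict

# The four views used in the Table 2 example.
VIEWS = ("temperature", "humidity", "speed", "NO2")


def meta_input_row(first_level):
    """Concatenate (predicted class, probability) of each view, in a fixed order."""
    row = []
    for view in VIEWS:
        label, probability = first_level[view]
        row.extend([label, probability])
    return row


def weighted_vote(first_level):
    """Stand-in combiner: sum the probabilities per class and take the largest.

    Assumption: the actual meta-learner is a trained classifier, not a vote.
    """
    scores = defaultdict(float)
    for label, probability in first_level.values():
        scores[label] += probability
    return max(scores, key=scores.get)


# Hypothetical first-level outputs for one time step.
outputs = {
    "temperature": ("transport", 0.6),
    "humidity": ("indoor", 0.5),
    "speed": ("transport", 0.9),
    "NO2": ("outdoor", 0.4),
}
row = meta_input_row(outputs)
predicted = weighted_vote(outputs)
```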

Post-Processing/Validation
A post-processing phase was added to the machine learning (ML) process to further enhance the prediction accuracy. To segment the stops, we partitioned the data points according to a grid and calculated the density per grid cell. The stops were the cells with the highest density. Other a priori rules were employed to correct the segments misclassified by the model. For instance, the home locations are the densest cells between 2 a.m. and 5 a.m. With the post-processing, the accuracy improved to 93.4%.
Some GPS data of this campaign around Versailles were spot-checked using a mapping tool to validate the ML allocations. For instance, if the ML algorithm predicts an office but the GPS position coincides with a park, this indicates a misclassification. Averages were calculated by environment and pollutant type. This also allowed the assignments given by the ML algorithm to be verified.
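The grid-based stop detection and the night-time rule for the home location can be sketched as follows. This is a minimal illustration: the cell size of 0.001° and the sample points are assumptions, since the grid resolution used in the study is not given here.

```python
import math
from collections import Counter


def cell_of(lat, lon, size_deg=0.001):
    """Index of the grid cell containing a point (cell size is an assumption)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def densest_cell(points, hours=None, size_deg=0.001):
    """Most populated grid cell, optionally restricted to given hours of the day.

    points is a list of (hour_of_day, lat, lon) tuples.
    """
    counts = Counter(
        cell_of(lat, lon, size_deg)
        for hour, lat, lon in points
        if hours is None or hour in hours
    )
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical points: three at night in one place, five at midday in another.
night = [(h, 48.80125, 2.13045) for h in (2, 3, 4)]
day = [(h, 48.84555, 2.35555) for h in (10, 11, 12, 13, 14)]
points = night + day

# Home rule: densest cell between 2 a.m. and 5 a.m.
home = densest_cell(points, hours=range(2, 5))
main_stop = densest_cell(points)
```

Restricting the density count to the night-time hours separates the home cell from the cell that is densest over the whole day.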

Measurement from the Air Quality Network
The data were also compared with observations from the Airparif fixed monitoring stations (Figure 2) to assess how well outdoor measurements represent PE. The Airparif stations are positioned to represent the different types of environments (urban/suburban background, rural and road traffic) and to assess the spatial variability of atmospheric pollutants over the IDF region. residential area at about 50-60 m on both sides of two main roads. NO2, PM10, PM2.5, and BC measurements are performed at this site. The Gennevilliers and Paris XIII stations are about 20 km north-east of Versailles. The station of Versailles (48°47′58″ N, 02°07′53″ E, 125 m altitude), operated by Airparif, is representative of a suburban background situation. This site is also located in a residential area with parking facilities near the train station and about 100-200 m from two major departmental roads. These three stations are relevant for evaluating outdoor PE measurements.

Additional Experiments in Specific Environments
To better understand the campaign's results, additional experiments were conducted with the same sensors as those used during the campaign (IPI indices in Table 1). These experiments were conducted in the regional train (RER), metro, bus, streetcar, three cars (gasoline and diesel), and indoors.
These results are presented in supplement S1 and helped to interpret the results from the 2019 Polluscope campaign.

Results of All Participants in the Autumn 2019 Polluscope Campaign
The results presented here are for all participants combined. They therefore represent the results measured by the 63 participants during the five weeks of the campaign. Results are presented at different time resolutions: daily means (to be compared with the WHO recommendations), hourly means and medians (to be representative of PE over an integrated time period), and minute values (to represent PE to high concentrations) (Table 3). Figure 3 and Table 3 show the results as daily averages. Only the days with more than 20% data completion were considered, for representativeness purposes. These results were compared to the daily averages from fixed stations (Versailles, Gennevilliers, Paris XIII) and to the WHO recommendations.
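The daily averaging with a completeness criterion can be sketched as follows. This is an illustrative sketch only: it assumes that "data completion" means the fraction of the 1440 one-minute slots of a day that hold a valid value, and the function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MINUTES_PER_DAY = 1440


def daily_means(samples, min_completeness=0.20):
    """Daily mean of 1 min values, keeping only sufficiently complete days.

    samples is a list of (timestamp, value) pairs; value may be None.
    Assumption: completeness = valid minutes / 1440.
    """
    by_day = defaultdict(list)
    for ts, value in samples:
        if value is not None:
            by_day[ts.date()].append(value)
    return {
        day: sum(values) / len(values)
        for day, values in sorted(by_day.items())
        if len(values) / MINUTES_PER_DAY > min_completeness
    }


# Hypothetical NO2 data: 300 valid minutes on day 1, only 100 on day 2.
day1 = datetime(2019, 11, 4, 8, 0)
day2 = datetime(2019, 11, 5, 8, 0)
samples = [(day1 + timedelta(minutes=i), 18.0) for i in range(300)]
samples += [(day2 + timedelta(minutes=i), 40.0) for i in range(100)]
means = daily_means(samples)
```

Here day 1 (about 21% complete) is kept, whereas day 2 (about 7% complete) is excluded from the daily statistics.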
Figure 3 shows that the 75th percentile of the participant PE was lower than the WHO recommendations for the three regulated pollutants (NO2, PM2.5, PM10). For NO2, the concentrations measured by the participants are close to those measured by the Versailles station, with a mean of 18 µg·m−3. The concentrations measured by fixed stations (Airparif) in Paris and Gennevilliers are higher (around 30 µg·m−3) due to more significant sources, including higher housing density and heavier traffic, particularly in and near Paris. There are no PM2.5 or PM10 measurement stations in Versailles, so the participants' measurements were compared to those of Gennevilliers (located about 20 km from Versailles). For PM2.5, the median and mean concentrations measured by the participants are close to those measured by Airparif; the mean concentrations for both are around 9 µg·m−3. On the other hand, for PM10, the values measured in Gennevilliers by Airparif are on average 17 µg·m−3, while the concentrations measured by the participants are on average 10 µg·m−3 [42]. The BC concentrations measured by the participants are generally lower than those measured by the fixed stations (Table 3).
Figure 3. Boxplot of the concentrations measured during the Polluscope campaign and by the fixed stations of Airparif (Gennevilliers, Paris XIII, Versailles); all data are daily averages. The whiskers extend from p10 to p90, the box from p25 to p75, and the median is plotted as the middle line. The black circle represents the mean. The WHO recommendation value is also shown.
This first analysis shows that the results of this study are close to the concentrations that could be measured in a suburban environment (Versailles) at that time, consistent with the volunteers' location during the measurements (Figure 4).
The data from the fixed stations (Airparif) were only measured outdoors [42]. However, it is crucial to consider that the participants' measurements were made in different environments (indoor, outdoor and transport). Figures 4 and S3 show maps of the NO2 and BC measurements, respectively, from all participants. The maps are zoomed in on the city of Versailles and its surroundings because most of the participants lived there.
The highest concentrations were located by major roads, with concentrations between 60 and 100 µg·m −3 by the freeways for NO 2 and between 4500 and more than 10,000 ng·m −3 for BC. The variability in concentrations was more significant for national and secondary roads than for highways. This is because car flows are more constant and more numerous on freeways. For national and regional roads, car flows are generally less dense. In the Versailles Park and La Boulie (circled in black in Figure 4), concentrations ranged between 0 and 20 µg·m −3 for NO 2 and between 0 and 2000 ng·m −3 for BC. These locations are relatively far away from major roads, and the impact of road traffic on the measured concentrations is, therefore, low. Moreover, NO 2 and BC are primary pollutants with short lifetimes, so they only have little time to accumulate in the atmosphere [43,44]. Away from the source, concentrations decrease rapidly. In Versailles, concentrations varied between 0 and 80 µg·m −3 for NO 2 and between 1500 and 3000 ng·m −3 for BC, but they mainly remained low. The concentrations in the city center remained higher than in La Boulie or the Park of Versailles. This is explained by the fact that the emissions of NO 2 and BC come mainly from road traffic and the residential sector [21, 22,45]. As the campaign took place in the autumn of 2019, heating was also used more frequently.
These results provided information on what the participants were exposed to during the campaign and show that their exposure was relatively low. For NO 2 , the concentrations from the Versailles station and those measured by the participants during the campaign are similar, which suggests that, in some conditions, the outdoor concentrations monitored by the network are representative of the mean exposure of citizens.
The results were then separated according to the different environments to understand better where the participants were most exposed.

Results for the Participants According to the Environments
Based on the ML methodology described in Section 2.3, we separated the measured concentrations into five different environments: home, office, indoor (e.g., restaurant, train station, store), outdoor, and transport (e.g., car, train, subway).
Additional experiments were performed to better understand the concentrations and variability measured in the different environments. The results of the experiments performed in a kitchen, a car, and the subway are presented in Supplementary S1.2.
The case study presented below shows the results of the ML and the concentrations of one participant measured during their whole week of participation. Then, the results of all participants combined according to the different environments are presented.
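The per-environment summary described above amounts to grouping labelled measurements and averaging each group. The following is a minimal sketch of that step, not the processing chain used in the project; the function name `mean_by_environment` and the sample values are hypothetical and are not campaign data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (environment, NO2 in µg·m−3) samples; NOT campaign data.
samples = [
    ("home", 15.0), ("home", 21.0),
    ("office", 30.0), ("office", 26.0),
    ("transport", 50.0), ("transport", 40.0),
    ("outdoor", 44.0),
    ("indoor", 35.0),
]

def mean_by_environment(records):
    """Return {environment: mean concentration} for labelled samples."""
    grouped = defaultdict(list)
    for environment, concentration in records:
        grouped[environment].append(concentration)
    return {env: mean(values) for env, values in grouped.items()}

no2_means = mean_by_environment(samples)
```

The same grouping could be repeated for each pollutant to build a summary such as the one shown per environment and per pollutant.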

Case Study
The time series of pollutant concentrations measured by the sensors worn by participant 71 is presented in Figure 5. The bottom panel represents the environments reported by the participant, while the other panels (PM 2.5 , NO 2 , and BC) display the pollutant concentrations with the environments recalculated by the ML. The figure shows that the reported environment is absent at the beginning (in white) but that the algorithm recovered it from the PM 2.5 , temperature, and humidity time series (the latter two are not shown in this graph). We can also see that the model predictions are more reliable than the reported environment. For example, the participant reported about 10 h of transport on November 1, which is unlikely. The ML replaced this with a sequence of short transport and indoor episodes.
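An implausible self-report such as the 10 h transport episode can also be spotted with a simple duration check before any model is applied. The sketch below is only an illustrative sanity check and is not the ML algorithm used in the study; the function `long_episodes`, the 10-minute reporting step, and the 3 h threshold are all assumptions chosen for the example.

```python
from itertools import groupby

def long_episodes(labels, step_minutes, environment, max_hours):
    """Return (start_index, duration_hours) for each uninterrupted run of
    `environment` that lasts longer than `max_hours`."""
    flagged = []
    index = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        hours = length * step_minutes / 60
        if label == environment and hours > max_hours:
            flagged.append((index, hours))
        index += length
    return flagged

# Hypothetical self-reports at a 10-minute step:
# 2 h at home, 10 h of transport, 1 h at the office.
reported = ["home"] * 12 + ["transport"] * 60 + ["office"] * 6
flagged = long_episodes(reported, step_minutes=10,
                        environment="transport", max_hours=3)
```

Episodes flagged this way would still need to be relabelled by another method; the check only identifies where the reported environment is doubtful.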

According to the allocations made by the ML, the concentrations measured at home and at the office are relatively low, although pollution peaks are sometimes observed due to specific activities (e.g., cooking, wood heating) (experiments in the Supplementary Materials and [28]). The participant spent the majority of their time in these two environments. In contrast, the concentrations measured indoors are fairly high: this participant potentially frequented restaurants, and cooking is an emitter of PM, which explains these concentrations (indoor experiment and [45]). Finally, the concentrations of NO 2 and BC are relatively high because the participant mainly uses their car as a means of transportation. A substantial increase in NO 2 and BC is generally observed in combustion cars (experiments in Supplementary S1 and [21,46]). BC concentrations are particularly high during traffic jams, as already seen in a study of near-road exposure to BC in Korea [47].

Results for All Participants
Figure 6 represents the concentrations measured by the participants according to the environments and the different pollutants.
A mean BC concentration of about 850 ng·m −3 was measured; this value is close to the 800 ng·m −3 measured during our survey [28]. The measured PM concentrations are slightly higher than those measured for offices because some activities (e.g., cooking) can emit PM (indoor experiment and [22]). However, NO 2 concentrations are higher in offices, which can be explained by the fact that offices are usually closer to major roads, unlike homes, which are often located in residential areas [47]. Therefore, it is likely that outdoor air influences indoor air [47]. In comparison, in the review by Salonen et al. (2019), the average NO 2 concentration measured in offices was 25.1 µg·m −3 [48], which is very close to the 28 µg·m −3 measured in our campaign.
Unlike in the home and the office, the concentrations measured for all pollutants in the indoor environment are comparable to those in the outdoor and transport environments. The PM concentrations are the highest of all environments, because this environment includes places such as restaurants and train stations that contain PM emitters (Section 3.1 and [22,36]). Indoor concentrations of BC may come mainly from restaurants.
For transport, the concentrations are close to those measured outdoors. All types of transport are mixed in this environment, but many participants used their cars. In a study by Mehel on passenger compartment air quality in the Paris region, NO 2 concentrations were lower in the cabin than outdoors (80 µg·m −3 versus 117 µg·m −3 ) [48]. For PM 1 , the concentrations were equivalent, i.e., 25 µg·m −3 in the cabin and 23 µg·m −3 outdoors [49]. The study by Mehel also showed that concentration changes inside and outside the vehicle occurred almost simultaneously [48]. The concentrations are not directly comparable with ours because the Mehel study only involved cars on freeways, on the ring road, or in tunnels, i.e., particularly polluted areas of the Paris region. However, some common points can be highlighted. For NO 2 , the general trend is the same as in the Mehel study: outdoor concentrations are higher than those in transport, which may be because the vehicle cabin protects occupants from part of the pollution. Indeed, vehicles can be equipped with cabin filters that reduce pollution in the car interior. Moreover, measured PM 10 concentrations may need to be considered (Section 2.2). For PM 1 , the general trend is also the same: the concentrations measured during the campaign are very close, 8 µg·m −3 for transport and 9 µg·m −3 for outdoors. For BC, the concentrations are also higher outdoors.
The concentrations measured outdoors are the most representative of the influence of road transport because the sensors are then directly exposed to outdoor emissions. For comparison, the average NO 2 concentration measured by the Versailles station was calculated over the same period as the campaign: the mean NO 2 concentration from October to December 2019 was 18 µg·m −3 . The difference from the 44 µg·m −3 determined during our campaign may be due to the volunteers traveling outside Versailles and closer to Paris, where NO 2 concentrations were higher. In addition, the Versailles station is a background station, so it is not directly influenced by road traffic. Over the campaign period, the average NO 2 concentration measured at a background station in Paris was 31 µg·m −3 . To compare with another city studied in the literature, in a study conducted by Garcia et al. in Lisbon, Portugal, the average NO 2 concentrations ranged from 29 µg·m −3 to 47 µg·m −3 [50]. These concentrations are comparable to those measured during the campaign (44 µg·m −3 ) and reflect the urban environment of both studies.

Conclusions
This study assessed PE to NO 2 , PM 1 , PM 2.5 , PM 10 , and BC. These pollutants were measured using portable sensors that 63 participants wore during a campaign that took place in autumn 2019 in the IDF region.
To analyze the participants' data, a study of the temporal and spatial variabilities was carried out, as well as analyses according to concentration, pollutants, and environments. The results of this campaign showed that the concentrations measured by the participants were very dependent on their environments and activities. An ML algorithm was used to allocate the participants' data to environments. These allocations were calculated from the PM concentrations, temperature, and humidity measured by the Canarin. The allocations calculated by the ML were more reliable than those reported by the participants. NO 2 concentrations in transport were high (mean of 45 µg·m −3 ), even though participants spent little time there (4% on average). In contrast, participants spent most of their time at home (68%), but the concentrations in this environment were generally low (9 µg·m −3 for PM 10 and 18 µg·m −3 for NO 2 ). However, indoor pollution peaks caused by various activities (e.g., cooking) were observed. This study also showed that portable sensors are of great interest, provided that their measurements are combined with a powerful data processing system. In contrast to fixed instruments, hand-held sensors allow participants to be followed in all the environments they encounter, thus complementing existing methods. This made it possible to measure PE, which is very dependent on each person's lifestyle.
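The contrast between time spent and concentration can be made concrete with a time-weighted product, a common way of reasoning about microenvironment exposure. The sketch below is an illustration under that assumption and is not a calculation reported in this study; the function `exposure_contribution` is hypothetical, and only the two environments for which both a time share and a mean NO 2 concentration are quoted above are included.

```python
def exposure_contribution(time_share, mean_concentration):
    """Contribution of one environment to the time-weighted mean exposure:
    fraction of time spent there multiplied by its mean concentration."""
    return time_share * mean_concentration

# Time shares and mean NO2 concentrations (µg·m−3) quoted in the conclusions.
contributions = {
    "home": exposure_contribution(0.68, 18.0),
    "transport": exposure_contribution(0.04, 45.0),
}

# Concentration ratio between the two environments, per unit of time spent.
transport_to_home_ratio = 45.0 / 18.0
```

Under these assumptions, each hour in transport carries 2.5 times the NO 2 exposure of an hour at home, even though home still dominates the weekly total because of the time spent there.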
Other campaigns conducted in different seasons could provide more thorough data concerning PE. In addition, improving the ML to obtain a more precise environment allocation could improve data collection. Measurements of other pollutants, such as VOCs, could also improve the characterization of indoor air exposure.
Table S6: Limit values of pollutants chosen for the algorithm; Figure S1: Distribution of data over classes before class balancing; Figure S2: Distribution of data over classes after class balancing; Figure S3: Geolocated BC concentration in Versailles; Table S7: Summary of concentrations for participant 71; Figure S4: Concentration measured indoors for BC and PM; Table S8: Mean concentrations measured in the indoor experiment; Figure S5: Concentration measured in the subway experiment; Table S9: Mean of concentrations measured in the subway experiment; Figure S6: