Practical Field Calibration of Portable Monitors for Mobile Measurements of Multiple Air Pollutants

To reduce inaccuracies in the measurement of air pollutants by portable monitors, it is necessary to establish quantitative calibration relationships against their respective reference analysers. This is usually done under controlled laboratory conditions or by one-off static co-location alongside a reference analyser in the field, neither of which may adequately represent the extended use of portable monitors in exposure assessment research. To address this, we investigated ways of establishing and evaluating portable monitor calibration relationships from repeated intermittent deployment cycles over an extended period, involving stationary deployment at a reference site, mobile monitoring, and periods in which the monitors were completely switched off. We evaluated four types of portable monitors: Aeroqual Ltd. (Auckland, New Zealand) S500 O 3 metal oxide and S500 NO 2 electrochemical; RTI (Berkeley, CA, USA) MicroPEM PM 2.5 ; and AethLabs (San Francisco, CA, USA) AE51 black carbon (BC). Innovations in our study included: (i) comparison of calibrations derived from the individual co-locations of a portable monitor against its reference analyser or from all the co-location periods combined into a single dataset; and (ii) evaluation of calibrated monitor estimates during transient measurements with the portable monitor close to its reference analyser at separate times from the stationary co-location calibration periods. Within the ~7 month duration of the study, ‘combined’ calibration relationships for O 3 , PM 2.5 , and BC monitors from all co-locations agreed more closely on average with reference measurements than ‘individual’ calibration relationships from the co-location deployment nearest in time to transient deployment periods. ‘Individual’ calibration relationships were sometimes substantially unrepresentative of the ‘combined’ relationships.
Reduced quantitative consistency in field calibration relationships for the PM 2.5 monitors may have resulted from the generally low PM 2.5 concentrations that were encountered in this study. Aeroqual NO 2 monitors were sensitive to both NO 2 and O 3 and were subject to unresolved biases. Overall, however, we observed that with the ‘combined’ approach, ‘indicative’ measurement accuracy (±30% for O 3 , and ±50% for BC and PM 2.5 ) for 1 h time averaging could be maintained over the 7-month period for the monitors evaluated here.


Introduction
Large public health burdens are associated with human exposure to nitrogen dioxide (NO 2 ), ozone (O 3 ), and particulate matter (PM) in urban areas [1][2][3]. The ambient concentrations of these pollutants are routinely monitored to assess compliance with air quality legislation and the effectiveness of air pollution mitigation measures. The black carbon (BC) component of PM is also often monitored, although not specifically for regulatory compliance, as a marker of combustion-related air pollution and because of its association with adverse health effects [4,5].
These pollutants are usually monitored using automatic 'reference' analysers at a small number of (sometimes only one) fixed sites within urban areas. They are termed reference analysers because the instruments used, and the QA/QC processes that are applied to instrument operation, calibration, and post-processing of output, follow defined protocols, such that final concentration values are, in principle, traceably correct, within stated uncertainty tolerances, to agreed national or international standards. In the United Kingdom (UK), these instruments form part of the national Automatic Urban and Rural Network (AURN, https://uk-air.defra.gov.uk), and similar regional networks. Reference instruments have high capital and on-going operational costs, and require secured sites with mains power. As a consequence, there has been considerable interest in the recent emergence of smaller, battery-operated instruments that can measure a range of air pollutants [6][7][8][9][10][11]. The lower capital cost, small size, and low power needs of these instruments have led to their deployment in large spatial networks [12][13][14] and in mobile and peripatetic (short periods of deployment at multiple sites) measurement designs [15][16][17][18][19], including in personal monitoring and 'citizen science' contexts [20][21][22][23].
However, neither laboratory studies nor one-off static co-location adjacent to a reference analyser represent likely typical 'field' usage of these portable monitors. Thus, the aim of this study was to investigate approaches to the establishment and evaluation of portable monitor vs. reference calibration relationships for four types of portable monitor (Aeroqual Ltd. (Auckland, New Zealand) S500 O 3 and S500 NO 2 monitors, RTI (Berkeley, CA, USA) MicroPEM PM 2.5 monitors, and AethLabs (San Francisco, CA, USA) microAeth AE51 monitors) under conditions that realistically reflect their likely usage, namely repeated instances over an extended period of intermittent deployment at a reference site, portable monitoring, and periods in which the monitors were completely switched off. Previous studies have not examined the repeated usage of portable monitors over a period of several months in the field. Of interest is the reproducibility of the quantitative relationship between a portable monitor and its respective reference in a series of fixed-site co-locations, and whether to take a one-off near-in-time fixed-site relationship as the calibration for a given set of portable measurements, or to pool together a set of fixed-site calibrations to cover a set of portable measurements. Since it is impractical to compare portable and reference monitors 'on the move' during mobile measurements, studies to evaluate portable monitors must still resort to comparison of measurements adjacent to fixed-site reference analysers.
Thus, the innovative aspects of this study were: (1) comparison of the accuracy of calibrated estimates using calibration data from separate co-location periods against reference analysers with estimates using a single combined calibration dataset of all co-location periods; and, (2) evaluation of portable monitor calibrations over short periods when the portable monitors were transiently positioned close to fixed-site reference analysers during mobile measurement campaigns at times separate from the periods of co-location from which the calibration equations were derived. To the best of our knowledge these aspects have not been investigated before. The overarching motivation was to gain insight into the 'in field' quantification of the portable monitor output during routine usage, not on the detail of the resultant calibrations, which will likely vary according to the particular monitor available.

Portable Monitors
This study used the following portable monitors.
The PM 2.5 and BC portable monitor measurements are both potentially subject to a source of uncertainty additional to any error in recording the direct response of the instrument to the 'analyte' in the air stream: a given ambient particle mix may have composition and size distributions, and hence optical properties, that differ from the particle mix on which the internal factors that convert the optical measurements into their respective PM 2.5 and BC mass concentrations are based. This gives rise to unquantifiable erroneous values. However, in the case of BC measurement, the same mass extinction coefficient is also applied in the BC reference analyser.

Measurement Locations and Schedules
The NO 2 , O 3 , and BC portable monitors were repeatedly co-located for a few days at a time at a UK government Automatic Urban and Rural Network (AURN) urban background monitoring site at Townhead in central Glasgow during February to August 2016 (Table 1). Reference analysers for NO 2 , O 3 , and BC at this site are a Teledyne API200A chemiluminescence analyser, a Thermo 49i UV absorbance analyser, and a Magee Scientific AE22 Aethalometer, respectively. The PM 2.5 portable monitors were co-located at the St. Leonard's urban background AURN monitoring site in central Edinburgh, which houses a Thermo 1400 Tapered Element Oscillating Microbalance Filter Dynamics Measurement System (TEOM-FDMS) instrument for PM 2.5 .
Operation and data ratification of all of the reference instruments is covered by UK-wide QA/QC procedures that ensure compliance with measurement objectives, as specified in EU Air Quality Directives (2008/50/EC and 2015/1480) (https://uk-air.defra.gov.uk/networks/network-info?view=aurn). The reference analyser data are reported as hourly averages, and all data used here were ratified.
The multiple co-located deployments of the portable monitors against reference analysers over approximately a seven-month period enabled investigation of two approaches to deriving portable monitor calibrations.
(1) 'Local' calibration, in which individual calibration equations were calculated for each co-location period, and the calibration equation from the co-location closest in time to a given day of mobile measurement was used to correct that dataset. Portable monitor concentrations corrected this way are denoted by the suffix '.corr_loca'.
(2) 'Global' calibration, in which measurements from all periods of co-location with the respective reference analyser were combined to derive a single calibration equation that was applied to all of the measurements throughout the entire period. Portable monitor concentrations corrected this way are denoted by the suffix '.corr_glob'.
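The two approaches above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: it assumes a simple linear regression of monitor output on reference concentration that is then inverted to correct raw values, and the function names, the `mid_date` field, and the example numbers are all hypothetical.

```python
from datetime import date

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def local_calibration(colocations, mobile_day):
    """Fit using only the co-location period closest in time to mobile_day."""
    nearest = min(colocations,
                  key=lambda c: abs((c["mid_date"] - mobile_day).days))
    return fit_line(nearest["reference"], nearest["monitor"])

def global_calibration(colocations):
    """Fit using all co-location periods pooled into a single dataset."""
    reference = [v for c in colocations for v in c["reference"]]
    monitor = [v for c in colocations for v in c["monitor"]]
    return fit_line(reference, monitor)

def correct(raw, intercept, slope):
    """Invert monitor = intercept + slope * reference (assumed direction)."""
    return (raw - intercept) / slope

# Hypothetical hourly pairs from two co-location periods.
colocations = [
    {"mid_date": date(2016, 3, 1),
     "reference": [10.0, 20.0, 30.0], "monitor": [6.0, 11.0, 16.0]},
    {"mid_date": date(2016, 6, 1),
     "reference": [15.0, 25.0, 40.0], "monitor": [8.0, 12.0, 18.0]},
]

loc_intercept, loc_slope = local_calibration(colocations, date(2016, 3, 10))
glo_intercept, glo_slope = global_calibration(colocations)
corr_loca = correct(11.0, loc_intercept, loc_slope)
corr_glob = correct(11.0, glo_intercept, glo_slope)
```

In this toy dataset the two periods have different slopes, so the same raw reading of 11.0 is corrected to different values by the two approaches, which is the situation the comparison in this study is designed to examine.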
Between periods of co-location with reference analysers, one of each type of portable monitor was used for mobile measurements on multiple occasions along walking routes in Glasgow (dates and routes in Supplementary Information Table S1 and Figure S1). The walking routes were designed to cover as wide an area of central Glasgow as practical, extending from the city centre to suburban areas, and were followed at different times of day. Of relevance here is that the majority of the mobile measurement walking routes started or finished at the Glasgow Townhead AURN monitoring station. When this occurred, the individual carrying the portable monitors paused for up to an hour to collect mobile measurements close to the reference analyser enclosure (i.e., within 2 m horizontal distance, and 1.5-2 m vertical distance of the reference analyser inlets). Since the mobile measurements were made on different days from the calibration periods at the AURN site, these instances of transient co-located measurements provided an opportunity for independent evaluation of the portable monitor calibrations.

Portable Monitor Operation and Data Post-Processing
During static calibration, each Aeroqual monitor was deployed in a ventilated weather-proof box that was supplied by the manufacturer. The boxes were attached to railings on the roof of the monitoring site enclosure, 3 m above ground level, at the same elevation as, and approximately 2 m horizontal distance from, the inlet of the reference analysers. For the mobile measurements, monitors Aq2 (NO 2 ) and Aq4 (O 3 ) were carried in open side pockets of a backpack. In all cases, the Aeroqual monitors were programmed to record concentrations every 1 min.
Occasional false zero readings in the raw Aeroqual NO 2 and O 3 monitor measurements were replaced with interpolation between the two values either side using a custom R script. For the NO 2 monitor, there were 84 false zero readings out of 6565 measurements (~1.28%); for the O 3 monitor, it was 154 out of 6504 measurements (~2.37%). The screened raw measurements were averaged to hourly values using the timeAverage() function in the R package 'openair' (www.openair-project.org), before comparison with the hourly-average reference analyser measurements that were downloaded with openair.
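The screening and averaging steps described above were carried out in R; the following Python sketch only illustrates the logic. It assumes that every exact zero is a false reading and that runs of zeros are filled by linear interpolation between the nearest non-zero neighbours; the function names and the example values are hypothetical, and the hour labelling (start of hour) is an arbitrary choice here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def replace_false_zeros(values):
    """Replace zero readings by linear interpolation between the nearest
    non-zero values either side. Zeros at the very start or end of the
    series have no neighbour on one side and are left unchanged."""
    cleaned = list(values)
    n = len(cleaned)
    i = 0
    while i < n:
        if cleaned[i] != 0:
            i += 1
            continue
        start = i
        while i < n and cleaned[i] == 0:
            i += 1
        left, right = start - 1, i
        if left >= 0 and right < n:
            step = (values[right] - values[left]) / (right - left)
            for k in range(start, right):
                cleaned[k] = values[left] + step * (k - left)
    return cleaned

def hourly_means(timestamps, values):
    """Average values into hourly bins labelled by the start of the hour."""
    bins = defaultdict(lambda: [0.0, 0])
    for t, v in zip(timestamps, values):
        hour = t.replace(minute=0, second=0, microsecond=0)
        bins[hour][0] += v
        bins[hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(bins.items())}

# Hypothetical 1-min readings spanning an hour boundary.
start = datetime(2016, 3, 1, 10, 58)
times = [start + timedelta(minutes=k) for k in range(4)]
raw = [20, 0, 22, 24]

cleaned = replace_false_zeros(raw)
hourly = hourly_means(times, cleaned)
```

Screening before averaging matters because a false zero left in the 1-min series would bias the hourly mean low.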
Each MicroPEM monitor was housed in a weatherproof box for the static deployments and sampled air through 1.5 m of conductive silicone tubing located about 1 m from the inlet to the reference TEOM-FDMS. During mobile measurements, the MP618N instrument was carried inside a backpack with the sampling tubing inlet protruding over the carrier's shoulder. In all of the cases the MicroPEM monitors were set to record values every 5 s at a flowrate of 0.5 L min −1 .
Before and after each deployment, the MicroPEM monitors were checked for baseline drift by sampling for at least 3 min through a high-efficiency particulate arrestance (HEPA) filter connected prior to the inlets; no baseline adjustment was required. After each use, the inlets of the monitors were disassembled to clean any particulate matter built-up on the oiled impactor surface. Flowrates were checked using the MicroPEM Docking Station software before and after the full study period using a TSI 4140 flowmeter and were within the ±10% range, as required by the manufacturer. MicroPEM raw measurements were averaged to hourly averages and compared with reference analyser measurements as described above.
The microAeth monitors were placed in a waterproof box on the roof of the monitoring site enclosure for the static deployments and operated from mains power. Each monitor sampled air about 1 m from the inlet to the reference analyser through 1 m of tubing supplied by AethLabs. For the mobile measurements, monitor MA1204 was carried inside the backpack, sampling through the same tubing as for the fixed-site deployments. In all of the cases the microAeth monitors were set to record concentrations at 1 s intervals at a flowrate of 150 mL min −1 .
The microAeth monitor data were uploaded to the AethLabs website for smoothing using an Optimised Noise-reduction Averaging (ONA) algorithm (with the attenuation coefficient (ATN) threshold set to ∆ATN = 0.01) so as to reduce potential instrumental optical and electronic noise [39]. The smoothed data were further processed using a custom R function to correct for the potential underestimation that is associated with increased BC mass on the filter [40]. The correction calculates the corrected black carbon concentration, BC, from the instrument-reported concentration, BC 0 , and the aethalometer filter transmission, Tr, where Tr = exp(−ATN/100) is calculated from the instrument-reported ATN. Processed data were hourly averaged prior to comparison with reference analyser measurements.
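As an illustration of a filter-loading correction based on Tr = exp(−ATN/100), the sketch below uses the widely cited Kirchstetter and Novakov form, BC = BC 0 / (0.88 Tr + 0.12). Whether this exact form and these coefficients were the ones applied in the custom R function is an assumption; the function names are hypothetical.

```python
import math

def filter_transmission(atn):
    """Filter transmission from instrument-reported attenuation (ATN)."""
    return math.exp(-atn / 100.0)

def loading_corrected_bc(bc0, atn, a=0.88, b=0.12):
    """Correct instrument-reported BC (bc0) for filter loading.

    The default coefficients a and b are the commonly quoted
    Kirchstetter-Novakov values and are an assumption here, not
    values confirmed by the text.
    """
    tr = filter_transmission(atn)
    return bc0 / (a * tr + b)

# On a clean filter (ATN = 0) the correction leaves the value unchanged;
# as the filter loads (ATN rises) the reported value is scaled upwards.
clean = loading_corrected_bc(1.0, 0.0)
loaded = loading_corrected_bc(1.0, 50.0)
```

The correction factor grows with ATN, which compensates for the reduced instrument response as BC accumulates on the filter.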

Comparisons against Reference Analysers
The direct comparison plots are not shown for the Aeroqual NO 2 monitors because preliminary investigations of their outputs revealed a clear sensitivity to O 3 concentration, as has been noted previously [36] and for other manufacturers' NO 2 monitors [20]. Lin et al. [36] used the relationship between [Aeroqual_NO 2 − Reference_NO 2 ] and Aeroqual_O 3 to calibrate the Aeroqual NO 2 monitor concentrations. This effectively constrains the relationship between Aeroqual_NO 2 and Reference_NO 2 to be 1:1 (as may be expected for recent factory-calibration). In this study, we used the following multiple linear regression of Aeroqual_NO 2 on both Reference_NO 2 and Aeroqual_O 3 : Aeroqual_NO 2 = a + b × Reference_NO 2 + c × Aeroqual_O 3 , where a is the fitted intercept and b and c are the fitted coefficients for the two predictors.
This regression is based on the reasonable expectations that the Aeroqual O 3 monitor has a linear response to 'true', i.e., reference analyser, O 3 , and that the Aeroqual NO 2 monitor has a linear response to both NO 2 and O 3 , but that its response to O 3 may be different from the Aeroqual O 3 monitor's response to O 3 [41]. The multiple linear regression is not readily visualised, but the statistics for the regressions of Aeroqual NO 2 monitor output on reference NO 2 concentrations and Aeroqual O 3 monitor output are given in Table 5. The pairings in the calibration regressions of the Aq1 NO 2 monitor with the Aq3 O 3 monitor, and of the Aq2 NO 2 monitor with the Aq4 O 3 monitor, were arbitrary, as would be the case in field use.
Measurements from both Aeroqual O 3 monitors were highly correlated with reference analyser concentrations for the combined dataset of all the co-location deployments (R 2 = 0.96 and 0.88, Figure 1a). However, monitor Aq3 had better precision, sensitivity, and bias statistics than monitor Aq4. Whilst monitor Aq3 had almost 1:1 correspondence with reference analyser concentrations, monitor Aq4 was only half as sensitive as monitor Aq3. The slope coefficients of regressions for Aq3 declined slightly between individual co-location periods, suggesting a small loss in sensitivity (Table 2, SI Figure S2). No such trend was obvious for Aq4. The poorer regression statistics for both Aeroqual monitors during 29 April-4 May 2016 appear to result from the very small dataset, which was caused by a power cut that curtailed measurements. This co-location period had so few data points that its inclusion has a negligible impact on the regression statistics for the combined datasets (Table 2). Monitor Aq4 also had a poorer correlation with reference O 3 during the co-location on 1-4 July (Table 2, SI Figure S2).
The multiple regressions for the Aeroqual NO 2 monitors showed a moderate to high correlation for the majority of co-deployment periods (R 2 = 0.47-0.89, Table 5). Exceptions were for 29 April to 5 May, as noted above for the Aeroqual O 3 monitors, when there were only six data points because of a power cut, and for two other periods for the Aq2 and Aq4 pairing, where R 2 = 0.25 (in both cases). The 'one off' poor calibration periods demonstrate the risk of relying on isolated periods of co-location calibration. The co-location period with only six data points was not used for subsequent calibration; for mobile measurements around that time the calibration from the next nearest co-deployment period was used as the local calibration. The data in Table 5 also show that the two sets of Aeroqual NO 2 and O 3 monitor pairings had substantially different coefficients in their regression relationships (particularly the intercepts). This reflects the varying sensitivities of the individual Aeroqual monitors to their target gas, and, for the NO 2 monitors, also to O 3 .
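The multiple linear regression of Aeroqual NO 2 output on reference NO 2 and Aeroqual O 3 output can be sketched as below, together with one plausible way of inverting it to obtain a calibrated NO 2 estimate. The coefficient names a, b, and c, the inversion step, and all numbers are illustrative assumptions rather than values or code from this study; the fit uses the normal equations solved by Gaussian elimination to keep the example dependency-free.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_no2_model(aq_no2, ref_no2, aq_o3):
    """Least-squares fit of aq_no2 = a + b * ref_no2 + c * aq_o3."""
    rows = [[1.0, r, o] for r, o in zip(ref_no2, aq_o3)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, aq_no2)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))

def calibrated_no2(aq_no2, aq_o3, a, b, c):
    """Invert the fitted model to estimate reference-equivalent NO2."""
    return (aq_no2 - a - c * aq_o3) / b

# Hypothetical co-location data generated from a = 5, b = 0.8, c = 0.5.
ref_no2 = [10.0, 20.0, 30.0, 40.0, 25.0]
aq_o3 = [30.0, 10.0, 20.0, 5.0, 15.0]
aq_no2 = [28.0, 26.0, 39.0, 39.5, 32.5]

a, b, c = fit_no2_model(aq_no2, ref_no2, aq_o3)

# A low NO2 reading paired with a high O3 reading inverts to a negative value.
estimate = calibrated_no2(12.0, 30.0, a, b, c)
```

The last line shows how an inversion of this kind can return a negative concentration when the O 3 term outweighs the NO 2 monitor reading, since noise in both monitors enters the calibrated estimate.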
MicroPEM measurements were also well correlated with TEOM-FDMS reference analyser concentrations for all of the co-deployment periods combined (R 2 = 0.72 and 0.70 for MP586N and MP618N, respectively, Table 3). SI Figure S3 illustrates generally high correlations between the MicroPEM monitors and the reference analyser for individual co-deployments, except for the period 13-20 July 2016. This one-off poor calibration period again highlights the risk of relying on isolated periods of co-location calibration. When this period was excluded from the combined dataset of all the co-deployment periods, the correlation between MicroPEM monitor and analyser increased to R 2 = 0.78 and 0.76 for MP586N and MP618N, respectively (Figure 1b, Table 3). The regression slopes indicated that the MicroPEM monitors frequently overestimated when compared with the reference analyser (Figure 1b). Figure 1b also indicates some non-linearity, resulting from greater overestimation of concentrations by the MicroPEM at reference PM 2.5 concentrations >~20 µg m −3 .
Figure 1c shows that, for all the periods of co-deployment combined, both portable microAeth AE51 monitors also had clear linear relationships with their reference analyser (R 2 = 0.81 and 0.72, for the MA1303 and MA1204 monitors, respectively). Data for the individual co-deployments indicated that the relationships of both microAeth AE51 monitors with the reference analyser remained consistent over time (Figure S4, Table 4). However, whilst correlation was again good, regression slopes differed from 1:1 (Figure 1c).

Evaluation of Portable Monitors during Transient Deployment
As described above, some mobile measurement routes passed by the reference analyser monitoring station. Reference analyser concentrations are hourly averages, whereas transient co-location was often of shorter duration. Therefore, to increase the number of comparison data points, comparisons were included if the transient portable monitor co-location covered 15 min or more of the hour of the AURN hourly-average value, i.e., a 'data capture rate' of ≥25% for each pairwise comparison. Even with this relaxation in data capture, acquiring a dataset of these transient comparisons during mobile deployments takes time and effort. For the Aeroqual O 3 monitor, each comparison tests whether the hourly reference value fell within the interquartile range (IQR) of the monitor values for that period (SI Table S2). The IQR is used here as a simple approximation of a confidence interval; thus, if the reference value falls within the IQR of the corrected monitor value, there is deemed to be no evidence of a significant difference between the calibrated monitor value and the test reference value. If a 50% (rather than 25%) data capture rate (n ≥ 30) was imposed for the co-location comparison, the advantage of the global linear correction (12 out of 15 co-located periods, 80%) over the local linear correction (4 out of 15 co-locations, 26.7%) and the uncorrected raw measurements (none of 15 co-locations, 0%) was clearer (Table S2).
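The IQR test with a data capture threshold can be sketched as follows. This is an illustrative reading of the criterion, assuming 1-min monitor values (so that 15 values correspond to 25% of the hour) and quartiles computed with the 'inclusive' interpolation method; the quartile definition actually used in the study is not stated, and the function name is hypothetical.

```python
import statistics

def reference_within_iqr(monitor_values, reference_hourly, min_values=15):
    """Return True/False if the hourly reference value lies inside/outside
    the interquartile range of the monitor values, or None if the transient
    period has too few values to meet the data capture threshold."""
    if len(monitor_values) < min_values:
        return None
    q1, _, q3 = statistics.quantiles(monitor_values, n=4, method="inclusive")
    return q1 <= reference_hourly <= q3

# Hypothetical 20 min of calibrated 1-min values (IQR is 5.75 to 15.25).
transient = [float(v) for v in range(1, 21)]

inside = reference_within_iqr(transient, 10.0)
outside = reference_within_iqr(transient, 17.0)
too_short = reference_within_iqr(transient[:10], 10.0)
stricter = reference_within_iqr(transient, 10.0, min_values=30)
```

Raising `min_values` to 30 reproduces the stricter 50% data capture condition, at the cost of discarding shorter transient periods.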
As noted above, the Aeroqual NO 2 monitors were subject to interference from O 3 , and although this was included in the calibration regression, the relationship between calibrated Aeroqual NO 2 estimates and NO 2 reference analyser observations deviated more substantially from the 1:1 line than for the corresponding relationships for the other monitors, and had a lower correlation coefficient than the O 3 and BC monitors ( Figure 2). An important consideration is that uncertainty in calibrated Aeroqual NO 2 estimates incorporates the uncertainties in measuring two pollutants. The local calibration approach for the Aq2 NO 2 monitor yielded such extremely scattered corrected data ( Table S3) that it was not appropriate to pursue investigation of this approach, so only data for the global calibration approach for the Aq2 NO 2 monitor are shown in Figure 2b. The figure demonstrates that the global calibration has substantially improved the Aq2 NO 2 monitor agreement with the reference analyser compared to uncorrected data. Without correction, none of the 27 periods of transient standing by the reference analyser monitoring station had 1-h NO 2 reference measurement within the interquartile ranges (IQRs) of the Aq2 measurements (Table S3). For the global calibration approach, the 1-h NO 2 reference analyser measurement lay within the IQR of the Aq2 monitor estimates for 21 of the 27 (i.e., 77.8%) periods (Table S3). However the 1-min Aq2 NO 2 estimates, even after adjustment, were extremely variable, so that the IQRs of the Aq2 NO 2 values were wide. The variability extended to some negative calibrated NO 2 estimates which were clearly unrealistic.
The uncorrected and corrected measurements that were made by the MicroPEM MP618N PM 2.5 monitor for the transient periods standing adjacent to the monitoring station are shown in Figure 2c. Neither of the calibration approaches gave closer agreement between calibrated estimates and reference analyser concentrations than the uncorrected monitor measurement. The 1-h reference measurement fell within 2 out of 32 IQRs (6.3%) of the corrected values (for both global and local approaches), when compared with 6 out of 32 (18.8%) for the uncorrected measurements (SI Table S4).
For the microAeth MA1204 BC monitor used in mobile measurements, Figure 2d shows that the global calibration approach yields data that correspond more closely to reference analyser concentrations than estimates from the local calibration approach. Out of a total of 27 co-located periods, the 1-h BC reference measurement fell within the IQRs of the uncorrected MA1204 measurements for 16 periods (59.3%), but the global calibration approach increased this to 22 out of 27 periods (81.5%). When a 50% data capture for the comparison was imposed, the reference measurement fell within the IQRs of the globally calibrated MA1204 estimates for 14 out of 17 periods (82.4%), as compared with the local approach (10 out of 17, 58.8%) and no correction (8 out

Discussion
The aim of this work was to develop field-calibration procedures for four types of commercially available portable air quality monitors during typical use in exposure assessment research. These monitors were used over a period of several months that involved repeated cycles of: static co-location adjacent to reference analysers; use for mobile (walking) measurements; and intervening periods of no usage (power off). This experimental design permitted monitor-reference analyser comparisons for separate co-location periods ('local' calibration) and for all of the co-location periods combined ('global' calibration). An additional feature of this study was the transient (≤1 h) comparison of the portable monitors when they were adjacent to the fixed-site reference analysers during mobile deployments.
Although none of the four monitor types used in this study yielded data immediately comparable to their respective reference analyser during fixed-site co-locations, the generally high correlations between monitor and reference analyser for the Aeroqual O 3 , MicroPEM PM 2.5 and microAeth BC monitors (Figure 1 and SI Figures S2-S4), indicated that calibration was feasible. Correlations between monitor and reference analyser did, however, vary and were not significant for a small number of co-location periods with reduced concentration ranges (Tables 2-4, and SI Figures S2-S4). There were no long-term temporal trends in regression coefficients over the multiple co-location deployments except for a decline in sensitivity of the Aq3 O 3 monitor (Tables 2-4, and SI Figures S2-S4). Collectively, these issues illustrate the strong potential for a single co-location period to yield monitor vs reference analyser comparison data unrepresentative of the relationship on average. This point is further illustrated in SI Figures S5-S7, which show scatter plots of the O 3 , PM 2.5 , and BC portable monitor measurements on each mobile deployment day corrected using either the global or local calibration approaches. The globally or locally calibrated mobile measurements were always well correlated but were quantitatively different on a number of the days.
The Aeroqual NO 2 monitors appeared to be systematically sensitive to O 3 since a moderate correlation between Aeroqual NO 2 and reference NO 2 could be obtained for the majority of co-deployments by inclusion of Aeroqual O 3 values in a multiple linear regression (Table 5). A sensitivity of the Aeroqual electrochemical NO 2 sensor to O 3 in field deployments has been noted before [36,41]. Correction for this cross-sensitivity is therefore feasible in principle by using an Aeroqual O 3 sensor in tandem with the Aeroqual NO 2 sensor; however, the application of a correction function involving output from another monitor adds the uncertainty that is intrinsic to the second monitor to the uncertainty already intrinsic to the first monitor. The inherent variability in the regression resulted in instances of negative calibrated NO 2 concentration estimates when the calibration regression was applied to separate measurements that were made during mobile deployment (Figure 2b, Table S3).
The advantage of merging all the periods of co-location into one 'global' calibration dataset was further demonstrated by the experiments to independently evaluate the monitor calibrations during mobile usage on different days to those that were used to establish calibration relationships (Figure 2). The improved comparison statistics for the global calibration approach were particularly evident for the Aeroqual O 3 and microAeth BC monitors. Since the regression relationship in any individual co-location period sometimes differed from the overall average regression (for identifiable or unidentifiable reasons), the global calibration approach is recommended. These global calibrations should be derived from several individual co-location periods bounding the time period of the field measurements. It is acknowledged, however, that due to possible longer-term changes in instrument response, a given calibration should not be extrapolated over time periods longer than a few months.
The generally poorer agreement between calibrated MicroPEM monitor estimates and TEOM-FDMS measurements (when compared with the other monitors evaluated here) in the independent evaluation during mobile usage, irrespective of correction, may be attributable to the greater uncertainty in measurements of the low ambient concentrations of PM 2.5 when compared with, e.g., O 3 . The ambient PM 2.5 concentrations encountered throughout this study were generally low, even in central Glasgow, and both the reference analyser and the MicroPEM monitors have acknowledged limitations in measuring PM 2.5 concentrations of just a few µg m −3 [42,43]. Furthermore, the smaller concentration range often recorded for ambient PM 2.5 than for NO 2 and O 3 limits the extent of variation that can be explained by calibration and evaluation statistics. Since the calibration of the MicroPEM MP618N monitor was derived at the Edinburgh St. Leonard's reference site (ED3), but was evaluated at the Glasgow Townhead reference site (GLKP), this may also highlight a limitation of extrapolation from fixed-site calibrations. Finally, it is noted again that the MicroPEM monitor does not directly measure PM 2.5 mass but uses an internal factor based on assumptions about the size distribution and optical properties of the sampled particles to convert scattered light into a PM 2.5 value. This introduces unquantifiable uncertainty into the MicroPEM measurements if the size and optical properties of the actual particle mix differ from those on which the internal factor is based.
Our findings for portable monitor performance against their respective analyser are compared here with some relevant previous literature. Delgado-Saborit [15] reported similar slope and intercept (y = 0.9727x + 0.0938, R 2 = 0.8937) between a microAeth AE51 and an Aethalometer AE41 analyser from a four-day co-location study in Birmingham. The Aeroqual NO 2 monitor co-located with a Horiba APNA370 NO x chemiluminescence analyser for seven days showed a better agreement than in this study (y = 0.7569x + 7.0505, R 2 = 0.6321). A co-location of four AE51 BC monitors alongside the reference AE22 under lab conditions in Shanghai found an averaged slope of 1.04 and an average intercept of −0.09 (R 2 = 0.960) between twelve 24-h averaged BC concentration measurements [44]; both are similar to our study (slope = 0.79-1.13, intercept = 0.00-0.14, R 2 = 0.594-0.958). A co-location of Aeroqual O 3 monitors alongside reference analysers in and around the City of Arvin, California, recorded slopes of 1.001-1.051 and intercepts of −3.28-−0.015 ppb (R 2 = 0.926-0.984) between measurements made by the two techniques [45]. While the comparison between our Aeroqual Aq4 monitor and its reference analyser always had slope less than 1 (0. 38 [36]. Finally, Sloan et al. [43] reported significant difference (difference = 2.2 µg m −3 , p < 0.0001) between measurements of PM 2.5 by MicroPEM and the reference analyser over~5 h co-location at Lindon air quality monitoring station in Utah County, US. However, this was for 5 h only and no slope or intercept was reported.
It is important to consider the extent to which calibrated observations during measurement periods are, or are not, extrapolated outside of the range of sensor responses that occurred during calibration periods. The extent of extrapolation during transient co-locations can be assessed by comparing the axis scales on the scatter plots in Figures 1 and 2. For O 3 and BC, the ranges of measured concentrations by both reference and portable monitors are of similar magnitude during calibration and measurement periods (as one would hope for); with the range of measurements made during the calibration periods encompassing the range of measurements during the transient co-location periods. Approximately 66% and 42% of the calibration ranges were represented by transient co-location ranges for O 3 and BC. For PM 2.5 , the range of the majority of measurements during the transient co-location periods is about 22% of the range of PM 2.5 measurements during the calibration periods. Thus, there is no extrapolation outside of the range of calibration measurements. Nevertheless, this discrepancy is a reminder that the relatively large range of PM 2.5 measurements during calibration periods probably results from higher concentrations that were observed during periods of long-range transport of particles [46]. These relatively large changes in PM 2.5 concentrations originating from distant sources may, or may not (e.g., through different optical properties of aged accumulation mode particles vs. recently emitted/re-suspended particles), accurately represent the relatively small temporal changes in PM 2.5 in our transient co-locations, and the increments between background and roadside locations that we attempt to quantify in subsequent mobile measurements. 
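The range comparison described above amounts to two simple checks: what fraction of the calibration range is spanned by the evaluation measurements, and whether any evaluation measurement falls outside the calibration range. The sketch below is one way to express this, with hypothetical names and numbers; how the 'range of the majority of measurements' was determined in the study is not specified, so plain minima and maxima are assumed here.

```python
def range_coverage(calibration_values, evaluation_values):
    """Return (fraction of the calibration range spanned by the evaluation
    values, whether any evaluation value lies outside the calibration range)."""
    cal_low, cal_high = min(calibration_values), max(calibration_values)
    ev_low, ev_high = min(evaluation_values), max(evaluation_values)
    fraction = (ev_high - ev_low) / (cal_high - cal_low)
    extrapolated = ev_low < cal_low or ev_high > cal_high
    return fraction, extrapolated

# Hypothetical concentrations: calibration spans 0-50, evaluation spans 10-21.
calibration = [0.0, 8.0, 15.0, 32.0, 50.0]
transient = [10.0, 12.5, 18.0, 21.0]

fraction, extrapolated = range_coverage(calibration, transient)
```

A small fraction with no extrapolation, as in this toy case, corresponds to the PM 2.5 situation described above: the calibration is not extrapolated, but it is dominated by concentrations well above those seen during evaluation.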
However, without detailed chemical, size, and/or optical measurements of the PM 2.5 , it is not possible to know the extent to which calibration may be impacted, but this is the case with any measurements that are made using these types of portable PM instruments. The interpretation of ranges during calibration, transient co-location, and mobile measurement for NO 2 measured by Aeroqual sensors is complicated by the requirement for a multivariate calibration approach to allow for the apparent dual sensitivity of the Aeroqual NO 2 sensors to both NO 2 and O 3 . The presence of negative adjusted Aeroqual NO 2 concentrations during transient co-locations (Figure 2b) may indicate that calibrated Aeroqual estimates were affected by extrapolation beyond the measured ranges during calibration. The presence of negative values is discussed further below.
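To illustrate how a two-predictor calibration can produce negative adjusted values when it is applied outside its calibration range, the sketch below fits an assumed model of the form reference NO 2 = b0 + b1 × (raw monitor NO 2) + b2 × (O 3) by ordinary least squares. The model form, the function names, and the synthetic numbers are all assumptions made for illustration; they are not the regression or the data used in this study.

```python
def solve_linear(a, b):
    """Solve a small dense system a.x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_two_predictor(raw_no2, o3, ref_no2):
    """Least-squares fit of ref_no2 = b0 + b1*raw_no2 + b2*o3
    via the normal equations (assumed model form)."""
    rows = [[1.0, x, z] for x, z in zip(raw_no2, o3)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ref_no2)) for i in range(3)]
    return solve_linear(xtx, xty)


# Synthetic calibration data generated from known coefficients (illustrative)
raw_no2 = [30.0, 42.0, 55.0, 38.0, 61.0, 47.0]
o3 = [20.0, 35.0, 15.0, 40.0, 25.0, 30.0]
ref_no2 = [5.0 + 0.8 * x - 0.6 * z for x, z in zip(raw_no2, o3)]

b0, b1, b2 = fit_two_predictor(raw_no2, o3, ref_no2)

# A mobile observation with lower raw NO2 and higher O3 than any
# calibration point: the adjusted estimate becomes negative.
adjusted = b0 + b1 * 20.0 + b2 * 60.0
print(round(b0, 3), round(b1, 3), round(b2, 3), round(adjusted, 3))
```

In this synthetic case the fit recovers the generating coefficients, yet the adjusted estimate for the out-of-range observation is negative, which is the kind of behaviour that the extrapolation argument above anticipates.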
Collectively, the above considerations give greater confidence in the reliability of 'globally' calibrated estimates from the O 3 and BC portable instruments for characterisation of temporal and spatial concentration variations when compared to the reliability of 'globally' calibrated estimates from the PM 2.5 and NO 2 portable instruments, as a result of the differences in measurement concentration ranges cf. calibration concentration ranges for the latter two pollutant metrics. The limited range of concentrations during 'local' calibration periods is the likely cause of differences between 'globally' and 'locally' calibrated estimates. Table 6 presents the proportions of negative pollutant values that are derived from applying the global calibrations to all of the 1-min data collected by the monitors on all of the walking routes that are listed in SI Table S1; the data show that negative corrected concentrations were only an issue for the NO 2 mobile measurements. Some studies have reported the potential impacts of weather conditions on air quality sensor performance [13,36], so hourly ambient temperature and relative humidity observations at the Glasgow Bishopton Met Station were downloaded from NOAA's Integrated Surface Database (https://www.ncdc.noaa.gov/isd) using the R package 'worldmet', and padded to the same 1-min time resolution as the mobile measurements. No significant relationships between the meteorological data and the monitor air pollution data were found.
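The padding step can be sketched as follows, assuming that each hourly observation is simply repeated for every minute of the hour it labels; the text does not specify the padding rule, so this is one plausible reading. The function name `pad_hourly_to_minutes`, the timestamps, and the temperatures are hypothetical.

```python
from datetime import datetime, timedelta


def pad_hourly_to_minutes(hourly):
    """Repeat each hourly observation for the 60 one-minute steps
    starting at its timestamp (assumed padding rule)."""
    minute_series = {}
    for hour_start, value in sorted(hourly.items()):
        for k in range(60):
            minute_series[hour_start + timedelta(minutes=k)] = value
    return minute_series


# Hypothetical hourly temperatures in degrees Celsius (not study data)
hourly_temp_c = {
    datetime(2016, 5, 3, 10): 11.2,
    datetime(2016, 5, 3, 11): 12.0,
}
minute_temp_c = pad_hourly_to_minutes(hourly_temp_c)
print(len(minute_temp_c), minute_temp_c[datetime(2016, 5, 3, 10, 37)])
```

Once padded, the meteorological series can be joined to the 1-min monitor data on timestamp. Because the padded values change only once per hour, any apparent minute-scale relationship with the pollutant data should be interpreted with that in mind.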
Aside from the inherently larger uncertainty in the Aeroqual NO 2 calibration regression, differential time responses of the Aeroqual NO 2 and O 3 instruments (and therefore differences in the air volume sampled) may contribute to the difficulty of correcting the cross-sensitivity of the NO 2 monitor with data from the O 3 monitor. This would matter most when the instruments are used to measure the rapid changes in pollutant concentrations in mobile deployments, compared with the relatively slowly changing concentrations during static evaluation periods. A further source of potentially unmeasurable error is that the response of one or more of the portable monitors may differ when the monitor is being moved compared with static periods of evaluation against the reference instruments. It is appropriate to anticipate that lower-cost, portable monitors may not be as accurate and precise as reference instruments in a quality-assured network, such as the UK AURN, and our study did not attempt to evaluate monitor responses to all of the possible variables that may affect the precision and accuracy of measurements. Our aim was to investigate practical calibration in realistic deployment scenarios over several months. Of the four types of monitors that were investigated, the microAeth BC and Aeroqual O 3 monitors were most consistent in instrument response, and hence in stability of calibration. The mean relative standard errors in the slopes between the two microAeth BC monitors and their reference analyser across their four co-deployment periods (a measure of the variability in the absolute relationship between monitor and analyser) were 5% and 8%. The mean relative standard errors in the R 2 values between the two microAeth BC monitors and their reference analyser across the co-deployment periods (a measure of the variability in the precision) were 10% and 10%.
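The text does not give a formula for the relative standard error, so the sketch below takes one plausible reading: the standard error of the mean of a statistic (slope or R 2) across co-deployment periods, divided by that mean. Both this definition and the slope values are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def relative_standard_error(values):
    """Standard error of the mean divided by the mean (assumed definition)."""
    if len(values) < 2:
        raise ValueError("need at least two co-deployment periods")
    return (stdev(values) / sqrt(len(values))) / mean(values)


# Hypothetical calibration slopes from four co-deployment periods
slopes = [0.80, 0.90, 1.00, 1.10]
rse = relative_standard_error(slopes)
print(f"relative standard error of slope: {rse:.1%}")
```

Under this reading, a smaller value indicates that the monitor-analyser relationship changed little from one co-deployment to the next, which is the sense in which the BC and O 3 calibrations are described as stable.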
For the two Aeroqual O 3 monitors the corresponding mean relative standard errors in monitor-reference slope were 5% and 10%, and mean relative standard errors in monitor-reference R 2 values were 1% and 10%. Reproducibility in a quantitative calibration relationship for the MicroPEM PM 2.5 monitors was poorer than for the BC and O 3 monitors (mean relative standard errors in slope for the two MicroPEM instruments were 16% and 16%), which, as discussed above, is potentially attributable to the low values and ranges of ambient PM 2.5 concentrations. Nevertheless, the generally high correlations with temporal changes in PM 2.5 , as measured by the reference analyser (Table 3), give confidence in the relative trends of PM 2.5 data from these monitors (i.e., lower vs. higher pollutant concentration between locations and/or between two points), for the time averaging of 1 h used here. Indeed, the calibrations of all the O 3 , BC, and PM 2.5 monitors, derived from a single dataset of periods of field co-location against their respective reference analysers, appear sufficient to yield 'indicative' quantitative concentrations. In European Union (EU) air quality legislation an 'indicative' method is one where the relative expanded uncertainty of a measurement is within 30% for O 3 , 25% for NO 2 , and 50% for PM 2.5 at their respective limit or target value [47]. These uncertainty ranges are marked on Figure 2 (the uncertainty tolerance for PM 2.5 is used for the BC data in Figure 2d). In all instances, the 'globally' calibrated BC monitor measurements from the independent 'transient' calibration periods lie within the tolerance for an 'indicative' measurement (Figure 2d), and in the majority of cases the O 3 and PM 2.5 monitor calibrated values are also in their respective indicative ranges (Figure 2a,c). The performance of the Aeroqual O 3 and MicroPEM monitors was also reported favourably in the AQ-SPEC program (www.aqmd.gov/aq-spec/evaluations/summary).
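A simple way to tabulate how many calibrated values fall inside the tolerance bands is sketched below. It treats the tolerance as a band of ± the stated percentage around each reference value, which is a simplification of the formal expanded-uncertainty definition (that definition applies at the limit or target value). The percentages are those quoted above, with the PM 2.5 value reused for BC as in Figure 2d; the function names and the data pairs are hypothetical.

```python
# Tolerances quoted in the text; BC borrows the PM2.5 value
TOLERANCE = {"O3": 0.30, "NO2": 0.25, "PM2.5": 0.50, "BC": 0.50}


def within_indicative(pollutant, calibrated, reference):
    """True if the calibrated value lies within the +/- tolerance band
    around the reference value (simplified check)."""
    return abs(calibrated - reference) <= TOLERANCE[pollutant] * reference


def fraction_within(pollutant, pairs):
    """Share of (calibrated, reference) pairs inside the tolerance band."""
    flags = [within_indicative(pollutant, c, r) for c, r in pairs]
    return sum(flags) / len(flags)


# Hypothetical 1-h (calibrated monitor, reference analyser) pairs
o3_pairs = [(26.0, 30.0), (45.0, 32.0), (18.0, 20.0), (40.0, 50.0)]
o3_fraction = fraction_within("O3", o3_pairs)
print(f"{o3_fraction:.0%} of O3 pairs within the indicative band")
```

In this illustrative case three of the four pairs fall inside the ±30% band, mirroring the "majority of cases" wording used for the O 3 and PM 2.5 monitors.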
Despite this generally satisfactory demonstration of 'indicative' measurement for the O 3 , BC, and PM 2.5 monitors, it is emphasised that this should not be over-interpreted. We have presented only data from a single study, albeit over an extended time period. For the O 3 and BC monitors, we compared calibrated monitor output at the same site where we derived global calibration relationships, albeit for different times. In most instances, the measurements that were made during mobile usage for the transient checks on calibration were for less than the full hour of a reference analyser measurement and were slightly further from the reference analyser inlet than for the static co-deployments. On the other hand, our study has sought to evaluate monitor calibration during actual mobile deployments of these monitors, which previous studies have not. Acquiring a dataset of these transient comparisons during mobile deployments takes time and effort; for example, standing with a portable monitor by a reference monitoring station for an hour yields only a single data comparison pair. It also needs to be noted that the reference analyser data, even with formal QA/QC and data ratification processes, are only required to satisfy expanded absolute uncertainties within ±15% for O 3 and NO 2 and ±25% for PM 2.5 [47].

Conclusions
We have shown that with the implementation of repeated field calibration cycles, it is possible to attain indicative measurement accuracy for the Aeroqual O 3 , microAeth BC, and MicroPEM PM 2.5 portable monitors used in this study. For studies of a few months' duration, it is recommended to use a 'global' calibration that is derived from a set of co-locations with a reference analyser rather than calibrations from a single co-location deployment nearest in time to a given mobile deployment period. However, it is important to emphasise that although the capital and consumable costs of the portable monitors used here were much lower than for the reference analysers, it was necessary to devote substantial time and effort to calibrate and post-process the portable measurements to attain this outcome. Our study also only considered portable monitor application outdoors, not personal exposures switching between outdoor, in-transit, and indoor environments. The Aeroqual NO 2 monitors that were used in this study had interference from O 3 . Whilst work presented here and elsewhere suggests the potential for reasonably effective correction using O 3 measurements in tandem with the NO 2 measurements during extended periods of fixed-site deployment, the correction here did not extrapolate consistently to calibration of NO 2 measurements during mobile deployments, for the possible reasons discussed above. However, with the continued development in sensor technology, it may be anticipated that sensor-based portable monitors will increasingly provide additional relevant information to existing air quality monitoring.