Field Evaluation and Calibration of Low-Cost Air Pollution Sensors for Environmental Exposure Research

This paper seeks to evaluate and calibrate data collected by low-cost particulate matter (PM) sensors in different environments and at different temporal aggregation levels (i.e., 5 s, 1 min, 10 min, and 30 min intervals). We first collected PM concentration data (i.e., PM1, PM2.5, and PM10) in five different environments in Hong Kong (i.e., indoors and outdoors at an office building, the train platform and lobby of a subway station, and a seaside location), using five AirBeam2 sensors as the low-cost sensors and a TSI DustTrak DRX Aerosol Monitor 8533 as the reference sensor. By comparing the collected PM concentrations, we found high linearity and correlation among the data reported by the AirBeam2 sensors in different environments. Furthermore, the results suggest that the accuracy and bias of the PM data reported by the AirBeam2 sensors are affected by rainy weather and by environments with high humidity and a high level of hygroscopic salts (i.e., a seaside location). In addition, increasing the aggregation level of the temporal units (i.e., from 5 s to 30 min intervals) increases the correlation between the PM concentrations obtained by the AirBeam2 sensors, but it does not significantly improve the accuracy or reduce the bias of the data. Lastly, our results indicate that using a machine learning model (i.e., random forest) to calibrate PM concentrations collected on sunny days generates better results than multiple linear regression models. These findings have important implications for researchers when designing environmental exposure studies based on low-cost PM sensors.


Introduction
Much research on the health impacts of individual exposure to particulate matter (e.g., PM1, PM2.5, and PM10) is based on people's residential neighborhoods [1,2]. However, using residential neighborhoods for environmental health research can lead to the uncertain geographic context problem (UGCoP) and the neighborhood effect averaging problem (NEAP) [3-9]. The UGCoP stresses that using different geographically delineated contextual areas can lead to different research findings about the health effects of environmental factors [10]. The NEAP suggests that ignoring people's daily mobility and exposure to nonresidential contexts can bias estimates of personal exposure: when daily mobility is taken into account (i.e., mobility-based exposure), the exposures of people with very high or very low residence-based exposures tend toward the population average [11,12].
To address these methodological issues, some recent studies have applied a mobility-based exposure approach to simultaneously capture the spatiotemporal variability of PM concentrations and human mobility [13-15]. Unlike conventional environmental exposure studies (e.g., those using residential neighborhoods), the mobility-based exposure approach uses low-cost sensors to measure individual-level exposure to PM concentrations in different types of environments. It has also considerably advanced the acquisition of accurate data on human space-time behavior in different types of environments (e.g., workplaces, supermarkets, and transport stations), providing the detailed space-time information that is essential for personal environmental exposure assessment, and it has been increasingly used in health research [16-18]. The approach requires people to carry a portable low-cost PM sensor and a GPS sensor, which together monitor individuals' geographic locations and measure real-time PM concentrations in their immediate surroundings at very fine spatiotemporal resolutions [19]. Specifically, a mobility-based exposure approach can capture individuals' real-time location (i.e., longitude and latitude) and exposure to PM concentrations every second [4]. Although such an approach can significantly improve the accuracy of PM exposure assessment compared with conventional methods (e.g., using residential neighborhoods), our knowledge of the reliability of data collected by low-cost PM sensors in different types of environments is still very limited.
Some studies have found that the data collected by low-cost PM sensors are not as accurate as those collected by conventional monitoring stations [20-22]. In particular, various studies and reports have indicated that ambient physical conditions (e.g., temperature, humidity, and precipitation) can significantly affect the performance of low-cost PM sensors [23,24]. For instance, by comparing the measurements of low-cost air quality sensors (i.e., the AirBeam and the Alphasense Optical Particle Counter) with those of high-end instruments (i.e., the GRIMM 11-R optical particle counter and the Met One beta attenuation monitor), Mukherjee et al. [20] found that sensor measurements were influenced by meteorological conditions and the aerosol size distribution. Sousan et al. [25] conducted simulation experiments in the laboratory and suggested that the reliability of data provided by low-cost PM sensors can be improved if they are calibrated separately for different conditions (e.g., different environmental and occupational settings) using site-specific calibration factors.
Although previous studies provide a useful foundation for improving the accuracy of data recorded by low-cost portable PM sensors, they did not examine how the performance of these sensors varies across the types of environments people encounter in daily life. The performance of low-cost PM sensors may vary between types of environments (e.g., workplaces, supermarkets, and transport stations) because of differing environmental conditions (e.g., temperature, humidity, and precipitation). Since most people travel around and visit different types of places and venues, they are exposed to different types of environments in their daily life. Thus, low-cost PM sensors may perform differently when people carry them while conducting their daily activities in mobility-based environmental exposure studies. Moreover, although the mobility-based exposure approach can measure individuals' PM exposure every second, the accuracy of the data recorded by low-cost PM sensors may vary with the aggregation level of the temporal unit (e.g., 1 s, 1 min, or 10 min). It is therefore important to evaluate and calibrate the data recorded by low-cost PM sensors in different types of environments and at different aggregated temporal units for mobility-based environmental exposure studies. The contribution of such an analysis is two-fold. On the one hand, the new knowledge can help develop more effective research designs for mobility-based environmental exposure studies. On the other hand, the uncertainties arising from the performance of low-cost PM sensors can be examined and addressed using appropriate calibration models.
Motivated by these research gaps, this study seeks to evaluate and calibrate low-cost PM sensors in different types of urban environments using different aggregated temporal units. Specifically, we first used five AirBeam2 low-cost sensors and a TSI DustTrak DRX Aerosol Monitor 8533 as a reference sensor to collect PM concentration data (i.e., PM1, PM2.5, and PM10) in five types of environments in Hong Kong (i.e., an indoor and an outdoor location of an office building, the train platform and lobby of a subway station, and a seaside location). Then, using the PM concentration data aggregated into 5 s, 1 min, 10 min, and 30 min intervals, we evaluated the performance of the AirBeam2 sensors based on the correlation, accuracy, and bias of the data collected in different environments. Furthermore, we calibrated the 1 min average PM concentrations using the temperature and humidity data recorded by the AirBeam2 sensors based on multiple linear regression (MLR) and random forest (RF), a machine learning method. Finally, we discuss how different types of environments and different aggregated temporal units affect the performance of low-cost PM sensors and the implications for better design of mobility-based environmental exposure studies.

Sensors and Study Sites
The low-cost PM sensors used in this study are five AirBeam2 sensors (HabitatMap, Brooklyn, NY, USA). We used AirBeam2 sensors because they are lightweight (198.5 g), have a long battery life (10 h when fully charged), and are widely used in mobility-based environmental exposure studies [4,5,19,26]. Furthermore, AirBeam2 sensors can measure particulate matter (PM1, PM2.5, and PM10), temperature, and relative humidity. Note that the PM concentrations measured by AirBeam2 sensors have been calibrated based on equations developed by fitting the data to the GRIMM EDM180. Past studies have evaluated the performance of AirBeam2 sensors by comparing them with different reference instruments. For instance, by comparing AirBeam2 measurements with those of a TSI DustTrak DRX Aerosol Monitor 8533 in a concentrated air pollutants (CAPS) chamber, Michael and Lim [27] found that the PM concentrations measured by AirBeam2 have a highly linear relationship with the data recorded by the DustTrak sensor (i.e., R² = 0.89 for PM2.5 and R² = 0.88 for PM1). In field tests performed by different researchers, the PM2.5 measurements of AirBeam sensors correlated strongly with those obtained by a GRIMM monitor (i.e., R² = 0.80-0.99) [20]. While these studies provide significant evidence of the reliability of AirBeam2 sensors for PM measurement, they did not examine how the performance of AirBeam2 sensors varies between different types of urban environments (e.g., Mass Transit Railway [MTR] stations, supermarkets, seaside locations, and offices). Thus, this paper seeks to evaluate the performance of AirBeam2 sensors in different types of urban environments.
To evaluate the accuracy and bias of the AirBeam2 sensors, we compared the PM concentrations they recorded with the data obtained by a TSI DustTrak DRX Aerosol Monitor 8533 sensor. The DustTrak sensor offers simultaneous measurements of PM concentrations for different particle sizes (PM1, PM2.5, PM3, PM10, and total particles) [28]. The DustTrak is widely used in measuring PM concentrations in indoor and outdoor environments and in evaluating low-cost PM sensors [29-34].
The AirBeam2 and DustTrak sensors were employed in this study to measure PM concentrations in different types of environments in Hong Kong. Because of limited land resources, the city has highly transit-oriented development (TOD) around Victoria Harbour, and more than 90% of the people in Hong Kong are served by the public transport system [35,36]. According to data collected by the Hong Kong Environmental Protection Department from October 2020 to September 2021, the yearly average PM2.5 and PM10 concentrations in Hong Kong are 16.7 µg/m³ and 29.7 µg/m³, respectively [37]. In addition, Hong Kong has a subtropical climate, with 2021 monthly average temperatures ranging from 16.2 °C (January) to 29.7 °C (July) [38]. As a result, people in Hong Kong usually conduct their daily activities in places or venues around Mass Transit Railway (MTR) stations or at seaside locations (e.g., walking, running, or taking a ferry). The MTR stations in Hong Kong usually have a lobby with several shops (e.g., cake shops, banks, coffee shops, and newsstands), station facilities (e.g., customer service centers, restrooms, and breastfeeding areas), and platforms for passengers to board trains.
We selected three locations covering five types of environments in Hong Kong to collect the PM concentration data. Table 1 provides a brief description of the selected environments and the dates of data collection: the indoor and outdoor areas of an office building, the platform and the lobby of an MTR station, and a seaside location near a ferry pier. The office is in the Institute of Space and Earth Information Science at The Chinese University of Hong Kong. The MTR station and ferry pier are in Hung Hom, a major transport hub in Hong Kong with an MTR station, a ferry pier, a cross-harbor tunnel, and a terminus of cross-border bus services to major cities in Mainland China. Note that the day we collected data in the office (outdoor) environment was rainy, whereas the days we collected data in the other environments were all sunny. The patterns of people's daily mobility in a city tend to be quite regular over days (e.g., weekdays and weekends) [36,39,40], so mobility-based environmental exposure studies usually require people to carry an AirBeam2 sensor for two days (one weekday and one weekend day) [4,5,19,26,41]. Accordingly, for each type of environment, we collected data from 9:00 to 17:00. For the office (indoor) environment, data from one of the AirBeam2 sensors were missing from 12:30 to 13:00. Table 1 also presents the average PM2.5 and PM10 concentrations reported by the Hong Kong Environmental Protection Department's monitoring stations (i.e., Sha Tin, Sham Shui Po, Mong Kok, and Kwun Tong) on the days of our data collection [37].

Evaluation of Correlation, Accuracy, and Bias of Data Collected with Low-Cost PM Sensors
Data collected from the AirBeam2 sensors were time-paired with the data collected by the DustTrak sensor at 1 s resolution using Python. First, we used scatterplots and descriptive statistics (i.e., the mean and standard deviation) of the 1 min average concentrations of PM1, PM2.5, and PM10 recorded by the AirBeam2 and DustTrak sensors to explore the general patterns of the data. Then, using the 1 min average PM concentrations, we examined the correlation, accuracy, and bias of the AirBeam2 data in different types of environments. Specifically, the Pearson correlation coefficient was used to explore the correlation between the data recorded by the five AirBeam2 sensors. A linear regression model was used to measure the accuracy (i.e., the slope and R²) of the AirBeam2 sensors relative to the DustTrak sensor. Finally, the bias (how well the AirBeam2 data agreed with the DustTrak data) of the 1 min average PM concentrations was evaluated using the following Equation (1):
where y is the observed PM concentration of the AirBeam2 sensors, x is the observed PM concentration of the DustTrak sensor, i is the data pair index, and n is the total number of data pairs. In addition, we aggregated the collected PM concentration data into 5 s, 1 min, 10 min, and 30 min intervals to explore how the correlation, accuracy, and bias of the data recorded by the AirBeam2 sensors are affected by the choice of aggregated temporal unit.
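The time-pairing and temporal aggregation steps described above could be sketched with pandas as follows. This is a minimal sketch, not the authors' script: it assumes each instrument's 1 s log has already been loaded into a pandas Series indexed by timestamp, and the function name `pair_and_aggregate` is hypothetical.

```python
import numpy as np
import pandas as pd


def pair_and_aggregate(airbeam: pd.Series, dusttrak: pd.Series, interval: str) -> pd.DataFrame:
    """Time-pair two 1 s PM series and average them into `interval` bins (hypothetical helper)."""
    # An inner join keeps only the seconds recorded by both instruments (time-pairing).
    paired = pd.concat({"airbeam": airbeam, "dusttrak": dusttrak}, axis=1, join="inner")
    # Average the paired readings within each bin and drop bins that received no paired data.
    return paired.resample(interval).mean().dropna()


# Example with synthetic 1 s data: the DustTrak log is missing its first five seconds.
idx = pd.date_range("2021-01-01 09:00:00", periods=60, freq="s")
airbeam_1s = pd.Series(np.arange(60, dtype=float), index=idx)
dusttrak_1s = (airbeam_1s * 2).drop(idx[:5])

for rule in ("5s", "1min"):  # the study also used "10min" and "30min"
    print(rule, len(pair_and_aggregate(airbeam_1s, dusttrak_1s, rule)))
```

The inner join means an interval average is computed only from seconds where both instruments reported a value, so gaps such as the missing 12:30 to 13:00 period do not skew one side of the comparison.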

Machine Learning-Based Calibration Model Development and Validation
In this subsection, we focus on developing and validating calibration models based on the temperature and humidity data recorded by the AirBeam2 sensors using multiple linear regression (MLR) and the random forest (RF) method. Equation (2) was used in the MLR model:

$$Y = \beta_0 + \beta_1 X + \beta_2\,\mathrm{Temp} + \beta_3\,\mathrm{RH} + \varepsilon \qquad (2)$$

where Y is the PM concentration (i.e., PM1, PM2.5, or PM10) recorded by the DustTrak sensor, X is the PM concentration recorded by the AirBeam2 sensors, Temp is the temperature reported by the AirBeam2 sensors, RH is the relative humidity reported by the AirBeam2 sensors, β0 to β3 are the regression coefficients, and ε is the error term. Note that all data collected in the selected environments were aggregated into 1 min intervals and used in the model. We also used the same data to develop a calibration model based on RF, a decision tree-based machine learning method widely used for classification and regression [42-44]. In the RF method, a collection of regression or classification trees is first grown from different bootstrap samples of the training data. Each tree then acts as a regression or classification function on its own, and the final output is the majority vote of the trees for classification or the average of the individual trees' predictions for regression. In this study, the RF model was implemented with the scikit-learn library in Python, and its parameters were selected using a hyperparameter optimization method [45].
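A minimal scikit-learn sketch of the two calibration models is shown below. It assumes the 1 min averages are arranged as a feature matrix with columns [AirBeam2 PM, Temp, RH] and that the DustTrak PM is the target. The RF settings (`n_estimators=100`, a fixed `random_state`) are placeholders rather than the tuned hyperparameters used in the study, the synthetic data are invented, and the function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def fit_calibration_models(X: np.ndarray, y: np.ndarray, random_state: int = 0):
    """Fit MLR and RF models mapping [AirBeam2 PM, Temp, RH] (X) to DustTrak PM (y).

    Hypothetical helper; the RF hyperparameters are placeholders, not tuned values.
    """
    # MLR: y = b0 + b1 * PM_airbeam + b2 * Temp + b3 * RH, fitted by ordinary least squares.
    mlr = LinearRegression().fit(X, y)
    # RF: average of many regression trees, each grown on a bootstrap sample of the rows.
    rf = RandomForestRegressor(n_estimators=100, random_state=random_state).fit(X, y)
    return mlr, rf


# Synthetic example (invented numbers): DustTrak reads higher than AirBeam2, with small
# temperature and humidity effects plus noise.
rng = np.random.default_rng(0)
n = 500
pm_airbeam = rng.uniform(5, 40, n)
temp = rng.uniform(20, 30, n)
rh = rng.uniform(50, 95, n)
pm_dusttrak = 2 + 1.5 * pm_airbeam + 0.2 * temp - 0.05 * rh + rng.normal(0, 0.5, n)

X = np.column_stack([pm_airbeam, temp, rh])
mlr, rf = fit_calibration_models(X, pm_dusttrak)
print("MLR coefficients:", mlr.coef_, "intercept:", mlr.intercept_)
print("RF in-sample R^2:", rf.score(X, pm_dusttrak))
```

The in-sample RF score printed here is optimistic by construction; held-out validation, as described in the next paragraph, is what makes the MLR and RF results comparable.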
For both the MLR and RF models, cross-validation was applied to assess how well the calibration models generalize. In each round, a model was trained on 80% of the data and validated on the remaining 20%; this process was repeated ten times, and all the data were used to validate the calibrated results. In addition, the performance of the calibration models was evaluated by comparing the model-calibrated PM concentrations with the data recorded by the DustTrak sensor using R², bias (i.e., Equation (1)), mean error (ME) (µg/m³), and root mean squared error (RMSE) (µg/m³):

$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^{2}}$$

where n is the number of data pairs, y_i is the calibrated PM concentration generated by the MLR and RF models, and x_i is the PM concentration recorded by the DustTrak sensor.
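The evaluation metrics and the repeated 80/20 validation described above could be sketched as follows. This sketch rests on several assumptions. ME and RMSE follow their standard definitions. The percentage bias uses a mean normalized form, (1/n) Σ (y_i − x_i)/x_i × 100, which stands in for Equation (1) rather than reproducing it. scikit-learn's `ShuffleSplit` is our choice for drawing the ten random splits. All function names are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import ShuffleSplit


def mean_error(y, x):
    """ME: mean signed difference between calibrated (y) and reference (x) values."""
    return float(np.mean(np.asarray(y, float) - np.asarray(x, float)))


def rmse(y, x):
    """RMSE between calibrated (y) and reference (x) values."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(x, float)) ** 2)))


def bias_percent(y, x):
    """ASSUMED bias form: mean of (y_i - x_i) / x_i, in percent (stand-in for Equation (1))."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.mean((y - x) / x) * 100.0)


def repeated_split_validation(model, X, y, n_repeats=10, test_size=0.2, seed=0):
    """Refit a fresh copy of `model` on random 80/20 splits; average the held-out metrics."""
    splitter = ShuffleSplit(n_splits=n_repeats, test_size=test_size, random_state=seed)
    rows = []
    for train_idx, test_idx in splitter.split(X):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        pred, ref = fitted.predict(X[test_idx]), y[test_idx]
        rows.append([r2_score(ref, pred), mean_error(pred, ref), rmse(pred, ref), bias_percent(pred, ref)])
    return dict(zip(["R2", "ME", "RMSE", "bias_pct"], np.mean(rows, axis=0)))


def compare_models(X, y, seed=0):
    """Run the same repeated validation for an MLR and an RF model (placeholder RF settings)."""
    models = {
        "MLR": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    return {name: repeated_split_validation(m, X, y, seed=seed) for name, m in models.items()}
```

With `X` holding [AirBeam2 PM, Temp, RH] rows and `y` the DustTrak values, `compare_models(X, y)` returns one metric dictionary per model; re-running it on the subset without the office (outdoor) rows would mirror the second analysis described in the text. Using `clone` guarantees each split starts from an unfitted model, so no information leaks between splits.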
Recall that the day we collected data in the outdoor area of the office building was rainy, whereas all the other days were sunny. To further explore how weather conditions affect the performance of the calibration models, we also excluded the data collected in the office (outdoor) environment and refitted the MLR and RF models. The training and validation data were randomly selected from this reduced data set using the same procedure described above.

PM1, PM2.5, and PM10 Concentrations Collected by Sensors in Different Environments
In this subsection, we first use scatterplots and descriptive statistics (i.e., the mean and standard deviation) of the 1 min average PM concentrations recorded by the AirBeam2 and DustTrak sensors to explore the general patterns of the data. Figure 1 presents the distribution of the 1 min average PM1, PM2.5, and PM10 concentrations reported by the AirBeam2 and DustTrak sensors in the indoor and outdoor spaces of an office building, the platform and the lobby of an MTR station, and a seaside location. The results indicate that the PM concentrations recorded by the DustTrak sensor are generally higher than those recorded by the AirBeam2 sensors in different environments, which is in line with the results of previous studies [27]. Specifically, the 95% confidence interval for the mean of the differences between the PM concentrations reported by the AirBeam2 and DustTrak sensors is 1.54 to 11.04 µg/m³ for PM1, −0.01 to 8.17 µg/m³ for PM2.5, and −2.41 to 8.47 µg/m³ for PM10.
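One common way to compute a 95% confidence interval for a mean paired difference is a t-interval over the time-paired 1 min averages, sketched below. The text does not state which interval method was used, so the t-based approach, the DustTrak-minus-AirBeam2 direction of the difference, the example numbers, and the function name `mean_difference_ci` are all assumptions.

```python
import numpy as np
from scipy import stats


def mean_difference_ci(reference, low_cost, confidence=0.95):
    """t-based CI for the mean of paired differences (reference minus low-cost); hypothetical helper."""
    d = np.asarray(reference, float) - np.asarray(low_cost, float)
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)                       # standard error of the mean difference
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)  # two-sided critical value
    return d.mean() - t_crit * se, d.mean() + t_crit * se


# Hypothetical paired 1 min averages (µg/m³), not data from the study.
dusttrak_1min = [12.0, 14.0, 16.0, 18.0]
airbeam_1min = [10.0, 11.0, 12.0, 13.0]
print(mean_difference_ci(dusttrak_1min, airbeam_1min))
```

An interval lying entirely above zero, as for PM1 above, indicates that the reference instrument reads consistently higher than the low-cost sensor on average.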
Table 2 presents the descriptive statistics of the 1 min average PM1, PM2.5, and PM10 concentrations obtained by the AirBeam2 and DustTrak sensors in different environments.
The results indicate that the mean PM concentrations reported by the AirBeam2 sensors are lower than those reported by the DustTrak sensor, and that the differences between the mean values reported by the two types of sensors decrease as PM size increases. In other words, the PM10 concentrations recorded by the AirBeam2 and DustTrak sensors show the smallest difference, while the PM1 concentrations show the largest. By comparing the mean PM concentrations reported by the AirBeam2 sensors with those reported by the monitoring stations, we found that the mean PM2.5 and PM10 concentrations reported by the AirBeam2 sensors were generally lower than those reported by the monitoring stations (see Table 1). Furthermore, by comparing the mean PM concentrations reported by the DustTrak sensor and the monitoring stations in the selected environments (except the office [indoor]), we found that the mean PM2.5 concentrations reported by the DustTrak sensor are generally similar to those reported by the monitoring stations, while the mean PM10 concentrations reported by the DustTrak sensor are lower than those reported by the monitoring stations. One potential reason is that the DustTrak sensor may underestimate PM10 concentrations [46]. Meanwhile, the mean PM2.5 and PM10 concentrations reported by the DustTrak sensor in the office (indoor) environment are lower than those reported by the monitoring stations. A potential reason is that the office (indoor) is equipped with air filters and air-cleaning devices; using these devices and keeping the windows closed improves the air quality in the office (indoor) relative to the office (outdoor) environment.

Sensor Performance in Different Environments
In this subsection, the data collected by the AirBeam2 sensors in different environments are evaluated by their correlation, accuracy, and bias using the 1 min average PM concentrations recorded by the AirBeam2 and DustTrak sensors. Specifically, the Pearson correlation coefficient was used to explore the correlation between the data reported by the five AirBeam2 sensors. Figure 2 presents the correlations between the 1 min average PM1, PM2.5, and PM10 concentrations obtained by the AirBeam2 sensors in different environments. First, the correlation coefficients range from 0.73 to 0.96, indicating high linearity and correlation between the PM concentrations recorded by the AirBeam2 sensors in all the selected environments. Second, the correlation coefficients for data collected in the MTR station (platform and lobby) are higher than those for data collected in other environments; in particular, data obtained by the AirBeam2 sensors on the MTR platform have the highest correlation coefficients for the 1 min average PM1, PM2.5, and PM10 concentrations. Third, the correlation coefficient decreases as PM size increases: it is highest for the 1 min average PM1 concentrations and lowest for the 1 min average PM10 concentrations.
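The inter-sensor comparison above can be sketched as a Pearson correlation matrix over the AirBeam2 series, summarized by its smallest and largest off-diagonal entries. The sketch assumes the 1 min averages from each unit are aligned columns of a pandas DataFrame; the column names, example values, and function name are hypothetical.

```python
import numpy as np
import pandas as pd


def intersensor_correlation(readings: pd.DataFrame):
    """Return the Pearson correlation matrix and its min/max off-diagonal coefficients."""
    corr = readings.corr(method="pearson")
    values = corr.to_numpy()
    # Exclude the diagonal, where each sensor is trivially correlated with itself (r = 1).
    off_diagonal = values[~np.eye(values.shape[0], dtype=bool)]
    return corr, float(off_diagonal.min()), float(off_diagonal.max())


# Hypothetical 1 min PM2.5 averages from three units (the study used five).
readings = pd.DataFrame({
    "airbeam_1": [10.2, 12.5, 11.8, 15.1, 14.0],
    "airbeam_2": [9.8, 12.9, 11.1, 15.6, 13.2],
    "airbeam_3": [10.9, 12.0, 12.4, 14.2, 14.8],
})
corr, r_min, r_max = intersensor_correlation(readings)
print(corr.round(2))
print(f"off-diagonal r range: {r_min:.2f} to {r_max:.2f}")
```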
In addition, the linear regression model and the bias assessment (i.e., Equation (1)) were used to measure the accuracy of the AirBeam2 sensors relative to the DustTrak sensor. Table 3 shows the results of the evaluation for different environments. First, we found that the 1 min average PM concentrations recorded by the AirBeam2 sensors in the office (indoor) and MTR station (platform and lobby) environments generally have a linear relationship with the data reported by the DustTrak sensor (i.e., R² values range from 0.61 to 0.78, and slope values range from 0.44 to 0.95). In contrast, the 1 min average PM concentrations obtained by the AirBeam2 sensors in the office (outdoor) and seaside environments do not show a clear linear relationship with the data recorded by the DustTrak sensor (i.e., R² values range from 0.11 to 0.23). When we fitted higher-order (second- and third-order) polynomial curves to the data obtained in the office (outdoor) and seaside environments, the R² values ranged from 0.05 to 0.11 for PM1, 0.05 to 0.33 for PM2.5, and 0.23 to 0.58 for PM10. These results differ from those of previous studies, which found a highly linear relationship (R² = 0.88-0.89) between PM concentrations recorded by AirBeam2 sensors and a DustTrak sensor [27]. A possible reason is that AirBeam2 sensors are substantially affected in relatively high-humidity environments with hygroscopic salts [24,47,48]. Specifically, the sensitivity of AirBeam2 sensors may be affected by fog droplets, which may be detected as particles in a relatively high-humidity environment [49]. Because a seaside location is a relatively high-humidity environment with a high level of hygroscopic salts, this may explain why the PM concentrations reported by the AirBeam2 sensors at the seaside location in this study have lower accuracy than those collected in other environments.
In addition, the relative humidity of the atmosphere increases during rainy weather, often approaching 100%. Thus, the PM concentrations reported by the AirBeam2 sensors on a rainy day would be substantially affected and have lower accuracy than the PM concentrations collected in the other environments. Moreover, the bias percentage between the data recorded by the AirBeam2 sensors and the data collected by the DustTrak sensor decreases as PM size increases. This is in line with the results reported earlier in Section 3.1, where the differences between the mean PM concentrations recorded by the two types of sensors decrease as PM size increases.
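The slope and R² used in this accuracy assessment, and the higher-order curve fits mentioned above, can be obtained with an ordinary least-squares polynomial fit, as in the sketch below. It assumes the AirBeam2 values are regressed on the DustTrak values. The text does not state the direction, which affects the slope but not R² for a straight-line fit. The helper name and the example values are hypothetical.

```python
import numpy as np


def poly_fit_r2(x, y, degree=1):
    """Least-squares polynomial fit of y (AirBeam2) on x (DustTrak); returns (coefficients, R^2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    coefs = np.polyfit(x, y, degree)  # highest power first; for degree 1 this is [slope, intercept]
    fitted = np.polyval(coefs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coefs, 1.0 - ss_res / ss_tot


# Hypothetical 1 min averages (µg/m³), invented for illustration.
dusttrak = np.array([8.0, 12.0, 15.0, 20.0, 26.0, 31.0])
airbeam = np.array([5.1, 6.8, 9.2, 10.5, 14.9, 16.3])
for degree in (1, 2, 3):
    coefs, r2 = poly_fit_r2(dusttrak, airbeam, degree)
    print(degree, np.round(coefs, 3), round(float(r2), 3))
```

A slope below 1 in the degree-1 fit corresponds to the AirBeam2 under-reading the reference, which matches the direction of the differences reported for these sensors.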

Sensor Performance in Different Temporal Units
In this subsection, we explore how the correlation, accuracy, and bias of PM concentrations recorded by the AirBeam2 sensors are affected by the use of different aggregated temporal units. Figure S1 presents the correlations between the PM1, PM2.5, and PM10 concentrations reported by the AirBeam2 sensors in different environments and at different temporal units (i.e., 5 s, 1 min, 10 min, and 30 min). We found that the correlation increases as the aggregation level of the temporal unit increases in all environments: the correlation coefficients are around 0.6 to 0.7 when the temporal unit is 5 s, whereas they range from 0.95 to 1 when the temporal unit is 30 min. In addition, the correlation coefficients at each temporal unit are not affected by PM size. Figure S2 presents the results of the multiple linear regression models for the data recorded by the AirBeam2 and DustTrak sensors in different environments and at different temporal units. The results indicate that the R² values generally increase as the aggregation level of the temporal unit increases (i.e., from 5 s to 30 min) in the office (indoor) and MTR station (platform and lobby) environments. For the office (outdoor) environment, increasing the aggregation level of the temporal unit does not significantly affect the results of the multiple linear regression between the PM concentrations reported by the AirBeam2 and DustTrak sensors. For the seaside environment, the R² values first increase as the aggregation level increases from 5 s to 10 min and then decrease for the 30 min interval, reaching their maximum at 10 min. Figure S3 presents the bias percentage between the PM concentrations recorded by the AirBeam2 and DustTrak sensors in different environments and at different temporal units.
The results suggest that changing the aggregated temporal unit does not significantly reduce the bias: depending on the environment, the bias values may increase or decrease as the aggregation level of the temporal unit increases (i.e., from 5 s to 30 min). Overall, increasing the aggregation level of the temporal unit increases the correlation among the PM concentrations obtained by the AirBeam2 sensors in different environments, but it does not significantly improve the accuracy or reduce the bias of the data.
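The contrast between rising correlation and persistent bias can be illustrated with a small simulation. This is a hypothetical sketch, not a re-analysis of the study's data. The signal shape, noise level (`noise_sd`), and 0.8 scaling factor are invented, sensor noise is assumed independent from second to second, and the bias uses an assumed mean normalized form (in percent) rather than the paper's Equation (1). Under these assumptions, averaging cancels much of each sensor's random noise, so the correlation rises with the interval, whereas a systematic under-reading is unaffected by averaging, so the bias stays near −20%. This is one plausible mechanism consistent with the pattern above, not one the study tested.

```python
import numpy as np
import pandas as pd


def simulate_aggregation_effect(hours=8, noise_sd=5.0, scale=0.8, seed=0,
                                intervals=("5s", "1min", "10min", "30min")):
    """Hypothetical simulation: correlation and bias (%) between two sensors at several intervals."""
    rng = np.random.default_rng(seed)
    n = hours * 3600
    idx = pd.date_range("2021-01-01 09:00:00", periods=n, freq="s")
    t = np.arange(n)
    # Invented 'true' PM signal: 20 µg/m³ baseline with a 2 h oscillation of ±5 µg/m³.
    true_pm = 20 + 5 * np.sin(2 * np.pi * t / 7200)
    # Each simulated sensor adds its own independent 1 s noise; the second also reads low by `scale`.
    reference = pd.Series(true_pm + rng.normal(0, noise_sd, n), index=idx)
    low_cost = pd.Series(scale * true_pm + rng.normal(0, noise_sd, n), index=idx)
    results = {}
    for rule in intervals:
        a = reference.resample(rule).mean()
        b = low_cost.resample(rule).mean()
        r = float(np.corrcoef(a, b)[0, 1])
        bias = float(np.mean((b - a) / a) * 100)  # assumed mean normalized bias, in percent
        results[rule] = (r, bias)
    return results


for rule, (r, bias) in simulate_aggregation_effect().items():
    print(f"{rule:>5}: r = {r:.3f}, bias = {bias:.1f}%")
```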

Machine Learning-Based Calibration and Validation
In this subsection, we develop and validate calibration models based on the temperature and humidity data reported by the AirBeam2 sensors using multiple linear regression (MLR) and the random forest (RF) method. We first used all the PM concentrations collected in the selected environments, aggregated into 1 min intervals. Table 4 presents the results of the calibration models based on the data collected in all selected environments. The results indicate low-to-moderate linearity (i.e., R² values range from 0.16 to 0.59) between the calibrated 1 min average PM1, PM2.5, and PM10 concentrations from the AirBeam2 sensors and the concentrations recorded by the DustTrak sensor, for both the MLR and RF models. Furthermore, the R² values between the calibrated AirBeam2 data and the DustTrak data decrease as PM size increases. In addition, the bias percentage decreases substantially after calibration (i.e., it ranges from −2.54 to 1.23).
We then excluded the data collected in the office (outdoor) environment and reran the MLR and RF models. Table 5 presents the results of the calibration models based on the PM concentrations collected in the office (indoor), MTR station (platform and lobby), and seaside environments. First, the results indicate high linearity (i.e., R² values range from 0.89 to 0.95) between the calibrated 1 min average PM1, PM2.5, and PM10 concentrations from the AirBeam2 sensors and those recorded by the DustTrak sensor, for both the MLR and RF models. Furthermore, these R² values are not affected by PM size. In addition, the bias percentage decreased substantially after calibration (i.e., it ranges from −1.21 to −0.04). Finally, comparing the cross-validated ME, RMSE, and bias percentage of the MLR and RF models in Tables 4 and 5, we found that the RF models generate more robust calibrated PM concentrations for the AirBeam2 sensors than the MLR models, as the ME, RMSE, and bias percentage values of the RF models are generally smaller than those of the MLR models.
These results are consistent with previous findings [24]. They imply that the relationship between temperature, relative humidity, and the PM concentrations reported by AirBeam2 sensors is non-linear during rainy weather, which makes it difficult to derive an appropriate correction model for data collected under such conditions.

Discussion
Previous studies have focused on establishing linear models to calibrate PM concentrations recorded by low-cost portable sensors in certain environments with different concentration levels [50,51]. This study extends previous work by showing how the correlation, accuracy, and bias of PM concentrations reported by low-cost PM sensors (i.e., AirBeam2 sensors) are affected by different types of urban environments and different aggregated temporal units. Specifically, the results reveal that the PM1, PM2.5, and PM10 concentrations recorded by the AirBeam2 sensors are generally lower than those obtained by the monitoring stations and the DustTrak sensor in different environments. The results also show high linearity and correlation among the data recorded by the AirBeam2 sensors in different environments for all three PM size fractions. Comparing the data collected by the AirBeam2 and DustTrak sensors, the results further indicate that the accuracy and bias of the data recorded by the AirBeam2 sensors are significantly affected by weather conditions (i.e., a rainy day) and by environments with relatively high humidity and a high level of hygroscopic salts (i.e., seaside). In addition, the correlation, accuracy, and bias of the PM concentrations recorded by the AirBeam2 sensors are affected by PM size. Moreover, using data aggregated at different temporal units (i.e., 5 s, 1 min, 10 min, and 30 min), the results suggest that increasing the aggregation level of the temporal unit (i.e., from 5 s to 30 min) significantly increases the correlation coefficients for PM concentrations recorded by the AirBeam2 sensors in different environments, while it does not significantly improve the accuracy or reduce the bias of the data.
Lastly, the calibration results indicate that random forest (RF) models generate better results than multiple linear regression (MLR) models for data collected on sunny days (i.e., excluding the data collected on rainy days). These findings have several important implications for researchers designing mobility-based environmental exposure studies that use low-cost PM sensors.
First, our results reveal high linearity and correlation between the data recorded by the AirBeam2 sensors in different environments when the aggregated temporal unit is longer than 5 s (e.g., 1 min). Specifically, the correlation coefficients between the PM concentrations reported by the AirBeam2 sensors are not significantly affected by the environment or the weather conditions (e.g., a relatively humid environment with a high level of hygroscopic salts, or a rainy day). Researchers should therefore test how different aggregated temporal units may affect their results when using PM concentration data reported by low-cost PM sensors in mobility-based environmental exposure studies, which largely compare exposure data across individuals. For instance, by comparing PM concentrations recorded by low-cost PM sensors at a given aggregated temporal unit (e.g., 1 min), studies can explore which social groups are disadvantaged in their exposure to air pollution and thus shed light on environmental inequality issues.
Second, our results indicate that the accuracy and bias of PM concentrations recorded by the AirBeam2 sensors are affected by weather conditions (i.e., rainy days) and by environments with relatively high humidity and a high level of hygroscopic salts (i.e., seaside). Thus, when studying the health effects of PM exposure, low-cost PM sensors that have not been properly calibrated may generate misleading measurements. The results also highlight the importance of deriving calibration models based on machine learning methods (e.g., random forest) that take weather conditions into account. Specifically, the results suggest that fitting random forest models for each environment to data that exclude rainy days yields better calibration outcomes. Researchers could therefore develop machine learning calibration models for different environments when using low-cost PM sensors to explore how PM concentrations may affect people's health, bearing in mind that weather conditions (e.g., rainy days) can significantly affect the accuracy and bias of the data collected by these sensors.
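The rainy-day exclusion described above can be sketched as a filter applied before a calibration model is fitted. The Python sketch below fits a multiple linear regression (the MLR baseline, not the random forest that the results favor) of the reference PM concentration on the low-cost PM reading, temperature, and relative humidity by solving the normal equations. It is a minimal sketch under assumptions, not the authors' implementation: the `Record` layout and its per-record `rainy` flag are hypothetical, and an RF calibration could reuse the same filtered features with, for example, scikit-learn's `RandomForestRegressor`.

```python
from dataclasses import dataclass


@dataclass
class Record:
    pm_lowcost: float  # low-cost (AirBeam2-style) PM reading, µg/m³
    temp_c: float      # temperature reported by the low-cost sensor, °C
    rh: float          # relative humidity reported by the low-cost sensor, %
    pm_ref: float      # co-located reference reading, µg/m³
    rainy: bool        # hypothetical per-record weather flag


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def features(rec):
    """Intercept, low-cost PM, temperature, relative humidity."""
    return [1.0, rec.pm_lowcost, rec.temp_c, rec.rh]


def fit_mlr(records, exclude_rainy=True):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [r for r in records if not (exclude_rainy and r.rainy)]
    if not rows:
        raise ValueError("no records left after filtering")
    X = [features(r) for r in rows]
    y = [r.pm_ref for r in rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, rec):
    """Calibrated PM estimate for one record."""
    return sum(b * x for b, x in zip(beta, features(rec)))
```

The normal equations are used here only to keep the sketch dependency-free; because they square the condition number of the design matrix, a dedicated least-squares routine such as `numpy.linalg.lstsq` is the safer choice for larger or poorly scaled datasets.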
However, the study has several limitations. First, because of the limitations of the sensors (i.e., the AirBeam2 and DustTrak sensors), we developed and validated the calibration models using only the temperature and humidity data recorded by the AirBeam2 and could not consider other meteorological factors (e.g., wind speed) in the different types of environments (e.g., the outdoor and indoor environments of the office). Studies have shown that these factors may also affect PM concentrations [52]. Future work should therefore explore calibration models for PM concentration data recorded by low-cost PM sensors in different environments that also account for other meteorological factors (e.g., wind speed).
Second, our evaluation and calibration of low-cost PM sensors could be extended to different seasons and weather conditions through longer periods of data collection (e.g., including both a weekday and a weekend day, because people's daily activities differ significantly between weekdays and weekends). PM concentrations also vary across seasons and weather conditions. Future studies should therefore consider how the data recorded by low-cost PM sensors may be affected in different seasons (e.g., winter and summer) or weather conditions (e.g., snowy, dry, and rainy days).
Third, the performance of the DustTrak sensor in collecting PM10 concentration data is a concern. The DustTrak sensor may underestimate PM10 concentrations, and its measurements may contain random jumps (see Section 3.1). This bias could introduce uncertainty into the accuracy assessment of the calibrated PM concentration data from the low-cost PM sensors. Nevertheless, the methods applied in this study can still inform the design of better mobility-based environmental exposure studies that use low-cost PM sensors. Future studies would benefit from more reliable portable PM sensors that capture more accurate reference data for calibrating low-cost sensors in different types of indoor and outdoor environments (e.g., kitchens, smoking rooms, and various activity venues).

Conclusions
Low-cost PM sensor evaluation and calibration are crucial for establishing reliable PM concentration measurements for mobility-based environmental exposure studies. Using data collected by five AirBeam2 sensors and one TSI DustTrak DRX Aerosol Monitor 8533 in different environments (i.e., office [indoor and outdoor], MTR station [platform and lobby], and seaside) in Hong Kong, this study first assessed the reliability of the PM concentrations recorded by the AirBeam2 sensors, aggregated to 1-min averages, using correlation, accuracy, and bias analyses. The study then explored how the correlation, accuracy, and bias of the PM concentrations recorded by the sensors are affected by the use of different temporal units (i.e., 5 s, 1 min, 10 min, and 30 min). Lastly, the study calibrated the data obtained by the sensors using multiple linear regression (MLR) and random forest (RF). The results suggest that the accuracy and bias of PM concentrations recorded by the AirBeam2 sensors are affected by weather conditions (i.e., rainy days), environments with relatively high humidity and a high level of hygroscopic salts (i.e., seaside), and the aggregation level of the temporal unit. The results also indicate that RF models generate better results than MLR models for data collected on sunny days (i.e., excluding the data collected on rainy days). These results provide valuable insights (e.g., on the selection of predictive variables) for the research community when designing environmental exposure studies that use low-cost PM sensors. First, it is necessary to test how different aggregated temporal units may affect the results derived from PM concentration data reported by low-cost PM sensors. Second, using machine learning models (i.e., RF) for data calibration in different environments can generate better results than MLR models.
In addition, more effort is needed to develop calibration models under different weather conditions to obtain accurate PM concentration data from low-cost PM sensors.

Data Availability Statement:
The datasets generated from the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.