Using an Internet of Behaviours to Study How Air Pollution Can Affect People’s Activities of Daily Living: A Case Study of Beijing, China

This study aims to quantitatively model, rather than presuppose, whether or not air pollution in Beijing (China) affects people’s activities of daily living (ADLs) based on an Internet of Behaviours (IoB), in which IoT sensor data can signal environmental events that can change human behaviour en masse. People’s density distribution computed from call detail records (CDRs) and air quality data are used to build a fixed effect model (FEM) to analyse the influence of air pollution on four types of ADLs. The following four effects are discovered: Air pollution negatively impacts people going sightseeing in the afternoon and has a positive impact on people staying-in during the morning and the middle of the day. Air pollution lowers people’s desire to go to restaurants for lunch, but far less so in the evening. As air quality worsens, people tend to decrease their walking and cycling and tend to travel more by bus or subway. We also find a monotonically decreasing nonlinear relationship between air quality index and the average CDR-based distance for each person of two citizen groups that go walking or cycling. Our key and novel contributions are that we first define IoB as a ubiquitous concept. Based on this, we propose a methodology to better understand the link between bad air pollution events and citizens’ activities of daily life. We applied this methodology in the first comprehensive study that provides quantitative evidence of the actual effect, not the presumed effect, that air pollution can significantly affect a wide range of citizens’ activities of daily living.


Introduction
Continuing human urbanisation exacerbates various physical world conditions, e.g., causing air pollution, traffic congestion, habitat destruction, and loss of arable land [1][2][3]. This threatens the sustainable development of urbanisation by governments [4]. Citizens perceive such negative impacts and are becoming more active to counteract these effects [5]. High levels of air pollution, which can be reflected by a high air quality index (AQI) or PM 2.5 (air-borne particulate matter having a diameter of less than 2.5 micrometres) concentration, have been shown to impact citizens' health [6], labour productivity [7], later-life educational outcomes [8] and happiness [9]. As mitigation measures, people may choose to live in less polluted cities and in green buildings [10], to use air filtration systems and reduce the time spent outdoors on highly polluted days [11], or decide to wear a breathing mask outside. Governments and businesses may also benefit from understanding the quantitative influence of air pollution on people's activities, such as how much citizens curtail their outdoor activities to avoid bad air pollution.
Although some derivative concepts of the internet of things (IoT), such as the internet of vehicles, have been applied in many research areas, a clear definition of the internet of behaviours (IoB) is lacking, so we define it as follows. An internet of behaviours (IoB) is a system of IoT devices that collect, use and analyse data about physical (and cyber) human behaviour and that seeks to influence human behaviour, i.e., through people being better informed about environmental events, and even to trigger changes in human behaviour, en masse, in time and space. One key type of human behaviour critical to human well-being is defined as the basic activities of daily living or ADL, e.g., personal care, mobility, and eating [12]. An IoB system can combine data from multiple IoT environmental sensor sources with commercial customer data, citizen-driven data, data processed by public departments and government agencies, social media, and geographic information science (GIS) data. Based on such data sets, data mining and machine learning enable people's behaviour to be analysed; an IoB can then enable different stakeholders, e.g., businesses, authorities, and citizens, to better interpret human behaviour en masse.
To do this, we first need to identify suitable IoB data sources that can be used to track mass human behaviour, such as human movement. For example, call detail records (CDRs) are produced in a cellular phone network base station to document a call, text message, etc., and describe the time of the call, closest base station and location information [13]. Mobile users' density distribution can be regarded as a high-sampling-rate people's density distribution that can be computed in time.
It's useful to differentiate people's habitual points of interest (POIs) associated with ADLs, such as visiting a favourite restaurant near work. We classify this as a specific place with a specific activity (SPSA), where a location-driven ADL (LD-ADL) occurs, e.g., eating out. These will be differentiated from other POIs where less identifiable LD-ADLs occur (non-SPSA class).
Next, we need to investigate how environmental changes, e.g., air quality affect humans' behaviour at SPSAs. However, very few studies have quantitatively assessed the impact of air pollution on ADLs, especially at a large, city-wide scale. The main reasons for this are: first, some ADLs are composite and difficult to describe with the data acquired. Second, datasets that can be used to calculate ADLs in a large area, at a sufficiently high sampling rate, can be difficult to obtain unless one works for large telecoms, internet app or social networking companies.
To analyse how people's ADLs are affected by bad air pollution, we used high-frequency people's density distributions calculated using CDR data acquired from a telecom provider, and air pollution data, across the whole of Beijing, China in February 2015. In Table 1, the time spent on SPSA ADLs, including sleeping, eating, housework, fitness exercise, watching TV, and transportation, accounts for more than 60% (63.74%) of a day. Hence, we select four representative types of SPSA ADLs: sightseeing (N.B. sightseeing has been shown to have a strong effect on human well-being [14]), eating out, staying-in (to rest and recuperate) and the type of transport mode used. Each type of SPSA ADL is associated with a point of interest (POI) dataset of fixed places, except for the transport mode used. Some transport modes, e.g., bus and subway, can be detected by fixed POIs, while others could be extracted from the movement speed. However, people who take taxis or cars can be hard to recognise as they have no fixed POIs and, during traffic congestion, can have a similar speed to buses and cycles. According to the Beijing Traffic Development Annual Report in 2016 (http://www.bjtrc.org.cn/, accessed on 1 August 2019), the average distance travelled using bus, subway, cycling, and walking is 26.1 km, which accounts for 53% of the total average distance (49.2 km). Their time duration accounts for 65% of the total average travel time (485 min). Thus, we focus on four types of transport modes: bus, subway, cycling and walking.
Our key and novel contributions are as follows: (1) We propose a methodology to better understand the link between environmental changes such as air pollution and citizens' activities of daily life. It can help government and businesses to better understand the actual effect, not the presumed effect, of air pollution on the pattern of daily activities of citizens. (2) This opens up a new perspective for understanding and exploring the interaction between PM 2.5 (and, more generally, air pollution) and people's physical behaviour. (3) This can not only reveal the subtle impact of PM 2.5/air pollution on human ADLs but can also monitor the indirect impact of PM 2.5/air pollution on some human-based business activities, e.g., restaurants. This is challenging to do because different data sets, such as air pollution, human movement, location contexts, etc., with different temporal and spatial characteristics, need to be acquired and fused. Human activities can be complex to characterise. Individual human behaviour in a crowd needs to be identified. Human behaviour is affected by a range of environmental factors, some of which may not be observable, that need to be correlated.
The remainder of this article is organized as follows: Section 2 analyses related work, Section 3 presents the data and pre-processing used. Section 4 introduces an overview of the methodology, followed by a more detailed description. Results and discussion are reported in Section 5. Section 6 gives our conclusions, limitations, and thoughts for future research. The abbreviations are explained in Table 2.

Related Works
Several factors affect crowd activities, including human factors (such as leaving work) and physical environment factors (such as temperature, and rain). Bad air quality can affect the outdoor activities of residents [15]. To study how the influence of haze (air pollution with high degrees of PM) affects different crowd activities in different urban areas, we first consider regression models that establish regression relationships between weather factors and specific human behaviour. De Freitas [16] observed that atmospheric conditions affect beach user behaviour. Lin et al. [17] found that poor air quality causes the elderly to stay indoors. Jiang et al. [18] used a social media survey and regression and variance analysis to find that particulate pollution negatively impacts the maximum number of park visits. R-Toubes et al. [19] analysed the relationship between weather conditions and people flow, daily, at tourist beaches highlighting that sunshine is important. Zhao et al. [20] found via a survey that in hazy weather, higher-income cyclists in Beijing tended to switch to use private vehicles rather than to use public transit, while lower-income cyclists were more likely to continue cycling. Hu et al. [21] used a multivariate regression method to study the relationship between air quality and outdoor exercise in China. The total number of exercise sessions, average duration and an average distance of each exercise mode, were analysed under each air quality category (from excellent to severe).
Omitted variable bias is a primary statistical challenge in nonexperimental research (research that lacks the manipulation of an independent variable, control of extraneous variables through random assignment, or both). Fixed effect models (FEMs) with panel data were developed to address the issue of omitted variable bias in nonexperimental research [22]. Thus, FEMs can be applied to detect the relationship between ADLs and air pollution. A FEM is an estimation technique applied to panel data that allows one to account for time-invariant unobserved individual characteristics, i.e., other factors, such as special offers at different times of day, that can be correlated with the observed independent variables (AQI or PM 2.5) [23]. For example, Gao et al. [24] studied the impact of different air pollutants on dining-out activities and the satisfaction of urban and suburban residents. They found that, due to differences in environmental and health awareness, the impact of air pollution on dining-out behaviours varies among urban and suburban residents. Zheng et al. [25] studied how air pollution affects residents' eating out frequency and satisfaction based on the reviews from dianping.com. They proposed that air pollution can reduce the dining-out frequency and satisfaction of residents. However, both studies collected residents' dining-out data from only one third-party website. Such data is not objective because not everyone tends to leave a review on the website. In contrast, CDR data is much more representative because people's locations are recorded and computed with a higher frequency and the data is objective. Further, when using a FEM, neither study took measures to solve the potential endogeneity issue brought about by an omitted variable [26], which means their results lack robustness. Table 3 summarizes the above related work along with their ADL, data, method, and limitations.
In conclusion, current studies of how air pollution can affect physical human behaviour have the following specific limitations: (1) they only focus on one, or very few, type(s) of ADLs; (2) they do not clarify the differences between haze, AQI, and PM 2.5; (3) they do not study how to model the link between air quality and multiple types of ADLs at a large spatial scale; and (4) although they consider other observable impact factors that may influence the ADLs using traditional questionnaire surveys or simple statistical regression models, there are many other unobservable or unquantified factors that may also have a significant influence that are not considered. Even though (4) could be solved by using a FEM, current FEM air quality studies still have problems of (5) data objectivity and (6) model robustness. Thus, in this study, our contributions also include solutions to the six limitations mentioned above.

Data Introduction
Here, the main datasets CDR and air pollution and any preprocessing are introduced. Other datasets, weather conditions, POI and building area are introduced in Appendix A.4. It is challenging to identify and fuse such heterogeneous data about POIs that may have different temporal and spatial characteristics such as resolution.

Call Detail Record (CDR)
People's activities can be reflected by how their distribution density changes when they visit different POIs derived from their anonymised individual mobile phone CDRs via a base station positioning method [27] because of the high penetration rate of smart mobile phones according to the World Telecommunication Development Conference (2014). CDRs from a telecommunication operator with the highest customer number in China from Monday 2 February to Sunday 22 February 2015 (21 days) were analysed. This dataset includes over 4.8 billion records for more than 300 million users per day. The size of an hourly CDR file is about 2 Gigabytes (Figure 1). The CDR processing details are reported in Appendix A.1, Appendix A.2, and Appendix A.3. Figure 1 shows how the CDR file size distribution varies with AQI over the study period. In major national holiday periods such as the Spring Festival in China (from 18 February 2015), there is a considerable fluctuation in file size because of a significant people movement, which makes this research close to a natural experiment [28]. Additionally, because of the reasons introduced in Section 3.2 and Appendix A.4.1, the focus here is on the analysis of the relationship between people's activity and air quality from sunrise to sunset and the analysis of the corresponding relationship at nighttime is omitted.
Figure 1. Daily CDR files' sizes and AQI over the study period (N.B. the left y-axis represents the data size in megabytes; the x-axis represents the day of the month in February 2015. The dotted line is a threshold AQI of 100 and represents a poor AQ at which it is recommended that sensitive citizen groups should cut back or reschedule strenuous outdoor activities).

Air Pollution
China's Ministry of Environmental Protection (MEP) reports real-time hourly concentrations for the major air pollutants such as PM 2.5 , PM 10 (particulate matter with an aerodynamic equivalent diameter of less than 2.5 and 10 µm, respectively), SO 2 , O 3 , NO 2 and CO at about 1000 monitoring stations. Beijing has 35 of these.
Among these six pollutants, MEP defines a city's "primary pollutant" as the pollutant which contributes the most to the air quality degradation on an hourly basis. MEP also releases a composite air quality measure, AQI, which is calculated hourly. PM 2.5 seems to dominate more in recent years [29]. Its smaller size makes PM 2.5 much more harmful for people's health than larger particulates, such as PM 10. Baidu.com, China's most popular online search engine, which accounted for 93% of the search engine penetration rate in 2015 (CNNIC, 2015) and 91% in 2019 (CNNIC, 2019), shows that Chinese citizens' concerns for PM 2.5 are about 12 to 500 times higher than those of other air pollutants in the past several years (as computed using the Baidu Search Index tool, http://zhishu.baidu.com/v2/index.html#/, accessed on 1 March 2021). Thus, we use both PM 2.5 and AQI to detect their relationship with people's activity in our study. Other key pollutants (PM 10, SO 2, O 3, NO 2, CO) have been considered and are reported in Appendix A.4.1.

Data Collection and Accuracy Analysis
Excluding the CDR data, the data used in this study is measured by scientifically calibrated instruments managed by national air pollution stations and collected according to international standards. The details of the sensors used to collect the data, including their measurement range, resolution, and accuracy, are reported in Table 4. Because the sensors rely on different measuring principles, the definitions of their accuracy differ. For the PM data, the accuracy can be reflected by the parallelism of monitors (PoM), effective data rate (EDR) and comparison test of reference method (CTRM). In Table 4 we only show the EDR of the PM data (the others are reported in Appendix A.4). For other pollutants, the accuracy is defined as the indication error, while for weather condition data, the maximum allowable error reflects the accuracy.
Although the data collection process must meet international specifications, which ensures that the accuracy and repeatability of the datasets were verified against those specifications before they were published as open data, sensors may fail, which leads to inaccuracies in the data. Thus, we set up our method as a randomly sampled experiment, which serves as a cross-check of our results. This processing can eliminate the error caused by sensor failure and other factors.
For the CDR data, we have computed the spatial accuracy to be about 500 m when estimating mobile phone users' density distributions, which is a sufficiently high spatial resolution to recognise the ADLs of people based on their location. More details can be found in Appendix A.2, Appendix A.3, and Appendix A.4.

Data Pre-Processing
The CDR dataset is converted into hourly dynamic mobile phone users' density distributions, from which people's dynamic activity at specific POIs is then extracted. First, we extract the first 5 min of CDRs from each hourly file as a representative sample of that hour's CDRs (to reduce the computation time) and then count each unique International Mobile Subscriber Identification Number (IMSI) as a representative sample mobile phone user. From the CDRs, we can derive the hourly density distribution of mobile phones corresponding to each VP (Voronoi polygon or cell, the area representing the coverage) of the base station. However, an unknown error may be caused by the uneven distribution of VPs, leading to an uneven positioning accuracy, even though the POIs are evenly spatially sampled, e.g., the people density for a POI within a small VP is more accurate than for a POI in a big VP. To alleviate this, we use the kernel density estimation (KDE) method [30] to estimate hourly density distributions for a city (see Appendix A.3), with the raster resolution parameter set as 500 m (as justified and reported in Appendix A.2). Using an inverse distance weighting (IDW) method [31] to interpolate the point values of air quality from the 35 monitoring stations in a city (Beijing), we map the hourly dynamic AQI and PM 2.5 density distributions and build the corresponding two spatial-temporal datasets. Then, we can extract the AQI and PM 2.5 values for specific POIs, at a density that is similar to the CDR-based density.
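As an illustration of the IDW step described above, the following minimal sketch interpolates station readings at a single POI. It is not the ArcGIS procedure used in the study: the function name `idw_interpolate`, the planar kilometre coordinates, the distance exponent of 2 and the three station readings are all assumptions made for the example.

```python
import math

def idw_interpolate(stations, target, power=2.0):
    """Inverse distance weighting of station readings at one target point.

    stations: list of (x_km, y_km, value) tuples, e.g. hourly AQI per station.
    target:   (x_km, y_km) of the POI in the same planar coordinates.
    power:    distance exponent; 2.0 is a common default (assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return float(value)  # target coincides with a station
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical readings from three stations (coordinates in km).
stations = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 150.0)]
aqi_at_poi = idw_interpolate(stations, (5.0, 0.0))
```

Nearer stations dominate the estimate because the weights fall off with the squared distance; a larger exponent would make the surface more local.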

Overview
Two workflows are defined for our IoB framework (see Figure 2) based on if an ADL uses a POI that acts as an SPSA or not. Both workflows include three modules: input, processing, and output. Workflow 1 (red parts in Figure 2) focuses on the first category of ADLs, which covers four ADLs represented by five fixed POIs that are SPSAs: (1) sightseeing, (2) eating out (restaurants are POIs), (3) staying-in, (4) travelling by bus (a transport mode that uses bus stops as the POIs), and (5) travelling by subway (that uses metro stations as the POIs). The inputs include the mobile phone users' density distributions for these five fixed POIs and the corresponding air pollution, weather conditions and types of day datasets. Next, the processing module is used to build FEMs to detect the relationship between people's ADLs and air pollution, where any behaviour impacting indices (a coefficient in a FEM for people's density, e.g., β in Equation (1)), are computed as the outputs (Output 1). Based on the overall POI distributions in the whole city, the spatial-temporal distribution of the behaviour impacting indices is mapped (defined as Output 2).
In addition, before using CDRs to extract people's density values, we analyse whether the spatial resolution of the calculated people distribution is accurate enough to extract values that represent the four ADLs (see Appendix A.2). We conclude that, except for the restaurant POI/SPSA ADL, the other four kinds of POIs can be used to represent the related SPSA ADLs in Strategy 1 (S1). Thus, to decrease the estimation error when processing FEMs for the restaurant POI, we present Strategy 2 (S2), which samples Voronoi polygons (VPs) that include large areas of restaurants (Appendix B). Then, S2 can also be applied to Workflow 1. Finally, Output 3 estimates changes in restaurant revenue due to air pollution based on the behaviour impacting indices.
Workflow 2 (green parts in Figure 2) focuses on the second category of non-SPSA ADLs, which covers one ADL, represented by the two transport mode POIs: (1) walking, and (2) cycling. For these inputs, a multivariate linear regression model is used to compute how many people tend to cycle or walk and whether they would curtail the distance travelled or change transport modes. Output 4 indicates the quantitative impact of air pollution on the average moving distance of people walking and cycling.
Note that the testbed to build the FEM and multivariate linear regression model is STATA16 (https://www.stata.com/, accessed on 7 August 2021), while the GIS related data is processed in ArcGIS 10.5 (https://www.esri.com/en-us/arcgis/about-arcgis/overview, accessed on 7 August 2021). Further, in terms of Workflow 1, we conduct random experiments to extract half of the POIs in each dataset randomly 10 times, and then the effect of air pollution on four ADLs influenced by significant changes of the AQI and PM 2.5 is investigated (Tables A7-A11). Then after getting the results of the random experiments, we build FEMs for all POIs (SPSAs) and the results are reported in Tables A12-A16. Furthermore, the random sampled experimental settings could be regarded as validation processing. The strategy of the validation can solve the potential endogeneity issue brought about by an omitted variable [26], as well as eliminate the error from inaccurate sensor data. Based on the 10 experimental results, we determine that PM 2.5 as part of an AQI influences people's activity only when the percent of significant coefficient (p-value < 0.05) is more than 60%.
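The random-experiment rule described above can be sketched as follows. This is only an illustration of the sampling and the decision threshold, not the STATA workflow itself: the helper names, the seed, and the ten p-values are hypothetical, and the FEM fit that would produce each p-value is not shown.

```python
import random

def random_half(poi_ids, rng):
    """Draw half of the POIs at random, without replacement."""
    return rng.sample(poi_ids, len(poi_ids) // 2)

def share_significant(p_values, alpha=0.05):
    """Fraction of runs whose pollution coefficient has p-value < alpha."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)

def pollution_effect_accepted(p_values, alpha=0.05, min_share=0.6):
    """Apply the 'more than 60% of runs significant' rule."""
    return share_significant(p_values, alpha) > min_share

# Hypothetical POI identifiers and one random half-sample of them.
poi_ids = list(range(10))
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
half = random_half(poi_ids, rng)

# Hypothetical p-values of the pollution coefficient from 10 repeated runs.
p_values = [0.01, 0.03, 0.20, 0.04, 0.002, 0.06, 0.01, 0.049, 0.03, 0.5]
accepted = pollution_effect_accepted(p_values)
```

In this made-up example seven of the ten runs are significant, so the effect would be retained under the 60% rule.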

Workflow 1
Input Module: Data Preparation and Input Set Up
At nighttime people tend to sleep and communicate less and interact less with the base station so these CDRs do not reflect the real users' density distribution during such a study period. Hence, we focus on the daytime period from 6:00 a.m. to 7:00 p.m. (13 h) when studying mobile phone users' records and also because in this period people can visually perceive the main air pollution. Some examples of the spatial-temporal distribution of the CDR-based people density are given in Figure A5 which is discussed later in Section 5.1. We merge the mobile phone user's density data with the POI-level hourly AQI, PM 2.5 concentration data and weather data. Data sources, definitions and summary statistics of the main variables are provided in Table 5.
We divide each day into different time slices for different types of ADLs. For sightseeing, using different transport modes, and staying-in, we define three daily periods: from 6 to 10 a.m. (P1), from 10 a.m. to 2 p.m. (P2) and from 2 to 6 p.m. (P3). So that the three periods each span 4 h, the time slot from 6 p.m. to 7 p.m. is excluded. For the eating out ADL, we consider two periods in each day: the lunch period from 11 a.m. to 2 p.m. (P-Lunch) and the dinner period from 5 to 8 p.m. (P-Dinner). We omit the influence when sunset happens after 6 p.m. for eating out. For every period, we calculate the mean value of the POI-level hourly mobile phone user density, AQI, PM 2.5 concentration data and weather data, and then analyse these.
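The period averaging described above can be sketched as follows. The sketch assumes that each hourly record is labelled by its starting hour (so P1 covers the hours starting at 6, 7, 8 and 9); that labelling convention, the function name and the sample values are assumptions for illustration.

```python
# Start hours that fall in each 4 h period (assumed labelling convention).
PERIODS = {
    "P1": range(6, 10),   # 6 a.m. to 10 a.m.
    "P2": range(10, 14),  # 10 a.m. to 2 p.m.
    "P3": range(14, 18),  # 2 p.m. to 6 p.m.
}

def period_means(hourly):
    """Average hourly values (dict: start hour -> value) within each period."""
    means = {}
    for name, hours in PERIODS.items():
        values = [hourly[h] for h in hours if h in hourly]
        if values:  # skip periods with no data at all
            means[name] = sum(values) / len(values)
    return means

# Hypothetical hourly density at one POI for the 6 a.m. to 7 p.m. window.
hourly_density = {h: float(h) for h in range(6, 19)}
means = period_means(hourly_density)
```

The record for the hour starting at 6 p.m. is present in the input but does not enter any period, matching the exclusion of the 6 p.m. to 7 p.m. slot.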

Processing Module, Output 1: Fixed Effect Model (FEM)
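To study the main effect of air pollution on people's activity, we use a FEM panel regression approach as shown in Equation (1):

$$Y_{it} = \alpha_0 + X_{it}'\beta + \alpha_i + \varepsilon_{it}, \quad i = 1, \dots, N;\ t = 1, \dots, T \tag{1}$$

Here, $Y_{it}$ is the dependent variable, which changes with respect to both time and individual. $\alpha_0$ is the constant, while $\alpha_i$ is the individual effect, which is time-invariant. We can set $\alpha_i = \delta Z_i + u_i$, where the unobservable random variable $u_i$ represents the intercept term of individual heterogeneity, called the individual effect. $X_{it}$ is a k × 1 vector representing the independent variables and $\beta$ is a k × 1 vector representing the regression coefficients of X (Equation (2)):

$$X_{it} = (x_{1it}, x_{2it}, \dots, x_{kit})', \quad \beta = (\beta_1, \beta_2, \dots, \beta_k)' \tag{2}$$

$Z_i$ is the unobservable independent variable, which is time-invariant. $\varepsilon_{it}$ represents the idiosyncratic error. $i = 1, \dots, N$ indexes the individuals and $t = 1, \dots, T$ indexes time.
The FEM method estimates the coefficient of air pollution impacting on people's ADLs as follows. First, fixing $i$ in Equation (1) and averaging over time gives:

$$\bar{Y}_i = \alpha_0 + \bar{X}_i'\beta + \delta Z_i + u_i + \bar{\varepsilon}_i \tag{3}$$

$$\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it} \tag{4}$$

while $\bar{X}_i$ and $\bar{\varepsilon}_i$ have similar definitions. Then, subtracting Equation (3) from Equation (1) gives:

$$Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \tag{5}$$

In this step, $Z_i$ and $u_i$ have been eliminated.
Finally, we use the ordinary least squares (OLS) method on Equation (5) to estimate $\beta$, which is called the Fixed Effect Estimator, $\hat{\beta}_{FE}$:

$$\hat{\beta}_{FE} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} (X_{it} - \bar{X}_i)(X_{it} - \bar{X}_i)'\right]^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} (X_{it} - \bar{X}_i)(Y_{it} - \bar{Y}_i) \tag{6}$$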
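To make the time-averaging and differencing steps concrete, the following minimal sketch applies them to a tiny synthetic panel with a single regressor, in which case the OLS estimate reduces to a ratio of sums. It is not the STATA estimation used in the study; the function name and the synthetic data (two individuals with very different intercepts and a true slope of -2) are assumptions for illustration.

```python
from collections import defaultdict

def within_estimator(panel):
    """Fixed effect (within) slope for one regressor.

    panel: list of (individual_id, x, y) observations.
    Each individual's x and y are demeaned over time, which removes any
    time-invariant individual intercept, and OLS is run on the demeaned data.
    """
    groups = defaultdict(list)
    for pid, x, y in panel:
        groups[pid].append((x, y))
    sxy = 0.0
    sxx = 0.0
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx

# Synthetic panel: y = a_i - 2 * x with a_A = 50 and a_B = 5 (made-up values).
panel = [
    ("A", 1.0, 48.0), ("A", 2.0, 46.0), ("A", 3.0, 44.0),
    ("B", 10.0, -15.0), ("B", 11.0, -17.0), ("B", 12.0, -19.0),
]
beta_fe = within_estimator(panel)
```

Because the individual intercepts drop out after demeaning, the slope is recovered even though the two individuals sit at very different levels of x and y.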
In our study case, the dependent variable is an ADL, represented by the CDR-based people density for a specific POI. Independent variables are classified into three categories: pollution (AQI/PM 2.5), weather conditions (temperature, wind speed, cloud cover rate, rainfall, snowfall) and type of day. Because there is no rainfall or snowfall in the study period, the FEM is defined in Equation (7):

$$DENSITY_{it} = \alpha_0 + \alpha_1 POLLUTION_{it} + \alpha_2 TEMP_t + \alpha_3 WIND_t + \alpha_4 CLOUD_t + \alpha_5 TD_t + \delta Z_i + \varepsilon_{it} \tag{7}$$

$DENSITY_{it}$ and $POLLUTION_{it}$ represent the people density and the pollution level of POI $i$ at time $t$, respectively. In the FEM, we use AQI and PM 2.5 concentrations in turn as the pollution variable. $TEMP_t$, $WIND_t$ and $CLOUD_t$ represent temperature, wind speed and cloud cover rate (they only have a $t$ index as all the POIs have the same weather values at the same time). $TD_t$ refers to a type of day dummy variable. To control the time-invariant unobservables that vary across POIs, we include the POI fixed effect $\delta Z_i$. Note that unobserved factors are not classified; they are known unknowns and are simply grouped. Coefficient $\alpha_1$ (the counterpart of $\hat{\beta}_{FE}$ for the pollution variable) reflects people's pollution responsiveness, which should be negative, while the other coefficients $\alpha_2$, $\alpha_3$, $\alpha_4$, and $\alpha_5$ correspond to the other observed independent variables. N is the number of POIs and T is 21 (days).
Before using the FEM in our method, a panel unit root test (PURT) is applied to each variable to check whether it is stable (stationary) or not, which avoids spurious regressions [32]. We chose the Im-Pesaran-Shin (IPS) test [33] for this. We then use the Harris-Tzavalis (HT) test [34] to check whether the statistical properties of the time series change over time. Equation (8) gives the IPS hypotheses, $H_0$ and $H_1^{IPS}$: under $H_0$ all panels contain a unit root, while under $H_1^{IPS}$ some panels are stationary. If the test result rejects $H_0$, the tested data is stable. For the HT test in Equation (9), $H_0$ is likewise that the panels contain a unit root, so if the test result rejects $H_0$, the tested data is stable. The results show that the variables are all stable even though they span the Spring Festival holiday period, which means our datasets satisfy this requirement and hence, we can effectively use a FEM.
The type of day, such as weekday, weekend day, or festival day, may have a major impact on people's activity in different periods. We thus represent the influence of the type of day as a dummy variable in the FEM. Furthermore, in many similar previous studies [9], weather-related conditions also play a key role in the analysis using FEMs.
In the next step, we input weather conditions, such as the temperature value squared to examine the non-linear effect of it on people's activities following [9]. We give the variable definitions and summary statistics in Table 5. We recognise that the actual relationship between air quality and people's activity may be generated by omitted variables that represent unknown factors that vary hourly for individual POIs. For example, historical POI sights in Beijing may be visited by tour groups outside Beijing, whose time plan would not likely be changed by bad air quality and even weather conditions as there may be no alternative day to visit such a sight.

Output 2: Spatial-Temporal Distribution of Behaviour Impacting Indices
We create and display a city-wide summary of the spatial-temporal distribution of the area impacted by air pollution as follows. First, a city is divided into a grid of 5 km × 5 km cells. Second, the five types of POIs for the four ADLs are counted in every cell. Then the corresponding regression coefficient is used as a weighting factor to create a summary behaviour impacting index and to map the 3-dimensional distributions for the three different 4-h daily periods. Equation (10) computes the summary behaviour impacting index for every cell as follows:

$$INDEX = \sum_{i=1}^{m} N_i \times A_i \tag{10}$$

where $N_i$ is the number of POIs of the i-th type that represent ADLs in a cell, $m$ is the number of types of POIs (5 in this study), and $A_i$ is the corresponding value of the regression coefficient (Tables A12-A16). The final INDEX values are shown as different bars for the three periods. The index for each period of a day is computed independently, so there are three distributions. We use two colours to distinguish the negative and positive final effects of air pollution: red means negative and green means positive.
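The per-cell index described above, a sum of POI counts weighted by their coefficients, can be sketched as follows. The POI type names, the counts and the coefficient values are hypothetical and are not taken from Tables A12-A16; only the weighting scheme and the red/green convention follow the text.

```python
def behaviour_impact_index(poi_counts, coefficients):
    """Weighted sum over POI types: count in the cell times its coefficient."""
    return sum(poi_counts[kind] * coefficients[kind] for kind in poi_counts)

def index_colour(index):
    """Colour convention from the text: red for negative, green for positive."""
    return "red" if index < 0 else "green"

# Hypothetical POI counts in one 5 km x 5 km cell.
poi_counts = {
    "sightseeing": 2, "restaurant": 10, "residence": 4,
    "bus_stop": 3, "subway_station": 1,
}
# Hypothetical coefficients for one daily period (one per POI type).
coefficients = {
    "sightseeing": -0.5, "restaurant": -0.25, "residence": 0.5,
    "bus_stop": 0.25, "subway_station": 1.0,
}
index = behaviour_impact_index(poi_counts, coefficients)
colour = index_colour(index)
```

Repeating the calculation with the coefficients of each of the three daily periods would give the three bars per cell.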

Output 3: Estimating the Revenue Change
Next, we describe how the influence of air pollution on the revenue of restaurants in Beijing is calculated. We use the regression coefficient to represent the impact of AQI and PM 2.5 on people who eat out in restaurants during the lunch period. The FEM gives a significant (by p-value) estimate of the AQI or PM 2.5 impact, $\alpha_1^r$, which means that when AQI or PM 2.5 changes by one unit quantity, the density of sampled people, reflected by the mobile phone users' density in the restaurant VP, decreases by $\alpha_1^r$ people/km². Because we use mobile phone users as the sampled people in Beijing, there is a sampling rate SR. Thus, if PM 2.5 increases by 1 µg/m³, the density of people in a restaurant VP would decrease by $\alpha_1^r \times SR$ people/km². Then the average area of the sampled restaurants, A, is computed. The per capita consumption (PCC), considering the different cost levels of restaurants, is 19 Chinese Yuan (CNY) [35]. Hence, if the AQI or PM 2.5 changes by one unit quantity, the density of people who go to the restaurant for lunch would change by $\alpha_1^r \times SR$, and the change in average revenue (CAR) of all restaurants in Beijing in one day is computed using Equation (11). To estimate the total change in restaurant revenue when air pollution occurs, the change in air pollution (set to an unknown x unit quantity) is multiplied by CAR.
We next consider whether the other two transport mode groups, cycling and walking, are impacted by air pollution. For each group, two indicators, the corresponding number of people and the movement distance, are calculated hourly during the study period. These constitute six time series. After combining the weather conditions and type of day, we test the autocorrelation of each time series using the Ljung-Box test method [36], drawing the conclusion that all the time series have at least a first-order autocorrelation.
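The autocorrelation check mentioned above can be sketched with the standard Ljung-Box statistic, Q = n(n + 2) Σ ρ_k² / (n − k), summed over lags k = 1..h. This is a from-scratch illustration rather than the STATA routine used in the study; the function names and the short trending series are assumptions. For one lag, Q is compared with the 5% critical value of a chi-squared distribution with one degree of freedom, about 3.841.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

# Hypothetical hourly series with an obvious trend (strongly autocorrelated).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rho1 = autocorrelation(series, 1)
q1 = ljung_box_q(series, 1)
CHI2_95_DF1 = 3.841  # 5% critical value, chi-squared with 1 degree of freedom
first_order_autocorrelated = q1 > CHI2_95_DF1
```

A series flagged in this way violates the independent-errors assumption of plain OLS, which is why an estimator that corrects for serial correlation is needed afterwards.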
Then we input the data into the Transport mode options analysis model as shown above and we use the Prais-Winsten method [37] to estimate the relationship between the activity of the two groups and air quality respectively.
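The first-order autocorrelation check mentioned above can be illustrated with a minimal sketch. It computes the Ljung-Box statistic restricted to lag 1, for which the chi-square p-value (one degree of freedom) has a closed form. The function names and the toy series are illustrative assumptions and are not taken from the study.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_lag1(series):
    """Return (Q, p_value) for the Ljung-Box test using lag 1 only."""
    n = len(series)
    rho1 = autocorrelation(series, 1)
    q = n * (n + 2) * rho1 ** 2 / (n - 1)
    # Chi-square survival function with 1 degree of freedom:
    # P(X > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q / 2))
    return q, p_value


# Hypothetical hourly series with a strong trend (illustrative only).
toy_series = [float(t) for t in range(50)]
q_stat, p_val = ljung_box_lag1(toy_series)
print(q_stat, p_val)
```

A small p-value rejects the hypothesis of no first-order autocorrelation, which is the situation in which an autocorrelation-aware estimator is preferred over ordinary least squares.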
From the CDRs, we can also extract a single user's trajectory based on the person's unique identity, the IMSI. The base station records a user's IMSI with a timestamp, and this record is combined with the location of the base station. We can derive the distance between two or more base stations to represent people's movement and then calculate the speed at which people move in a specific period. We define these two features as people's CDR-based moving distance and CDR-based moving speed. When this speed is within a given range, the mobile phone user is assumed to be using a specific transportation mode. The experiments of Wang et al. [38] confirm that this method generally obtains an accuracy of 80-90% when inferring simple transportation modes, e.g., walking and driving. Furthermore, within the same case region, Beijing, Wang et al. [39] use CDR data to analyse travel distance between traffic zones and conclude that using CDR data for traffic mode analysis is feasible. Bwambale et al. [40] use a logit model to show that CDR data can capture the expected behaviour towards overlapping routes. Together, these studies indicate that CDR-based trajectories have features very similar to ground-truth trajectories for distance and speed.
According to [41], the average bike speed in Beijing is 9.1 km/h, and the average walking speed is 5 km/h [42]. As before, the first 5 min of each hour is sampled, and all unique users are then extracted using the unique IMSI. For each user (sampled person), we sort all the records by time and calculate the distance and speed in each section (a section is defined as one person moving from one base station to the next). If the speed in a section is within 7 to 10.5 km/h, we classify the user as a bike-riding person, add one to the total number of this group, and calculate the total distance over all cycling sections for this person. A section with a speed lower than 7 km/h is classified as walking. Finally, we sum the number of people and the total distance travelled for each group. This yields two time series for each group, one for the number of people and one for the distance travelled.
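The section-by-section classification described above can be sketched as follows. The speed thresholds come from the text; the record layout `(timestamp_seconds, lat, lon)`, the use of the haversine distance between base stations, and the decision to ignore sections with no movement are assumptions made only for illustration.

```python
import math

WALK_MAX_KMH = 7.0    # walking: speed lower than 7 km/h (from the text)
CYCLE_MAX_KMH = 10.5  # cycling: speed within 7 to 10.5 km/h (from the text)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def classify_speed(speed_kmh):
    """Map a section speed to 'walking', 'cycling' or None (other or no movement)."""
    if speed_kmh <= 0:
        return None  # assumption: a section without movement is not counted
    if speed_kmh < WALK_MAX_KMH:
        return "walking"
    if speed_kmh <= CYCLE_MAX_KMH:
        return "cycling"
    return None


def mode_distances(records):
    """Sum the distance per mode over the sections of one user's trajectory.

    `records` is a time-sorted list of (timestamp_seconds, lat, lon) tuples,
    one per base station observation (hypothetical layout).
    """
    totals = {"walking": 0.0, "cycling": 0.0}
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(records, records[1:]):
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            continue  # skip duplicate or out-of-order timestamps
        dist = haversine_km(lat0, lon0, lat1, lon1)
        mode = classify_speed(dist / hours)
        if mode is not None:
            totals[mode] += dist
    return totals
```

Summing `totals` over all users in an hour, and counting the users with a non-zero total per mode, would give the hourly distance and number-of-people indicators for each group.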
The number of people in a group can be calculated directly from the CDR data, while the distance travelled by people is calculated using Equation (12):

$$\mathrm{DISTANCE}_t = \sum_{i=1}^{n} D_{ti}\left(P_{tij}, P_{ti,j+1}\right) \tag{12}$$

where $\mathrm{DISTANCE}_t$ represents the total distance of all people who have moved in hour t; n is the number of people who have moved in hour t; i is the identity of a person; and $D_{ti}(P_{tij}, P_{ti,j+1})$ is the final distance of Person i in hour t, calculated as the accumulated length from point j to point (j + 1).

Processing Module: Multivariate Linear Regression Model
After getting the number and distance for each group of people, we calculate the average AQI, PM 2.5 concentration and weather conditions over the whole of Beijing. Then we use the multivariate linear regression model in Equation (13) to estimate the impact of air quality on the activity of each of these two groups. The dependent variable is the number of people, or the distance moved by people, which we refer to as the ND-features of people. The independent variables include air quality, weather conditions and type of day:

$$\mathrm{ND\_FEATURE}_t = \beta_0 + \beta_1\,\mathrm{POLLUTION}_t + \beta_2\,\mathrm{TEMP}_t + \beta_3\,\mathrm{WIND}_t + \beta_4\,\mathrm{CLOUD}_t + \beta_5\,\mathrm{TD}_t + \varepsilon_t \tag{13}$$

where $\mathrm{ND\_FEATURE}_t$ represents the ND-feature and $\mathrm{POLLUTION}_t$ the air pollution (AQI or PM 2.5) at hour t, while $\beta_2\,\mathrm{TEMP}_t$, $\beta_3\,\mathrm{WIND}_t$, $\beta_4\,\mathrm{CLOUD}_t$ and $\beta_5\,\mathrm{TD}_t$ control for the weather conditions and type-of-day effects; $\beta_0$ and $\varepsilon_t$ denote the usual intercept and error term. $\beta_1$ reflects the relationship between the ND-feature and air pollution. Because all variables are time series data, they have the potential for autocorrelation. Thus, we use the Prais-Winsten method [37] to estimate $\beta_1$, which aims to decrease the influence of temporal autocorrelation.
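To make the estimation step more concrete, the following is a minimal sketch of an iterated Prais-Winsten estimator for a regression with first-order autocorrelated errors. It is a plain standard-library illustration under stated assumptions (normal equations solved by Gaussian elimination, a fixed iteration cap, a clamp on the estimated autocorrelation) and is not the implementation used in the study; all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations; X is a list of rows."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def prais_winsten(X, y, max_iter=50, tol=1e-8):
    """Iterated Prais-Winsten estimate; each row of X starts with 1.0 (intercept).

    Returns (beta, rho), where rho is the estimated AR(1) coefficient.
    """
    n, k = len(y), len(X[0])
    beta, rho = ols(X, y), 0.0
    for _ in range(max_iter):
        resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
        new_rho = (sum(resid[t] * resid[t - 1] for t in range(1, n))
                   / sum(e * e for e in resid[:-1]))
        new_rho = max(-0.999, min(0.999, new_rho))  # keep the transform defined
        w = math.sqrt(1.0 - new_rho ** 2)
        # The first observation is rescaled (the Prais-Winsten step);
        # the remaining observations are quasi-differenced.
        Xs = [[w * v for v in X[0]]]
        ys = [w * y[0]]
        for t in range(1, n):
            Xs.append([X[t][j] - new_rho * X[t - 1][j] for j in range(k)])
            ys.append(y[t] - new_rho * y[t - 1])
        beta = ols(Xs, ys)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return beta, rho
```

With rows of the form `[1.0, POLLUTION, TEMP, WIND, CLOUD, TD]`, the second element of the returned `beta` would play the role of $\beta_1$ in Equation (13).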

Output 4: Average Distance Changing of Walking and Cycling
$\beta_1$ represents the number of feature units by which the ND-feature changes when POLLUTION changes by one unit. For example, if $\beta_1$ is significant at a 95% confidence level (p < 0.05), then when AQI changes by one unit, the number of people who cycle changes by $\beta_1$ units. If both $\beta_1$ values, for the number of people and for the distance of people walking or riding a bike, are statistically significant, the relationship function between POLLUTION and AverageDistance of the specific group can be calculated directly using Equation (14):

$$\mathrm{AverageDistance} = \frac{D - \beta_1^d \cdot \mathrm{POLLUTION}}{N - \beta_1^n \cdot \mathrm{POLLUTION}} - \frac{D}{N} \tag{14}$$

where N is the hour-average number of people in each group and D is the hour-average distance moved by people during the study period. $\beta_1^n$ is the $\beta_1$ obtained when the input feature is the number of people moving in the group, and $\beta_1^d$ is the $\beta_1$ obtained when the feature is the corresponding distance. The function consists of two parts: the first fraction returns the average distance impacted by POLLUTION, and D/N is the original average distance for every person in the group. The difference between these two values reflects how the average distance changes with pollution, where N, D, $\beta_1^n$ and $\beta_1^d$ are all constants.
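As a small worked illustration, the two-part function described above can be written as a single expression. The sketch below follows the sign convention of the expression as printed in the text; the parameter names and the example values are hypothetical and are not results from the study.

```python
def average_distance_change(pollution, n_mean, d_mean, beta_n, beta_d):
    """Change in per-person average distance for a given pollution level.

    n_mean, d_mean: hour-average number of people and distance (N and D).
    beta_n, beta_d: regression coefficients for number and distance.
    Sign convention follows the expression as printed in the text.
    """
    adjusted = (d_mean - beta_d * pollution) / (n_mean - beta_n * pollution)
    original = d_mean / n_mean
    return adjusted - original


# Hypothetical values chosen only to show the arithmetic:
# (200 - 0.5 * 100) / (100 - 0 * 100) - 200 / 100 = 1.5 - 2.0 = -0.5
print(average_distance_change(100.0, 100.0, 200.0, 0.0, 0.5))
```

Because POLLUTION appears in both the numerator and the denominator, the resulting curve is in general nonlinear in the pollution level.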

Spatial-Temporal Dataset Description
For the spatial scale of the CDR-based people density distribution (Figure A5), the people density in urban areas such as Dongcheng and Xicheng Districts is much higher than in suburban areas such as Huairou and Yanqing Districts, which suggests that the density decreases from the city centre to the surrounding areas. At the temporal scale, people's daily activities are reduced early in the morning (e.g., 6:00 a.m., Figure A5a,e), while the density gets higher in some of the same central urban areas in the afternoon (e.g., 6:00 p.m., Figure A5b,f).
In Figure A6, the distribution of the AQI in the study period shows some irregular features. The overall trend of the AQI goes from high to low, to middle, to high, and back to low values (Figure A6a-u), corresponding to the line chart in Figure 1. Within a day, the AQI changes only slightly between the morning, noon and afternoon. However, in a few daily cases, as shown in Figure A6a

The spatial-temporal distributions of PM 2.5 are very similar to those of the AQI, especially for the overall temporal trend during the study period for the whole of Beijing. However, there are some daily differences between the AQI and PM 2.5 distributions. The spatial-temporal changes in PM 2.5 within one day are much more pronounced than those of the AQI. For example, on 14 February 2015, the PM 2.5 concentration is above 300 µg/m³ in southeast Beijing in the morning (Figure A7m), but in the middle of the day (Figure A7n) it starts to spread to other places, resulting in the concentration in southeast Beijing decreasing to about 300 µg/m³ while southwest and northeast Beijing start to suffer more serious air pollution, with a PM 2.5 concentration above 200 µg/m³. In the afternoon, almost all regions of Beijing have a PM 2.5 concentration above 200 µg/m³.

Figure 3 documents the relationship between pollution (AQI and PM 2.5) and people's activities during different daily periods. According to the right part of each subplot, the overall AQI, and more specifically PM 2.5, impacts specific kinds of human activities in the three specific four-hour daytime periods. In the first period (P1, 6-9 AM), air pollution has a positive influence on people staying-in (Figure 3c), which indicates that people are more willing to stay in during the morning, while the pollution conditions seem to have little impact on other kinds of activities (except the dining-out activity).
During the second period (P2), people's activities of staying-in and of using bus stops and subway stations seem to be affected by air pollution, as shown in Figure 3c-e. Those who need to use transport tend to select bus and subway, as these represent relatively closed-off areas that lessen the exposure to outside air pollution [43]. In P2, people tend to spend more time staying-in, at home, compared with P1. This is because period P2 covers lunchtime, while in period P1 people generally work on weekdays. In the third period (P3), air pollution has a negative relationship with people visiting tourist sites: the higher the air pollution, the fewer the people who visit them (Figure 3a). This is not hard to explain because, since 2013, citizens living in China have become more aware of the need to avoid the potential risk of illness when bad air pollution manifests itself as hazy weather (Lu et al., 2018). Air pollution tends to lower the desire of people to go to a restaurant (Figure 3b), as people may choose to cook for themselves, as represented by the increasing staying-in ADL coefficient shown in Figure 3c. People eating out seem to be far less affected by air pollution in P3. This period is after sunset, when people cannot so easily visually appraise haze (in the dark). There are some differences between how the overall AQI and, more specifically, PM 2.5 influence people's activities. For example, the most significant influence is from PM 2.5, especially in the latter part of the day (P2 and P3), while the AQI's impact is less significant and occurs mainly during P1.

The green bars to the left of each subplot represent the percentage of probability values (p-values) that are less than 0.05, which means the corresponding coefficients are significant at a 95% confidence level, among the 10 times the experiments are repeated with different datasets. The red bars to the left of each subplot represent the percentage of p-values that are higher than or equal to 0.05, which means there is no obvious relationship between people's activity and the AQI or PM 2.5 concentration. We determine that PM 2.5, as part of the AQI, influences people's activity only when the percentage of significant coefficients (p-value < 0.05) is more than 60% (as indicated by the length of the green bar in the left graphs). We plot the mean correlation coefficient value and its standard error for every group of experiments to the right of each subplot, in blue (if the value > 0) or purple (if the value < 0) with error bars; if the percentage is no more than 60%, we do not plot anything in the right-side graphs. The results are reported in Tables A7-A11.
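The decision rule described above (an activity counts as influenced only when more than 60% of the repeated experiments give p < 0.05) can be sketched as follows. The function names are hypothetical, and averaging the coefficients over all repetitions rather than only the significant ones is an assumption, since the text does not specify this.

```python
import math
import statistics


def significant_share(p_values, alpha=0.05):
    """Fraction of repeated experiments whose p-value is below alpha."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def is_influenced(p_values, alpha=0.05, min_share=0.6):
    """True only if strictly more than `min_share` of the runs are significant."""
    return significant_share(p_values, alpha) > min_share


def summarise(coefficients, p_values):
    """Return (mean, standard_error) of the coefficients, or None if not influenced.

    Assumption: the mean is taken over all runs, not only the significant ones.
    """
    if not is_influenced(p_values):
        return None
    mean = statistics.mean(coefficients)
    std_err = statistics.stdev(coefficients) / math.sqrt(len(coefficients))
    return mean, std_err
```

For ten repetitions, seven significant runs pass the rule, whereas exactly six do not, because the threshold is "more than 60%".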

Output 2: Spatial-Temporal Behaviour Impacting Indices of Air Pollution on ADLs
Figure 4a-c illustrates the spatial distribution of the final summary index, which reflects how ADLs are affected by air pollution. In the morning period, fewer people are impacted by air pollution because on most days they go to work as usual, while the green pattern means that the impact is mainly positive because staying-in is the main part of the index. In the middle of the day, the area impacted by air pollution starts to cover the suburbs of Beijing, as shown by the greener parts in Figure 4b compared with the morning period. Because people's activities may be affected both negatively and positively, the distribution patterns appear more complex: there can be both red and green parts at the same time. In the afternoon period, the main impacted activity is eating out, so all of the affected areas have a negative relationship with air pollution. A city centre tends to have more accessible, well-known and frequently visited tourist and entertainment sites; hence, the index is much higher there than in regions away from the city centre. The map of the summary behaviour impacting indices indicates that the impact of air pollution on ADLs has not only a spatial but also a temporal disparity. We define an area with no data as an empty area. Temporally, these impacts appear in three different patterns in one day: full positive (e.g., P1), mixed positive and negative (e.g., P2) and full negative (e.g., P3). Similar patterns are seen at the spatial scale. For example, in the middle of Beijing, the impact appears to be positive in P1, then negative in P2, and still negative in P3; thus, this pattern can be classified as a positive-negative-negative (PNN) group. In some suburbs in north Beijing (e.g., the northernmost Huairou district), the patterns include empty-positive-negative (EPN) and empty-positive-empty (EPE).
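The three-letter pattern labels used above (PNN, EPN, EPE) can be derived mechanically from the sign of the index in each period. The sketch below assumes that a missing value is passed as `None` and, as a further assumption, treats an index of exactly zero as empty; the function names are illustrative.

```python
def period_label(index_value):
    """Map one period's summary index to 'P', 'N' or 'E' (empty)."""
    if index_value is None or index_value == 0:
        return "E"  # assumption: no data and zero impact are both treated as empty
    return "P" if index_value > 0 else "N"


def daily_pattern(p1, p2, p3):
    """Concatenate the labels of the three daytime periods, e.g. 'PNN'."""
    return "".join(period_label(v) for v in (p1, p2, p3))


# Hypothetical index values for one grid cell in P1, P2 and P3.
print(daily_pattern(0.4, -0.2, -0.1))
```

Grouping grid cells by this label gives the spatial classes discussed in the text.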

Output 3: Restaurant Business Loss Estimation Due to Air Pollution
The average correlation coefficient value of the random experiments' results is −0.236 (p < 0.001) for the PM 2.5 impact, which means that when PM 2.5 increases by 1 µg/m³, the density of people reflected by mobile phone users' density in the restaurant VP decreases by 0.236 people/km². Thus, it is estimated that air pollution tends to cause a revenue loss for restaurants. Because we sample the mobile phone users in the first 5 min of every hour, and the average sample size is 1.1 million each time during the daytime, we use a scaling factor to project this to the whole population of Beijing. In 2015, Beijing had 21.7 million people, so the scaling factor is roughly 20. Thus, if PM 2.5 increases by 1 µg/m³, the density of actual people in the restaurant VP would decrease by 0.236 × 20 = 4.72 people/km².
In a similar study, Zheng et al. [25] focused on how PM 2.5 can affect people's eating out in Beijing. They conclude that when the concentration of PM 2.5 increases by one standard deviation, the number of people eating out decreases by 1.05%. In our case, if PM 2.5 increases by one standard deviation (92.99 µg/m³), the density of actual people in the restaurant VP would decrease by 4.72 × 92.99 ≈ 438.9 people/km², equal to a 10% decrease in people eating out for lunch. This number is much higher than that of Zheng et al., which may be because they combine eating out for breakfast, lunch and dinner, while our study only considers lunchtime and dinner. Further, in another similar study, Gao et al. [24] conclude that for every 1% increase in the concentration of PM 2.5, the dining-out frequency of urban residents around Beijing in 2016 reduces by 0.059%. In our case, if PM 2.5 increases by 1% (0.97 µg/m³), the density of actual people in the restaurant VP would decrease by 4.72 × 0.97 ≈ 4.59 people/km², equal to a 0.44% decrease in people eating out for lunch. The qualitative results of the two studies are consistent with ours.
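The scaling arithmetic used above can be retraced in a few lines. All numbers are taken from the text; the variable and function names are illustrative.

```python
POPULATION_MILLIONS = 21.7  # Beijing population in 2015 (from the text)
SAMPLE_MILLIONS = 1.1       # average number of sampled users (from the text)
COEF_SAMPLED = 0.236        # decrease in sampled density per 1 µg/m³ of PM 2.5
PM25_STD = 92.99            # one standard deviation of PM 2.5 in µg/m³

exact_factor = POPULATION_MILLIONS / SAMPLE_MILLIONS  # about 19.7
scaling_factor = 20                                   # rounded value used in the text


def density_decrease(pm25_increase, coef=COEF_SAMPLED, factor=scaling_factor):
    """Decrease in actual people density (people/km²) for a PM 2.5 increase."""
    return coef * factor * pm25_increase


print(round(exact_factor, 1))                # 19.7
print(round(density_decrease(1.0), 2))       # 4.72
print(round(density_decrease(PM25_STD), 1))  # 438.9
```

Converting these density changes into percentages additionally requires the baseline density of people in the restaurant VP, which is not restated in this section.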
Further, according to Equation (12), because the average area of a sampled restaurant A is 225 m², the CAR (using Equation (12)

Table 6 summarises the results. More details are given in Tables A17-A19. Both the number of people and the distance of movement are negatively impacted by the AQI for the groups of people walking and riding (normal, manual) bikes. For the walking group, the correlation coefficient between the number of people and the AQI is −7.466 (p = 0.031), while the correlation coefficient between the distance of movement and the AQI is −1.201 (p = 0.032). Our research demonstrates that air pollution has a negative impact on specific transportation modes, which suggests that citizens are already aware of the need to avoid air pollution. However, in some specific cases, people may not be able to avoid bad air pollution. Furthermore, as the number of bike-sharing schemes increases in many cities in China, this provides greater convenience for ad hoc cyclists but may also incur a financial expense. If bad air pollution arises, these schemes may inadvertently become under-utilised.

Output 4: Changes in the Average Distance Travelled by People Walking and Cycling
Equations (15) and (16) reflect the changes in the average distance for each group with respect to changes in the AQI. It is interesting to note that the relationship between these two variables is nonlinear and monotonically decreasing. This means that as the AQI increases, the average distance of people walking and cycling decreases. For cycling, the hourly-average number of people in the group is 8725, while the hourly-average distance moved by people is 19,276 km. The correlation coefficient result is −7.27; the relationship between the average distance for the cycling group and the AQI is given by Equation (15). For the walking group, N is 782 and D is 4485, while the correlation coefficient result is −1.201; the relationship between the average distance for the walking group and the AQI is given by Equation (16). The curves of the two equations are shown in Figure 5 and can be used to compute how the average distance moved by people cycling or walking decreases when air pollution worsens. For example, when the AQI increases by 200, the average cycling distance decreases by about 0.096 km, while the average walking distance decreases by 0.213 km. Hu et al. [21] concluded that when the AQI changes from excellent to severely polluted, the average distance of people cycling decreases by about 0.26 km per person, while for people walking it decreases by about 0.8 km per person. In our case, when the AQI changes from excellent to severely polluted (an AQI increase of 300), the average cycling distance decreases by about 0.14 km, while the average walking distance decreases by 0.32 km. Although the study of Hu et al. was also conducted in 2015, its data were collected from 1243 mobile application users all over China, which could explain why their results differ somewhat from ours.

Conclusions
In this study, we first define the Internet of Behaviours (IoB); then we apply an IoB framework to explore quantitatively whether, and how, changes in air pollution affect people's specific activities of daily living. In the IoB framework, the qualitative and quantitative impacts of air pollution on the four ADLs could give viable advice to authorities and businesses to manage their service resources more appropriately. First, our case study provides a good application for IoB, which aims to link and analyse multiple human behaviours en masse and to output the results as possible feedback to the users themselves. Second, we create a methodology that can contribute to the further development of IoB systems, frameworks, or other related components such as algorithms, communication protocols, and more diverse types of sensors for detecting human physical behaviour, such as millimetre wave radar, ultra-wideband (UWB) and lidar.
The methodology of the IoB system presented in our study could, in theory, be applied to other cities under specific conditions. These conditions mainly relate to the datasets and are summarised as follows. Because this study focuses on people's activities of daily living (ADLs), a dataset from which people's density distribution can be estimated needs to be acquired from service providers such as telecom companies or Internet-wide service providers such as social media companies. More specifically, a city fully covered by telecom base stations could generate the CDR data used in this study to compute ADLs en masse. Such data needs to be shared by a service provider, but providers often regard it as a commercial product even for special cases such as scientific use, which makes it costly; sometimes only historical rather than current data is shared. If such CDR data cannot be accessed, other geographic data with similar features (spatial and temporal resolutions, etc.) could be used instead, e.g., Tencent position data (https://heat.qq.com/, accessed on 1 March 2021) or Baidu heatmap (https://mtj.baidu.com/, accessed on 1 March 2021). Besides the CDR data, other datasets, including air pollution and weather condition datasets, also need to be obtained and fused, which is complex because different datasets may have different data structures, metadata, linked data and semantics. In addition, these datasets should have two dimensions (individual and time) with high spatial and temporal resolutions so that they can be used as panel data to which FEMs can be applied.

An IoB framework can serve different groups of people based on their roles in society, such as citizens, governments and businesses. Hence, we propose some practical recommendations here. First, when facing the threat of bad air pollution, citizens should improve their awareness of this potentially great harm and take protective measures.
At the same time, as citizens, we can each increase our awareness to protect the environment, or we may face more and more environment-related threats in the future.
In terms of city authorities, besides controlling air pollution from sources such as industrial emissions, these could elect to take appropriate mitigation measures, i.e., planting more leaf or broad-leaf tree species which have been proven to have a high dust-retention capability in regions where particulate matter threatens people welfare more according to behaviour impacting indices. For example, in suburban areas with a limited green space, especially close to the bus stops or subway stations, planting high percentages of Pinus tabulaeformis and Platycladus orientalis type trees can help to clean the air.
Further, transport companies could arrange different fees for travelling at different times, such as, in peak hours, public transport ticket prices could be decreased to encourage more citizens to take public transit. Businesses could use air pollution forecasts and IoB models to conduct expedited business operations to reduce losses or gain greater profits. For example, restaurant managers could consider business solutions, such as proposing special offers at lunchtime to attract people, through calculating the costs and benefits because air pollution would decrease the number of people who want to go out for lunch. But at the same time, restaurant managers should fulfil their social responsibility of protecting citizens' health by reminding potential customers to implement necessary measures, such as wearing a mask on the way to the restaurant. Further, because an increasing (worsening) AQI would decrease the number of people who want to cycle, as well as the average distance they ride, bike-sharing companies could adjust the charging strategy appropriately, such as reducing the cost per hour, to attract more users to ride, to reduce their potential loss in income. But in terms of their social responsibility, they could also increase the cost of riding per hour, to encourage citizens to use more public transport, to reduce their duration and exposure to air pollution outdoors.
Despite our achievements, our work still has some limitations. First, the study period and case region could be extended to detect spatial-temporal disparities. However, it is very difficult to gain access to CDR data from service providers for longer periods. Using this methodology in other applications or studies requires a large amount of data, which may be heterogeneous in character and may lack accessibility. Second, the transport modes did not include private cars or taxis because classifying these is difficult based upon our dataset. Third, deep machine learning could be applied and compared with the statistical models in our study to check the robustness of our results. Fourth, no quantitative comparison can as yet be performed with the work of others because, to the best of our knowledge, no one else has so far studied the effect of air pollution changes on a wider range of ADLs such as sightseeing, staying-in and travelling by bus or subway. In the future, the methodology of the IoB system could be applied in other cities to test its robustness and to address some of the limitations above.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A.1 Additional Materials for CDR Dataset Description
A CDR contains: The International Mobile Subscriber Identification Number (IMSI) that is a unique code for every subscriber identity module (SIM) card to identify users on the network; a timestamp that records when interactive communication events happen; Cell Identity (CI) corresponding to the base station location (Table A1). CDRs are generated every 1 s (its temporal resolution of recording) and stored in a Comma-Separated Values (CSV) file. We used the locations of all the 51,216 mobile base stations shown in the Global System for Mobile Communication (GSM) engineering parameters inner structure (Table A2). However, because some base stations are so close that the identified latitude and longitude are practically the same, we combined "collocated" base stations such that their number reduces by about 2/3 from 51,216 to 17,445. The coverage area of each mobile base station can be approximated as a Voronoi polygon (VP) that is built around it (Thiessen, 1911).
When a phone is used to make a call or send a text message, its location is determined from the specific mobile base station it is in range of.

Lat, Lon: Latitude and longitude of the base station location.

Figure A1 gives more details on the hourly number of records, from which we observe that on most days the number of records increases suddenly from 6 a.m. and becomes minimal from 11 p.m., because people call or text much less at night when most people are asleep. During the daytime, however, the number of records better reflects the level of people's activity.

Figure A1. Hourly distribution of CDR number over the whole study period.

Appendix A.2 Error Analysis When Estimating Mobile Phone Users' Density Distributions
In many cities such as Beijing, mobile phone use is higher in urban areas than in suburban areas, and the density of base stations is correspondingly higher; in a suburban area, a single station covers a larger, roughly circular area, whose size depends on demand and terrain. Figure A2 shows the distribution of the areas covered by the base stations. More than 50% of base stations cover an area of less than 0.25 km², corresponding to a 500 m positioning resolution if we abstract the coverage area of a base station as a square. In Beijing, almost 80% of people live in the urban area, which is covered by a dense network of base stations, so we can regard the CDR-based positioning accuracy as 500 m.
Then we are concerned if the resolution of the CDR-based positioning density distribution is accurate enough to let the extracted value of a vector point accurately represent people's density or other feature values for the POI.
A POI is a random sample point in a region because a vector point cannot represent an area, for example, a restaurant POI does not mean this restaurant has only one sampled point. In terms of the sightseeing ADL, these areas consist of sites such as parks that are used for touring and for leisure and are larger than 1 Km 2 . Community regions have a similarly large average area as well. In terms of the transport mode options, except for people walking and bike-riding, we focus on people who intend to take a bus and the subway. Bus stops and subway stations are the POIs that we use to represent the people's density values to investigate any potential changes in these concerning air pollution. According to the statistics from a route planning website (https://lbs.amap.com/getting-started/path, accessed on 1 March 2021), the average distance between bus stops in Beijing is over 1 Km, which is twice as long as the KDE density distribution with a 500 m spatial resolution in the main urban area. Note, the average distance between two adjacent subway stations in Beijing is about 1.5 Km [44], which is much larger than 500 m. Furthermore, the transport bus and subway POIs sites tend to be set beside main roads in Beijing. There are fewer other types of POI close by, such as restaurants or apartments, so we can use these POIs to collect information on how people move by bus and subway. Thus, these POIs can represent the situation for these two kinds of activity. However, in terms of the eating out option, some restaurants may be part of a big mall, mixed in amongst other kinds of shops such as clothing stores representing other human activities. Using only a single POI (only eating out) to represent here may cause a bigger error. To solve this problem, we present an additional strategy to decrease the error and test for this in the model and define it as Strategy 2 (S2) while the previous one is defined as S1. The details of S2 are introduced in Appendix B.

Appendix A.3 Error Control Solution When Estimating Mobile Phone Users' Density Distributions
To decrease this unknown impact, we present a standard method for spatial resolution as follows. First, in each VP we randomly generate a number of points equal to the actual number of records of sampled mobile phone users in that VP, where every point represents one mobile phone user. Then, we use the kernel density estimation (KDE) method to estimate hourly density distributions for the whole of Beijing, with the raster resolution parameter set to 500 m. The final distribution raster maps form datasets for continuous hours with a geospatial resolution of 500 m × 500 m. Although the errors could still be spatially uneven, this method can reduce the error when extracting values (e.g., mobile user density, PM 2.5 concentration, etc.) for a POI, especially when the POI is within a bigger VP. For example, Figure A3a shows the density distribution using KDE, while Figure A3b shows a simple symbolisation method that displays the density for every VP. We divide the density into five levels in this case; as the level index increases, the density increases. POIs A and B are located in the same VP, on different sides of it. In the original density distribution (b), A and B have the same density value; however, because of spatial autocorrelation, A should have a value closer to level 4 or level 5. Hence, after the KDE stage, point A gets a value at density level 3, which is more accurate.

Throughout this paper, we study the role of air pollution on people's activities. We recognise that there are several criteria pollutants. We have emphasised the central role of PM 2.5 both because we observe this variable's value by POI and hour and because several independent research studies have documented its role in raising mortality and morbidity risk, e.g., [45,46]. Our focus is on the Air Quality Index (AQI) and the concentrations of key air pollutants (PM 2.5, PM 10, SO 2, O 3, NO 2, CO) and their correlations for Beijing; see Table A3.
Scientific studies show that high concentrations of particulate matter have caused severe air pollution problems in some Chinese cities in recent years [47,48]. gov.cn/hjbhbz/bzwb/dqhjbh/dqhjzlbz/201203/W020120410330232398521.pdf, accessed on 1 March 2021), which is also reflected in the study period for 1.8% of the hours for O 3 and 15.5% for NO 2. According to MEP's AQI data, PM 2.5 was the primary pollutant for 64.3% of the hours and PM 10 was the major polluting factor for 18.7% of the hours in our study period.
People can visually perceive visible particulate matter in the air, thus they perceive PM 2.5 and PM 10 with their eyes. SO 2 is an odorous gas that is emitted with industrial smoke and other coloured sulphides; people can see and smell it at high concentrations. However, during this study period, its concentration was low enough not to be perceivable. As ground-level O 3 and CO are both invisible and odourless; people tend to be less likely to perceive their effects. NO 2 was always at a low concentration level during the study. However, it reacts with some organic compounds in the air to increase other pollutants such as O 3 and PM 2.5 , which means it may be more indirectly, rather than be directly perceived (see Table A3). Also note the individual elements of AQI seem to be highly correlated with PM 2.5 , except for SO 2 and CO, which are consistently low. PM 2.5 is highly correlated with PM 10 (correlation coefficient = 0.703, p < 0.001), AQI (correlation coefficient = 0.625, p < 0.001) and NO 2 (correlation coefficient = 0.723, p < 0.001). In contrast, O 3 is negatively correlated with PM 2.5 . Thus, PM 2.5 is the primary pollutant for the majority of days in Beijing during our study period. In terms of particulate matter (PM), PM 10 and PM 2.5 are collected by a continuous monitoring system that consists of a sample acquisition unit, sample measurement unit, data acquisition and transmission unit and other auxiliary equipment. The measuring methods of the monitoring instruments configured in the system are the β-Ray absorption method and tapered element oscillating microbalance (TEOM) method, which are performed in a PM 2.5 sampler or PM 10 sampler. 
The principles and operation details of the two methods are specified in the related standards, which can be accessed from the National public service platform for standards information (China) (http://std.samr.gov.cn/, accessed on 1 March 2021) (Note all the specifications or standards in this paper refer to this platform).
Range and resolution are also key parameters when collecting data using sensors; they are described in Table A4. Furthermore, the accuracy and repeatability of the data collection, reflected by the parallelism of monitors (PoM), the effective data rate (EDR) and the comparison test of reference method (CTRM), are reported in Table A4. The definitions of these indicators are as follows.
PoM: The root mean square of the results of each data batch. In the same test environment, the inlets of the three monitors are adjusted to the same height, with a distance of 2-4 m between the monitors. After the sampling flow has been calibrated and set, the instrument parallelism test is carried out.
EDR: After debugging, the monitors run continuously for at least 90 days to test the effective data rate. During this period, the maintenance times and details are recorded, and the daily average values of the three monitors under test are analysed.
CTRM: At least three samplers are used for the reference method while an automatic monitor under test operates simultaneously. The automatic monitoring data C and the reference method test data r in the same sampling period are taken as a data pair, and a total of 10 groups of samples are tested. The reference test data and the corresponding automatic monitoring data are then analysed using linear regression, and the slope k, intercept B and correlation coefficient r of the regression line are examined.
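The regression step of the CTRM can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function name `ctrm_regression` is hypothetical, the reference data are assumed to be the explanatory variable and the automatic monitoring data the response, and the ten data pairs are made-up values. The acceptance limits for the slope, intercept and correlation coefficient are defined in the specifications and are not reproduced here.

```python
import numpy as np


def ctrm_regression(reference, automatic):
    """Fit automatic = k * reference + b by least squares.

    Returns the slope k, the intercept b and the Pearson correlation
    coefficient between the two series.
    Assumption: the reference method data are used as the x variable.
    """
    x = np.asarray(reference, dtype=float)
    y = np.asarray(automatic, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    k, b = np.polyfit(x, y, 1)       # degree-1 fit returns [slope, intercept]
    corr = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
    return k, b, corr


# Hypothetical example: 10 data pairs (values are illustrative only).
reference_data = [35, 48, 60, 72, 85, 97, 110, 126, 140, 155]
automatic_data = [37, 47, 63, 70, 88, 95, 113, 124, 143, 152]
k, b, corr = ctrm_regression(reference_data, automatic_data)
print(f"slope k = {k:.3f}, intercept = {b:.3f}, correlation = {corr:.3f}")
```

A slope close to 1, an intercept close to 0 and a correlation coefficient close to 1 would indicate that the automatic monitor tracks the reference method closely.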
Here we list all the referred specifications or standards in
In terms of the other four pollutants, the monitoring system consists of a sampling device, calibration equipment, an analytical instrument, and data acquisition and transmission equipment. The system collects the pollutant data using a point analyzer, i.e., a monitoring and analysis instrument that samples ambient air at a fixed point and measures the concentration of an air pollutant.
The measurement parameters, such as the measurement range and the sensor resolution, and the sensors used to measure each pollutant are shown in Table A5. The indication error represents the accuracy of the collected data and is determined as follows. After the monitoring system runs stably, a zero-point calibration and a full-scale calibration are carried out in turn. A standard gas with a concentration of about 50% of the range is then introduced, and the displayed value is recorded once the reading is stable. A zero calibration gas is then injected. The test is repeated three times, and the indication error of the analytical instrument is calculated according to the formulae given in the specifications.
The reference standards used are given below:
Weather data are collected from the National Oceanic and Atmospheric Administration (NOAA), specifically from weather stations included in the National Climatic Data Centre (NCDC) of NOAA (https://www.ncdc.noaa.gov/, accessed on 1 March 2021). The temporal resolution of the weather data is hourly, but spatially they come from only one station because we assume that weather conditions do not vary substantially across Beijing at a given time. Most of our POIs are located in the core area of Beijing, where temperature and other weather conditions vary little, so this has little influence on the regression results. It is worth mentioning that in our 21-day study period, the rain and snow values are all zero. Hence, in this study, the weather factors comprise only temperature, wind speed and sky/cloud cover.
Weather sensing is governed by international standards. The key parameters of the meteorological sensors used to collect the corresponding data, including the range, resolution and accuracy, are shown in Table A6. Measurements made using scientific instruments are repeatable, as evidenced by the observation that readings do not change when weather patterns are stable.
Standard specifications can be downloaded from national public bodies according to a unique identifier as follows:
Note that cloud coverage in the sky is often determined from visual measurements and image analysis and may even be determined manually.
(1) POI
To study how SPSAs might be impacted by air pollution, we analysed this relationship for four situations: whether people go out sightseeing, whether they eat out, which transport mode they choose and whether they stay in. The points of interest (POIs) for each situation are different datasets obtained from the AutoNavi Software Limited Company (https://mobile.amap.com/, accessed on 1 March 2021). For sightseeing, we consider whether or not a place is free (of charge) to visit, as this could influence whether people visit it. We extract 200 sightseeing POIs that are free, such as Chaoyang Park, Nanluoguxiang and Tiananmen Square, and 200 POIs where citizens need to pay to visit, such as Lama Temple, the Summer Palace and Yuyuantan Park. For the eating out option, 200 restaurant POIs in Beijing are extracted. We also extract 200 house or community POIs to study the impact of air pollution on staying in. To reflect people's transport mode options, we select bus stops and subway stations as POIs, since buses and the subway are commonly used transport modes and these POIs are fixed waypoints during citizens' use of transport. For these two kinds of transport mode POIs, we extract 100 of each, giving a total of 200 POIs. It is important to note that each sightseeing POI can represent an area of at least 1 km 2 around the point, which is also called a buffer zone. This decreases the error relating to the limit of CDR-based positioning accuracy, which is analysed in more detail in Appendix A.2. Note also that all POIs are extracted spatially at random, which means that each type of POI has a very dispersed spatial distribution.
(2) Building distribution
Building spatial distribution data are represented as ESRI polygon data in shapefile format (https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf, accessed on 1 March 2021). These data cover the main urban area, defined as the area within the sixth ring road in Beijing. The primary use of this dataset is to extract the building area within a single base station VP (S2, Appendix B), and then to calculate the proportion of restaurant area within this building area (Section 4.2.4).

Appendix B An Additional Strategy (S2) for the Eating out ADL
Although in some areas, such as large shopping malls, restaurants may be scattered amongst other kinds of shops, we can select those specific base stations where the nearby POIs mainly consist of restaurants. We define a restaurant community (RC) area using the following rule. If a base station VP has N restaurants and the building area in this region is B km 2, the base station VP is an RC when N × A > B × 50%, where A is the minimum restaurant area of 80 m 2 according to the Beijing catering enterprise operating area access standards of 2007. It is worth mentioning that we assume a restaurant with an 80 m 2 area is too small to have more than one floor. Based upon this rule and our building distribution and POI datasets, we sample 71 RCs as an additional area to study. For example, Figure A4 shows the Sanlitun area, one of the most famous business districts in Beijing; only the VPs that meet the RC condition are sampled, as shown by the red outlines. We then extract the average density values, air quality data and weather condition data over each polygon and, finally, use the same FEM to analyse the relationship between air quality and people's eating out option.
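The RC rule can be expressed as a short check. The following Python sketch is illustrative only: the function name `is_restaurant_community` and the example numbers are hypothetical, and it assumes that the building area B (given in km 2) must be converted to m 2 before it is compared with N × A, since A is given in m 2.

```python
MIN_RESTAURANT_AREA_M2 = 80.0   # A: minimum restaurant area stated in the text
M2_PER_KM2 = 1_000_000.0        # unit conversion (assumption: B is given in km^2)


def is_restaurant_community(n_restaurants, building_area_km2,
                            min_area_m2=MIN_RESTAURANT_AREA_M2, share=0.5):
    """Return True if N * A > B * share, with both sides in square metres."""
    building_area_m2 = building_area_km2 * M2_PER_KM2
    return n_restaurants * min_area_m2 > building_area_m2 * share


# Hypothetical base station VPs: (number of restaurants, building area in km^2).
examples = {"vp_a": (40, 0.005), "vp_b": (10, 0.005)}
for name, (n, b) in examples.items():
    print(name, is_restaurant_community(n, b))
```

In the first hypothetical VP, 40 restaurants account for at least 3200 m 2 out of 5000 m 2 of building area, which exceeds the 50% threshold; in the second, 10 restaurants account for at least 800 m 2, which does not.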

Appendix C.1 Spatial-Temporal Distributions for Key Datasets
This appendix describes the spatial-temporal characteristics of the CDR-based people density distribution and of the IDW results for AQI and PM 2.5 in the study period and region. Because every daytime hour during the 21 days has one distribution per dataset, there are more than 2500 distribution maps. Thus, we only plot some sampled distributions for the three key datasets (CDR-based people density, AQI and PM 2.5) and describe their spatial and temporal features here.

Appendix C.2 The Results of FEM Regressions
We report the FEM results for different POIs in different periods of the day in Tables A7-A16 (using Equation (8) in the main manuscript). Tables A7-A11 show the results of the 10 random experiments conducted (see Section 4.2.2 in the main text) for the sightseeing, eating out/restaurant, staying-in, bus stop and subway station POIs. Columns (i), (iii) and (v) in Tables A7, A9, A10 and A11 show the impact of the AQI on people's activities in the three periods that have been defined in the main text. Columns (ii), (iv) and (vi) illustrate the impact of the PM 2.5 concentration on people's activities. Table A8 shows this for the eating out POI. It has a similar format to the tables for the other four POIs but has only four columns because it only includes the two time periods mostly used for eating. In Table A8, the correlation coefficients between AQI and mobile phone users' density (MPUD) in the lunch period (11 AM to 2 PM) and the dinner period (5 PM to 8 PM) are reported in columns (i) and (iii), while the MPUD coefficients with PM 2.5 are reported in columns (ii) and (iv). Tables A12-A16 show the results of the FEMs that use all POIs, where the correlation coefficients are utilized to compute the spatial behaviour impacting indices in Section 4.2.3 of the main text. Tables A12-A16 have the same structure as Tables A7-A11.
Note: The dependent variable is the mobile phone users' density (MPUD) on a POI in a period. Robust standard errors are clustered by POI and reported in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001.
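As a rough illustration of the type of estimator behind these tables, and not of Equation (8) itself, the following Python sketch fits a one-way fixed effect model with the within transformation (demeaning every variable by POI) and computes standard errors clustered by POI. The function name `within_fe_ols`, the use of a single set of POI fixed effects, the omission of any small-sample correction and the toy data are all assumptions made for this example.

```python
import numpy as np


def within_fe_ols(y, X, groups):
    """One-way fixed-effects (within) estimator with cluster-robust SEs.

    y: outcome (e.g. MPUD), X: regressors (e.g. AQI and weather controls),
    groups: identifier of the unit (e.g. POI) for each observation.
    No small-sample correction is applied (assumption of this sketch).
    """
    y = np.asarray(y, dtype=float).ravel()
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    labels, idx = np.unique(np.asarray(groups).ravel(), return_inverse=True)
    counts = np.bincount(idx)

    # Within transformation: subtract the group mean from every variable.
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    X_dm = np.empty_like(X)
    for j in range(X.shape[1]):
        col_means = np.bincount(idx, weights=X[:, j]) / counts
        X_dm[:, j] = X[:, j] - col_means[idx]

    xtx_inv = np.linalg.inv(X_dm.T @ X_dm)
    beta = xtx_inv @ (X_dm.T @ y_dm)
    resid = y_dm - X_dm @ beta

    # Sandwich estimator: sum the outer products of the per-cluster scores.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in range(labels.size):
        mask = idx == g
        score = X_dm[mask].T @ resid[mask]
        meat += np.outer(score, score)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


# Toy panel: 20 hypothetical POIs observed for 30 hours each.
rng = np.random.default_rng(0)
poi = np.repeat(np.arange(20), 30)
aqi = rng.uniform(20, 300, size=poi.size)
poi_effect = rng.normal(0, 5, size=20)[poi]          # unobserved POI effect
density = poi_effect - 0.02 * aqi + rng.normal(0, 1, size=poi.size)
beta, se = within_fe_ols(density, aqi, poi)
print("coefficient on AQI:", beta[0], "clustered SE:", se[0])
```

Demeaning by POI removes any time-invariant characteristic of a POI, such as its size or location, so the coefficient is identified only from variation over time within each POI.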

Appendix C.3 The Results of the Multivariate Linear Regression
We input the time series into the regression model given in Equation (14) in the main manuscript. Tables A17 and A18 report the results for the cycling group and the walking group. In each table, columns (i) and (ii) show the estimated impact of AQI and PM 2.5 on the number of people in the group, while the other two columns show the impact on distance. Because the input data are time series, they might be autocorrelated. Hence, we use the Prais-Winsten (PW) method to reduce the negative impact of this. Table A19 compares the Durbin-Watson test values before and after applying the PW method, which shows the benefit of the PW method in reducing the negative impact of autocorrelation.
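The idea of comparing the Durbin-Watson statistic before and after a Prais-Winsten correction can be sketched as follows. This is a minimal illustration under stated assumptions: the errors are assumed to follow an AR(1) process, the function names (`ols`, `durbin_watson`, `prais_winsten`) are hypothetical, and the regressors of Equation (14) are not reproduced, so `X` is simply any design matrix that includes a constant column.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest no AR(1) autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def prais_winsten(y, X, max_iter=50, tol=1e-8):
    """Iterated Prais-Winsten estimation, assuming AR(1) errors.

    Returns the coefficients, the estimated rho and the residuals of the
    transformed regression (the ones to pass to durbin_watson).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta = ols(y, X)
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ beta
        # Sample lag-1 autocorrelation of the residuals, kept inside (-1, 1).
        rho_new = float(np.clip((e[1:] @ e[:-1]) / (e @ e), -0.999, 0.999))
        w = np.sqrt(1.0 - rho_new ** 2)
        # Quasi-difference all rows; rescale the first row instead of dropping it.
        y_t = np.concatenate(([w * y[0]], y[1:] - rho_new * y[:-1]))
        X_t = np.vstack((w * X[0], X[1:] - rho_new * X[:-1]))
        beta = ols(y_t, X_t)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho, y_t - X_t @ beta
```

With these helpers, the statistic before the correction is `durbin_watson(y - X @ ols(y, X))` and the statistic after it is `durbin_watson(prais_winsten(y, X)[2])`; a value that moves towards 2 indicates that the first-order autocorrelation has been reduced.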