Humans-as-a-sensor for buildings: Intensive longitudinal indoor comfort models

Evaluating and optimising human comfort within the built environment is challenging due to the large number of physiological, psychological and environmental variables that affect occupant comfort preference. Humans are often better than sensors at capturing all of these disparate phenomena and interpreting their impact; the challenge is collecting spatially and temporally diverse subjective feedback in a scalable way. This paper presents a methodology to collect intensive longitudinal subjective feedback of comfort-based preference using micro ecological momentary assessments on a smartwatch platform. An experiment with 30 occupants over two weeks produced 4,378 field-based surveys for thermal, light, and noise preference. The occupants and the spaces in which they left feedback were then clustered according to these preference tendencies. These groups were used to create different feature sets with combinations of environmental and physiological variables, for use in a multi-class classification task. These classification models were trained on a feature set that was developed from time-series attributes, environmental and near-body sensors, heart rate, and the historical preferences of both the individual and the comfort group assigned. The most accurate model did not use environmental sensor data and yet had multi-class classification F1 micro scores of 64%, 80% and 86% for thermal, light, and noise preference, respectively. The discussion outlines how these models provide comfort preference prediction as good as or better than installed sensors, even in situations when some occupants are not willing or able to wear smartwatches. The approach presented prompts reflection on how the building analysis community evaluates, controls, and designs indoor environments.


Introduction
Many office workers are familiar with the battle of the thermostat, or that co-worker who talks loudly on the phone. Many researchers in indoor comfort are also aware of the high rates of discomfort amongst office workers [1,2]. Vast global efforts have been undertaken to evaluate this discomfort, and with that knowledge, build models that can be used for the design and control of buildings. In the realm of thermal comfort, for example, two dominant models are in use. The first is the Predicted Mean Vote (PMV) that models comfort based on heat transfer characteristics between the human and their surrounding environment [3]. The other, more modern version, is the Adaptive Comfort model that includes the human adaptability to climate, drawing a linear relationship between the indoor and outdoor environments [4].
The underlying issue with modelling human comfort is the sheer number of variables present and the difficulty in accurately measuring them. Figure 1 highlights this issue by detailing a list of studied physiological, psychological, and environmental variables that influence thermal, visual, and aural comfort. While the empirical models in the academic literature are capable of incorporating a handful of these variables, the exclusion of the rest can cause significant errors. One reason is that the interrelationship between different indoor environmental parameters is not well-known [5]. It was shown in a recent study that the lowest indoor environmental satisfaction factor drives the overall satisfaction [6]. For example, while one can measure the temperature and humidity of a room, the type of meal a person ate, and even the spices present in the meal, can put the human body in a different state of thermal perception [7,8]. Furthermore, most of the studies that depend on measuring environmental variables using mobile carts with mounted sensors [9,10,11] or low-cost continuous sensing sensors [12] face problems related to the accuracy and calibration [13]. While being comprehensive in capturing most comfort-related factors, Figure 1 excludes literature about physical and mental ailments, which further adds variance to the models.

Can humans be a cost-effective sensor?
The human nervous system is essentially designed to detect sensation and convert it into the thoughts that are the very foundation of the word comfort. What if occupants in buildings were asked about their subjective preference in spaces, instead of only measuring numerous variables and using them to infer comfort? Collecting enough comfort preference feedback from a single person over days or weeks would take advantage of a human's ability to evaluate dozens of variables simultaneously, including those that are difficult to measure. How can this type of methodology be accomplished in a scalable way without annoying occupants too much or inducing survey fatigue? Can this approach provide insight into comfort problem areas where contemporary sensors are too expensive or too problematic to implement?
The goal of this paper was to test the ability of an intensive longitudinal method to capture numerous environmental feedback data from experimental participants in a field setting. This study uses Micro Ecological Momentary Assessments (EMA) as a subjective feedback methodology that overcomes many of the challenges presented by traditional methods [51]. Micro-EMA is a method of using a smartwatch interface to prompt and collect momentary, right-here-right-now subjective feedback from a single person over several weeks [52].
Receiving a large amount of feedback from a single person in a diversity of spaces and comfort exposures provided the ability to understand the comfort preference tendencies of a person. It is proposed that these behavioural tendencies can be used to segment people into groups related to how they perceive their environment. Grouping people with similar comfort preferences could, therefore, increase the accuracy of predicting where a person will be comfortable and what the system can do to respond without additional sensors. Additionally, collecting large amounts of subjective preference data from numerous people in a particular space can characterise the comfort-related attributes of that space independent from the sensors installed. If technically scalable and not too disruptive to an occupant, using humans-as-a-sensor in buildings could change the way post-occupancy evaluations, building and system design, and controls and automation are done. There would be opportunities for people to provide feedback for short-term uses (days or weeks) for building commissioning or tuning or long-term (months or years) for continuous system control and management. This work complements the momentum from other disciplines focused on the use of humans as sensors for applications in detection of events using social media data [53], for detecting emergencies [54], and for cybersecurity [55].

Paper overview
This paper presents how high-frequency micro-EMA, combined with sensor data time-series analysis, can enable the evaluation, control, and rethinking of the design of indoor environments. Section 2 first gives a more detailed overview of foundational work in indoor preference capture and modelling and the novelty being proposed. Section 3 provides a comprehensive explanation of the design and deployment of a smartwatch-based subjective preference data collection and environmental variable measurement system. Section 4 details the results from a field-based implementation at the National University of Singapore and the testing of various preference models based on intensive longitudinal data. Finally, Sections 5 and 6 discuss integration methods in buildings, limitations, future work, and details on how to reproduce the study using open data and code.

Background and novelty
This work builds upon previous literature focused on the measurement of factors that may influence thermal, visual, and aural comfort in the built environment. These modelling techniques are converged with an intensive longitudinal experience sampling technique that is common in the medical and psychological communities, but only emerging in the analysis of buildings. This section covers previous work in the building context using intensive longitudinal data and an overview of the novelty of the work in this paper as compared to the literature.

Indoor environmental comfort variables and models
There are generally two model types used in the literature for indoor comfort assessment: 1) objective-subjective, and 2) objective-criteria [56]. Which method to use is decided based on the aim of the evaluation. On the one hand, the objective-subjective model combines the indoor environmental measurements from sensors with the subjective feedback from users, mostly in the form of post-occupancy evaluation (POE) surveys [57,58,59,60]. This combination of data is used for predicting indoor environmental satisfaction (IES) or for Model-Predictive Controls (MPC) of indoor environments amongst other applications. On the other hand, the objective-criteria model is used in ranking or rating a building by comparing the indoor measurements from IEQ sensors with building performance measurement protocols such as LEED or WELL certifications [11]. Both methods have drawbacks, in measuring the environmental data as well as in surveying occupants [56].
In terms of environmental measurements, work has been done that used accurate sensors that were mounted on movable carts [61,11]. However, these sensors were not affordable for all building operations scenarios [56]. The affordability challenge was met using low-cost continuous sensing sensors that required frequent calibration [12]. Nevertheless, the location of these sensors in buildings, and interpolation of the readings still represented a challenge in the literature, given the fact that indoor spaces are heterogeneous [62]. On the other side of the spectrum, surveys pose some problems related to questions, e.g. what to ask, whom to ask, and how to interpret the results [56]. Additionally, Porter et al. [63] discussed the term survey fatigue, in which users feel overwhelmed by questions, which may lead to a misrepresentation in responses and reduced response rates. Surveys that cause survey fatigue mostly come in the form of long surveys that distract users from their primary tasks, leading them to fill in wrong data [63].
A related area of recent focus is the use of wearable and infrared radiation sensors to capture near-body physiological data that define the environmental conditions close to or at the skin surface of an occupant. A recent study focused on creating personalised comfort models from these data in the context of field-based deployment on 14 subjects [64]. This deployment and the models produced used wrist and ankle skin temperature from several sensors placed on the participants and a smartphone application to collect surveys. Further work in the indoor context showed that both wearable sensors and infrared radiation cameras led to a 3-4% increase in accuracy of thermal comfort sensation prediction, marginally justifying the cost of implementation in a field setting [65].

Ecological momentary assessments (EMA)
The next area of background focuses on the challenge of collecting large amounts of longitudinal data from a person. Many fields of study have relied upon the ecological momentary assessment [51] methodology to meet this challenge. This method is a type of intensive longitudinal experience sampling most often utilised in studying human behaviour. The word ecological describes the fact that the measurement is taken in the subjects' natural environment without impacting their task at hand. The word momentary pertains to the fact that feedback is requested at the moment of experience, as opposed to asking a subject to recall a past experience. And finally, the assessments are not static one-off outcomes but occur over time, thus accounting for temporal dynamics. Traditional methods found in the literature, such as surveys, are insufficient as their sampling rates are low, they require the occupant to completely stop their task at hand to focus on the survey, and in many cases, they ask for a recollection of past experiences. There is the further issue of survey fatigue [63], and even when occupants are willing to participate, there is a concern about how accurate their responses are [66]. The use of a smartwatch for data collection, coined micro ecological momentary assessments, has been shown to be so user-friendly that it does not significantly disrupt any ongoing activity [52]. Furthermore, an eight-fold increase in sampling frequency can be obtained, in comparison to smartphone use, without burdening the user. Recent work has used ecological momentary assessments to assess the built environment through the use of smartphones [67]. While such applications are a step in the right direction, they were only able to collect eight feedback points per occupant, which is insufficient for time-series analysis.

Similar work in intensive longitudinal data collection in the built environment
Intensive longitudinal methodologies have begun to emerge as a way to characterise occupants for various built environment objectives. In the urban context, several studies have deployed sensors on people to understand their experiences across their daily lives. A large study based in Singapore used thousands of wearable sensors in populations of students to discover travel patterns [68], collect information about thermal parameters [69], and even infer the impact of public spaces on happiness [70]. Work has been done in a controlled outdoor field study to understand the impact of the urban context on various emotions and physiological responses of humans [71]. In the indoor setting, targeted work on collecting longitudinal data for more specific purposes has also emerged. The previously mentioned wearable study focusing on thermal comfort collected numerous data from the 14 participants over the 2-4 week study [64]. Another recent study that deployed a cyber-physical system to collect longitudinal data in offices focused on occupant concentration [72]. The work in this paper is most directly related to previous work in collecting longitudinal comfort feedback from smartphone interfaces for the allocation of activity-based workspaces [73] and through a sustainability tour in a university campus building [74].

Novelty of proposed approach
Despite the momentum in field-based intensive longitudinal methodologies, there are still several barriers to their implementation in real-world settings. Not the least is the challenge of getting human occupants to give data for comfort surveys, install applications, or wear devices. Working from this knowledge, the authors developed cozie, as seen in Figure 2, an open-source smartwatch clock-face designed to conduct micro-EMA surveys for high-frequency data collection [75]. The application is open-sourced and free to download and use on the Fitbit gallery 1 .
The innovations outlined in this work as compared to the previously mentioned studies are:
• The hardware and software deployment methodology has a focus on practicality in scalable, field-based implementations. Experimental participants were only asked to wear a single smartwatch device and answer survey questions that utilise a relatively small amount of time. The focus was on testing a configuration that was easily applied in a real-world context. The modelling methodology was designed to capture as much signal as possible in the field setting without the ability to control and verify sensor proximity and accuracy consistently.
• A series of pre-processing steps were developed to convert intensive longitudinal data into model input features that characterise the tendency of groups of people to have similar comfort preferences, sometimes independent of the objective environmental factors such as temperature. A simple example of this concept is the commonly discussed, yet often anecdotal, person who seems always to need more cooling, even when the temperature is already low relative to the comfort zone. In this study, clustering was used to group people into comfort preference types as an input feature to a preference prediction model.
• This paper introduces and tests a simple form of a cold start variant to the preference models that could be used to predict an occupant's preference with limited or no data about their preference history in a particular space or according to particular objective measurements such as temperature, humidity, or other factors. This model enables the deployment of the cozie data collection methodology by a set of participants in a building and then the creation of prediction models that could accommodate future occupants regardless of whether they have worn a smartwatch in those spaces.
• The process seeks to show that comfort-based preference prediction can be accurate even in the absence of environmental sensors if enough intensive longitudinal data has been collected from enough occupants. The context of this experiment was in relatively uncontrolled, field-based settings as opposed to laboratory conditions.

Methodology
To collect intensive longitudinal data in a field setting, the cozie platform was built on the Fitbit smartwatch 2 and various time-series database technologies. In this section, the details of this technology stack are explained in the context of a deployment on 30 test participants in buildings at the National University of Singapore (NUS) in the School of Design and Environment (SDE). In this study, an occupant is defined as a test participant who wore the smartwatch, and a manager as the person who coordinated the study. Thirty participants were recruited via an online form, were screened against the inclusion criteria for the study, and were on-boarded according to an approved ethics review application. Priority was given to participants who work full time in the SDE-related buildings on campus, and they were selected to maintain an even gender distribution. The occupants were asked to wear a Fitbit Versa smartwatch during daytime hours while on the NUS campus at the very least but were also welcome to wear the device for the entire duration of the study. Participants were asked to leave momentary assessment feedback on their comfort preferences at different points throughout the day on the watch face of the Fitbit device. Each time they responded to the survey, they were asked about their thermal, visual and aural preference using the options found in Figure 2. Comfort preference was chosen as the feedback most applicable to the methodology due to a three-point scale that is most appropriate for frequent watch-based surveys. Preference surveys also provide more meaningful information by indicating how the occupant would want the environment to change as opposed to satisfaction or sensation survey types that only capture how the occupant feels. The participants were asked to answer the questions when they moved from one environment to another, which amounted to approximately 5-15 assessments per day. The smartwatch also prompted the occupants with a small vibration that requested feedback from them at different timed points in the day. This prompt only occurred during daytime hours when the subject was active. The momentary assessment took less than 15 seconds to complete. Throughout the experiment, the cumulative amount of time spent answering the momentary assessment was approximately 20-40 minutes.
Figure 3: Tier 1 is the base methodology, which is production-ready for real-building deployment. It requires a smartwatch with the cozie clock-face installed. Tier 1b is an extension to the base methodology that adds a temperature sensor to the watch. Tier 2 includes building-wide indoor localisation. In this experiment, Steerpath Bluetooth beacons were used, which communicate with the occupant's smartphone to determine the occupant's location. Tier 3 merges the localised feedback points with environmental sensors in the same comfort zone as the occupant.
The technology used in the deployment of this study can be sub-sectioned into individual tiers as described in Figure 3, with each level requiring additional resources to implement. For the experiment in SDE4, all tiers were incorporated.

Tier 1: Smartwatch for micro-EMA
Tier 1 is the core methodology presented in this paper, which uses the cozie clock-face, as shown in Figure 2. The base functionality of cozie was a simple clock-face with buttons that the occupant could press to record momentary feelings of comfort preference. When pressed, the feedback, combined with GPS location, heart rate, time, and an anonymous user-id, was sent to a time-series database via the occupant's phone, see Figure 3. Further questions could be pre-selected by the manager, using the phone that was paired with the Fitbit. This process generated a flow of successive interactive screens. In the experiment, the occupant was requested to provide thermal, visual, and aural preference feedback. Detailed documentation for using cozie, along with the open-source code, can be found in a GitHub repository 3 . The platform can also collect sensation, satisfaction and objective feedback such as clothing and activity levels. These features were added after the experiments outlined in this paper and were not used in this study.

Tier 2: Indoor localisation
Tier 1 is likely sufficient for experiments conducted in small office spaces. If only a few different zones exist, then an occupant's location could be quickly determined through a supplementary question in the question flow of the survey. However, in a large building, such as the SDE4 building where the outlined experiment was conducted, a more sophisticated indoor localisation system was required. The SDE4 building has six different floors, a gross floor area of around 8,500 square meters, and a large variety of different indoor environments. To determine an occupant's location in a building, 100 Bluetooth beacons and the Steerpath 4 platform were installed throughout the building. These beacons communicated with a custom-built smartphone application, called the Yak App [76], to determine the occupant's location with one-meter precision. The location data was then used to geo-fence the occupant within various zones of the building and was merged with the subjective preference feedback data in the cloud.

Tier 3: Preference data convergence with environmental sensors
Tier 3 included the deployment of 45 indoor and outdoor environmental quality (IEQ) sensors in the experimental context. This data collection tier was used to compare the results of the subjective feedback with existing environmental models. The IEQ sensors were WiFi-connected and were deployed by the company SenSING 5 as part of a campus-wide installation of sensors. These sensor kits measured temperature, humidity, noise level, and illuminance. At least one sensor device was installed in each zone of the building, and the data was pulled from an API and merged with the subjective preference data in the cloud.
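As an illustration of how feedback points might be matched to the sensor reading that is closest in time within the same zone, the following is a minimal sketch using pandas `merge_asof`. The column names, zone labels, dates, values, and the 10-minute tolerance are hypothetical and are not taken from the study's pipeline.

```python
import pandas as pd

# Hypothetical micro-EMA votes, already localised to a zone.
votes = pd.DataFrame({
    "time": pd.to_datetime(
        ["2020-01-06 09:02", "2020-01-06 09:40", "2020-01-06 14:05"]),
    "zone": ["office_a", "studio_b", "office_a"],
    "thermal": ["prefer cooler", "no change", "no change"],
})

# Hypothetical IEQ sensor log, one row per reading and zone.
sensors = pd.DataFrame({
    "time": pd.to_datetime(
        ["2020-01-06 09:00", "2020-01-06 09:00",
         "2020-01-06 09:35", "2020-01-06 12:00"]),
    "zone": ["office_a", "studio_b", "studio_b", "office_a"],
    "temperature_c": [27.5, 25.0, 25.4, 26.0],
})

# For each vote, take the nearest-in-time reading from the same zone.
# Votes with no reading within the tolerance get NaN (assumed behaviour
# for this sketch; the study may have handled gaps differently).
merged = pd.merge_asof(
    votes.sort_values("time"),
    sensors.sort_values("time"),
    on="time",
    by="zone",
    direction="nearest",
    tolerance=pd.Timedelta("10min"),
)
print(merged)
```

In this toy data the third vote has no sensor reading within the tolerance, so its temperature is left missing rather than being filled from a stale reading.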

Tier 1b: Strap-mounted sensor kit
This study also intended to test the correlation and prediction power of a simple near-body temperature sensor that was mounted on the watch strap of the Fitbit. This technology can be considered supplementary to Tier 3 sensors and could be an alternative in cases where environmental sensors are not available. In the experiment, a temperature sensor from mbient labs 6 was used, which was attached to the watch through a custom three-dimensional (3D) printed case. The design file for this case can be found online 7 . The mbient device logged data locally, which was transferred to the cloud database at the end of the experiment.

Occupant and room preference clustering
A central focus of this paper was to determine whether automated segmentation of the occupants into groups which have similar feedback tendencies could have an impact on the ability to predict preference. There was also the hypothesis that the feedback of one of the occupants in such groups could be used to characterise the preferences of all group members for a particular space or set of conditions. In this step, the preference history of occupants was used in a simple clustering-based segmentation step that grouped occupants according to their raw feedback preference tendencies. For example, occupants who more frequently indicated prefer cooler as compared to no change would be grouped together. This strategy was a simplified version of this type of clustering as it neglects other context-based variables (environmental and physiological measurements). This choice was made to keep the method feasible even in situations in which other measurements are not available.
Given its widespread usage in related literature, occupant and room clustering was calculated using the k-means clustering algorithm with Euclidean distance, using the scikit-learn package 8 . The features used for clustering were the ratio of votes of each feedback class value for each subject. For example, the ratio of prefer cooler for a given participant, or room, would be calculated as follows: $\frac{\#\,\text{prefer cooler votes}}{\#\,\text{total votes}}$. This calculation was repeated for all types of feedback responses for thermal, light, and aural feedback. Then, the number of clusters was chosen to match the number of possible responses per type of feedback. This initially led to k = 9; however, because no data points had prefer louder responses, the clusters were merged into eight.
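The vote-ratio clustering described above can be sketched as follows. This is a minimal illustration on hypothetical vote counts for thermal preference only, with k = 3 chosen for the toy data; the study used ratios across thermal, light, and aural feedback and a larger k. The `n_init` and `random_state` values are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical vote counts per occupant (one row each). Columns:
# [prefer cooler, no change, prefer warmer]
vote_counts = np.array([
    [40, 10, 0],
    [35, 12, 3],
    [5, 45, 2],
    [4, 50, 1],
    [2, 15, 30],
    [1, 12, 35],
])

# Feature per class: number of votes for that class / total votes.
ratios = vote_counts / vote_counts.sum(axis=1, keepdims=True)

# k-means in scikit-learn minimises Euclidean distance to centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(ratios)
print(labels)
```

The same ratio features can be computed per room instead of per occupant by aggregating votes by room before normalising.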

Occupant comfort preference prediction
The metric of comparison in this study was the improvement in a machine learning prediction model of added feature sets made available from the intensive longitudinal method. This structure matches implementation-based environmental comfort studies outlined in the literature that showed the predictive improvement of additional data [64,65]. This approach can be compared to more controlled, lab-based methods that seek to isolate variables and individually test their influence.
The prediction problem translates to predicting the right class value or, in this case, the preference feedback response, at the given feature values. A random forest classifier from the scikit-learn package was chosen to handle this comfort prediction. This model type was shown to have the highest accuracy at predicting personal comfort in one previous study [77] and was among the best performing in other recent studies [64,65,78].
The decision was made to focus on the implementation of a single model type that has been proven effective and is straightforward to use based on documentation and ease-of-tuning. With this in mind, we fixed the hyper-parameters for the random forest classifier to 1,000 trees, the Gini criterion for node splitting, and a minimum of two samples per split.
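A minimal sketch of a classifier configured with these hyper-parameters in scikit-learn is shown below. The synthetic data stands in for the study's feature sets, and `random_state` is an assumption added for reproducibility; neither comes from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a feature matrix and three-class preference labels.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4,
    n_classes=3, random_state=0,
)

model = RandomForestClassifier(
    n_estimators=1000,      # 1,000 trees
    criterion="gini",       # Gini impurity for node splitting
    min_samples_split=2,    # minimum samples required to split a node
    random_state=0,         # assumption, not stated in the paper
)
model.fit(X, y)
print(model.predict(X[:5]))
```

Note that `criterion="gini"` and `min_samples_split=2` are the scikit-learn defaults, so only the number of trees departs from the default configuration.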
Additionally, the prediction problem was divided into an individual and a grouped prediction task. The former refers to a model developed specifically for a given occupant, using only part of that occupant's data to train a model and testing it on the occupant's remaining data. The latter approach, on the other hand, consists of combining all occupants' training data subsets to train a single model and testing it on all the occupants' remaining data.
The data of each occupant was split into a 60:40 train-test set based on time. That is, the first 60% of votes from each occupant were used in their training set, and the remaining 40% were used for testing. The sets were split by time to prevent the scenario of future data being used to predict the past. For the grouped model, all the occupants' training sets (60% of each occupant's data) were used as one training set, and the remaining 40% of each occupant's data was combined into one test set.
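The time-ordered split can be sketched in plain Python as follows. The record layout and the function name `split_by_time` are hypothetical, and rounding the cut point down is an assumption, since the handling of non-integer cut points is not stated.

```python
from collections import defaultdict

def split_by_time(votes, train_percent=60):
    """Split votes per occupant into earliest train_percent% and the rest.

    votes: list of (occupant_id, timestamp, response) tuples.
    Returns (train, test), each pooling all occupants (grouped model).
    """
    by_occupant = defaultdict(list)
    for vote in votes:
        by_occupant[vote[0]].append(vote)

    train, test = [], []
    for occupant_votes in by_occupant.values():
        # Order by timestamp so no future vote leaks into training.
        occupant_votes.sort(key=lambda v: v[1])
        cut = len(occupant_votes) * train_percent // 100  # rounds down
        train.extend(occupant_votes[:cut])
        test.extend(occupant_votes[cut:])
    return train, test

# Toy data: occupant "a" has 10 votes, occupant "b" has 5.
votes = ([("a", t, "no change") for t in range(10)]
         + [("b", t, "prefer cooler") for t in range(5)])
train, test = split_by_time(votes)
print(len(train), len(test))
```

For an individual model, the same function can be applied to the votes of a single occupant.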
A primary component of the method was to test the ability for various feature sets to influence the prediction power of the random forest model. The method used six combinations of these feature sets to test the influence each has in the predictive capability of the overall model. The following is an overview of these feature categories developed for testing:
• Time was created through feature engineering the time stamp of when an occupant gave feedback. This feature was a cyclical representation of the hour of the day and day of the week. This simple feature type detects if certain cyclical habits or components have a role in preference prediction and was included in all scenarios.
• Environmental Sensors were features extracted from measurement data from lighting (lux level), noise (dB level), temperature (deg. Celsius), and relative humidity (RH%) measurement. These variables were collected from the IEQ sensors that were closest spatially and temporally to an occupant when they gave feedback.
• Near Body Temperature was a feature created from the temperature sensor mounted on the smartwatch strap that had temporal proximity to the time-stamp of when the occupant gave feedback.
• Heart Rate was collected from the Fitbit smartwatch device as an instantaneous value collected when the occupant gave feedback.
• Room was a feature that was encoded to a numerical preference type based on the history of feedback in the room in which the survey was taken. This feature was designed to increase the prediction accuracy by complementing data from rooms of similar comfort profiles. For example, if an occupant only works from their office, the model will still be able to accurately predict how that occupant may feel in other rooms that have a similar comfort profile to their office. In the same way, each occupant's anonymous ID was also encoded to their historical preference, thus allowing for occupants of similar preference history to crowdsource each other's data.
• Preference History features are similar to the Room features. These features use the ratio of responses of each type (thermal, visual, and aural) that were calculated for each user. This ratio was only calculated for the responses of prefer cooler, prefer warmer, prefer dimmer, prefer brighter, prefer quieter, and prefer louder. For example, the ratio of prefer cooler responses of a given occupant is calculated as follows: $\frac{\#\,\text{prefer cooler votes}}{\#\,\text{total votes}}$.
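The Time feature in the list above is described only as a cyclical representation. One common way to build such a representation is a sine and cosine pair per cycle, sketched below; this encoding and the feature names are assumptions and may differ from the encoding used in the study.

```python
import math
from datetime import datetime

def cyclical_time_features(timestamp):
    """Encode hour of day and day of week as points on a unit circle."""
    hour = timestamp.hour + timestamp.minute / 60.0   # 0 <= hour < 24
    day = timestamp.weekday()                         # Monday = 0
    return {
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
        "day_sin": math.sin(2 * math.pi * day / 7),
        "day_cos": math.cos(2 * math.pi * day / 7),
    }

# Hypothetical vote time stamp: Monday 6 January 2020, 06:00.
features = cyclical_time_features(datetime(2020, 1, 6, 6, 0))
print(features)
```

With this encoding, 23:00 and 01:00 are close together in feature space, which a raw hour value of 23 versus 1 would not capture.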
Model classification results were calculated using F1-micro scores (as shown in Equation 1), which are equivalent to accuracy in a multi-class classification problem because precision and recall are aggregated across all classes, i.e., the subjective comfort response values. As the objective was to provide a comparison among different feature sets with a standard metric, F1-micro was chosen due to its usage for benchmarking different aspects of the modelling pipeline in thermal comfort datasets [78]. The term micro refers to the fact that it aggregates the contributions (true positives, false positives, and false negatives) of all classes before computing the metric, whereas a macro metric computes the value independently for each class and then takes the average. For multi-class classification problems where there is a class imbalance, i.e., certain classes have more data points than others, micro is preferable.
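The equivalence between F1-micro and accuracy for single-label multi-class problems can be checked with a short pure-Python sketch. The function name `f1_micro` and the toy labels are illustrative only.

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1."""
    pairs = list(zip(y_true, y_pred))
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(t == c and p == c for t, p in pairs)
        fp += sum(t != c and p == c for t, p in pairs)
        fn += sum(t == c and p != c for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy three-class example with four of six predictions correct.
y_true = ["cooler", "no change", "no change", "warmer", "no change", "cooler"]
y_pred = ["cooler", "no change", "cooler", "no change", "no change", "cooler"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = f1_micro(y_true, y_pred)
print(score, accuracy)
```

Every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so pooled precision and recall both equal the fraction of correct predictions.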

Results
The results presented in this section are complemented with an interactive web application 9 and interactive code 10 which enables the reader to regenerate all the plots. During a two week collection time of 30 participants, 4,378 comfort preference votes were collected, which is 146 data feedback points per person on average. From this set, 1,474 data points were successfully localised to building environmental sensors. To allow for comparison with those data, this subset was used for analysis and machine learning in the following sections. Figure 4 illustrates an overview of the intensive longitudinal preference history data for each person according to the three preference categories. These feedback responses were only those collected in the SDE4 building, and a maximum number of 75 votes is shown. A simple clustering step was applied 9 https://sde4demo.herokuapp.com/ 10 https://github.com/buds-lab/humans-as-a-sensor-for-buildings  in this figure to represent the segmentation according to each preference category on its own. This visualisation shows how this simplified clustering step captured the tendency of an occupant to lean more towards one feedback response over the others. This segmentation was independent of the environmental parameters of the spaces to maintain the simplicity of the approach. The subsequent modelling steps were designed to test the effectiveness of doing this type of simplified segmentation. Figure 5a is an aggregated representation of the segmentation process for each occupant, this time with all three preference categories being used in the clustering process. This figure summarises each occupant as a row of data, and the colour of the box represents the percentage of votes given to a particular preference category, where dark colours indicate higher preference. These clusters provided segmentation of the users according to their preference tendency types that were used in the preference models. 
Even within a group of 30 occupants, varying comfort tendencies were present, which complemented the concept of a personal comfort model tested by Kim et al. [79]. This clustering step provided the foundation for the creation of the individual versus grouped models used in the prediction step.
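The segmentation of occupants by preference tendency can be sketched as follows. This is a minimal illustration that assumes each occupant is summarised by the fraction of votes given to each thermal response and that a plain k-means is applied to those vectors; the category labels, occupant identifiers, vote histories and the choice of k = 2 are hypothetical, and the study's own implementation (which used all three preference categories) may differ.

```python
import random

CATEGORIES = ["prefer cooler", "no change", "prefer warmer"]

def vote_shares(votes):
    """Turn one occupant's vote history into a vector of vote fractions."""
    return [votes.count(c) / len(votes) for c in CATEGORIES]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical vote histories for four occupants (not study data).
histories = {
    "occ-01": ["prefer cooler"] * 8 + ["no change"] * 2,
    "occ-02": ["prefer cooler"] * 7 + ["no change"] * 3,
    "occ-03": ["prefer cooler"] * 1 + ["no change"] * 8 + ["prefer warmer"] * 1,
    "occ-04": ["prefer cooler"] * 2 + ["no change"] * 7 + ["prefer warmer"] * 1,
}
points = [vote_shares(v) for v in histories.values()]
labels, centroids = kmeans(points, k=2)
```

The resulting labels can then serve as the tendency groups on which grouped models are trained.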

Tagging the spatial context with preference feedback
While the subjective feedback highlighted varying comfort tendencies within a building, localisation also enabled the characterisation of preference tendencies in certain zones. Figure 5b presents each room as a row, where the colour of each cell represents the percentage of a preference vote given for a particular room. The utilisation of k-means clustering once again enabled the splitting and labelling of these zones, this time by the tendency for different comfort preferences to be left by occupants in those spaces. This result firstly served as an overview for facility managers to understand the office spaces they manage and to take action to improve comfort. A visualisation of the subjective thermal preference data can be found in Figure 6 and online (footnote 11).
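The per-room summary described above amounts to a simple aggregation of localised votes. The sketch below assumes that each vote arrives as a (room, response) pair; the function name, room labels and votes are hypothetical.

```python
from collections import Counter, defaultdict

def room_vote_percentages(records, categories):
    """Return {room: {category: percentage of that room's votes}}."""
    counts = defaultdict(Counter)
    for room, vote in records:
        counts[room][vote] += 1
    table = {}
    for room, c in counts.items():
        total = sum(c.values())
        table[room] = {cat: 100.0 * c[cat] / total for cat in categories}
    return table

# Hypothetical localised light-preference votes (not study data).
records = (
    [("Office-1", "prefer brighter")] * 3
    + [("Office-1", "no change")]
    + [("Office-2", "no change")] * 4
)
table = room_vote_percentages(
    records, ["prefer dimmer", "no change", "prefer brighter"]
)
```

Each row of such a table can be used directly as the input vector for the zone clustering step.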

Correlation with indoor environmental quality variables
One standard aspect of environmental comfort studies is the comparison of feedback to objective environmental measurements. For the data collected in this study, standard distribution plots of the environmental sensor data are summarised in Figure 7. Intuitive insights can be observed in the data, such as the absence of prefer brighter votes above an illuminance threshold of 250 lux. Nevertheless, there was a significant overlap between classes for each of the environmental parameters, which was likely attributable to the numerous unmeasured variables described by Figure 1 and the varying comfort tendencies shown in Figure 5. This result reinforces the evidence that environmental measurements are not descriptive enough to characterise a person's preferences, which results in poor prediction as found in previous studies [50].

Predicting field-based indoor preference using intensive longitudinal data
The main objective of this paper was to discover how intensive longitudinal subjective preferences can impact the ability to predict comfort satisfaction, instead of solely relying on environmental sensor data. In this section, the time-series feedback was used to predict comfort satisfaction. Figure 8 shows a comparison of the various models built with the feature sets and process outlined in Section 3. The individual comfort model uses that occupant's training data for prediction, while the grouped comfort model uses the input data for the groupings outlined in Figure 5. The top of Figure 8 shows a table in which each row represents the feature set that was used to train the model in that column.
Several insights were evident from this modelling analysis. The first was that there were only small differences in the F1 scores between the different feature sets for the visual and aural preference models. These models, in general, had higher F1 scores than thermal preference prediction. Aural preference prediction had the highest F1 score, which was intuitive since it ended up being a binary classification challenge due to the lack of prefer louder feedback responses. Thermal preference prediction had more diversity across the feature sets tested as compared to the other preference categories. The model that used only the conventional time-series and environmental sensor features had the lowest F1 score. Adding the physiological attributes of heart rate and near-body temperature provided marginal improvements. The best thermal preference model used the physiological, room, and preference history features while excluding the environmental sensor data.

[Figure 5, panel (b): Room-based Clustering]
For all three preference categories, the grouped comfort model performed better than the individual version. Participants with similar comfort preferences were clustered together, thus increasing the training dataset for that particular occupant type. This result showed the impact that assigning occupants to a peer group can have on preference prediction.

Cold-start comfort preference prediction
The group-based models had a byproduct that was discovered in this process. The success of the preference models using grouping allowed for testing of the ability to have cold-start models that can predict an occupant's preferences if they can be assigned to a tendency group, but their own personal data was not included in training the model. This scenario was labelled as a cold-start situation as it emulates when an occupant doesn't wear a watch to collect data in a particular building, but comfort preference prediction is desired. The line graphs to the right of Figure 8 show the results of this type of analysis. They illustrate the number of occupants required to sufficiently crowdsource the data for an average occupant for each of the preference categories. The orange line represents an ordinary person who doesn't wear a smartwatch, whereas the blue line is a smartwatch owner who is regularly giving feedback. In this study, nine and five users were sufficient on average to crowdsource the thermal and visual comfort prediction respectively to the same accuracy as a user wearing a watch.
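The comparison between the two lines can be sketched as an evaluation loop in which peers from the same tendency group are added to the training set one at a time. The sketch below rests on several assumptions that are not taken from the study: a majority-vote baseline stands in for the classifier, the target occupant's later votes are held out as the test set, and all identifiers and votes are hypothetical.

```python
from collections import Counter

def build_training_set(votes_by_occupant, target, peers, include_target):
    """Return (train, test) vote lists for one target occupant.

    The second half of the target's votes is always held out for testing.
    The first half is added to the training data only when include_target
    is True (the smartwatch-wearing case); otherwise the training data come
    from peers alone (the cold-start case).
    """
    train = [v for p in peers for v in votes_by_occupant[p]]
    own = votes_by_occupant[target]
    split = len(own) // 2
    if include_target:
        train = train + own[:split]
    return train, own[split:]

def majority_accuracy(train, test):
    """Accuracy of always predicting the most common training vote."""
    guess = Counter(train).most_common(1)[0][0]
    return sum(v == guess for v in test) / len(test)

def cold_start_curve(votes_by_occupant, target, group, include_target):
    """Accuracy for the target as peers are added one at a time."""
    peers = [p for p in group if p != target]
    curve = []
    for n in range(1, len(peers) + 1):
        train, test = build_training_set(
            votes_by_occupant, target, peers[:n], include_target
        )
        curve.append(majority_accuracy(train, test))
    return curve

# Hypothetical vote histories for one tendency group (not study data).
votes = {
    "p1": ["prefer cooler"] * 4,
    "p2": ["prefer cooler", "prefer cooler", "no change"],
    "p3": ["no change"] * 2,
}
group = ["p1", "p2", "p3"]
without_watch = cold_start_curve(votes, "p1", group, include_target=False)
with_watch = cold_start_curve(votes, "p1", group, include_target=True)
```

Plotting the two resulting lists against the number of peers gives the kind of curve shown to the right of Figure 8; a real evaluation would replace the majority-vote baseline with the trained classifier and average over occupants.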

Predicting continuous comfort preference without sensors
A final byproduct of this modelling process was the discovery that the intensive longitudinal preference models could be used to continuously predict the preference of occupants in individual rooms in a similar way as environmental sensors are conventionally used. Since the preference feedback in this methodology was collected at a much higher frequency than a typical survey or occupants acting on the thermostat, this study had preference data with relatively high temporal and spatial diversity. The random forest classifier was used to predict comfort preference based on a time-stamp input for each zone to create a continuous prediction over time. This approach emulates the concept of using human feedback as a type of sensor. Figure 9 illustrates the prediction of two different zones, an office and an outdoor space, for a typical week using this model output. First, one can see that the office was generally a comfortable space, while the outdoor seating had an overall higher preference for cooling over time. Time-dependent fluctuations were seen that show how the model was able to predict comfort preference for different parts of the day or days of the week. The office had a peak of warmer preference around mid-day. Finally, it was observed how the model, often inaccurately, tried to predict comfort at times where no data were present. The square peaks in the office for aural and visual prediction between the hours of 22:00 and 7:00 were due to an absence of data to accurately predict during these times.

Figure 7: Distribution of sensor data by preference vote. While trends can be observed, many feedback votes overlap for the same environmental or physiological measurement. This was possibly due to the different comfort tendencies as shown in Figure 5 or numerous other variables described in Figure 1 that are not accounted for. Near-body temperature and noise appear to have the most distinct differentiation.
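The idea of predicting preference from a time-stamp alone can be sketched with a much simpler stand-in for the random forest used in the study: a lookup table keyed by zone, day of the week and hour. The feature choice, function names and records below are assumptions for illustration. Unlike a fitted classifier, this stand-in returns `None` for time windows with no historical votes, which makes the data gaps discussed above explicit.

```python
from collections import Counter, defaultdict
from datetime import datetime

def time_features(ts):
    """Derive (day of week, hour of day) from a datetime."""
    return ts.weekday(), ts.hour

def fit_hourly_profile(records):
    """records: iterable of (zone, datetime, vote) tuples.

    Returns {(zone, weekday, hour): most frequent vote in that bucket}.
    """
    buckets = defaultdict(Counter)
    for zone, ts, vote in records:
        buckets[(zone, *time_features(ts))][vote] += 1
    return {key: c.most_common(1)[0][0] for key, c in buckets.items()}

def predict(profile, zone, ts):
    """Return the predicted vote, or None where no data were collected."""
    return profile.get((zone, *time_features(ts)))

# Hypothetical votes left on a Monday around mid-day (not study data).
records = [
    ("Office-1", datetime(2020, 3, 2, 12, 5), "prefer warmer"),
    ("Office-1", datetime(2020, 3, 2, 12, 40), "prefer warmer"),
    ("Office-1", datetime(2020, 3, 2, 12, 55), "no change"),
]
profile = fit_hourly_profile(records)
midday = predict(profile, "Office-1", datetime(2020, 3, 9, 12, 30))
night = predict(profile, "Office-1", datetime(2020, 3, 9, 23, 0))
```

Querying the profile for every hour of a week yields a continuous preference trace per zone, with explicit gaps in place of the spurious night-time predictions.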

Discussion
The results of this implementation showed the potential of collecting intensive longitudinal feedback from occupants in the built environment. This approach revealed that the deployment and implementation of such a methodology were effective, and that comfort models for visual, aural, and thermal comfort can have similar performance to sensor measurements. The key focus in this section is to discuss what these data and this process are potentially useful for.

Practical Application of Intensive Longitudinal Data in Industry
At the foundation of the method, the creation of more significant amounts of occupant feedback information in the form of preferences was successful. The utilisation of this type of high-frequency subjective feedback data has potential for building evaluation and occupant comfort optimisation. It changes the paradigm in which facility managers operate a building. For example, instead of saying light levels are below the comfort threshold in Office-1, the new conclusions could state that a higher frequency of prefer brighter votes is recorded in Office-1. Furthermore, due to the high-frequency sampling rate provided by the micro ecological momentary assessment methodology, these periods of discomfort can also be mapped to particular times and certain groups of people. The time-series comfort profiles could also serve as input data for occupant-centric control efforts of building systems, which can then optimise for both human comfort and energy use.

Figure 8: Comparison of prediction F1-micro scores between grouped and individual comfort models using data from different feature sets. The feature set that excluded environmental sensor data for the thermal model had the highest F1 score, while minimal differences in F1 score were noted between feature sets of the visual and aural models. Right: The accuracy in predicting the comfort of an individual as further participants are added to the training set. The blue line includes the test participant's own data in the training data, and the orange line excludes the test participant's data, meaning that it depends on crowdsourced feedback from other occupants.

Figure 9: Comfort prediction for two zones for an average occupant over a week. The grey circles indicate votes that were given for each category, and the shaded-out sections indicate times where no data were present.
These time-series predictions can be used to detect anomalies, such as the mid-day peak for a warmer preference for the office, or the general discomfort in the outdoor seating area. Note that there was an absence of data in these zones between the hours of 22:00 and 7:00, and on the weekend. This lack of data caused inaccurate predictions, as seen in the square-shaped peaks in the office.

building would be given smartwatches and asked to wear them for a 2-4 week period. Perhaps they are incentivised through vouchers, or the occupants would have their own smartwatch to use and simply would need to give consent for their data to be collected. These data could then be used to supplement the installed systems and to characterise whether there are blind spots where sensors are not picking up comfort-influencing phenomena that are not being measured. For example, it's rare to measure mean radiant temperature in most buildings; therefore hot spots might exist that are undetectable by thermostats and might be a result of inadequate shading or control of shading systems. Adapting the presented methodology to this context is straightforward as the two-week time frame of the experiment is similar. The current method involved asking each participant to wear the smartwatch until 100 data points were recorded. In a real-world setting, a similar approach could be deployed. At that point, perhaps the occupant could choose to return the device and rely on the data of co-workers for comfort prediction, or continue to wear it and help crowdsource the data for others. It is strongly recommended that these deployments use automated indoor localisation to put the feedback in the spatial context without user intervention. This type of data collection and modelling would also be useful for the building systems commissioning process. In this phase, the various sensor systems used for control and automation could be compared to the intensive longitudinal occupant feedback to validate that the environments are being correctly characterised. This process could result in the detection of poorly installed or miscalibrated sensors.

Potential for spatial recommendation systems and impact on activity-based workspace design
A less typical application for intensive longitudinal data might be the development of spatial recommendation engines for occupants in activity-based workspaces. In these spaces, an occupant doesn't have a constant workspace but instead finds a space that matches their immediate needs. This paradigm could prove to be an integral part of future working styles, especially in light of social distancing due to global pandemics such as COVID-19 that force a less conventional spatial working arrangement. This recommendation engine might work in a way that an occupant's comfort tendencies could be matched with the comfort zone of the building. For example, those that prefer warmer spaces can be recommended to work in areas that have a higher percentage of prefer cooler votes, i.e., zones that other occupants tend to find warm. Previous work in this direction showed progress using a platform known as Spacematch [73]. Smartwatch-based longitudinal feedback could enhance the model development process for this type of platform.
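The matching rule in the example above can be sketched as a simple ranking. This is only an illustration of the idea and not the Spacematch implementation; the function name, the vote-share table and the zone labels are hypothetical.

```python
# An occupant who tends to vote one way is matched to zones where
# other occupants tend to vote the opposite way.
OPPOSITE = {
    "prefer warmer": "prefer cooler",
    "prefer cooler": "prefer warmer",
}

def recommend_zones(occupant_tendency, zone_vote_share, top_n=1):
    """Rank zones for an occupant given their dominant thermal tendency.

    zone_vote_share: {zone: {response: fraction of votes in that zone}}.
    Occupants whose dominant response is "no change" are matched to the
    zones with the highest share of "no change" votes.
    """
    wanted = OPPOSITE.get(occupant_tendency, "no change")
    ranked = sorted(
        zone_vote_share,
        key=lambda zone: zone_vote_share[zone].get(wanted, 0.0),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical zone summaries (not study data).
zones = {
    "Office-1": {"prefer cooler": 0.6, "no change": 0.3, "prefer warmer": 0.1},
    "Office-2": {"prefer cooler": 0.1, "no change": 0.5, "prefer warmer": 0.4},
}
for_warm_seeker = recommend_zones("prefer warmer", zones)
for_cool_seeker = recommend_zones("prefer cooler", zones)
```

A deployed system would presumably also account for occupancy, time of day and the other preference categories, which this sketch ignores.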
The aspect of testing group-based models in this study is essential for this context as building owners can't expect all occupants to be willing to wear or use devices. And those that do agree will likely have a limited amount of patience for giving feedback over long periods. This paper tested the ability to cluster occupants such that it was not necessary for everyone in an office to wear a smartwatch. The only requirement for this type of system to work is that each new employee would wear a smartwatch during a two-week data collection phase, which is sufficient to build their comfort preference tendency history as described in Section 4.1. The experiment also showed that not everyone in an office space needed to be using the smartwatch application all the time. In this particular experiment, six occupants were sufficient, on average, to crowdsource the prediction for the remaining 24. This value is not generalisable amongst all buildings and would change depending on the different comfort tendencies the building occupants might present.
In an extreme case, if everyone were to share the same comfort preferences, then only one occupant might be required to crowdsource the prediction of the rest, but this is usually not what facility managers experience in the field. The higher the variation in comfort preferences, the greater the number of occupants needed to crowdsource the data.
In terms of office space design, the collection of intensive longitudinal preference data could facilitate floor plan design decisions. Understanding the breakdown of comfort needs according to the tendencies of the occupants would enable architects to design or retrofit buildings with different comfort zones to match the different types of people. For example, if the zones with more cooling were popular and being used to their capacity, then the floor plans or systems control could respond by creating more spaces of that type to increase the probability that a person feels comfortable.

Integration into building control systems
Intensive longitudinal data has the opportunity to influence the control and automation systems of buildings through the use of preference feedback data in the control logic. Most building control systems rely on optimising a set-point temperature that is considered comfortable for the average occupant or comfort standard [80]. In that scenario, discomfort instead of comfort is evaluated as the current difference between the environment thermostat and the HVAC system set-temperature [81], occupancy density estimation, or via more traditional ways such as PMV [82]. While some of these approaches have dealt with single-occupant offices or Personal Comfort Systems (PCS), there is a distinction between controlling the actual HVAC system and allowing the occupant to control their immediate space. PCS are those that locally condition the occupant independent of the centralised HVAC system [83]. The intensive longitudinal data and the models developed in this study could help the controls field take the next step forward in occupant-centred building controls through the use of reinforcement learning [84]. The feedback mechanism in reinforcement control is generally the standard occupant-building interface such as a switch or thermostat [85]. Intensive longitudinal data could be used to enhance that interaction by focusing on finding the motivations of those control actions. This work is a strong focus of the occupant-centric building operations in the IEA Annex 79 project [86].

Limitations of environmental sensors in predicting preference
In this study, measured environmental sensor data and near-body physiological parameters were collected as a comparison to the intensive longitudinal data. The classification models that used these variables as input features showed little to no increase in accuracy. If there was stringent control over when occupants voted, such as telling occupants that they were not allowed to vote within an hour of eating or walking up a flight of stairs, then it may improve the value of environmental and physiological data. When sensor readings (within the limits of what is usually perceived as comfortable) are paired up with subjective responses that prefer a change, the performance of a data-driven model decreases. The reasons behind the uncomfortable subjective responses might come from one or many of the variables that influence comfort but are hard or impossible to measure in an in-situ field experiment (Figure 1). Furthermore, such control over the occupants would have interfered with the momentary nature of ecological measurements by imposing sampling schedules. On top of this concern, the addition of sensor data from fixed environmental sensors caused longitudinal issues in the data collection effort. To utilise sensor data as input features for modelling purposes, the occupants were required to be near the sensor; for this experiment, this meant they needed to be in the same room. Thus, all the data points where the subject was in a hallway, staircase, or open environment within the building were lost. Sensor availability is also a prevalent issue among these data-driven efforts, such as when a sensor is unplugged or incorrectly collecting and storing data.

Prediction models are only as good as the training data
The primary limitation of the presented approach was that it would only work where data were present. As seen in Figure 9, there were errors in the prediction when data were absent, i.e., no historical data collected at similar time windows such as the middle of the night. Furthermore, since there was a reliance on other subjects' historical preferences, i.e., crowdsourcing preferences, to evaluate environments, an office space that was rarely used would have a poor prediction of occupant comfort. Classical comfort models based on sensor data do not have this issue as spaces that are not used can still be characterised by the measured data. Furthermore, this particular study was conducted in Singapore, which doesn't have seasons and has minimal variability in temperature. For seasonal countries, the day of the year would be an added feature that may take up to a year's worth of data to train. Further work could investigate the opportunity of using sensor data to characterise a space and then continuously refine the comfort prediction by crowdsourcing the occupants' preferences in that space.

Future work
As mentioned, there are limitations to this study related to the breadth and diversity of data collected. Future studies should expand the generalisability of results by increasing the number of occupants whose data is obtained similarly. Further, there are several critical future improvement directions that can be developed using this deployment methodology as a basis.

Collection of Objective Comfort Parameters and Expansion of Subjective Feedback Targets
One of the improvements of this methodology is the potential integration of intensive longitudinal feedback that captures information about the objective attributes of the instantaneous situation. These attributes include a self-evaluation of the clothing level, activity level, changes in steady-state, recent meals eaten, or other objective aspects of a person at the instant that they indicate a comfort preference. Collection of this information would cause each survey to become slightly more tedious through additional questions, but it would be valuable for adjusting the segmentation and prediction process. Additionally, there is potential to include further subjective responses that could be used to characterise occupant perception of satisfaction, sensation, privacy, productivity, and even social distancing measures. The cozie watch face is an open-source project that welcomes the implementation of new survey questions. Several of the previously mentioned question types are currently being implemented and tested for future work.

Spatial and temporal targeting
In addition to adding new question types, there is an opportunity to better target when and where feedback is being requested [87]. With the presented methodology, this approach could be achieved by purposely prompting the participant for a subjective response when a specific environmental condition is met, e.g., there are not many data points with Prefer cooler responses, so the occupant might be prompted to leave feedback when the temperature in his or her surroundings is warmer than average. This capability is being explored with the current cozie platform.
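The example condition above could be expressed as a small trigger rule. The sketch below is a hypothetical illustration and not part of the cozie platform; the function name, parameters and the 20% under-representation threshold are assumptions.

```python
def should_prompt(temp_now, recent_temps, vote_counts,
                  target="prefer cooler", min_share=0.2):
    """Decide whether to prompt the occupant for feedback right now.

    Prompts only when the target response is under-represented in the
    votes collected so far AND the current temperature is above the
    recent average, i.e., when a target response is more likely.
    The min_share threshold is a hypothetical tuning parameter.
    """
    total = sum(vote_counts.values())
    share = vote_counts.get(target, 0) / total if total else 0.0
    mean_temp = sum(recent_temps) / len(recent_temps)
    return share < min_share and temp_now > mean_temp

# Hypothetical readings in degrees Celsius and vote counts (not study data).
counts = {"prefer cooler": 2, "no change": 30, "prefer warmer": 8}
prompt_now = should_prompt(27.5, [24.0, 25.0, 26.0], counts)
```

Other environmental variables, such as illuminance or noise level, could be handled by the same kind of rule.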

Data augmentation
Another approach to tackle the diversity of classes in the feedback responses is to use data augmentation techniques, which are commonly used in related data-driven thermal comfort modelling [88,64]. Preliminary results on using techniques from other fields for this task are shown in [89]. Data augmentation using balancing and synthetic data generation could further improve modelling performance.
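As a minimal illustration of the balancing idea, the sketch below performs random oversampling, duplicating randomly chosen minority-class rows until every class has as many rows as the largest one. This is the simplest form of balancing and is not the technique used in the cited works; the function name and toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict

def oversample(rows, label_of, seed=0):
    """Return rows plus random duplicates so that all classes are equal in size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    target = max(len(members) for members in by_label.values())
    balanced = []
    for members in by_label.values():
        balanced.extend(members)
        # Draw with replacement to fill the gap up to the largest class.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Hypothetical (heart rate, vote) rows with a rare class (not study data).
rows = [(72, "no change")] * 6 + [(88, "prefer cooler")] * 2 + [(65, "prefer warmer")]
balanced = oversample(rows, label_of=lambda row: row[1])
class_sizes = Counter(label for _, label in balanced)
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.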

Improving occupant preference segmentation
One large area of improvement in this methodology is testing more rigorous means of occupant clustering. In this study, this process was done in a simplified fashion using only preference tendency history. There are many opportunities to optimise this segmentation step. A more in-depth analysis of the best way to create these groups of occupants is currently being explored, such that they fully leverage peer groups.

Feature sets from spatial models
Finally, building information models (BIM) and building energy models (BEM) could be processed to provide a rich feature set to assist in the prediction of occupant subjective feedback in buildings. This opportunity focuses on combining spatial data from a BIM model with the temporal individual feedback output from this research to enhance satisfaction prediction. Preliminary work in this area using a graph embeddings model called Build2Vec is in progress [90].

Conclusion
This paper presents how micro ecological momentary assessments of subjective comfort can generate sufficiently large intensive longitudinal data for occupant comfort prediction and enhancement that reduces the reliance on objective environmental sensor data, and empirical comfort models. These results suggest a shift in the way the indoor environment is evaluated, controlled, and designed. This shift might focus on transitions from only measuring the variables that influence how a person feels, to asking how a person feels. Results of an implementation of the platform on 30 occupants showed the segmentation and variation of indoor occupant comfort tendencies and highlighted the shortcomings of one-size-fits-all comfort models that are commonly applied in real buildings. Furthermore, the use of a smartwatch enabled data collection at a sufficient frequency to build time-series models of indoor spaces. These models could be used to detect building anomalies, serve as building data for subjective driven building control, or be used to recommend spaces that best match the comfort preference tendencies of each occupant. The optimum technological setup uses a smartwatch for subjective data collection, combined with a method for localising an occupant in the building. This localisation may be achieved by asking the occupant directly through the smartwatch, or through Bluetooth or WiFi signals. Environmental sensor data provided negligible improvement in the prediction of comfort.
Author contribution statement
PJ: hardware, software, infrastructure development, experimental design, implementation and lead author of the paper; MQ: software, infrastructure development, data analysis, machine learning lead and author of the paper; MA: software, infrastructure development, and author of the paper; CM: funding, project leadership, experimental design, the corresponding author of the paper.

Funding
The Singapore Ministry of Education (MOE) (R296000181133 and R296000214114) and the National University of Singapore (R296000158646) provided support for the development and implementation of this research.