Impact of Outdoor Temperature Variations on Thermal State in Experiments Using Immersive Virtual Environment

Abstract: Recent studies have established immersive virtual environments (IVEs) as promising tools for studying human thermal states and human–building interactions. One advantage of using immersive virtual environments is that experiments or data collection can be conducted at any time of the year. However, previous studies have confirmed the potential impact of outdoor temperature variations, such as seasonal variations, on human thermal sensation. To the best of our knowledge, no study has looked into the potential impact of variations in outdoor temperatures on experiments using IVEs. Thus, this study aimed to determine if different outdoor temperature conditions affected the thermal states in experiments using IVEs. Experiments were conducted using a head-mounted display (HMD) in a climate chamber, and the data were analyzed under three temperature ranges. A total of seventy-two people participated in the experiments conducted in two contrasting outdoor temperature conditions, i.e., cold and warm outdoor conditions. The in situ experiments conducted in two cases, i.e., cooling in warm outdoor conditions and heating in cold outdoor conditions, were used as a baseline. The baseline in situ experiments were then compared with the IVE experiments conducted in four cases, i.e., cooling in warm and cold outdoor conditions and heating in warm and cold outdoor conditions. Cooling in cold outdoor conditions and heating in warm outdoor conditions were selected for the IVE experiments specifically to study the impact of outdoor temperature variations. Results showed that, under the studied experimental and outdoor temperature conditions, outdoor temperature variations in most cases did not impact the results of IVE experiments, i.e., IVE experiments can replicate for participants a temperature environment comparable to that of the in situ experiments. In addition, the participants' thermal sensation vote was found to be a reliable indicator between IVE and in situ settings in all studied conditions. A few significantly different cases were related to thermal comfort, thermal acceptability, and overall skin temperature.


Introduction
Immersive virtual environments or IVEs are a set of hardware and software applications that allow users to immerse themselves in artificially constructed virtual environments and interact with their contents in real-time. IVEs provide users the impression of being inside and living in a virtual environment [1]. As such, IVEs have attracted significant research attention concerning their potential in replicating real-world experiences. Many IVE-based studies on building and urban design and operation have been reported, including daylight simulation [2][3][4], energy studies [5], sound and acoustic studies [6,7], and space design and wayfinding studies [8]. These studies utilized IVEs in experiments to collect human-based data. Specifically, some studies used IVEs to investigate the thermal Sustainability 2021, 13, 10638 2 of 36 state of participants, i.e., their thermal sensation, thermal comfort, and thermal acceptability (e.g., [9][10][11][12][13]). The human thermal states play a significant role in studying building performance in terms of energy consumption [14]. Depending on the thermal states, the occupants may interact with the building components such as changing the thermostat settings or opening/closing windows, leading to changes in building energy use and impacting overall building performance [15]. Thus, an improved understanding of the human thermal states during the design phase of a building is essential for the building to meet its energy performance expectation during the operational stage. However, collecting thermal state data during design is challenging as it requires an existing environment that could closely replicate the desired design. Constructing such a design may not always be practical as it is time-consuming and expensive. IVEs, on the other hand, might be utilized as a tool to quickly and cost-effectively mimic designs. 
In addition, IVEs provide engineers or researchers the capability to immerse humans in different scenarios and examine their reactions to controlled environmental changes [12,16]. Thus, IVEs may become a new way of studying human-building interactions and thermal states in built environments that do not exist yet [11,17]. However, IVE-based experiments are often limited by a range of factors [18], one of which is how to handle the potential influence of the outdoor temperature variations, such as seasonal temperature variations, on the thermal state of participants.
The impact of outdoor temperature variations on human physiology and comfort has drawn significant research interest in the past. For example, studies have shown seasonal effects on human physiological responses such as sweat rate, rectal temperature, and metabolic rate when participants were exposed to identical experimental scenarios in different seasons (e.g., [19,20]). Given the connection between physiological responses and thermal sensation and comfort [21,22], several studies (e.g., [23][24][25][26][27]) have concluded that occupants' thermal sensation differs between outdoor temperature conditions (i.e., summer and winter). Such findings align with previous studies that show a close link between thermal states and seasons or variations of outdoor temperatures (e.g., [28,29]). Although this line of research is still ongoing and different observations have been reported (e.g., [30]), all studies pointed out the effect of outdoor temperature variations on human physiology. The results differ mainly in the sensitivity of physiological measures to outdoor temperature variations and in the extent to which outdoor temperature variations influence the thermal states.
In our review of existing literature, we found no study discussing the effect of outdoor temperature variations on the results of thermal-related IVE experiments, even though IVE experiments may happen at any time of the year. Thus, we conducted an initial investigation into the effect of two contrasting outdoor temperature conditions (i.e., cold and warm outdoor temperature conditions) on the thermal state of participants in IVE experiments and compared the results with the baseline in situ experiments. Three hypotheses are proposed across four outdoor temperature comparisons in which the participants' physiological and thermal state responses were used as indicators to quantify their thermal experience. The hypotheses and the outdoor temperature comparisons are elaborated in detail in Section 3.

Thermal State and Virtual Experience in Immersive Virtual Environment
In general, the thermal sensation of participants in experiments using immersive virtual environments (IVEs) can be generated in two possible ways, i.e., endogenous and exogenous stimuli. Endogenous stimuli are visual stimuli (e.g., [31,32]) that are usually part of a virtual scene. Studies have shown that physiological responses such as heart rate, skin conductance, respiratory rate, blood pressure [33], and skin temperature [34,35] changed with purposefully designed visual stimuli. On the other hand, the findings from the previous studies (e.g., [36][37][38]) have established a pathway from physiological responses to thermal sensation. Therefore, in theory, visual stimuli in IVEs may affect the thermal sensation of participants. The application of exogenous stimuli relies on the use of external devices to create perceptible thermal stimuli, such as the application of thermal haptic devices [39], Peltier devices [40], infrared lamps [41,42], and controlled environments [10,11,13]. Specifically, controlled environments have been widely used in in situ experiments on the thermal state (e.g., [43]) and occupant behavior studies (e.g., [44]).
Participants' thermal state is often measured using Likert scales. For example, ASHRAE's (American Society of Heating, Refrigerating, and Air Conditioning Engineers) seven-point Likert scale is used frequently for measuring thermal sensations [45,46]. In addition, thermal comfort scales (e.g., [30,47]) and thermal acceptability scales (e.g., [37,43]) are used to measure thermal comfort and acceptability. Besides using perceptive votes, researchers also use physiological data to infer thermal sensation and comfort. Skin temperature is a direct measure of human skin receptor responses to cold and warm environmental temperatures [30,36] and has been used widely to measure human thermal states (e.g., [21,48,49,50,51,52,53]). Other studies explored heart rate and heart rate variability to measure thermal comfort [54]. However, some studies conditioned the association between heart rate variability and thermal comfort on additional factors such as physical activity level [37,55] and sympathetic nervous system activity [38]. Thus, while physiological measures are continuously explored as indicators of thermal state conditions, participants' thermal state vote responses and skin temperature are standard measures included in experiments using immersive virtual environments (e.g., [11][12][13]).
Furthermore, studies using IVEs also measure participants' virtual experience. A standard measurement of virtual experience is presence, which is characterized as a "psychological state of 'being there' mediated by an environment that activates our senses, captures our attention, and promotes our active participation" [56,57]. In other words, presence in IVE represents how often the participant feels they are really in the environment that the head-mounted display (HMD) device portrays. Presence is also a metric for measuring the ecological validity of the virtual environment [58]. To measure presence, different instruments are used, such as the Independent Television Commission Sense of Presence Inventory (ITC-SOPI) [59] and the Igroup Presence Questionnaire (IPQ) [60]. In the case of IPQ, four sub-measures are included in this thirteen-question survey, and they are general presence (one question), spatial presence (physical experience of being present in the IVE) (four questions), involvement (user's level of immersion in the IVE) (four questions), and experienced realism (the extent of resemblance to the physical world) (four questions). A Likert scale of five points that ranges from 1 (strongly disagree) to 5 (strongly agree) is used to measure the questions and is administered after each IVE experimental session.
Another standard measurement of virtual experience is motion sickness or cybersickness [56]. Interaction with virtual environments might cause symptoms that are comparable to those seen in motion sickness [61]. The potential causes include hardware limitations of HMDs, such as field of view, display resolution, refresh rate, and input-output latency. These undesirable effects can damage the participant's interaction with IVEs. The Simulator Sickness Questionnaire (SSQ) is a standard tool to measure motion sickness [62]. This sixteen-question survey includes three sub-measures, i.e., nausea, oculomotor, and disorientation, each scored from seven questions (some questions contribute to more than one sub-measure). The survey also yields a total cybersickness score. All sixteen questions are measured on a four-point Likert scale ranging from 0 (none) to 3 (severe) and are administered along with the presence questionnaire after each IVE experimental session. Researchers often compare their presence and motion sickness scores with those of existing studies to ascertain the adequacy of their scores (e.g., [63][64][65]).

Experiments Using Climate Chambers
A climate chamber is a controlled test environment where biological objects, materials, and other components are tested under specific environmental conditions. As such, climate chambers have been used in studying human thermal sensation (e.g., [37,38]) and the impact of outdoor temperatures on human thermal sensation (e.g., [20,30]). According to Fanger [66], thermal sensation is generally influenced by two main factors: environmental conditions and the metabolic rate. Thus, previous studies can be divided into two groups based on the metabolic rate of participants, i.e., those with sedentary activities (e.g., [30]) and those with activity level or metabolic rate changes (e.g., [37]). Since this study focuses on the potential of immersive virtual environments combined with a climate chamber, called mixed immersive virtual environments (MIVEs), the participants performed a sedentary activity, and only the environmental temperature condition changed.
Studies using climate chambers based on sedentary activities are mainly used to determine thermal comfort or thermal sensation in different thermal conditions [48,49,50,52], investigate the potential of specific measures to model thermal comfort (e.g., [38,55]), and study the effect of outdoor temperature variations or seasonal impacts (e.g., [20,30]). Some studies varied air temperature in climate chambers during an experimental session, simulating temperatures stepping up and down (e.g., [38,48,49,50,52]); others kept the temperature constant during an experimental session (e.g., [20]). Thermal sensation, acceptability, comfort, heart rate, and skin temperature are standard measures for which data are collected in climate chamber experiments (e.g., [30,51,55]). Typically, physiological data such as heart rate and skin temperature are collected continuously throughout an experimental session. In contrast, thermal sensation, thermal comfort, or thermal acceptability are collected at an interval such as every few minutes (e.g., [48]) or at a specific time point such as at the end of a session (e.g., [52]). In addition, studies using a climate chamber for IVE experiments (e.g., [12,13]) applied similar considerations as previous studies when designing experiments.
In summary, the literature presented in the background suggests that there is a pathway from physiological responses, such as heart rate and skin temperature, to thermal states. Mainly, in climate chamber studies [30,51,55], the subjective measures of thermal states such as thermal sensation, comfort, and acceptability are collected using Likert scales, along with the physiological data, to explain the participant's thermal experience. Furthermore, to quantify the participant's virtual experience in IVEs, standard metrics such as presence and cybersickness are also collected [63][64][65]. As such, subjective measures (i.e., thermal sensation, comfort, acceptability) and skin temperatures are collected in this study to evaluate thermal states. At the same time, presence and cybersickness data are also collected to measure participants' virtual experience. Section 4 delves into the specifics of how these metrics are gathered and calculated.

Research Objectives and Hypotheses
The objective of this study is to investigate the impact of outdoor temperature variations on the thermal states of participants in IVE experiments. To achieve the objective, four comparisons are selected, where the in situ setting is considered a baseline. The selection is based on the assumption that cooling typically happens in warm outdoor conditions, such as during summer, and heating happens more in cold outdoor conditions, such as during winter. These two are called matching conditions. Therefore, comparisons C1 and C2 address two matching conditions, under which results from IVE experiments are compared with those from the baseline, i.e., in situ experiments. On the other hand, IVE experiments can be done in mismatching conditions, such as cooling in cold outdoor conditions and heating in warm outdoor conditions. The mismatch conditions represent the possibility, and the flexibility, of IVE experiments to be conducted in conditions that the outdoor temperature may influence. Comparisons C3 and C4 address two such mismatch conditions, under which IVE experiments are conducted, such as cooling in cold outdoor conditions (C3) and heating in warm outdoor conditions (C4). Comparison C3 happens when an IVE experiment on cooling is conducted in cold outdoor conditions (e.g., winter). For example, an experiment is conducted to determine an ideal indoor cooling temperature set-point for a west-orientation office during summer. Instead of doing the experiment in summer, an IVE experiment may be conducted on a cold winter day. Comparison C4 is about conducting an IVE experiment on heating on a hot summer day. For example, an experiment may be about determining the thermal state of participants using a new personalized heating system when the outdoor temperature is below freezing. Instead of doing such an experiment in winter, an IVE experiment may be conducted on a hot summer day. In the two comparisons C3 and C4, the baseline is matching in situ experiments. 
Therefore, the comparisons will provide insight into the impact of outdoor temperature variations on IVE experiments. Although results from C1 and C2 cannot directly reflect the effects of outdoor temperature variations, as they only explore the effectiveness of IVE experiments for heating and cooling sequences in the same outdoor temperature condition, they provide support and background information to understand and interpret results from C3 and C4, which study the efficacy of IVE experiments when outdoor temperature variations exist.
Results of IVE experiments are compared with those of in situ experiments using three metrics, the perceived temperature, the thermal state votes, and the overall skin temperature of participants. The first metric assesses the measured indoor temperature (called the control temperature distribution in this study) around a participant when a thermal state vote is recorded. The indoor temperature around participants is measured by the same temperature sensor used in this study [13]. The second metric measures the participant's thermal state votes at a temperature range, otherwise known as the thermal state vote distribution in this study. Finally, since the human body controls skin temperature to balance heat transfer to the surrounding environment [50,51,53,67,68,69], the third metric is the overall skin temperature of participants. Consequently, we proposed the following hypotheses: where $PR^{w}_{temp,condition}$ denotes the mean of the overall skin temperature on the body at a specific indoor temperature range (i.e., cool, neutral, and warm) in the in situ experiments of a certain comparison condition, and $PR^{c}_{temp,condition}$ denotes the mean of the overall skin temperature on the body at a specific indoor temperature range in the IVE experiments of a certain comparison condition.
The cool, neutral, and warm indoor temperature ranges were selected based on the ASHRAE psychrometric chart [70], where the cool temperature range falls below 22 °C, the neutral temperature range falls between 22 °C and 27 °C, and the warm temperature range falls above 27 °C in both cold and warm outdoor conditions. Consequently, the neutral temperature range represents the comfort zone in both outdoor conditions, whereas the cool and warm temperature ranges represent the thermal conditions outside the comfort zone.
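The binning described above can be illustrated with a minimal sketch. The function name `temperature_range` is hypothetical, and treating the boundaries of 22 °C and 27 °C as part of the neutral range is an assumption, since the text does not state how boundary values are handled.

```python
def temperature_range(temp_c: float) -> str:
    """Bin an indoor air temperature (in deg C) into cool / neutral / warm.

    Thresholds follow the ranges stated in the text. Including exactly
    22 and 27 deg C in the neutral range is an assumption of this sketch.
    """
    if temp_c < 22.0:
        return "cool"
    if temp_c <= 27.0:
        return "neutral"
    return "warm"


# The three step temperatures used in the chamber fall into the three ranges.
for step_c in (18.3, 23.8, 29.4):
    print(step_c, temperature_range(step_c))
```

With this binning, each thermal state vote can be labeled by the range of the control temperature measured at the moment the vote was recorded.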

Recruitment
The Institutional Review Board (IRB) of the university approved this study prior to recruitment. Participants in this research included undergraduate students, graduate students, and staff members. Participants were recruited using flyers and word of mouth. Incentives such as payments ($10 per hour) were provided for participation in this study.

Climate Chamber
A climate chamber (shown in Figure 1) situated on the university's campus was used for IVE and in situ experiments. The chamber is a refurbished office space that includes a testing space, control and resting space, and a mechanical room that is also used as storage. The walls and ceiling of the testing space are constructed with a wood joist framework, and plywood decking is used on top. The insides of the walls are lined with moisture-resistant gypsum boards. The capacity and performance of the chamber were tested before conducting the experiments. The chamber can simulate a wide variety of climatic conditions, including temperatures as low as 60 °F/15 °C and as high as 90 °F/32 °C, with relative humidity levels ranging from 40% to 90% RH. A thermostat and a humidity sensor linked to the chamber's HVAC system control the room temperature and humidity, which can be set and adjusted using the chamber control software provided by Metasys®. In addition, the chamber's HVAC system also maintains and monitors the CO2 level.

Indoor Temperature
The indoor temperature surrounding the participants was measured using Vernier surface temperature sensors (description in Table 1). Three of these sensors were mounted at ASHRAE's suggested heights of 4 in (0.1 m), 24 in (0.6 m), and 43 in (1.1 m) from the floor [45]. Among them, the sensor placed at the height of 24 in (0.6 m) was chosen to represent the air temperature closest to the participant (i.e., the control temperature) (Figure 2) [71]. Temperature data were sent at one-second intervals and recorded using Logger Pro 13. Furthermore, outdoor air temperature data were recorded just before the start of each experimental session. The source of such data is the Integrated Surface Hourly Database at the NOAA's National Climatic Data Center (NCDC) [72].


Overall Skin Temperature
The same surface temperature sensors (shown in Table 1) were used to collect the overall skin temperature, and the data were captured using the same Logger Pro 13 application at one-second intervals. The sensors were placed at eight body locations of participants for sampling their skin temperatures, i.e., forehead, neck, chest, upper back, posterior forearm, hand, anterior calf, and foot. The weighting factors of the mean skin temperature equations and the sensitivity of the body parts [50,52,73,74,75,76] were used to choose these local body regions.
The overall skin temperature (OST) was then calculated as a weighted sum of six local skin temperatures:

$$OST = w_{ch} T_{ch} + w_{u} T_{u} + w_{fo} T_{fo} + w_{h} T_{h} + w_{ca} T_{ca} + w_{ft} T_{ft} \qquad (1)$$

where $T_{ch}$ is the chest skin temperature, $T_{u}$ is the upper back skin temperature, $T_{fo}$ is the forearm skin temperature, $T_{h}$ is the hand skin temperature, $T_{ca}$ is the calf skin temperature, $T_{ft}$ is the foot skin temperature, and $w_{ch}, \ldots, w_{ft}$ are the corresponding adjusted weighting factors. This formula was derived based on the 17-point mean skin temperature formula [77]. The following steps explain the derivation method:

1. Out of 17 skin locations in the reference formula, the weighting factors of six locations, i.e., chest, upper back, forearm, hand, calf, and foot, were selected.
2. The results from a previous study [71] showed that the temperature on the forehead was significantly different between IVE and in situ experiments because of the use of the HMD device. Thus, to avoid the HMD's impact on the overall skin temperature results and to ensure comparability between IVE and in situ experiments, the final derived equation did not include the forehead skin temperature.
3. The final derived equation did not include the neck skin temperature because the reference formula did not consider the neck.
4. The adjusted weighting factors shown in Equation (1) were then obtained by dividing each of the original six weighting factors by the total sum of those same six weighting factors.
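The renormalization in step 4 and the weighted sum can be sketched as follows. The reference weights in `REFERENCE_WEIGHTS` are placeholder values chosen only for illustration; they are not the weighting factors of the 17-point formula [77], and the function names are hypothetical.

```python
# Placeholder reference weights for the six retained sites.
# These numbers are illustrative assumptions, NOT the values from [77].
REFERENCE_WEIGHTS = {
    "chest": 0.10,
    "upper_back": 0.10,
    "forearm": 0.05,
    "hand": 0.05,
    "calf": 0.10,
    "foot": 0.10,
}


def adjust_weights(reference: dict) -> dict:
    """Divide each retained weight by the sum of the retained weights (step 4)."""
    total = sum(reference.values())
    return {site: w / total for site, w in reference.items()}


def overall_skin_temperature(local_temps_c: dict, weights: dict) -> float:
    """Weighted sum of local skin temperatures; forehead and neck are not used."""
    return sum(weights[site] * local_temps_c[site] for site in weights)


weights = adjust_weights(REFERENCE_WEIGHTS)
sample = {"forehead": 34.5, "neck": 34.0, "chest": 33.8, "upper_back": 33.6,
          "forearm": 32.1, "hand": 30.9, "calf": 31.5, "foot": 29.8}
print(round(overall_skin_temperature(sample, weights), 2))
```

Because the adjusted weights sum to one, the OST stays on the same temperature scale as the local readings: if all six sites read the same value, the OST equals that value.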

Surveys
In this study, several types of questionnaire data were collected using the Qualtrics online software:

1. Demographics: Age, gender, education level, and employment status were collected only on the first study visit.
2. General information: Information such as food and beverage intake, cigarette smoking within the past hour before the experiment, alcohol intake, and intense physical activity within the past 12 h before the experiment was collected before each experimental trial. This information was used for the pre-screening of the participants.
3. Thermal states: The participant's thermal sensation was measured using the ASHRAE Standard 55 Thermal Comfort descriptive seven-point Likert scale [45,46]. The general thermal comfort and acceptability were measured using six-point Likert scales. Table 2 shows the thermal state scales. The general thermal acceptability scale was then grouped into two categories by combining the negative scales into the "Not Acceptable" group and the positive scales into the "Acceptable" group.
4. Igroup Presence Questionnaire (IPQ): The participant's virtual experience, particularly presence in IVE, was collected using the IPQ questionnaire [60]. As mentioned earlier, four sub-measures are included in this 13-question survey, and they are general presence (one question), spatial presence (four questions), involvement (four questions), and experienced realism (four questions). A five-point scale (strongly disagree (1) to strongly agree (5)) is used to assess all four sub-measures. Each sub-measure score is calculated by summing the answers to the individual sub-measure questions and then translating the total into a percent. As a result, the score varies from 20 to 100 for each sub-measure. A high score suggests that the individual has a higher presence in IVE, and a low score indicates a low presence in IVE.
5. Simulator Sickness Questionnaire (SSQ): The participant's motion sickness or cybersickness while using IVE was collected using the SSQ questionnaire [62]. As mentioned earlier, three sub-measures are included in this 16-question survey, and they are nausea (seven questions), oculomotor (seven questions), and disorientation (seven questions). The three sub-measures are measured using a four-point scale (none (0) to severe (3)).
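The IPQ sub-measure scoring described in item 4 can be sketched in a few lines. This is a minimal illustration that assumes the percent is taken relative to the maximum possible total of the sub-measure, which is consistent with the stated 20 to 100 range; the function name `submeasure_percent` is hypothetical.

```python
def submeasure_percent(answers, scale_max=5):
    """Sum the Likert answers of one sub-measure and express the total as a
    percent of the maximum possible total (assumed interpretation)."""
    if not answers:
        raise ValueError("at least one answer is required")
    if any(a < 1 or a > scale_max for a in answers):
        raise ValueError("answers must lie between 1 and scale_max")
    return 100.0 * sum(answers) / (scale_max * len(answers))


# Example: a four-question sub-measure such as spatial presence.
print(submeasure_percent([4, 5, 3, 4]))  # 16 out of 20 points
```

Under this interpretation, answering 1 to every question gives 20 and answering 5 to every question gives 100, regardless of how many questions the sub-measure contains.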

Immersive Virtual Environment
An HTC Vive head-mounted display device (HMD) was used to deliver immersive virtual environments. Autodesk 3ds Max was used to create the chamber's 3D model. As illustrated in Figure 3, the model was then imported into Unreal Engine 4 with the material textures and lightmaps. In an IVE, users view the virtual world of the chamber interior using a head-mounted display (HMD), while the climate chamber provides space heating and cooling. This study did not include any interactions with the virtual environment.


Pre-Experiment Session
After signing their consent forms, the participants were instructed to wear previously specified clothing (clo of 0.5-0.6), which included light slacks and a light long-sleeved shirt or a T-shirt, to the lab in all sessions. After arriving at the chamber, the participants sat in the control/resting space (temperature set at 75 °F/23.8 °C and humidity set at 55%), where a pre-experiment general survey was administered to screen them for alcohol or cigarette intake, as well as for any strenuous physical activity within the 12 h before each experimental session. Participants who reported any of the above were excluded from the experiments. Their demographics and other personal information were also recorded. This pre-experiment session, including resting time, took about 10 min. The session was also intended to help participants acclimatize to the chamber's indoor temperature to mitigate any influence from their prior thermal state. Afterward, participants were instructed to enter the testing space, sit in a chair, and have the skin temperature sensors affixed to their bodies.

Experimental Session
This study used the same experimental approach as the previous one [71]. Data were collected in two outdoor temperature conditions, one cold (December-March) and the other warm (May-September). The experimental design procedure is shown in Figure 4. Each participant took part in two experimental settings (i.e., IVE and in situ) conducted in cold or warm conditions. Each experimental setting included a heating or a cooling sequence. The IVE and in situ sessions were carried out consecutively on the same day. In contrast, the sessions of the same participant using the heating and the cooling sequences were at least two weeks apart. The heating sequence involved three temperature steps, which were 65 °F/18.3 °C, 75 °F/23.8 °C, and 85 °F/29.4 °C. The cooling sequence involved the same three temperature steps in reverse, which were 85 °F/29.4 °C, 75 °F/23.8 °C, and 65 °F/18.3 °C. For instance, in a heating sequence, the temperature is first set at 65 °F/18.3 °C, and then the indoor temperature is monitored until it stabilizes at that set-point. After the temperature becomes stable, the participants' thermal states are recorded, and the temperature is changed from 65 °F/18.3 °C to 75 °F/23.8 °C. Later, when the temperature becomes stable at 75 °F/23.8 °C, the thermal states are again recorded, the temperature is adjusted to 85 °F/29.4 °C, and the same data collection procedure is followed. The cooling sequence follows the same procedure in reverse order: the temperature is initially set at 85 °F/29.4 °C, then decreased to 75 °F/23.8 °C, and then to 65 °F/18.3 °C, with thermal state data collected at each of those step temperatures. Overall, four experimental sessions were conducted in both cold and warm conditions.
In order to counterbalance and reduce the order effect, participants were allocated randomly to each of the experimental sessions. The heating sequence was followed by cooling for 50 percent of participants, while the cooling sequence was followed by heating for the other 50 percent. In a similar fashion, 50 percent of the participants did the in situ experiment first, followed by the IVE experiment.
In addition, between every IVE and in situ experimental session, there was a 10-minute break. The sensors were detached from the participants' bodies so that they could move outside to the resting space and get acclimatized to the comfort temperature (75 °F/23.8 °C). This was done to mitigate the influence of their thermal states on the next experimental session. The skin temperatures were measured continuously at one-second intervals from the beginning through the completion of each experimental session. The participants' thermal state responses were recorded after the indoor temperature stabilized around each step temperature. An identical data collection approach was used for both IVE and in situ experiments. During the IVE experimental session, the participants used the HTC device to view the virtual environment. Before each IVE trial, there was a familiarization step where the participants explored the virtual scene for a few minutes to calm their excitement and anxiety. The participants' tasks were restricted to sedentary levels of physical activity, such as being seated at rest. The surface skin temperature sensors were taped directly on the skin. The indoor relative humidity was kept at 55% during the tests, while CO2 levels were kept below 1000 parts per million.

Post-Experiment Session
After completing an IVE experimental session, the participants completed the post-IVE experiment surveys inside the chamber. These surveys consisted of the IPQ and SSQ questionnaires to measure the participants' sense of presence and cybersickness in IVEs, respectively. No post-experiment surveys were conducted after in situ experiments.

Data Preparation and Cleaning
After data collection, the averages of the indoor control temperature (i.e., data from the sensor at 24 in/0.6 m) and the skin temperature were computed over the interval from the exact time when the indoor temperature had reached the desired level (i.e., steadied at the target temperature) to the completion of the thermal state questionnaires.
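The averaging window described above can be sketched as follows. This is only an illustrative sketch, not the authors' processing script: the function name `window_mean`, the `(timestamp, value)` data layout, and the sample readings are all assumptions made for the example.

```python
from statistics import mean


def window_mean(samples, start, end):
    """Mean of readings whose timestamp lies in [start, end].

    samples: iterable of (timestamp_seconds, value) pairs (assumed layout).
    start:   time at which the indoor temperature steadied at the target.
    end:     time at which the thermal state questionnaire was completed.
    """
    values = [value for t, value in samples if start <= t <= end]
    if not values:
        raise ValueError("no samples inside the averaging window")
    return mean(values)


# Hypothetical one-second skin temperature readings in °C (not study data).
readings = [(0, 33.0), (1, 33.2), (2, 33.4), (3, 33.6), (4, 34.0)]
avg = window_mean(readings, start=1, end=3)
```

The same helper would apply to the control temperature series, since both are averaged over the same window.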
All data were synchronized using the universal time stamp. For analyzing the control temperature at each level of a thermal state scale, the data were grouped according to the number of levels of the specific thermal state scale (e.g., seven groups for sensation, two groups for acceptability, and six groups for comfort). For analyzing thermal state vote responses and overall skin temperatures, the data were grouped into three categories, i.e., cool, neutral, and warm, based on the comfort zone and outside comfort zone indoor temperature ranges shown in the ASHRAE psychrometric chart. According to this chart, for both cold and warm outdoor conditions, the indoor comfort zone falls between 22 °C and 27 °C, and the outside comfort zone falls below 22 °C and above 27 °C. As a result, in this study, the cool category (outside comfort zone) includes all the observations below 22 °C, and the warm category (outside comfort zone) includes all the observations above 27 °C. The neutral category (comfort zone) includes all the observations between 22 °C and 27 °C.
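The three-way grouping can be expressed as a small function. This is a minimal sketch; the function name `indoor_range` is hypothetical, and treating exactly 22 °C and 27 °C as neutral is an assumption, since the text does not state how boundary values were handled.

```python
COMFORT_LOW_C = 22.0   # lower edge of the comfort zone used in the text
COMFORT_HIGH_C = 27.0  # upper edge of the comfort zone used in the text


def indoor_range(control_temp_c):
    """Classify a mean control temperature (°C) as cool, neutral, or warm.

    Assumption: the boundary values 22 °C and 27 °C count as neutral.
    """
    if control_temp_c < COMFORT_LOW_C:
        return "cool"
    if control_temp_c > COMFORT_HIGH_C:
        return "warm"
    return "neutral"


# The three nominal step temperatures of the heating sequence.
labels = [indoor_range(t) for t in (18.3, 23.8, 29.4)]
```

Under this rule the three nominal step temperatures fall into the cool, neutral, and warm categories, respectively.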
Since C3 and C4 compare cold and warm outdoor temperature conditions, data cleaning was necessary to ensure contrasting outdoor temperature conditions. The local outdoor air temperature typically varies from 6.1 °C/43 °F to 32.7 °C/91 °F annually, and the hottest periods generally run from mid-June to mid-September, while December through mid-February are, on average, the coldest months in any year [78]. Therefore, the data collected from December to March were considered cold conditions, and data collected from May to September were considered warm conditions. Due to the limitations of the experiment location's outdoor temperature conditions, such as long, hot summers and brief, moderate winters [72], the outdoor temperatures were identical during some of the experiments conducted in the December-March period and May-September period. Therefore, it was necessary to clean the data to ensure that each participant's cold and warm condition data were collected in two contrasting outdoor temperature conditions. As such, this study assumed an upper-bound outdoor temperature of 70 °F/21.1 °C for the cold condition and a lower-bound outdoor temperature of 80 °F/26.6 °C for the warm condition. These thresholds were chosen based on references to the local average temperature in March and May. Therefore, if the corresponding outdoor temperature during experiment sessions in cold conditions was greater than 70 °F/21.1 °C, those data were removed. Likewise, if the corresponding outdoor temperature during experiment sessions in warm conditions was less than 80 °F/26.6 °C, those data were removed. It is to be noted that the data cleaning was not performed for comparisons C1 and C2 because, in these comparisons, the in situ and IVE data were compared within the respective seasonal period. After cleaning, 42 observations (out of 270) were removed from comparison C3, and 69 observations (out of 305) were removed from comparison C4.
Appendix A Table A1 demonstrates that, after cleaning, the mean outdoor temperature between the warm and cold conditions differed by at least 27 °F/15 °C.
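The cleaning rule for comparisons C3 and C4 can be sketched as a simple filter. This is an illustration only: the record layout, the function name `keep_observation`, and the sample values are hypothetical, and only the two thresholds come from the text.

```python
COLD_MAX_F = 70.0  # upper bound for the cold condition (from the text)
WARM_MIN_F = 80.0  # lower bound for the warm condition (from the text)


def keep_observation(condition, outdoor_temp_f):
    """Return True if an observation satisfies the contrast thresholds."""
    if condition == "cold":
        return outdoor_temp_f <= COLD_MAX_F
    if condition == "warm":
        return outdoor_temp_f >= WARM_MIN_F
    raise ValueError(f"unknown outdoor condition: {condition!r}")


# Hypothetical observations (not study data).
observations = [
    {"condition": "cold", "outdoor_f": 55.0},
    {"condition": "cold", "outdoor_f": 74.0},  # too warm for the cold condition
    {"condition": "warm", "outdoor_f": 88.0},
    {"condition": "warm", "outdoor_f": 76.0},  # too cool for the warm condition
]
cleaned = [
    obs for obs in observations
    if keep_observation(obs["condition"], obs["outdoor_f"])
]
```

In this sketch an observation exactly at a threshold is kept, which matches the wording "greater than" and "less than" in the removal rule.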
Furthermore, after cleaning the data in comparisons C3 and C4, the mean control temperature at the three indoor temperature ranges (i.e., cool, neutral, and warm) was compared in all four comparisons to ensure that the mean control temperature in the IVE experiments does not differ significantly from the in situ experiments (e.g., the cooling sequence in the warm outdoor condition/cool indoor temperature range in in situ vs. the cooling sequence in the warm outdoor condition/cool indoor temperature range in IVE). A two-tailed pairwise T-test was used for comparison C1, and an independent sample T-test was used for comparisons C2, C3, and C4. The tests revealed that all the p-values were non-significant (p > 0.05) in comparisons C1 and C2, indicating that under the same outdoor temperature conditions, the control temperature of the IVE experiments was comparable with in situ experiments. However, there were two cases, C3 at the cool indoor temperature range and C4 at the neutral indoor temperature range, where the p-values were less than 0.05. The tests suggest that the control temperature of the IVE experiments differed significantly from that of the in situ experiments in those two cases.

Statistical Analysis and Sample Size
To analyze the mean control temperature at each level of a thermal state scale (i.e., first hypothesis), two-tailed pairwise T-tests were used for comparison C1 because of paired data, and independent sample T-tests were used for comparisons C2, C3, and C4 because of unpaired data. Similarly, to analyze the overall skin temperature at the three indoor temperature ranges (i.e., third hypothesis), two-tailed pairwise T-tests were used for comparison C1 and independent sample T-tests for comparisons C2, C3, and C4. To analyze the thermal state vote responses at the three indoor temperature ranges (i.e., second hypothesis), two-tailed Wilcoxon Signed Rank tests were used for comparison C1 and Wilcoxon Rank Sum tests for comparisons C2, C3, and C4. The statistical tests were performed using the statistical software package RStudio [79] with a significance level of α = 0.05.
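The test assignment described above amounts to a lookup on two properties: whether the comparison is paired (only C1) and whether the outcome is continuous or an ordinal vote. The sketch below only encodes that mapping; the function name `select_test` and the outcome labels are hypothetical, and the study itself ran the tests in RStudio.

```python
CONTINUOUS_OUTCOMES = {"control_temperature", "skin_temperature"}
ORDINAL_OUTCOMES = {"vote"}
COMPARISONS = {"C1", "C2", "C3", "C4"}


def select_test(comparison, outcome):
    """Name the two-tailed test the text assigns to a comparison/outcome pair."""
    if comparison not in COMPARISONS:
        raise ValueError(f"unknown comparison: {comparison!r}")
    paired = comparison == "C1"  # only C1 has paired data
    if outcome in CONTINUOUS_OUTCOMES:
        return "paired t-test" if paired else "independent-samples t-test"
    if outcome in ORDINAL_OUTCOMES:
        return "Wilcoxon signed-rank test" if paired else "Wilcoxon rank-sum test"
    raise ValueError(f"unknown outcome: {outcome!r}")


plan = {(c, o): select_test(c, o)
        for c in sorted(COMPARISONS)
        for o in ("control_temperature", "vote", "skin_temperature")}
```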
Seventy-two people were recruited for this study, including thirty-seven male and thirty-five female participants. The sample size supports a commonly acceptable statistical power of 0.8 [80], with the following differences in the sample means between the IVE and in situ experiments: 1 scale point for average thermal state vote, 1 °C for overall skin temperature, and 1.7 °C for control temperature. Out of seventy-two people, 50% (n = 36) were White, 25% (n = 18) were Asian, and 25% (n = 18) were from other ethnicities. The mean and standard deviation of the participants' ages were 24.01 and 5.83, respectively. When classified according to BMI, 65.2% of participants were normal (n = 47), 16.6% (n = 12) were obese, 22.2% (n = 16) were overweight, and 2.7% (n = 2) were underweight.

Control Temperature at Each Level of a Thermal State Scale
Table A2 compares the findings of the control temperature between IVE and in situ experiments at each level of a thermal state scale (such as thermal sensation, acceptability, and comfort scales) for all four comparisons. The control temperature differences were analyzed using two-tailed independent sample T-tests.
The p-values of all the T-tests at each level of thermal sensation and thermal acceptability scales were non-significant (p > 0.05), indicating the failure to reject the null hypothesis. The results suggest that the control temperature at each level of thermal sensation and acceptability scales did not differ significantly between IVE and in situ experiments in all four comparisons (i.e., C1, C2, C3, and C4), whereas thermal comfort had only one case, in comparison C3 at the −2 (uncomfortable) level, where the control temperature differed significantly between IVE and in situ experiments. The p-values of T-tests in other thermal comfort cases were non-significant (p > 0.05). Overall, the results show that outdoor temperature differences did not impact the control temperature at each level of thermal state scales, especially in comparisons C3 and C4.
Although the p-values were not significant in most cases, the results cannot be validly interpreted with a sufficient statistical power due to the small sample sizes in some cases, such as in extreme thermal sensation votes (i.e., −3 and +3), extreme acceptability, and comfort votes (i.e., −3), where the sample sizes were less than nine in all four comparisons (Figures 5-7). Thus, a further analysis was made by considering the distributions of the control temperature over the thermal state scales, i.e., it was hypothesized that the distributions of the control temperature over the thermal state scales do not significantly differ between IVE and in situ experiments across all four comparisons. The Kolmogorov-Smirnov (K-S) test, at a significance level of 0.05, was used to test this hypothesis. The K-S test compares the cumulative distribution functions of two samples in a non-parametric way. The null hypothesis of the K-S test is that the cumulative distribution functions of the two samples are similar. The alternative hypothesis is that the cumulative distribution functions of the two samples are not similar. A total of 12 tests (i.e., three thermal state scales, four comparisons) were performed where the control temperature distribution was compared over thermal sensation and thermal comfort scales. The thermal acceptability scale was not considered for this analysis because it only had two levels with sufficient sample sizes. The K-S test results are provided in Table A3, which shows that all p-values are non-significant (p > 0.05), implying that the distributions of the control temperature over the thermal sensation and thermal comfort scales in all comparisons (i.e., C1, C2, C3, and C4) between IVE and in situ experiments are not significantly different. Therefore, the results of the K-S test provide additional support to the results of the T-tests.
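To make the K-S comparison concrete, the sketch below computes the two-sample K-S statistic, i.e., the largest vertical gap between the two empirical cumulative distribution functions. It is a minimal standard-library illustration with hypothetical values; it does not compute a p-value, and it is not the routine used in the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking the
    # gap at every observed value is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical control temperatures in °C (not study data).
in_situ = [22.1, 23.0, 24.2, 25.5]
ive = [22.4, 23.1, 24.0, 25.9]
d_value = ks_statistic(in_situ, ive)
```

A small D indicates that the two empirical distributions track each other closely; the significance decision additionally depends on the sample sizes.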
Furthermore, by visualizing the control temperature distribution as boxplots, the control temperature distribution patterns of IVE and in situ experiments along with their mean values were similar across all the thermal sensation vote levels in all comparisons (i.e., C1, C2, C3, and C4) as shown in Figure 5. As for thermal acceptability and comfort scales, the mean values of the control temperature at each level are shown in Figures 6 and 7. In comparison C3, in the thermal comfort scale at level −2 (uncomfortable) (Figure 7), the mean control temperature difference between IVE and in situ was 4.8 °C, resulting in a statistically significant T-test result.
In summary, the participants chose their thermal states (i.e., sensation, acceptability, and comfort) in both IVE and in situ experiments based on their surrounding indoor environment. The outdoor temperature differences did not seem to influence their choice of votes.

Thermal State Votes
Table A4 shows the results of the thermal state votes in all four comparisons between IVE and in situ experiments. The votes were classified based on the three indoor temperature ranges (cool, neutral, and warm). For comparison C1, the Wilcoxon Signed Rank test was used because of paired data. For comparisons C2, C3, and C4, the Wilcoxon Rank Sum test was used to analyze the vote differences because of unpaired data.
The results show that, for thermal sensation votes, all p-values in the three indoor temperature ranges in all comparisons (i.e., C1, C2, C3, and C4) were non-significant (p > 0.05). The results suggest that in all four comparisons, the participants' votes for thermal sensation in IVE experiments were comparable to those in in situ experiments, and the differences in outdoor temperature (i.e., C3 and C4) did not impact the thermal sensation votes. For thermal acceptability and comfort votes, the p-values in the three indoor temperature ranges for all comparisons were not significant (p > 0.05), except for four cases (Table A4). This is because, in most cases in the four comparisons, the differences in mean sensation, acceptability, and comfort votes between IVE and in situ experiments were less than 0.3 (Figures 8-10), except in comparison C1 under the cool temperature range and comparison C4 under the neutral temperature range, where the mean differences in the thermal comfort votes were nearer to 0.5 (Figure 10). However, in the four significant cases, i.e., in comparisons C1 and C3 under the cool indoor temperature range for thermal acceptability votes, and in comparison C3 under the cool and neutral indoor temperature ranges for thermal comfort votes (Table A4), the mean vote differences were greater than 0.65 (Figures 9 and 10), with 1.14 being the largest mean difference in C3 under the cool indoor temperature range for thermal acceptability votes (Figure 9). Thus, the above results indicate that, apart from those four cases, the thermal acceptability and comfort votes were comparable between IVE and in situ experiments. In other words, in most cases the participants' thermal state votes were not influenced by the different outdoor temperature conditions.

Overall Skin Temperature
The overall mean skin temperatures were compared between IVE and in situ experiments using paired T-tests in comparison C1 and independent sample T-tests in comparisons C2, C3, and C4 at each of the three indoor temperature ranges. The results in Table A5 reveal that all of the p-values in comparisons C1, C2, and C4 were non-significant (p > 0.05), indicating that the null hypothesis was not rejected in those cases. However, in comparison C3, the null hypothesis was rejected (p < 0.05) at all three temperature ranges. Thus, the results suggest that the overall mean skin temperature was comparable between IVE and in situ experiments across all four comparisons except C3. In other words, the variations in outdoor temperature, specifically in C4, did not impact the participants' overall mean skin temperature.
To investigate the reason for the significant differences in C3, a further individual analysis was performed on the skin temperatures of the six locations using independent sample T-tests. Only the tests with significant results are reported in Table A6. The results reveal that in C3, forearm, hand, and foot skin temperatures differed significantly between IVE and in situ experiments across all three indoor temperature ranges. Thus, the individual skin temperature differences may have caused the overall mean skin temperature differences in C3. In comparison C4, upper back, forearm, and hand skin temperatures were significantly different in the neutral temperature range. However, these individual skin temperature differences did not influence the overall mean skin temperature results in C4. It should also be noted that, in comparisons C1 and C2, none of the six individual skin temperatures differed significantly (p > 0.05) between IVE and in situ experiments in any of the three indoor temperature ranges.
Table 7 shows the mean and standard deviation results of the participants' sense of presence in IVE experiments. The results are organized according to the four comparisons. To assess the suitability of their findings, researchers often compare their presence scores with the available literature scores [63,64]. According to the study [64], the average general presence score in the IPQ online datasets was 38.16, with a standard deviation of 17.53. Thus, it can be noted that in all comparisons (i.e., C1, C2, C3, and C4), the mean general presence, as well as the mean spatial presence, involvement, and realism scores, are higher than the published mean scores with a lower standard deviation. The findings show that the participants in this study had a higher perception of presence than the mean presence of reported studies.
Table 7 also shows the mean and standard deviation results of the participants' cybersickness in IVE. Based on the literature, mean total cybersickness scores ranging from 5.30 [81] to 27.25 [65] have been published. In this study, the mean scores ranged from 1.75 to 30.21 for nausea, 11.0 to 25.74 for oculomotor, and 8.7 to 41.47 for disorientation [82]. Thus, it can be noted that for nausea, disorientation, and total cybersickness, the scores fall within the range for all four comparisons. However, the oculomotor score was slightly larger than the reported mean score in comparisons C2 and C4.

Discussions and Limitations
The analysis suggests that, at the same level of participants' thermal sensation, acceptability, and comfort vote, the perceived indoor temperature is comparable between IVE and in situ experiments when they are performed in the same outdoor temperature conditions (i.e., C1 and C2) as well as in variable outdoor temperature conditions (i.e., C3 and C4) (Table A2). Similar results have also been observed when the perceived indoor temperature of IVE and in situ experiments were compared using the overall distribution of participants' thermal sensation, acceptability, and comfort vote across all four comparisons (C1, C2, C3, and C4) (Table A3). In other words, these results show that the outdoor temperature variations, specifically in comparisons C3 and C4, did not impact the participants' perceived indoor temperature at each thermal state level in IVE experiments when compared with the baseline in situ experiments. The experimental procedure can create comparable indoor temperature conditions regardless of the outdoor temperature conditions (e.g., C3 and C4), and such comparability is not affected by the difference in the nature of comparisons, i.e., C1 and C2 versus C3 and C4.
Using ASHRAE's indoor temperature ranges (i.e., cool, neutral, and warm) to organize and compare thermal state votes has produced mixed results (Table A4). On the one hand, the results of thermal sensation votes were comparable between IVE and in situ experiments at all three temperature ranges when those experiments were performed in the same outdoor temperature conditions (i.e., C1 and C2) as well as in variable outdoor temperature conditions (i.e., C3 and C4). This result suggests that the outdoor temperature variations, specifically in comparisons C3 and C4, did not impact the participants' thermal sensation votes in IVE experiments when compared with the baseline in situ experiments. On the other hand, when the IVE and in situ experiments were conducted in the same outdoor temperature conditions, i.e., C1 (cooling in warm outdoor conditions) and C2 (heating in cold outdoor conditions), the thermal acceptability votes were comparable at all three temperature ranges in only C2, and the thermal comfort votes were comparable at all three temperature ranges in both C1 and C2. Furthermore, in C4, the thermal acceptability and comfort votes in the IVE experiments conducted in heating in warm outdoor conditions were comparable to the votes of the in situ experiments conducted in heating in cold outdoor conditions at all three temperature ranges. In other words, the outdoor temperature variations did not impact the participants' thermal acceptability and comfort votes in IVE experiments. However, discrepancies in the thermal comfort and acceptability votes were observed in only a few cases (four out of twenty-four cases) in both C1 (same outdoor temperature condition) and C3 (different outdoor temperature condition). These discrepancies may not be simply attributed to control temperature differences. 
For example, even though the control temperature in C3 (the cool indoor temperature range) and C4 (the neutral indoor temperature range) is significantly different between IVE and in situ experiments, thermal comfort and acceptability votes were different only in C3, not in C4, implying that the control temperature difference is unlikely to be the sole factor influencing thermal state votes.
Analysis of the overall skin temperature reveals that it is comparable between IVE and in situ experiments at all three indoor temperature ranges when they are performed in the same outdoor temperature conditions (i.e., C1 and C2) and, among the variable outdoor temperature conditions, only in C4 (Table A5). These results suggest that the outdoor temperature variations, specifically in comparison C4 (i.e., heating in cold outdoor conditions for in situ experiments vs. heating in warm outdoor conditions for IVE experiments), did not affect the participants' overall skin temperature. However, in the variable outdoor temperature conditions of comparison C3 (i.e., cooling in warm outdoor conditions for in situ experiments vs. cooling in cold outdoor conditions for IVE experiments), the overall skin temperatures were different between IVE and in situ experiments at all three indoor temperature ranges. These results suggest that the outdoor temperature variations in comparison C3 may have impacted participants' overall skin temperature. Further investigation shows significant mean skin temperature differences at three locations (forearm, hand, and foot) at all three temperature ranges in C3. Therefore, these individual differences may have influenced overall mean skin temperature differences in C3. In contrast, in C4, significant differences were only observed in the upper back, forearm, and hand skin temperatures at the neutral indoor temperature range. Nonetheless, these individual differences did not influence the overall mean skin temperature results in C4. These observations suggest that more studies are needed on the relationship between local and overall skin temperatures and on the choice of the overall skin temperature analysis method.
It is also worth noting that, in general, irrespective of IVE and in situ experiments, higher local and overall skin temperatures were observed in the warm outdoor condition (Table A6), suggesting the potential impact of outdoor temperature on the skin temperatures. This finding is supported by previous non-IVE studies conducted in a climate chamber, which reported higher mean skin [83] and finger skin temperatures [84] in summer than in winter. Thus, it seems that the overall skin temperature is more sensitive to outdoor temperature than thermal state votes.
The presence and cybersickness scores were comparable to other reported studies, except for the oculomotor score (Table 7). The scores suggest that the virtual environment is overall adequate to support this study. The oculomotor metric measures the "eyestrain, difficulty to focus, blurred vision, and headache" [62]. In this study, since the visual aspect was not the focus and participants' activities did not involve visual tasks, the impact of the oculomotor metric seems negligible. On the other hand, it is not clear if metrics of cybersickness are associated with physiological responses, such as skin temperatures, because cybersickness is a controlled variable. It is therefore unknown whether any of the local skin temperature differences between IVE and in situ experiments can be explained by cybersickness.
Despite these significant findings, the results may have been influenced by a few limitations. First, the experiments were performed in a humid subtropical climate, with long, hot summers and short, moderate winters [72]. Therefore, the results are limited to such outdoor temperature conditions. Future IVE studies can be performed to validate the findings in different climatic regions. Second, only skin temperatures were used as a physiological response to the thermal conditions. The results show that skin temperatures are more sensitive to outdoor temperature variations than thermal state votes. This finding can help future experimenters choose metrics and mitigate the influence, but the study was not focused on understanding the pathway from outdoor temperature to skin temperature to thermal states. As thermal states involve psychological factors, separate analyses are needed to understand this pathway. In addition, apart from skin temperatures, other physiological indicators of thermal states, such as heart rate, may be considered in future research. Third, visual stimuli, such as simulated summer and winter scenes (e.g., snow), may generate different responses in IVE experiments [40,85]. The relationship between visual stimuli, outdoor temperature, and physiological responses in both IVE and in situ experiments therefore needs to be investigated further. Fourth, this study focused on the effect of outdoor temperature variations on the participants' thermal states as a whole in IVE experiments and did not investigate individual characteristics such as gender, age, and ethnicity. Thus, future IVE studies may consider the combined effect of outdoor temperature variations and individual characteristics on thermal states. Fifth, there is evidence that the radiation from interior surfaces, such as walls, affects human physiology and thermal states [86,87]. Therefore, future IVE studies may consider investigating the effect of surface radiation on human physiology and thermal states under different outdoor temperature conditions. Lastly, the focus of this study is limited to a thermal state analysis. The impact of outdoor temperature conditions on adaptive behavior in IVE experiments, in comparisons such as C3 and C4, is not explored.

Conclusions
The study shows that IVE experiments can create an indoor temperature environment for participants that is comparable to the one in in situ experiments when outdoor temperature varies. In other words, a properly designed experimental procedure allows IVE experiments to be conducted in situations where the outdoor temperature does not match the intention of the IVE experiments (C3 and C4). The same procedure may be used for matched (C1 and C2) and mismatched (C3 and C4) situations. However, a careful selection of measurement parameters is essential to generate comparable results. The thermal sensation vote of the participants is the most reliable parameter, whereas the overall skin temperature seems less reliable due to its sensitivity to the outdoor temperature. Future studies may be extended to include different outdoor climate types and their impact on adaptive behaviors, to understand the extent to which visual stimuli may play a significant role in affecting the thermal state, and to explore the use of other physiological parameters as reliable thermal state indicators in IVE experiments.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.