Automated Residential Energy Audits Using a Smart WiFi Thermostat-Enabled Data Mining Approach

Smart WiFi thermostats, when they first reached the market, were touted as a means for achieving substantial heating and cooling energy cost savings. These savings did not materialize until additional features, such as geofencing, were added. Today, average savings from these thermostats of 10–12% in heating and 15% in cooling for a single-family residence have been reported. This research aims to demonstrate an additional potential benefit of these thermostats, namely as an instrument for conducting virtual energy audits on residences. In this study, archived smart WiFi thermostat temperature measurements, represented as a power spectrum, were integrated with corresponding historical weather and energy consumption data, building geometry characteristics, and occupancy data in order to train a machine learning model to predict attic and wall R-values, furnace efficiency, and air conditioning seasonal energy efficiency ratio (SEER), all of which were known for all residences in this study. The developed model was validated on residences not used for model development. Validation R-squared values of 0.9408, 0.9421, 0.9536, and 0.9053 were realized for predicting attic and wall R-values, furnace efficiency, and AC SEER, respectively. This research demonstrates promise for low-cost, data-based energy auditing of residences reliant upon smart WiFi thermostats.


Introduction
In 2018, according to the U.S. Energy Information Administration (EIA), residential buildings accounted for approximately 21% of total electricity consumption as well as 16% of total natural gas consumption in the U.S. [1,2]. The residential sector has been deemed to offer the most cost-effective potential for energy savings among all U.S. buildings [3]. The most common approach for garnering savings has been through utility rebate programs, whereby utilities offer financial incentives for residential investment in energy reduction measures. The rebated measures are generally those with the statistically best savings relative to investment among the entire residential population. In practice, what this has meant is that all rate payers have effectively subsidized the investments of wealthier residents. Researchers have found that upgrading the housing of low-income residences to the median household efficiency would reduce excess energy by 68%. In other words, while residential energy reduction offers the most cost-effective potential among all U.S. buildings, the vast majority of this savings potential comes from low-income residences [4][5][6].
Many factors impact the energy consumption of individual residential buildings, including weather conditions; building geometry; building thermal envelope materials; heating, ventilation, and air conditioning (HVAC) characteristics; and energy-use behavior of the residents [7,8]. However, identifying the energy efficiency priorities for individual residences is not automatic and can be both laborious and expensive. For example, traditional energy audits require a physical visit to a residence, whereby a technician performs air leakage tests; conducts infrared imaging; documents insulation in the walls,

Background
In this section, relevant research is presented on: the standard calculation approaches for building energy models with sufficient granularity to permit estimates of savings from residential energy upgrades; inverse modeling approaches with sufficient granularity to identify residences in need of upgrades and to quantify the resulting savings based on energy data pre- and post-upgrade; and the state of the art in virtual energy audits.

Building Information Modeling and Simulation for Energy Audits
Energy modeling software (e.g., eQuest, EnergyPlus, IES, and Energy-10) has been used extensively to simulate and predict building energy consumption. Generally, these tools have required extensive detail about the geometric and energy characteristics of a building, as well as occupancy and control schedules. Examples of their use are extensive; unfortunately, despite the detailed data inputs required, the resulting energy savings recommendations have been very inconsistent [22]. For example, one study evaluated the accuracy of the United States Department of Energy (DOE)-developed eQuest software for predicting energy consumption and estimating savings from upgrades in hotels. Good correspondence was seen between predicted and actual savings based on the building energy efficiency retrofit (BEER) scheme [23]. However, other studies have demonstrated just the opposite [24]. These tools are strongly dependent on the user and require significant engineering time [25]. Much of the time, these tools overpredict energy consumption [16]. For example, the Energy Trust of Oregon performed a study to evaluate building energy simulation programs. Three programs were compared: SIMPLE, REM/Rate, and Home Energy Saver (HES). Detailed audits were conducted, and utility bills were collected for 190 homes. The homes were simulated with the three energy modeling tools, including two levels of detail for HES. The models overpredicted gas use for space heating by an average of 41% in older homes built before 1960 and by 13% for newer homes built after 1989 [26,27]. Likewise, the validity of the Manufactured Home Energy Audit tool was assessed in a two-part study by Oak Ridge National Laboratory (ORNL). Audit and utility data were used to analyze the energy effectiveness of manufactured homes across five counties in the U.S. North and Midwest. The predicted space heating energy consumption was compared to the actual space heating energy consumption. 
Pre- and post-retrofit comparisons of modeled and actual energy use were made. Results from the pre-retrofit simulations were observed to overpredict space heating energy use from 163% to 109% [28]. Lastly, a recent study by Pacific Northwest National Laboratory on seven homes with deep retrofits showed that the savings predicted by different auditors ranged from a 75% overestimation to a 16% underestimation relative to the savings realized for all the homes evaluated [29].

Inverse Energy Modeling for Identifying Residences in Need of Upgrade and Estimating Savings from Upgrades
In 1994, ASHRAE published an Inverse Modeling Toolkit (IMT), which has been used since to estimate savings from various system upgrades [30]. This toolkit is based on a four-step process. The first step is to create statistical three-parameter models of electricity and natural gas consumption as a function of the outdoor air temperature over the energy consumption period. This regression renders estimates of the sensitivity of the consumption to temperature (termed heating and cooling slopes), the building balance-point temperature, and average weather-dependent energy consumption for a meter period. The second step is to apply these to site-relevant typical meteorological year (TMY3) weather data to determine the normalized annual consumption (NAC) for each type of energy. The third step is to derive an NAC for each set of 12 sequential months of utility data. The fourth step is to compare the NACs of multiple buildings to identify average, best, and worst energy performers and to evaluate how the consumption of a building has changed over time. It is this last step that permits measurement of savings post-retrofit of energy efficiency upgrades [31].
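The first two steps can be sketched in plain Python. This is a minimal illustration, assuming a heating-only three-parameter change-point model, $E = \beta_0 + \beta_1 \max(T_b - T, 0)$, fitted by ordinary least squares over a grid of candidate balance-point temperatures $T_b$. The function names, the candidate grid, the crude "typical year", and the synthetic data are hypothetical, and this is not the IMT's own implementation; a full analysis would also account for cooling and for the number of days in each meter period.

```python
def simple_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, sse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:
        # No variation in x: the best fit is a flat line at the mean of y.
        return my, 0.0, sum((yi - my) ** 2 for yi in y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse


def fit_3p_heating(temps, use, candidates):
    """Step 1: fit E = b0 + b1 * max(tb - T, 0) by grid search over the balance point tb.

    temps: mean outdoor temperature of each meter period (deg C)
    use:   average daily consumption in each meter period
    Returns (base_load b0, heating_slope b1, balance_point tb).
    """
    best = None
    for tb in candidates:
        hdd = [max(tb - t, 0.0) for t in temps]  # degrees below the balance point
        b0, b1, sse = simple_ols(hdd, use)
        if best is None or sse < best[0]:
            best = (sse, b0, b1, tb)
    _, b0, b1, tb = best
    return b0, b1, tb


def normalized_annual_consumption(b0, b1, tb, typical_daily_temps):
    """Step 2: apply the fitted model to a typical-year daily temperature series."""
    return sum(b0 + b1 * max(tb - t, 0.0) for t in typical_daily_temps)


# Synthetic example generated from b0 = 2.0, b1 = 0.5, tb = 15 deg C.
temps = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
use = [2.0 + 0.5 * max(15.0 - t, 0.0) for t in temps]
b0, b1, tb = fit_3p_heating(temps, use, [float(c) for c in range(5, 26)])
typical_year = [0.0] * 182 + [20.0] * 183  # crude stand-in for TMY3 daily means
nac = normalized_annual_consumption(b0, b1, tb, typical_year)
print(round(b0, 3), round(b1, 3), tb, round(nac, 1))  # 2.0 0.5 15.0 2095.0
```

Step 3 would repeat this fit for each set of 12 sequential months of data, and step 4 would compare the resulting NACs across buildings or over time.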
A case study of 14 Midwest hospitals showed that the NAC analysis is more stable and informative than the regression coefficients determined in the first step. Additionally, a change in NAC indicates a real change in the energy performance of the building, provided that the savings are greater than 10% (note that ASHRAE suggests that this approach is not, in general, able to measure savings of less than 10% [16]). In another study, historical electric and natural gas consumption data were merged with residential building geometry and historical weather data to determine the energy consumption intensity of each home in the Village of Yellow Springs, Ohio, using a five-parameter fit for the electricity data and a three-parameter fit for the natural gas data. These researchers normalized the NAC by the residential floor area. Using these normalized data, they were able to identify the most promising homes for energy reduction [32].
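The five-parameter fit mentioned above can be sketched the same way. In this hypothetical illustration, electricity use is modeled as $E = \beta_0 + \beta_h \max(T_h - T, 0) + \beta_c \max(T - T_c, 0)$ with $T_h \le T_c$; both change points are found by exhaustive search over an assumed integer grid, and each candidate pair is fitted by least squares through the normal equations. The names and data are illustrative, not those of the cited study. Dividing the resulting NAC by floor area would then give the intensity used to rank homes.

```python
def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None  # singular system: this change-point pair is not identifiable
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_5p(temps, use, candidates):
    """Return (b0, bh, bc, th, tc) minimizing the squared error over th <= tc."""
    best = None
    for th in candidates:
        for tc in candidates:
            if tc < th:
                continue
            rows = [(1.0, max(th - t, 0.0), max(t - tc, 0.0)) for t in temps]
            # Normal equations (X^T X) beta = X^T y for the three coefficients.
            xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
            xty = [sum(r[i] * y for r, y in zip(rows, use)) for i in range(3)]
            beta = solve3(xtx, xty)
            if beta is None:
                continue
            sse = sum((y - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
                      for r, y in zip(rows, use))
            if best is None or sse < best[0]:
                best = (sse, *beta, th, tc)
    if best is None:
        raise ValueError("no identifiable change-point pair on this grid")
    return best[1:]


# Synthetic data: base 5.0, heating slope 0.3 below 12 C, cooling slope 0.6 above 20 C.
temps = [-5.0, 0.0, 5.0, 10.0, 12.0, 15.0, 18.0, 20.0, 25.0, 30.0]
use = [5.0 + 0.3 * max(12.0 - t, 0.0) + 0.6 * max(t - 20.0, 0.0) for t in temps]
b0, bh, bc, th, tc = fit_5p(temps, use, [float(c) for c in range(8, 25)])
print(th, tc, round(b0, 3), round(bh, 3), round(bc, 3))  # 12.0 20.0 5.0 0.3 0.6
```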

State-Of-The-Art in Virtual Energy Audits
Building geometric and energy characteristics (insulation type and amount in envelope components, heating/cooling/water heating efficiencies, etc.) have a prominent influence on energy consumption [33]. Knowledge of these characteristics is essential for estimating potential energy savings from specific energy upgrades. Ordinarily, such data is collected from on-site audits. However, there have been some recent strides toward inferring energy characteristics from data alone. Table 1 summarizes research to predict the energy characteristics of buildings or to disaggregate the energy consumption into specific categories, such as lighting and appliances.
The private company Retroficiency (acquired by ENGIE Insight) claimed in the mid-2010s to have the ability to automatically audit the energy performance of commercial buildings. Their approach employed interval energy data from smart meters, occupant schedules, weather, and systems control details. Their virtual energy assessment (VEA) provided recommendations for retrofits based upon the virtual audit. Included in their recommendation were estimates of upgrade costs and return on investment [34].
In 2016, Case Western Reserve University and Johnson Controls Inc. worked collaboratively to develop another version of a virtual energy audit for small- to medium-sized commercial or retail buildings. Their approach employed 15-min-interval utility data, insulation characteristics, and weather data [35]. Lastly, the approaches by FirstFuel, Agilis Energy, and C3 Commercial likewise employ interval meter data from smart meters and real-time weather data to estimate various forms of electric consumption (lighting, cooling, etc.).

Objectives of Research
While smart meters have gained an increasing market share [38], there is still no consistent national standard for the frequency of data collection and input [39]. Their use in this study is not assumed: for many residences, only monthly energy consumption data are available. Moreover, smart meters are generally only capable of providing information about electricity consumption, and the cost of smart gas meters is prohibitive for wide-scale use without some type of enabling subsidy.
There are three starting points for this research. First, smart WiFi thermostats, although less prevalent than smart meters in both the U.S. and Europe, offer greater promise than smart meters for characterizing heating-, cooling-, and ventilation-related energy characteristics, because they measure the internal residence temperature and humidity and account for residence-specific controls on this temperature. Second, the monthly metered energy consumption reflects the overall heating and cooling energy effectiveness of a residence; however, this information alone cannot resolve the specific contributions to that effectiveness. Third, if the residential energy characteristics for a subset of residences are known, data-based machine learning models can be trained to predict the individual energy characteristics. If these models are derived from data collected from numerous diverse residences, they could, in principle, then be used to predict the energy characteristics of residences where these are unknown.
The research question driving this study is the following: "How can the individual contributions to the heating and cooling energy effectiveness (namely, the envelope R-values and heating/cooling system efficiencies) be resolved from only remotely collected data?" To date, this question has not been answered.
Fundamentally, the goal of this research is to estimate residential energy characteristics from monthly energy consumption (potentially gas and electric), coupled with other data that could be collected remotely for residences. These data include historical weather data, residential building geometry data, potentially occupancy data, and, uniquely and most importantly, smart WiFi thermostat data. The latter, because of its relatively high measurement frequency, could help to resolve the energy characteristics that govern the thermal response of a residence to changes in outdoor weather and to internal heating and cooling. If these instruments could enable remote energy auditing of residences, their prevalence would make wide-scale impact possible. In 2017, more than 82 million smart thermostats were in use in North America according to a study by Berg Insight. The same study projected that more than half (51%) of North American homes would be smart homes by 2022 [40].
To achieve the broad goal of predicting residential envelope R-values and heating/cooling system efficiencies from the varied data types (static residence geometrical, occupancy, and energy characteristics; monthly metered energy consumption; higher-frequency weather data; and high-frequency 'delta' smart WiFi thermostat data), it is necessary to extract usable features from the higher-frequency signals so that they can be combined with the monthly metered consumption. This first requires the creation of derived features characterizing the weather variation within the energy consumption meter periods, since the average outdoor temperature during a meter period is not sufficient to characterize the exterior weather. Second, it requires the development of dynamic characteristics based upon the smart WiFi thermostat data unique to each residence in which a smart WiFi thermostat is present. With static representations of the outdoor weather dynamics for each meter period and of each residence's response to dynamic changes established, the data can be combined and used to train machine learning models on a subset of residences for which the energy characteristics are known. Last, the developed model must be tested on residences not used in the training to demonstrate the potential for this approach to estimate energy characteristics in residences where they are unknown.
This paper is organized as follows. First, as the approach posed hinges on the data used, the data employed in this study are described. Next, the methodology and results, both aligned with the objectives posed, are presented. Lastly, we conclude by discussing the wide-scale implications of the developed approach for remote regional energy auditing and the work required to realize this potential.

Data
Four main raw datasets were used in this study. Each dataset is described in more detail in the following subsections.

Residence Geometrical, Occupancy, Monthly Energy Consumption, Energy Characteristics, and Smart WiFi Thermostat Data
This study considered 101 houses owned by a university in the Midwest region of the U.S. The majority of these houses are detached single-family houses constructed of wooden materials (with low thermal mass). Geometrical data were accessed for all residences through the local county property database. Such data is publicly available nationally.
Second, historical monthly energy consumption and occupancy data (electric and gas meter data) from January 2016 to the present were obtained for each residence from the university owner of the residences.
Third, energy characteristics for these residences were acquired in 2015 through detailed energy audits conducted by one of the lead authors. As shown in a prior study [16], this audited subset of houses offers significant diversity in size, insulation, and energy effectiveness, which helps in developing a generalizable model capable of predicting energy characteristics in other residences. Table 2 shows the minimum and maximum values for the building geometric data, energy characteristics, and residential occupancy characteristics for the 101 residences considered. Some input features included in the table might, in general, be challenging to acquire (e.g., refrigerator-related data) but are retained here in order to evaluate their importance.
Smart WiFi thermostat data were accessible for each of the audited residences. Raw thermostat data, referred to as "delta data", were collected for each residence. Delta data are logged only when one of the thermostat features changes: if the set point temperature, the measured temperature or humidity at the thermostat, the heating/cooling mode, or the heating/cooling/fan status changes, a record is written. For this research, smart WiFi thermostat data for these houses were continuously collected and archived from 1 June 2018 to the present. Typically, thousands of points were collected for each residence each month. There is only a single smart WiFi thermostat (i.e., a single measurement point) in each house. The houses were intermittently heated/cooled throughout the day based on the thermostat set point temperature. Additionally, all house thermostats were monitored by the university housing management to ensure they were within a reasonable set point temperature range. Moreover, the residents were strongly advised to keep the windows closed when their residence was employing air conditioning.

Weather Data
Corresponding hourly weather data (only the outdoor dry bulb temperature was used here) were obtained from the U.S. NOAA National Climatic Data Center site [41] but could have likewise been obtained using the Weather Underground [42] resource.

Methodology
The methodology is organized as follows. The first two subsections describe the process for extracting features that characterize, respectively, the variation of the weather within each meter period and the thermal dynamics of each residence in response to changes in outdoor temperature and internal heating and cooling, as evidenced by the smart WiFi thermostat data. Then, the data-based machine learning and testing approaches are described.

Development of New Weather Features Characterizing Outdoor Temperature Variation during Each Meter Period
Inverse energy models have employed the mean outdoor temperature over an entire meter period as an input (often the only one) to predict energy consumption [31]. However, increased granularity that better reflects the temperature variation occurring over such a long period may be beneficial.
The approach used here is to 'bin' the outdoor temperature data within a meter period into discrete temperature bands, determining the probability density of the outdoor temperature in each band over one energy consumption meter period. The idea is that the mean temperature in a meter period is not the only thing that matters: the record of temperature variation within the period is even more important, especially if the thermostat set point temperature changes within the meter period.
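One way to build these features is sketched below in plain Python, assuming hourly (timestamp, temperature) records and 2 °C bands as in Figure 1. The function names, the record format, and the [-30, 40) °C band range are hypothetical choices. The function returns the fraction of hours in each band; dividing by the band width converts it to a probability density.

```python
from datetime import datetime


def temps_in_period(records, start, end):
    """Hourly outdoor temperatures whose timestamps fall in [start, end)."""
    return [temp for ts, temp in records if start <= ts < end]


def temperature_distribution(temps, width=2.0, lo=-30.0, hi=40.0):
    """Fraction of hours in each band [lo + k*width, lo + (k+1)*width).

    Values outside [lo, hi) are clamped into the first or last band.
    Divide each fraction by `width` to obtain a probability density.
    """
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for t in temps:
        k = int((t - lo) // width)
        counts[min(max(k, 0), n_bins - 1)] += 1
    return [c / len(temps) for c in counts]


# Ten hypothetical hourly readings inside the meter period 1 Jan 2018 to 9 Feb 2018.
records = [(datetime(2018, 1, 1, h), -1.0 + 0.5 * h) for h in range(10)]
period = temps_in_period(records, datetime(2018, 1, 1), datetime(2018, 2, 9))
dist = temperature_distribution(period, width=2.0, lo=-4.0, hi=6.0)
print(dist)  # [0.0, 0.2, 0.4, 0.4, 0.0]
```

Each meter period then contributes one fixed-length vector of band fractions, which can sit alongside that period's metered consumption as model inputs.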

Development of Dynamic Representations of Smart WiFi Thermostat Data for Each Residence
The measured smart WiFi thermostat temperature provides a record of heat gain/loss from the residence from/to the outdoor environment and a record of heating and cooling. When the heating or cooling system is on, the interior temperature is observed to warm/cool over a certain amount of time. In effect, the record therefore reflects the time constants associated with the heating and cooling systems, which likewise depend upon the heating and cooling system efficiencies. After heating or cooling is interrupted, heat loss/gain to/from the outdoor environment registers as a decrease/increase in internal temperature. The rate at which the internal temperature cools/warms after heating/cooling is interrupted depends upon the envelope heat losses/gains, and thus on the thermal resistances and capacitances (time constants) associated with the envelope components and infiltration.
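The time-constant idea can be illustrated with a deliberately simplified model. Assuming a single lumped thermal resistance R and capacitance C, a constant outdoor temperature, and no HVAC operation or internal gains, the indoor temperature relaxes exponentially toward the outdoor temperature with time constant τ = RC, so a better-insulated envelope (larger R) cools more slowly. The sketch below (hypothetical function name, synthetic data) recovers τ from a free-floating cool-down; it illustrates the physics only and is not part of the study's method, which uses power spectra instead.

```python
import math


def estimate_time_constant(times_h, indoor, outdoor):
    """Estimate the envelope time constant tau (hours) from a free-floating period.

    Assumes a single lumped thermal capacitance, a constant outdoor temperature,
    no HVAC operation and no internal gains, so that
        T_in(t) = T_out + (T_in(0) - T_out) * exp(-t / tau).
    Taking logs gives ln((T_in - T_out) / (T_in(0) - T_out)) = -t / tau, a line
    through the origin whose slope is fitted by least squares.
    Requires indoor temperatures on the same side of T_out as T_in(0).
    """
    d0 = indoor[0] - outdoor
    ts = [t - times_h[0] for t in times_h]
    ys = [math.log((temp - outdoor) / d0) for temp in indoor]
    slope = sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)
    return -1.0 / slope


# Synthetic cool-down after heating stops: 20 C indoors, 0 C outdoors, tau = 30 h.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
indoor = [20.0 * math.exp(-t / 30.0) for t in times]
tau = estimate_time_constant(times, indoor, outdoor=0.0)
print(round(tau, 6))  # 30.0
```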
Since the aim of this research is to develop single models that predict residential energy characteristics from data collected in numerous diverse residences, we sought a representation of the measured smart WiFi thermostat temperature that could account for the different time constants associated with the envelope barriers and the heating/cooling systems. A power spectrum reduction of this measured temperature seemed a reasonable approach, as such a representation characterizes the strength of a signal as a function of frequency.
In order to compute a power spectrum of a signal, however, the signal must be sampled at a constant frequency. This was not the case for the smart WiFi thermostat data measured here [43]: "delta" thermostat data are non-uniformly spaced in time. So, step 1 in establishing power spectrum representations of the measured smart WiFi thermostat temperature was to create a uniformly spaced signal. Linear interpolation was employed to estimate the temperature at fixed intervals from the measured thermostat temperatures, using Equation (1):
$$x_i = x_a + \left(x_b - x_a\right)\frac{i - a}{b - a} \tag{1}$$
where $a$, $b$, and $i$ are times ($a < i < b$); $x_a$ and $x_b$ are the collected data points at the neighboring times $a$ and $b$; and $x_i$ is the interpolated value at time $i$. The characteristic frequency of each residence's response to changes in outdoor weather conditions is an indicator of the dynamic thermal characteristics of the residence's envelope elements (walls, windows, and ceiling). The power spectrum defines the 'strength' of the response (measured thermostat temperature) as a function of frequency. The power spectral density $h(\omega)$ is the Fourier transform of the autocovariance function $\gamma(k)$ of the signal, where $k$ is the lag and $t$ is time (Equations (2) and (3)) [44]:
$$h(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} \gamma(k)\, e^{-\mathrm{i}\omega k} \tag{2}$$
$$\gamma(k) = \mathrm{E}\left[\left(x_t - \mu\right)\left(x_{t+k} - \mu\right)\right] \tag{3}$$
where $\mu$ is the mean of the signal and $\omega$ is the angular frequency. A locally high amplitude in the power spectrum at a specific frequency means that the measured signal (thermostat temperature) owes much of its energy to a dynamic phenomenon at this frequency. For example, higher efficiency houses have more energy in the signal at lower frequencies, so if something changes outside or the set point temperature changes inside, the response to change as measured by the thermostat temperature is slow. In the power spectrum, the peak is in the low-frequency band. On the other hand, lower efficiency houses have more energy at higher frequencies.
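The two preprocessing steps can be sketched in plain Python: resampling the irregular "delta" record onto a uniform grid with Equation (1), then estimating the power spectrum with a periodogram. For mean-removed data, the periodogram at the Fourier frequencies equals the Fourier transform of the biased sample autocovariance, so it is one standard estimator of $h(\omega)$. The text does not say which estimator or resampling interval was used, so the 1-h step, the direct DFT, and the function names below are illustrative assumptions; on real data an FFT-based routine would be the practical choice.

```python
import bisect
import cmath
import math


def resample_linear(times, values, step):
    """Equation (1): linearly interpolate irregular samples onto a uniform time grid.

    `times` must be strictly increasing; the grid starts at times[0] and uses `step`.
    """
    n_points = int((times[-1] - times[0]) // step) + 1
    out = []
    for n in range(n_points):
        t = times[0] + n * step
        j = bisect.bisect_right(times, t)  # index of the first sample later than t
        if j >= len(times):  # t coincides with (or rounds past) the last sample
            out.append(values[-1])
            continue
        a, b = j - 1, j  # neighboring samples with times[a] <= t < times[b]
        out.append(values[a] + (values[b] - values[a]) * (t - times[a]) / (times[b] - times[a]))
    return out


def periodogram(x, dt):
    """One-sided periodogram of a uniformly sampled signal via a direct DFT.

    Returns (frequency, power) pairs with frequency in cycles per unit of dt
    (multiply by 2*pi for angular frequency). O(n^2), for clarity only.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the mean, as in the autocovariance
    spectrum = []
    for k in range(1, n // 2 + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(centered))
        spectrum.append((k / (n * dt), abs(s) ** 2 / n))
    return spectrum


# Irregular samples of a 24-h cycle, resampled hourly over four days.
times = [0.0] + [h + 0.3 for h in range(96)] + [96.0]
values = [20.0 + 2.0 * math.sin(2.0 * math.pi * t / 24.0) for t in times]
hourly = resample_linear(times, values, step=1.0)
spectrum = periodogram(hourly[:96], dt=1.0)
peak_freq, _ = max(spectrum, key=lambda fp: fp[1])
print(round(1.0 / peak_freq, 1))  # 24.0
```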
In this study, a histogram of the power spectrum for each house was created over fixed period bands. A total of 500 uniformly spaced bins were set, and the average signal strength in each bin was calculated, yielding binned power spectrum data for each residence. Of these, only the first 50 bins were retained, corresponding to 48-h periods. Table A1 shows the range of values for each of the 50 retained bins. Almost all of the signal energy for each residence resided in these bands. In effect, these binned power spectrum data are a characteristic of a residence. It should be noted that the thermostat data used were taken from the middle of the summer and winter seasons. In the summer, most of these residences were unoccupied (yet still air conditioned to prevent mold formation); thus, windows were almost always closed. In the winter, few if any windows were opened by residents.
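The binning step might look like the following sketch, which averages the (frequency, power) pairs of a spectrum, such as the output of a periodogram, within equal-width frequency bins and keeps the lowest-frequency bins as features. Because the text does not specify the bin edges or the frequency range behind the 500 bins, the equal-width frequency bins, the treatment of empty bins as zero, and the 1-based PSD.Freq.X naming are assumptions.

```python
def bin_spectrum(spectrum, n_bins, f_max, keep):
    """Average the power of (frequency, power) pairs in equal-width frequency bins.

    Bins span [0, f_max]; bins containing no spectral line are reported as 0.0.
    Only the first `keep` bins are returned, keyed like the PSD.Freq.X features.
    """
    width = f_max / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for f, p in spectrum:
        k = min(int(f // width), n_bins - 1)  # f == f_max falls in the last bin
        sums[k] += p
        counts[k] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return {f"PSD.Freq.{k + 1}": means[k] for k in range(keep)}


# Toy spectrum: four spectral lines, ten bins on [0, 1], keep the first three.
features = bin_spectrum([(0.05, 1.0), (0.15, 3.0), (0.16, 5.0), (0.95, 7.0)],
                        n_bins=10, f_max=1.0, keep=3)
print(features)  # {'PSD.Freq.1': 1.0, 'PSD.Freq.2': 4.0, 'PSD.Freq.3': 0.0}
```

For the study's configuration, the call would presumably use n_bins=500 and keep=50, with f_max set to the Nyquist frequency of the resampled signal.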

Data Merging and Preparation
In order to develop machine learning models for predicting the individual energy characteristics from the data described in Section 4 and developed in Sections 5.1 and 5.2, the data were merged. The binned outdoor temperature for each meter period and the binned smart WiFi thermostat temperature power spectra, along with the static residential geometry, occupancy, and energy characteristics, were synchronized and merged with the monthly energy consumption data by common address.
Additionally, in order to mitigate observation bias, very similar houses were removed by measuring the distances between houses. Euclidean distances, as used in K-means clustering [45], were computed from the standardized static residential data only. The analysis identified 14 similar houses (including 3 very similar newer houses), and 9 houses were consequently eliminated from the dataset. This left 92 houses; after the 6 testing houses were set aside, 86 houses remained in the training dataset. Then, all observations with any missing data were eliminated [46].
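A minimal sketch of the near-duplicate screen is shown below, assuming pairwise Euclidean distances between z-scored static features and a fixed distance threshold. The threshold, the feature columns, and the rule for which house in a pair to drop are hypothetical, since the text reports only the outcome (14 similar houses, 9 removed).

```python
import math
import statistics


def standardize(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0 for constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def similar_pairs(ids, rows, threshold):
    """Pairs of houses whose standardized static features lie within `threshold`."""
    z = standardize(rows)
    pairs = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            d = math.dist(z[i], z[j])
            if d < threshold:
                pairs.append((ids[i], ids[j], d))
    return pairs


# Hypothetical static features: floor area (m^2), year built, number of occupants.
ids = ["A", "B", "C", "D"]
rows = [[100.0, 1950.0, 2.0], [101.0, 1951.0, 2.0], [200.0, 2000.0, 3.0], [150.0, 1900.0, 1.0]]
pairs = similar_pairs(ids, rows, threshold=0.5)
print([(a, b) for a, b, _ in pairs])  # [('A', 'B')]
```

Flagged pairs could then be reviewed before deciding which member of each pair to remove.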

Model Development and Testing
Choosing the right machine learning algorithm is complicated; it depends on the data type, number of observations, number of input features, etc. A second major challenge is tuning the model hyperparameters. Different machine learning algorithms have different hyperparameters, which need to be optimized in order to yield the best models. For example, the most critical hyperparameters in artificial neural network (ANN) models are the number of hidden layers, dropout rate, network weight initialization, activation function, learning rate, momentum, number of epochs, batch size, etc. [47,48]. In this research, the H2O AutoML package [49] was used to select the algorithm and tune its hyperparameters. Functional forms considered in this approach included deep neural networks, random forests, extremely randomized trees, gradient boosting machines (GBMs), extreme gradient boosting (XGBoost), and stacked ensembles. Table 3 shows the input features employed to predict the attic R-value, wall R-value, furnace efficiency, and AC SEER targets. Note that the R-value models use the furnace efficiency and AC SEER as input features, but the furnace efficiency and AC SEER models do not use the attic and wall R-values as features. Thus, the general predictive process would be to first predict the furnace efficiency and AC SEER and then use these predictions as predictors for the attic and wall R-values.
A training dataset was used to develop a predictive model, while a validation dataset provided an evaluation of the model for hyperparameter tuning. Next, the model was applied to an independent testing dataset. We used 10-fold cross-validation during hyperparameter tuning to avoid subset biases, and we reported and used the mean cross-validation performance metrics [50][51][52]. The effectiveness of the models for both the validation and testing datasets was evaluated using the following metrics: the R-squared metric, mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and root mean squared logarithmic error (RMSLE):
$$R^2 = 1 - \frac{\sum_{j=1}^{n} \left(y_j - \hat{y}_j\right)^2}{\sum_{j=1}^{n} \left(y_j - \bar{y}\right)^2} \tag{4}$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n} \left(y_j - \hat{y}_j\right)^2 \tag{5}$$
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{6}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} \left|y_j - \hat{y}_j\right| \tag{7}$$
$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left(\ln\left(1 + \hat{y}_j\right) - \ln\left(1 + y_j\right)\right)^2} \tag{8}$$
where $y_j$ is the actual value, $\hat{y}_j$ the predicted value, $\bar{y}$ the mean of the actual values, and $n$ the number of observations.
A model is only as good as its ability to make accurate predictions on data not used in its training. Here, the true quality of the models developed was assessed through testing. A testing dataset was developed by extracting the observations from 6 houses from among the 92 houses included in the study. The six testing houses were randomly selected but were also checked to ensure that the testing set included high, medium, and low values of the responses (Table 4).
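The five evaluation metrics can be computed directly, as in this plain-Python sketch (hypothetical function name, made-up toy values). In this study the testing metrics would be evaluated on all observations from the six held-out houses, so that no house contributes to both training and testing.

```python
import math


def regression_metrics(y_true, y_pred):
    """R^2, MSE, RMSE, MAE, and RMSLE for paired actual/predicted values.

    RMSLE uses log(1 + y), so it requires values greater than -1
    (all targets here, R-values, efficiencies, and SEER, are positive).
    """
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "RMSLE": math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                               for t, p in zip(y_true, y_pred)) / n),
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print({k: round(v, 4) for k, v in metrics.items()})
# {'R2': 0.8, 'MSE': 0.25, 'RMSE': 0.5, 'MAE': 0.25, 'RMSLE': 0.0912}
```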
Figure 1 shows a representative probability density distribution for the outdoor temperature developed for a single meter period within discrete 2 °C bins. This figure shows how this binning took place for one meter period (1 January 2018 to 9 February 2018).
Figure 2a shows the power spectrum for an energy-effective residence with respective wall and ceiling R-values of 2.46 and 3.16 (m²·K·W⁻¹), whereas Figure 2b shows the power spectrum for a low-energy-effective residence with respective wall and ceiling R-values of 0.70 and 2.28 (m²·K·W⁻¹). Note that in the former case (a), most of the energy in the signal is at small periods, the opposite of that for the low-energy-effectiveness case, owing to the more rapid response of high-efficiency homes to heating and cooling, relative to a slower, more damped response (due to greater heat loss/gain to the external ambient) for the low-efficiency residence.
Most visible is that at the diurnal period (24 h), there is little energy in the high-efficiency house case, whereas the signal energy peaks at this period in the low-efficiency house case. Thus, the low-efficiency house 'feels' the diurnal transients far more than the high-efficiency house, which damps out most of the energy associated with this cycle. The higher energy at lower periods (higher frequencies) for the high-efficiency residence in comparison to the low-efficiency residence is primarily driven by the response to thermostat set point changes. The high-efficiency house is able to respond quickly to indoor temperature set point changes, whereas the low-efficiency house responds more slowly. As a result, even the period associated with set point changes is longer than in the high-efficiency house case.

Identifying the Best Machine Learning Algorithm
This subsection aims to document how the best model was developed in predicting each of the envelope thermal characteristics. It was unknown what model algorithm should be used and which features should be included in the model development.
First, different machine learning algorithms were applied and validated on the complete training dataset. This complete dataset included all static residential features, monthly energy consumption, binned outdoor temperature data for each meter period, and all binned smart WiFi thermostat temperature power spectrum data. Table 5 documents the validation metrics obtained for this complete dataset for the various algorithms employed. The gradient boosting machine (GBM) methodology clearly yielded the best validation performance, so only this algorithm was considered hereafter. The general form of the GBM, which can be applied to all four targets, is given in Equation (9) [53]:

$$\hat{F}(x) = \sum_{m=1}^{M} \beta_m \, b_{\tau_m}(x) \tag{9}$$

where $b_{\tau_m}(x) \in \mathcal{B}$ is a weak learner with parameters $\tau_m$ drawn from the class of base learners $\mathcal{B}$, $\beta_m$ is its corresponding additive coefficient, and $M$ is the number of boosting iterations (trees).

Figure 3 shows variable importance plots obtained from the best GBM models produced in predicting the (a) attic R-value, (b) wall R-value, (c) furnace efficiency, and (d) AC SEER. In this figure, the features labeled PSD.Freq.X refer to the average power in power spectrum frequency bin X. The figure shows that the power spectrum features are very important for predicting each of the energy characteristics, which suggests that the spectral information present in the thermostat signals improves the prediction of the targeted energy characteristics. We then investigated developing models using subsets of the PSD.Freq.X data.
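Equation (9) can be illustrated with a deliberately small, self-contained Python sketch of least-squares gradient boosting with regression stumps. It is not the GBM implementation used in this study: the squared-error loss, the depth-1 trees, and the function names are assumptions chosen for brevity. With squared-error loss, each weak learner $b_{\tau_m}$ is fitted to the current residuals (the negative gradient), and because a least-squares stump's leaf means already solve the line search, the additive coefficient $\beta_m$ reduces to the learning rate (shrinkage).

```python
import numpy as np

def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r by least squares.

    Returns (feature index, threshold, left value, right value)."""
    n, d = X.shape
    best_score, best = -np.inf, (0, np.inf, r.mean(), r.mean())
    for j in range(d):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        csum = np.cumsum(rs)
        total = csum[-1]
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue  # no split between identical feature values
            left = csum[i - 1] / i
            right = (total - csum[i - 1]) / (n - i)
            # Minimizing the split's squared error is equivalent to maximizing this score.
            score = i * left**2 + (n - i) * right**2
            if score > best_score:
                best_score = score
                best = (j, 0.5 * (xs[i] + xs[i - 1]), left, right)
    return best

def stump_predict(stump, X):
    j, thr, left, right = stump
    return np.where(X[:, j] <= thr, left, right)

def gbm_fit(X, y, n_trees=100, learning_rate=0.1):
    """F(x) = F_0 + sum_m beta_m * b_m(x), with beta_m = learning_rate here.

    F_0 (the target mean) can be viewed as an m = 0 term of Equation (9)."""
    f0 = float(np.mean(y))
    F = np.full(len(y), f0)
    stumps = []
    for _ in range(n_trees):
        residual = y - F  # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(X, residual)
        F = F + learning_rate * stump_predict(stump, X)
        stumps.append(stump)
    return {"f0": f0, "stumps": stumps, "lr": learning_rate}

def gbm_predict(model, X):
    F = np.full(X.shape[0], model["f0"])
    for stump in model["stumps"]:
        F = F + model["lr"] * stump_predict(stump, X)
    return F
```

Production GBM libraries add deeper trees, subsampling, and regularization on top of this same additive structure; the number of trees, tree depth, and minimum leaf size tuned later in this section correspond to $M$ and to the complexity of each $b_{\tau_m}$.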
GBM models were thus developed to predict the targeted energy characteristics for the following PSD binned power subsets: (a) the first 40 frequency bins (approximately the number needed to capture the diurnal cycle); (b) the first 20 frequency bins; (c) the first 10 frequency bins; (d) the top 10 most important frequency bins for each target, obtained from a variable importance analysis using the best GBM model; (e) the top 2 frequency bins for each target, obtained from a variable importance analysis; (f) the top frequency bin for each target, obtained from a variable importance analysis; (g) the top two frequency bins for each target, obtained from an optimization to minimize error; and (h) the top frequency bin for each target, obtained from an optimization to minimize error. Table 6 shows the testing statistics for predicting the attic and wall R-values, furnace efficiency, and AC SEER with the binned spectral powers included, using the same testing dataset considered in Section 5.3.2. There are three main points to make. First, while some of these cases yield accurate validation metrics for individual targets, the best overall cases are those using only one or two frequency bins selected to minimize the validation error. Using all of the frequency bins introduces many features that have little influence on the target, and eliminating these features generally improves the model. Second, the prediction statistics for the testing dataset improve markedly for cases e–h. Case e, where the two top power spectrum bins were selected by GBM variable importance, yielded the best model for predicting the attic R-value. Case g, which included as predictors the two power spectrum frequency bins that minimized error, yielded the best model for the AC SEER.
Lastly, case h, which relies on a single power spectrum frequency bin selected to minimize the predictive error, yielded the best model for predicting the wall R-value and furnace efficiency. Relative to the models without thermostat-derived features, the best MAE in predicting the attic R-value, wall R-value, furnace efficiency, and AC SEER was reduced from 0.5249 to 0.2752, 0.2768 to 0.1044, 0.0362 to 0.0116, and 0.7450 to 0.4245, respectively. All of these errors could be well tolerated in virtual energy audits. Table 6 also shows that the use of multiple power spectrum frequencies especially harms the models predicting the AC SEER and furnace efficiency (cases a–d). The AC systems in this set of residences are predominantly two-stage and the furnaces single-stage, meaning that the cooling power has two levels and the heating power one. Using multiple power spectrum frequency bins to predict the cooling/heating system efficiencies actually hurts the performance of the regression. It is also notable that model accuracy improves progressively for all of the targets as the number of power spectrum frequencies is reduced, whether the frequencies are selected from the GBM variable importance or through error minimization. In effect, this indicates that the different targets are associated with specific frequencies. For example, the best model for predicting the furnace efficiency relies on the single binned power spectrum frequency bin 46. Given that only single-stage furnaces, all with constant heating power, are considered in this study, the time response associated with furnace on-time dictates that a single frequency should best characterize this system. In comparison, a majority of the AC systems considered in this study had two stages with different cooling powers. Thus, it is not surprising that two power spectrum bins best capture the dynamics of these systems.
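As a quick check on the size of these improvements, the MAE values quoted above (the model without thermostat-derived features versus the best thermostat-informed model for each target) can be converted into percentage reductions. The short Python sketch below performs only this arithmetic on the reported numbers; it introduces no new data.

```python
# MAE values quoted in the text: (without thermostat features, best with them)
MAE = {
    "attic R-value": (0.5249, 0.2752),
    "wall R-value": (0.2768, 0.1044),
    "furnace efficiency": (0.0362, 0.0116),
    "AC SEER": (0.7450, 0.4245),
}

def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in error, in percent."""
    return 100.0 * (before - after) / before

if __name__ == "__main__":
    for target, (before, after) in MAE.items():
        print(f"{target}: MAE reduced by {percent_reduction(before, after):.1f}%")
```

This gives reductions of roughly 47.6%, 62.3%, 68.0%, and 43.0% for the attic R-value, wall R-value, furnace efficiency, and AC SEER, respectively.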
Similarly, the attic and wall R-values control the dynamics associated with cooling of the internal environment, so again a single frequency would be expected to best characterize the dynamics of these components. Table 7 summarizes the best testing performance for each of the targeted energy characteristics, drawn from Table 6, and Table 8 shows the actual and predicted values of these characteristics from these best models for all of the testing houses. Model performance appears strong across the evaluation metrics, and the prediction errors for each energy characteristic are small for all of the residences; errors of this magnitude would likely be tolerable in a typical energy audit.
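The frequency-bin subsets of cases (d)–(h) can be sketched as two selection routines: ranking bins by a fitted model's variable importance and keeping the top k, or exhaustively evaluating every k-bin combination and keeping the one with the lowest validation error. The Python sketch below is illustrative only; the function names are hypothetical, and `validation_error` stands in for whatever train-and-validate routine is used (for example, fitting a GBM on the static features plus the chosen bins). Exhaustive search costs one model fit per combination: for n candidate bins, n fits for k = 1 and n(n − 1)/2 fits for k = 2.

```python
from itertools import combinations
from typing import Callable, Sequence

import numpy as np

def top_k_by_importance(importances: Sequence[float], k: int) -> list[int]:
    """Indices of the k bins with the largest importance scores (cases d-f)."""
    order = np.argsort(np.asarray(importances))[::-1]
    return sorted(int(i) for i in order[:k])

def best_subset_by_error(
    n_bins: int, k: int, validation_error: Callable[[tuple], float]
) -> tuple[list[int], float]:
    """Search all k-bin subsets for the lowest validation error (cases g-h)."""
    if not 1 <= k <= n_bins:
        raise ValueError("k must be between 1 and n_bins")
    best_bins, best_err = None, float("inf")
    for bins in combinations(range(n_bins), k):
        err = validation_error(bins)
        if err < best_err:
            best_bins, best_err = bins, err
    return list(best_bins), best_err

def toy_error(bins: tuple) -> float:
    """Hypothetical stand-in for a train/validate run: bins 1 and 3 are informative."""
    return 0.0 if set(bins) == {1, 3} else 1.0

if __name__ == "__main__":
    print(top_k_by_importance([0.10, 0.50, 0.30, 0.05], k=2))
    print(best_subset_by_error(4, 2, toy_error))
```

The two routines can disagree, as they do in this toy case, because importance reflects how a model trained on all bins used each feature, while error minimization evaluates each small subset directly; this mirrors the difference between cases e/f and g/h.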

Summary of the Best Model Validation Statistics and Hyperparameters
The model validation statistics for the best testing models for each target in Table 7 are shown in Table 9. The validation metrics are exceptional, at or very close to 1 for all targeted variables. Table 10 shows the tuned hyperparameters for each of the best models.

Table 11 summarizes the validation metrics for predicting the targeted attic R-value, wall R-value, natural gas furnace efficiency, and air conditioner SEER value for the various models considered using the complete training data features, i.e., for the case where the thermostat-derived power spectrum binned data are not included. This table shows strikingly good validation results, with R-squared values of 1, 1, 1, and 0.99 and RMSE values of 0.0022, 0.0013, 0.0002, and 0.1513 for predicting the attic R-value, wall R-value, furnace efficiency, and AC SEER, respectively. The tuned hyperparameters (number of trees, number of internal trees, depth, and minimum number of observations in the smallest leaf) for the best GBM models are shown in Table 12. These hyperparameters, obtained without thermostat-derived information, can be compared with those of the best models obtained using thermostat-derived information, shown in Table 10. Note that the number of trees and the minimum number of observations in the smallest leaf are within the recommended values, which are 2/3 the number of observations and 12 observations per leaf, respectively. Furthermore, all of the hyperparameters are similar between the two cases, which provides some confidence that the models developed using thermostat-derived data are not simply overfitted relative to the case where thermostat data were excluded. The developed models were then applied to the testing set of houses described previously. Table 13 shows the actual values and predicted values of the targeted energy characteristics.
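Keeping the testing houses entirely out of model development means that, if a house contributes more than one record to the dataset, the train/test split should be made at the level of whole residences rather than individual records, so that no house appears in both sets. A minimal NumPy sketch, assuming each record carries a house identifier; the function name, test fraction, and seed are hypothetical.

```python
import numpy as np

def split_by_house(house_ids, test_fraction=0.2, seed=0):
    """Boolean (train, test) masks with every house wholly in one set.

    Assumes one house identifier per record; houses, not records, are sampled.
    """
    house_ids = np.asarray(house_ids)
    houses = np.unique(house_ids)
    n_test = max(1, int(round(test_fraction * houses.size)))
    rng = np.random.default_rng(seed)
    test_houses = rng.choice(houses, size=n_test, replace=False)
    test_mask = np.isin(house_ids, test_houses)
    return ~test_mask, test_mask

if __name__ == "__main__":
    ids = np.repeat([f"house_{i:02d}" for i in range(10)], 12)  # 12 records per house
    train, test = split_by_house(ids, test_fraction=0.2)
    print("test houses:", sorted(set(ids[test])))
    print("overlap:", set(ids[train]) & set(ids[test]))
```

Sampling houses rather than rows prevents a model from seeing other records of a test house during training, which would otherwise inflate testing scores.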
The models were generally accurate in predicting the energy characteristics; however, the AC SEER values in the training dataset did not vary as much as desired, so their predictions had the greatest associated error. The testing results are summarized in Table 14. The R-squared values for predicting the attic R-value, wall R-value, furnace efficiency, and AC SEER were 0.6778, 0.6474, 0.6280, and 0.5928, respectively, and the corresponding MAE values were 0.5249, 0.2768, 0.0362, and 0.7450. These results are markedly poorer than the predictions that rely on thermostat-derived information.
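For reference, the R-squared, MAE, and RMSE statistics used throughout this section can be computed as below, assuming R-squared is the coefficient of determination $1 - SS_{res}/SS_{tot}$ (some studies instead report the squared Pearson correlation, which can differ on test data). A related property is worth keeping in mind: because $SS_{tot}$ shrinks when the target varies little, identical absolute errors yield a lower R-squared for a narrowly spread target. The example values are illustrative, not data from this study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Coefficient of determination, mean absolute error, and root-mean-square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "MAE": float(np.mean(np.abs(residual))),
        "RMSE": float(np.sqrt(np.mean(residual ** 2))),
    }

if __name__ == "__main__":
    errors = np.array([0.2, -0.2, 0.2, -0.2])
    wide = np.array([10.0, 12.0, 14.0, 16.0])    # well-spread target
    narrow = np.array([13.0, 13.5, 14.0, 14.5])  # narrowly spread target
    print(regression_metrics(wide, wide + errors))
    print(regression_metrics(narrow, narrow + errors))
```

Both cases have an MAE and RMSE of 0.2, but R-squared falls from 0.992 for the well-spread target to 0.872 for the narrowly spread one.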

Conclusions and Discussion
This study has demonstrated the feasibility of using available residential building data, historical energy consumption, and archived smart WiFi thermostat data to develop machine learning models that accurately predict the primary heating and cooling characteristics of a residence, provided that a set of residences with measured energy characteristics is available. Residences with known energy characteristics, if they are representative of the whole pool of residences in a particular area, can be used to effectively calibrate a data-based model, which can then be used to predict the energy characteristics of other residences. Uniquely, this research has shown that thermostat-derived data characterizing the dynamic response of the indoor temperature to weather and thermostat set point changes improve the accuracy of these predictions.
The potential implications of this research are substantial, and the data needed to produce these predictions are potentially accessible. Smart WiFi thermostat data are generally accessed via the cloud by the thermostat manufacturer. This research is premised on the idea that such companies could directly manage, or indirectly participate in, a regional electric and/or gas utility sponsored program to audit residences using smart WiFi thermostats. Through such an arrangement, the smart WiFi thermostat manager would also have access to metered energy consumption for all participating residences. If data for all types of residences could be collected, at least within the boundaries of a utility service territory, a single model could be trained to predict the most important energy characteristics of every residence in the region. Potential savings from upgrading each energy characteristic in each residence could then be estimated, and a strategic energy (and carbon) reduction investment protocol could be established that realizes the greatest savings per investment without excluding low- and lower-middle-income residences.
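One simple way such a "greatest savings per investment" protocol could be prototyped is to rank candidate upgrades by estimated annual savings per dollar of cost and fund them greedily until a budget is exhausted. The Python sketch below is hypothetical throughout: the `Upgrade` record, its fields, and the example numbers are illustrative, and the savings estimates are assumed to come from model-predicted energy characteristics. Greedy ranking by ratio is a heuristic for this knapsack-type problem and is not guaranteed to find the optimal set.

```python
from dataclasses import dataclass

@dataclass
class Upgrade:
    house_id: str
    measure: str           # e.g., "attic insulation" (hypothetical)
    cost: float            # estimated installed cost, $
    annual_savings: float  # estimated annual savings, $/yr

def prioritize(upgrades, budget):
    """Greedily fund upgrades in order of savings per dollar within a budget."""
    ranked = sorted(upgrades, key=lambda u: u.annual_savings / u.cost, reverse=True)
    chosen, spent = [], 0.0
    for u in ranked:
        if spent + u.cost <= budget:  # skip anything that would exceed the budget
            chosen.append(u)
            spent += u.cost
    return chosen, spent

if __name__ == "__main__":
    candidates = [
        Upgrade("A", "attic insulation", 1000.0, 300.0),
        Upgrade("B", "furnace replacement", 4000.0, 400.0),
        Upgrade("C", "wall insulation", 2000.0, 500.0),
    ]
    chosen, spent = prioritize(candidates, budget=3500.0)
    print([(u.house_id, u.measure) for u in chosen], spent)
```

Equity goals, such as reserving part of the budget for low-income households, could be layered on by running the same ranking within separate budget pools.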
Admittedly, more work remains. First, the training dataset must be expanded. All of the houses considered in this study were two-story wood-frame houses. Data from brick, stone, single-story, duplex, and other residences must be added to the growing database to extend the relevance of this research to the whole of the U.S. and to the rest of the developed world, where buildings generally have much higher thermal mass. The approach posed here could likewise be used for such buildings; however, new predictive features characterizing the construction type (brick, stone, etc.) would be needed to generalize the model. Additionally, features characterizing the placement of a residence relative to adjacent residences, such as single-family detached, condo, or apartment, could be added as predictors.
Further, there is an opportunity to combine data from smart WiFi thermostats and smart interval meters to expand the information derived. In the U.S., nearly 70% of residences are equipped with smart meters [54], and in Europe adoption of this technology is even more pervasive [55]. If both datasets were leveraged, the source power for cooling, heating (if a heat pump is used), and ventilation could be determined. As a result, the energy savings from HVAC retrofits could be estimated more accurately.
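If interval meter data and thermostat runtime were both available, one simple way to estimate the electrical power drawn by the cooling system is to regress each interval's energy on the fraction of that interval during which the thermostat called for cooling; the slope then approximates the HVAC power and the intercept the average non-HVAC load. The sketch below assumes hourly kWh readings, a per-hour runtime fraction, and a constant base load, all simplifying assumptions, and uses synthetic data rather than data from this study; the function name is hypothetical.

```python
import numpy as np

def estimate_hvac_power(interval_kwh, runtime_fraction, interval_hours=1.0):
    """Fit E / dt = P_base + P_hvac * rho by least squares.

    Returns (P_hvac, P_base) in kW. Assumes a constant non-HVAC base load.
    """
    e = np.asarray(interval_kwh, dtype=float) / interval_hours  # average kW per interval
    rho = np.asarray(runtime_fraction, dtype=float)
    slope, intercept = np.polyfit(rho, e, 1)  # highest-degree coefficient first
    return slope, intercept

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    rho = rng.uniform(0.0, 1.0, 500)                         # cooling runtime fraction per hour
    kwh = 0.8 + 3.5 * rho + rng.normal(0.0, 0.05, rho.size)  # synthetic: 3.5 kW AC, 0.8 kW base
    p_hvac, p_base = estimate_hvac_power(kwh, rho)
    print(f"HVAC power ~ {p_hvac:.2f} kW, base load ~ {p_base:.2f} kW")
```

In practice, the base load varies with occupancy and appliance use, so a time-of-day term or a robust regression would likely be needed for real meter data.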
In addition, this study used only one piece of thermostat-derived information (the indoor temperature power spectrum). The thermostat temperature set point history could and should also be considered. Finally, solar gains through fenestration clearly affect the dynamics of residences, especially those with large window areas, so future research should include dynamic solar irradiation inputs.

Data Availability Statement: The data are not publicly available due to privacy.
Acknowledgments: Alanezi, A. would like to acknowledge financial support from the Colleges and Institutes Sector at the Royal Commission for Jubail, Saudi Arabia.

Conflicts of Interest:
The authors declare no conflict of interest.