Forecast of Community Total Electric Load and HVAC Component Disaggregation through a New LSTM-Based Method

The forecast and estimation of the total electric power demand of a residential community, its baseload, and its heating, ventilation, and air-conditioning (HVAC) power component, which represents a very large portion of a community's electricity usage, are important enablers for optimal energy controls and utility planning. This paper proposes a method that employs machine learning in a multi-step integrated approach. An LSTM model for total electric power at the main circuit feeder is trained using historic multi-year hourly data, outdoor temperature, and solar irradiance. New key temperature indicators, TmHVAC, corresponding to the standby zero-power operation of HVAC systems for summer cooling and winter heating, are introduced using a V-shaped hourly total load curve. The trained LSTM model is additionally run with TmHVAC and zero irradiance inputs, yielding an estimated baseload, which is representative of typical occupancy patterns. The HVAC power component is disaggregated as the difference between total and baseload power. Total power forecasts of an aggregated residential community as seen by major distribution lines are experimentally validated with a satisfactory mean absolute percentage error (MAPE) below 10% based on a 4-year dataset from a representative suburban community with more than 1800 homes in Kentucky, U.S. Discussions regarding the validity of the separation method based on combined considerations of fundamental physics, statistics, and human behavior are also included.


Introduction
According to the Energy Information Administration, the residential sector accounts for 22% of the total U.S. energy portfolio [1]. In residential communities, which are the focus of this paper, heating, ventilation, and air-conditioning (HVAC) systems are the largest energy users and major contributors to the peak power [2][3][4]. For perspective, the approximately 45% of the residential sector energy use from HVAC systems [5] represents 10% of the total U.S. grid's demand. The accurate estimation and forecast of the total electric power and its components provide utilities with planning and control opportunities for optimal energy generation and use, and for avoiding large demand fluctuations.
Together with the specific thermal inertia of the building, an HVAC system may support temporary demand response (DR) controls and provide a distributed energy resource, including virtual energy storage capacity, without affecting occupancy thermal comfort [6,7]. Previous research shows that HVAC systems in a residential community may be controlled by similar charging and discharging procedures to electric batteries [8]. More recently, transactive energy approaches were proposed for automated HVAC controls and enabled the reduction of total electricity cost, while maintaining user comfort [9]. Another application of HVAC controls, based on Stackelberg game theory, contributed to reducing the mismatch between residential energy usage and renewable generation by more than 40% in a best-case scenario [10].
Utilities have electric power monitoring capabilities mostly at the aggregated community level, as recent smart meter-type technologies have yet to be widely deployed in the field at the building level and to contribute substantially to the historic collection of big data. As such, there is continued interest in community studies based on a variety of methods, such as multivariate quantile regression [11], deep neural networks (DNN) [12], quantile regression averaging on sister forecasts [13], and, more recently, long short-term memory (LSTM) neural networks, e.g., [14,15].
In the past decade, the penetration of smart metering in the United States has rapidly increased from less than 5% in 2008 to over 60% in 2019 [16], with more than 90 million devices installed. The increase in smart meter infrastructure represents a major shift in smart grid equipment deployment, but regulatory and demand flexibility barriers still exist in utilizing the smart meters in DR programs such as time of use (TOU) pricing. Furthermore, assessing load-specific information from HVAC systems and water heaters, the predominant-use devices, through field measurements requires additional direct load control (DLC) instrumentation such as [17] on top of smart meters, which poses a substantial cost and implementation barrier, including data management and processing difficulties [3].
In the absence of specific measured data, the HVAC power can alternatively be estimated through software that models the entire building energy usage, such as eQuest [18], BEopt [19], EnergyPlus [20], and OpenStudio [21], or through R-C equivalent circuit models, e.g., [7]. Based on a collection of representative building models and assuming a statistical distribution, the HVAC power load at the community level can be aggregated using methods such as Gaussian kernel density estimation (GKDE) [22]. Substantial development effort and uncertainties, including those associated with the physical characteristics of construction materials and with differing human behavior, result in questionable accuracy and generality and continue to be considered typical challenges for the computational models.
Provided that the data for the total residential load are available, the HVAC power may be identified, in principle, based on the observation that it is the component most sensitive to outdoor weather conditions. An example of a previous study into HVAC load disaggregation at the individual residence level was conducted using 30 min data representative of smart meter data for 85 homes [23], utilizing daily average outdoor temperature. The authors propose an hourly linear regression-based method including subtraction of average baseload profiles per season, which is found to have an error of approximately 8% across different buildings. Recent work by our group of authors, based on systematic experimental data from an individual smart home, reported good results for total forecasts and isolation of baseload and HVAC power components, by characterizing weather conditions through both outside temperature and irradiance [24].
Other previous work in individual residence HVAC load disaggregation includes a random forest machine learning model training procedure and pipeline optimization with detailed automated feature selection considering weather, calendar-based, pattern-based, and statistical features, among others [25]. The model was tuned using 182 homes and tested on 10, all from the Pecan Street experimental database, with an overall R² of 0.905 over eight days. Additionally, an hourly multi-sequence non-homogeneous factorial hidden Markov model (MN-FHMM) performed with an average error of 22% with a dataset of over 100 homes. A common factor between this model and an additional one described in [26] is that temperature was used as the only weather input. In practice, the effect of the solar irradiance on the building heating may be significant, as previously discussed for a computational and experimental study of smart "robotic" homes [19]. One set of authors combined multiple weather parameters, including solar irradiance, with frequency components ascertained from Fourier series analysis of high-resolution minute data to disaggregate both heating and lighting loads [27], but data at this resolution are not widely available, and further efforts into smart meter disaggregation are needed.
Fewer studies into aggregated HVAC load separation of entire distribution circuits including hundreds to thousands of homes were found. The linear-regression-based method from individual homes in [23] was replicated by the authors in [28] at the aggregate level of a residential community of 400 homes with verification against a Gridlab-D model of the community. Also at the community level, the authors of [29] improved another traditional multiple regression method using time of year, weather data, holidays, and varying cut-off temperatures to calculate total hourly load with an error of approximately 5% and disaggregate heating and cooling portions from over 80,000 residences and 8000 commercial buildings located in Canada. The authors call for further validation of the method and of the approach to selecting hourly temperature points for heating and cooling, as ground-truth sub-metering HVAC data are not available at this wide scale; this issue is explored further in our paper.
The current paper brings additional novel contributions specific to community-level applications addressing a research gap in HVAC load analysis, specifically, a procedure for identification of an outside temperature range for which the HVAC systems across the community do not operate from hourly "V-curves" of total power to outdoor temperature. Key temperatures from the identified range are used in a novel two-step machine learning (ML) method employing LSTM algorithms for the HVAC separation. Under zero irradiance and identified key TmHVAC temperature conditions, the baseload is estimated by the LSTM model and used to disaggregate the HVAC power component.
The paper is organized in multiple sections, with the next one being devoted to the problem formulation and the introduction of the experimental big data and its preliminary analysis for a representative residential community with all-electric air-conditioning in the summer and mixed natural gas and electricity heating supply in the winter. The proposed method for forecast and disaggregation is presented, together with a pseudocode algorithm, flowchart, and example results, in the third section. A fourth section further analyzes case study results and the selection of parameters, and demonstrates low errors for the total power forecast. A comparison with conventional linear regression results is presented in the fifth section, indicating the advantages of the proposed method both in terms of automated analysis and improved accuracy. The sixth section includes discussions on the validity of the separation method based on statistics and human behavioral patterns for the baseload and fundamental physics for the influence of outdoor weather conditions. Finally, the conclusions summarize the main findings, original contributions, and advantages of the proposed ML method.

Problem Formulation and Experimental Data
A main objective of the study described in this paper is to use systematically-collected historic data for electric power and weather conditions, i.e., outside temperature and solar irradiance, to produce a day-ahead forecast of the total electric load demand at community level and to separate, i.e., disaggregate, out of it the power component corresponding to the HVAC heating and cooling systems. The newly developed algorithms are based on ML models, which are of the black-box data-driven type, and therefore require "big data" consisting of very large time series.
A real-life case study for a representative suburban community from Kentucky, which is also relevant for a wider region of the US, is considered throughout the paper. The electric power experimental data for the four years, 2017 to 2020 inclusive, as measured at the main circuit feeder of the Liberty Rd. area served by the Louisville Gas and Electric and Kentucky Utilities (LG&E and KU) distribution system in Lexington, KY, have been collected with a 1 min time resolution. The weather data are provided with 5 min resolution by the National Oceanic and Atmospheric Administration (NOAA) (Figure 1). Within this aggregated community there are 1810 buildings, mostly houses used as family homes and residences. Space cooling in the summer is provided for all buildings via HVAC air-conditioners. For heating in the winter, 966 of the buildings employ electric heat pump HVAC systems, and the rest use natural gas furnaces. This equipment deployment, together with the weather conditions, contributes to explaining the electric power pattern and the peak usage illustrated in Figure 1c, which is in line with expectations for communities with dual heating supply/fuel.
The daily load demand for the total electricity used by the analyzed community, measured at the main circuit feeder in the years 2017-2019, has specific seasonal profiles. In comparison, the winter is bimodal, with the maximum load across the day happening in the early morning and with outliers reaching more than 7 MW due to instances of extreme cold.
Figure 1. The 2020 experimental data for weather and total electric power for the community considered in the study. All 1810 buildings employ electric air-conditioners for cooling during the summer, but only 53% of them use electric heat pumps for heating during the winter.
Such variations are common and are typically considered through categorical variables in quantile regression models such as the vanilla benchmark model [30], in which t is the time variable (min), y(t) the load, M_t denotes the month (1-12), W_t the week (1-4), H_t the hour within the day (1-24), and f(T_t) is another quantile regression function relating the temperature (°C or °F) at t to the categorical time of year variables. This model was originally employed as the basis for global forecasting competitions [31], and has later been improved to include the recency effect, i.e., the impact of temperatures at previous times [13]. These established relationships between time of year, weather, temperatures at previous time increments, and the total load have been further studied and identified in other papers relating to HVAC load separation and forecast. For example, an aggregated residential total load forecast that considers the time of year through one-hot encoding, the day of the week, and previous sequences of energy usage was reported in [14].
Another study, by a different group of authors, identified a time lag between an outdoor temperature increase and the resulting larger HVAC load, corresponding to the previously mentioned recency effect [32]. This referenced study included the influence of solar heat gains from sunlight in the HVAC model. For such reasons, the authors of this paper selected sequence inputs of the previous load power, outdoor temperature, and solar irradiance, as well as future weather inputs, to forecast the day-ahead future load for use in an HVAC disaggregation case study. The data were also split to include summer and winter months as separate datasets in order to account for the categorical time of year influence.
By accounting for the influence of previous and future weather in the model, the patterned portion of the load can be calculated with high confidence because the community-level HVAC disaggregation conducted in this paper has less uncertainty from weather variability. The impact of stochastic human-behavior-based portions of the total load is minimized at the community level as the randomness of individual schedules and decisions is smoothed. The aggregate effect of human behavior is seen in large-scale experimental studies of community-wide air-conditioning, lighting, dishwasher, and clothes washer/dryer loads utilized by the U.S. Department of Energy Building America Program and the National Renewable Energy Laboratory (NREL) [33]. This known phenomenon was used by our group in [34] to produce equivalent aggregated water heater models for community-wide virtual power plant (VPP) control studies.
For analysis of key features or inputs used in our case study, the total power and weather data were averaged on a daily basis and also integrated to calculate energy. The graphical results shown in Figure 3 are typically referred to as a daily "V-curve". The spread of data, which is influenced not only by temperature, but also by irradiance and other factors, is exemplified through the 95% confidence interval for power in cooling mode and a box plot representation for energy. It should be noted that extremely high or low values for the hourly temperature may be averaged out in the daily calculations, which may contribute to the relatively low number of outliers for power and energy. The proposed hourly model described in the following section accounts for these extreme temperatures by forecasting for each hour of the following day using the key weather parameters identified in this section.

Proposed Method for Forecast and Disaggregation
The proposed method, which is described in this section, is a combination of multistep machine learning algorithms based on big datasets, and of a physical engineering observation of the weather conditions under which the operation of the HVAC system is not required, i.e., standby, and hence its corresponding component electric power draw is substantially zero. To quantify these conditions as accurately as possible, the "V-curve" was derived using the extensive hourly data rather than the traditional daily averages (Figure 4a).
In line with expectations, the data spread in this case is larger than for the daily values, but the V-profile is still similar with two edges for heating and cooling, respectively, and an in-between region in which the total electric power is minimum, as it substantially covers only the "baseload", i.e., the sum of all other load components, apart from the HVAC. Such conditions may occur, for example, at night, when the irradiance is zero and the outside temperature is close to the set point for the indoor temperature, which has a typical 20 °C value, or during the day in the so-called "shoulder months" of spring and fall when, due to the combined effect of mild temperature, irradiance, and building thermal insulation, there is no need for heating or cooling.
As a first step of the method, two outdoor temperature key indicators, denoted by TmHVAC, one for heating and one for cooling, are introduced to identify the minimum and the maximum temperature, respectively, for which the HVAC systems are on standby and using virtually no electricity. The proposed mathematical procedure to select the TmHVAC points is described in Algorithm 1.

Algorithm 1: Mathematical process for TmHVAC temperature selection.
Prepare year of total power and outdoor temperature data
Select range of potential cooling TmHVAC options, e.g., 16
Repeat iteration procedure with temperatures below the TmHVAC heating options
Select TmHVAC cooling and heating points with the highest R² value

A wide range from 10 to 18 °C was considered for the TmHVAC heating point, and 16 to 22 °C was studied for the TmHVAC cooling point. Within these ranges, the hourly data were divided into subsets, and linear fitting was performed as illustrated in gray in Figure 4a. The temperatures corresponding to the highest values for the coefficient of determination, or the R² goodness of the fit, which for the example community study are 15.5 to 17.5 °C, are shown in Figure 4b and may be recommended for further use. As will be discussed later, the selection of TmHVAC may also consider the data spread, for example, at very low temperatures due to gas heating for part of the community, and the need to achieve a better fit particularly at high loads, in order to ensure that the demand is fully met under extreme conditions.
Also plotted and illustrative of the correlation are the weather conditions, which are specified in the p.u. system with a reference outdoor temperature of 40 °C and a solar irradiance of 1000 W/m², which is used throughout the paper unless specified otherwise.
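The selection procedure of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `r_squared` and `select_tm_hvac` are hypothetical, and the interpretation that each candidate temperature defines a one-sided subset of the hourly data (hours at or above the candidate for cooling, at or below it for heating) is an assumption based on the description above.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of a least-squares line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    if ss_tot == 0.0:  # constant load: R^2 is undefined, treat as no fit
        return 0.0
    return 1.0 - ss_res / ss_tot

def select_tm_hvac(temp_c, power, candidates, side):
    """Return (candidate, R^2) for the one-sided linear fit with the highest R^2.

    Assumption: side="cooling" fits the hours at or above each candidate
    temperature; side="heating" fits the hours at or below it.
    """
    best_t, best_r2 = None, -np.inf
    for t in candidates:
        mask = temp_c >= t if side == "cooling" else temp_c <= t
        if mask.sum() < 3:  # too few points for a meaningful fit
            continue
        r2 = r_squared(temp_c[mask], power[mask])
        if r2 > best_r2:
            best_t, best_r2 = t, r2
    return best_t, best_r2
```

With measured data, `candidates` would span the ranges studied in the text, for example 16 to 22 °C for cooling and 10 to 18 °C for heating. The final choice may still be adjusted on engineering grounds, as the text notes for the data spread at very low temperatures.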
In the second step of the method, a machine learning (ML) black box model based on a sequence-to-sequence encoder-decoder long short-term memory (LSTM) algorithm of the type previously introduced by the authors for home-level applications [24] was adapted for community-level studies and employed for the forecast of the total electric power load. This ML model was trained using hourly integrated power and weather data for the example community over the winter and summer for three years, 2017 to 2019. The input data, ML gate, were structured in series of consecutive 72 h, i.e., 3 full days, and an additional 24 h array for the day-ahead weather forecast. The results, ML gate, are represented by 24 h of total electric power forecast. The model was tested over the year with the most recently available data, which is 2020, by providing, instead of a weather forecast, the actual data.
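The input structure described above, 72 h of history plus a 24 h day-ahead weather array mapped to 24 h of total power, can be sketched as follows. The function name `make_windows`, the channel ordering, and the choice to advance one day per sample are assumptions made for illustration; the network itself is not shown.

```python
import numpy as np

def make_windows(power, temp, irradiance, history=72, horizon=24):
    """Build (encoder input, decoder input, target) arrays from hourly series.

    Encoder input: previous `history` hours of power, temperature, irradiance.
    Decoder input: next `horizon` hours of temperature and irradiance.
    Target: next `horizon` hours of total power.
    """
    past = np.stack([power, temp, irradiance], axis=-1)
    future_weather = np.stack([temp, irradiance], axis=-1)
    enc, dec, tgt = [], [], []
    # Assumption: step one day at a time so each sample forecasts a whole day.
    for start in range(0, len(power) - history - horizon + 1, horizon):
        split = start + history
        enc.append(past[start:split])
        dec.append(future_weather[split:split + horizon])
        tgt.append(power[split:split + horizon])
    return np.array(enc), np.array(dec), np.array(tgt)
```

For ten days of hourly data this yields seven samples, with shapes (7, 72, 3), (7, 24, 2), and (7, 24).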
The comparison between the forecasted and the measured total electric power was satisfactory, as illustrated by the example summer and winter weekly profiles included in Figure 5. The last day in the week shown has a decreased load caused by extremely low irradiance. The model accounts for changes in weather conditions and resulting impact on total power demand across the community and was able to predict the reduced load. Utilities would benefit from the proposed day-ahead forecast to schedule generation needed on days outside of typical weather, such as very hot summer days and very cold winter days. It is also important to note that the specific neighborhood selected has a near-constant occupancy level as construction is completed and no new houses are to be added. Scaling to account for a 1.25% growth rate could be applied to the final forecasted result from the proposed model to account for the increase in population but was not considered for this case study.
In a third step of the method, the previously established and trained LSTM model that employs 72 h data series was run, in an innovative engineering-type approach, with constant TmHVAC values, selected for the cooling and heating season, respectively, and with zero irradiance. The electric power results constitute the estimated baseload for the community and correspond to the situation in which the HVAC systems are on standby and hence using virtually no electricity.
In the fourth and final step of the method, the HVAC load component is disaggregated, i.e., separated, through the subtraction of the estimated baseload from the total electric power. The results for the previously considered example weeks are shown in Figure 6. During the summer, the effect of the thermal inertia of the buildings, all of which operate electrically powered HVAC air-conditioners, is noticeable and provides reassurance that the proposed ML method is consistent with the expected physics-based behavior.
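The third and fourth steps can be sketched in a few lines, assuming a trained model is available as a callable. Everything named here is hypothetical: `predict` stands for any trained forecaster, the encoder channels are assumed to be ordered (power, temperature, irradiance) and the decoder channels (temperature, irradiance). Whether the weather channels of the 72 h history are overwritten as well as the day-ahead inputs is also an assumption of this sketch.

```python
import numpy as np

def disaggregate_hvac(predict, enc_in, dec_in, tm_hvac):
    """Estimate total load, baseload, and HVAC load from one trained model.

    predict(enc, dec) -> array of forecast power (hypothetical callable).
    enc_in[..., 1:3] and dec_in[..., 0:2] hold temperature and irradiance.
    """
    total = predict(enc_in, dec_in)

    # Step 3: rerun the same model with constant TmHVAC and zero irradiance.
    enc_base = enc_in.copy()
    dec_base = dec_in.copy()
    enc_base[..., 1] = tm_hvac  # temperature in the 72 h history (assumed)
    enc_base[..., 2] = 0.0      # irradiance in the 72 h history (assumed)
    dec_base[..., 0] = tm_hvac  # day-ahead temperature
    dec_base[..., 1] = 0.0      # day-ahead irradiance
    baseload = predict(enc_base, dec_base)

    # Step 4: the HVAC component is the difference between total and baseload.
    hvac = total - baseload
    return total, baseload, hvac
```

The sketch leaves the difference unclipped, so hours in which the estimated baseload exceeds the total forecast would show a negative HVAC value; how such hours are treated is not stated in the text.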
The overall procedure is summarized through the combined pseudocode of Algorithm 2 and the flowchart in Figure 8. Also included in this figure are the 2020 test year results for forecasted total power and baseload and HVAC components, showing that during the summer, the baseload maintained a fairly repetitive load profile and the HVAC variations are dependent on weather conditions. Unless specified otherwise, the values used for TmHVAC throughout the paper are 12 and 18 °C for heating and cooling, respectively, for reasons explained in a later section.

Case Study Results
The overall winter seasonal results obtained by applying the method to the studied community are plotted in heatmap format in Figure 9, similarly to the results provided for the summer in Figure 8b. A specific winter pattern of bimodal daily peaks in the morning and evening, which corresponds to weather conditions and typical human behavior aggregated at community level, is noticeable. For this community, in the winter the contribution of the base load is expected to be more substantial, especially when considering that approximately half of the buildings are heated with natural gas furnaces. The summer daily-specific profile has only one typical peak in the late afternoons into the evenings, when all buildings are electrically air-conditioned and most people are expected to be at home.
The overall performance of the LSTM model was analyzed and quantified separately for the summer and winter (Figure 10). The residual for the summer results has a mean of −137 kW and a standard deviation of 361 kW. The corresponding values for the winter results are 33 kW and 302 kW, respectively. The mean residual percentage errors are 2.3 and 0.5% of the maximum forecasted community load, which is 5.965 and 6.051 MW in the summer and winter, respectively. The mean absolute percent error (MAPE) across the entire test periods is 9.5% for the summer and 7.3% for the winter. Based on the trends observed and on the error analysis, the LSTM model for the community total electric power forecast can be considered as satisfactory.
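The error measures quoted above can be computed with a short helper. This is a sketch under stated assumptions: the function name `forecast_errors` is hypothetical, and the sign convention of the residual (measured minus forecast) is assumed, since the text does not state it. As a consistency check on the reported figures, 137 kW / 5965 kW is approximately 2.3% and 33 kW / 6051 kW is approximately 0.5%.

```python
import numpy as np

def forecast_errors(measured, forecast):
    """Residual statistics and MAPE for a pair of hourly power series."""
    measured = np.asarray(measured, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    residual = measured - forecast  # sign convention is an assumption
    return {
        "mean_residual": residual.mean(),
        "std_residual": residual.std(),
        # Mean residual as a percentage of the maximum forecasted load.
        "mean_residual_pct_of_peak": 100.0 * abs(residual.mean()) / forecast.max(),
        # Mean absolute percentage error relative to the measured load.
        "mape": 100.0 * np.mean(np.abs(residual) / np.abs(measured)),
    }
```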
The proposed LSTM model is satisfactory as its MAPE is comparable with other recent studies on day-ahead hourly forecasts at the aggregate level on residential distribution circuits, such as about 8% in [14] and 11% in [35]. Improvement to the state-of-the-art forecasting accuracy at the distribution level would assist in generation purchasing and in reducing the use of expensive fast-responding generation scheduled by utilities only as needed during the day. It is important to note that, as machine learning models improve, the HVAC disaggregation method proposed in this paper can still be utilized, because the engineering insight of using TmHVAC values as a general input to a trained model for baseload forecasts is universal.
The TmHVAC key outdoor temperature indicator was introduced and defined with respect to Figure 4. The results for the total power forecast are independent of TmHVAC, but the separation of baseload and HVAC component is not. For the example community in the winter, the selection of TmHVAC, as the minimum temperature above which the HVAC system is assumed to be in standby zero electric power mode, is challenging because approximately half of the buildings employ natural gas furnaces. In this case, a value of 12 °C was preferred for TmHVAC on additional considerations, including those related to the parametric linear study depicted in Figure 4, in order to obtain a better fit for very low temperatures and high total electric loads, and ultimately ensure that the demand is met under critical conditions. Further reassurance for the selection is provided by the analysis of the energy for the HVAC community component as a percentage of the total. The daily values have been calculated through the time integration of profiles such as those illustrated in Figures 5 and 6, and for this example week the HVAC share of the total energy ranges between 8 and 24%. When considering the fact that half of the houses do not use electricity for heating, the results at individual residence level may be considered as consistent with reports based on larger-scale surveys [2]. Another approach, which may be considered for the future, when large numbers of smart meters are expected to be deployed in the field at individual house level, would be to statistically determine TmHVAC for the community based on experimental big data.
For the summer, a parametric study was conducted considering TmHVAC values between 16 and 20 °C, and the disaggregated results for the HVAC load component for an example day are plotted in Figure 11. The daily profiles for HVAC and for its complementary estimated baseload maintain their specific curve shapes, but the peak values may change. The in-between value of 18 °C is recommended, and not only as a trade-off. Furthermore, this value is well supported by the correlation study from Figure 4 and also on the basis of considering regional specifics for the building thermal insulation, and the relatively high likelihood of simultaneous occurrence of mild temperature and high solar irradiance that may contribute to natural heating.

Comparison with Conventional Approaches
Traditional models for load forecast are based on linear regression (LR), e.g., [30]. Such an LR model, typically referred to as power per cooling degree, was implemented and employed for a summer study of our example community based on the optimal linear data fit for the total electric power previously discussed in Section 2 and shown in Figure 3. The LR results from Figure 12a were purposefully selected to illustrate a fortunate situation of satisfactory agreement between measurements and forecast throughout the week with the notable exception of the second day.
A systematic overall analysis of the LR results shows that the good agreement may be rather occasional because there is a wide spread of the residual over the entire power range, pointing out the disadvantages in terms of accuracy of traditional analysis versus the newly proposed ML method based on LSTM-type ML algorithms (see Figures 10b and 12b). Yet another advantage of the ML method is that it is fully automated on a numerical computer and virtually eliminates the reliance on the analyst experience, as is typically the case for LR studies.
A baseload profile, detailed for 24 h and invariable from day to day, was also estimated based on the hourly V-curve for power. The temperature range for which the HVAC system is considered to be on standby at zero power was approximated as 12 to 18 °C. Due to the fact that there are not enough data points within this temperature range during the daytime in the months of June to August in 2017, 2018, and 2019, the dataset was extended to include the month of May, under the assumption that human behavior is comparable in late spring and summer. The average value of the resultant baseload compares satisfactorily with the corresponding value from the proposed LSTM-based method and has rather similar profile variation. This provides additional confidence in the new ML method. It further means that the errors noted for the LR estimation of total power are passed on to the HVAC power component disaggregated through difference, and that the new ML method is advantageous in this respect as well.

Discussion
In principle, the forecasted baseload and the HVAC power component disaggregated with the proposed ML method can be experimentally validated against measurements. For the total electric power, such validation was performed at the aggregated community level as part of the study. For the power components, as previously discussed, the process requires information at the level of the individual building power supply. Although dedicated intrusive load monitoring (ILM) equipment and methods are available [3], the cost and the effort associated with the field deployment of equipment-specific instrumentation within thousands of buildings are currently prohibitive and limit verification of the proposed approach. More specifics on the current state of energy monitoring in the United States are included in Section 1.
Steps towards the validation of the community-level components for baseload and HVAC power are described in the following by comparing calculations with data trends from experiments and alternative models that are already established. In a previous section of the paper, such a comparison has been already discussed with favorable conclusions versus the summer baseload estimated using more traditional LR methods.
Additional satisfactory verification of this summer typical baseload was conducted. In principle, a day with a constant temperature of 18 °C, equal to the recommended TmHVAC for cooling, would provide the experimental baseload as the HVAC will be in standby zero-power mode. In reality, such a full day does not exist as the temperature varies and tends to be substantially higher in the afternoon. Instead, synthetic data for a summer day of constant temperature was produced for each hour by averaging the power data within a one-degree range of the specified 18 °C, with the averages plotted in Figure 13, considering suitable data from the extended summer of May to August, inclusive, or throughout the entire year, respectively.
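The construction of the synthetic constant-temperature day can be sketched as an hour-by-hour conditional average. The function name `synthetic_baseload_day` is hypothetical, and reading "a one-degree range" as a band of plus or minus 0.5 °C around the target is an assumption; a band of plus or minus 1 °C would only require changing `half_width`.

```python
import numpy as np

def synthetic_baseload_day(hour_of_day, temp_c, power, target_c=18.0, half_width=0.5):
    """Average measured power per hour of day over samples near target_c.

    Returns a 24-element profile; hours without any qualifying sample are NaN.
    """
    hour_of_day = np.asarray(hour_of_day)
    temp_c = np.asarray(temp_c, dtype=float)
    power = np.asarray(power, dtype=float)
    profile = np.full(24, np.nan)
    near_target = np.abs(temp_c - target_c) <= half_width
    for h in range(24):
        selected = near_target & (hour_of_day == h)
        if selected.any():
            profile[h] = power[selected].mean()
    return profile
```

Hours that remain NaN indicate where the dataset must be extended, which corresponds to the motivation given in the text for including May or the entire year.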
The HVAC electric power component disaggregated with the proposed ML method follows weather variations, as expected, based on fundamental physics rules. For example, in Figure 6a, the latency of HVAC power with respect to temperature, due to building thermal inertia, is clearly illustrated. Furthermore, within this example summer week, the daily integration of the total electric power and its components yields the percentage of the HVAC energy from the total to be in between 30 and 46%, such values being consistent with reports based on larger-scale surveys [2].
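The daily energy percentages mentioned above follow from integrating the hourly profiles over each day. A minimal sketch, assuming hourly samples that start at midnight and cover whole days (the function name `daily_hvac_share` is hypothetical):

```python
import numpy as np

def daily_hvac_share(total_kw, hvac_kw):
    """Percentage of daily energy attributed to HVAC, one value per day."""
    total = np.asarray(total_kw, dtype=float).reshape(-1, 24)
    hvac = np.asarray(hvac_kw, dtype=float).reshape(-1, 24)
    # With hourly samples, summing kW values over a day gives energy in kWh.
    return 100.0 * hvac.sum(axis=1) / total.sum(axis=1)
```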
A further verification that the new method aligns with sound scientific fundamentals is the favorable comparison of its results with those provided by models scaled up from representative buildings. One such building, a medium-sized, three-bedroom house of the conventional type, equipped with a standard SEER 13 HVAC system and considered representative for the region, has been developed as part of another experimental project with support from another large utility, the Tennessee Valley Authority (TVA) [36]. The house has been modeled with the widely used EnergyPlus software [19].
The physics-based white-box model of the representative building includes details of the HVAC system and physical parameters, such as surface area, windows, wall thickness, roofing, and thermal insulation, and its results include the HVAC electric power. This can be scaled up to community level using, for example, the advanced techniques proposed in [22]. For simplicity, in the current study, the scaling was performed through direct multiplication by the 1810 buildings in the community. For an example day, 7 June 2020, the p.u. results with a 3.35 MW base are plotted in Figure 14 together with the HVAC power component disaggregated with the proposed ML method. The two curves show satisfactory agreement, which provides further confidence in the new method.
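The direct-multiplication scaling and the per-unit conversion can be written as a short Python sketch. The building count and the 3.35 MW base are the values stated above; the function name and the assumption that the single-house HVAC power is available in kW are illustrative only.

```python
import numpy as np

N_BUILDINGS = 1810   # number of buildings in the community, as stated above
BASE_MW = 3.35       # per-unit base used for Figure 14


def community_hvac_pu(house_hvac_kw, n_buildings=N_BUILDINGS, base_mw=BASE_MW):
    """Scale one representative house's HVAC power to the community, in p.u."""
    house_kw = np.asarray(house_hvac_kw, dtype=float)

    # Direct multiplication by the building count, then kW -> MW.
    community_mw = house_kw * n_buildings / 1000.0

    # Normalize by the chosen power base.
    return community_mw / base_mw


# Hypothetical hourly HVAC power of the representative house, in kW.
example_pu = community_hvac_pu([0.5, 1.0, 1.5])
```

This simple scaling treats every building as identical to the representative house, so it cannot capture diversity among buildings, which is the motivation for the more advanced techniques cited above.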
The proposed ML method is well suited for short-term day-ahead forecasts, which enable utilities to optimize generation dispatch plans in advance, so that the lowest-cost units can be prioritized and day-to-day capacity shortfalls avoided. Additionally, because the proposed scheme disaggregates the HVAC power component, it can support detailed studies that further the economic benefit to utilities by estimating the effect, and the cost reductions, of smart HVAC load management and demand curtailment. Such measures can reduce the need for additional capacity and for distribution and transmission system upgrades, and can facilitate the incorporation of additional intermittent renewable resources by synchronizing HVAC loads with the available renewable power.
For the summer, the detailed HVAC-based studies may include establishing the available energy and optimal timing for demand response (DR) programs, during which large groups of individual HVAC air-conditioners are switched on and off as an aggregated virtual power plant (VPP) in order to shift and possibly reduce the peak electric load [7]. For the winter, they may include the accurate prediction of the HVAC and total power load under extremely low temperature conditions, in order to ensure that the available generation can fully meet the demand and, hence, reliably maintain thermal comfort throughout all the buildings in the community.
Studies of long-term HVAC technology evolution and field deployment may also be supported. As HVAC systems with a higher seasonal energy efficiency ratio (SEER) are gradually introduced for more efficient summer air-conditioning, reduced cost of energy use, and compliance with new industry requirements, e.g., [37], the corresponding HVAC load component is expected to decrease. The associated forecasted profile can then be combined with the estimated baseload to derive the predicted total load requirements, which is useful for planning purposes. The approximately half-and-half split between winter heating with natural gas furnaces and with electric heat pumps in the community analyzed provides a good basis for long-term studies in which new HVAC systems are deployed based on the evolution of technology, fuel type availability and cost, and possibly new environmental regulations, e.g., [38]. The improved HVAC-specific forecasts and long-term development studies could allow utilities to better estimate capacity expansion needs and could help prevent economic loss due to the construction of excess generation capacity. They could also lead to the retirement of unnecessary or underused generation facilities.
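As a rough illustration of how a forecasted HVAC profile for higher-SEER equipment could be combined with the estimated baseload, the sketch below scales the cooling component by the ratio of the old to the new SEER. This inverse proportionality, the adoption fraction, the function name, and the example SEER value of 16 are simplifying assumptions introduced here, not results or methods from the study.

```python
import numpy as np


def projected_total_load(baseload, hvac, seer_old, seer_new, adoption=1.0):
    """Combine baseload with an HVAC profile rescaled for a new SEER rating.

    Assumption: for the same cooling delivered, electric input scales as
    seer_old / seer_new. 'adoption' is the assumed fraction of systems
    replaced (0.0 to 1.0); the remainder keep the old efficiency.
    """
    baseload = np.asarray(baseload, dtype=float)
    hvac = np.asarray(hvac, dtype=float)

    # Weighted scale factor: unchanged systems plus upgraded systems.
    scale = (1.0 - adoption) + adoption * seer_old / seer_new

    # Baseload is assumed unaffected by the HVAC upgrade.
    return baseload + scale * hvac


# Hypothetical hourly values in MW; SEER 13 is the standard system named
# in the text, SEER 16 is an illustrative replacement value.
future_total = projected_total_load([1.0, 1.2], [2.0, 2.6],
                                    seer_old=13.0, seer_new=16.0,
                                    adoption=0.5)
```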

Conclusions
The ML-based method proposed in this paper addresses, at the community level, the timely topic of day-ahead forecasting, with a view to enabling optimal energy controls and utility planning. A first advantage of the method is that it requires only minimal historic hourly information: the total power measured at a main distribution line, which includes the sum of all loads on the branch, and the outdoor temperature and solar irradiance, which are typically available from public databases. The method is shown to be superior to traditional linear regression approaches in terms of combined automated operation and higher accuracy. This has been demonstrated for total power through satisfactory comparison with experimental data, with a MAPE below 10%, for a suburban community in Kentucky that is representative of a wider U.S. region.
Another advantage of the proposed method is that it separates the baseload and the HVAC components out of the total power. This is possible through the introduction of new key temperature indicators corresponding to the standby zero-power operation for the HVAC systems for summer cooling and winter heating and an innovative additional run of the trained LSTM model with such constant temperature and zero irradiance. The validity of the components' estimation and disaggregation is supported by favorable findings, in line with expectations based on fundamental physics, statistics, and human behavioral patterns. Furthermore, the economic benefits of the proposed two-step HVAC disaggregation model include lower costs for generation planning, use of more intermittent renewable generation resources, and cost-benefit assessments of HVAC load management and controls.

Data Availability Statement: Not applicable.