Evaluating Building Energy Code Compliance and Savings Potential through Large-Scale Simulation with Models Inferred by Field Data

Building energy code compliance is the crucial link between actual energy savings and the efficiency prescribed in energy codes. A research project was conducted in several U.S. states to identify opportunities to reduce energy consumption in new single-family residential construction by increasing compliance with the building energy code. The study comprised three phases: (1) a baseline study to document typical practice and identify opportunities for improvement based on empirical data gathered from the field; (2) an education and training phase targeting the opportunities identified; and (3) a post-study to assess whether a reduction in average state-wide energy use could be achieved following the education and training phase. We proposed a novel methodology, based on large-scale building energy simulation with models inferred from limited field data, to assess the performance of a large population of homes. This paper presents the methodology, findings, and results of this study. The state-wide average energy consumption decreased from Phase I to Phase III for five of the seven states involved in the analysis. The measure-level savings potential analysis shows an overall reduction. Overall, the education and training phase played a recognizable role in improving compliance with building energy codes.


Introduction
In the United States (U.S.), building energy use was responsible for 40% of total energy consumption and 75% of total electricity consumption in 2016 [1]. As a cost-effective means of reducing energy use, building energy codes, which govern building construction to meet minimum energy requirements, have been implemented and regularly strengthened for new and existing buildings in many countries [2].
Building energy codes have many advantages, such as lower utility bills for consumers, improved energy resilience, health and comfort, environmental sustainability, and a lower need for energy subsidies [2,3]. Studies have shown that building energy codes have led to a 6%-22% reduction in average annual energy consumption per dwelling in the residential building sector of the European Union [2,4], have the potential to curtail energy usage and CO2 emissions by 13%-22% by 2100 [...] building energy-efficiency study for a variety of research and practical purposes. Compared to in situ building experiments, building energy simulation provides a numerical experiment with a relatively fast, low-cost, and controllable environment to investigate the impact of design choices and technologies on a building's overall energy performance. There are many sophisticated building energy modeling tools that apply physics-based principles to simulate detailed building energy patterns. EnergyPlus [27] is one that has been used widely for building energy code development in the U.S. and was thus chosen for the evaluation of code compliance in this study.
There are several challenges in using building simulation to evaluate the energy impacts of code compliance. One of the primary challenges is having enough data inputs to inform the building energy model. When building energy simulation is used to compare individual building design options and technologies or evaluate retrofit measures, model inputs can be derived from building design blueprints, building permits, or from actual observations of individual buildings under retrofit. Prototypical building models are generally used to support energy code development or evaluate energy efficiency measures for a population of buildings [28][29][30]. The general energy characteristics of the prototypical buildings are known, as are the operations and control schemes [31,32]. However, these assumptions are not necessarily valid when developing a building simulation tool based on code compliance measurements taken from actual buildings.
The evaluation of code compliance requires in-field data collection to compare results against minimum code requirements. A single-family home takes several months to complete, and the construction process is complex. This makes it very difficult to know whether a home complies with the energy code in its entirety, as not all energy-efficiency measures are in place or visible at any given point during construction. For example, when homes are visited during earlier stages of construction, key features affecting energy performance (e.g., walls with insulation) may not be in place yet. Conversely, when homes are visited during later stages, these items may no longer be observable because they are already covered. Therefore, to gather all the data required in the sampling plan, field teams needed to visit homes in various stages of the construction process [24]. Multiple site visits to the same home during different construction phases would not only increase the survey cost but also bias the data collection, because a builder who knows that follow-up compliance assessments are coming may alter their practice. To avoid these potential biases, field visits are conducted on a small sample of buildings, with each home visited once and code items recorded from a variety of different homes. As such, no single home provides a complete representation of its compliance.
This shortage of complete data for individual homes introduces an analytical challenge, because building energy simulation tools require a complete set of inputs to generate reliable results [24]. Since comprehensive field surveys for the energy simulation of individual buildings become impractical for a code compliance evaluation at the scale of an entire state, this study leverages a novel modeling framework that combines limited field data collection with large-scale simulation. The framework covers all aspects of conducting a residential energy code field study in single-family homes, including sampling homes under construction for field visits, data collection during a site survey, and the subsequent building energy simulation and analysis. The research questions we address are: what is the status quo of code compliance in new single-family homes in a state in terms of building energy consumption; what is the energy impact of non-compliance with energy codes; and can targeted education and training reduce non-compliance and its impact.
The entire study includes three phases. Phase I establishes a baseline to evaluate the status quo of energy use in typical new residential construction in the state and to identify specific code items that are not complied with and can therefore be targeted to achieve better energy savings. Once specific code measures are identified, specific education, training, and outreach activities can be developed for Phase II. These multi-year activities are offered to builders to improve compliance rates and installation practices. Phase III is the final stage of the field studies, in which follow-up field data collection is conducted following the same survey methodology as Phase I. This paper focuses on the complete data analysis of Phase III and the comparison of analysis results between Phase III and Phase I, providing results on the impact of the education and training activities on code compliance and energy-savings potential.
The remainder of this paper is organized into five sections. Section 2 provides an overview of the three phases of this study and the states participating in the initial pilot. Section 3 introduces the aspects of the framework that have been described and applied in Phase I [33] and part of Phase III [34], with a focus on the comparison between phases. Section 4 presents the results for seven U.S. states. Section 5 describes the larger impact and opportunities that the code studies present. Section 6 concludes the paper and summarizes the key contributions.

Background
Building energy codes save energy, and savings can be theoretically quantified through code-book to code-book comparison with the aid of computer simulation. However, construction is neither a simulation model nor a physical laboratory; savings that assume perfect code compliance do not reflect reality. The commonly used checklist compliance rate approach has weaknesses because it is assumed to be a proxy for energy, but that connection was never empirically established [35]. Little research has been done to evaluate compliance in a consistent and reproducible manner, due to the complex nature of this matter [36]. To address the lack of information available on energy code impacts, the U.S. Department of Energy initiated an Energy Code Field Study to help document baseline practices, target areas for improvement, and further quantify the related savings potential [24]. This information is intended to assist states in measuring energy code compliance and to identify areas of focus for future education and training initiatives [37].
A multi-year residential energy code field study was initiated by the United States Department of Energy (U.S. DOE) in 2015. The goal of the study was to determine whether an investment in education, training, and outreach programs targeted at improving code compliance can produce a significant, measurable change in single-family residential building energy use [35]. The study consists of (1) establishing a framework to evaluate the current status of code compliance and quantify code-related energy savings opportunities in single-family residential construction, and (2) testing whether compliance could be improved through energy code education, training, and outreach activities. Eight U.S. states, including Alabama (AL), Arkansas (AR), Georgia (GA), Kentucky (KY), Maryland (MD), North Carolina (NC), Pennsylvania (PA), and Texas (TX) participated in the pilot study by responding to the U.S. DOE Funding Opportunity Announcement (FOA), "Strategies to Increase Residential Energy Code Compliance Rates and Measure Results" [37,38].
The study includes three phases. A framework for evaluating residential building code compliance was developed during Phase I. The framework includes plans for site surveys, protocols for data collection, and a methodology for data analysis including EnergyPlus simulation. The analysis methodology replaces the historic compliance rate approach with the use of building energy simulation [24]. Prototype building models are used for the analysis. Limited field data are collected, and bootstrap sampling [39,40] is applied to generate inputs for many building models on which EnergyPlus simulation is conducted. The bootstrap is a widely used, computationally intensive statistical tool that improves the statistical assessment of a population by repeatedly sampling, with replacement, from the empirical distribution of a sample. It has seen increasing use in the energy efficiency area [41][42][43]. In the context of assessing code compliance in a state, the population is all the new homes constructed in one year. While it is not possible to survey all homes under construction, ideally one wants to draw large, non-repeated samples from the population. However, one is generally limited to one sample with limited instances because of limited resources. Like other statistical tools, the bootstrap is based on the plug-in principle, which is to substitute something unknown with an estimate [39,40]. For example, one uses the sample mean as an estimate of the population mean. With the bootstrap, one goes one step further: instead of plugging in an estimate for a single parameter, one plugs in an estimate for the whole population by treating this single sample as a mini population, from which repeated samples are drawn with replacement. The developed methodology has previously been applied to field data collected during Phase I in the eight pilot states funded by the FOA [33].
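The plug-in idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the observation values, the function name `bootstrap_means`, and the number of resamples are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Resample `sample` with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Treat the single field sample as a "mini population" and draw from it
        # with replacement, keeping the resample the same size as the original.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    return means


# Hypothetical field observations of one code item (illustrative values only).
observed = [13, 13, 15, 13, 19, 13, 15, 13, 13, 21]

means = bootstrap_means(observed)
est_mean = statistics.fmean(means)   # bootstrap estimate of the population mean
est_se = statistics.stdev(means)     # spread of the bootstrap means (standard error)
print(f"mean ~ {est_mean:.2f}, standard error ~ {est_se:.2f}")
```

The spread of the resampled means gives a sense of how uncertain the estimate is even though only one field sample was collected.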
The analysis identified gaps in code compliance, and those to-be-improved code items became targets for training, education, and outreach activities. The energy-savings potential of to-be-improved code items is also estimated [33].
Following Phase I, seven of the eight pilot states (Arkansas dropped out after Phase I) spent two years implementing a variety of intervention strategies focused on the to-be-improved code items identified in Phase I. The education, training, and outreach activities included in-person trainings; circuit rider assistance with code officials or builders; distribution of code books, compliance guides, and energy stickers for panel certificates; online videos; and workshops with presentations. These Phase II activities varied by state based on local stakeholder preferences and other state-specific constraints.
The Phase III field data collection and analysis are based on the same framework developed and applied in Phase I, aiming to assess the effectiveness of the education, training, and outreach activities of Phase II. Partial results of four pilot states at Phase III were previously reported [34]. All pilot states (except for Arkansas, which dropped out of the study) have completed the Phase III data collection and analysis. Additionally, a dozen other states used the methodology to start single-phase studies evaluating the current status of code compliance and quantifying code-related energy savings opportunities, with the U.S. DOE providing the technical analyses through the Pacific Northwest National Laboratory (PNNL). This paper focuses on the results of the pilot states that have completed the full three-phase study.

Methodology
The framework and analysis methodology developed has been described in [44]. For completeness, a brief introduction has been included in this section with a diagram shown in Figure 1.

Key Code Items
Building energy codes regulate many building characteristics. In this study, the methodology [44] evaluates the seven key code items shown in Table 1, which are a subset of code items identified through simulation and analysis as having the largest direct impact on residential energy consumption.
In identifying the key items, all the requirements in the 2009 International Energy Conservation Code (IECC), the most commonly adopted code in U.S. states at the time of this work, were reviewed, and a list of insulation and fenestration requirements as well as air leakage, duct leakage, and lighting requirements was prepared. The list was sent out for public review by all stakeholders involved in the study. The finalized list is shown in Table 1 and is consistent with hundreds of analyses and millions of simulation runs conducted by PNNL and other organizations over the past decades. The items on the list are present in some form in all code versions since the 2009 IECC, providing ample flexibility for performing comparisons across multiple code editions [44].

Sample Size and Data Collection
A statistical analysis was conducted based on sensitivity analysis employing whole building energy simulation to investigate the impact of key code items. A sample size of 63 was established as the minimum sample size to identify the desired building energy usage difference [44].
Since data collection from field surveys is very costly, the number of homes to be sampled for field data collection is limited. A sensitivity analysis was carried out to determine a minimum sample size that would ensure the statistical validity of the study. The sensitivity analysis applied whole building energy simulation to investigate the energy impact of the key code items (Table 1), individually as well as all together, for the pre-training Phase I and the post-training Phase III [33,44]. Since no data were available before the actual data collection of Phase I, a Delphi process [45] was used to survey several residential building energy code experts to determine the ranges and likelihoods of values for the key code items, as well as reasonable changes in both the ranges and their likelihoods in Phase III, after the education, training, and outreach activities of Phase II. The two sets of value ranges and their likelihoods, before and after the education phase, are treated as empirical distributions of the code items, and bootstrapping is used to draw a large number of bootstrap samples from them [33,44]. Each bootstrap sample consists of a list of values of the code item of interest that might be observed from field data collection, and these values are used as input parameters of building simulation models. The energy-use intensity (EUI) is obtained from the energy simulation, and the average EUI is derived for each bootstrap sample of the code item of interest. By repeating the process over the large number of bootstrap samples, the mean and standard deviation of the average EUI are obtained for both before and after the education phase. Based on the standard deviation of EUI and the desired difference in EUI to be detected between Phase III and Phase I, a minimum sample size for a given confidence level and statistical power was derived [33,44].
Based on this exercise, it was determined that a sample size of 63 was needed to detect a statistically valid whole building EUI difference of 14,195 kJ/m²·yr between the pre-training phase (Phase I) and the post-training phase (Phase III) [33,44].
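To make the relationship between standard deviation, detectable difference, and sample size concrete, the sketch below uses the textbook normal-approximation formula for comparing two means, n = 2((z_{1-α/2} + z_{power})·σ/δ)². This formula is an assumption for illustration; the source does not state which formula, confidence level, or power was used, and the standard deviation in the example is hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Per-phase sample size to detect a mean difference `delta` given a
    standard deviation `sigma` (two-sided test, normal approximation).

    The formula and the default alpha/power are assumptions, not values
    taken from the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Detectable difference from the text, converted from kJ to MJ per m2 per year.
delta_mj = 14195 / 1000
# Hypothetical standard deviation of EUI, for illustration only.
sigma_mj = 25.0
print(min_sample_size(sigma_mj, delta_mj))
```

Under this formula the required sample size grows with the square of σ/δ, which is why a small detectable difference or a wide spread in EUI quickly drives up the number of homes that must be visited.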
A proportional random sampling approach was applied to design the sample plan based on the average of three years of Census Bureau permit data [46]. In some states, Census Bureau permit data were reviewed but deemed inadequate due to the lack of permit reporting in much of the state, and it was determined that an alternative data source would more accurately represent current construction trends within the state. Possible alternative data sources include heating, ventilation, and air conditioning (HVAC) or plumbing permits. State-specific construction practices and systematic differences across geographic boundaries were discussed by stakeholders and were considered in the final sampling plan. A data collection team contacted each jurisdiction identified in the sample plan to obtain a list of homes at various stages of construction within the jurisdiction. Homes were selected at random from the list, and builders were contacted to gain permission for site access and data collection [44,47]. For each selected home, a single site visit was planned to avoid biases associated with multiple visits. Only installed items directly observed by the field teams during site visits were recorded. If access was refused for a home on the list, the field team moved on to the next home on the list [44]. Table 2 presents the number of homes visited during the two phases for field data collection. Table 2 also shows the annual permits, which are the number of building construction permits issued annually in a place such as a county or a state. It can be seen from Table 2 that the single site-visit principle led to about 177 (Phase I) and 143 (Phase III) home visits, on average, to obtain at least 63 samples for all key items, depending on the state. As shown in Table 2, the homes visited during the field survey constitute a small fraction of the estimated construction permits issued in each state.
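One possible reading of the proportional random sampling step is sketched below: site visits are allocated to jurisdictions by random draws weighted by permit counts. The jurisdiction names, permit counts, and the function `draw_sample_plan` are hypothetical, and the study's actual sampling procedure may differ in detail.

```python
import random
from collections import Counter


def draw_sample_plan(permits_by_jurisdiction, n_homes=63, seed=0):
    """Allocate `n_homes` visits to jurisdictions with probability
    proportional to their (average annual) permit counts."""
    rng = random.Random(seed)
    names = list(permits_by_jurisdiction)
    weights = [permits_by_jurisdiction[name] for name in names]
    # Each draw picks one jurisdiction; larger permit counts are picked more often.
    draws = rng.choices(names, weights=weights, k=n_homes)
    return Counter(draws)


# Hypothetical three-year average permit counts (illustrative values only).
permits = {"County A": 1200, "County B": 450, "County C": 90, "County D": 30}

plan = draw_sample_plan(permits)
for name, visits in plan.most_common():
    print(name, visits)
```

Jurisdictions with few permits may receive no visits in a given draw, which is consistent with the sample being a small fraction of annual construction.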
Annual construction permit estimates are taken from the U.S. Census Bureau Building Permits Survey [46]. The latest annual data available from the Census Bureau at the time that the Phase I report for each state was created were used. The same annual estimate was used for both the Phase I and Phase III analyses. Annual permit counts by location or county were mapped to the IECC climate zones and summed to give annual permit counts by climate zone. The data collected for the eight states (seven states for Phase III) are publicly available at the residential field study page on the U.S. DOE's Building Energy Code Program's website [48]. Much additional data was collected besides the key items, and some of it was also used in various analysis stages of this study. For example, the insulation installation quality of envelope components plays an important role in the thermal performance of envelope assemblies and was used as a modifier in the analyses for applicable key items (i.e., ceiling insulation, wall insulation, and foundation insulation) [44]. Teams followed the Residential Energy Services Network (RESNET) assessment protocol [49], which has three grades, Grade I being the best quality installation and Grade III being the worst.

Data Analysis
The data analysis proceeded in several stages: a statistical analysis to examine the data and distributions for individual code items, an energy analysis to model the energy consumption of a large population of homes, and a savings potential analysis to estimate the savings associated with improved code compliance [44].

Distributions of Individual Measures
Standard statistical analysis was conducted on the distributions of key items [44]. This approach enables a better understanding of the value range of field observations and provides insight on the most commonly installed energy-efficiency measures in the field. It also enables a comparison of values installed in the field to the applicable code requirement, and it allows for the identification of any problem areas where improvement potential exists. Histograms are generated for each individual code item, and the histograms of the two phases are placed together for visual inspection.

Simulation to Compare Baseline and Observed Energy Consumption
As described below, the data collected from single site visits of randomly selected homes under construction is incorporated into the residential building prototype models developed by PNNL for the U.S. DOE's residential code analyses [50] for whole-building energy simulation.
It is assumed that the homes surveyed are a representative subset of single-family homes under construction in the state. The field observations of key items are treated as empirical distributions of the code values expected in the time period when the field surveys were conducted, and the distributions of the key items are assumed to be independent of each other. Values are randomly drawn from the empirical distributions in proportion to the frequency of the observed values, and combinations of key code items are generated. Each combination of the randomly drawn values of all the key items is treated as a plausible set of values that might have been observed in a newly constructed home in the state. Each combination is then applied to the prototype model, generating a building model with all the necessary inputs, i.e., a pseudo-home. By repeating the random drawing process many times (N = 1500 in this study), a population of N pseudo-homes was generated for each state. Altogether, the variations of the key items in the N pseudo-homes follow the empirical distributions of the key items observed in the field survey. Thus, the set of N pseudo-homes reflects the status quo of code compliance of new single-family homes in a state. In order to evaluate the code compliance of the new residential construction represented by the N pseudo-homes, a code-compliant pseudo-home was generated by setting the value of all key items to the corresponding requirements of the code in effect in the state. This yields N + 1 total pseudo-homes.
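The pseudo-home construction described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the key item names, observation values, code requirements, and the function `build_pseudo_homes` are hypothetical, and each pseudo-home is represented simply as a dictionary of inputs rather than an EnergyPlus model.

```python
import random


def build_pseudo_homes(observations, code_requirements, n_homes=1500, seed=0):
    """Return n_homes randomly assembled pseudo-homes plus one code-compliant home.

    `observations` maps each key item to its list of raw field observations.
    Drawing uniformly from the raw list samples each distinct value in
    proportion to its observed frequency, and items are drawn independently.
    """
    rng = random.Random(seed)
    homes = []
    for _ in range(n_homes):
        home = {item: rng.choice(values) for item, values in observations.items()}
        homes.append(home)
    # The final entry is the baseline home with every key item set to code.
    homes.append(dict(code_requirements))
    return homes


# Hypothetical observations and code values (illustrative only).
observations = {
    "wall_r_value": [13, 13, 15, 13, 19, 11],
    "window_u_factor": [0.35, 0.32, 0.40, 0.35],
    "high_efficacy_lighting_pct": [50, 75, 100, 25, 60],
}
code_requirements = {
    "wall_r_value": 13,
    "window_u_factor": 0.35,
    "high_efficacy_lighting_pct": 50,
}

homes = build_pseudo_homes(observations, code_requirements)
print(len(homes))  # N + 1 pseudo-homes
```

Because each key item is drawn on its own, a pseudo-home can combine values that were never seen together in a single surveyed home, which is what the independence assumption permits.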
The single-family residential building prototype models include five possible foundation types (slab-on-grade, vented crawlspace, conditioned crawlspace, heated basement, unheated basement) and four possible heating system types (gas furnace, electric resistance, heat pump, and fuel oil furnace) [44]. Because the foundation type and heating system type affect energy use differently, each of the N + 1 pseudo-homes was replicated into M copies to account for the M combinations of foundation type and heating system type existing in the state. The M replicates of a pseudo-home were otherwise identical to each other with respect to building construction, equipment, and internal loads. For states with multiple climate zones, the number of which is denoted K, the building model creation was replicated K times, leading to K × M × (N + 1) total building models.
The EnergyPlus simulations were carried out on an hourly basis, and the annual EUI and energy costs were evaluated from hourly outputs separated by fuel types for code-regulated loads. The EUI of each pseudo-home was calculated by weighting the EUIs of the M EnergyPlus models across multiple foundation types and heating system types. Table 3 lists the number of climate zones, K, the number of foundation types, the number of heating system types, and their combinations, M, the number of pseudo-homes, N, as well as the number of EnergyPlus simulation models for each of the eight states at Phase I. To retain consistency between the two phases as described in the next section, Phase III is subjected to the same number of models and EnergyPlus runs.
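The model count and the weighting of EUIs across foundation and heating system combinations can be illustrated with a short sketch. The function names, the combination labels, the EUI values, and the shares are hypothetical; the source does not give the weights, so a share-weighted average is assumed here.

```python
def total_models(k_zones, m_combinations, n_pseudo_homes):
    """Total number of building models: K x M x (N + 1)."""
    return k_zones * m_combinations * (n_pseudo_homes + 1)


def weighted_eui(eui_by_combo, share_by_combo):
    """Share-weighted average EUI of one pseudo-home across its M replicates."""
    total_share = sum(share_by_combo[c] for c in eui_by_combo)
    weighted_sum = sum(eui_by_combo[c] * share_by_combo[c] for c in eui_by_combo)
    return weighted_sum / total_share


# Hypothetical state: 2 climate zones, 20 foundation/heating combinations.
print(total_models(k_zones=2, m_combinations=20, n_pseudo_homes=1500))

# Hypothetical EUIs (MJ/m2-yr) and market shares for two combinations.
eui = {"slab/gas_furnace": 400.0, "slab/heat_pump": 500.0}
share = {"slab/gas_furnace": 0.75, "slab/heat_pump": 0.25}
print(weighted_eui(eui, share))
```

Dividing by the total share keeps the result correct even when the supplied shares do not sum exactly to one.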

Post-Stratified Sampling
Several issues arose when we applied the methodology developed in Phase I to the Phase III data. First, the number of permits issued annually in a state varies from year to year. Since the methodology involved multiplying the average measure-level savings per home by the number of permits to obtain the state-level savings, it quickly became obvious that the state-level savings were primarily driven by the number of permits. Thus, the methodology was revised to specify that the comparison of measure-level savings was based on the Phase I number of permits.
The next issue was the distribution of heating system types, foundation types, and number of permits by climate zone. Since each of these distributions was used for weighting either the number of pseudo-homes or the results, changes in these distributions from Phase I to Phase III would skew the comparison. Therefore, Phase I distributions were applied to the Phase III data analysis for consistency.
The third issue was that, in states with more than a single climate zone, the distribution of the number of observations of individual key items by climate zone differed between Phase I and Phase III. In the Monte Carlo process used to assign observations to individual building models, all of the observations of each key item within a state are pooled together. For states with more than one climate zone and for key items with code requirements that vary among climate zones, the pooling may introduce disproportionately high or low observations into a climate zone. When the distribution of key items by climate zone differs between Phase I and Phase III, randomly drawing observations from the pooled data for the state may lead to bias. In order to maintain consistency in the random sample drawings between Phase I and Phase III, the key item distribution by climate zone of Phase I was used to guide the random sample drawings in Phase III. Instead of drawing from the pooled data with equal probability, a post-stratified sampling proportional to the Phase I key item distribution by climate zone is enforced. Within each stratum (i.e., the observations in each climate zone), each observation has an equal probability of being drawn.
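The post-stratified draw can be sketched in two steps: pick a climate zone with probability proportional to the Phase I observation counts, then pick one Phase III observation uniformly within that zone. The zone labels, counts, observation values, and the function `post_stratified_draw` are hypothetical and serve only to illustrate the idea.

```python
import random


def post_stratified_draw(phase3_obs_by_zone, phase1_counts_by_zone, n_draws, seed=0):
    """Draw Phase III observations with zone proportions fixed to Phase I."""
    rng = random.Random(seed)
    # Only zones that actually have Phase III observations can be drawn from.
    zones = [z for z in phase1_counts_by_zone if phase3_obs_by_zone.get(z)]
    weights = [phase1_counts_by_zone[z] for z in zones]
    draws = []
    for _ in range(n_draws):
        # Step 1: choose the stratum in proportion to the Phase I distribution.
        zone = rng.choices(zones, weights=weights, k=1)[0]
        # Step 2: within the stratum, every observation is equally likely.
        draws.append(rng.choice(phase3_obs_by_zone[zone]))
    return draws


# Hypothetical key item observations by climate zone (illustrative only).
phase3_obs = {"zone_2": [30, 30, 38, 30, 38, 38, 30], "zone_3": [38, 49, 38]}
phase1_counts = {"zone_2": 20, "zone_3": 45}

sampled = post_stratified_draw(phase3_obs, phase1_counts, n_draws=1500)
print(len(sampled))
```

Although zone_2 supplies most of the Phase III observations in this example, about 45/65 of the draws still come from zone_3, matching the Phase I proportions rather than the pooled Phase III data.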

Simulation for Measure-Level Savings
In addition to identifying the specific gaps in code compliance in Phase I, another goal was to estimate the energy savings potential of bringing each measure up to the code requirement. For Phase III, the same measure-level analysis is conducted, allowing us to compare the savings potential before and after the Phase II training, education, and outreach activities. As such, the difference in energy savings potential between the phases represents the impact of Phase II.
The analysis designed for evaluating measure-level savings begins by comparing the observations of each key item with the code requirement to determine whether each observation meets the requirement. If a key item has a significant number of observations not meeting the code requirement, it is a to-be-improved candidate for the targeted training, education, and outreach activities for code compliance improvement. Here, significance is defined as more than 15% of observations not meeting the code requirement. For each to-be-improved key item, the worse-than-code observations are extracted, and the unique values and their occurrence frequencies are calculated.
Two sets of building models were generated for the simulations. One set of building models was generated from each unique worse-than-code value. Another set of building models was generated by replacing the worse-than-code values with the code requirement. The various foundation types and heating system types were taken into consideration by replicating the building models as described in earlier sections. The difference in energy consumption between these two sets of models denotes the theoretical energy-savings potential that could be obtained if the present worse-than-code observations were improved to just meet the code requirement in the future. Assessing the savings potential due to non-compliance could be very useful in determining whether increasing code enforcement efforts is worthwhile [18]. The developed approach was applied to the data collected in Phase I and Phase III, respectively. The change in the measure-level energy savings results between Phase I and Phase III yields insights on the success of the training, education, and outreach activities of Phase II. It should be pointed out that the estimated savings potential should be treated as a theoretical maximum, as it does not account for interaction effects such as the increased amount of heating needed in the winter when high-efficacy lights are installed (see footnote 4 of [44]).
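The measure-level screening and savings calculation can be sketched as below. This is an illustration only: the `eui_of` callable is a hypothetical stand-in for a pair of EnergyPlus runs, the linear EUI relationship and the observation values are invented, and the frequency-weighted average per home is one plausible way to combine the unique worse-than-code values.

```python
from collections import Counter


def is_to_be_improved(observations, meets_code, threshold=0.15):
    """True if more than `threshold` of observations fail the code requirement."""
    failing = sum(1 for value in observations if not meets_code(value))
    return failing / len(observations) > threshold


def savings_potential(observations, meets_code, eui_of, code_value):
    """Average per-home EUI savings if every failing value just met code.

    Each unique worse-than-code value is simulated once (via `eui_of`) and
    weighted by how often it occurs among all observations.
    """
    failing = Counter(value for value in observations if not meets_code(value))
    n = len(observations)
    code_eui = eui_of(code_value)
    return sum(count / n * (eui_of(value) - code_eui)
               for value, count in failing.items())


# Hypothetical wall insulation R-values; the code requirement is assumed to be R-13.
walls = [11, 11, 13, 13, 15, 13, 11, 13, 13, 19]
meets = lambda r: r >= 13
# Hypothetical stand-in for simulation: EUI falls linearly with R-value.
eui_of = lambda r: 500.0 - 2.0 * r

print(is_to_be_improved(walls, meets))
print(savings_potential(walls, meets, eui_of, code_value=13))
```

Multiplying the per-home figure by the Phase I number of permits would give a state-level estimate, in line with the revised methodology described earlier.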

Results
This section presents the results of seven of the eight pilot states that have completed all three phases: Alabama, Georgia, Kentucky, Maryland, North Carolina, Pennsylvania, and Texas. The average state-wide energy consumption results are first presented in Section 4.1. The results also include the distributions of the modeled energy-use intensity (EUI) based on the recorded observations from Phase I and Phase III, as well as the model code-compliant EUI for that given state. Section 4.2 presents the measure-level savings potential of Phase III and Phase I. The measure-level savings potential roughly estimates how much savings can be achieved if worse than code observations are brought up to the code requirement level through improved code compliance. A reduction in the measure-level savings potential from Phase I to Phase III indicates an improvement in code compliance in Phase III. Section 4.3 presents the distributions of the key items collected in Phase III and Phase I.

State-Wide Average Energy Consumption
Table 4 compares the baseline (code compliant) EUI and average observed EUI for the seven states at both Phase I and Phase III. The initial U.S. DOE field study methodology was designed to detect an EUI difference of 14.20 MJ/m²·yr between Phases I and III. Any change in excess of that threshold would indicate that a statistically significant change between phases was found.
The average observed EUI decrease for five of the seven states ranges from 3.9% in Alabama to 9.8% in Maryland. The absolute reduction in Georgia, Kentucky, Maryland, and Texas exceeds the threshold of 14.20 MJ/m²·yr, indicating that there is a significant reduction of energy consumption from Phase I to Phase III in these four states on average. The observed average EUI in Alabama decreases from Phase I to Phase III but the difference is below the 14.20 MJ/m²·yr threshold, so the result is inconclusive. In contrast, North Carolina and Pennsylvania saw an increase in the state average EUI from Phase I to Phase III but achieved EUIs that remained below the code compliance EUI.
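The threshold comparison used above can be written as a short sketch. The function name and the EUI values in the usage lines are hypothetical and are not taken from Table 4; only the 14.20 MJ/m²·yr detection threshold comes from the text.

```python
THRESHOLD = 14.20  # MJ/m2·yr, detection threshold of the field study methodology


def classify_change(eui_phase1, eui_phase3, threshold=THRESHOLD):
    """Label the Phase I to Phase III change in state-average EUI."""
    delta = eui_phase3 - eui_phase1
    if delta == 0:
        return "no change"
    direction = "decrease" if delta < 0 else "increase"
    # Only a change larger than the threshold counts as statistically significant.
    if abs(delta) > threshold:
        return f"significant {direction}"
    return f"inconclusive {direction}"


# Hypothetical state-average EUIs in MJ/m2·yr.
print(classify_change(500.0, 480.0))
print(classify_change(500.0, 490.0))
print(classify_change(500.0, 505.0))
```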

Measure-Level Saving Analysis
The measure-level savings potential of each of the key items, accumulated across all seven states that participated in the entire study, is presented in Table 5. The measure-level savings potential is an indicator of how well homes performed compared to code-compliant homes. If all homes meet code, there is no savings potential. Therefore, a reduction in savings potential indicates an improvement in code compliance. It can be seen from Table 5 that the joint savings potential of all of the key items across the states shows a reduction in both energy and cost savings potential, indicating improved compliance. In Supplementary Materials, Table S1 presents the measure-level savings potential of each key item for each individual state based on both Phase I and Phase III calculations.
In the seven states, most key items exhibit improvement. For Alabama, Georgia, and Maryland, all to-be-improved key items identified in Phase I improved, with reductions of 12% to 98% in energy-savings potential and 13% to 94% in cost-savings potential. Overall, this leads to a 28% to 78% reduction in energy-savings potential and a 29% to 80% reduction in cost-savings potential in these three states.
Four out of six key items in Kentucky and four out of five key items in Texas show a reduction in both energy and cost-savings potential. However, two key items in Kentucky and one key item in Texas show an increase in both energy and cost-savings potential, hinting that the code compliance of those few key items deteriorated from Phase I to Phase III. Despite this, there is still a 25% to 46% reduction in either the energy or cost-savings potential in these two states.
North Carolina and Pennsylvania show the opposite trend. Although half of the to-be-improved key items improved, the other half worsened, leading to an overall increase in energy and cost-savings potential and suggesting that overall code compliance became worse in these two states. The measure-level results are consistent with the state-wide results shown in Table 4.

Distribution of Key Items
Figures S1-S6 present the histograms of several key items collected in both Phase III and Phase I for the seven states. Each of Figures S1-S6 contains seven plots, one for each state. Each plot consists of two panels. The top panel shows the data distribution of Phase I, and the bottom panel shows the data distribution of Phase III. A text box located in either the top left or top right corner of each panel displays the number, the mean, and the median of the observations collected during Phase I or Phase III. The dashed vertical line(s) shows the code requirement(s). The state name is shown in the plot title. While observations of the entire distribution contribute to the state average EUI, as shown in Table 4 and Figure 2, only those observations to the left of the code requirement denoted by the dashed vertical lines contribute to the savings potential calculations, as shown in Table 5 and Table S1.

High-Efficacy Lighting
Figure S1 shows the distribution of high-efficacy lighting. Judging from the histograms and from the means and medians in the text boxes of the plots, Phase III shows obviously higher values than Phase I for all seven states, indicating an unambiguous improvement in terms of code compliance.
Comparing the portion of the histogram to the left of the dashed vertical line, which is the distribution of the worse than code requirement observations, shows that both the value magnitude and the occurrence frequency are reduced from Phase I to Phase III. This is consistent with the reduction in energy and cost-savings potential for all seven states, as shown in Table S1. As discussed in the description of the measure-level savings potential analysis, the measure-level savings potential focuses on bringing the worse than code requirement observations up to the code requirement, so it is associated with the observation distribution on the left side of the dashed vertical line in the plots. For this key item, the histogram, especially the part on the left side of the dashed vertical line, supports the reduction of savings potential in Phase III from Phase I.

Exterior Wall Insulation
Figure S2 shows the distribution of U factors of exterior wall insulation. Both the mean and median in the text boxes show lower values in Phase III than Phase I for Kentucky, Maryland, North Carolina, Pennsylvania, and Texas, but not unambiguously for Alabama and Georgia.
The higher median in Phase III than Phase I, as shown in the text box of the plot for Alabama, seems contradictory to the reduction of savings potential, as shown in Table S1. However, it should be emphasized that the measure-level saving is a compound result of all observations on the left side of the code requirement dashed vertical line. Therefore, it is not easy to have a direct mapping of the changes in the savings potential to the visual comprehension of the distributions.
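A small numerical sketch illustrates how a summary statistic and the worse than code portion can move in opposite directions. All values below are made up for illustration, the code requirement is hypothetical, and the total excess over the requirement is used only as a crude proxy for the simulated savings potential; non-compliance is taken to mean a U-factor above the requirement, whichever side of the code line that falls on in a given plot.

```python
from statistics import median

CODE_U_FACTOR = 0.082  # hypothetical requirement; a lower U-factor is better

# Hypothetical wall U-factor observations for the two phases.
phase1 = [0.057, 0.060, 0.060, 0.082, 0.110, 0.120]
phase3 = [0.077, 0.077, 0.082, 0.082, 0.082, 0.090]


def total_excess(values, requirement):
    """Sum of the amounts by which non-compliant observations exceed the requirement."""
    return sum(v - requirement for v in values if v > requirement)


median1, median3 = median(phase1), median(phase3)
excess1 = total_excess(phase1, CODE_U_FACTOR)
excess3 = total_excess(phase3, CODE_U_FACTOR)

# The median rises (looks worse) while the non-compliant excess shrinks.
print(f"median: {median1:.3f} -> {median3:.3f}")
print(f"excess: {excess1:.3f} -> {excess3:.3f}")
```

In this toy case the Phase III median is higher than in Phase I, yet fewer observations fail the requirement and they fail it by less, so a proxy tied only to the non-compliant observations decreases.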
Although there is no straightforward mapping between the savings potential and the histogram, the distribution of observations to the left of the dashed vertical line in Maryland provides a clear example: the reduction of savings potential for this key item is driven by the decrease in the number of worse than code requirement observations and by the rightward shift of those observations toward the dashed vertical line from Phase I to Phase III.
As already pointed out above, the value distributions shown in Figures S1-S6 carry different information from the savings potential in Table S1. Only the observations on the left side of the dashed vertical line contribute to the calculated savings potential.

Envelope Tightness (ACH50)
Figure S3 shows the distribution of envelope tightness. As indicated by the numbers shown in the text boxes of the plots for Alabama, Georgia, Kentucky, Maryland, and Texas, both mean and median are lower in Phase III than in Phase I in these five states. Inspecting the portion of the histogram to the left of the dashed vertical line for Georgia, Kentucky, Maryland, and Texas makes the reduction in the energy and cost-savings potential evident.
For North Carolina and Pennsylvania, both the mean and median of the observations of Phase III are larger than those from Phase I. Furthermore, the portion of the histogram to the left of the dashed vertical line makes the cause of the increase in energy and cost-savings potential from Phase I to Phase III, as shown in Table S1, obvious for these two states.

Ceiling Insulation
Figure S4 shows the distribution of U factors of ceiling insulation. While the portion of the histograms to the left of the dashed vertical line for Alabama, Georgia, and Maryland clearly suggests the reduction of the energy and cost-savings potential from Phase I to Phase III, as shown in Table S1, it is not easy to make such a mapping between the histogram and the reduction or increase of the savings potential shown in Table S1 in the other states.
Worsening ceiling insulation is one of the major contributors to the increase in savings potential for Pennsylvania, as shown in Table S1. Further investigation of the R-value distribution of ceiling insulation found that the R-value of the insulation material meets or exceeds the code requirement in Phase III, and the distributions of Phase III and Phase I are similar. However, the insulation installation quality of the two phases is quite different. While Phase I has fractions of 53%, 45%, and 23% for type I, II, and III installation, respectively, the corresponding fractions are 12%, 75%, and 7% for Phase III. The less than perfect installation quality (type II and III) caused inferior thermal performance and made the overall ceiling performance much worse than in Phase I.

Duct Leakage
Figure S5 shows the distributions of duct leakage. The means and medians in the text boxes of the plots show a decreasing trend for Georgia, Maryland, and Texas, and the portion of the histograms to the left of the dashed vertical line also shows a clear improvement in Phase III over Phase I. These are consistent with the reduced savings potential presented in Table S1.
Although the mean and median in Alabama show opposite trends between Phase III and Phase I, the improvement of the worse than code observations is clearly seen in the portion of the histogram to the left of the dashed vertical line, which is consistent with the savings potential reduction shown in Table S1.
The means and medians for Kentucky, North Carolina, and Pennsylvania increased from Phase I to Phase III. The portion of the histogram to the left of the dashed vertical line suggests an obvious deterioration from Phase I to Phase III for Kentucky and North Carolina, which is consistent with the increased savings potential shown in Table S1. It seems that the outliers on the high-value tail of the histogram of Phase III for Kentucky and North Carolina are contributing to the increase of savings potential, as shown in Table S1. The higher mean and median of Phase III for Pennsylvania are partially due to the lack of high frequency of low duct leakage observations in Phase I. While this might increase the state average EUI at Phase III, it will not necessarily increase the savings potential at Phase III, because the measure-level savings potential focuses on the improvement of the worse than code observations and, in this case, the savings potential did show improvement in Phase III for Pennsylvania.

Window SHGC
Figure S6 shows the distribution of Window SHGC. Window SHGC is one of the key items that meets code compliance in the Phase I baseline study of most of the states, which is revealed by the fact that most of the observations are located on the right side of the code requirement denoted by the dashed vertical line. Alabama was the exception, with Window SHGC identified as a to-be-improved key item in Phase I. Visual inspection of the portion of the distribution to the left of the dashed vertical line in the two phases explains the savings potential reduction of Phase III, as shown in Table S1.

Discussion
A consistent framework based on an energy metric has been established that can quantify gaps in code compliance and the effectiveness of compliance-improving intervention strategies. This approach has recently been used by eight states. We evaluated the state-wide average EUIs of new residential construction and individual key item measure-level savings potential both before and after intervention activities such as education and training. We compared both the state-wide average EUI and the measure-level savings potential of the seven states that have completed all three phases. The state-wide EUI results show significant EUI reductions in four states (Georgia, Kentucky, Maryland, and Texas), an inconclusive EUI reduction in Alabama, an EUI increase in Pennsylvania, and an inconclusive EUI increase in North Carolina. The measure-level savings potential analysis shows that all to-be-improved key items identified in Phase I in Alabama, Georgia, and Maryland have improved. Although the savings potential of two key items increased in Kentucky and that of one key item increased in Texas, the overall savings potential in Kentucky and Texas decreased after Phase II. While the overall savings potential in North Carolina and Pennsylvania increased, three key items in both North Carolina and Pennsylvania show a savings potential reduction after Phase II. Table S1 indicates code compliance improvement after Phase II's education, training, and outreach activities. There was an overall improvement in five of the seven states. In three of the seven states, every to-be-improved key item showed improvement, while in the other four states, some key items improved and others got worse. As shown in Table 5, when key item performance is aggregated across all seven states, all key items show overall improvement. Future study is needed specifically for those key items in the states showing deteriorated performance after the targeted education, training, and outreach activities.
The distributions of the key items collected in the two phases have also been inspected. High-efficacy lighting shows unambiguous improvement after Phase II in all states. Frame wall insulation shows improvement in Kentucky, Maryland, North Carolina, Pennsylvania, and Texas, but it seems to deteriorate slightly in Alabama and Georgia in terms of means and medians. However, when the focus is on bringing worse than code occurrences up to or beyond code, even Alabama and Georgia show improvement, as suggested by the reduction of savings potential in Phase III from Phase I shown in Table S1. Envelope tightness shows improvement in five of the seven states, i.e., Alabama, Georgia, Kentucky, Maryland, and Texas, based on the descriptive statistics and visual inspection. Ceiling insulation improved in Alabama, Georgia, Maryland, and North Carolina, and it deteriorated in Pennsylvania, as supported by descriptive statistics and visual inspection. There is an improvement in Kentucky, which is also supported by the higher frequency of the meet-code observations and the reduced savings potential observed in Phase III. The descriptive statistics suggest an improvement in Texas in Phase III, but the change in the worse than code portion shows an opposite trend based on the histograms in Figure S4 and the measure-level savings potential in Table S1. For duct leakage, the descriptive statistics and visual inspection of the histograms indicate that it improved in Georgia, Maryland, and Texas and deteriorated in North Carolina and Pennsylvania. The means and medians in Alabama and Kentucky show opposite trends from Phase I to Phase III. In Alabama, the heavy tail in the histogram of Phase I suggests an improvement from Phase I to Phase III. In Kentucky, the outliers in the high-value tail of the Phase III histogram lead to a deterioration in the descriptive statistics.
Code compliance of window SHGC is generally good for all states in the two phases, judging by the very few occurrences of observations to the left of the dashed vertical line in the histograms in Figure S6. Window SHGC was identified as a to-be-improved key item in Alabama during Phase I. The descriptive statistics and visual inspection of the histograms indicate that there is an improvement in Phase III over Phase I for Alabama, Maryland, and Texas. For the other states, the construction practice in terms of Window SHGC may have stayed the same between the two phases.
This study aims to evaluate code compliance, in terms of energy metrics, of a large population of buildings at the scale of U.S. states, and the methodology was designed for this purpose. The single site-visit principle enforced during field data collection and the use of limited field data to infer a large number of building energy models constitute the novelty of this study, but these also lead to limitations. For example, it is impossible to know whether a home complies with the building energy code in its entirety from a single visit, since insufficient information can be gathered in a single visit to determine if all code requirements have been met. For the same reason, the prescriptive path was assumed in this study, because it is not possible to determine if common tradeoffs were present. In addition, the building energy model was constructed using prototypes instead of the surveyed homes. Thus, the impact of certain field-observable items such as size, height, orientation, window area, floor-to-ceiling height, equipment sizing, and equipment efficiency was not included in the analysis [44]. Experiences and lessons learned during this study might be very useful for future similar studies.

Conclusions
In conclusion, this paper presents a novel methodology to evaluate the building energy code compliance of a large population of homes. The novel features of this approach versus more traditional approaches include (a) data collection based on a single site visit to each home to preclude the effect of builders being particularly careful about their work on homes that they know survey teams will revisit; (b) the use of statistical sampling of homes under construction within the state to ensure randomness; (c) a relatively small number of homes visited per state, which keeps the cost of the field data collection down; (d) the use of prototypical homes for modeling instead of modeling "as built" homes; and (e) the use of whole-building simulation and energy performance metrics, leveraging limited field data collection with large-scale simulations. In this approach, the limited field data are leveraged by the use of a bootstrapping sampling process and a PNNL prototype model to generate a large number of pseudo-homes to assess building energy efficiency performance at the scale of U.S. states.
The established methodology has been applied to field data collected from two field-survey phases of the multi-year, multi-phase project in seven U.S. states. The state-wide average energy consumption decreased from Phase I to Phase III for five of the seven states involved in the analysis. The measure-level savings potential analysis shows an overall reduction. The improvement of Phase III over Phase I is evident, and overall, the training and education of Phase II played a recognizable role in improving compliance with the building energy code.
Although EnergyPlus was used as the simulation engine during the methodology development, the framework developed is generic and can use any building performance simulation tool or platform to evaluate energy efficiency measures, and it is applicable to other scenarios where the goal is to evaluate energy performance of a large number of buildings with limited available data.
Supplementary Materials: The following are available online at http://www.mdpi.com/1996-1073/13/9/2321/s1, Figure S1: Distributions of High-Efficacy Lighting, Figure S2: Distributions of Frame Wall Insulation, Figure S3: Distributions of Envelope Tightness (ACH50), Figure S4: Distributions of Ceiling Insulation (for the state of Georgia, two plots are included because of the distortion caused by an outlier; they show the histogram before and after the outlier is removed from the display), Figure S5: Distributions of Duct Leakage, Figure S6: Distributions of Window SHGC, Table S1: Measure-Level Annual Savings Potential by state (Phase III vs. Phase I).