Quality Control Methods for Advanced Metering Infrastructure Data

Abstract: While urban-scale building energy modeling is becoming increasingly common, it currently lacks standards, guidelines, or empirical validation against measured data. Empirical validation necessary to enable best practices is becoming increasingly tractable. The growing prevalence of advanced metering infrastructure has produced significant data on energy consumption within individual buildings, but utilities and countries are still struggling to analyze and use these data wisely. In partnership with the Electric Power Board of Chattanooga, Tennessee, a crude OpenStudio/EnergyPlus model of over 178,000 buildings has been created and used to compare simulated energy against the actual, 15-min, whole-building electrical consumption of each building. In this study, classifying building type is treated as a use case for quantifying performance associated with smart meter data. This article attempts to provide guidance for working with advanced metering infrastructure for buildings related to: quality control, pathological data classifications, statistical metrics on performance, a methodology for classifying building types, and assessment of accuracy. Advanced metering infrastructure was used to collect whole-building electricity consumption for 178,333 buildings; this study defines equations for common data issues (missing values, zeros, and spiking), proposes a new method for assigning building type, and empirically validates gaps between real buildings and existing prototypes using industry-standard accuracy metrics.


Introduction
In the United States, there are approximately 125 million residential and commercial buildings. Collectively, these buildings consumed approximately 40% of the nation's primary energy, 73% of its electricity, and 80% of demand during critical generation hours, and accounted for approximately $419 billion in energy bills during 2019. Buildings, often referred to collectively as the "built environment," consume more than any other energy-consuming sector. Many locations are attempting to stimulate intelligent and efficient use of energy by assessing smart city development [1] in service to climate action plans [2]. In order to facilitate private sector application of energy efficiency in these buildings, enhanced decision-making tools and financing instruments are becoming available. Some of these tools come from the subfield of urban-scale building energy modeling [3], where a digital twin of a city-sized area is created and leveraged for many emerging use cases [4]. Such a tool can be used by a city's sustainability officers to evaluate and prioritize attractive energy-saving technologies prior to incentivization or updating building codes. Likewise, a utility could use it to facilitate deployment of energy- and demand-saving technologies to customers through its energy efficiency program. Many urban-scale modeling demonstrations have been developed recently by universities and U.S. national laboratories.
Massachusetts Institute of Technology (MIT) created a model of 83,541 buildings in Boston, Massachusetts by leveraging publicly available GIS data from tax assessor records [5]. Many urban-scale energy modeling techniques use such locally-rich data sources, but a more general method is needed for assigning building types and properties that does not rely on small regions. Stanford University was able to assess 22 modeled buildings in California including comparison to measured data using Mean Absolute Percent Error (MAPE) [6]. This work builds on the same DOE prototypes and comparisons while also leveraging more industry-standard metrics and delving into the challenges of preprocessing energy use data. University College London in the United Kingdom has set about the ambitious goal of modeling London with 98,000 building energy models built on datasets of building descriptors and energy use not typically found in the United States [7]. Another notable urban-scale modeling effort is that of the University of Applied Sciences Stuttgart in Germany, which has defined a flexible workflow for ingesting, simulating, and analyzing city-scale data [8]. For the study of London as well as Stuttgart's SimStadt, the authors are not aware of any statistical analysis or summary of advanced metering infrastructure data used or issues encountered.
Lawrence Berkeley National Laboratory's City Building Energy Saver (CityBES) team have created a modern visualization tool for analyzing city-scale building energy models online using building energy models from traditional data sources (e.g., tax assessor data) but also includes thermal and radiant coupling between neighboring buildings [9]. Their analysis has been applied to 940 office and retail buildings in northeast San Francisco with estimates on potential energy savings. Like many such efforts, scalable methods for assigning building type are not needed and details for empirical validation against measured data are lacking. National Renewable Energy Laboratory's URBANopt team have created an open-source repository to facilitate urban-scale energy analysis for buildings [10]. This software repository is flexible and scalable, but relies on the user to provide necessary data, does not provide tools for analyzing energy data, and has not yet been involved in any case studies comparing to measured data. Many of these studies leverage geographically-limited datasets to create building-specific energy models, but some have begun to grapple with the challenges of scalability and empirical validation. At a larger scale, modeling of buildings allows benchmarking of the existing building stock, cost-optimization of energy technologies, and renewables that could offset remaining energy use. This simulation-informed benchmark, reduce, offset approach could help actualize a sustainable built environment.
In a 5-year vision to create a model of every U.S. building, a larger team has set about the task of identifying, comparing, and extracting building-specific descriptors from nation-scale data sources. In order to quantify the value of specific data layers or algorithms, the team has partnered with the Electric Power Board of Chattanooga, TN (EPB), which has provided 15-min electricity use for each building. EPB's service area covers 8 counties and approximately 1400 km² in East Tennessee and Georgia. The data sources and algorithms, which we collectively refer to as "Automatic Building Energy Modeling (AutoBEM)," have been used to create 178,368 distinct OpenStudio and EnergyPlus models for every building in EPB's service territory. The models have since quantified energy, demand, emissions, and cost reductions under nine monetization scenarios for the utility and are being used to inform programmatic rollout of energy efficiency, demand management, product/service lines, and new business models. Previous work detailing the development and application of the building energy models has focused on peer review [11], scalable data sources [12], assessment of value propositions [13], virtual utility with buildings as thermal batteries [14], and microclimate interaction [15]. These research areas are provided for context, but are explicitly outside the scope of the current article.
There exists a software vs. reality technical gap that can lead to distrust in models when applied as digital twins to inform city-scale decisions. Individuals that create software models of real-world objects are often attacked for failing to empirically validate the model with measured data from the real-world. While this is more difficult and costly to maintain, real-world data can expose gaps both in software inputs or underlying algorithms. While there is a tendency for modelers to trust "ground truth" data, those that collect data often prefer to rely on models. This can be due to sensor drift/failure, placement, measurement uncertainty, data acquisition challenges, or formatting/conversion issues.
There is also a research gap for accurately defining the building type of a structure. Traditional urban-scale building energy modeling approaches use tax assessor's data and attempt to map land use or other codes to a canonical set of prototype models. This metaparameter of a building, combined with the assumption that the building was built to code at the time of construction, is subsequently used to fill out building details (e.g., HVAC type/efficiency, insulation levels, linear feet of refrigeration cases) necessary to perform physics-based energy calculations. This paper discloses results of a methodology based on Energy Use Intensity (EUI) for assigning building type.
This article presents a few simple methods for performing quality control assessment on advanced metering infrastructure data, comparison to prototypical building types, and quantification of error between models and whole-building electricity use. To the authors' knowledge, comparison of building energy models to measured data from over 100,000 buildings has never been published. As such, we hope the crude models, quality control, and industry-standard error metrics will stimulate comparison and improvement of empirical validation techniques for urban-scale modeling. The rest of this article will provide details of the sub-hourly, whole-building electric use information and mathematical methods for comparing this data to building energy models in Materials and Methods. Results follow summarizing statistical analysis of unusual data patterns, methods for correction, industry-standard error metrics for comparison between measured and modeled data, and error rates for building type assignments.

Materials and Methods
EPB provided measured data taken from revenue-grade electrical meters for 178,377 premise IDs. This data was subject to many of the metering issues described above. Technical challenges arise when working with such large data sets, including organization, filtering, and transcription. This paper attempts to address some of these issues and represents an expansion of analysis from our previous paper [16]. An overview of the preprocessing methods is given in this paper, but for more specific details, refer to [16]. Three patterns of outlier data are investigated: missing, 0-vectors, and spiking.

Data
The first goal of this research was to perform a quality-control analysis on meter data for the nearly 180,000 customers in EPB's service area. The metered data was collected in calendar year 2015 and initially presented as 50 gigabytes of unsorted tuples in the format <time, premise ID, energy use>. In each tuple, time indicates a 15-min interval during the calendar year, premise ID is an un-linked property ID, and energy use is the reported energy in kWh consumed by the property during the indicated 15-min interval. This data was sorted by premise ID and then chronologically for easier analysis.
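The reorganization described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the use of Python `datetime` objects, and the assumption that each timestamp marks the start of its 15-min interval are all hypothetical. Intervals without a reading are left as NaN so that they cannot be confused with true zero readings.

```python
import math
from collections import defaultdict
from datetime import datetime

INTERVALS_PER_YEAR = 365 * 96  # 35,040 fifteen-minute intervals in 2015


def interval_index(timestamp, year=2015):
    """Map the start time of a 15-min interval to its 0-based position in the year."""
    elapsed = timestamp - datetime(year, 1, 1)
    return int(elapsed.total_seconds() // 900)  # 900 s = 15 min


def build_vectors(tuples, n=INTERVALS_PER_YEAR):
    """Group (time, premise_id, kwh) tuples into one chronological vector per premise.

    The input may arrive in any order; each reading is placed by its timestamp.
    Intervals that never receive a reading remain NaN.
    """
    vectors = defaultdict(lambda: [math.nan] * n)
    for timestamp, premise_id, kwh in tuples:
        vectors[premise_id][interval_index(timestamp)] = float(kwh)
    return dict(vectors)
```

Placing each reading by its interval index avoids an explicit sort and makes gaps visible as NaN entries. At the scale described here (tens of gigabytes), an out-of-core or database-backed approach would likely be needed instead of in-memory lists.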
An initial look at the data revealed a number of issues. As a consequence of these issues, nearly all premise IDs are missing some data. For the particular year of data, there are exactly 35,040 15-min intervals, starting at the beginning of the year (1 January, 00:00-00:15) and continuing every 15 min until the end of the year (31 December, 23:45-24:00). Ideally, each premise ID would have exactly as many data points as there are 15-min intervals in the year. Instead, most premise IDs have at least some missing data, resulting in fewer energy use values. Analysis showed that over 93% of the premise IDs were missing less than 2% of their data (i.e., missing fewer than 701 of 35,040 data points for the year). For the purpose of this research, this was sufficient data to continue the comparison. Any premise IDs missing an excessive amount of data could be individually filtered during later analysis. We refer to premise IDs missing in excess of 90% of their data as missing vectors.
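As a worked check of the numbers above, the sketch below counts the expected intervals and computes the missing fraction of a premise vector. It assumes missing readings are encoded as NaN; the function names are illustrative. The 90% cutoff is applied as "90% or more", which is an assumption about how the boundary case is handled.

```python
import math

EXPECTED_INTERVALS = 365 * 24 * 4  # 35,040 intervals in a non-leap year


def missing_fraction(vector):
    """Fraction of intervals in a premise vector that have no reading (NaN)."""
    missing = sum(1 for x in vector if math.isnan(x))
    return missing / len(vector)


def is_missing_vector(vector, cutoff=0.90):
    """True when at least `cutoff` of the intervals are missing."""
    return missing_fraction(vector) >= cutoff
```

With 35,040 intervals, 2% corresponds to 700.8 readings, so a premise missing 700 readings falls under the 2% mark while one missing 701 does not.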
In addition to missing data, a quick scan through the premise IDs revealed two unusual trends. The first, which we are calling 0-vectors, are premise IDs in which all given energy use values for the entire year are zero. The second, which we are calling spiking vectors, are similar to the 0-vectors but with one or more 15-min energy use values exceeding 10,000 kWh (and in some cases, exceeding 10,000,000 kWh). Clearly, neither of these patterns represents normal operation of a standard building type. Premise IDs displaying either of these trends could also be filtered during later analysis.

Comparison
The second goal of this research was to compare crude building simulations to the metered data to determine the value of crude simulations. A total of 97 different prototype building and vintage combinations were simulated using climate zone ASHRAE-169-2006-4A building codes and Actual Meteorological Year (AMY) weather data matching the year of the metered data. The simulations produced the energy use of each building/vintage combination in 15-min intervals for the same calendar year, resulting in 35,040 15-min intervals for each building/vintage combination. Each premise ID was compared to the 97 prototype vectors to determine a level of similarity. Comparing buildings requires finding the energy use intensity (EUI) of each building, given by the kWh use normalized by area. We were able to obtain square footage of 178,333 of the initial 178,377 buildings, allowing us to perform comparisons on nearly all premise IDs in the service area.
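The EUI normalization amounts to dividing each 15-min energy value by the building's floor area. A minimal sketch, assuming energy in kWh and area in square feet; the function name is hypothetical:

```python
import math


def to_eui(kwh_vector, floor_area_sqft):
    """Convert a vector of 15-min kWh readings to EUI (kWh per square foot).

    NaN readings (missing data) stay NaN after division.
    """
    if floor_area_sqft <= 0:
        raise ValueError("floor area must be positive")
    return [x / floor_area_sqft for x in kwh_vector]
```

Applying the same normalization to both the measured and the simulated vectors puts buildings of different sizes on a common scale before they are compared.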
For this analysis, Euclidean distance (Equation (1)) was used to determine similarity between each premise ID and the prototype vectors. A smaller Euclidean distance indicates a higher similarity between two vectors. Every premise ID was individually compared to each of the 97 prototype vectors using Euclidean distance. From there, each premise ID was assigned the building type and vintage corresponding to the prototype vector for which it had the smallest distance:

$$d(p, v) = \sqrt{\sum_{i=1}^{n} (p_i - v_i)^2} \tag{1}$$

where d is the distance, n is the number of values in a vector, p is the chronological energy use of a premise ID, and v is the chronological energy use of a prototype building. Both p and v have values in 15-min intervals, given by Equation (2):

$$p = (p_1, p_2, \ldots, p_n), \qquad v = (v_1, v_2, \ldots, v_n) \tag{2}$$

The time intervals begin at 1 January, 00:00-00:15 of the calendar year and continue in 15-min intervals. Thus, p_1 represents 1 January, 00:00-00:15, p_2 represents 1 January, 00:15-00:30, etc., until p_{35,040} at 31 December, 23:45-24:00. The value of n in these calculations is 35,040.
For the prototype buildings, each v_i is a positive numeric value. In the case of the premise IDs, missing values at time i are given the value p_i = NaN to differentiate them from actual values of zero and to help with coding. With this representation, the 0-vectors discussed above will have all p_i as either 0 or NaN. As NaN values represent missing data, these values were "skipped" in the distance calculation. For Equation (1), p_i = NaN "skips" the corresponding v_i value in the prototype vector.
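The NaN-skipping distance and the nearest-prototype assignment can be sketched as below. This is an illustrative implementation under the assumption that both vectors are already expressed as EUI and share the same interval ordering; the function names and the dictionary of labeled prototypes are hypothetical.

```python
import math


def nan_skipping_distance(p, v):
    """Euclidean distance between premise vector p and prototype vector v.

    Intervals where p is NaN contribute nothing, which also skips the
    corresponding prototype value.
    """
    if len(p) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for p_i, v_i in zip(p, v):
        if math.isnan(p_i):
            continue
        total += (p_i - v_i) ** 2
    return math.sqrt(total)


def assign_prototype(p, prototypes):
    """Return (label, distance) for the prototype closest to p.

    `prototypes` maps a building type/vintage label to its vector.
    """
    scored = ((label, nan_skipping_distance(p, v)) for label, v in prototypes.items())
    return min(scored, key=lambda pair: pair[1])
```

One property of skipping is worth keeping in mind: a premise with many missing intervals sums fewer squared terms, and a premise with no valid readings has distance 0 to every prototype. Applying the missing-data filter before matching avoids assignments that rest on very few data points.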
Once each premise ID had been assigned to a building type and vintage, error rates were calculated between the premise ID and prototype. CV(RMSE) (coefficient of variation of the root mean square error) and NMBE (normalized mean bias error) are industry standards for comparing simulated and measured data and were measured based on ASHRAE Guideline 14 [17].
As with previous equations, n is the number of values in a vector, p is the chronological energy use of a premise ID, and v is the chronological energy use of a prototype building. The value p̄ is the mean of non-NaN values in p.
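Using the symbols defined here, the two metrics can be sketched in code. The form below follows the general shape of the ASHRAE Guideline 14 definitions, with the root mean square error and the mean bias each divided by p̄ and expressed as a percentage. Two choices are assumptions rather than details taken from this article: the denominator is the number of valid (non-NaN) intervals with no degrees-of-freedom adjustment, and the bias is computed as measured minus simulated.

```python
import math


def error_metrics(p, v):
    """Return (cv_rmse_percent, nmbe_percent) for measured p against simulated v.

    Only intervals where p is not NaN are used. p_bar is the mean of the
    non-NaN values in p.
    """
    pairs = [(p_i, v_i) for p_i, v_i in zip(p, v) if not math.isnan(p_i)]
    if not pairs:
        raise ValueError("no valid measured values")
    n = len(pairs)
    p_bar = sum(p_i for p_i, _ in pairs) / n
    if p_bar == 0:
        raise ValueError("mean measured use is zero; metrics are undefined")
    squared_error = sum((p_i - v_i) ** 2 for p_i, v_i in pairs)
    bias = sum(p_i - v_i for p_i, v_i in pairs)
    cv_rmse = 100.0 * math.sqrt(squared_error / n) / p_bar
    nmbe = 100.0 * (bias / n) / p_bar
    return cv_rmse, nmbe
```

Because both metrics are normalized by p̄, a premise whose mean measured use is close to zero can produce very large percentages even when the absolute errors are modest. Positive and negative errors cancel in NMBE but not in CV(RMSE), so the two metrics can move in different directions.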

Results
The results are broken down based on the two different analyses performed. First, several statistical analyses were performed on the metered data to determine the effects of removing the premise IDs matching previously identified patterns. Second, CV(RMSE) and NMBE measurements are given using the same filtering criteria.

Statistical Analysis
One issue in dealing with a real-world data set is determining how trustworthy the data is. In situations where no ground truth is available, statistical information can be analyzed to determine the consistency of the data set. For this research, several statistics were analyzed with and without filtering. These statistics are RMSE (root mean square error), RE (relative error), AE (absolute error), average, standard deviation, and the minimum/maximum values. Threshold analysis can be a more accurate way to determine an average of a series of data with missing values than several other methods [18]. In this study, we compute the average, but then use a sliding window of 1.5 h (n = 6 intervals of 15 min) and a cutoff of 3 standard deviations (c = 3) to discard any electricity use outside of that range. See [16] for the full implementation of threshold averaging.
where y_i is the utility data, ȳ_i the simulated data, μ the threshold window average, and σ the threshold window standard deviation.
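A threshold average of this kind can be sketched as follows. This is one plausible reading of the description and not the implementation from [16]: each reading is compared against μ and σ computed from the n = 6 valid readings that precede it, and readings farther than c = 3 standard deviations from μ are discarded before averaging. Using the preceding readings as the window is an assumption made here because a value that is part of its own 6-point window can never lie 3 standard deviations from that window's mean. The function name and the use of the population standard deviation are also assumptions.

```python
import math
import statistics


def threshold_average(values, n=6, c=3.0):
    """Average of `values` after discarding readings outside mu +/- c*sigma.

    mu and sigma come from the n most recent valid readings before the one
    being tested. NaN readings are ignored. Readings seen before n valid
    readings have accumulated are always kept.
    """
    window = []  # the most recent n valid readings
    kept = []
    for x in values:
        if math.isnan(x):
            continue
        if len(window) == n:
            mu = statistics.mean(window)
            sigma = statistics.pstdev(window)
            if abs(x - mu) <= c * sigma:
                kept.append(x)
        else:
            kept.append(x)
        window.append(x)
        if len(window) > n:
            window.pop(0)
    return statistics.mean(kept) if kept else math.nan
```

For example, nine readings of 1.0 with a single 10,000 inserted have a raw average of 1000.9, while this sketch discards the spike and returns 1.0.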
For the statistical analysis, three different filters were applied to the initial data set containing 178,377 premise IDs. These filters removed premise IDs with the criteria listed below. It is possible that a premise ID could match more than one filter. In those cases, filters were applied in the order of Missing, Zeros, Spiking.

1. Missing: 90% or more data points of the premise ID were missing (indicating that 90% or more of the data consisted of NaN values).
2. Zeros: the maximum value of any 15-min energy interval did not exceed 0.001 kWh.
3. Spiking: contained a maximum value that was over 10,000 kWh and 50 times larger than the threshold average value. The threshold average value is calculated from Equation (8).
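The three filters and their order of application can be sketched as a single classification function. The cutoffs are the ones listed above; the function name, the string labels, and the choice to pass the threshold average in as a precomputed argument are assumptions for illustration. The spiking test is read here as "maximum above 10,000 kWh and above 50 times the threshold average".

```python
import math


def classify_premise(vector, threshold_avg,
                     missing_cutoff=0.90, zero_cutoff=0.001,
                     spike_floor=10_000.0, spike_ratio=50.0):
    """Return "Missing", "Zeros", "Spiking", or None for a premise vector.

    Filters are checked in the order Missing, Zeros, Spiking, so a premise
    that matches several patterns is reported under the first one.
    `threshold_avg` is the premise's threshold average, computed elsewhere.
    """
    valid = [x for x in vector if not math.isnan(x)]
    n_missing = len(vector) - len(valid)
    if n_missing >= missing_cutoff * len(vector):
        return "Missing"
    peak = max(valid)
    if peak <= zero_cutoff:
        return "Zeros"
    if peak > spike_floor and peak > spike_ratio * threshold_avg:
        return "Spiking"
    return None
```

Checking Missing first guarantees that at least one valid reading exists before the maximum is taken, so the later tests never operate on an empty list.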
Effects of individual filters applied to the data are shown in Table 1. For the zeros filter, a value slightly higher than 0 was used to account for conversion issues between data types, such as string to float. Spiking values were selected by the authors based on inspection of this specific dataset and applied equally to all buildings. The values used are with acknowledgment of data-specific behaviors and reported here for completeness, without implying a best practice for flagging buildings with volatile vacillations of energy use. The most effective filter by far was removing the spiking data. All metrics, except for the average minimum value, were reduced. This is especially interesting because the spiking filter removed only 138 premise IDs, far fewer than either the missing or zeros filters. Removing spiking premise IDs also reduced the error measurements significantly, while missing and zero premise IDs had almost no effect. It is also worth noting that the threshold average was reduced by roughly 1/2 while the raw average was reduced by 1/17, indicating that threshold averaging can be useful for data with unusually high outliers.

Industry-Standard Error Metrics
With the available square footage for premise IDs to perform the Euclidean distance calculations, 178,333 premise IDs were compared with their matched prototype vectors. The raw data, with no filters applied, is given in Table 2 and shows the average values for each building type for distance, valid data points (the number of non-NaN values in the premise ID), CV(RMSE), NMBE, and total number of premise IDs matched to that building type. To clarify, the Valid Data Points is the average number of datapoints used to classify each premise; this would be 35,040 if every building had data for every 15-min period during the year. Also, Total Matches refers to the number of buildings assigned to that building type based on Euclidean distance between the building's actual EUI compared to the prototype building.
The table reveals several concerning outliers: the IECC and Warehouse building types have error rates exceeding one million percent. A low distance value represents a closer, or better, match between a premise ID and the prototype. The distance value is extremely high for the QuickServiceRestaurant (QSR), and relatively high for Outpatient and PrimarySchool building types. This indicates that the premise IDs matching to QSR and PrimarySchool do not match EUI as well as other premise IDs match their building types. Once the initial data was analyzed, filters were applied. After filtering out premise IDs with the missing, zeros, or spiking patterns discussed above, a total of 173,839 premise IDs remain. The same values are reported in Table 3. When the original data is filtered, many of the results are improved. The filtering generally reduces the CV(RMSE) for each building type, while NMBE remains largely unchanged. Comparing the filtered and unfiltered FullServiceRestaurant (FSR), the unfiltered CV(RMSE) of 780.84% is reduced to the filtered value of 78.24%, an improvement by roughly a factor of 10. This is especially notable because the total matches changed from 52 to 48, indicating that the data for the 4 filtered premise IDs was enough to warp the CV(RMSE) of the FSR significantly. Such a decrease is likely the result of removing one or more Spiking premise IDs. By definition of the Spiking category, at least one value needs to be exceptionally high, which would yield a large error value for that time interval and increase CV(RMSE) compared to other premise IDs with the same building type. When Spiking premise IDs are removed, the average CV(RMSE) for that building type will decrease. Three building types, LargeOffice, RetailStandalone, and SecondarySchool, have their CV(RMSE) and NMBE values increased rather than decreased.
When the CV(RMSE) increases, it indicates that the remaining premise IDs have larger outliers than the removed premise IDs, while an NMBE increase indicates a higher average error in the remaining premise IDs. These changes indicate that the removed premise IDs were likely from the Missing category. Removing a premise ID in the Zeros category would likely reduce the NMBE values, although this might not happen if the building type's EUI is very close to 0. This effect is especially noticeable on the RetailStandalone and SecondarySchool building types as both had only a single premise ID removed from their Total Matches.
Both matches to PrimarySchool in the unfiltered data are removed with filtering, indicating that the unfiltered measurements matched one of the unusual data patterns and were not likely good matches for the building type.
The distance value for the QSR decreased from 922.84 to 2.98 after filtering, indicating that the premise IDs that remain after filtering are significantly more likely to be represented by the QSR building type. This can also be seen in the CV(RMSE), which drops from 1341.74 to 75.72 after filtering. This is the result of removing Spiking premise IDs, which have an extremely high CV(RMSE) due to the nature of their outliers. The largest outliers within the filtered data are still the IECC and Warehouse building types, whose error measurements exceed 1,000,000. Although filtering reduced these error measurements significantly, they still far exceed a desirable value and warrant additional investigation in the future.
Generally, the quality control methods appear to be successful in reducing the error rates. The most significant effect comes from removing Spiking premise IDs, which lowers CV(RMSE) significantly due to the removal of extreme outliers. The effect on NMBE exists but is less intense. Removing the other two outlier patterns, Zeros and Missing, has a varied effect on the error measurements that will depend on the building type's energy profile. However, these energy patterns do not accurately represent a building type, and premise IDs matching the patterns should still be filtered prior to analysis.

Conclusions
This article attempts to provide guidance for working with advanced metering infrastructure for buildings related to: quality control, pathological data classifications (and their equations), statistical metrics on performance, a methodology for classifying building types, and industry-standard accuracy metrics. Common problems (missing, 0-vectors, and spiking) observed with advanced metering infrastructure data, and the mathematical definitions of these issues, have been shared along with methods for handling these, or similar, data quality problems. Actual 15-min electricity use from over 178,000 customers has been used to assign building type. The provided statistics can inform the time-of-use energy match between building energy models and real buildings. While advanced metering infrastructure data may become more prevalent, the approaches in this study are generally not feasible for organizations other than utilities, since they typically do not have such energy use data for buildings at city scale. CV(RMSE) and NMBE error metrics are used to quantify improvement of the match between modeled and measured building energy use when applying the quality control methods.
Future work will share distributions of error by building type, vintage, and other characteristics to show improvement and remaining challenges in driving down the error in urban-scale energy modeling for both electricity and demand. This involves ongoing work to generalize features and Artificial Intelligence-based prediction of building types.