Prediction is a common machine learning (ML) technique applied to building energy consumption data. It is valuable for anomaly detection, load profile-based building control, and measurement and verification procedures. Hundreds of building energy prediction techniques have been developed over the last three decades, yet there is still no consensus on which techniques are most effective for various building types. In addition, many of the techniques developed are not publicly available to the general research community. This paper outlines a library of open-source regression techniques from the Scikit-Learn Python library and describes the process of applying them to open hourly electrical meter data from 482 non-residential buildings from the Building Data Genome Project. The results illustrate that several techniques, notably decision tree-based models, perform well on two-thirds of the total cohort of buildings. However, the models performed poorly on over one-third of the buildings, particularly primary schools. This example implementation shows that there is no one-size-fits-all modeling solution and that various types of temporal behavior are difficult to capture using machine learning. An analysis of the generalizability of the models tested motivates the need to apply future techniques to a broad range of building types and behaviors. The importance of this type of scalability analysis is discussed in the context of the growth of energy meter and other Internet-of-Things (IoT) data streams in the built environment. This framework is designed to be an example baseline implementation against which other building energy data prediction methods can be compared when applied to a larger population of buildings. For reproducibility, the entire code base and data sets are available on GitHub.
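As a rough illustration of the kind of workflow summarized above, the following minimal sketch fits two Scikit-Learn regressors to hourly load data and compares their errors on a held-out period. It is not the paper's code: the synthetic load series, the two temporal features, the six-week/two-week split, the choice of models, and the use of mean absolute error are all assumptions made for this example, and no Building Data Genome data is used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic hourly load for one hypothetical building (assumption: not real meter data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 7 * 8)              # eight weeks of hourly steps
hour_of_day = hours % 24
day_of_week = (hours // 24) % 7
# Assumed occupancy schedule: weekdays, 08:00 to 18:00.
occupied = (hour_of_day >= 8) & (hour_of_day < 18) & (day_of_week < 5)
load_kw = 20 + 30 * occupied + rng.normal(0, 2, hours.size)

# Simple temporal features (assumption: the paper's feature set may differ).
X = np.column_stack([hour_of_day, day_of_week])

# Chronological split: first six weeks for training, last two for testing.
split = 24 * 7 * 6
X_train, X_test = X[:split], X[split:]
y_train, y_test = load_kw[:split], load_kw[split:]

# One linear and one decision tree-based model, chosen only for illustration.
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {scores[name]:.2f} kW")
```

On this synthetic schedule the tree-based model can represent the step change between occupied and unoccupied hours, while the linear model cannot; real buildings with irregular schedules would not necessarily behave this way. Repeating the loop over many buildings would give a per-building error table of the sort a scalability analysis needs.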
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.