A Robust Artificial Intelligence Approach with Explainability for Measurement and Verification of Energy Efficient Infrastructure for Net Zero Carbon Emissions

Rapid urbanization across the world has led to an exponential increase in demand for utilities, electricity, gas and water. The building infrastructure sector is one of the largest global consumers of electricity and thereby one of the largest emitters of greenhouse gas emissions. Reducing building energy consumption directly contributes to achieving energy sustainability, emissions reduction, and addressing the challenges of a warming planet, while also supporting the rapid urbanization of human society. Energy Conservation Measures (ECM) that are digitalized using advanced sensor technologies are a formal approach that is widely adopted to reduce the energy consumption of building infrastructure. Measurement and Verification (M&V) protocols are a repeatable and transparent methodology to evaluate and formally report on energy savings. As savings cannot be directly measured, they are determined by comparing pre-retrofit and post-retrofit usage of an ECM initiative. Given the computational nature of M&V, artificial intelligence (AI) algorithms can be leveraged to improve the accuracy, efficiency, and consistency of M&V protocols. However, AI has been limited to a singular performance metric based on default parameters in recent M&V research. In this paper, we address this gap by proposing a comprehensive AI approach for M&V protocols in energy-efficient infrastructure. The novelty of the framework lies in its use of all relevant data (pre and post-ECM) to build robust and explainable predictive AI models for energy savings estimation. The framework was implemented and evaluated in a multi-campus tertiary education institution setting, comprising 200 buildings of diverse sensor technologies and operational functions. The results of this empirical evaluation confirm the validity and contribution of the proposed framework for robust and explainable M&V for energy-efficient building infrastructure and net zero carbon emissions.


Introduction
A global transition into net zero carbon emissions has been accepted, and in some cases mandated, by governments, organizations, and concerned communities as the critical and pragmatic solution to climate change. An increase in renewables generation and a decrease in energy consumption delivers climate action, as well as energy sustainability and reduced operational costs. Governmental policies on energy efficiency are aimed at supporting climate action, such as the Energy Policy Act (EPA), an executive order passed by the US government to improve the energy efficiency in 15% of the buildings by 2015 with respect to the 2003 baseline and the US Climate Bill 2022 proposed investment worth nearly $370 billion towards energy efficiency and climate change reduction efforts [1,2]. The Energy Efficiency Directive by the European Parliament has the objective of reducing greenhouse gas emissions by 55% compared to 1990 levels to achieve climate neutrality in 2050 and improve energy efficiency across all industries by 9% compared to 2020 levels [3].

Materials and Methods
The proposed AI approach is illustrated in Figure 1. It is composed of five vertical layers, they are, the input layer, data Lake layer, artificial intelligence (AI) layer, ECM quantification layer, and finally, the explainability layer. The input layer receives data streams from the building management systems (BMS), solar photovoltaic (PV) systems, smart meters, sensors, climate systems, and ECM project management database. These data streams are extracted, transformed, and loaded into a data lake which forms the second vertical layer. This data lake is accessed by the AI layer which pre-processes and formulates relevant attributes to build the baseline consumption model and finetune its hyperparameters. The AI layer ensures robustness to the variability of the ECM calculations by using a single predictive model to detect pre-and post-ECM energy consumption patterns, which are then used to estimate annual savings. This single model approach can be applied to short time intervals so that granular ECM patterns and the impact of seasonality is determined using low volumes of data. XGBoost, the supervised learning algorithm used to build the predictive model, adds a further level of robustness through its boosting and meta-learning properties. The fourth layer for ECM Quantification evaluates the performance of the model, and then computes and quantifies the ECM savings. Finally, the fifth layer applies Shapley Additive exPlanations (SHAP) game theoretic approaches to explain and interpret the model and its M&V outcomes and communicates the datadriven decision-making insights on M&V of ECM initiatives using interactive analytics dashboards. The following subsections delineate each of the post-input verticals, in terms of their constituents and functionality.

Data Lake Layer
The data lake layer is formally defined as a repository for "all" types of data generated by multiple systems and functions within an organization [23][24][25]. Data lakes are advantageous primarily due to the separation of data produced by systems (processes or entities) from data consumed by humans and systems, as well as their adaptable structure in providing a storage layer for analytics/AI insights and data-driven decision-making [26]. It receives internal and external data from databases, data warehouses, raw data streams, and conventional repositories, such as flat files and spreadsheets. Data formats

Data Lake Layer
The data lake layer is formally defined as a repository for "all" types of data generated by multiple systems and functions within an organization [23][24][25]. Data lakes are advantageous primarily due to the separation of data produced by systems (processes or entities) from data consumed by humans and systems, as well as their adaptable structure in providing a storage layer for analytics/AI insights and data-driven decision-making [26]. It receives internal and external data from databases, data warehouses, raw data streams, and conventional repositories, such as flat files and spreadsheets. Data formats are not defined until the point of utility, which ensures the data can be stored, managed, and leveraged in an adaptable manner, typically using the key-value pairs format. The data lake is updated regularly with timestamps for when the data was received, in addition to the timestamps for when the data was created and/or modified in the source pipelines and storage. The data lake can also be used directly for data visualization and search functions that are required by ECM audits and management activities.

Artificial Intelligence (AI) Layer
The AI layer retrieves relevant data from the data lake and begins with a data wrangling phase that evaluates data quality. This is an important first phase because energy data streams contain missing and erroneous due to reasons such as sensor failures, network failures, and data transmission issues. Descriptive statistical analysis and visualization techniques, such as histograms and boxplots can be used to identify these issues. For identified issues, we have used the mean value imputation method to remove variables with more than 5% of missing values and impute the values of variables with less than 5% missing values. Following this preprocessing phase, we identify appropriate variables to build the AI models. Correlation analysis between independent variables and dependent variables is used to identify highly correlated variables, while we also assess correlation among independent variables to detect and avoid multicollinearity.
The AI algorithms and models we have leveraged in this approach align with the guidelines specified in IPMVP and ASHRAE. Firstly, we have introduced a parameter to represent ECM activity between the pre-retrofit period and post-retrofit periods. This ensures the AI model learns the change in energy consumption before and after the ECM project. For the learning task, we used the XGBoost algorithm. Boosting is an established machine learning concept with algorithms, such as adaptive boosting (AdaBoost), boosting tree, gradient boosting (GB), Extreme gradient boosting (XGBoost) and Light gradient boosting machine (lightGBM). XGBoost is a formally established supervised learning algorithm for predictive model development due to its accuracy and performance as evidenced by several studies in the energy domain itself [12,15,27,28]. A comparative analysis conducted by Cabrera et al. [29] showed that XGBoost has the highest accuracy in terms of CV(RMSE) compared to linear and symbolic regression models. Boosting focuses on predictions with high error in the initial training step by adjusting the sample distribution for the next training step. This process continues for multiple iterations until the number of learners reaches a stopping criterion. Boosting models build a set of weak learners to obtain strong learner with better performance compared to individual weak learners.
Most machine learning models with high accuracies are subject to over-fitting where the model becomes optimized for the training data, thereby performing poorly in testing data and live applications. K-fold cross-validation is an effective method to detect and measure over-fitting in machine learning models; this method randomly splits the training dataset into k subsets, which are called folds, with the same sizes. Since limited data is used in our approach with shorter pre-and post-ECM periods, we have applied 10-fold cross-validation. A further optimization of the machine learning model that improves performance is to finetune the hyperparameters of the learning algorithm. In XGBoost, the relevant hyperparameters are: (1) learning rate which provides the shrinkage at each time step, (2) estimators that provide the number of weak learners or the regression trees in the model and (3) tree depth for the number of splits during training [30]. The broad range of values that each hyperparameter takes and the number of such parameters that can be finetuned increases the computational complexity of the model development phase. This is typically addressed using the grid search method, which train models per all the hyperparameter combinations to find the most effective configuration or randomized search, which randomly select hyperparameter combination [31].
The robust evaluation of baseline accuracy is critical for M&V use cases as they aim to capture the quality of each ECM initiative [32]. Therefore, in our framework, we utilize the large volumes of smart meter data accumulated in most ECM settings to evaluate the performance of the XGBoost models built in the previous layer. Given the disposition of the performance evaluation task, the following metrics are the most effective; Normalized Mean Bias Error (NMBE), the Coefficient of Variation of the Root Mean Squared Error (CV(RMSE)), Coefficient of Determination (R 2 ) and Mean Absolute Percent Error (MAPE). The mathematical formulations of NMBE, CV(RMSE), R 2 , and MAPE are presented in Equations (1)-(4), whereŷ i is the predicted value, y i is the actual meter reading, y is the average of y i and n is the total number of data points.
According to the ASHRAE Guideline 14 uncertainty analysis of the M&V predictive model provides a degree of confidence in the actual value when measurement procedures are applied. CV(RMSE) measure is a prominent metric in ASHRAE Guideline 14 and IPMVP where the requirement is a prediction model with a CV(RMSE) value of less than 30, while IPMVP requires a value less than 20 CV(RMSE) [10,11].

ECM Quantification
The fourth vertical focuses on the savings quantification of the ECM initiative. Most quantification approaches follow the standard method defined in the IPMVP guidelines (Equation (5)), which includes an adjustment factor along with the baseline and post-retrofit energy usage. We propose a robust quantification method that is facilitated by the design of the predictive model in the AI layer. The binary ECM indicators introduced into the predictive model are used to estimate the impact of the ECM project by revising the values of the introduced features. For example, if the predictive model is built using consumption data and weather data from 2019, then the introduced binary ECM indicator 'is_ecm' is used to reflect the impact of ECM installation. During the model training time, this feature value will be set based on the ECM installation date to feed the pre and post-retrofit energy consumption behavior of the building to the model. Then, to evaluate the yearly savings with estimated weather data from a future or current year (noting that annual weather data typically lies within a narrow range across adjacent years), we set the introduced binary feature 'is_ecm' value to True and estimate energy consumption for the year assuming ECM event in place for the entire year. Next, we set the value to False, to which the model provides the estimated annual consumption without any ECM project for the entire year.
As noted in Equation (6), the difference between the sum of the energy consumption without the event and the sum of the energy consumption with the event over the year provides the annual energy savings for that ECM project. The savings percentage can be calculated using Equation (7) to estimate the overall energy savings percentage of the ECM project. The details of the savings quantification process are shown in Algorithm 1.

Explainability Layer
The explainability layer consists of two key functions, (1) using explainable AI (XAI) methods to interpret the predictive models for M&V and (2) interactive dashboards and reporting of the prediction and XAI outcomes. For the XAI function, the binary ECM indicator features drive the interpretation of the most contributing features in the model. The most widely used XAI method is the SHapley Additive exPlanations (SHAP) framework which is based on the concept of game theory and determines the contribution (or importance) of features and groups of features towards the output of the predictive model [33]. In addition, complex machine learning models such as neural networks and gradient boosting are yet to be adopted widely for M&V estimations despite their high accuracy because of the lack of interpretability [34]. The framework component that explores the model interpretability is a major need for uncovering the model internal behaviors with respect to input features [35]. The second function of the interactive dashboard and reporting aims to unpack the predictive model and its XAI output for further analysis and synthesis. This becomes important in the context of multiple buildings or large organizations where the buildings and the ECM projects need to be evaluated individually and in comparison to other similar projects.

Empirical Evaluation
We conducted several experiments using the UNICON [36] dataset, which is drawn from the La Trobe Energy AI/Analytics Platform (LEAP), the flagship AI initiative of La Trobe University's commitment towards achieving Net Zero Carbon Emissions in all campuses by 2029 [37]. There are over 100 buildings from multiple campuses that are classified as academic, accommodation, or administrative. The consumption data is from smart meters and contains 15-min interval energy usage from 2018, 2019, and 2020, where 2020 includes the COVID-19 shutdown period. Weather data is also recorded in 5-min granularity, which includes apparent temperature, dew point temperature, and relative humidity. University buildings reflect similar energy consumption patterns as commercial buildings where energy consumption is higher during afternoons and lower during nights and weekends. The full set of features used in the model is shown in Table 1. The dates of the ECM projects carried out at each of the buildings are shown in Table 2.

Evaluation of AI Model Performance
In this section, we evaluate the performance of the models on the selected features from Table 1. For evaluation, we used k-fold cross validation where the dataset is split into k equal segments and for each k-th segment, the model is trained on all other segments, followed by testing on the k-th segment. This repeats k-times and the k results are averaged to produce a single performance metric. This approach is more effective than a singular train-test split because of the multiple runs through which the model is exposed to all the data records instead a single, fixed subset. The results in Figure 2 show the results of 10-fold cross-validation CV(RMSE), R 2 , MAPE, and NMBE mean scores of the models for the buildings. There were no ECM projects for these buildings in the year 2018. We have used yearly 15-min energy consumption data in 2018 to validate the models with cross-validation. As illustrated in Figure 2, cross-validated CV(RMSE) values are within the recommended range specified in IPVMP and ASHRAE, which are 20 and 30, respectively. The R 2 values in Figure 2 are closer to 1 with majority of the MAPE values are below 10%, while NMBE values are below 0.06 for all the buildings confirms that the variables and dataset used in this experiment are effective at demonstrating the proposed robust and explainable AI approach for ECM calculation.
In this section, we evaluate the performance of the models on the selected features from Table 1. For evaluation, we used k-fold cross validation where the dataset is split into k equal segments and for each k-th segment, the model is trained on all other segments, followed by testing on the k-th segment. This repeats k-times and the k results are averaged to produce a single performance metric. This approach is more effective than a singular train-test split because of the multiple runs through which the model is exposed to all the data records instead a single, fixed subset. The results in Figure 2 show the results of 10-fold cross-validation CV(RMSE), R 2 , MAPE, and NMBE mean scores of the models for the buildings. There were no ECM projects for these buildings in the year 2018. We have used yearly 15-min energy consumption data in 2018 to validate the models with cross-validation. As illustrated in Figure 2, cross-validated CV(RMSE) values are within the recommended range specified in IPVMP and ASHRAE, which are 20 and 30, respectively. The R 2 values in Figure 2 are closer to 1 with majority of the MAPE values are below 10%, while NMBE values are below 0.06 for all the buildings confirms that the variables and dataset used in this experiment are effective at demonstrating the proposed robust and explainable AI approach for ECM calculation. The predictive AI models are built using 2018 data and the performance is evaluated using the complete 12 months of 2019 data. The objective of this experiment is to evaluate the impact of using smaller training time durations as existing research studies found that the use of shorter energy interval data provides similar performance measures over longer training periods [15,38]. This will verify the impact of using smaller training durations to build accurate baseline models with our selected set of features. Figure 3 shows the The predictive AI models are built using 2018 data and the performance is evaluated using the complete 12 months of 2019 data. The objective of this experiment is to evaluate the impact of using smaller training time durations as existing research studies found that the use of shorter energy interval data provides similar performance measures over longer training periods [15,38]. This will verify the impact of using smaller training durations to build accurate baseline models with our selected set of features. Figure 3 shows the CV(RMSE) values of the models that are trained for 3 months, 6 months, 9 months, and 12 months and evaluated the performance against 12-month training data in 2019. We have only selected buildings that do not have any ECM project during 2019. As noted in Figure 3, models trained on 3 months data performed equally well to the models trained on 6 months, 9 months and 12 months training time periods. This demonstrates the robustness of the proposed approach, where using 3 months' training data is sufficient for the model to learn energy consumption behavior of the building. CV(RMSE) values of the models that are trained for 3 months, 6 months, 9 months, and 12 months and evaluated the performance against 12-month training data in 2019. We have only selected buildings that do not have any ECM project during 2019. As noted in Figure 3, models trained on 3 months data performed equally well to the models trained on 6 months, 9 months and 12 months training time periods. This demonstrates the robustness of the proposed approach, where using 3 months' training data is sufficient for the model to learn energy consumption behavior of the building. In this dataset, buildings can be categorized into three groups, (1) buildings where the ECM project is a BMS upgrade only, (2) buildings where the ECM project is an LED lighting upgrade only, and (3) buildings with both ECMs. In our proposed method, we have added a binary feature to represent the time periods of before and after the installation date of ECM which is then projected across 2019 to evaluate the energy savings as shown in Figure 3 where the percentage savings is calculated using Equation (7).

Buildings with BMS Upgrade ECM
We first evaluated the buildings that underwent the BMS upgrade. Model performance is shown in Figure 4 where all CV(RMSE) values are below 15 for all of the buildings, which is within the IPMVP guidelines. Table 3 shows the percentage of savings for the same BMS upgrade measured during different time spans, ranging from 12-, 9-, 6-, and 3-months data prior to the event with 3-and 6-month data after the ECM installation.  In this dataset, buildings can be categorized into three groups, (1) buildings where the ECM project is a BMS upgrade only, (2) buildings where the ECM project is an LED lighting upgrade only, and (3) buildings with both ECMs. In our proposed method, we have added a binary feature to represent the time periods of before and after the installation date of ECM which is then projected across 2019 to evaluate the energy savings as shown in Figure 3 where the percentage savings is calculated using Equation (7).

Buildings with BMS Upgrade ECM
We first evaluated the buildings that underwent the BMS upgrade. Model performance is shown in Figure 4 where all CV(RMSE) values are below 15 for all of the buildings, which is within the IPMVP guidelines. Table 3 shows the percentage of savings for the same BMS upgrade measured during different time spans, ranging from 12-, 9-, 6-, and 3-months data prior to the event with 3-and 6-month data after the ECM installation.     We conducted further on-site analysis of the results presented in Table 3. It was noteworthy that some buildings have lower percentage savings after switching from a 3-month post-event period to a 6-month post-event period. Our on-site analysis revealed that the impact of the BMS upgrade on energy savings reduced over time which is the reason for the reduction of the savings percentage. The percentage savings in the 12 months (post) column is the standard one-year approach of estimating ECM savings where we built a baseline model using a 12-month pre-ECM period and estimate the energy savings percentage compared to actual and adjusted baseline consumption. According to the results, energy savings estimated by the standard approach are higher compared to the proposed method. This is because of the saving percentage estimated by the standard approach is not yearly basis as it is not possible to capture data for 1 year period after the ECM installation due to overlapping ECM projects. The savings calculated using the standard approach also indicates a drop going from 3 months savings to 6 months savings. Hence, the annual savings estimated by our approach is less than the standard method estimation. This indicates the robustness of our approach where it captures the energy savings pattern over time and projects it across the entire year. Using the standard approach, it is not possible to get an estimation of this latent behavior. Figure 5 shows the projection of energy consumption over a year with and without the ECM project. It shows that our approach captures the energy consumption behavior according to the introduced binary feature to reflect the ECM project.
In the explainability layer, we studied the models used in the experiments that are based on a 9-month pre-ECM installation period followed by a 3-month post-training period. The variable importance plot in Figure 6 shows the impact of the features based on SHAP values respective to feature values towards the target prediction with descending feature importance. According to the variable importance plot, the 'is_bms' binary variable which is related to the BMS upgrade ECM has the second most impact on the target predictions of the model for building B26. SHAP values in the variable impact plot in Figure 6 provide the distribution of the impact of each feature has on the model output. The color represents the feature value where their value goes from low to high from blue to red. As expected, the binary variable 'is_bms' has an inverse effect. When the binary value is set to 1, it reduces the energy consumption, while a variable, such as temperature, has a positive impact towards the energy consumption. This indicates that the model has properly captured the impact of the ECM project based on the feature value.
savings. Hence, the annual savings estimated by our approach is less than the standard method estimation. This indicates the robustness of our approach where it captures the energy savings pattern over time and projects it across the entire year. Using the standard approach, it is not possible to get an estimation of this latent behavior. Figure 5 shows the projection of energy consumption over a year with and without the ECM project. It shows that our approach captures the energy consumption behavior according to the introduced binary feature to reflect the ECM project. In the explainability layer, we studied the models used in the experiments that are based on a 9-month pre-ECM installation period followed by a 3-month post-training period. The variable importance plot in Figure 6 shows the impact of the features based on SHAP values respective to feature values towards the target prediction with descending feature importance. According to the variable importance plot, the 'is_bms' binary variable which is related to the BMS upgrade ECM has the second most impact on the target predictions of the model for building B26. SHAP values in the variable impact plot in Figure 6 provide the distribution of the impact of each feature has on the model output. The color represents the feature value where their value goes from low to high from blue to red. As expected, the binary variable 'is_bms' has an inverse effect. When the binary value is set to 1, it reduces the energy consumption, while a variable, such as temperature, has a positive impact towards the energy consumption. This indicates that the model has properly captured the impact of the ECM project based on the feature value. Waterfall graphs shown in Figure 7 are used to analyze the value contribution towards target prediction based on each feature of a given data point. Red color bars indicate an increase while blue color bars indicate a decrease in energy consumption and the length of the bar is proportional to the impact of the feature. When no-ECM is in place that indicates the value zero of the binary variable, the energy consumption increases while it decreases when the value is set to 1. According to the plots, they further verify the model behavior in the present and the absence of the ECM project.  Waterfall graphs shown in Figure 7 are used to analyze the value contribution towards target prediction based on each feature of a given data point. Red color bars indicate an increase while blue color bars indicate a decrease in energy consumption and the length of the bar is proportional to the impact of the feature. When no-ECM is in place that indicates the value zero of the binary variable, the energy consumption increases while it decreases when the value is set to 1. According to the plots, they further verify the model behavior in the present and the absence of the ECM project.

Buildings with LED Retrofit ECM
LED retrofit ECM shows consistent energy savings compared to the BMS upgrade. The consistent cross-validated CV(RMSE) results across different pre-retrofit periods shown in Figure 8 are within the acceptable levels of IPMVP guidelines and further validate the consistent impact of LED retrofit ECM. The energy consumption patterns of buildings varied over time and the impact of the BMS upgrade was not consistent across time. This behavior is already observed in our dataset. However, the LED retrofit ECM project has a consistent impact on energy savings. Table 4 shows the percentage of savings for the LED retrofit ECM project that contains only 3 months of data.
Waterfall graphs shown in Figure 7 are used to analyze the value contribution towards target prediction based on each feature of a given data point. Red color bars indicate an increase while blue color bars indicate a decrease in energy consumption and the length of the bar is proportional to the impact of the feature. When no-ECM is in place that indicates the value zero of the binary variable, the energy consumption increases while it decreases when the value is set to 1. According to the plots, they further verify the model behavior in the present and the absence of the ECM project.

Buildings with LED Retrofit ECM
LED retrofit ECM shows consistent energy savings compared to the BMS upgrade. The consistent cross-validated CV(RMSE) results across different pre-retrofit periods shown in Figure 8 are within the acceptable levels of IPMVP guidelines and further validate the consistent impact of LED retrofit ECM. The energy consumption patterns of buildings varied over time and the impact of the BMS upgrade was not consistent across time. This behavior is already observed in our dataset. However, the LED retrofit ECM project has a consistent impact on energy savings. Table 4 shows the percentage of savings for the LED retrofit ECM project that contains only 3 months of data.   LED ECM saving percentage results in Table 4 show consistent behavior across different selections of pre-retrofit data periods. This indicates that the proposed AI approach identifies consistent energy-saving patterns from the LED retrofit project. Figure 9 shows a consistent gap between the energy consumption of building B16 in the presence and the absence of the LED retrofit project.   LED ECM saving percentage results in Table 4 show consistent behavior across different selections of pre-retrofit data periods. This indicates that the proposed AI approach identifies consistent energy-saving patterns from the LED retrofit project. Figure 9 shows a consistent gap between the energy consumption of building B16 in the presence and the absence of the LED retrofit project. LED ECM saving percentage results in Table 4 show consistent behavior across different selections of pre-retrofit data periods. This indicates that the proposed AI approach identifies consistent energy-saving patterns from the LED retrofit project. Figure 9 shows a consistent gap between the energy consumption of building B16 in the presence and the absence of the LED retrofit project. The explainability plots in Figures 10 and 11, indicate a similar behavior as in BMS upgrade ECM on the introduced 'is_led' binary feature to represent the LED retrofit. The explainability plots in Figures 10 and 11, indicate a similar behavior as in BMS upgrade ECM on the introduced 'is_led' binary feature to represent the LED retrofit.

Buildings with BMS Upgrade and LED Retrofit ECM
In this dataset, there are several buildings that were subjected to both BMS upgrade and LED retrofit ECM projects. The performance results of the models trained using the post-retrofit data till the COVID-19 shutdown are shown in Figure 12, which is also within the acceptable level under the standard guidelines. Results of Table 5 were obtained from the buildings that are subjected to both ECM projects. We have evaluated the savings per-

Buildings with BMS Upgrade and LED Retrofit ECM
In this dataset, there are several buildings that were subjected to both BMS upgrade and LED retrofit ECM projects. The performance results of the models trained using the post-retrofit data till the COVID-19 shutdown are shown in Figure 12, which is also within the acceptable level under the standard guidelines. Results of Table 5 were obtained from the buildings that are subjected to both ECM projects. We have evaluated the savings per- Figure 11. SHAP Waterfall plots of building B26 (a) Energy consumption prediction assuming LED retrofit ECM present (b) Energy consumption prediction assuming LED retrofit ECM absent.

Buildings with BMS Upgrade and LED Retrofit ECM
In this dataset, there are several buildings that were subjected to both BMS upgrade and LED retrofit ECM projects. The performance results of the models trained using the post-retrofit data till the COVID-19 shutdown are shown in Figure 12, which is also within the acceptable level under the standard guidelines. Results of Table 5 were obtained from the buildings that are subjected to both ECM projects. We have evaluated the savings percentage using 3, 6, 9, and 12 months of data prior to the BMS upgrade project.     We further validated our results with those from the ECM solution provider based on 3000 operational hours, 57.69 per week for buildings B15 and B16. According to the results shown in Figure 13, the estimated savings from our method is similar to the total estimated savings data provided by the vendor. Our method calculates the savings with building operation characteristics, which is more aligned with the proper M&V savings calculation approach.  We further validated our results with those from the ECM solution provider based on 3000 operational hours, 57.69 per week for buildings B15 and B16. According to the results shown in Figure 13, the estimated savings from our method is similar to the total estimated savings data provided by the vendor. Our method calculates the savings with building operation characteristics, which is more aligned with the proper M&V savings calculation approach.
When a building contains two ECM installations, the introduced two binary variables (is_led, is_bms) show two levels of impact. For building B9, the binary variable, which reflects the LED-retrofit has more impact compared to the binary variable, which reflects the BMS upgrade. We have performed a manual evaluation of the energy consumption behavior after the BMS upgrade and LED retrofit. Our analysis confirmed that the impact of LED retrofit in building B9 has a greater impact compared to the BMS upgrade. Interestingly, when two ECMs projects present in building B9 have a clear inverse impact on both binary variables, which can be observed by the symmetrical separation of colors in the (b) plot of Figure 14. However, the impact of the 'is_bms' variable on building B16, which is shown in (c) plot of Figure 14 indicates both positive and negative correlations towards target predictions, while the 'is_led' binary feature has a complete inverse relationship. This could indicate that the impact of BMS upgrade degrades over time, while LED retrofit has a complete inverse correlation for energy consumption, which is a permanent event. Figure 15 shows the impact of the introduced features on the model prediction for selected data points.   We further validated our results with those from the ECM solution provider based on 3000 operational hours, 57.69 per week for buildings B15 and B16. According to the results shown in Figure 13, the estimated savings from our method is similar to the total estimated savings data provided by the vendor. Our method calculates the savings with building operation characteristics, which is more aligned with the proper M&V savings calculation approach.  When a building contains two ECM installations, the introduced two binary variables (is_led, is_bms) show two levels of impact. For building B9, the binary variable, which reflects the LED-retrofit has more impact compared to the binary variable, which reflects the BMS upgrade. We have performed a manual evaluation of the energy consumption behavior after the BMS upgrade and LED retrofit. Our analysis confirmed that the impact of LED retrofit in building B9 has a greater impact compared to the BMS upgrade. Interestingly, when two ECMs projects present in building B9 have a clear inverse impact on both binary variables, which can be observed by the symmetrical separation of colors in the (b) plot of Figure 14. However, the impact of the 'is_bms' variable on building B16, which is shown in (c) plot of Figure 14 indicates both positive and negative correlations towards target predictions, while the 'is_led' binary feature has a complete inverse relationship. This could indicate that the impact of BMS upgrade degrades over time, while LED retrofit has a complete inverse correlation for energy consumption, which is a permanent event. Figure 15 shows the impact of the introduced features on the model prediction for selected data points.

Impact of ECM Installation Date
Here, we evaluate the selection of the ECM installation date and its impact on the annual savings. Figure 16 shows the impact of percentage savings by selecting 7 days before and after the ECM completion date for the BMS upgrade ECM project. According to When a building contains two ECM installations, the introduced two binary variables (is_led, is_bms) show two levels of impact. For building B9, the binary variable, which reflects the LED-retrofit has more impact compared to the binary variable, which reflects the BMS upgrade. We have performed a manual evaluation of the energy consumption behavior after the BMS upgrade and LED retrofit. Our analysis confirmed that the impact of LED retrofit in building B9 has a greater impact compared to the BMS upgrade. Interestingly, when two ECMs projects present in building B9 have a clear inverse impact on both binary variables, which can be observed by the symmetrical separation of colors in the (b) plot of Figure 14. However, the impact of the 'is_bms' variable on building B16, which is shown in (c) plot of Figure 14 indicates both positive and negative correlations towards target predictions, while the 'is_led' binary feature has a complete inverse relationship. This could indicate that the impact of BMS upgrade degrades over time, while LED retrofit has a complete inverse correlation for energy consumption, which is a permanent event. Figure 15 shows the impact of the introduced features on the model prediction for selected data points.

Impact of ECM Installation Date
Here, we evaluate the selection of the ECM installation date and its impact on the annual savings. Figure 16 shows the impact of percentage savings by selecting 7 days before and after the ECM completion date for the BMS upgrade ECM project. According to Figure 15. SHAP Waterfall plots of building B9 (a) Energy consumption prediction assuming LED retrofit ECM present (b) Energy consumption prediction assuming LED retrofit ECM absent.

Impact of ECM Installation Date
Here, we evaluate the selection of the ECM installation date and its impact on the annual savings. Figure 16 shows the impact of percentage savings by selecting 7 days before and after the ECM completion date for the BMS upgrade ECM project. According to Figure 16, several buildings follow a similar decreasing function, but the savings percentage remains mostly consistent across many buildings. The reason for this behavior is that the predictive model observes the reduction of energy savings differently, which changes the impact of the binary feature towards the prediction. Peak percentage savings were observed mostly near the ECM project completion date, which accurately represents the behavior of the building before and after the ECM project.  Figure 16, several buildings follow a similar decreasing function, but the savings percentage remains mostly consistent across many buildings. The reason for this behavior is that the predictive model observes the reduction of energy savings differently, which changes the impact of the binary feature towards the prediction. Peak percentage savings were observed mostly near the ECM project completion date, which accurately represents the behavior of the building before and after the ECM project.

Conclusions
In this paper, we have proposed a robust and explainable AI approach for M&V protocols of energy-efficient infrastructure, leading up to a net zero carbon emissions strategy. The proposed approach is novel in its use of small volumes of energy consumption data streams to build effective predictive models for energy savings estimation, as well as its interpretation of the predictive outcome in terms of the variables of the dataset using XAI techniques. The approach consists of five layers: input layer, data lake layer, AI layer, ECM quantification layer, and the explainability layer. We have empirically evaluated the proposed approach on a large real-world dataset of energy consumption data and ECM projects within the multi-campus multi-building setting of La Trobe University. The results representing robustness and explainability of the predictive output and savings estimation confirm the validity and practical value of this AI approach for M&V of ECM projects. As future work, the following limitations of the framework should be addressed; the short time interval for ECM monitoring could contain biased consumption data relating to an event of significance or an outlier, this needs to be factored in as the time interval is incrementally expanded, and the prediction horizon and prediction uncertainty should be incorporated into the model development as further parameters to be finetuned. Experimentation with sophisticated AI models for prediction and further evaluation of the proposed framework across diverse ECM projects in different organizational settings are also recommended. In conclusion, drawing on the proposed framework and its demonstration, it is now timely for policymakers and regulators of energy markets and energy operations to consider the formal adoption of the robust and explainable capabilities of AI for measurement and verification of energy efficient infrastructure for net zero carbon emissions.

Conclusions
In this paper, we have proposed a robust and explainable AI approach for M&V protocols of energy-efficient infrastructure, leading up to a net zero carbon emissions strategy. The proposed approach is novel in its use of small volumes of energy consumption data streams to build effective predictive models for energy savings estimation, as well as its interpretation of the predictive outcome in terms of the variables of the dataset using XAI techniques. The approach consists of five layers: input layer, data lake layer, AI layer, ECM quantification layer, and the explainability layer. We have empirically evaluated the proposed approach on a large real-world dataset of energy consumption data and ECM projects within the multi-campus multi-building setting of La Trobe University. The results representing robustness and explainability of the predictive output and savings estimation confirm the validity and practical value of this AI approach for M&V of ECM projects. As future work, the following limitations of the framework should be addressed; the short time interval for ECM monitoring could contain biased consumption data relating to an event of significance or an outlier, this needs to be factored in as the time interval is incrementally expanded, and the prediction horizon and prediction uncertainty should be incorporated into the model development as further parameters to be finetuned. Experimentation with sophisticated AI models for prediction and further evaluation of the proposed framework across diverse ECM projects in different organizational settings are also recommended. In conclusion, drawing on the proposed framework and its demonstration, it is now timely for policymakers and regulators of energy markets and energy operations to consider the formal adoption of the robust and explainable capabilities of AI for measurement and verification of energy efficient infrastructure for net zero carbon emissions.