Machine Learning-Based Ensemble Classifiers for Anomaly Handling in Smart Home Energy Consumption Data

Addressing data anomalies (e.g., garbage data, outliers, redundant data, and missing data) plays a vital role in performing accurate analytics (billing, forecasting, load profiling, etc.) on smart homes’ energy consumption data. From the literature, it has been identified that data imputation with machine learning (ML)-based single-classifier approaches is used to address data quality issues. However, these approaches are not effective in addressing the hidden issues of smart home energy consumption data due to the presence of a variety of anomalies. Hence, this paper proposes ML-based ensemble classifiers using random forest (RF), support vector machine (SVM), decision tree (DT), naive Bayes (NB), K-nearest neighbor (KNN), and neural networks (NNET) to handle all the possible anomalies in smart home energy consumption data. The proposed approach initially identifies all anomalies and removes them, and then imputes this removed/missing information. The entire implementation consists of four parts. Part 1 presents anomaly detection and removal, part 2 presents data imputation, part 3 presents single-classifier approaches, and part 4 presents ensemble classifier approaches. To assess the classifiers’ performance, various metrics, namely, accuracy, precision, recall/sensitivity, specificity, and F1 score, are computed. From these metrics, it is identified that the ensemble classifier “RF+SVM+DT” has shown superior performance over the conventional single classifiers as well as the other ensemble classifiers for anomaly handling.


Introduction
Considering the global thrust towards the development of grid-independent and green energy systems to address the unrelenting growth of loads as well as environmental pollution, smart home and renewable energy-based microgrid adoption has been increasing worldwide. Smart cities are new-era establishments where all the smart homes are jointly operated to consolidate and optimize electricity utilization. As these establishments are realized with a combination of electrical, communication, and information technologies, gathering quality data is a challenging task. Smart homes connected to the power network continuously generate huge volumes of energy consumption data, which is normally a combination of timestamps and readings. The reading information in this data is a key value that helps in understanding energy consumption behavior, billing generation, load profiling, forecasting, contingency analysis, device health condition analysis, etc. All these operations rely upon the quality of the data being captured. However, this data often consists of different anomalies, viz., garbage data, outliers, redundant data, and missing data, caused by malfunctioning of advanced metering infrastructure, failure of communication channels, unanticipated issues in power networks, etc. To handle such anomalies, a few imputation methods such as data splitting, fuzzy inductive reasoning, denoising autoencoder, and bagging have been used in the literature. Further, to evaluate their performance, various single classifiers, namely, SVM, neural networks, etc., are used. However, these approaches are found ineffective in addressing the hidden issues of smart home energy consumption data due to the presence of a variety of anomalies such as garbage data, outlier data, redundant data, missing data, etc.
On the other hand, in recent days, the ensemble classification approach has supported effective data imputation in different applications, but it has not been tried for smart home energy consumption data. With this motivation, this paper proposes ML-based ensemble classifiers to handle all the possible anomalies in smart home energy consumption data. The major contributions of this paper are summarized as follows: the proposed approach initially identifies all anomalies and removes them, and then imputes this information. The entire implementation consists of four parts.
- Part 1 (anomaly detection and removal) considers the original dataset and refines it by removing all the identified anomalies.
- Part 2 (data imputation) considers this refined dataset and performs the missing data imputation using median, KNN, and bagging imputation methods, thereby producing an anomaly-free dataset.
- Part 3 (single-classifier approach) and Part 4 (ensemble classifiers approach) classify the imputed datasets and recommend the best classifiers.
To assess the classifiers' performance, various metrics, namely, accuracy, precision, recall/sensitivity, specificity, and F1 score are computed. From these metrics, it is identified that the ensemble classifier "RF+SVM+DT" has shown superior performance over the conventional single classifiers as well as the other ensemble classifiers for anomaly handling in smart home energy consumption data.
All these contributions are structured in the paper as follows. Section 2 presents the description of the dataset. Section 3 presents the description and implementation of the proposed approach. Section 4 presents simulation results and their discussion. Finally, Section 5 concludes the outcomes of the paper in a synopsized way.

Description of Dataset
To implement the proposed approach, the data of an appliance (refrigerator) from the Tracebase dataset [33] is considered. This dataset consists of 43 different appliances with 158 device IDs that are connected to various smart homes/buildings. Each appliance consists of CSV files that represent the energy consumption data of a day. A detailed description of this dataset can be obtained from [34]. Further, this dataset has been used in various works in the literature. The Tracebase dataset was used in the extensive study of different non-intrusive load monitoring (NILM) power consumption datasets described in [35][36][37]. The present and the future directions for energy management techniques using NILM datasets are discussed in [38].
The CSV file (dev_98C08A_2011.09.17.csv) data of the refrigerator appliance is prepared with the columns such as CAPTURED_DATE, CAPTURED_HOUR, CAPTURED_MINUTE, CAPTURED_SECOND, and CAPTURED_READING for implementing the proposed ensemble classifier approach.

Description and Implementation of the Proposed Approach
The conceptual model of the proposed approach is shown in Figure 1. It consists of four parts, viz., Part 1, Part 2, Part 3, and Part 4. The smart home energy consumption dataset will be given as input to Part 1. In Part 1, an analysis of the missing data will be carried out for understanding the missingness in the original dataset. Further, the identification and removal of different anomalies (viz., garbage data, outliers in the data, and redundant data) will be performed. From this, a dataset with the abovementioned anomalies removed will be produced and given as input to Part 2. In Part 2, the imputation of missing data will be completed. In Part 3, a single-classifier approach will be applied. This will provide a recommendation of the best single-classifier approach as the output. By taking this best single-classifier as the basis, the ensemble classifiers approach will be applied in Part 4. This will provide a recommendation of the best ensemble classifier to perform the imputation.

The implementation flow of the proposed ensemble classifiers approach through all the proposed parts is shown in Figure 2. The detailed description and implementation processes are discussed in Sections 3.1-3.4 respectively for Part 1, Part 2, Part 3, and Part 4.

Implementation of Part 1 (Anomaly Detection and Removal)
The process starts from Part 1 by reading the smart home energy consumption dataset and saving it in an object "shec_dat". Initially, the missing data information in this "shec_dat" is analyzed [6]. Further, the process is continued with the identification of garbage data in the dataset. To identify the garbage data, i.e., the data other than numerical data, the function grepl("[[:digit:]]") is used on each column of the dataset. If garbage data exists, those records are removed and the remaining data are given as input to identify the outlier data. If there is no garbage data, the existing dataset is used as it is and is given as input to identify the outlier data. Outlier data is data that does not exist within the expected range. To identify outlier data, a boxplot analysis is applied to the data obtained after removing the garbage data. The boxplot analysis is a standardized approach to showing data distribution in a five-number summary (i.e., minimum, first quartile, median, third quartile, and maximum). The data that lie between the "minimum" and "maximum" values are considered as data within the range and useful for the analysis. The data that lie below the "minimum" or above the "maximum" values are considered outlier data and need to be removed to achieve better analytics. The function boxplot() is applied to the readings column by using boxplot(shec_dat$CAPTURED_READING, plot=F)$out. If outliers exist in the readings column, those records are removed and the remaining data are given as input to identify the redundant data. If there are no outliers, the existing dataset is used as it is to identify the redundant data. In general, redundant data refers to the duplication of the entire record in the dataset. However, in this case, there exist two types of redundant data in the dataset. They are the records with the same timestamp and same reading information, and records with the same timestamp and different reading information.
The detailed process of identifying these types of redundant data is discussed in [39]. If the abovementioned types of redundant data exist, those records are removed. If there are no redundant data, the existing data is used as it is to perform the next step. At the end of Part 1, a dataset is obtained after removing all the anomalies (garbage data, outliers data, and redundant data). As several records in the dataset are removed due to the existence of different anomalies, this dataset consists of missing timestamps. Hence, these missing timestamps are filled, and the respective reading information is set to "NA (Not Available)" [8] before proceeding to the implementation of Part 2.
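The Part 1 pipeline described above can be sketched as follows. The paper's implementation is in R (grepl("[[:digit:]]"), boxplot()); this pandas version is an illustrative assumption, taking the boxplot whiskers as Q1 − 1.5·IQR and Q3 + 1.5·IQR and treating timestamp collisions as the two redundant-data types.

```python
import pandas as pd

# Timestamp columns as named in the prepared CSV file.
TS_COLS = ["CAPTURED_DATE", "CAPTURED_HOUR", "CAPTURED_MINUTE", "CAPTURED_SECOND"]

def remove_anomalies(shec_dat: pd.DataFrame) -> pd.DataFrame:
    # Garbage data: readings that are not numeric.
    readings = pd.to_numeric(shec_dat["CAPTURED_READING"], errors="coerce")
    shec_dat = shec_dat.loc[readings.notna()].assign(CAPTURED_READING=readings.dropna())

    # Outlier data: readings outside the boxplot whisker range.
    q1, q3 = shec_dat["CAPTURED_READING"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = shec_dat["CAPTURED_READING"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    shec_dat = shec_dat.loc[in_range]

    # Redundant data: same timestamp with the same reading (exact duplicates),
    # then same timestamp with different readings (conflicting records).
    shec_dat = shec_dat.drop_duplicates()
    return shec_dat.loc[~shec_dat.duplicated(subset=TS_COLS, keep=False)]
```

After this step, the surviving timestamps can be reindexed against the full day and the gaps set to NA, as described above.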


Implementation of Part 2 (Data Imputation)
Once all the missing records are finalized in the dataset obtained after removing all the anomalies, the imputation methods such as median imputation, KNN imputation, and bagging imputation are applied. The implementation of these imputation methods produces datasets with imputed reading values. Further, the single-classifier approach is applied to these imputed datasets. The implementation of the median, KNN, and bagging imputation methods is discussed in Sections 3.2.1-3.2.3 respectively.

Implementation of the Median Imputation Method
In the median imputation method, the median value of the reading information in the CAPTURED_READING column is calculated, and that value is used for imputing the missing reading information. This imputation method is simple and fast. The process of calculating the median value starts with the ordering of readings information in ascending order. Once the ordering of readings information is done, then the number of values (odd or even) in the CAPTURED_READING is taken into consideration. Here, the number of values plays a major role in calculating the median value of the readings information. The formula for calculating the median value is given in Equation (1).
Median = D[(s + 1)/2], if s is odd
Median = (D[s/2] + D[(s/2) + 1])/2, if s is even (1)
where D = list of values ordered in the CAPTURED_READING column, and s = number of values in the CAPTURED_READING column.
If the number of values in the CAPTURED_READING column is odd, then the middle value is considered as the median. If the number of values is even in the CAP-TURED_READING column, then the average of the middle two values is considered as the median.
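The odd/even cases of Equation (1) can be sketched directly. This is a minimal illustrative version; the function names are assumptions, not the paper's code.

```python
def median_reading(readings):
    """Equation (1): median of the ordered readings list D with s values."""
    D = sorted(readings)                       # order the readings ascending
    s = len(D)
    if s % 2 == 1:                             # odd: take the middle value
        return D[s // 2]
    return (D[s // 2 - 1] + D[s // 2]) / 2     # even: average the two middle values

def median_impute(readings):
    """Replace missing (None) readings with the median of the observed ones."""
    med = median_reading([r for r in readings if r is not None])
    return [med if r is None else r for r in readings]
```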

Implementation of the KNN Imputation Method
In the KNN imputation method, the distance between the k-nearest neighbor values is calculated by using the Euclidean distance metric. In the CAPTURED_READING column, the distance between the k-closest samples of the readings is calculated and that distance value is used to impute the missing reading information. The formula for calculating Euclidean distance is given in Equation (2).
dist = sqrt( sum_{i=1}^{m} (p_i − q_i)^2 ) (2)
where dist = Euclidean distance, m = number of points, and p_i and q_i are the points.
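Equation (2) and its use for imputation can be sketched as below. Averaging the readings of the k nearest samples is a common choice and is an assumption of this sketch, as are the function names.

```python
import math

def euclidean(p, q):
    """Equation (2): sqrt of the summed squared differences over m coordinates."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_impute(rows, target, k=3):
    """rows: (features, reading) pairs with known readings; target: features of
    the record whose reading is missing. Average the k nearest readings."""
    nearest = sorted(rows, key=lambda fr: euclidean(fr[0], target))[:k]
    return sum(reading for _, reading in nearest) / k
```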

Implementation of the Bagging Imputation Method
In the bagging imputation method, the term 'bagging' refers to bootstrap aggregation. The bootstrap is a statistical technique of iteratively resampling the data with replacement in the dataset. To perform this, initially, the number of bootstrap samples is fixed, and then the sample size. For each bootstrap sample, the following steps are performed: draw the sample with replacement, fit the model, estimate the performance of the model on the out-of-bag sample, and average the predictions of the models. The multiple iterations of sampling improve the prediction performance of the model. The bagging method fits a bagged tree. This method is simple, powerful, and accurate for imputing the missing values in the readings information. However, it is computationally expensive.
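The bootstrap-and-aggregate loop above can be sketched with the standard library. In this hedged sketch the sample mean stands in for the bagged tree that the paper fits; that substitution, and the function name, are assumptions for illustration only.

```python
import random

def bagging_impute(known, n_boot=25, seed=0):
    """Estimate one missing reading by bootstrap aggregation over known readings."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        # Draw a bootstrap sample with replacement, same size as the data.
        sample = [rng.choice(known) for _ in known]
        # "Fit" a trivial model on the sample (mean stands in for a bagged tree).
        preds.append(sum(sample) / len(sample))
    # Aggregate: average the per-sample predictions.
    return sum(preds) / n_boot
```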

Implementation of Part 3 (Single-Classifier Approach)
In this section, the single-classifier approach is performed using various classifiers, viz., RF, SVM, DT, NB, KNN, and NNET, for the classification. All these classifiers are implemented individually on the dataset. To implement these, the dataset is divided into train_set and test_set. These classifiers are trained on the train_set using k-fold cross-validation. Here, the k-value considered is 10. Further, these classifiers are applied to the test_set to predict the classes Yes (Y) or No (N). Here, class 'Y' represents missing data, and class 'N' represents non-missing data. After the implementation, the performance metrics such as accuracy, precision, recall/sensitivity, specificity, and F1 score are computed using a confusion matrix to evaluate each classifier's performance. The confusion matrix is shown in Figure 3 and the formulae for computing the performance metrics are given in Equations (3)-(7).
Accuracy = (T.Pos. + T.Neg.) / (T.Pos. + T.Neg. + F.Pos. + F.Neg.) (3)

Precision = T.Pos. / (T.Pos. + F.Pos.) (4)

Recall/Sensitivity = T.Pos. / (T.Pos. + F.Neg.) (5)

Specificity = T.Neg. / (T.Neg. + F.Pos.) (6)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall) (7)
If all the single classifiers are implemented and their performance is verified, then, the best single classifier is recommended. Otherwise, the performance metrics are re-verified.
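Equations (3)-(7) translate directly into code given the four confusion-matrix counts; the function below is a minimal sketch (the name and argument order are assumptions).

```python
def metrics(t_pos, t_neg, f_pos, f_neg):
    """Compute Equations (3)-(7) from confusion-matrix counts."""
    accuracy = (t_pos + t_neg) / (t_pos + t_neg + f_pos + f_neg)   # Eq. (3)
    precision = t_pos / (t_pos + f_pos)                            # Eq. (4)
    recall = t_pos / (t_pos + f_neg)                               # Eq. (5), sensitivity
    specificity = t_neg / (t_neg + f_pos)                          # Eq. (6)
    f1 = 2 * precision * recall / (precision + recall)             # Eq. (7)
    return accuracy, precision, recall, specificity, f1
```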

Implementation of Part 4 (Ensemble Classifiers Approach)
This section uses the best single classifier recommended in Part 3 as the input to develop ensemble classifiers. The ensembling of classifiers is performed using the "stacking" method. In stacking, there are two layers called the top layer and the bottom layer. The top layer consists of a classifier, which is referred to as the base classifier, and the bottom layer consists of other classifiers. The output of the bottom layer is given as input to the top layer. The classifier used in the top layer is ensembled with the output of the bottom-layer classifiers, which produces an ensemble classifier. The stacking of classifiers is shown in Figure 4. From this figure, it is seen that the single classifiers used in the bottom layer are ensembled with the recommended best classifier used in the top layer. For example, the single classifiers SVM and DT are ensembled with the recommended best classifier RF. Similarly, all the other single classifiers form an ensemble with RF and produce ensemble classifiers. To implement these ensemble classifiers, the imputed datasets are given as input. Further, each imputed dataset is divided into train_set and test_set. The ensemble classifiers are trained on the train_set using k-fold cross-validation. Here, the k-value considered is 10. Further, these ensemble classifiers are applied to the test_set to predict the classes Y or N. After the implementation, the performance metrics such as accuracy, precision, recall/sensitivity, specificity, and F1 score are computed using a confusion matrix to evaluate each ensemble classifier's performance. If all the ensemble classifiers are implemented and their performance is verified, then the best ensemble classifier for the imputation is recommended; otherwise, the performance metrics are re-verified.
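The two-layer stacking structure can be sketched abstractly as follows. Training the actual RF/SVM/DT models is omitted; the bottom-layer classifiers are represented as prediction functions, and a majority vote stands in for the top-layer combiner, which is an assumption of this sketch rather than the paper's exact method.

```python
def stack_predict(bottom_classifiers, top_combine, x):
    """Two-layer stacking: bottom-layer classifiers each predict a class for x,
    and the top-layer (base) classifier combines their outputs."""
    bottom_outputs = [clf(x) for clf in bottom_classifiers]   # bottom layer
    return top_combine(bottom_outputs)                        # top layer

def majority_vote(labels):
    """Stand-in top-layer combiner: most frequent bottom-layer label."""
    return max(set(labels), key=labels.count)
```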

Simulation Results and Discussion
In keeping with the aims of the paper, the simulation results of the implementation are presented in three subsections. Sections 4.1-4.3 present the results corresponding to anomaly detection and removal, single-classifier approach, and ensemble classifiers approach, respectively.

Results Corresponding to Anomaly Detection and Removal
This section presents the details of the missing data in the original CSV file (original dataset) and the missing data in the dataset after eliminating the anomalies. The number of records in this original dataset is 155,374. During the analysis of missing data, 700 records are missed in the original dataset [7]. During the identification of garbage data, no garbage data (other than numerical data) are identified in the original CSV file. Hence, no records are removed and the same number of records (155,374) are available. During the identification of outliers data, there are 25 readings identified as outliers and the respective records are removed from the dataset. The removal of records with outliers left the dataset with 155,349 records. During the identification of redundant data, the records with the same timestamp and same reading are identified and those records are removed from the dataset. This removal left the dataset with 98,779 records. Further, the records with the same timestamp and different readings are identified and those records are removed from the dataset. This removal left the dataset with 72,597 records. Once the redundant data are removed, the missing data are filled with the respective timestamps and the respective reading with NA value, as shown in Figure 5 (all the highlighted rows). After this filling, there are 86,400 records in the dataset, out of these, 13,803 records contain missing readings.
The proportions of the available data and missing data in the original dataset and the dataset available after removing anomalies are shown in Figure 6a-c. These figures show the proportion of the missing data and available data in the considered dataset in three different scenarios, namely, (i) consideration of the original dataset, (ii) consideration of the dataset that is obtained after removing the anomalies, and (iii) consideration of the dataset after filling the missing timestamps, ready for the imputation.
From Figure 6a, it is understood that the proportion of available data is 99.55% and missing data is 0.45% in the original dataset. From Figure 6b, it is seen that the proportion of available data is 84% and missing data is 16% in all columns of the dataset obtained after removing anomalies. From Figure 6c, it is evident that there are no missing data in the columns CAPTURED_DATE, CAPTURED_HOUR, CAPTURED_MINUTE, CAPTURED_SECOND and the proportion of data availability is 84%. Further, there are missing readings in the column CAPTURED_READING with a proportion of 16%.
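The record counts and proportions above can be cross-checked arithmetically. One assumption in this sketch: the one-day file is sampled once per second, so 24 × 60 × 60 = 86,400 timestamps are expected after filling.

```python
# Cross-check of the counts reported in this section.
EXPECTED_TIMESTAMPS = 24 * 60 * 60              # 86,400 records in a one-day file
records_after_cleaning = 72_597                 # after removing all anomalies
missing_records = EXPECTED_TIMESTAMPS - records_after_cleaning  # NA readings filled in
missing_share = missing_records / EXPECTED_TIMESTAMPS           # ~16% (Figure 6b,c)
original_missing_share = 700 / 155_374                          # ~0.45% (Figure 6a)
```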

Results Corresponding to the Single-Classifier Approach
This section presents the performance of a single-classifier approach on the imputed datasets. The performance of classifiers in the median, KNN, and bagging imputation methods are discussed in Sections 4.2.1-4.2.3, respectively.

Performance of the Single-Classifier Approach in the Median Imputation Method
The performance metrics of each classifier are shown in Figure 7, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 98.1% is observed in RF, while the lowest accuracy value of 76.3% is observed in KNN, as shown in Figure 7a. The highest precision value of 99% is observed in RF, while the lowest precision value of 80.5% is observed in SVM and NB, as shown in Figure 7b. The highest recall value of 100% is observed in SVM and NB, while the lowest recall value of 87.9% is observed in KNN, as shown in Figure 7c. The highest specificity value of 95.9% is observed in RF, while the lowest specificity value of 0% is observed in SVM and NB, as shown in Figure 7d. The highest F1 Score value of 98.8% is observed in RF, while the lowest F1 Score value of 85.7% is observed in KNN, as shown in Figure 7e. From the subplots in Figure 7a-e, it is understood that the classifier RF has outperformed the others. Further, the performance summary of all the single classifiers is given in Table 1.

Performance of the Single-Classifier Approach in the KNN Imputation Method
The performance metrics of each classifier are shown in Figure 8, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 87.7% is observed in RF, while the lowest accuracy value of 68% is observed in NNET, as shown in Figure 8a. The highest precision value of 86.6% is observed in RF, while the lowest precision value of 80.3% is observed in NNET, as shown in Figure 8b. The highest recall value of 100% is observed in RF, SVM, DT, NB, and KNN, while the lowest recall value of 79.9% is observed in NNET, as shown in Figure 8c. The highest specificity value of 36.3% is observed in RF, while the lowest specificity value of 0% is observed in SVM and KNN, as shown in Figure 8d. The highest F1 Score value of 92.8% is observed in RF, while the lowest F1 Score value of 80.1% is observed in NNET, as shown in Figure 8e. From the subplots in Figure 8a-e, it is understood that the classifier RF has outperformed the others. Further, the percentage summary of all classifiers is given in Table 2.

The performance metrics of each classifier are shown in Figure 9, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 95.2% is observed in RF, while the lowest accuracy value of 75.7% is observed in NNET, as shown in Figure 9a. The highest precision value of 100% is observed in RF and DT, while the lowest precision value of 79.5% is observed in NNET, as shown in Figure 9b. The highest recall value of 100% is observed in SVM and NB, while the lowest recall value of 84.3% is observed in DT, as shown in Figure 9c. The highest specificity value of 100% is observed in RF and DT, while the lowest specificity value of 0% is observed in SVM, NB, and NNET, as shown in Figure 9d. The highest F1 Score value of 96.9% is observed in RF, while the lowest F1 Score value of 86.1% is observed in NNET, as shown in Figure 9e. From the subplots in Figure 9a-e, it is understood that the classifier RF has outperformed the others. Further, the percentage summary of all classifiers is given in Table 3.
Table 3. Performance comparison of the single-classifier approach on the bagging imputed dataset.
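The five metrics reported above follow directly from the binary confusion matrix. A minimal sketch of their computation (the counts below are illustrative, not the paper's data):

```python
# Hedged sketch: computing accuracy, precision, recall/sensitivity,
# specificity, and F1 score from binary confusion-matrix counts.
# The counts passed in below are illustrative only.
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # recall = sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

acc, prec, rec, spec, f1 = classification_metrics(tp=80, fp=10, fn=5, tn=5)
```

A class-imbalanced case like this one also illustrates how a classifier that almost always predicts the positive class can reach very high recall while its specificity collapses, consistent with the 0% specificity observed for SVM and KNN in Figure 8d.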

Results Corresponding to the Ensemble Classifiers Approach
This section presents the performance of the ensemble classifiers approach on the imputed datasets. The performance of the ensemble classifiers with the median, KNN, and bagging imputation methods is discussed in Sections 4.3.1-4.3.3, respectively.
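For reference, two of the three imputation strategies evaluated here (median and KNN) can be sketched with scikit-learn's imputers on toy data; this is a hedged illustration only, and bagging-based imputation (as offered, e.g., by R's caret) has no direct scikit-learn equivalent, so it is omitted:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy data: column 0 is an hour index, column 1 a consumption reading
# with one missing value (NaN). Illustrative only, not the paper's dataset.
X = np.array([
    [1.0, 2.0],
    [2.0, 2.5],
    [3.0, np.nan],
    [4.0, 3.5],
])

# Median imputation: the NaN is replaced by the column median of {2.0, 2.5, 3.5}.
median_imp = SimpleImputer(strategy="median").fit_transform(X)

# KNN imputation: the NaN is replaced by the mean reading of the two
# nearest rows by (nan-aware) Euclidean distance on the observed features.
knn_imp = KNNImputer(n_neighbors=2).fit_transform(X)
```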

Performance of the Ensemble Classifiers Approach in the Median Imputation Method
The performance metrics of each ensemble classifier are shown in Figure 10, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 98.9% is observed in RF+SVM+DT and RF+DT+NNET, while the lowest accuracy value of 72.5% is observed in RF+NB+KNN, as shown in Figure 10a. The highest precision value of 99.4% is observed in RF+SVM+NB, while the lowest precision value of 77.5% is observed in RF+SVM+KNN, as shown in Figure 10b.
The highest recall value of 100% is observed in RF+SVM+DT, RF+DT+NB, RF+DT+NNET, RF+NB+NNET, and RF+KNN+NNET, while the lowest recall value of 81.6% is observed in RF+SVM+NB, as shown in Figure 10c. The highest specificity value of 94.5% is observed in RF+SVM+DT, RF+DT+NB, RF+DT+NNET, and RF+KNN+NNET, while the lowest specificity value of 34.7% is observed in RF+NB+KNN, as shown in Figure 10d.
The highest F1 Score value of 99.3% is observed in RF+SVM+DT, RF+DT+NB, RF+DT+NNET, and RF+KNN+NNET, while the lowest F1 Score value of 82.2% is observed in RF+SVM+KNN and RF+NB+KNN, as shown in Figure 10e. From the subplots in Figure 10a-e, it is understood that the ensemble classifiers RF+SVM+DT and RF+DT+NNET have outperformed the others.
Further, the performance summary of all ensemble classifiers with respect to various parameters is given in Table 4.
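The composition of an ensemble such as RF+SVM+DT can be sketched as follows. The paper does not specify the combination rule, so hard majority voting is assumed here, and the dataset is a synthetic stand-in rather than the smart-home data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the labeled consumption records (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# RF+SVM+DT ensemble combined by hard majority voting (assumed rule):
# each base classifier casts one vote per sample, majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X, y)
preds = ensemble.predict(X)
```

With hard voting, no probability calibration of the SVM is needed; soft voting would instead average predicted class probabilities and would require `SVC(probability=True)`.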

Performance of the Ensemble Classifiers Approach in the KNN Imputation Method
The performance metrics of each ensemble classifier are shown in Figure 11, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 80.2% is observed in RF+DT+KNN, while the lowest accuracy value of 70.9% is observed in RF+SVM+KNN, as shown in Figure 11a. The highest precision value of 99.3% is observed in RF+DT+KNN, while the lowest precision value of 81.3% is observed in RF+NB+NNET, as shown in Figure 11b.

The highest recall value of 82.6% is observed in RF+NB+NNET, while the lowest recall value of 80.5% is observed in RF+SVM+DT and RF+SVM+NNET, as shown in Figure 11c. The highest specificity value of 43.6% is observed in RF+DT+NNET, while the lowest specificity value of 19.2% is observed in RF+SVM+NNET, as shown in Figure 11d. The highest F1 Score value of 89% is observed in RF+DT+KNN, while the lowest F1 Score value of 81.9% is observed in RF+NB+NNET, as shown in Figure 11e. From the subplots in Figure 11a-e, it is understood that the ensemble classifier RF+DT+KNN has outperformed the others. Further, the performance summary of all ensemble classifiers with respect to various parameters is given in Table 5.

Performance of the Ensemble Classifiers Approach in the Bagging Imputation Method
The performance metrics of each ensemble classifier are shown in Figure 12, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 89.6% is observed in RF+SVM+DT, while the lowest accuracy value of 71.2% is observed in RF+SVM+KNN, as shown in Figure 12a. The highest precision value of 98.8% is observed in RF+SVM+DT, while the lowest precision value of 86.5% is observed in RF+SVM+KNN, as shown in Figure 12b. The highest recall value of 89.4% is observed in RF+SVM+DT, while the lowest recall value of 78.8% is observed in RF+SVM+NNET, as shown in Figure 12c. The highest specificity value of 91.1% is observed in RF+SVM+DT, while the lowest specificity value of 0.2% is observed in RF+SVM+NNET, as shown in Figure 12d. The highest F1 Score value of 93.9% is observed in RF+SVM+DT, while the lowest F1 Score value of 82.8% is observed in RF+SVM+KNN, as shown in Figure 12e. From the subplots in Figure 12a-e, it is understood that the ensemble classifier RF+SVM+DT has outperformed the others.
Further, the performance summary of all ensemble classifiers with respect to various parameters is given in Table 6.
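The per-method winners reported above come from comparing the tabulated metrics; that selection step can be sketched as follows (the values are illustrative, patterned loosely after the Figure 12 F1 scores, not copied from the tables):

```python
# Illustrative F1 scores (percent) for three ensembles on one imputed
# dataset; the numbers are assumptions for demonstration only.
f1_scores = {
    "RF+SVM+DT": 93.9,
    "RF+DT+KNN": 89.0,
    "RF+SVM+KNN": 82.8,
}

# Pick the ensemble with the highest F1 score.
best = max(f1_scores, key=f1_scores.get)  # -> "RF+SVM+DT"
```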

Conclusions
This paper proposes a machine learning-based ensemble classifiers approach to address the anomalies present in smart homes' energy consumption data. The proposed approach has proven more effective than the conventional single-classifier approaches presented in the literature. The salient observations from this work are summarized as follows: All possible anomalies are successfully identified and removed from the dataset. The original dataset contains 155,374 records, and the refined dataset obtained after removing anomalies contains 86,400 records, which is the expected number of records as per the dataset description.
Out of the 86,400 records, 13,803 are identified as records with missing data. This missing data has been successfully imputed using various imputation methods (median, KNN, and bagging). To assess the imputation process, various conventional single-classifier approaches, as well as the proposed ensemble classifiers approaches, are implemented. From the computed performance metrics (accuracy, precision, recall/sensitivity, specificity, and F1 score), the RF classifier is identified as superior to all other single classifiers. Among the proposed ensemble classifiers, "RF+SVM+DT" has shown superior performance over the best conventional single classifier (RF) as well as the other ensemble classifiers for imputing the missing reading information.
Thus, the proposed ensemble classifiers approach has successfully handled anomalies that exist in the smart home energy consumption data.
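The cleaning stage summarized above (removing redundant and garbage records and flagging outliers before imputation) can be sketched on toy data as follows; the column names, values, and the IQR outlier rule are assumptions for illustration, not the paper's exact procedure:

```python
import pandas as pd

# Toy consumption log with one duplicate row, one garbage reading,
# and one extreme outlier. Illustrative only.
df = pd.DataFrame({
    "timestamp": ["t1", "t1", "t2", "t3", "t4", "t5", "t6", "t7"],
    "reading": ["1.2", "1.2", "1.1", "1.3", "1.4", "1.2", "999999", "abc"],
})

# 1. Redundant data: drop exact duplicate records.
df = df.drop_duplicates()

# 2. Garbage data: non-numeric readings become NaN (i.e., missing).
df["reading"] = pd.to_numeric(df["reading"], errors="coerce")

# 3. Outliers: flag readings outside the 1.5*IQR fences and mark them
#    as missing so the imputation stage can fill them in.
q1, q3 = df["reading"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["reading"] < q1 - 1.5 * iqr) | (df["reading"] > q3 + 1.5 * iqr)
df.loc[mask, "reading"] = None
```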

Impacts and Implications of the Work
The work proposed in this paper supports data preprocessing through data cleansing, which is essential for carrying out precise analytics and, thereby, making better decisions for energy management in smart buildings. Furthermore, the outcome of this work serves as a ready reference for understanding the irregularities of live data captured in smart building/home/grid applications for better data analytics. This contributes to one of the important objectives of the United Nations Sustainable Development Goals (UN SDGs), SDG 7: Energy, by producing an anomaly-free dataset for providing several customer services.
In addition, the different data anomalies identified in the energy consumption dataset, viz., missing data, outlier data, garbage data, and redundant data, may be attributed to causes such as malfunctioning of metering infrastructure, failures/glitches of communication channels, cyber-attacks, energy thefts, unanticipated situations in power networks, etc.

Conflicts of Interest:
The authors declare no conflict of interest.