Non-Intrusive Load Monitoring of Residential Water-Heating Circuit Using Ensemble Machine Learning Techniques

Abstract: The recent advancement in computational capabilities and the deployment of smart meters have revived non-intrusive load monitoring as one of the promising techniques for energy monitoring. Toward effective energy monitoring, this paper presents a non-invasive load inference approach assisted by feature selection and ensemble machine learning techniques. To evaluate and validate the proposed approach, water heating is considered, as it is one of the major residential load elements with solid potential for energy efficiency applications. Moreover, to reflect real-life deployment, digital simulations are carried out on low-sampling real-world load measurements from the New Zealand GREEN Grid Database. MATLAB and Python (Scikit-Learn) are used as simulation tools. The employed learning models, i.e., standalone and ensemble, are trained on a single household's load data and later tested rigorously on load data from a set of diverse households to validate the generalization capability of the models. This paper presents a comprehensive performance evaluation of the presented approach in the context of event detection, feature selection, and learning models. Based on the presented study and the corresponding analysis of the results, it is concluded that the proposed approach generalizes well to the unseen testing data and yields promising results in terms of non-invasive load inference.


Introduction
Energy monitoring is considered an integral part of the future smart power grid system. With an increasing number of prosumers and microgrid systems, it is vital to monitor the energy consumption effectively and predict the consumption behavior for the long-term stability of a power grid. In this context, advanced metering infrastructure (AMI) plays a significant role by enabling the utilities not only to monitor the energy consumption of customers [1] but also to offer numerous incentive-based programs to consumers toward energy efficiency [2,3]. AMI is a closed loop where the feedback regarding energy consumption to consumers can be broadly classified into direct and indirect feedback. Direct feedback refers to real-time appliance/circuit level energy consumption information (segregated energy monitoring), while indirect feedback relates to monthly bills (aggregated energy monitoring) [4].

Motivation
Today, the smart grid concept transforms end-users from passive into active consumers who can play a significant role in energy efficiency [5]. However, without direct feedback, it is unrealistic to expect consumers to play an effective role in a sustainable and efficient energy system [4]. With direct feedback, consumers can not only monitor their electricity consumption effectively but also contribute to energy saving [4,6]. In this context, Martinez et al. [7] reviewed more than 60 studies on feedback mechanisms and concluded that direct feedback leads to greater energy savings than indirect feedback. Therefore, for energy saving and the successful development of the smart grid system, effective energy monitoring at the segregated level, i.e., direct feedback, is essential. Segregated energy monitoring can not only contribute to the stability of the grid but also facilitate numerous real-world applications in the context of energy efficiency and conservation.

Literature Review
One of the techniques toward segregated energy monitoring is load disaggregation, also known as energy disaggregation [8] or power disaggregation [9]. Load disaggregation refers to a broad range of methodologies in which the aggregated load profile is converted into a segregated one. These methodologies can be broadly classified into two categories, namely hardware methods and software methods. The former comprises intrusive load monitoring (ILM) techniques and smart appliances. Hardware methods are relatively simple to deploy but are not widely used because of constraints such as scalability, reliability, interoperability, and high cost [10,11]. An alternative and attractive load disaggregation technique is a software method commonly referred to as non-intrusive load monitoring (NILM). The NILM process employs pattern recognition techniques to estimate the operating state of individual appliances/circuits from aggregated load data acquired at a single metering point [12]. Because of its single-point measurements and non-invasive nature, NILM not only provides a cost-effective segregated energy monitoring solution but also addresses consumers' privacy concerns [13]. In terms of working principle, NILM methodologies can be grouped into two categories: event-based and eventless. Event-based NILM systems are computationally more efficient than eventless ones, because the latter consider all samples of the acquired load data for inference [14]. An event-based NILM system comprises four building blocks, namely data acquisition, event detection, feature extraction, and load classification. Further details of the state of the art in NILM methodologies are presented in [15][16][17].
Data acquisition is a prerequisite of the NILM process that affects the subsequent stages in terms of the selection of tools/methodologies as well as the type/number of appliances that can be accurately classified [6]. Numerous datasets have been collected at different data granularity levels and publicly released. Some of the NILM datasets are the Reference Energy Disaggregation Dataset (REDD) [18], the Building-Level fUlly-labeled dataset for Electricity Disaggregation (BLUED) [19], UK Domestic Appliance Level Electricity (UK-DALE) [20], GREEN Grid [21], and Pecan Street Inc. Dataport [22]. A recent trend revolves around high data granularity; consequently, most of the research is based on high-sampling NILM systems [23]. In this context, Guillén-García et al. [24] acquired voltage and current measurements at a sampling rate of 8 kHz for electrical load identification using the C-means algorithm. De Baets et al. [25] employed two distinct publicly available datasets that include voltage/current measurements sampled at 30 kHz and 44 kHz, respectively. Gupta et al. [26] proposed a single-point sensing approach for household electrical event detection and classification, where the data acquisition system works in the range of 36-500 kHz. Moreover, Chang [27] proposed an approach based on the wavelet transform of the time-frequency domain, where the data granularity is approximately 30 kHz. Because high data granularity captures transient features, it enables the inference of a greater number of appliances with higher accuracy [6,15]. However, this performance comes at the price of high cost and computational complexity due to the requirement of additional high-end measurement devices [28]. Moreover, on social grounds, high data granularity also raises concerns regarding consumers' privacy, as their activities can be detected [29]. Most importantly, high data granularity is not compatible with the existing metering infrastructure.
Inventions 2020, 5, 57
Recent advancements in computational capabilities have significantly aided NILM classification methodologies. In this context, numerous techniques have been adopted by the research community for the NILM process, which include but are not limited to dynamic time warping [28,30], optimization [12,31], machine learning [32][33][34][35][36], neural networks [25,37], and deep learning [38,39]. However, in the context of NILM, supervised machine learning models are used more frequently than other methodologies. For NILM classification, most of the existing research focuses on employing the learning models in a standalone configuration, while some research work presents a comparative analysis of different independent learning models. For example, Azaza and Wallin [40] presented a comparative performance evaluation of five different machine learning models, where the presented study is based on a high data granularity of 30 kHz.
Based on the review of the existing NILM literature, it is observed that most of the research is based on high data granularity. However, the existing metering infrastructure, e.g., the revenue meter, is generally not capable of high-sampling data measurements; consequently, high-sampling NILM systems are not a viable option for the existing metering infrastructure. Furthermore, load classification in the NILM domain is mostly carried out using standalone machine learning models. However, in the machine learning domain, no single model fits all cases; consequently, the performance of standalone machine learning models varies from case to case. In this context, ensemble learning, i.e., combining different machine learning models to form a single optimal model, is a promising technique to balance the performance of different standalone models. However, very little research has been done on ensemble learning techniques in the context of NILM systems.
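As a minimal illustration of the idea of combining standalone models into one ensemble, the following Python sketch applies hard (majority) voting to three toy classifiers. The threshold rules, the cutoff values, and the names `threshold_model` and `majority_vote` are hypothetical and chosen only for illustration; this is not the ensemble configuration evaluated in this paper.

```python
from collections import Counter


def threshold_model(cutoff_w):
    """Build a toy standalone classifier (hypothetical rule, for illustration).

    It returns 1 (water-heater event) when the power step reaches cutoff_w
    watts, and 0 otherwise.
    """
    def predict(power_step_w):
        return 1 if power_step_w >= cutoff_w else 0
    return predict


def majority_vote(models, sample):
    """Hard-voting ensemble: return the label predicted by most models."""
    votes = [model(sample) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three base models that disagree for power steps between their cutoffs.
models = [threshold_model(1500), threshold_model(2500), threshold_model(3500)]

steps_w = [1000, 2000, 3000, 4000]
labels = [majority_vote(models, s) for s in steps_w]
print(labels)
```

With an odd number of binary base models, a tie cannot occur, so the vote is always well defined. In practice the base models would be trained learners rather than fixed rules.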

Contributions
To address the aforesaid limitations of the existing NILM literature, this research work proposes a low-complexity, low-data-granularity non-invasive load inference approach for the existing metering infrastructure. The proposed approach is assisted by ensemble learning techniques and relies only on mean power as an input variable. Moreover, to reflect real-world applications, the proposed approach is evaluated using one of the most significant and high-potential demand response residential load elements, i.e., water heating. In the context of NILM, the key contributions of this research work are summarized as follows:

1. To realize the real-world implementation, the proposed approach is:
   a. Thoroughly evaluated on real-world load measurements acquired at a low data granularity of 1/60 Hz, i.e., 1-min interval measurements;
   b. Based on only a single input variable, i.e., mean power (in watts).
2. Event Detection: As an extension of our previously proposed event detection algorithm [41], a post-processing criterion is incorporated to further improve the event detection performance. The extracted results are validated using an extensive sensitivity analysis.
3. Load Features: Four distinct load features are extracted for each detected event and further analyzed using a correlation-based feature selection methodology to identify the most significant load features.
4. Classification: To improve the classification performance, this research work introduces two diverse ensemble learning techniques, based on a combination of machine learning and artificial neural network models, to the NILM domain, and presents a comprehensive performance evaluation and comparative analysis.
5. A brief outlook in the context of real-world applications of the proposed approach is presented.
Overall, the proposed non-invasive inference approach for the residential water-heating circuit is based on low-sampling real-world load measurements and assisted by improved event detection, feature selection, and ensemble learning techniques, aiming to facilitate the real-world deployment of NILM systems.
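To make the notion of correlation-based feature analysis concrete, the sketch below ranks candidate event features by the absolute Pearson correlation between each feature and the event label. It is a simplified stand-in rather than the feature selection procedure used in this work: the feature names (`power_step_w`, `hour_of_day`), the sample values, and the labels are all hypothetical, and redundancy between features is not considered.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant sequence carries no linear information
    return cov / (norm_x * norm_y)


# Hypothetical per-event features (one value per detected event).
events = {
    "power_step_w": [3000, 2900, 150, 200, 3100, 100],
    "hour_of_day": [7, 19, 8, 20, 6, 21],
}
# Hypothetical ground truth: 1 if the event belongs to the water heater.
is_water_heater = [1, 1, 0, 0, 1, 0]

# Score each feature by |correlation with the label| and rank them.
scores = {name: abs(pearson(vals, is_water_heater))
          for name, vals in events.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data the power step separates the two classes almost perfectly, so it is ranked first; the time-of-day feature is only weakly related to the label.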
The rest of the paper is organized as follows: Section 2 presents the details of the system formulations in terms of the problem statement, methodologies, and performance evaluation criteria. Section 3 discusses the simulation studies carried out in this research work and the corresponding analysis of the extracted results. Section 4 presents a brief outlook of the proposed approach. Finally, Section 5 concludes this research paper.

System Formulation
This section describes the overall proposed system architecture presented in this paper, i.e., problem statement and research methodologies regarding data acquisition, event detection, feature extraction, and classification toward NILM-based load inference.

Problem Statement
At a single metering point, the monitored time-series aggregated power load profile can be expressed as the algebraic sum of the power load profiles of m individual circuits and the measurement noise, as presented mathematically in (1).
$$P_{\mathrm{agg}}(t) = \sum_{i=1}^{m} P_i(t) + n(t) \qquad (1)$$
where $P_{\mathrm{agg}}(t)$ is the aggregated power load at the metering point at time instant $t$, $P_i(t)$ represents the power load of the $i$th circuit at time instant $t$, $m$ represents the total number of individual circuits, and $n(t)$ is the measurement noise. In the context of this research work, $P_{\mathrm{agg}}(t)$ can be redefined as shown in (2).
$$P_{\mathrm{agg}}(t) = P_{\mathrm{WH}}(t) + P_{\mathrm{misc}}(t) + n(t) \qquad (2)$$
where $P_{\mathrm{WH}}(t)$ refers to the power load profile of the water-heating circuit and $P_{\mathrm{misc}}(t)$ encompasses the power load profiles of all other miscellaneous circuits that are not under consideration within the scope of this research work. Within the scope of this paper, the main task is to infer the operating status of the water-heating circuit using only the information of the main circuit, i.e., the aggregated power load. Water heating is not only one of the major load elements in the residential sector [42][43][44] but is also a flexible/interruptible load element [45]. These properties make the water-heating circuit a high-potential load for numerous real-world energy efficiency applications, e.g., demand response [44,46], power regulation [43], peak shifting, and frequency response [47]. Consequently, non-invasive inference of the water-heating circuit is of utmost importance in the context of real-world energy efficiency applications.
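The additive decomposition described above (water-heating circuit, all other circuits, and measurement noise) can be illustrated with a short Python sketch that builds a synthetic aggregated profile. All values are made up for illustration, and the Gaussian noise model and the name `aggregate_load` are assumptions rather than part of the formulation in this paper.

```python
import random


def aggregate_load(water_heater_w, other_w, noise_std_w=5.0, seed=0):
    """Sum two circuit profiles sample by sample and add measurement noise.

    Zero-mean Gaussian noise is an assumption made for this sketch only.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [wh + misc + rng.gauss(0.0, noise_std_w)
            for wh, misc in zip(water_heater_w, other_w)]


# Hypothetical 1-min mean power samples in watts.
water_heater_w = [0, 0, 3000, 3000, 3000, 0]   # heater ON for three minutes
other_w = [200, 250, 220, 400, 380, 210]       # all remaining circuits

p_agg = aggregate_load(water_heater_w, other_w)
p_agg_noise_free = aggregate_load(water_heater_w, other_w, noise_std_w=0.0)
print(p_agg_noise_free)
```

The inference task runs in the opposite direction: only the aggregated profile is observed, and the operating status of the water-heating circuit has to be recovered from it.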

Methodology
An event-based low-sampling NILM system, depicted in Figure 1, is employed in this research work. Within the scope of this research, the presented methodology is employed for non-invasive inference of water-heating circuits; however, it can be extended to the non-invasive inference of other load elements, depending on the availability of load disaggregation databases. Details of the techniques employed at each stage/block presented in Figure 1 are explained below.
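As a rough illustration of what the event detection stage of an event-based NILM system does, the sketch below flags every sample-to-sample change in mean power whose magnitude reaches a threshold. It is a generic step-change detector written for illustration, not the event detection algorithm of [41]; the function name `detect_events`, the 1000 W threshold, and the sample values are assumptions.

```python
def detect_events(power_w, threshold_w=1000.0):
    """Return (index, delta) pairs for large sample-to-sample power changes.

    A positive delta suggests a switch-ON event and a negative delta a
    switch-OFF event. The threshold value is an illustrative assumption.
    """
    events = []
    for k in range(1, len(power_w)):
        delta = power_w[k] - power_w[k - 1]
        if abs(delta) >= threshold_w:
            events.append((k, delta))
    return events


# Hypothetical aggregated 1-min mean power samples in watts.
power_w = [200, 250, 3220, 3400, 3380, 210]

events = detect_events(power_w)
print(events)
```

Only the samples around detected events are passed on to feature extraction and classification, which is why event-based systems need less computation than approaches that process every sample.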
Inventions 2020, 5, x FOR PEER REVIEW 4 of 20 feature selection, and ensemble learning techniques, aiming to facilitate the real-world deployment of NILM systems. The rest of the paper is organized as follows: Section 2 presents the details of the system formulations in terms of the problem statement, methodologies, and performance evaluation criteria. Section 3 discusses the simulation studies carried out in this research work and the corresponding analysis of the extracted results. Section 4 presents a brief outlook of the proposed approach. Finally, Section 5 concludes this research paper.

System Formulation
This section describes the overall proposed system architecture presented in this paper, i.e., problem statement and research methodologies regarding data acquisition, event detection, feature extraction, and classification toward NILM-based load inference.

Problem Statement
At a single metering point, the monitored time-series aggregated power load profile can be weighed as an algebraic summation of m numbers of individual circuits' power load profile, as presented mathematically in (1).
where Ƿ д (t) is the aggregated power load at the metering point at time instant t, Ƿ i (t) represents power load of ith circuit at time instant t, m represents the total numbers of individual circuits, and n(t) is the measurement noise. In the context of this research work, Ƿ д (t) can be redefined as shown in (2).
where Ƿ ϢϦ (t) refers to the power load profile of the water-heating circuit and Ƿ ᴎ (t) encompasses all other miscellaneous circuits' power load profiles that are not under consideration within the scope of this research work. Within the scope of this paper, the main task is to infer the operating status of the water-heating circuit with the only information of the main circuit, i.e., aggregated power load. Water heating is not only one of the major load elements in the residential sector [42][43][44] but is also a flexible/interruptible load element [45]. The said properties of the water-heating circuit make it a high potential load toward numerous real-world energy efficiency applications, e.g., demand response [44,46], power regulations [43], and peak shifting, and frequency response [47]. Consequently, noninvasive inference of water-heating circuit is of utmost importance in the context of real-world energy efficiency applications.

Methodology
An event-based low sampling NILM system, depicted in Figure 1, is employed in this research work. It is worth noting that within the scope of this research the presented methodology is employed for non-invasive inference of water-heating circuits, however, this can be further extended for the non-invasive inference of other load elements; depending on the availability of load disaggregation databases. Details of employed techniques at each stage/block presented in Figure 1 are explained below.
where 0, 5, x FOR PEER REVIEW 4 of 20 ction, and ensemble learning techniques, aiming to facilitate the real-world deployment stems. st of the paper is organized as follows: Section 2 presents the details of the system s in terms of the problem statement, methodologies, and performance evaluation criteria. iscusses the simulation studies carried out in this research work and the corresponding the extracted results. Section 4 presents a brief outlook of the proposed approach. Finally, ncludes this research paper.
ormulation ection describes the overall proposed system architecture presented in this paper, i.e., tement and research methodologies regarding data acquisition, event detection, feature and classification toward NILM-based load inference.

Statement
ingle metering point, the monitored time-series aggregated power load profile can be an algebraic summation of m numbers of individual circuits' power load profile, as athematically in (1).
is the aggregated power load at the metering point at time instant t, Ƿ i (t) power load of ith circuit at time instant t, m represents the total numbers of individual d n(t) is the measurement noise. In the context of this research work, Ƿ д (t) can be s shown in (2).
(t) refers to the power load profile of the water-heating circuit and Ƿ ᴎ (t) encompasses all llaneous circuits' power load profiles that are not under consideration within the scope rch work. Within the scope of this paper, the main task is to infer the operating status of eating circuit with the only information of the main circuit, i.e., aggregated power load. ing is not only one of the major load elements in the residential sector [42][43][44] but is also a rruptible load element [45]. The said properties of the water-heating circuit make it a high ad toward numerous real-world energy efficiency applications, e.g., demand response er regulations [43], and peak shifting, and frequency response [47]. Consequently, nonerence of water-heating circuit is of utmost importance in the context of real-world energy pplications.
ology ent-based low sampling NILM system, depicted in Figure 1, is employed in this research orth noting that within the scope of this research the presented methodology is employed asive inference of water-heating circuits, however, this can be further extended for the e inference of other load elements; depending on the availability of load disaggregation Details of employed techniques at each stage/block presented in Figure 1 are explained is the aggregated power load at the metering point at time instant t, Inventions 2020, 5, x FOR PEER REVIEW 4 feature selection, and ensemble learning techniques, aiming to facilitate the real-world deploy of NILM systems. The rest of the paper is organized as follows: Section 2 presents the details of the sy formulations in terms of the problem statement, methodologies, and performance evaluation cri Section 3 discusses the simulation studies carried out in this research work and the correspon analysis of the extracted results. Section 4 presents a brief outlook of the proposed approach. Fin Section 5 concludes this research paper.

System Formulation
This section describes the overall proposed system architecture presented in this paper problem statement and research methodologies regarding data acquisition, event detection, fea extraction, and classification toward NILM-based load inference.

Problem Statement
At a single metering point, the monitored time-series aggregated power load profile ca weighed as an algebraic summation of m numbers of individual circuits' power load profil presented mathematically in (1).
is the aggregated power load at the metering point at time instant t, represents power load of ith circuit at time instant t, m represents the total numbers of indivi circuits, and n(t) is the measurement noise. In the context of this research work, Ƿ д (t) ca redefined as shown in (2).
where Ƿ ϢϦ (t) refers to the power load profile of the water-heating circuit and Ƿ ᴎ (t) encompass other miscellaneous circuits' power load profiles that are not under consideration within the s of this research work. Within the scope of this paper, the main task is to infer the operating stat the water-heating circuit with the only information of the main circuit, i.e., aggregated power Water heating is not only one of the major load elements in the residential sector [42][43][44] but is a flexible/interruptible load element [45]. The said properties of the water-heating circuit make it a potential load toward numerous real-world energy efficiency applications, e.g., demand resp [44,46], power regulations [43], and peak shifting, and frequency response [47]. Consequently, invasive inference of water-heating circuit is of utmost importance in the context of real-world en efficiency applications.

Methodology
An event-based low sampling NILM system, depicted in Figure 1, is employed in this rese work. It is worth noting that within the scope of this research the presented methodology is empl for non-invasive inference of water-heating circuits, however, this can be further extended fo non-invasive inference of other load elements; depending on the availability of load disaggreg databases. Details of employed techniques at each stage/block presented in Figure 1 are expla below.
represents power load of ith circuit at time instant t, m represents the total numbers of individual circuits, and n(t) is the measurement noise. In the context of this research work, Inventions 2020, 5, x FOR PEER REVIEW 4 of 20 feature selection, and ensemble learning techniques, aiming to facilitate the real-world deployment of NILM systems. The rest of the paper is organized as follows: Section 2 presents the details of the system formulations in terms of the problem statement, methodologies, and performance evaluation criteria. Section 3 discusses the simulation studies carried out in this research work and the corresponding analysis of the extracted results. Section 4 presents a brief outlook of the proposed approach. Finally, Section 5 concludes this research paper.

System Formulation
This section describes the overall proposed system architecture presented in this paper, i.e., problem statement and research methodologies regarding data acquisition, event detection, feature extraction, and classification toward NILM-based load inference.

Problem Statement
At a single metering point, the monitored time-series aggregated power load profile can be weighed as an algebraic summation of m numbers of individual circuits' power load profile, as presented mathematically in (1).
is the aggregated power load at the metering point at time instant t, Ƿ i (t) represents power load of ith circuit at time instant t, m represents the total numbers of individual circuits, and n(t) is the measurement noise. In the context of this research work, Ƿ д (t) can be redefined as shown in (2).
where Ƿ ϢϦ (t) refers to the power load profile of the water-heating circuit and Ƿ ᴎ (t) encompasses all other miscellaneous circuits' power load profiles that are not under consideration within the scope of this research work. Within the scope of this paper, the main task is to infer the operating status of the water-heating circuit with the only information of the main circuit, i.e., aggregated power load. Water heating is not only one of the major load elements in the residential sector [42][43][44] but is also a flexible/interruptible load element [45]. The said properties of the water-heating circuit make it a high potential load toward numerous real-world energy efficiency applications, e.g., demand response [44,46], power regulations [43], and peak shifting, and frequency response [47]. Consequently, noninvasive inference of water-heating circuit is of utmost importance in the context of real-world energy efficiency applications.

Methodology
An event-based low sampling NILM system, depicted in Figure 1, is employed in this research work. It is worth noting that within the scope of this research the presented methodology is employed for non-invasive inference of water-heating circuits, however, this can be further extended for the non-invasive inference of other load elements; depending on the availability of load disaggregation databases. Details of employed techniques at each stage/block presented in Figure 1 are explained below.
can be redefined as shown in (2).
Inventions 2020, 5, x FOR PEER REVIEW 4 of 20 feature selection, and ensemble learning techniques, aiming to facilitate the real-world deployment of NILM systems. The rest of the paper is organized as follows: Section 2 presents the details of the system formulations in terms of the problem statement, methodologies, and performance evaluation criteria. Section 3 discusses the simulation studies carried out in this research work and the corresponding analysis of the extracted results. Section 4 presents a brief outlook of the proposed approach. Finally, Section 5 concludes this research paper.

System Formulation
This section describes the overall proposed system architecture presented in this paper, i.e., problem statement and research methodologies regarding data acquisition, event detection, feature extraction, and classification toward NILM-based load inference.

Problem Statement
At a single metering point, the monitored time-series aggregated power load profile can be weighed as an algebraic summation of m numbers of individual circuits' power load profile, as presented mathematically in (1).
is the aggregated power load at the metering point at time instant t, Ƿ i (t) represents power load of ith circuit at time instant t, m represents the total numbers of individual circuits, and n(t) is the measurement noise. In the context of this research work, Ƿ д (t) can be redefined as shown in (2).
where Ƿ ϢϦ (t) refers to the power load profile of the water-heating circuit and Ƿ ᴎ (t) encompasses all other miscellaneous circuits' power load profiles that are not under consideration within the scope of this research work. Within the scope of this paper, the main task is to infer the operating status of the water-heating circuit with the only information of the main circuit, i.e., aggregated power load. Water heating is not only one of the major load elements in the residential sector [42][43][44] but is also a flexible/interruptible load element [45]. The said properties of the water-heating circuit make it a high potential load toward numerous real-world energy efficiency applications, e.g., demand response [44,46], power regulations [43], and peak shifting, and frequency response [47]. Consequently, noninvasive inference of water-heating circuit is of utmost importance in the context of real-world energy efficiency applications.

Methodology
An event-based low sampling NILM system, depicted in Figure 1, is employed in this research work. It is worth noting that within the scope of this research the presented methodology is employed for non-invasive inference of water-heating circuits, however, this can be further extended for the non-invasive inference of other load elements; depending on the availability of load disaggregation databases. Details of employed techniques at each stage/block presented in Figure 1 are explained below. feature selection, and ensemble learning techniques, aiming to facilitate the real-world deployment of NILM systems. The rest of the paper is organized as follows: Section 2 presents the details of the system formulations in terms of the problem statement, methodologies, and performance evaluation criteria. Section 3 discusses the simulation studies carried out in this research work and the corresponding analysis of the extracted results. Section 4 presents a brief outlook of the proposed approach. Finally, Section 5 concludes this research paper.

System Formulation
This section describes the overall proposed system architecture presented in this paper, i.e., problem statement and research methodologies regarding data acquisition, event detection, feature extraction, and classification toward NILM-based load inference.

Problem Statement
At a single metering point, the monitored time-series aggregated power load profile can be weighed as an algebraic summation of m numbers of individual circuits' power load profile, as presented mathematically in (1).
where Ƿ д (t) is the aggregated power load at the metering point at time instant t, Ƿ i (t) represents power load of ith circuit at time instant t, m represents the total numbers of individual circuits, and n(t) is the measurement noise. In the context of this research work, Ƿ д (t) can be redefined as shown in (2).
where Ƿ ϢϦ (t) refers to the power load profile of the water-heating circuit and Ƿ ᴎ (t) encompasses all other miscellaneous circuits' power load profiles that are not under consideration within the scope of this research work. Within the scope of this paper, the main task is to infer the operating status of the water-heating circuit with the only information of the main circuit, i.e., aggregated power load. Water heating is not only one of the major load elements in the residential sector [42][43][44] but is also a flexible/interruptible load element [45]. The said properties of the water-heating circuit make it a high potential load toward numerous real-world energy efficiency applications, e.g., demand response [44,46], power regulations [43], and peak shifting, and frequency response [47]. Consequently, noninvasive inference of water-heating circuit is of utmost importance in the context of real-world energy efficiency applications.

Methodology
An event-based low sampling NILM system, depicted in Figure 1, is employed in this research work. It is worth noting that within the scope of this research the presented methodology is employed for non-invasive inference of water-heating circuits, however, this can be further extended for the non-invasive inference of other load elements; depending on the availability of load disaggregation databases. Details of employed techniques at each stage/block presented in Figure 1 are explained below. feature selection, and ensemble learning techniques, aiming to facilitate the real-world deployment of NILM systems. The rest of the paper is organized as follows: Section 2 presents the details of the system formulations in terms of the problem statement, methodologies, and performance evaluation criteria. Section 3 discusses the simulation studies carried out in this research work and the corresponding analysis of the extracted results. Section 4 presents a brief outlook of the proposed approach. Finally, Section 5 concludes this research paper.

System Formulation
This section describes the overall proposed system architecture presented in this paper, i.e., problem statement and research methodologies regarding data acquisition, event detection, feature extraction, and classification toward NILM-based load inference.

Problem Statement
At a single metering point, the monitored time-series aggregated power load profile can be expressed as an algebraic summation of m individual circuits' power load profiles, as presented mathematically in (1).
$$P_{agg}(t) = \sum_{i=1}^{m} P_i(t) + n(t) \tag{1}$$
where $P_{agg}(t)$ is the aggregated power load at the metering point at time instant t, $P_i(t)$ represents the power load of the ith circuit at time instant t, m represents the total number of individual circuits, and n(t) is the measurement noise. In the context of this research work, $P_{agg}(t)$ can be redefined as shown in (2).
$$P_{agg}(t) = P_{WH}(t) + P_{other}(t) + n(t) \tag{2}$$
where $P_{WH}(t)$ refers to the power load profile of the water-heating circuit and $P_{other}(t)$ encompasses all other miscellaneous circuits' power load profiles that are not under consideration within the scope of this research work. Within the scope of this paper, the main task is to infer the operating status of the water-heating circuit using only the information of the main circuit, i.e., the aggregated power load. Water heating is not only one of the major load elements in the residential sector [42][43][44] but is also a flexible/interruptible load element [45]. These properties make the water-heating circuit a high-potential load for numerous real-world energy efficiency applications, e.g., demand response [44,46], power regulation [43], and peak shifting and frequency response [47]. Consequently, non-invasive inference of the water-heating circuit is of utmost importance in the context of real-world energy efficiency applications.

Methodology
An event-based low sampling NILM system, depicted in Figure 1, is employed in this research work. It is worth noting that, within the scope of this research, the presented methodology is employed for non-invasive inference of water-heating circuits; however, it can be extended to the non-invasive inference of other load elements, depending on the availability of load disaggregation databases. Details of the techniques employed at each stage/block presented in Figure 1 are explained below.

Data Acquisition and Preprocessing
For this research work, a New Zealand (NZ)-based electricity database, namely GREEN Grid (https://reshare.ukdataservice.ac.uk/853334/) [21], is used. The recently released database is the first of its kind for New Zealand, where the data have been collected from 2014 to 2018 from a sample of 45 households, as part of the Renewable Energy and the Smart Grid (NZ GREEN Grid) project, a joint venture of the University of Canterbury and the University of Otago, New Zealand. The NZ GREEN Grid dataset contains 1-min interval measurements of mean power (in watts) for the individual circuits and the main (total incoming power) circuit.
As the acquired load data are based on real-world measurements, numerous measurement uncertainties, e.g., noise, data spikes, and missing values, are inevitable. Therefore, the acquired data have been thoroughly pre-processed to take care of these measurement uncertainties. Initially, for simulation purposes, data are acquired from the timeframes that have consistent measurement entries without any missing or error values. Further, the acquired raw data are re-arranged in a more categorical (tabular) form for better visualization and validation at later stages. To eliminate the noise/data spikes that interfere with event detection, the acquired aggregated load data are processed using the median filtering technique: a digital filtering technique that preserves the edges while eliminating the undesirable noise/data spikes. A detailed explanation of median filtering and its working principle is presented in [48].
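As a rough illustration of the spike-removal step, the following minimal Python sketch applies a sliding median to a short synthetic load series. The function name `median_filter`, the kernel size of 3, the edge padding, and the sample values are assumptions made for illustration only; they are not the settings used in this work.

```python
import numpy as np


def median_filter(x, kernel=3):
    """Sliding-median filter (illustrative; kernel size is an assumed value)."""
    x = np.asarray(x, dtype=float)
    half = kernel // 2
    # Repeat the boundary samples so the output has the same length as the input.
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + kernel]) for i in range(len(x))])


# Synthetic 1-min power readings in watts: a one-sample spike, then a real step.
raw = [100, 100, 5000, 100, 100, 3100, 3100, 3100]
print(median_filter(raw, kernel=3))
```

In this toy series the isolated 5000 W spike is suppressed, while the sustained step to 3100 W keeps its position, which is the edge-preserving behavior the paragraph above relies on.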

Event Detection
An event is defined as a transient portion within a signal when it deviates from the previous steady-state and lasts until the next one [49]. The aggregated load power profile varies with each transition in individual loads' power profile. Event detection algorithms detect these changes in the aggregated profile initiated by individual loads. So far, numerous event detection algorithms have been proposed that can be broadly classified into three categories, namely expert heuristics, matched filters, and probabilistic models [50].
This research work relies on an extended version of our recently proposed event detection algorithm known as the mean absolute deviation-sliding window (MAD-SW) algorithm [41]. The MAD-SW algorithm is extended by incorporating a post-processing step to further improve the event detection performance. Table 1 presents a detailed description of the extended MAD-SW algorithm.

Table 1. Extended MAD-SW algorithm.
Input
1. Preprocessed aggregated load data, x
Process
2. Initialize the filter having window width, ω, with the MAD value of input x
3. Using the sliding window concept and pre-selected window width, ω, compute iteratively the MAD value
4. Select a threshold value, δ, and compute the thresholding signal
5. Use derivative to compute the edges and extract the corresponding starting and ending time instances of the detected events
6. Post-processing
   a. Ending time instance delay correction because of window width
   b. Final event approval
   c. Delay tolerance incorporation, i.e., the detected event is considered a true event if $|t_{ground\_truth} - t_{detected}| \le \Delta t$, where $t_{ground\_truth}$, $t_{detected}$, and $\Delta t$ represent the ground-truth event starting time instance, detected event starting time instance, and delay tolerance, respectively.
Output
7. Starting and ending time instances of the detected events
The outputs of the MAD-SW algorithm, in the form of starting and ending time indices (successive ones), are linked together to acquire all the detected events (transient portions) within the aggregated load power profile, for further processing according to the methodology presented in Figure 1.
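The sliding-window, thresholding, and edge-extraction steps of the algorithm can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the threshold of 50 W, and the choice to assign each window's MAD value to the window's last sample are all hypothetical, and the post-processing steps for ending-time correction and final event approval are omitted. Only the delay-tolerance check is included, as `is_true_event`.

```python
import numpy as np


def mad_sw_events(x, window=3, delta=50.0):
    """Illustrative MAD sliding-window detector; delta is an assumed threshold."""
    x = np.asarray(x, dtype=float)
    mad = np.zeros(len(x))
    for k in range(len(x) - window + 1):
        w = x[k:k + window]
        # Mean absolute deviation of the window, stored at the window's last sample.
        mad[k + window - 1] = np.mean(np.abs(w - w.mean()))
    # Thresholding signal: 1 while the MAD value exceeds delta, 0 otherwise.
    active = (mad > delta).astype(int)
    # First difference marks rising (+1) and falling (-1) edges of that signal.
    edges = np.diff(np.concatenate(([0], active, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1) - 1
    return list(zip(starts.tolist(), ends.tolist()))


def is_true_event(t_ground_truth, t_detected, delta_t=2):
    """Delay-tolerance check: accept a detection within delta_t samples."""
    return abs(t_ground_truth - t_detected) <= delta_t


# Synthetic profile: 100 W baseline with a 3000 W load on between samples 10 and 19.
profile = [100] * 10 + [3100] * 10 + [100] * 10
print(mad_sw_events(profile, window=3, delta=50.0))
```

Because the window trails the current sample in this sketch, each detected event ends slightly after the actual transition, which is the kind of delay that the ending-time correction in the post-processing stage is meant to compensate.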

Feature Extraction and Selection
The output of the event detection is merely an indication of transitions that occurred at different time instances within the aggregated load and does not provide any information regarding explicit circuits' identification and corresponding status, i.e., turn-on or turn-off. To identify this, different load features (also known as signatures) are extracted for each detected event, to be used as an input to classification models. Features refer to the unique consumption pattern of a circuit and enable the appropriate monitoring and classification of an explicit status of the given circuit from the aggregated load profile.
In this research work, a feature set (F) comprising four distinct load features, drawn from statistical, power, and geometrical features, has been extracted. The proposed F, which consists of the slope, peak-to-peak power, coefficient of dispersion, and standard deviation of each event, is expressed in (3).
where µ and σ² represent the mean and variance of the transient portion, i.e., event, given as in (8) and (9), respectively.
Within the scope of this research work, the extracted load features are further evaluated using a feature selection methodology, i.e., correlation analysis, to identify the most significant load features for further processing. Correlation analysis is employed to identify the highly correlated features within the extracted feature set, F, as features with high correlation are linearly dependent and consequently have the same effect on the target class in the context of classification. The employed methodology not only identifies the most significant load features as input to the learning models for better classification performance but also reduces the feature space dimensionality, which plays a key role in reducing algorithm complexity and training time.
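The following Python sketch illustrates the general idea of extracting per-event features and then pruning highly correlated ones. The feature formulas used here are assumptions chosen for illustration (slope as the end-to-end change per sample, coefficient of dispersion as variance divided by mean, population variance), as are the function names and the 0.9 threshold; the definitions actually used in this work are those of the paper's own equations.

```python
import numpy as np


def event_features(p):
    """Four illustrative features of one event; formulas are assumed, not the paper's."""
    p = np.asarray(p, dtype=float)
    mu = p.mean()
    var = p.var()  # population variance (assumption)
    return {
        "slope": (p[-1] - p[0]) / (len(p) - 1),  # end-to-end change per sample
        "p2p": p.max() - p.min(),                # peak-to-peak power
        "coef_disp": var / mu,                   # variance-to-mean ratio
        "std": np.sqrt(var),
    }


def drop_correlated(X, names, threshold=0.9):
    """Keep a feature only if its |Pearson r| with every kept feature is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(len(names)):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]


print(event_features([100, 1600, 3100]))

# Toy feature matrix: column "b" is an exact multiple of column "a".
X = np.array([[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]], dtype=float)
print(drop_correlated(X, ["a", "b", "c"], threshold=0.9))
```

The greedy pass keeps the first feature of each correlated pair; which member of a pair is retained in practice is a design choice that this sketch does not attempt to reproduce.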

Classification
The selection of classification models for a specific domain is a critical phase. A variety of factors are involved when evaluating a classifier, including but not limited to feature selection, training set size, the dimensionality of the problem, and parameter tuning [51]. This research work aims to introduce ensemble learning models for NILM classification. Ensemble learning [52] refers to a range of methodologies that combine independent (base) learning models to generate one optimal learning model/classifier for the given problem. It is mostly employed to improve the classification performance and is considered a trustworthy methodology in the said context [53]. Ensemble learning methodologies can be broadly classified into two categories, namely sequential and parallel ensemble learners. In the former, the base-learners are generated sequentially, whereas in the latter they are generated in parallel. Both methodologies are employed in this research work, where AdaBoost- and Voting-based classifiers are used as the sequential and parallel ensemble techniques, respectively. The AdaBoost algorithm uses a weak base-learner to build a strong learning model by adaptively adjusting the weights at each iteration [54]. The Voting classifier merges several base-learners, and the final prediction is based on a voting system, namely hard voting or soft voting [55]. Hard voting refers to majority voting, whereas soft voting is based on the average predicted probabilities.
Furthermore, for the employed sequential and parallel ensemble learners, the homogeneous (employs single base-learner) and heterogeneous (employs diverse base-learners) structure, respectively, are adopted. For said purposes, three independent and diverse supervised learning models including two machine learning models, i.e., logistic regression (LR) [56], decision trees (DT) [57], and one neural network model, i.e., multi-layer perceptron-artificial neural network (MLP-ANN) [58], are used to build the diverse ensemble learning models. Figure 2 graphically depicts the detailed methodologies of the proposed ensemble learning models, employed in this research work.
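A minimal scikit-learn sketch of the two ensemble structures is given below: a homogeneous AdaBoost ensemble over decision trees and a heterogeneous soft-voting ensemble over LR, DT, and MLP base-learners. The synthetic data from `make_classification`, the hyperparameter values, and the choice of soft voting are assumptions for illustration; they do not reproduce the parameters or the data used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for per-event features and on/off labels (not GREEN Grid data).
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Sequential, homogeneous ensemble: AdaBoost over a single kind of base-learner (DT).
# The base-learner is passed positionally because its keyword name differs
# between scikit-learn releases.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=50, random_state=0)

# Parallel, heterogeneous ensemble: soft voting over LR, DT, and MLP base-learners.
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                              random_state=0)),
    ],
    voting="soft",
)

for name, model in [("AdaBoost", boosted), ("Voting", voting)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```

Switching `voting="soft"` to `voting="hard"` changes the combination rule from averaged predicted probabilities to a majority vote over the base-learners' class labels.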

Performance Evaluation
For evaluation purposes, the well-known performance metrics f-score, recall, and precision are used. F-score is a measure of a test's accuracy and is defined as the harmonic mean of recall and precision, as in (10) [59].
$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{10}$$
Recall is the fraction of relevant items that are selected, whereas precision is the fraction of selected items that are relevant. Recall and precision are given as in (11) and (12), respectively [59].
$$\text{Recall} = \frac{TP}{TP + FN} \tag{11}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$
Accuracy is another performance metric used for the evaluation of classification models and is defined as the fraction of predictions the model classifies correctly [60], given as in (13).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$
The terms TP, FP, FN, and TN represent true positive, false positive, false negative, and true negative, respectively, and are well defined in [35].
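As a small worked example, the four metrics can be computed directly from the confusion counts. The helper name `classification_metrics` and the counts used below are hypothetical, and the sketch assumes all denominators are non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, f-score, and accuracy from confusion counts.

    Assumes tp + fp, tp + fn, and precision + recall are all non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


# Hypothetical counts: 6 true positives, 2 false positives,
# 4 false negatives, and 8 true negatives.
print(classification_metrics(tp=6, fp=2, fn=4, tn=8))
```

With these counts, precision is 6/8 = 0.75, recall is 6/10 = 0.6, f-score is 2(0.75)(0.6)/(1.35) ≈ 0.667, and accuracy is 14/20 = 0.7.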

Simulations and Results
Based on the presented research methodologies, comprehensive digital simulation studies have been carried out using a Core i7 (8th Generation) desktop PC with 32 GB RAM. Moreover, in terms of simulation tools, MATLAB® R2018b and Python 3.6.7 (scikit-learn (https://github.com/scikit-learn/scikit-learn) version 0.21.3 [55]) are used. The following subsections present the details of the simulation studies in terms of simulation parameters, extracted results, and corresponding analysis for each building block of the research methodology presented in Figure 1.

Event Detection Results
For event detection simulation, 30 days of load measurements are acquired from a real-world household of the NZ GREEN Grid database. To accommodate the diversity of consumption patterns of different load elements, the acquired load data are taken from different months of a year. The details of the acquired load data and event detector parameters are presented in Table 2. Based on the attributes presented in Table 2, comprehensive simulations are carried out to assess the effect of different input parameters on the performance of the event detection algorithm. Table 3 presents a detailed performance evaluation of the event detection algorithm at different values of window width, where the delay tolerance is fixed at 0, i.e., exact match. From Table 3, it is observed that MAD-SW performs optimally at a window width of 3, yielding around 81, 89, and 85 percent in terms of recall, precision, and f-score, respectively. It is also observed that all concerned performance metrics drop continuously with an increase in window width. The observed decline in recall is due to the drastic upsurge in false negative detections with an increase in window width. The same phenomenon was observed in [41] for the load data of the Pecan Street Inc. Dataport [22] database.
Further, Table 4 presents the MAD-SW performance evaluation and sensitivity analysis in terms of delay tolerance, Δt, where the window width is kept constant at ω = 3 because of the optimal performance of MAD-SW shown in Table 3. It is evident from Table 4 that the incorporation of Δt significantly improves the performance of the MAD-SW algorithm: a consistent increase in true positive detections is recorded with an increase in the delay tolerance value, leading to a persistent increase in the algorithm's overall performance. This indicates that Δt defines the event detector accuracy and that performance improves as Δt increases [61]; however, an optimal value must be selected to balance the tradeoff between event detection performance and the estimation of energy consumption at later stages. Hence, based on the results presented in Table 4, Δt = 2 is selected as an optimal value. For Δt > 2, the improvement in event detection f-score is marginal, whereas at a later stage a larger Δt will lead to a higher error between the estimated and actual energy consumption. Figure 3 depicts the overall performance trend of the event detection algorithm in terms of ω and Δt.
Based on the extracted results and the presented analysis, ω = 3 and Δt = 2 are selected as the optimal parameters for further event detection simulations. Table 5 presents different attributes of the diverse real-world households employed in this research work for non-invasive load inference of water heating, along with the corresponding event detection results based on the optimal parameters of the event detection algorithm.
It is worth noting that all the selected (testing) households, presented in Table 5, possess mostly different individual load circuits along with diverse consumption patterns. Even similar load circuits in different testing households have different installation configurations, e.g., household ID rf_42 has a single circuit configured for laundry and freezer with the circuit label "Laundry & Freezer$4128" [62]. In contrast, household ID rf_36 has two dedicated circuits for these loads with the circuit labels "Washing Machine$4146" and "Kitchen Appliances$4145" [62]. Likewise, household ID rf_42 has a load circuit labeled "Lighting (inc heat lamps)$4129", whereas household ID rf_36 has a load circuit labeled "Lighting$4149", which potentially implies that the latter has no heat lamps. A detailed layout of the individual circuits within the employed testing residential households is depicted in Figure 4, and further details can be found in [62]. All these differences lead to widely varied consumption patterns, which are not only hard to predict precisely but also yield variable inference performance.

Feature Extraction and Selection Results
As per the methodology presented in Section 2.2.3, four distinct load features, as given in (3), are extracted for each detected event of all households given in Table 5. The extracted load features are further evaluated using correlation analysis to identify the most significant ones for accurate load classification. Figure 5 presents the feature selection, i.e., correlation analysis, results for different testing households' data.
It is evident from the results presented in Figure 5 that, for all testing households, the load features $S_E$ (Slope) and $P_{peak2peak}$ (P2P Power) are highly correlated with each other, i.e., ≥0.9. Similarly, $C_{Disp.}$ (Coef. Disp.) and σ (St. Dev.) are highly correlated with each other, with a correlation ≥0.83. Hence, from the larger perspective of the models' performance, complexity, and computational need, the highly correlated features are excluded and a new feature set, $F_{Input}$, is formulated that acts as the input to the models for classification purposes within the scope of this research work. The newly formulated load feature set, $F_{Input}$, is expressed as in (14).

Classification Results
For classification purposes, the methodologies discussed in Section 2.2.4 are employed and comprehensive simulation studies are carried out on the load data presented in Table 5. To further validate the effectiveness of the proposed approach in terms of the generalization capability of the learning models, four different households, as given in Table 5, are employed for evaluation purposes. It is worth noting that the households employed for training and testing of the learning models have dedicated water-heating load circuits; however, the other individual circuits may vary in terms of availability and installation configuration [62]. Initially, all employed models are evaluated using k-fold cross-validation to validate their effectiveness toward unseen testing data. Later, all employed learning models are trained on 20 days of load data from a single (training) household and rigorously tested on a diverse set of testing households. The testing households also include the household used for training; however, the data acquired for testing purposes are entirely unseen in the training phase. In the given context, Table 6 presents the details of the different learning models' parameters adopted for the digital simulation within the scope of this research.
Table 6. Learning models' parameters.
do not correspond to the worst performance of the employed models, as in reality there was no ground-truth water-heating circuit activity for the given data acquisition timeframe of household ID rf_31.
The employed learning models are also evaluated in the context of individual households, and for this purpose the accuracy performance metric, given in (13), is employed. The corresponding results are presented in Table 8, where all results are in percentages. For the given testing households, the results presented in Table 8 are further depicted in Figure 6 to better visualize the performance comparison between the employed ensemble learners and their respective standalone base-learner/s. As evident from the detailed results in Table 8 and the performance comparison in Figure 6, in most cases the ensemble learners attained higher accuracy than their respective standalone base-learner/s. The only exception is a single case in which the AdaBoost ensemble learner lags behind its base-learner, i.e., the DT model; however, the lag is marginal, i.e., only 0.33%. Further, it is observed that the accuracy of all the learning models varies from house to house. This is expected because the testing households are diverse and their data are entirely unseen in the training phase of the learning models.
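A per-household evaluation of this kind can be sketched as follows. The sketch assumes that the accuracy metric in (13) is the usual fraction of correctly classified events, expressed as a percentage; the household IDs, the labels, and the helper names `accuracy` and `per_household_accuracy` are hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Percentage of events whose predicted label matches the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def per_household_accuracy(truth, predictions):
    """Score each household separately; both arguments map household ID -> labels."""
    return {hid: accuracy(truth[hid], predictions[hid]) for hid in truth}


# Toy labels (1 = water-heating event, 0 = other event); invented for illustration.
truth = {"house_a": [1, 0, 1, 1], "house_b": [0, 0, 1, 0, 1]}
predictions = {"house_a": [1, 0, 0, 1], "house_b": [0, 0, 1, 0, 1]}

results = per_household_accuracy(truth, predictions)
print(results)
```

Scoring each household separately, rather than pooling all events, keeps a household with many events from dominating the comparison.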
The employed learning models are also evaluated over the entire set of diverse testing households within the scope of this research work. In this context, Figure 7 (in the form of a boxplot) presents the overall accuracy performance of the employed learning models, i.e., ensemble learners vs. respective standalone base-learners. The red horizontal line within each box in Figure 7 represents the median value. Similarly, the yellow and green dotted lines in Figure 7 represent the median and minimum performance attained by the employed ensemble learners. It is seen in Figure 7 that both ensemble learners attained better overall accuracy than their respective standalone base-learner/s. The AdaBoost learner enhances the performance of the weak base-learner, i.e., the DT model, attaining a median accuracy improvement of 1.54%. On the other hand, the voting ensemble model balances out the individual shortcomings of its base-learner members, i.e., LR, DT, and MLP-ANN, and attains a median accuracy improvement in the range of 0.17% to 8.53% over those members. From the extracted results, seen in Figure 7 (Left Side), it is also noted that the voting ensemble achieves only a marginal improvement of 0.17% over one of its members, i.e., the LR model. It is worth noting that, in the presence of a best-performing member, an ensemble model may not lead to any performance improvement [63]. However, for the given problem, i.e., non-invasive load inference, both employed ensemble learners, i.e., homogeneous and heterogeneous, achieved an improvement in classification performance.
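The comparison between ensemble learners and their standalone base-learners can be sketched with Scikit-Learn, which the paper names as one of its simulation tools. This is a minimal sketch under stated assumptions: the data are synthetic (generated with `make_classification`, not taken from the GREEN Grid database), all hyperparameters are placeholders rather than the values in Table 6, and hard (majority) voting is assumed for the voting ensemble.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for event features and water-heating labels.
X, y = make_classification(n_samples=600, n_features=5, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)


def base_learners():
    """Fresh, unfitted base-learners; hyperparameters are placeholders."""
    return [
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
    ]


models = dict(base_learners())
# Homogeneous sequential ensemble: boosted decision trees.
models["adaboost"] = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    n_estimators=50,
    random_state=0,
)
# Heterogeneous parallel ensemble: majority vote over LR, DT, and MLP.
models["voting"] = VotingClassifier(estimators=base_learners(), voting="hard")

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {100 * score:.2f}%")
```

On synthetic data the ranking of the models carries no meaning for the paper's results; the sketch only shows how the two ensemble types are assembled from the same base-learners and scored on held-out data.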

Outlook
In the context of real-world deployment, a non-invasive load inference technique based on low data granularity is of utmost importance, as it can be extended to disaggregate the major residential load elements, e.g., water heating, electric vehicles, and air-conditioning units. More importantly, disaggregation of these load elements can further support demand side management strategies, as the resulting appliance- or circuit-level feedback helps consumers manage the operation of their loads effectively. This could not only support the sustainable operation of energy systems but also benefit consumers through savings from load shifting of their high-consumption load elements [64]. Non-invasive load inference can also benefit the commercial and industrial sectors. In the commercial sector, for example, the proposed approach can play a significant role in monitoring distinct load patterns (energy audit) without affecting the privacy of individual vendors. Moreover, in the industrial sector, the proposed approach supports not only load monitoring, i.e., operation patterns and fault diagnosis, but also the identification of potential loads for demand response applications.
Further, from a system perspective, the authors of [65] presented a comprehensive overview of NILM applications, exploring numerous NILM-assisted real-world applications including, but not limited to, homecare monitoring systems, appliance scheduling, energy audit, personalized recommendation systems, demand response, and fault detection. The study broadly classified NILM applications into four categories, namely consumer-based, utility-based, policy-based, and manufacturer-based applications [65]. In short, the non-intrusive load inference approach has solid potential for energy efficiency, and further research, particularly in the context of low data granularity and real-world applications, will significantly benefit all stakeholders including, but not limited to, utility providers, consumers, policymakers, and manufacturers.

Conclusions
This paper proposed a non-invasive load inference approach for the water-heating circuit using ensemble machine learning methodologies. For this purpose, an event-based NILM methodology, assisted by a correlation-based feature selection technique and diverse machine learning models, is adopted, and comprehensive digital simulations are carried out on real-world low-granularity (1-min sampling rate, i.e., 1/60 Hz) load measurements from the NZ GREEN Grid database.
In the context of event detection, the MAD-SW algorithm's performance is improved with post-processing. Similarly, the extracted load features of detected events are further evaluated using a feature selection methodology to identify the most significant load features for classification. For NILM classification, two diverse ensemble learning techniques are introduced to improve inference performance. Under the given conditions, homogeneous sequential (AdaBoost) and heterogeneous parallel (Voting) ensemble learning techniques are successfully employed. Based on the presented analysis of the extracted results, it is concluded that the proposed non-invasive load inference approach not only attained promising inference results but also showed good generalization capabilities on unseen testing data. Further, it is noted that the employed ensemble learners improve classification performance compared to their respective standalone base-learners. However, this improvement comes at the cost of model complexity and computational power. Consequently, a trade-off exists between performance and computational requirements, and whether to prefer performance over computational efficiency, or vice versa, depends on the end-user and on the sensitivity of the given problem.
Based on the presented research work and corresponding findings, it is concluded that ensemble learning can facilitate non-intrusive load monitoring, even at low data granularity. Further, the outcome of non-invasive load inference of water heating has a solid potential to facilitate numerous real-world energy efficiency applications, e.g., demand response, load forecasting, and load scheduling strategies. In the future, this research will be extended in terms of broader applications of the proposed approach toward energy efficiency.