The Design, Creation, Implementation, and Study of a New Dataset Suitable for Non-Intrusive Load Monitoring

Rodriguez-Navarro, Carlos; Portillo, Francisco; Montoya, Francisco G.; Alcayde, Alfredo

doi:10.3390/app15137200

Department of Engineering, University of Almeria, ceiA3, 04120 Almeria, Spain

Appl. Sci. 2025, 15(13), 7200; https://doi.org/10.3390/app15137200

Submission received: 29 May 2025 / Revised: 17 June 2025 / Accepted: 24 June 2025 / Published: 26 June 2025

Abstract

The increasing need for efficient energy consumption monitoring, driven by economic and environmental concerns, has made Non-Intrusive Load Monitoring (NILM) a cost-effective alternative to traditional measurement methods. Despite its progress since the 1980s, NILM still lacks standardized benchmarks, which limits objective performance comparisons. This study makes several key contributions: (1) the development of five new converters with 13-digit timestamp support and harmonic inclusion, improving the data collection accuracy by up to 25%; (2) the implementation of advanced disaggregation software, achieving a 10–15% increase in the F1-score for certain appliances; (3) a detailed analysis of the impact of harmonics on NILM, reducing the Mean Normalized Error in Assigned Power by up to 40%; and (4) the design of open-source measurement hardware to enhance reproducibility. This study also evaluates open hardware platforms and compares five common household appliances using NILM Toolkit (NILMTK) metrics. The results demonstrate that open hardware and software foster reproducibility and accelerate innovation in NILM. The proposed approach contributes to a standardized and scalable NILM framework, facilitating real-world applications in energy management and smart grid optimization.

Keywords:

data-driven algorithms; energy disaggregation; F1-score; MAE; NDE; non-intrusive load monitoring; non-intrusive load monitoring toolkit; RMSE; sampling rate; smart energy monitoring

1. Introduction

Energy consumption significantly impacts the economy and the environment, emphasizing the need for accurate measurement and optimization. Real-time monitoring enables cost reductions and enhances sustainability by promoting energy efficiency practices. Non-Intrusive Load Monitoring (NILM) is a promising approach to disaggregate electricity consumption without requiring additional sensors at each consumption point.

Unlike intrusive methods, which require a sensor at each appliance, NILM estimates the consumption of individual devices by analyzing aggregate smart meter data [1]. Studies have shown that detailed information on appliance energy usage can reduce household consumption by at least 20% [2].

The field of NILM has significantly evolved since its initial conceptualization. Research on NILM began in the 1980s and 1990s, focusing on developing non-intrusive methods to monitor and disaggregate household energy consumption. In 1992, Hart demonstrated how to estimate the individual appliance energy consumption from a single aggregated signal, using signal processing techniques and basic energy consumption models. This work played an essential role in the development of the more advanced techniques used in NILM today [3].

NILM research has since incorporated increasingly sophisticated techniques to improve accuracy, and, as shown in Figure 1, this evolution has followed key technological advancements. Methods such as Combinatorial Optimization (CO) and Hidden Markov Models (HMMs) were used initially. In the early 2000s, more advanced algorithms incorporating load modeling and signal decomposition techniques emerged; these approaches decomposed the energy signal into individual components and identified appliances by their distinctive load characteristics. With the rise of machine learning and advanced data analytics from the mid-2000s onward, NILM techniques evolved significantly: traditional approaches such as decision trees, HMMs [4], and Factorial Hidden Markov Models (FHMMs) [5] were widely applied, with the HMM-based models capturing temporal dependencies in energy consumption data. More recently, deep learning approaches, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have further improved NILM performance.

However, challenges persist. The lack of standardized datasets and evaluation metrics complicates algorithm comparisons and limits reproducibility, and the influence of hardware characteristics, particularly harmonics, on NILM performance remains underanalyzed.

Deep learning techniques have revolutionized NILM recently, significantly improving the disaggregation accuracy. CNNs and RNNs are increasingly used because of their ability to model complex, nonlinear relationships in energy data [6,7,8]. Recent comparative studies also analyze the performance of several machine learning techniques, including LSTM-based Recurrent Neural Networks [9] for load disaggregation in residential electricity consumption data [8]. These algorithms can automatically extract meaningful features from raw consumption data, enhancing the classification performance. Additionally, novel methods such as the Coupled Sequence Matrix Reconstruction and Cross-Stage Partial Network (CSPNet) have been proposed to improve the NILM performance by effectively leveraging variations in energy consumption over time [10].

Despite these advancements, NILM faces challenges, particularly regarding real-time processing capabilities [11]. Most NILM algorithms are not designed for real-time operation, which limits their practical applicability in scenarios requiring immediate responses. While high-frequency hardware and advanced machine learning techniques can enhance real-time performance, challenges such as optimizing computational efficiency and handling large volumes of high-frequency data persist [12].

Edge computing-based NILM solutions have gained attention for addressing these challenges. By processing energy consumption data locally in real time, edge-based NILM offers lower latency, which is crucial for applications such as home automation and anomaly detection [13]. Moreover, local processing enhances privacy by minimizing data transmission, which reduces the risk of data breaches, and it decreases bandwidth usage, making it more cost-effective. However, implementing NILM on the edge involves challenges, such as hardware limitations and energy consumption constraints, which must be addressed to fully realize its potential in energy management and smart home automation [14].

Table 1 provides an overview of various NILM methods, highlighting their strengths and limitations while showcasing their practical implementations in smart home energy management, industrial monitoring, and intelligent energy distribution systems.

Practical applications of NILM are diverse and impactful. In smart home energy management, NILM identifies energy-saving opportunities and optimizes consumption, helping households reduce their energy bills and environmental footprint. In industrial monitoring, NILM facilitates the real-time monitoring and optimization of energy usage, allowing industries to streamline operations and improve efficiency. Lastly, NILM enhances the energy management efficiency in intelligent energy distribution systems by providing detailed insights into energy consumption patterns. This is crucial for optimizing smart grid applications and ensuring a more sustainable energy supply.

NILM optimizes energy consumption, identifies usage patterns, and supports cost-saving decisions. This is achieved by disaggregating the total energy consumption into individual components, allowing more efficient demand management and identifying energy-saving opportunities [15]. Despite the advancements, some challenges persist:

Standardized datasets: The absence of an accepted standard makes comparisons between studies difficult. The NILMTK aims to address this problem with its HDF5-based data format [16,17];
Diverse electrical loads: The variability in device energy consumption complicates the practical development of NILM algorithms, particularly when identifying devices with similar or variable consumption patterns [17,18];
Robust and scalable solutions: Many algorithms perform well in laboratory settings but fail in real-world applications, where they must adapt to different conditions and usage patterns [19];
Metrics for evaluation and generality: The inconsistency in performance metrics and the lack of standardized benchmarks make it challenging to evaluate the generality and effectiveness of new algorithms [17,20].

This study addresses these challenges through several key strategies. First, it develops five new measurement converters that support 13-digit timestamps and harmonic data, and uses them to create new datasets in the NILMTK-DF format. Second, it implements comprehensive disaggregation software for systematic algorithm evaluation and comparison. Third, it analyzes the impact of including harmonics, among other factors, on NILM metrics. Finally, it designs and validates open-source measurement hardware so that the results can be replicated. The study also provides an in-depth evaluation of open hardware platforms and compares five household appliances using the Non-Intrusive Load Monitoring Toolkit (NILMTK) metrics. The findings highlight the importance of standardized tools like the NILMTK in assessing NILM performance and underscore the need for accessible hardware and software platforms to foster reproducibility and innovation.

This article is structured as follows. After Section 1, which sets out the context of NILM and its ongoing challenges, Section 2 reviews the NILMTK and its evaluation metrics. Section 3 describes the instruments (including the openZmeter oZm v3 platform), the datasets used (including DSUALM10H and DSUALM10), the software tools and evaluation metrics (NILMTK v0.4.0 and NILMTK-Contrib, with MAE, RMSE, F1-score, and NDE), the algorithms tested (including traditional methods such as CO and FHMM, and deep neural networks such as DAE and RNN), and the experimental conditions (sampling rate, harmonic content, and power filters). Section 4 then presents the results of evaluating the algorithms under these conditions, analyzing the influence of experimental factors and execution times. Finally, Section 5 summarizes the main findings and their implications for future research.

2. NILMTK

NILM relies on advanced software tools and specialized hardware to ensure accurate data acquisition and reliable energy consumption disaggregation. In addition, data acquisition systems have evolved with high-resolution hardware, improving NILM’s accuracy across various environments. These advancements drive its adoption in industrial and commercial applications, contributing to energy savings and a reduced environmental footprint. Future developments in NILM are expected to bring further innovations, enhancing its precision and applicability.

In the last decade, efforts to standardize the evaluation of NILM algorithms and expand the access to public datasets have intensified. Platforms like NILM-Eval and the NILMTK have significantly facilitated the comparison and validation of load disaggregation approaches. Among the software tools, the NILMTK stands out as a standardized framework that provides workflows and analytical algorithms, facilitating research and methodology comparisons. Its application in energy efficiency studies has proven valuable for optimizing the electricity consumption in residential buildings [21].

The NILMTK [22] is an open-source toolkit designed to streamline NILM research by providing standardized data formats, detailed metadata, and a variety of energy analysis algorithms [23]. Its modular architecture supports the entire workflow, from data acquisition to performance evaluation. It is compatible with datasets such as REDD [24], UK-DALE [25], and ECO [26], as well as other high-resolution datasets such as those provided by Pecan Street (Austin, TX, USA) through its Dataport platform [27]. The toolkit provides tools for preprocessing, feature extraction, and visualization, and it enables the comparison of results using built-in evaluation metrics. As shown in Figure 2, the NILMTK covers all key steps, including raw data processing, the application of learning algorithms, and the disaggregation performance assessment.

Among the disaggregation methods implemented in the NILMTK, CO and the FHMM are two widely used benchmark approaches. CO formulates load disaggregation as an optimization problem: the total energy consumption is distributed among appliances so as to minimize the error between the measured and estimated aggregate consumption. Although its search space grows exponentially with the number of appliances, CO has proven competitive with more sophisticated methods while being significantly faster [28]. The FHMM [29] extends the traditional HMM by modeling each appliance as an independent hidden Markov chain, which improves the representation of appliance behavior and captures more detailed consumption patterns. In the NILMTK, the FHMM is integrated within the modular framework, facilitating its application across various datasets.
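To make the CO formulation concrete, the following Python sketch performs brute-force CO on a toy example. It assumes that each appliance has a small set of known steady-state power levels; the appliance names and wattages are hypothetical, and in practice the levels are learned from training data (for example, by clustering), so the toolkit's own implementation differs in its details. For every aggregate reading, the sketch selects the combination of appliance states whose summed power is closest to the measurement.

```python
from itertools import product


def co_disaggregate(aggregate, appliance_states):
    """Brute-force combinatorial optimization (CO) for NILM.

    aggregate: measured aggregate power readings (W).
    appliance_states: dict mapping appliance name -> candidate steady-state
        power levels (W), including 0 for OFF. These are assumed known here.
    Returns a dict mapping appliance name -> assigned power per reading.
    """
    names = list(appliance_states)
    # Every combination of one state per appliance (K^N candidates).
    combos = list(product(*(appliance_states[n] for n in names)))
    totals = [sum(c) for c in combos]
    result = {n: [] for n in names}
    for p in aggregate:
        # Choose the combination whose summed power best matches the reading.
        best = min(range(len(combos)), key=lambda i: abs(p - totals[i]))
        for name, level in zip(names, combos[best]):
            result[name].append(level)
    return result


# Hypothetical two-state appliances.
states = {"fan": [0, 45], "freezer": [0, 90], "kettle": [0, 2000]}
print(co_disaggregate([0, 46, 137, 2091], states))
# {'fan': [0, 45, 45, 0], 'freezer': [0, 0, 90, 90], 'kettle': [0, 0, 0, 2000]}
```

With three two-state appliances there are only 2^3 = 8 combinations to check per reading; this exhaustive enumeration is what keeps CO fast for a handful of appliances and makes it impractical as the number of appliances and states grows.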

To evaluate NILM algorithms, the NILMTK incorporates several performance metrics that measure the accuracy of the load disaggregation. Among these, the F1-score defined in Equation (1) assesses the balance between precision and recall in identifying appliance states.

F_{1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},

(1)

where Precision, defined in Equation (2), emphasizes the accuracy of detecting the “ON” state of an appliance, and Recall, described in Equation (3), focuses on the correct identification of real activations (TP refers to true positives, FP to false positives, and FN to false negatives).

\mathrm{Precision} = \frac{TP}{TP + FP},

(2)

\mathrm{Recall} = \frac{TP}{TP + FN}.

(3)
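Equations (1) to (3) operate on appliance ON/OFF states rather than on power values, so power readings must first be converted to states. The Python sketch below assumes that an appliance is ON when its power exceeds a fixed threshold; the function name and the 10 W default are illustrative assumptions, not the toolkit's settings.

```python
def on_off_f1(actual, predicted, on_threshold=10.0):
    """Precision, recall, and F1-score of ON/OFF state detection.

    An appliance is treated as ON when its power (W) exceeds on_threshold;
    the threshold is an illustrative assumption.
    """
    tp = fp = fn = 0
    for y, y_hat in zip(actual, predicted):
        is_on, pred_on = y > on_threshold, y_hat > on_threshold
        if pred_on and is_on:
            tp += 1          # correctly detected activation
        elif pred_on:
            fp += 1          # predicted ON while actually OFF
        elif is_on:
            fn += 1          # missed activation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


p, r, f1 = on_off_f1([0, 50, 60, 0, 55], [0, 48, 0, 30, 52])
print(round(f1, 3))  # 0.667 (TP = 2, FP = 1, FN = 1)
```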

Another key metric is the Energy Assignment Error (EAE), defined in Equation (4), which measures the absolute difference between the total actual energy and the total estimated energy assigned to an appliance.

\mathrm{EAE} = \left| \sum_{t} y_{t}^{(n)} - \sum_{t} \hat{y}_{t}^{(n)} \right|,

(4)

where $\hat{y}_{t}^{(n)}$ is the estimated power consumption of appliance n at time interval t, and $y_{t}^{(n)}$ is the actual power used by the same appliance.

The Mean Normalized Error in Assigned Power (MNEAP), defined in Equation (5), evaluates the accuracy of the energy assignment across devices. It is computed as the sum of the absolute differences between the estimated and actual power, normalized by the total actual power.

\mathrm{MNEAP} = \frac{\sum_{t} \left| y_{t}^{(n)} - \hat{y}_{t}^{(n)} \right|}{\sum_{t} \left| y_{t}^{(n)} \right|},

(5)

where $\hat{y}_{t}^{(n)}$ represents the estimated power for appliance n during time interval t, and $y_{t}^{(n)}$ is the actual power consumed by the appliance.
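The difference between Equations (4) and (5) is easiest to see on a small example. The Python sketch below implements both metrics directly from the equations for one appliance; the function names and the three-sample series are hypothetical. Here, the estimate over-assigns power in one interval and under-assigns it in another, so the EAE is zero while the MNEAP still reports the per-interval errors.

```python
def eae(actual, predicted):
    """Energy Assignment Error, Eq. (4): |sum(y) - sum(y_hat)|."""
    return abs(sum(actual) - sum(predicted))


def mneap(actual, predicted):
    """Mean Normalized Error in Assigned Power, Eq. (5)."""
    num = sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted))
    den = sum(abs(y) for y in actual)
    return num / den


actual = [100, 0, 50]      # W, hypothetical ground truth
predicted = [80, 20, 50]   # W, hypothetical estimate
print(eae(actual, predicted))              # 0: the errors cancel in the totals
print(round(mneap(actual, predicted), 3))  # 0.267 = 40 / 150
```

With a constant sampling interval, sums of power samples are proportional to energy; multiplying the EAE by the sampling period converts it to energy units.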

The Root Mean Square Error (RMSE), defined in Equation (6), measures the typical magnitude of the discrepancies between the predicted and actual power consumption; because the errors are squared, large deviations weigh more heavily.

\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( y_{t}^{(n)} - \hat{y}_{t}^{(n)} \right)^{2}},

(6)

where $\hat{y}_{t}^{(n)}$ is the power assigned to the appliance at time interval t, $y_{t}^{(n)}$ is the actual power of the appliance, and T is the total number of observations or time intervals during the energy consumption period.
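As a minimal sketch, Equation (6) translates directly into Python; the function name and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error, Eq. (6), over T = len(actual) intervals."""
    T = len(actual)
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared / T)


# Two 20 W errors out of three intervals (hypothetical values, in W).
print(round(rmse([100, 0, 50], [80, 20, 50]), 2))  # 16.33
```

For these values, the RMSE (about 16.33 W) is larger than the mean absolute error of the same errors (40/3, or about 13.33 W), because squaring gives the two 20 W deviations more weight than the exact match in the third interval.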

The Normalized Disaggregation Error (NDE), defined in Equation (7), quantifies the total energy estimation error for each appliance, normalized by the actual consumption to enable fair comparisons.

\mathrm{NDE}^{(i)} = \frac{\sum_{t=1}^{T} \left( \hat{y}_{t}^{(i)} - y_{t}^{(i)} \right)^{2}}{\sum_{t=1}^{T} \left( y_{t}^{(i)} \right)^{2}},

(7)

where $\hat{y}_{t}^{(i)}$ is the estimated power consumption of device i at time t, and $y_{t}^{(i)}$ is its actual power consumption, with t running over the total time interval T.
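A short Python sketch of Equation (7) follows; the function name and data are hypothetical. Because both the numerator and the denominator are sums of squared power values, the result is dimensionless and can be compared across appliances with very different ratings. Variants of the NDE that take the square root of this ratio also appear in the literature; the sketch follows Equation (7) as written.

```python
def nde(actual, predicted):
    """Normalized Disaggregation Error, Eq. (7) (ratio of squared sums)."""
    num = sum((y_hat - y) ** 2 for y, y_hat in zip(actual, predicted))
    den = sum(y ** 2 for y in actual)
    return num / den


print(nde([100, 0, 50], [80, 20, 50]))  # 0.064 = 800 / 12500
```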

Finally, the Mean Absolute Error (MAE), defined in Equation (8), measures the average absolute difference between the predicted and actual power consumption of an appliance. Lower MAE values indicate a better estimation performance.

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{predicted}_{i} - \mathrm{actual}_{i} \right|,

(8)

where predicted_i and actual_i represent individual prediction values and actual energy consumption values of appliances, respectively, and N represents the total number of observations.
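Equation (8) can be sketched in Python as follows; the function name and values are hypothetical. Unlike the RMSE, every watt of error contributes equally, so the MAE is expressed in the same units as the power readings and is less sensitive to occasional large deviations.

```python
def mae(actual, predicted):
    """Mean Absolute Error, Eq. (8), over N = len(actual) observations."""
    N = len(actual)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / N


print(round(mae([100, 0, 50], [80, 20, 50]), 2))  # 13.33 W
```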

3. Materials and Methods

To perform energy consumption disaggregation with NILMTK, a compatible dataset is essential: it must include metadata and measurements of both the aggregate consumption and the consumption of individual appliances. Once NILMTK is installed (using Anaconda is recommended), a specialized program is required that supports 13-digit timestamps (Unix time in milliseconds) and can generate new datasets from the measurements obtained by monitoring devices.

For this purpose, data converters developed by the authors of this study are used; they are freely available. These converters are stored in NILMTK's dataset_converters directory, which in the installation used here is: C:\Users\userxx\anaconda3\envs\nilmtk-env\Lib\site-packages\nilmtk\dataset_converters.

Additionally, these converters must be registered in NILMTK by adding their names to the __init__.py file in that directory (Figure A1 in Appendix A shows the code of the __init__ file).

Once the converter code is in place, invoking each converter is straightforward: the function only needs the path where the measurement files are stored, the name of the new dataset, and the start and end dates, as shown in Figure A2 in Appendix A.

After the dataset has been generated, data analysis, preprocessing, and statistics extraction can be performed to train and, ultimately, develop the optimal model for disaggregation. This final task can be streamlined using the Python code shown in Figure A3 in Appendix A, which generates an H5-format model for disaggregation. The software used was Python 3.7.12, which meets the NILMTK requirement of Python 3.6 or higher.

Finally, once the model has been created, its accuracy can be evaluated with the metrics included in NILMTK. This enables comparison with results from other datasets and, thereby, identification of the optimal configuration. Figure A4 in Appendix A shows an example of Python code that calculates these metrics. The results section presents these metrics, facilitating comparisons that consider factors such as filtering methods, sampling rates, and the types and numbers of appliances in the dataset.

A wide variety of metering devices are available on the market, but not all are open source or compatible with NILMTK. In addition, many of these devices are expensive, and their proprietary software limits researchers' access to them [30]. However, there is a growing trend towards the adoption of open-source solutions. These limitations motivated the development of new open-source hardware in this work, around which both the converters and the disaggregation software were designed.

The openZmeter (oZm) v1 is an open-source, low-cost, and high-efficiency single-phase power quality analyzer and smart energy meter. It complies with the IEC 61000-4-30 and EN 50160 standards and was developed by the Universities of Granada and Almeria. It measures electrical parameters at a sampling rate of 15,625 Hz and includes an easy-to-use interface. Its architecture features an analog front end (AFE), an STM32 microcontroller, ARM NanoPi/OrangePi boards, and a PostgreSQL database. The meter handles RMS voltages up to 400 V and currents up to 50 A with 0.1% accuracy.

A visual representation of the electrical diagram covering the various elements of the oZm v1 is provided in Figure 3. The oZm's hardware includes an Analog Front End (AFE) that captures and conditions the voltage and current signals from the electrical grid. An STM32 microcontroller performs basic calculations and sends the data to more powerful ARM boards for complex analysis. Additionally, the device uses a Hall effect sensor or current clamps to measure current and a resistive divider to measure voltage; both signals are then digitized by an ADC. The system is powered directly from the electrical grid through an isolated power supply and includes a backup battery to maintain operation in case of power interruptions.

The data obtained with this oZm configuration are accurate to 0.1% for voltage measurements in the 42.5 to 57.5 Hz frequency range. In addition, the oZm v1 incorporates a customized version of the FFTW3 Fourier transform library, which provides complex harmonic values up to order 50 for both current and voltage. Given the compact and versatile design of the oZm, this capability is highly relevant to the objectives of this work. This study used this meter to generate the DSUALM dataset (excluding harmonics) and the DSUALMH dataset (including voltage, current, and power harmonics up to the 50th order).
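How complex harmonic values can be obtained from sampled waveforms is illustrated by the following sketch. It uses NumPy's FFT rather than the oZm's customized FFTW3 code, and it assumes a 50 Hz grid and a 10-cycle (200 ms) analysis window; the function and constant names are hypothetical. At 15,625 Hz, 10 cycles contain exactly 3125 samples, so the frequency resolution is 5 Hz and harmonic h falls exactly on bin 10h, which avoids spectral leakage between harmonics.

```python
import numpy as np

FS = 15_625   # sampling rate of the oZm v1 (Hz), as stated in the text
F0 = 50.0     # nominal grid frequency (Hz), assumed
CYCLES = 10   # analysis window length in fundamental cycles, assumed


def harmonic_phasors(signal, fs=FS, f0=F0, max_order=50):
    """Complex peak amplitudes of harmonics 1..max_order.

    The window must span an integer number of fundamental cycles so that
    every harmonic lands exactly on an FFT bin.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    bin_width = fs / n                      # Hz per FFT bin
    orders = np.arange(1, max_order + 1)
    bins = np.rint(orders * f0 / bin_width).astype(int)
    if bins[-1] >= spectrum.size:
        raise ValueError("max_order exceeds the Nyquist frequency")
    # 2/N converts one-sided FFT bins into peak amplitudes.
    return orders, 2.0 * spectrum[bins] / n


n = int(FS * CYCLES / F0)   # 3125 samples
t = np.arange(n) / FS
# Synthetic current: 10 A fundamental plus a 2 A third harmonic at 0.5 rad.
current = 10 * np.cos(2 * np.pi * F0 * t) + 2 * np.cos(2 * np.pi * 3 * F0 * t + 0.5)
orders, phasors = harmonic_phasors(current)
```

Each entry of `phasors` is a complex number whose magnitude is the harmonic's peak amplitude and whose angle is its phase, which matches the "complex form" in which the text describes the oZm harmonic data.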

The oZm v3, shown in Figure 4, is an open-source three-phase meter designed for low-voltage installations, operating up to 400 V and complying with IEC 61000-4-30 and EN 50160 standards. It measures single-phase and three-phase systems with 0.1% accuracy, supports DC systems, and operates at 50/60 Hz with a resolution of up to 10 MHz. The device features versatile current measurement capabilities, virtual analyzers, an intuitive web interface, and an open API, making it highly adaptable for various applications.

Compared to its predecessor, the oZm v3 offers significant improvements, including higher accuracy, a 60 kHz sampling rate, and enhanced connectivity with GPS/PPS. It also incorporates programmable relays and multi-synchronization capabilities, making it ideal for power quality analysis and electrical grid monitoring. These enhancements enable more efficient detection and resolution of power quality issues and improved energy management in complex systems.

Designed for a wide range of applications, including power quality monitoring, energy management, and electrical network analysis, the oZm v3 provides exceptional versatility. Its capability to measure single-phase and three-phase systems makes it suitable for diverse industrial and commercial environments. Its web interface and open API facilitate seamless integration with existing energy management systems, enabling real-time visualization and analysis of energy consumption and power quality. This functionality is crucial for optimizing energy efficiency and ensuring the reliable operation of electrical grids. This new version of oZm was used in this study to generate the DSUALM10 dataset (excluding harmonics) and the DSUALM10H dataset (including voltage, current, and power harmonics up to the 50th order).

In this work, the Open Multi Power Meter (OMPM) [31] has also been used. The OMPM, whose electrical schematic is shown in Figure 5, is a highly scalable multi-meter that employs a NodeMCU ESP32 microcontroller (Espressif Systems, Shanghai, China) to acquire electrical measurements from a variable number of high-precision, low-cost PZEM-004T modules (Peacefair, Shenzhen, China) connected via a bus. The prototype uses six PZEM-004T modules with Rogowski coils to measure current, power, voltage, and power factor for six devices. Each module offers a precision of 0.5% (1% for power factor) at frequencies above 10 Hz and transmits data via RS-485. With high-resolution processing, each OMPM channel measures 80–260 V, 0–100 A, 0–23 kW, and 45–65 Hz. The system supports up to 127 channels by adding more modules at minimal cost.

The OMPM was designed for scalability, using an RS-485 bus to transmit voltage, current, power, frequency, and power factor measurements from each module to the central controller via RX and TX lines. This setup allows for high accuracy and flexibility in electrical monitoring. The ESP32 microcontroller processes the data and stores it on microSD cards, with optional display capabilities via an I2C screen. Each module provides precise electrical measurements with an accuracy of 0.5% for most parameters, except for the power factor, which is 1%, achieved at frequencies above 10 Hz. This precision and scalability make the OMPM suitable for various applications requiring detailed electrical monitoring.
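The bus-based design can be illustrated with a Python sketch of one polling cycle. As an illustrative assumption only, it supposes that each PZEM-004T module speaks Modbus-RTU and exposes its readings in ten input registers with the layout published for version 3.0 of the module. The OMPM's actual ESP32 firmware is not reproduced here, and the function names are hypothetical. The controller addresses each module in turn, and only the module whose address matches replies, which is what allows channels to be added simply by attaching more modules to the bus.

```python
def crc16_modbus(frame: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def read_request(address: int, first_reg: int = 0x0000, count: int = 10) -> bytes:
    """Build a Modbus-RTU 'read input registers' (0x04) request for one module."""
    body = bytes([address, 0x04, first_reg >> 8, first_reg & 0xFF,
                  count >> 8, count & 0xFF])
    crc = crc16_modbus(body)
    return body + bytes([crc & 0xFF, crc >> 8])  # CRC is sent low byte first


def decode_pzem(regs):
    """Scale the ten input registers (assumed PZEM-004T v3.0 layout)."""
    return {
        "V": regs[0] / 10,                      # 0.1 V per LSB
        "A": (regs[1] | regs[2] << 16) / 1000,  # 0.001 A, low word first
        "W": (regs[3] | regs[4] << 16) / 10,    # 0.1 W, low word first
        "Wh": regs[5] | regs[6] << 16,          # 1 Wh, low word first
        "F": regs[7] / 10,                      # 0.1 Hz
        "PF": regs[8] / 100,                    # 0.01
    }


# One request per module on the shared bus: six modules at addresses 1..6.
requests = [read_request(addr) for addr in range(1, 7)]
```

On real hardware, each request would be written to the serial port and the module's reply decoded with `decode_pzem`; running the CRC over a complete frame, including its appended CRC, yields zero, which is how a receiver can check a reply's integrity.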

This study used the OMPM to generate the UALM2 dataset, which includes measurements from six meters associated with the aggregate and five appliances. The UALM2 dataset demonstrates the OMPM's capability to provide high-quality data for NILM applications, highlighting its potential for enhancing energy efficiency and load disaggregation accuracy.

The evaluation methodology in this study focused on ensuring reproducibility, leveraging open-source hardware, and conducting a comprehensive comparative analysis. An extensive pipeline was followed, covering the entire process from raw data conversion to the evaluation of disaggregation results. The datasets were consistently split into training, validation, and test sets, with an 80/20 split used in some scenarios, to enable consistent comparisons across algorithms and environments. Trained models were saved in H5 format so that they could be reused efficiently.
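For time-series measurements, an 80/20 split is usually taken chronologically rather than by shuffling samples, so that the test period lies strictly after the training period. The following Python sketch shows one plausible way to derive such periods from a dataset's start and end dates; the function name and dates are hypothetical, and this is not necessarily how the splits in this study were computed.

```python
from datetime import datetime


def chronological_split(start, end, train_fraction=0.8):
    """Split [start, end) into contiguous training and test periods.

    Splitting by time keeps every test sample after the training data,
    avoiding leakage between neighbouring, highly correlated readings.
    """
    boundary = start + (end - start) * train_fraction
    return (start, boundary), (boundary, end)


train, test = chronological_split(datetime(2023, 6, 1), datetime(2023, 6, 11))
print(train[1])  # 2023-06-09 00:00:00 (8 of the 10 days used for training)
```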

During validation, the optimal model was applied, and its predictions were compared with the actual ground truth data. A key part of the strategy involved comparing performance with publicly available and widely used NILM datasets, such as REDD, UK-DALE, and ECO, to demonstrate improvements in accuracy and efficiency under various conditions.

This study focused on the disaggregation of typical household appliances. The custom datasets (DSUALM, DSUALMH, UALM2, DSUALM10, and DSUALM10H) included devices such as fans, freezers, televisions, vacuum cleaners, kettles, lamps, laptops, and more, eventually covering up to 10 appliances. For the new OMPM hardware, low-consumption devices were prioritized to increase the complexity of disaggregation.

The primary software tool used was NILMTK v0.4.0, extended with NILMTK-Contrib. Custom data converters were developed to process measurements from oZm v1, oZm v3, and OMPM into a format compatible with NILMTK, enabling seamless integration of custom data into the evaluation framework.

Both traditional algorithms (CO, FHMM, Mean, and Hart85) and deep learning approaches (DAE, RNN, Seq2Point, Seq2Seq, and WindowGRU) were evaluated. The analysis used a broad set of standard metrics provided by NILMTK, including F1-score, MAE, RMSE, NDE, MNEAP, and EAE, to gain a comprehensive understanding of each model’s performance.

Furthermore, this study investigated the impact of several factors on disaggregation performance, such as the sampling rate (ranging from 125 ms to 30 min), the presence of harmonic content (up to the 50th order), and the application of power filters with different thresholds (10 W, 50 W, and 100 W). For deep learning models, due to their high computational cost, evaluations were mainly conducted under a specific configuration: 1 s sampling, with harmonic content, and without power filters.
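The experimental design described above can be sketched as a configuration grid. The specific sampling periods (a subset of the 125 ms to 30 min range), the block averaging used for downsampling, and the behavior of the power filter (zeroing readings below the threshold) are assumptions made for illustration, and all names are hypothetical; the processing used in this study may differ. The sketch avoids features newer than the Python 3.7 used in this work.

```python
from itertools import product

SAMPLE_PERIODS_S = [0.125, 1, 60, 1800]  # illustrative subset, in seconds
HARMONICS = [False, True]
FILTERS_W = [None, 10, 50, 100]          # None means no power filter


def downsample(values, factor):
    """Average consecutive blocks of `factor` samples (drops a partial tail)."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]


def power_filter(values, threshold_w):
    """Hypothetical power filter: readings below threshold_w are set to 0 W."""
    if threshold_w is None:
        return list(values)
    return [v if v >= threshold_w else 0.0 for v in values]


# Full grid explored with the classical algorithms.
classical_grid = list(product(SAMPLE_PERIODS_S, HARMONICS, FILTERS_W))
# Single configuration used for the costly deep learning models:
# 1 s sampling, with harmonics, without a power filter.
deep_learning_config = (1, True, None)
```

Restricting the deep learning models to a single point of the grid reflects the cost trade-off stated in the text: each additional configuration multiplies the number of training runs.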

4. Results and Discussion

The following sections present the datasets generated in this study. The analysis uses standardized NILMTK metrics across different devices and configurations to identify the most suitable datasets and setups for specific load disaggregation scenarios.

4.1. Metrics Obtained from the Different Datasets Generated with Open Hardware

Five new public datasets (DSUALM, DSUALMH, UALM2, DSUALM10, and DSUALM10H), generated using the oZm v1, oZm v3, and OMPM devices [32], were reviewed. The datasets were analyzed with NILM algorithms and evaluated using NILMTK metrics. The goal was to guide dataset selection and creation for NILM by comparing these metrics with those of other datasets.

The datasets were created using devices with precise electrical measurements, enabling the capture of detailed usage patterns. By applying NILMTK metrics, this study evaluates the performance of the NILM algorithm across these datasets, helping to identify which datasets are most effective for specific applications. This analysis highlights the impact of factors like the sampling frequency and data resolution on NILM performance, contributing to advancements in energy efficiency and advanced grid management technologies.

4.1.1. DSUALM and DSUALMH

The DSUALMH (UAL dataset with harmonics) is a new high-resolution dataset specifically designed for NILM. It was published in July 2023 and builds upon the DSUALM dataset released in February 2022. This dataset was generated using six oZm devices [28] (Figure 6), which are open-source and open hardware energy meters capable of measuring electrical variables at a high sampling frequency of 15,625 Hz. The DSUALMH is a multichannel dataset that includes measurements from five specific household appliances (a fan, a freezer, a television, a vacuum cleaner, and a boiler) along with aggregated electricity consumption data. Its comprehensive structure features more than 150 electrical variables, encompassing complex harmonic values of the current, voltage, and power up to the 50th order, as well as transient information.

The creation and labeling process for DSUALMH involved collecting measurements via the oZm API, preprocessing the data to resolve date and time discrepancies, and converting the files to CSV format. Subsequently, a new converter specifically developed for and integrated into the NILMTK (ualm5t.convert_ualmt) was used to organize these files into a unified HDF5 format, complemented by YAML metadata. This converter supports 13-digit timestamps. The DSUALMH dataset is publicly accessible to the research community on the Zenodo platform [33], ensuring its long-term availability and promoting transparency and reproducibility in NILM research. Since its release in July 2023, it has been available through the University of Almería's datasets on Zenodo.

The dataset of UAL with harmonics (DSUALMH) is an extension of DSUALM, also obtained using the oZm v1 in a laboratory setting [34]. This dataset includes fundamental and secondary electrical characteristics, such as the complex harmonic current, voltage, and power values up to the 50th order for the same five devices and aggregate data.

The best results for nine algorithms are shown in Table 2. These findings suggest that harmonic support improves accuracy and disaggregation more effectively in statistical and traditional models. In complex deep learning models, by contrast, its impact is less consistent and depends more on the architecture and the quality of the data. The choice of metrics must consider the type of problem (regression or classification) and the nature of the data to properly assess the improvement produced by harmonics.

These datasets underscore the importance of incorporating harmonics and sampling strategies in NILM applications, as these factors can significantly enhance the accuracy of load disaggregation models.

4.1.2. UALM2

The UALM2 dataset (also referred to as DSUALM2) is a new high-resolution dataset designed for NILM. It was launched in February 2023 by the UAL. This dataset was generated using the OMPM device (developed by the University of Almeria, Almeria, Spain) [31], a low-cost and scalable open hardware solution. UALM2 is a multichannel dataset that includes measurements of the main aggregate consumption and five individual low-power profile household appliances: a fan, a deep fryer, an LED lamp, a laptop, and an incandescent lamp. The measurements are taken at a sampling frequency above 10 Hz, and the dataset includes variables such as the RMS voltage, RMS current, active power, frequency, and power factor. Unlike other datasets in the UALM series, UALM2 does not include measurements of the apparent or reactive power or harmonics, due to the characteristics of the PZEM-004 modules used by the OMPM. The creation and labeling process involved the development of a new converter (UALM2) specifically for this dataset, adapted from the DSUAL converter (based on the iAWE converter), to organize the files in a unified HDF5 format compatible with the NILMTK. This converter supports 13-digit timestamps, and each CSV file contains the header “timestamp, VLN, A, W, F, PF”.
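The timestamp handling that such a converter needs can be sketched in a few lines of Python. This is not the authors' UALM2 converter, which writes HDF5 through NILMTK; it only parses CSV text with the header quoted above and turns 13-digit (millisecond) Unix timestamps into UTC datetimes. The function names and the sample row are hypothetical. Building the datetime from integer milliseconds avoids floating-point rounding, and keeping all 13 digits matters: a 10-digit (seconds) timestamp interpreted as milliseconds would land in January 1970.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(ms):
    """Convert a 13-digit Unix timestamp (milliseconds) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(ms))


def read_ualm2_csv(text):
    """Parse CSV text whose header is 'timestamp, VLN, A, W, F, PF'."""
    # skipinitialspace strips the blank after each comma, header included.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        rows.append({
            "time": ms_to_datetime(row["timestamp"]),
            **{k: float(row[k]) for k in ("VLN", "A", "W", "F", "PF")},
        })
    return rows


sample = ("timestamp, VLN, A, W, F, PF\n"
          "1687600000123, 230.1, 0.25, 45.0, 50.0, 0.78\n")
rows = read_ualm2_csv(sample)
print(rows[0]["time"].microsecond)  # 123000: the milliseconds are preserved
```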

UALM2 is publicly available to the research community on the Zenodo platform [35], promoting transparency and reproducibility in NILM research. It has been available for download since its release in February 2023, so no separate access timeline or hosting plan has been specified.

In addition to the metadata, the dataset includes electrical measurements (voltage, current, active power, and frequency) covering several hours of aggregate consumption and of five low-consumption devices: a fan, a fryer, an LED lamp, a laptop computer, and an incandescent lamp, as shown in Figure 7.

The results of applying eight algorithms are shown in Table 3. Among the classical models, the CO and FHMM achieve the highest F1-scores (0.70 and 0.66), and the CO, Mean, and FHMM all run in under 2 s, which makes them suitable for applications that require speed and stability. The deep learning-based models (DAE, Seq2Seq, RNN, Seq2Point, and WindowGRU) are more complex and run far longer, with mixed results: the DAE and Seq2Seq produce divergent errors on the order of 10¹¹–10¹³ W, whereas WindowGRU achieves the lowest MAE (8.73 W) and RMSE (10.55 W) of all eight algorithms and an F1-score of 0.51, at the cost of the longest runtime (3869.17 s). Therefore, while classical approaches remain reliable for general use, WindowGRU offers potential advantages in capturing complex patterns when sufficient computational resources are available.
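Because two of the deep models diverged on UALM2, any ranking of Table 3 should first set them aside. The Python sketch below does this with the Table 3 values; the 10⁶ W cut-off and the helper name `split_by_convergence` are assumptions for illustration, not part of the study's method.

```python
# Values transcribed from Table 3 (UALM2, one-second sampling, no filters).
TABLE3 = {
    # algorithm: (MAE in W, RMSE in W, F1-score, runtime in s)
    "CO": (9.49, 12.26, 0.70, 1.49),
    "Mean": (13.19, 12.88, 0.17, 0.58),
    "FHMM": (12.71, 15.53, 0.66, 1.70),
    "DAE": (4.01e12, 2.10e13, 0.20, 27.07),
    "RNN": (13.36, 14.53, 0.17, 2723.48),
    "Seq2Point": (11.60, 12.65, 0.17, 841.65),
    "Seq2Seq": (6.04e11, 1.52e13, 0.20, 34.67),
    "WindowGRU": (8.73, 10.55, 0.51, 3869.17),
}

def split_by_convergence(table, max_mae_w=1e6):
    """Separate runs with physically plausible errors from diverged ones.

    The 1e6 W cut-off is an arbitrary assumption: no household appliance
    error that large can come from a model that actually fitted the data."""
    ok = {name: row for name, row in table.items() if row[0] < max_mae_w}
    diverged = sorted(name for name in table if name not in ok)
    return ok, diverged

ok, diverged = split_by_convergence(TABLE3)
by_mae = sorted(ok, key=lambda name: ok[name][0])                 # lower is better
by_f1 = sorted(ok, key=lambda name: ok[name][2], reverse=True)    # higher is better
for name in by_mae:
    mae_w, _, f1, runtime_s = ok[name]
    print(f"{name:10s} MAE={mae_w:6.2f} W  F1={f1:.2f}  runtime={runtime_s:8.2f} s")
```

With the diverged DAE and Seq2Seq runs removed, WindowGRU ranks first by MAE and CO first by F1-score, which makes the accuracy/runtime trade-off visible at a glance.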

The OMPM’s design, combined with the CO algorithm (which achieves the highest F1-score in Table 3), demonstrates its potential for accurate energy monitoring and load disaggregation, making it a valuable tool for optimizing energy efficiency in various applications.

4.1.3. DSUALM10H and DSUALM10

In June 2023, the UAL dataset series was extended with the dataset of the UAL for 10 appliances with harmonics (DSUALM10H), which covers 10 appliances recorded with three oZm v3 devices (see Figure 8), significantly improving its scope and diversity.

The DSUALM10H dataset is a high-resolution, multichannel dataset for NILM, released in June 2023. It was created using three open-source oZm v3 devices, capturing twelve channels (including the aggregated main consumption and ten common household appliances) with a sampling rate of 15.625 kHz. The dataset comprises 150 electrical variables, including the current, voltage, and power harmonics up to the 50th order, as well as transient information, allowing for a detailed analysis of both fundamental and harmonic electrical characteristics. During acquisition and preprocessing, the raw measurements were converted to CSV and then to a unified HDF5 format with YAML metadata via a dedicated NILMTK converter (ualmt10h.convert_ualmt10h), ensuring accurate storage and facilitating reproducibility. Publicly available [36], DSUALM10H promotes transparency and supports advanced NILM research by providing rich, open-access data generated with open hardware [37].
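Capturing harmonics up to the 50th order places a lower bound on the raw acquisition rate: on a 50 Hz supply, the 50th harmonic lies at 2500 Hz, so by the Nyquist criterion the waveform must be sampled above 5 kHz. The sketch below estimates harmonic amplitudes and the total harmonic distortion (THD) from one window of samples. It is a generic single-bin DFT in plain Python, not the oZm v3 firmware; the 15,625 Hz rate, the 10-cycle (200 ms) window, and the synthetic test signal are assumptions chosen for illustration.

```python
import cmath
import math

FS_HZ = 15_625                    # assumed acquisition rate (samples per second)
F0_HZ = 50.0                      # nominal grid frequency in Spain
CYCLES = 10                       # analysis window length in fundamental cycles (assumption)
N = int(FS_HZ * CYCLES / F0_HZ)   # 3125 samples, i.e., 200 ms

def harmonic_amplitudes(samples, max_order=50):
    """Peak amplitude of harmonics 1..max_order via a single-bin DFT.

    With a window of exactly CYCLES fundamental periods, harmonic h falls
    exactly on DFT bin h * CYCLES, so there is no spectral leakage between
    harmonics (this assumes a steady 50 Hz fundamental)."""
    n = len(samples)
    amps = {}
    for h in range(1, max_order + 1):
        k = h * CYCLES
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        amps[h] = 2 * abs(acc) / n   # scale DFT magnitude to a peak amplitude
    return amps

def thd(amps):
    """Total harmonic distortion relative to the fundamental."""
    return math.sqrt(sum(a * a for h, a in amps.items() if h > 1)) / amps[1]

# Synthetic current: 10 A fundamental plus a 2 A third harmonic.
signal = [10 * math.sin(2 * math.pi * F0_HZ * i / FS_HZ)
          + 2 * math.sin(2 * math.pi * 3 * F0_HZ * i / FS_HZ) for i in range(N)]
amps = harmonic_amplitudes(signal)
print(f"h1={amps[1]:.3f} A, h3={amps[3]:.3f} A, THD={thd(amps):.3f}")
```

The highest bin used (500, for the 50th harmonic) stays well below N/2 = 1562, which is the discrete form of the Nyquist condition above.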

DSUALM10, by contrast, is a new dataset derived from DSUALM10H with harmonic components removed. This allows researchers to compare the impact of harmonics on the NILM performance. Table 4 presents the results obtained using nine algorithms with one-second sampling, without any filter, and with and without harmonics. The comparison highlights the role of harmonics in improving certain metrics for specific algorithms.
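Deriving a harmonics-free copy amounts to keeping every channel and timestamp but dropping the harmonic variables, so that differences in Table 4 can be attributed to those features alone. A minimal sketch, assuming a hypothetical column naming scheme (`quantity_hN` for order N) and assuming that the fundamental (`h1`) is retained; neither detail is confirmed by the dataset documentation.

```python
import re

# Hypothetical naming scheme for harmonic columns of orders 2..50,
# e.g. "voltage_h7"; the real DSUALM10H variable names may differ.
HARMONIC_COLUMN = re.compile(r"^(voltage|current|power)_h([2-9]|[1-4][0-9]|50)$")

def strip_harmonics(columns):
    """Return the columns kept in a harmonics-free (DSUALM10-style) copy."""
    return [c for c in columns if not HARMONIC_COLUMN.match(c)]

cols = ["timestamp", "voltage", "current", "power_active",
        "voltage_h1", "voltage_h2", "current_h13", "power_h50"]
print(strip_harmonics(cols))
```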

The analysis suggests that, on average, including harmonics does not significantly improve the overall performance of NILM algorithms across all appliances. Some algorithms show moderate improvements in the NDE, RMSE, and MAE when harmonics are considered (the FHMM, for example, reduces its NDE from 1.101 to 0.984 and its RMSE from 405.841 W to 322.955 W), whereas others show little to no benefit or even a decrease in performance (the NDE of Seq2Seq rises from 0.619 to 0.689), and the F1-score does not increase for any algorithm. This indicates that the effectiveness of harmonics depends on the specific algorithm and its modeling approach rather than universally enhancing results.

Furthermore, the impact of harmonics on the runtime varies by algorithm: with harmonics, the RNN runtime falls from 5456.35 s to 1880.32 s, whereas the Seq2Seq runtime rises from 524.77 s to 8613.56 s. This highlights a trade-off between accuracy and computational cost that should be carefully considered when choosing to incorporate harmonics in NILM methods.
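The algorithm-dependent effect of harmonics can be quantified directly from Table 4. The sketch below transcribes the paired rows (Hart85 and WindowGRU are omitted because Table 4 reports them only without harmonics) and computes relative changes; the helper names are illustrative.

```python
# (NDE, F1-score, runtime in s) without and with harmonics, from Table 4.
TABLE4 = {
    "CO":        ((1.853, 0.463, 3.25),    (1.859, 0.445, 2.61)),
    "Mean":      ((0.827, 0.499, 1.31),    (0.827, 0.499, 1.22)),
    "FHMM":      ((1.101, 0.419, 66.31),   (0.984, 0.390, 61.76)),
    "DAE":       ((0.743, 0.529, 310.57),  (0.736, 0.522, 138.07)),
    "RNN":       ((0.652, 0.593, 5456.35), (0.619, 0.558, 1880.32)),
    "Seq2Point": ((0.633, 0.555, 1431.18), (0.629, 0.528, 754.77)),
    "Seq2Seq":   ((0.619, 0.526, 524.77),  (0.689, 0.495, 8613.56)),
}

def relative_change(before, after):
    """Signed relative change (after - before) / before."""
    return (after - before) / before

def harmonic_effect(table):
    """Per algorithm: relative NDE and F1 changes, and the runtime ratio."""
    out = {}
    for name, (no_h, with_h) in table.items():
        out[name] = {
            "nde_change": relative_change(no_h[0], with_h[0]),  # negative = better
            "f1_change": relative_change(no_h[1], with_h[1]),   # positive = better
            "runtime_ratio": with_h[2] / no_h[2],               # >1 = slower
        }
    return out

effect = harmonic_effect(TABLE4)
for name, e in effect.items():
    print(f"{name:10s} NDE {e['nde_change']:+.1%}  F1 {e['f1_change']:+.1%}  "
          f"runtime x{e['runtime_ratio']:.2f}")
```

Applied to Table 4, the sketch shows an NDE reduction of roughly 11% for the FHMM, an increase of roughly 11% for Seq2Seq, an F1-score that falls or stays the same for every algorithm, and runtime ratios ranging from about 0.34 (RNN) to about 16 (Seq2Seq).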

DSUALM10 and DSUALM10H provide valuable and diverse data for testing and validating these algorithms under various measurement conditions. They provide essential insights into how the inclusion of harmonics and different analytical approaches affects the accuracy and efficiency of load disaggregation, which is crucial for advancing energy efficiency and smart grid management.

4.2. Comparative Performance Between Datasets

The performance of three open-source hardware meters (OMPM, oZm v1, and oZm v3) was compared across five common household appliances (television, incandescent lamp, vacuum cleaner, fan, and freezer) using the four standard NILMTK metrics: the F1-score, EAE, MNEAP, and RMSE. The analysis considers different reference algorithms, sampling times, filling methods, and the inclusion or exclusion of the voltage, current, and power harmonics up to the 50th order.
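The text does not define the filling methods further. One plausible reading, assumed here, is that a filling method decides which value represents each resampling bin, as pandas' `resample(...).first()` and `resample(...).mean()` do. The stdlib sketch below contrasts the two on 1 s readings grouped into 90 s bins (a sampling time used in the experiments); the function name `downsample` and the readings are hypothetical.

```python
from statistics import fmean

def downsample(readings, period_s, method="first"):
    """Downsample (timestamp_s, watts) pairs into fixed bins of period_s seconds.

    method="first" keeps the earliest reading in each bin; method="mean"
    averages the bin. Both are illustrative assumptions, not NILMTK internals."""
    bins = {}
    for t, w in sorted(readings):
        bins.setdefault(t // period_s * period_s, []).append(w)
    if method == "first":
        return [(start, ws[0]) for start, ws in sorted(bins.items())]
    if method == "mean":
        return [(start, fmean(ws)) for start, ws in sorted(bins.items())]
    raise ValueError(f"unknown method: {method!r}")

# A 1200 W load switched on at 30 s and off again at 120 s.
readings = [(0, 0.0), (30, 1200.0), (60, 1200.0), (90, 1200.0), (120, 0.0)]
print(downsample(readings, 90))          # [(0, 0.0), (90, 1200.0)]
print(downsample(readings, 90, "mean"))  # [(0, 800.0), (90, 600.0)]
```

Under "first", the load is invisible in the first bin even though it was on for two of its three readings, whereas "mean" spreads it over both bins; either choice can shift event boundaries, which is one reason the filling method is tracked per measurement.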

Each measurement, taken with real appliances under conditions of maximum randomness, is plotted on the vertical axis of the graphs for each metric and device. The full nomenclature of each measurement records the year, month, day, start time, end time, experiment number, algorithm, sampling time, filling method, meter used, and harmonic support, while abbreviated labels use underscore-separated fields. For example, the measurement “Jun_exp3_FHMM_90s_first_oZmv3_H=YES” was conducted in June as experiment number 3, using the FHMM algorithm with a 90 s sampling time, the first filling method, the oZm v3 as the meter, and harmonics included.
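The abbreviated labels can be decoded mechanically. The sketch below assumes exactly seven underscore-separated fields in the order of the example label; the class `MeasurementLabel` and the function `parse_label` are hypothetical, and the full identifier with date and times would need a different parser.

```python
from dataclasses import dataclass

@dataclass
class MeasurementLabel:
    month: str
    experiment: int
    algorithm: str
    sampling_s: int
    fill_method: str
    meter: str
    harmonics: bool

def parse_label(label):
    """Parse an abbreviated label such as 'Jun_exp3_FHMM_90s_first_oZmv3_H=YES'."""
    # Unpacking raises ValueError if the label does not have exactly seven fields.
    month, exp, algo, sampling, fill, meter, harm = label.split("_")
    if not (exp.startswith("exp") and sampling.endswith("s") and harm.startswith("H=")):
        raise ValueError(f"unexpected label format: {label!r}")
    return MeasurementLabel(
        month=month,
        experiment=int(exp[3:]),
        algorithm=algo,
        sampling_s=int(sampling[:-1]),
        fill_method=fill,
        meter=meter,
        harmonics=harm[2:].upper() == "YES",
    )

print(parse_label("Jun_exp3_FHMM_90s_first_oZmv3_H=YES"))
```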

A naming convention based on the meters used to collect the data was adopted to simplify references to the datasets. Specifically, DSUALM10H was referred to as the oZm v3, DSUALMH as the oZm v1, and UALM2 as the OMPM. This approach maintains consistency and clarity when discussing these datasets and their respective meters.

4.2.1. F1-Score

Table 5 summarizes the mean and standard deviation of the F1-score for each meter/dataset and appliance across all measurements.

The oZm v3 shows a more variable and generally lower performance than the oZm v1, likely due to the larger number of appliances in its dataset. For example, its mean F1-score ranges from 0.182 for the vacuum cleaner to 0.609 for the television. In contrast, the oZm v1 consistently delivers a superior performance, particularly for the vacuum cleaner and television, with high mean scores and low variability: it achieves a mean F1-score of 0.707 for the TV and 0.825 for the vacuum cleaner. The OMPM exhibits an intermediate performance with a high variability (standard deviations of 0.254–0.275) and lower mean F1-scores than the oZm v1, and its effectiveness varies significantly depending on the appliance and conditions. These findings highlight the importance of selecting the proper meter for an optimal NILM performance.
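Because the three meters did not measure the same set of appliances (Table 5 has empty cells), any single "overall" F1-score depends on how the per-appliance means are combined. The sketch below, an illustration rather than the computation behind the figures, contrasts an unweighted average over the available appliances with an average restricted to the appliances that all meters share; the function names are hypothetical.

```python
from statistics import fmean

# Mean F1-scores per appliance from Table 5 (None = no data for that meter).
TABLE5_MEANS = {
    "oZm v3": {"TV": 0.609, "Lamp": 0.289, "Vac. cleaner": 0.182, "Fan": 0.453, "Freezer": 0.287},
    "oZm v1": {"TV": 0.707, "Lamp": None, "Vac. cleaner": 0.825, "Fan": 0.664, "Freezer": 0.578},
    "OMPM":   {"TV": 0.350, "Lamp": 0.675, "Vac. cleaner": None, "Fan": 0.589, "Freezer": None},
}

def unweighted_meter_average(table):
    """Average each meter's per-appliance means, skipping missing cells."""
    return {meter: fmean(v for v in row.values() if v is not None)
            for meter, row in table.items()}

def common_appliance_average(table):
    """Average only over appliances that every meter measured."""
    appliances = next(iter(table.values()))
    shared = [a for a in appliances if all(row[a] is not None for row in table.values())]
    return shared, {meter: fmean(row[a] for a in shared) for meter, row in table.items()}

print(unweighted_meter_average(TABLE5_MEANS))
print(common_appliance_average(TABLE5_MEANS))
```

Both views rank the oZm v1 first (about 0.69 in each case), but the order of the OMPM and oZm v3 swaps: the OMPM leads when each meter is averaged over its own appliances (0.538 vs. 0.364), whereas the oZm v3 leads when only the shared TV and fan are compared (0.531 vs. 0.4695).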

The heat map in Figure 9 illustrates the performance of algorithms in identifying the energy consumption through the F1-score. Columns that are greener indicate a better device identification, while redder columns suggest difficulties. The performance consistency is evaluated by observing the uniformity or variability in the F1-score within columns.

The data and heat map reveal notable differences in device identification depending on the meter and appliance type. The TV generally shows a high F1-score, especially with the oZm meters, indicating recognizable consumption patterns. However, some OMPM measurements show lower scores, possibly because of its lower sampling frequency or specific operating modes.

The oZm v3 shows a low F1-score for incandescent lamps, while the OMPM performs well in some cases; the simple on/off pattern of lamps may make them harder to differentiate. The vacuum cleaner performs well with the oZm v1, achieving a high F1-score, but the oZm v3 shows lower scores. Fan identification is generally good, although the OMPM yields a lower F1-score than the oZm v1. The freezer performance is variable, with better results from the oZm v1. These findings emphasize the importance of selecting appropriate meters and algorithms.

Although the heat map gives a complete overview of all the measurements, combining the measurements of each appliance by meter/dataset, as shown in Figure 10, makes the comparison clearer.

The oZm v1 shows the best overall performance in the F1-score, excelling in the detection of the vacuum cleaner (0.82), freezer (0.58), TV (0.71), and fan (0.67). The oZm v3, with more variables and a generally lower performance, has its best result on the TV (0.61), but reduced values on other devices, possibly due to the larger amount of equipment in its dataset. The OMPM shows a mixed performance, standing out in the incandescent lamp (0.68) and with moderate values in the fan (0.59), but with low scores in the television (0.35).

4.2.2. EAE

The heat map in Figure 11 visualizes the EAE of the algorithms for the five appliances measured with the three meters. As before, green columns indicate a better identification of energy consumption and red columns signal difficulties; the uniform green color across all columns reflects a consistent and excellent overall performance for all devices.

The television, incandescent lamp, and fan exhibit values very close to ideal, with the oZm v1 achieving a perfect performance in several cases. The vacuum cleaner shows more variability but maintains an excellent performance. The freezer, measured only with the oZm meters, presents optimal results. These findings highlight the effectiveness of these meters in accurately identifying appliance energy consumption, which is crucial for optimizing energy efficiency in various settings. Overall, the oZm v1 stands out for its superior performance across appliances. The oZm v3 and OMPM show small peaks for some appliances, although these are insignificant at the scale used. The results reflect a high precision in identifying and monitoring the household appliances studied, with values very close to the ideal of zero for the EAE metric.

4.2.3. MNEAP

This metric is analogous to the EAE but measures power rather than energy, and the error is normalized so that it is comparable across datasets and devices; ideally, it is as low as possible. All three datasets show broadly similar results. The oZm v3 has a mean value of 0.573 and a low variability (0.144), while the oZm v1 has a higher mean value (1.073) and variability (0.179). The OMPM shows the highest mean (1.245) and variability (1.276), although it is still relatively low. In the case of the lamp, the oZm v3 has a mean of 1.153 and a standard deviation of 0.409, while the OMPM has a mean of 0.563 and less variability (0.358). Figure 12 shows the MNEAP results as a heat map.
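The relationship between the two metrics can be written out for a single appliance. The sketch below follows common NILMTK-style definitions, treated here as assumptions: the EAE as the absolute difference between true and estimated energy (converted to kWh with a rectangle rule), and the MNEAP as the sum of absolute power errors divided by the total true power. The function names are illustrative.

```python
def error_in_assigned_energy_kwh(gt_w, pred_w, period_s):
    """|E_true - E_pred| in kWh for evenly spaced power samples (rectangle rule)."""
    joules = abs(sum(gt_w) - sum(pred_w)) * period_s
    return joules / 3.6e6   # 1 kWh = 3.6e6 J

def mean_normalized_error_power(gt_w, pred_w):
    """Sum of absolute power errors normalized by the total true power."""
    return sum(abs(g - p) for g, p in zip(gt_w, pred_w)) / sum(gt_w)

gt = [100.0] * 3600                          # 100 W for one hour, 1 s samples
pred_half = [100.0] * 1800 + [0.0] * 1800    # misses the second half-hour
pred_shift = [0.0] * 1800 + [200.0] * 1800   # same total energy, wrong timing
for pred in (pred_half, pred_shift):
    print(error_in_assigned_energy_kwh(gt, pred, 1), mean_normalized_error_power(gt, pred))
```

The second prediction gets the energy total exactly right (EAE = 0) while being wrong at every instant (MNEAP = 1.0), because errors of opposite sign cancel in the energy sum but not in the absolute power error. This may partly explain why near-zero EAE values coexist with MNEAP values around or above 1 in these results.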

The analysis of different devices shows that the MNEAP values vary slightly depending on the appliance and the meter. All three meters show similar results for the fan due to its low load. For the television, the data are comparable between the oZm meters, but the OMPM shows more oscillation due to its lower sampling rate. Even with the same meter, the number of appliances in the dataset slightly affects the results, highlighting the importance of the measurement context. Figure 13 presents an overview that combines all the measurements obtained from each appliance by meter/dataset.

The oZm v3 offers a better overall balance, excelling on the TV and fan, although with more significant errors on the vacuum cleaner and freezer. The oZm v1 performs well on the vacuum cleaner and fan but is less accurate on the TV and freezer. The OMPM excels with the incandescent lamp and fan, but it achieves the worst value on the TV.

4.2.4. RMSE

Regarding the RMSE, the oZm v3 has a mean of 22.855 W and a standard deviation of 4.799 W, indicating a moderate variability, whereas the oZm v1 shows a greater variability, with a mean of 30.566 W and a standard deviation of 7.003 W. The OMPM has the largest variability, with a mean of 38.896 W and a standard deviation of 82.267 W. Despite some outlier measurements, all three meters show similar results, and the OMPM offers the best value in some instances. In other cases, however, the oZm v3 performs poorly on the RMSE, with a mean of 297.678 W and a high variability, whereas the OMPM comes closer to the ideal (mean of 18.936 W and standard deviation of 7.312 W). Figure 14 shows the heat map.

In the lower part of the heat map, which corresponds to the measurements taken with the OMPM, the results are much higher than those obtained with the oZm meters (yellow tones). The heat map also shows that the data obtained with the oZm v3 are much worse than those obtained with the oZm v1 (tones tending to red). Although the heat map gives a complete overview of all the measurements, combining the measurements of each appliance by meter/dataset, as shown in Figure 15, makes the comparison clearer.

In summary, when analyzing the RMSE data, the performance of each meter varies significantly depending on the appliance. The oZm v3 seems more accurate for the TV and fan, but less accurate for the incandescent lamp and vacuum. The oZm v1 performs well for the freezer but has higher errors in other devices. The OMPM stands out for its low RMSE in the incandescent lamp but performs less impressively on the TV and fan.

4.2.5. The Summary of the Main Findings

To compare the three types of meters, Figure 16 shows how each metric is affected by the type of meter and by the number of appliances in each dataset, for each appliance.

The results indicate that the oZm v1 has the best F1-score, excelling in detecting the vacuum cleaner (0.82) and the freezer (0.58). The OMPM also performs well for the incandescent lamp (0.67) and fan (0.59). In contrast, the oZm v3 shows lower F1-scores, especially for the vacuum cleaner (0.19) and freezer (0.29), probably because it worked with a more complex dataset that included twice as many devices as the other meters. In terms of the EAE, all three meters show an excellent accuracy. The oZm v1 and OMPM have extremely low, near-zero errors on most devices. Although the oZm v3 worked with more devices, it maintains excellent values, with an error of 0.044 for the vacuum cleaner and 0.016 for the lamp, indicating that its estimates were quite accurate. This result shows that the meter performs well even in a more complex environment.

Regarding the MNEAP, the values of the oZm v3 for the vacuum cleaner (2.34) and freezer (2.35) are higher than those of the other meters, but not excessively so. These values can be explained by the difficulty of estimating several devices simultaneously. The OMPM and oZm v1 have lower errors on this indicator, although it is essential to note that they worked with fewer devices, which facilitates a better relative accuracy. Finally, the oZm v3 shows the highest RMSE values, especially for the lamp (329.98 W) and the vacuum cleaner (321.83 W), indicating a greater variability in predicting these devices. This increased variability is expected given the larger number of devices in the dataset. If the objective is to disaggregate a larger number of devices, the oZm v3 provides the most realistic scenario. Conversely, if minimizing errors in a simpler environment is prioritized, the oZm v1 and OMPM may be more suitable options.

5. Conclusions

Traditional algorithms (Mean, CO, Hart85, and FHMM) offer computational efficiency and speed, making them suitable for real-time applications or resource-constrained environments; however, their accuracy decreases with increasingly complex consumption patterns. In contrast, deep learning models (DAE, RNN, Seq2Point, Seq2Seq, and WindowGRU) significantly improve the accuracy and adaptability to heterogeneous loads. Still, they entail high computational costs and prolonged execution times, which limit their applicability in latency-sensitive scenarios.

The accuracy of NILM models is influenced by critical factors: higher sampling rates enable the more precise capture of transient events; the inclusion of harmonics enhances the discrimination in classical models and does not negatively impact neural network-based models; power filters can marginally benefit traditional algorithms but often degrade the event detection in low loads, whereas deep models maintain robustness without filtering; and the appliance type determines the method suitability, with stable loads favoring classical algorithms and cyclic or complex loads requiring deep learning, subject to the computational capacity.

In the evaluation of open hardware, the oZm v1 demonstrated the best overall F1-score performance, particularly for specific devices. In contrast, the oZm v3 showed a lower accuracy, attributed to the increased complexity of the dataset. The OMPM yielded intermediate results, with a high precision in energy assignment metrics (EAE). However, an increased number of devices negatively impacted metrics such as the MNEAP and RMSE, highlighting the challenge of simultaneous disaggregation in complex scenarios.

In conclusion, the effectiveness of NILM depends on striking a balance between accuracy, the computational capacity, and the characteristics of the dataset and hardware. The design and study of new datasets, alongside the integration of open hardware and the consideration of harmonics, are essential to advancing the NILM accuracy and applicability, particularly in smart home and industrial energy monitoring contexts.

6. Limitations and Future Directions

NILM has made significant progress but still faces key limitations. There is no universally optimal algorithm, as performance depends on the type of device and signal conditions. Deep learning models, while accurate, require high computational costs that hinder their use in real-time applications or resource-constrained systems. Additionally, the lack of standardization and heterogeneity in datasets complicates comparisons and reproducibility, while the complexity of consumption patterns limits the robustness of algorithms.

Regarding hardware, current devices are often expensive and proprietary, with variability in precision and constraints for edge computing implementations. However, open hardware presents opportunities for improvement, especially in the power supply and reliability of measurement modules.

To advance the field, it is necessary to develop algorithms that integrate deep learning and edge computing to enhance efficiency and adaptability. Expanding and validating more representative datasets, optimizing open hardware design for real-time measurements, and deepening the analysis of high-frequency and transient data are also essential. Ultimately, expanding practical applications in smart homes, industry, and smart grids will enhance NILM’s impact.

Author Contributions

Conceptualization, F.G.M. and A.A.; methodology, C.R.-N., F.P., and A.A.; software, C.R.-N. and A.A.; formal analysis, C.R.-N.; investigation, C.R.-N. and F.P.; resources, F.G.M. and A.A.; data curation, C.R.-N.; writing—original draft preparation, C.R.-N. and F.P.; writing—review and editing, C.R.-N., F.P., F.G.M., and A.A.; visualization, C.R.-N. and F.P.; supervision, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CO	Combinatorial Optimization
DAE	Denoising Autoencoder
DSUALM	Dataset of the University of Almeria
DSUALMH	Dataset of the University of Almeria with harmonics
DSUALM10	Dataset of the University of Almeria 10 appliances
DSUALM10H	Dataset of the University of Almeria 10 appliances with harmonics
FHMM	Factorial Hidden Markov Model
MAE	Mean Absolute Error
NDE	Normalized Disaggregation Error
NILM	Non-Intrusive Load Monitoring
OMPM	Open Multi Power Meter
oZm	OpenZmeter
RMSE	Root Mean Squared Error
RNN	Recurrent Neural Network
UAL	University of Almeria
UALM2	University of Almeria Dataset 2
WindowGRU	Windowed Gated Recurrent Unit

Appendix A

Figure A1. The code of the init file.

Figure A2. The code for creating a dataset.

Figure A3. The code for creating a model.

Figure A4. The code to obtain the metrics.

References

1. Francou, J.; Calogine, D.; Chau, O.; David, M.; Lauret, P. Expanding Variety of Non-Intrusive Load Monitoring Training Data: Introducing and Benchmarking a Novel Data Augmentation Technique. Sustain. Energy Grids Netw. 2023, 35, 101142.
2. Aydin, E.; Brounen, D.; Kok, N. Information Provision and Energy Consumption: Evidence from a Field Experiment. Energy Econ. 2018, 71, 403–410.
3. Hart, G.W. Nonintrusive Appliance Load Monitoring. Proc. IEEE 1992, 80, 1870–1891.
4. Kelly, J.; Knottenbelt, W. Neural NILM: Deep Neural Networks Applied to Energy Disaggregation. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments (BuildSys 2015), Seoul, Republic of Korea, 4 November 2015; Association for Computing Machinery, Inc.: New York, NY, USA; pp. 55–64.
5. Holweger, J.; Dorokhova, M.; Bloch, L.; Ballif, C.; Wyrsch, N. Unsupervised Algorithm for Disaggregating Low-Sampling-Rate Electricity Consumption of Households. Sustain. Energy Grids Netw. 2019, 19, 100244.
6. Shin, C.; Rho, S.; Lee, H.; Rhee, W. Data Requirements for Applying Machine Learning to Energy Disaggregation. Energies 2019, 12, 1696.
7. Çavdar, I.H.; Faryad, V. New Design of a Supervised Energy Disaggregation Model Based on the Deep Neural Network for a Smart Grid. Energies 2019, 12, 1217.
8. Shabbir, N.; Vassiljeva, K.; Nourollahi Hokmabad, H.; Husev, O.; Petlenkov, E.; Belikov, J. Comparative Analysis of Machine Learning Techniques for Non-Intrusive Load Monitoring. Electronics 2024, 13, 1420.
9. Irani Azad, M.; Rajabi, R.; Estebsari, A. Nonintrusive Load Monitoring (NILM) Using a Deep Learning Model with a Transformer-Based Attention Mechanism and Temporal Pooling. Electronics 2024, 13, 407.
10. Zeng, W.; Han, Z.; Xie, Y.; Liang, R.; Bao, Y. Non-Intrusive Load Monitoring through Coupling Sequence Matrix Reconstruction and Cross Stage Partial Network. Measurement 2023, 220, 113358.
11. Athanasiadis, C.; Doukas, D.; Papadopoulos, T.; Chrysopoulos, A. A Scalable Real-Time Non-Intrusive Load Monitoring System for the Estimation of Household Appliance Power Consumption. Energies 2021, 14, 767.
12. Yang, C.C.; Soh, C.S.; Yap, V.V. A Non-Intrusive Appliance Load Monitoring for Efficient Energy Consumption Based on Naive Bayes Classifier. Sustain. Comput. Inform. Syst. 2017, 14, 34–42.
13. Yan, L.; Tian, W.; Wang, H.; Hao, X.; Li, Z. Robust Event Detection for Residential Load Disaggregation. Appl. Energy 2023, 331, 120339.
14. Rafiq, H.; Manandhar, P.; Rodriguez-Ubinas, E.; Ahmed Qureshi, O.; Palpanas, T. A Review of Current Methods and Challenges of Advanced Deep Learning-Based Non-Intrusive Load Monitoring (NILM) in Residential Context. Energy Build. 2024, 305, 113890.
15. Souza, J. Non-Intrusive Load Monitoring (NILM): An Introduction and Practical Example. Available online: https://www.linkedin.com/pulse/non-intrusive-load-monitoring-nilm-introduction-practical-souza-rpy4f/ (accessed on 11 March 2025).
16. Parson, O.; Fisher, G.; Hersey, A.; Batra, N.; Kelly, J.; Singh, A.; Knottenbelt, W.; Rogers, A. Dataport and NILMTK: A Building Data Set Designed for Non-Intrusive Load Monitoring. In Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Orlando, FL, USA, 14–16 December 2015; IEEE: Piscataway, NJ, USA.
17. Liu, Q.; Lu, M.; Liu, X.; Linge, N. Non-Intrusive Load Monitoring and Its Challenges in a NILM System Framework. Int. J. High Perform. Comput. Netw. 2019, 14, 102–111.
18. Vavouris, A.; Garside, B.; Stankovic, L.; Stankovic, V. Low-Frequency Non-Intrusive Load Monitoring of Electric Vehicles in Houses with Solar Generation: Generalisability and Transferability. Energies 2022, 15, 2200.
19. Sykiotis, S.; Kaselimi, M.; Doulamis, A.; Doulamis, N. Electricity: An Efficient Transformer for Non-Intrusive Load Monitoring. Sensors 2022, 22, 2926.
20. Lakshmi Ravi, D. Non-Intrusive Load Monitoring (NILM): A Comprehensive Review. Int. J. Innov. Res. Technol. 2024, 11, 853–858.
21. Batra, N.; Kelly, J.; Parson, O.; Dutta, H.; Knottenbelt, W. NILMTK: An Open Source Toolkit for Non-Intrusive Load Monitoring. In Proceedings of the 5th International Conference on Future Energy Systems, Cambridge, UK, 11–13 June 2014.
22. Makonin, S.; Popowich, F. Nonintrusive Load Monitoring (NILM) Performance Evaluation. Energy Effic. 2015, 8, 809–814.
23. Kukunuri, R.; Batra, N.; Pandey, A.; Malakar, R.; Kumar, R.; Krystalakos, O.; Zhong, M.; Meira, P.; Parson, O. NILMTK-Contrib: Towards Reproducible State-of-the-Art Energy Disaggregation. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019.
24. Kolter, J.Z.; Johnson, M.J. REDD: A Public Data Set for Energy Disaggregation Research. In Proceedings of the 1st KDD Workshop on Data Mining Applications in Sustainability, San Diego, CA, USA, 21 August 2011.
25. Kelly, J.; Knottenbelt, W. The UK-DALE Dataset, Domestic Appliance-Level Electricity Demand and Whole-House Demand from Five UK Homes. Sci. Data 2015, 2, 150007.
26. Beckel, C.; Kleiminger, W.; Cicchetti, R.; Staake, T.; Santini, S. The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014), Memphis, TN, USA, 3 November 2014; Association for Computing Machinery: New York, NY, USA; pp. 80–89.
27. Pecan Street Inc. Pecan Street Dataport. Available online: https://www.pecanstreet.org/dataport/ (accessed on 17 June 2025).
28. Berrettoni, G.; Ferrigno, L.; Marignetti, F. Resilience of the Italian Electricity Grid: Objectives and Future Challenges; University of Cassino: Cassino, Italy, 2024.
29. Bonfigli, R.; Principi, E.; Fagiani, M.; Severini, M.; Squartini, S.; Piazza, F. Non-Intrusive Load Monitoring by Using Active and Reactive Power in Additive Factorial Hidden Markov Models. Appl. Energy 2017, 208, 1590–1607.
30. Gawin, B.; Małkowski, R.; Rink, R. Will NILM Technology Replace Multi-Meter Telemetry Systems for Monitoring Electricity Consumption? Energies 2023, 16, 2275.
31. Rodríguez-Navarro, C.; Portillo, F.; Martínez, F.; Manzano-Agugliaro, F.; Alcayde, A. Development and Application of an Open Power Meter Suitable for NILM. Inventions 2023, 9, 2.
32. Rodriguez-Navarro, C. Datasets of Almeria University. Available online: https://zenodo.org/records/13739113 (accessed on 27 February 2025).
33. Rodriguez-Navarro, C. DSUALMH. Available online: https://zenodo.org/records/13739563 (accessed on 17 June 2025).
34. Viciana, E.; Alcayde, A.; Montoya, F.; Baños, R.; Arrabal-Campos, F.; Zapata-Sierra, A.; Manzano-Agugliaro, F. OpenZmeter: An Efficient Low-Cost Energy Smart Meter and Power Quality Analyzer. Sustainability 2018, 10, 4038.
35. Rodriguez-Navarro, C. OMPM. Available online: https://zenodo.org/records/13740059 (accessed on 17 June 2025).
36. Rodriguez-Navarro, C. DSUALM10H. Available online: https://zenodo.org/records/13740027 (accessed on 17 June 2025).
37. Rodriguez-Navarro, C. Performance Analysis of Nine Disaggregation Algorithms on the DSUALM10 Dataset: A Notebook Study. Available online: https://zenodo.org/records/15575404 (accessed on 17 June 2025).

Figure 1. History and evolution of NILM (Source: own elaboration).

Figure 2. Pipeline of NILMTK (Source: own elaboration).

Figure 3. The block diagram of the oZm v1 (Source: own elaboration).

Figure 4. The front view of the oZm v3.

Figure 5. OMPM electrical schematic (Source: own elaboration).

Figure 6. An overview of the diagram to obtain the DSUALM and DSUALMH (Source: own elaboration).

Figure 7. Schema of appliances used in UALM2 (Source: own elaboration).

Figure 8. Schema of appliances used in DSUALM10H (Source: own elaboration).

Figure 9. The heat map for the F1-score.

Figure 10. Mean F1-score by appliance and meter.

Figure 11. The heat map for the EAE metric.

Figure 12. The heat map for the MNEAP.

Figure 13. Mean MNEAP by appliance and meter.

Figure 14. The heat map for the RMSE.

Figure 15. Mean RMSE by appliance and meter.

Figure 16. Metrics by meter and appliance. (a) F1-score. (b) EAE. (c) MNEAP. (d) RMSE.

Table 1. Key features of NILM methods.

Methods	Advantages	Disadvantages	Practical Applications
Traditional models (HMM, FHMM, and decision trees)	Interpretable; requires less training data	Lower accuracy in scenarios with high consumption variability	Used in early NILM implementations, such as energy monitoring systems in research projects like REDD and UK-DALE datasets
Neural networks (CNN, RNN, and deep learning)	High accuracy, can capture complex patterns	Require large amounts of data and high computational power	Applied in smart home energy management platforms like Google’s Nest and Sense Energy Monitor
Advanced techniques (CSPNet and Coupled Sequence Matrix Reconstruction)	Improved performance by leveraging temporal variations	Still in experimental stages; lack of standardized metrics	Being tested in advanced NILM research for industrial and smart grid applications
Edge computing-based NILM	Lower latency, enhanced privacy, and reduced bandwidth usage	Hardware limitations and energy consumption constraints	Used in IoT-enabled smart meters to provide real-time feedback in smart homes and microgrid management

Table 2. Metrics of the execution of nine algorithms of disaggregation.

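Algorithm	MAE (W), no harmonics	MAE (W), harmonics	RMSE (W), no harmonics	RMSE (W), harmonics	F1-Score, no harmonics	F1-Score, harmonics	NDE, no harmonics	NDE, harmonics	Runtime (s), no harmonics	Runtime (s), harmonics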
CO	457.39	455.3	511.08	510.48	0.57	0.56	108.38	107.95	1.22	1.33
Mean	387.05	241.29	389.42	242.32	0.56	0.56	108.11	54.17	0.49	0.76
Hart85	238.64	238.63	405.68	405.68	0.45	0.45	41.39	41.39	1.11	1.25
FHMM	14.87	13.99	25.09	25.92	0.56	0.57	2.61	2.64	0.93	1.03
DAE	4.09 × 10¹²	3.1 × 10¹²	1.43 × 10¹³	1.08 × 10¹³	0.20	0.29	2.52 × 10¹⁰	1.93 × 10¹⁰	27.07	21.28
RNN	368.1	368.1	468.33	468.33	0.32	0.32	65.15	65.15	682.77	682.77
Seq2Point	4.31 × 10¹²	4.54 × 10¹²	1.58 × 10¹³	1.67 × 10¹³	0.03	0.28	3.00 × 10¹¹	1.75 × 10¹¹	208.38	237.83
Seq2Seq	3.26 × 10¹²	3.17 × 10¹²	1.61 × 10¹³	1.30 × 10¹³	0.19	0.17	8.43 × 10¹⁰	9.10 × 10¹⁰	34.67	30.01
WindowGRU	3.09 × 10¹²	3.12 × 10¹²	2.46 × 10¹³	2.34 × 10¹³	0.30	0.33	7.34 × 10¹²	5.23 × 10¹²	1321.34	970.98

Table 3. UALM2 metrics with eight algorithms, one-second sampling, and no filters.

Algorithm	MAE (W)	RMSE (W)	F1-Score	NDE	Runtime (s)
CO	9.49	12.26	0.7	0.43	1.49
Mean	13.19	12.88	0.17	0.48	0.58
FHMM	12.71	15.53	0.66	0.5	1.7
DAE	4.01 × 10¹²	2.10 × 10¹³	0.2	3.54 × 10¹⁰	27.07
RNN	13.36	14.53	0.17	0.51	2723.48
Seq2Point	11.6	12.65	0.17	0.56	841.65
Seq2Seq	6.04 × 10¹¹	1.52 × 10¹³	0.2	9.06 × 10¹⁰	34.67
WindowGRU	8.73	10.55	0.51	0.43	3869.17
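CO and Mean, which appear in every table, are the simplest baselines. The sketch below illustrates the idea behind each: Mean predicts a constant value equal to an appliance's mean power from training data, while combinatorial optimization (CO) chooses, at each sample, the combination of appliance states whose summed power is closest to the aggregate reading. The appliance names and state powers are hypothetical, and this is not the NILMTK implementation, which learns the states from training data.

```python
"""Illustrative sketch of the Mean and CO (combinatorial optimization) baselines.

Hypothetical appliance state powers; a real CO model learns them from data.
"""
from itertools import product

# Hypothetical steady-state power levels (W) per appliance; 0 W means OFF.
STATES = {
    "fan": [0.0, 45.0],
    "lamp": [0.0, 60.0],
    "tv": [0.0, 90.0, 120.0],
}


def mean_baseline(train_mean_w: float, n_samples: int) -> list[float]:
    """Mean baseline: predict the training-set mean power at every sample."""
    return [train_mean_w] * n_samples


def co_disaggregate(aggregate_w: list[float]) -> dict[str, list[float]]:
    """At each sample, pick the state combination whose total power is
    closest (in absolute value) to the measured aggregate power."""
    names = list(STATES)
    # Every possible joint state of all appliances.
    combos = list(product(*(STATES[n] for n in names)))
    out: dict[str, list[float]] = {n: [] for n in names}
    for total in aggregate_w:
        best = min(combos, key=lambda c: abs(total - sum(c)))
        for name, power in zip(names, best):
            out[name].append(power)
    return out


# Usage: disaggregate four aggregate readings (W)
print(co_disaggregate([0.0, 50.0, 150.0, 215.0]))
print(mean_baseline(52.5, 3))
```

Because the number of joint states is the product of the per-appliance state counts, the CO search grows exponentially with the number of appliances.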

Table 4. Metrics of DSUALM10 and DSUALM10H with nine algorithms, one-second sampling, and no filters.

Algorithm	Harmonics	NDE	F1-Score	RMSE (W)	MAE (W)	Runtime (s)
CO	No	1.853	0.463	376.207	208.329	3.25
CO	Yes	1.859	0.445	383.763	216.945	2.61
Mean	No	0.827	0.499	295.45	241.034	1.31
Mean	Yes	0.827	0.499	295.45	241.034	1.22
Hart85	No	0.92	0.114	282.7	144.046	8.54
FHMM	No	1.101	0.419	405.841	224.72	66.31
FHMM	Yes	0.984	0.39	322.955	158.119	61.76
DAE	No	0.743	0.529	242.217	169.454	310.57
DAE	Yes	0.736	0.522	234.881	154.166	138.07
RNN	No	0.652	0.593	191.057	112.596	5456.35
RNN	Yes	0.619	0.558	180.981	111.258	1880.32
Seq2Point	No	0.633	0.555	189.776	111.629	1431.18
Seq2Point	Yes	0.629	0.528	203.484	132.55	754.77
Seq2Seq	No	0.619	0.526	188.774	121.387	524.77
Seq2Seq	Yes	0.689	0.495	225.348	142.535	8613.56
WindowGRU	No	0.714	0.5	232.156	150.051	6982.46
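The effect of harmonics in Table 4 is easier to read as a relative change. The sketch below copies the values for every algorithm that has both a no-harmonics and a with-harmonics row (Hart85 and WindowGRU are omitted because they have only one) and computes the percentage change per metric. The dictionary layout and function names are illustrative choices, not part of the original study.

```python
"""Percentage change in Table 4 metrics when harmonics are added.

Negative values mean the metric decreased with harmonics: an improvement for
NDE, RMSE, and MAE, but a degradation for the F1-score.
"""
METRICS = ("NDE", "F1", "RMSE", "MAE")

# algorithm: ((no harmonics), (with harmonics)), each in METRICS order
TABLE4 = {
    "CO": ((1.853, 0.463, 376.207, 208.329), (1.859, 0.445, 383.763, 216.945)),
    "Mean": ((0.827, 0.499, 295.45, 241.034), (0.827, 0.499, 295.45, 241.034)),
    "FHMM": ((1.101, 0.419, 405.841, 224.72), (0.984, 0.39, 322.955, 158.119)),
    "DAE": ((0.743, 0.529, 242.217, 169.454), (0.736, 0.522, 234.881, 154.166)),
    "RNN": ((0.652, 0.593, 191.057, 112.596), (0.619, 0.558, 180.981, 111.258)),
    "Seq2Point": ((0.633, 0.555, 189.776, 111.629), (0.629, 0.528, 203.484, 132.55)),
    "Seq2Seq": ((0.619, 0.526, 188.774, 121.387), (0.689, 0.495, 225.348, 142.535)),
}


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


def relative_changes(table):
    """Map each algorithm to {metric: % change when harmonics are added}."""
    return {
        algo: {m: pct_change(b, a) for m, b, a in zip(METRICS, no_h, with_h)}
        for algo, (no_h, with_h) in table.items()
    }


for algo, changes in relative_changes(TABLE4).items():
    print(algo, {m: round(v, 1) for m, v in changes.items()})
```

For FHMM, for example, RMSE falls by about 20.4% and MAE by about 29.6% when harmonics are included, while its F1-score falls by about 6.9%.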

Table 5. Mean and standard deviation (SD) of the F1-score for each meter and appliance.

Dataset (meter)	Statistic	TV	Lamp	Vacuum cleaner	Fan	Freezer
DSUALM10H (oZm v3)	Mean	0.609	0.289	0.182	0.453	0.287
DSUALM10H (oZm v3)	SD	0.248	0.202	0.109	0.232	0.165
DSUALMH (oZm v1)	Mean	0.707	No data	0.825	0.664	0.578
DSUALMH (oZm v1)	SD	0.154	No data	0.125	0.128	0.126
UALM2 (OMPM)	Mean	0.350	0.675	No data	0.589	No data
UALM2 (OMPM)	SD	0.275	0.254	No data	0.254	No data
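Table 5 has gaps where a meter recorded no data for an appliance. The sketch below copies the table, represents "No data" as `None`, and selects, for each appliance, the meter with the highest mean F1-score. The data structure and the function name `best_meter_per_appliance` are hypothetical, chosen only for illustration.

```python
"""Select, per appliance, the meter with the highest mean F1-score (Table 5).

"No data" cells are represented as None and skipped.
"""
APPLIANCES = ("TV", "Lamp", "Vacuum cleaner", "Fan", "Freezer")

# meter: (means, standard deviations), each in APPLIANCES order
TABLE5 = {
    "DSUALM10H (oZm v3)": ((0.609, 0.289, 0.182, 0.453, 0.287),
                           (0.248, 0.202, 0.109, 0.232, 0.165)),
    "DSUALMH (oZm v1)": ((0.707, None, 0.825, 0.664, 0.578),
                         (0.154, None, 0.125, 0.128, 0.126)),
    "UALM2 (OMPM)": ((0.350, 0.675, None, 0.589, None),
                     (0.275, 0.254, None, 0.254, None)),
}


def best_meter_per_appliance(table):
    """Return {appliance: (meter, mean F1, SD)} for the best mean F1."""
    best = {}
    for i, appliance in enumerate(APPLIANCES):
        # Keep only meters that actually measured this appliance.
        candidates = [(means[i], sds[i], meter)
                      for meter, (means, sds) in table.items()
                      if means[i] is not None]
        mean, sd, meter = max(candidates)  # highest mean F1 wins
        best[appliance] = (meter, mean, sd)
    return best


for appliance, (meter, mean, sd) in best_meter_per_appliance(TABLE5).items():
    print(f"{appliance}: {meter} (mean={mean}, SD={sd})")
```

With the values in Table 5, the oZm v1 meter (DSUALMH) gives the highest mean F1-score for the TV, vacuum cleaner, fan, and freezer, and the OMPM meter (UALM2) does so for the lamp.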

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Rodriguez-Navarro, C.; Portillo, F.; Montoya, F.G.; Alcayde, A. The Design, Creation, Implementation, and Study of a New Dataset Suitable for Non-Intrusive Load Monitoring. Appl. Sci. 2025, 15, 7200. https://doi.org/10.3390/app15137200

