MORED: A Moroccan Buildings’ Electricity Consumption Dataset

This paper consists of two parts: an overview of existing open datasets of electricity consumption, and a description of the Moroccan Buildings’ Electricity Consumption Dataset, the first of its kind, coined MORED. The new dataset comprises electricity consumption data of various Moroccan premises. Unlike existing datasets, MORED provides three main data components: whole premises (WP) electricity consumption, individual load (IL) ground-truth consumption, and fully labeled IL signatures, from affluent and disadvantaged neighborhoods. The WP consumption data were acquired at low rates (1/5 or 1/10 samples/s) from 12 households; the IL ground-truth data were acquired at similar rates from five households over extended durations; and the IL signature data were acquired at high and low rates (50k and 4 samples/s) from 37 different residential and industrial loads. In addition, the dataset includes non-intrusive load monitoring (NILM) metadata.


Introduction
Public datasets are published to help advance research fields by sparing researchers the laborious tasks of data acquisition and data management, and by providing them with valuable data. One research field where such datasets are particularly useful is energy disaggregation, in which researchers have been collecting and publishing electricity consumption datasets for almost a decade. In this paper, we focus on this kind of dataset.
Energy disaggregation is the identification of the energy consumption of individual loads from aggregated energy consumption records. It is considered a key element of low-cost, simple energy monitoring in buildings, which are experiencing a growing and highly variable demand for energy [1]. Energy disaggregation opens up new opportunities and applications in the energy management of microgrids [2], smart homes [3], and smart cities [4][5][6].
The lack of energy consumption datasets has long hindered progress in the field, and the last decade has seen a collective effort to alleviate this issue. Since the publication of REDD [7] in 2011, several datasets, mainly intended for energy disaggregation, have been published every year. These datasets can be classified into event-based (EB) and event-less (EL) datasets. The main difference between the two is that EB datasets also provide events: additional information specifying the state changes of equipment that occur in the consumption measurements. In general, such datasets provide different kinds of consumption data: whole premises (WP) aggregate consumption, individual circuit (IC) consumption, and individual load (IL) consumption. It is important to note that:
• WP measurements reflect the global consumption of the premises. They are taken at the main switchboard.
• IL measurements reflect the consumption of individual loads. There are two types:
- Signatures (or traces), which are unique consumption patterns of loads [8], monitored individually for short periods of time, independently of other measurements (as in HFED [9] or Tracebase [10]).
- Ground-truth consumption data (also called plug-level data), which are the consumption of loads measured at individual plugs of the monitored premises over longer durations, often in parallel with the acquisition of the WP consumption (as in BLUED [11], Dataport [12], or BLOND [13]).
• IC measurements reflect circuit-level energy measurements from the premises' electrical mains. IC consumption data can describe the consumption of a combination of loads or of individual loads, in which case IC measurements become similar to IL measurements.
Such datasets reflect the energy consumption of residential or commercial premises and have therefore proven useful in energy management applications, including load disaggregation and forecasting, energy efficiency, and photovoltaic system sizing. Their public release has also highlighted several issues regarding the collection and use of datasets [14]. Apart from the obvious differences in the granularity and formats in which data are provided, building datasets raises challenges in the acquisition phase (e.g., the difficulty of acquiring IL data from all plugs in a commercial building, or of accessing the plugs of some loads) and in the event labeling process (e.g., the difficulty of logging all events as they happen). These challenges result in short monitoring durations of WP consumption with events, or even in missing or mismatched data between the consumption and the event list. In addition, the lack of a general consensus on suitable metrics for assessing results obtained with these datasets may hinder progress toward higher disaggregation performance [15].
Nevertheless, interest in the field has grown. For instance, Batra et al. [16] and later [17] provided the Non-Intrusive Load Monitoring Toolkit (NILMTK) (v0.1, then v0.2), a Python-based open-source toolkit that, in addition to supporting multiple open datasets, facilitates their processing (e.g., parsing, processing, and diagnosing datasets) and the assessment of disaggregation algorithms. Further, Kelly and Knottenbelt [18] provided NILM Metadata, a hierarchical metadata schema specific to energy disaggregation, with controlled vocabularies to represent appliances, meters, buildings, and datasets. More recently, Pereira [19] proposed a common data model with three main data entities deemed essential in any dataset for energy disaggregation: (i) consumption data, embracing raw waveforms (i.e., current and voltage) and processed waveforms (i.e., power metrics); (ii) ground-truth data, which can report appliance activities (i.e., power events), user activities (e.g., doing laundry), and the consumption of individual appliances (or individual circuits); (iii) data annotations, which can be either metadata or comments. The author also proposed using the Waveform Audio File Format, which offers several features beneficial for creating and manipulating such datasets (e.g., waveform data and annotations stored in a single compact file, favoring coherent and structured datasets, and optimized files with minimal overhead, favoring faster dataset manipulation).
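To make the idea of storing waveforms in the Waveform Audio File Format concrete, the sketch below packs a voltage and a current waveform into a two-channel, 16-bit PCM WAV file with Python's standard `wave` module and reads them back. It is only a rough illustration under stated assumptions: the full-scale values `V_SCALE` and `I_SCALE`, the 50k[S/s] rate, and the file name are hypothetical, and the sketch does not implement the annotation storage or any other specific feature of the format proposed in [19].

```python
import math
import os
import struct
import tempfile
import wave

FS = 50_000        # sampling rate in samples/s (assumed, matching a 50k[S/s] raw rate)
V_SCALE = 400.0    # hypothetical full-scale voltage (V) mapped to the int16 range
I_SCALE = 50.0     # hypothetical full-scale current (A) mapped to the int16 range


def to_int16(x: float, full_scale: float) -> int:
    """Quantize a physical value to a signed 16-bit sample, clipping at full scale."""
    q = round(x / full_scale * 32767)
    return max(-32768, min(32767, q))


def write_waveforms(path: str, v: list[float], i: list[float]) -> None:
    """Store voltage (channel 0) and current (channel 1) as interleaved 16-bit PCM."""
    frames = bytearray()
    for vs, cs in zip(v, i):
        frames += struct.pack("<hh", to_int16(vs, V_SCALE), to_int16(cs, I_SCALE))
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(FS)
        w.writeframes(bytes(frames))


def read_waveforms(path: str) -> tuple[list[float], list[float]]:
    """Read both channels back and undo the scaling."""
    with wave.open(path, "rb") as w:
        n = w.getnframes()
        raw = w.readframes(n)
    samples = struct.unpack(f"<{2 * n}h", raw)
    v = [s / 32767 * V_SCALE for s in samples[0::2]]
    i = [s / 32767 * I_SCALE for s in samples[1::2]]
    return v, i


if __name__ == "__main__":
    # One 50 Hz cycle: 220 V RMS voltage and 5 A RMS current lagging by 30 degrees.
    n = FS // 50
    v = [220 * math.sqrt(2) * math.sin(2 * math.pi * 50 * k / FS) for k in range(n)]
    i = [5 * math.sqrt(2) * math.sin(2 * math.pi * 50 * k / FS - math.pi / 6) for k in range(n)]
    path = os.path.join(tempfile.gettempdir(), "waveforms_example.wav")
    write_waveforms(path, v, i)
    v2, i2 = read_waveforms(path)
    print(len(v2), max(abs(a - b) for a, b in zip(v, v2)))
```

The round-trip error is bounded by the 16-bit quantization step, which is why the choice of full-scale values matters: a larger full scale avoids clipping but coarsens the resolution.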
The contribution of this paper is twofold: (i) it provides an overview of prior open datasets of electricity consumption, and (ii) it introduces a new electricity consumption dataset, coined MORED, together with the methodology used for its acquisition (see Figure 1). Although not limited to these fields, MORED is intended for energy-related research such as load disaggregation and energy forecasting. The novelty of this dataset resides in several aspects.
This paper is organized in two parts: Section 2 presents a detailed survey of datasets for energy disaggregation published up to August 2019, highlighting their categories, features, and advantages. Section 3 introduces the proposed dataset, MORED.

Previous Work
To the best of our knowledge, there were 28 open datasets for energy disaggregation as of August 2019. Table 1 presents the nomenclature of the electrical quantities used in the remainder of the paper, while Table 2 outlines all known datasets. We begin with a summary of EL datasets, followed by one of EB datasets.

EL Datasets
In EL datasets, researchers provide consumption measurements without information about the loads' state transitions. A quick glance at Table 2 shows that most datasets belong to this category, because the associated measurements are easier and less time-consuming to acquire [14]. The following is an exhaustive description of the EL datasets:

• Reference Energy Disaggregation Dataset (REDD) [7]: It provides single-phase electric consumption data from several residential settings in the US.
• Almanac of Minutely Power dataset (AMPds) [23,24]: It provides a variety of data describing the energy consumption (i.e., electricity, water, and natural gas) of one Canadian household. It has a 2013 release (without environmental and utility billing data) and a 2016 release (AMPds2). The electrical data comprise i(t), v(t), P, Q, S, f, Pt, Qt, St, PF, and DPF, sampled at 1/60[S/s], at WP and IC levels (of specific appliances or household rooms), over a monitoring period of 365 days in the first release and 730 days in the second. All measurement files are provided in CSV and HDF5 (H5) formats. The official website of AMPds is: http://ampds.org.

• Indian Dataset for Ambient Water and Energy (iAWE) [25]: It provides a variety of data describing the ambient conditions and the water and electricity consumption of one Indian household. The electrical data comprise I, V, P, Q, S, f, and PF, extracted at 1[S/s] and monitored at WP, IC, and IL levels (nine distinct appliance classes) over a monitoring period of 73 days. Additionally, event labels were provided for a single day. All measurement files are provided in CSV format, and a preprocessed subset of the dataset is provided in H5 format. The official website of the iAWE dataset is: http://iawe.github.io.

• Database of appliance consumption signatures (ACS-F1 and ACS-F2) [26,27]: It provides IL electricity consumption data taken from hundreds of Swiss houses. There is a 2013 release (ACS-F1) and a 2014 release (ACS-F2). Both comprise I, V, P, Q, f, and φ, extracted at 1/10[S/s] over a period of one hour, for 10 appliance classes (100 appliances in total) in ACS-F1 and 15 appliance classes (225 appliances in total) in ACS-F2; ACS-F1 comprises two one-hour acquisition sessions. All measurement files are provided in Extensible Markup Language (XML) and MATLAB (MAT) formats. The official websites of the ACS-F1 and ACS-F2 datasets are: https://icosys.ch/acs-f1 and https://icosys.ch/acs-f2, respectively.
• Green Electrical ENergy Dataset (GREEND) [28]: It provides electric power measurements from nine Italian and Austrian households. The dataset comprises only P, extracted at 1[S/s] and monitored at WP and IL levels over a total period of almost 356 days. All measurement files are provided in CSV and H5 formats. The official website of GREEND is: http://www.andreatonello.com/greend-energy-metering-data-set.
• UK Domestic Appliance-Level Electricity (UK-DALE) dataset [29]: It provides electric power measurements from several households in the UK. Since its first release in 2015, it has had three further releases (a second 2015 release, a 2016 release, and a 2017 release).
• LILACD [42]: It provides consumption measurements of three-phase and single-phase industrial loads from Germany. It comprises IL measurements of i(t) and v(t) sampled at 50k[S/s], spanning from a couple of seconds to several minutes depending on the appliance type. A total of 15 distinct appliance classes were monitored. Acquisitions were conducted taking into account the loads' states, the timing patterns of their on/off cycles, and combinations of up to three loads, producing a total of 1302 measurements: 381 of single appliances, 864 of combinations of two appliances, and 56 of combinations of three appliances. All data were acquired with a 16-bit resolution. All measurement files are provided in the Technical Data Management Streaming (TDMS) format. The official website of LILACD is: https://www.in.tum.de/i13/resources/lilacd.

EB Datasets
As opposed to EL datasets, EB datasets contain energy consumption measurements coupled with lists of events that describe which state transitions occur for each load, and when, as the loads are used in the premises.
Only five datasets (BLUED, SustData-ED, BLOND, EMBED, and UMass Smart* Home Data Set) are EB, with EMBED being the latest and the largest, covering three houses over two to three weeks [44] (see Table 2).
The following is an exhaustive description of EB datasets:
• Building-Level fUlly labeled Electricity Disaggregation (BLUED) dataset [11]: A portion of BLUED is published, providing single-phase consumption readings of i(t) and v(t) sampled at 12k[S/s] at WP, IC, and IL levels from a single US household over a seven-day period. BLUED also contains the event list reporting the loads' power state changes. All measurement files are provided in TXT and MAT formats. The official website of the dataset is: http://portoalegre.andrew.cmu.edu:88/BLUED. The files can be downloaded after requesting their password by email.

- The 2013 release of the Smart* Home dataset contains environmental and single-phase electrical data from three households over a period of 90 days. First, the environmental data describe indoor and outdoor weather conditions, comprising averaged temperature, humidity, wind, and rainfall metrics sampled at a 1/60[S/s] rate. Second, the electrical data describe the energy consumed inside the households and the energy generated by solar panels and wind turbines near one of the households. The electrical data comprise P and S extracted at a 1[S/s] rate at WP level (from every household) and IC level, P extracted at an average rate of 1/2.5[S/s] at IL level, and v(t) and f sampled at a 1[S/s] rate at WP level. In addition, event data from a single household were provided, describing: (i) wall switch events (on, off, dim level in percent) at most of the wall lighting switches; (ii) thermostat events describing the state (on/off) of the home's heating and cooling systems, changes in temperature (in Fahrenheit), and changes in their set-point (in Fahrenheit); (iii) motion events (yes/no) describing occupancy in six rooms of one household, corresponding either to activity detected in a previously dormant room or to no activity detected for two minutes in a previously active room; (iv) door events (open/close) corresponding to the kitchen refrigerator, its freezer compartment, and the basement freezer. The generated-energy data of that household comprise Ī from the solar panels and turbines, and V from the attached battery, extracted at a 1/5[S/s] rate. All measurement files are provided in CSV format.

• Public Datasets for Sustainability and Electric Energy Research: Energy Disaggregation Research (SustData-ED) [43]: SustData-ED, an extension of the SustData dataset, provides single-phase electricity consumption data and room occupancy data of a Portuguese single-family household over 10 days. SustData-ED contains i(t) and v(t) sampled at 12 …

Datasets such as REDD and UK-DALE are frequently employed in residential disaggregation studies in the literature. This can be attributed to several factors, namely their being among the first open datasets and their rich data acquired over long durations. Nevertheless, newer datasets such as EMBED, PRECON, and MORED provide additional consumption data and can help alleviate the lack of data diversity faced by researchers and distribution companies alike when developing solutions.
As the summary above shows, a taxonomy based on the presence of event lists in the datasets is sufficient to determine which datasets to use for an event-based process. Other taxonomies offer finer distinctions: Figure 2 presents a taxonomy based on the types of data provided, while Figure 3 presents one based on the consumption data provided and their sampling rates. Such taxonomies can help discriminate between datasets, showcase their intrinsic features, and help researchers pinpoint the right dataset, or combination of datasets, to use. They can also help identify specific data scarcities or potential problems in the available open datasets.
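Taxonomies of this kind lend themselves to simple programmatic filtering. The sketch below encodes a few of the datasets summarized above by three attributes taken from the text (presence of an event list, measurement levels, and highest sampling rate) and selects candidates for a given task. The `Dataset` record, the 1k[S/s] boundary between high- and low-rate data, and the chosen subset of datasets are illustrative assumptions; this is not a reproduction of Table 2 or of the taxonomies in Figures 2 and 3.

```python
from dataclasses import dataclass

HIGH_RATE_THRESHOLD = 1_000.0  # samples/s; assumed boundary between high and low rates


@dataclass(frozen=True)
class Dataset:
    name: str
    has_events: bool          # True for EB datasets, False for EL datasets
    levels: frozenset[str]    # subset of {"WP", "IC", "IL"}
    max_rate: float           # highest sampling rate provided, in samples/s


# Attributes taken from the dataset summaries in the text (illustrative subset only).
DATASETS = [
    Dataset("AMPds", False, frozenset({"WP", "IC"}), 1 / 60),
    Dataset("iAWE", False, frozenset({"WP", "IC", "IL"}), 1.0),
    Dataset("ACS-F2", False, frozenset({"IL"}), 1 / 10),
    Dataset("GREEND", False, frozenset({"WP", "IL"}), 1.0),
    Dataset("LILACD", False, frozenset({"IL"}), 50_000.0),
    Dataset("BLUED", True, frozenset({"WP", "IC", "IL"}), 12_000.0),
    Dataset("MORED", True, frozenset({"WP", "IL"}), 50_000.0),
]


def select(datasets, *, need_events=None, need_levels=frozenset(), high_rate=None):
    """Return the names of datasets matching every criterion that is set."""
    out = []
    for d in datasets:
        if need_events is not None and d.has_events != need_events:
            continue
        if not need_levels <= d.levels:  # all requested levels must be present
            continue
        if high_rate is not None and (d.max_rate >= HIGH_RATE_THRESHOLD) != high_rate:
            continue
        out.append(d.name)
    return out


if __name__ == "__main__":
    # Event-based datasets that include whole-premises data.
    print(select(DATASETS, need_events=True, need_levels=frozenset({"WP"})))
    # High-rate datasets that include individual-load data.
    print(select(DATASETS, need_levels=frozenset({"IL"}), high_rate=True))
```

Each keyword argument corresponds to one axis of a taxonomy, so combining them mirrors the idea of pinpointing a dataset, or a combination of datasets, that fits a given process.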

Data Acquisition Campaign
During the spring and summer of 2019 and 2020, a data acquisition campaign was conducted to collect data reflecting the electricity consumption of different urban premises in several Moroccan cities. The consent of all participating residents was obtained prior to any data acquisition, and all gathered data were anonymized to protect their right to privacy.

Demographic Sample
Most Moroccan households are supplied by a 220 V, 50 Hz, two-wire single-phase circuit, with the exception of large houses in affluent neighborhoods, which can be fed by a three-phase supply. Morocco is a developing country in which a large proportion of the population has low purchasing power; hence, the availability and usage of power-hungry loads such as air-conditioners and microwaves are generally limited to affluent neighborhoods. In addition, since electricity consumption patterns and types of loads may differ with the socio-economic status of the residents, we took into account the size, type, and location of the premises, with a view to obtaining a representative sample of the country's electricity consumption profiles. As can be seen in Figure 4, the MORED acquisition campaign targeted residential premises, namely apartments and semi-detached houses in affluent neighborhoods and apartments in disadvantaged neighborhoods of Moroccan urban areas, along with laboratories at the International University of Rabat. In this section, we describe the mechanisms of the data acquisition campaign.

Acquisition System
Power acquisitions were performed using two different systems: emonPi and IMPEC [46]. The former was used to acquire WP and IL ground-truth consumption at low sampling rates, and the latter to acquire IL signature consumption at high sampling rates.
IMPEC, an integrated system for monitoring and processing electricity consumption in buildings (see Figure 5a), is a system that we developed and presented in [46]. IMPEC houses an FPGA, a real-time processor, and two IO modules with 24-bit and 16-bit resolution ADCs for current and voltage measurements, respectively. Voltage is measured directly using the IO module, while current is measured using split-core current transformers (SCTs): a single SCT for a single-phase power supply and three SCTs for a three-phase supply. Compared with commercial acquisition systems, IMPEC is tailored for NILM purposes and offers several advantages, such as acquiring electricity consumption at a high sampling frequency (50k[S/s]) from which electrical features are extracted at a frequency of 4[S/s], and logging data in TDMS files. It also provides control over key acquisition aspects via its graphical user interface (i.e., acquisition rate, extraction rate, and length of the snapshots of raw waveforms), and allows metadata logging (e.g., information about premises and appliances).
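As an illustration of how features at 4[S/s] can be derived from raw waveforms sampled at 50k[S/s], the sketch below splits the waveforms into non-overlapping windows of 12,500 samples (50,000 / 4) and computes standard per-window quantities: V_rms, I_rms, active power P as the mean of v(t)·i(t), apparent power S = V_rms·I_rms, and power factor PF = P/S. The windowing scheme and the choice of features are assumptions made for this example; they are not taken from IMPEC's implementation [46].

```python
import math

RAW_RATE = 50_000                   # raw sampling rate in samples/s
FEATURE_RATE = 4                    # feature extraction rate in samples/s
WINDOW = RAW_RATE // FEATURE_RATE   # 12,500 raw samples per feature sample


def extract_features(v, i):
    """Compute Vrms, Irms, P, S and PF over consecutive non-overlapping windows."""
    features = []
    for start in range(0, len(v) - WINDOW + 1, WINDOW):
        vw = v[start:start + WINDOW]
        iw = i[start:start + WINDOW]
        v_rms = math.sqrt(sum(x * x for x in vw) / WINDOW)
        i_rms = math.sqrt(sum(x * x for x in iw) / WINDOW)
        p = sum(a * b for a, b in zip(vw, iw)) / WINDOW   # active power (W)
        s = v_rms * i_rms                                  # apparent power (VA)
        pf = p / s if s > 0 else 0.0                       # power factor
        features.append({"Vrms": v_rms, "Irms": i_rms, "P": p, "S": s, "PF": pf})
    return features


if __name__ == "__main__":
    # One second of a synthetic 220 V / 50 Hz supply feeding a load that draws
    # 5 A RMS with a 60-degree lag, so PF = cos(60°) = 0.5 and P = 550 W.
    n = RAW_RATE
    v = [220 * math.sqrt(2) * math.sin(2 * math.pi * 50 * k / RAW_RATE) for k in range(n)]
    i = [5 * math.sqrt(2) * math.sin(2 * math.pi * 50 * k / RAW_RATE - math.pi / 3) for k in range(n)]
    for f in extract_features(v, i):
        print({key: round(val, 2) for key, val in f.items()})
```

With a 50 Hz supply, each 0.25 s window spans exactly 25 periods of the 100 Hz ripple in v(t)·i(t) and v(t)², so the window means are exact here; real signals with frequency drift would show small window-to-window variations.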
EmonPi is an open-source energy monitoring system based on the Raspberry Pi (see Figure 5b) that houses an Arduino-compatible microcontroller (an Atmel ATmega328) with 10-bit resolution ADCs for current and voltage measurements. As with IMPEC, emonPi measures current with SCTs, while voltage is measured using a plug-in AC-AC adapter. Multiple emonPis were used to extract features describing the WP consumption and the IL ground-truth consumption of several appliances in a household at a frequency of 1/5[S/s] or 1/10[S/s], logging the data in CSV files. In addition, household residents used a web app and a native Android app, EmonCMS, associated with each emonPi to monitor their real-time and daily power consumption.
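Monitoring daily consumption from samples logged at 1/5[S/s] or 1/10[S/s] amounts to integrating active power over time, E = Σ P_k·Δt_k. The sketch below does this with a left rectangle rule and converts watt-seconds to kWh. The CSV layout (a `timestamp` column in Unix seconds and a `power` column in watts), the 30 s gap limit, and the use of UTC day boundaries are hypothetical choices; they do not describe the actual emonPi log format or how EmonCMS computes its daily figures.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone


def daily_energy_kwh(rows, max_gap_s=30.0):
    """Integrate active power (W) over time per UTC day and return {date: kWh}.

    Each sample's power is held until the next sample (left rectangle rule).
    Intervals longer than max_gap_s are treated as missing data and skipped;
    an interval spanning midnight is credited to the day on which it starts.
    """
    energy_ws = defaultdict(float)
    samples = sorted((float(r["timestamp"]), float(r["power"])) for r in rows)
    for (t0, p0), (t1, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        if 0 < dt <= max_gap_s:
            day = datetime.fromtimestamp(t0, tz=timezone.utc).date()
            energy_ws[day] += p0 * dt
    return {day: ws / 3.6e6 for day, ws in energy_ws.items()}


if __name__ == "__main__":
    # Hypothetical log: a constant 600 W load sampled every 5 s for one hour.
    lines = ["timestamp,power"] + [f"{1_600_000_000 + 5 * k},600" for k in range(721)]
    print(daily_energy_kwh(csv.DictReader(io.StringIO("\n".join(lines)))))
```

Skipping long gaps rather than bridging them avoids inflating the daily total when an emonPi temporarily stops logging, at the cost of under-counting energy during outages.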

Acquisition Method
As showcased in Figure 6, different types of data reflecting electricity consumption in Moroccan premises were collected during the MORED acquisition campaign: WP consumption data, IL ground-truth consumption data, IL signature consumption data, and event data.

Figure 6. The three different types of electricity consumption data contained in MORED: whole-premises consumption acquisition, individual load ground-truth consumption acquisition, and individual load signature acquisition.

WP Electricity Consumption Data
To acquire WP electricity consumption, emonPi systems were installed at the mains of Moroccan households, with one SCT clamped around one phase, for an extended duration (see Section 4 for more details). Figure 7 presents the WP electricity consumption acquired from household 2.

• Preliminary Notions
Differentiating between electrical loads depends mostly on their respective signatures, which are in turn largely defined by the loads' consumption type [47]. To acquire IL electricity consumption, the IMPEC system was configured to follow different processes depending on the load's consumption type. Based on their consumption patterns, loads can be classified into three main categories: on/off, finite-state, and continuously variable devices (see Figure 8) [48].
1. On/off devices: These are loads with two states, each corresponding to a constant level of energy consumption (see Figure 8a). This type covers loads such as light bulbs and toasters.
2. Finite-state machine (FSM) devices: Also called multi-state devices, these are loads with a finite number of switching states (see Figure 8b). This type covers washing machines, fruit mixers, etc.
3. Continuously variable devices (CVD): These are loads with no apparent finite number of switching states (see Figure 8c). This type covers loads such as computers, refrigerators, etc.
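The three categories can be roughly approximated from a power trace by counting the distinct steady-state levels at which a load settles. The sketch below is a deliberately simple heuristic, not the procedure used for MORED: it treats a sample as steady when it differs from the previous one by less than `tol` watts, groups steady values into bins of width `tol`, and labels the trace as on/off (two levels, one of them near zero), FSM (a small number of levels), or CVD (otherwise). The tolerances, the `off_threshold`, and the `max_fsm_levels` limit are assumptions.

```python
def steady_levels(power, tol=10.0):
    """Estimate the distinct steady-state power levels (W) in a trace.

    A sample is treated as steady when it differs from the previous one by less
    than tol; steady samples are then grouped into bins of width tol, each
    opened by the lowest value not yet assigned, and each bin is averaged.
    """
    steady = [p1 for p0, p1 in zip(power, power[1:]) if abs(p1 - p0) < tol]
    groups = []
    for p in sorted(steady):
        if not groups or p - groups[-1][0] > tol:
            groups.append([p])
        else:
            groups[-1].append(p)
    return [sum(g) / len(g) for g in groups]


def classify_load(power, tol=10.0, off_threshold=5.0, max_fsm_levels=6):
    """Label a trace as 'on/off', 'FSM', or 'CVD' from its number of steady levels."""
    levels = steady_levels(power, tol)
    if len(levels) == 2 and min(levels) < off_threshold:
        return "on/off"
    if 2 <= len(levels) <= max_fsm_levels:
        return "FSM"
    return "CVD"


if __name__ == "__main__":
    toaster = [0.0] * 20 + [1000.0] * 20 + [0.0] * 20
    mixer = [0.0] * 10 + [200.0] * 10 + [500.0] * 10 + [0.0] * 10
    ramp = [0.0] * 10 + [100.0 + 5 * k for k in range(100)] + [0.0] * 10
    print(classify_load(toaster), classify_load(mixer), classify_load(ramp))
    # on/off FSM CVD
```

Real traces contain noise and slow drifts, so in practice the tolerances would need tuning per load; the sketch only illustrates why the number of states is a useful discriminating feature.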

IL Signature Acquisition Method
The signature capture process is illustrated in the diagram in Figure 9. We first visually analyze the electrical consumption of a load to decide how to proceed with its signature acquisition. The acquisition is then performed in three main steps: pre-activation, signature acquisition, and post-activation. Pre-activation (post-activation) is a period of approximately two seconds that precedes (follows) the signature acquisition to enable a clear capture of the turn-on (turn-off) transients, which are essential to distinguishing between loads with similar consumption [47,49]. The signature acquisition of a load is performed in one of two ways:
• In the case of an on/off or FSM load, each state is recorded for 10 s. The transition between states may be direct, or may occur during a one-second interval in the case of loads with buttons that halt the execution of a state upon their release.
• In the case of a CVD, longer acquisitions help capture a realistic picture of its usual consumption, yet some loads have characteristics that make long acquisitions hard to conduct. These are mainly devices with automatic or variable operating times that depend on their initial physical state (e.g., the initial temperature of an electric kettle), or industrial equipment that produces high noise emissions or is difficult to operate.
As demonstrated previously in the literature [38,50], several aspects can shape transients: the type of load, its initial physical state, the task to be performed, and the instant at which the device is turned on or off with respect to the voltage waveform cycle (i.e., the precise voltage phase at the moment of switching). Consequently, the transients observed in the measurements of a given appliance vary depending on these aspects. We therefore recorded 10 measurement instances for each load to provide a richer representation of its consumption.
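For on/off and FSM loads, the timing described above can be laid out as a schedule. The sketch below computes the start and end of each segment of one measurement instance: about 2 s of pre-activation, 10 s per state, an optional 1 s interval between states for button-operated loads, and about 2 s of post-activation. The `Segment` type, the segment labels, and the state names in the usage example are hypothetical, and the fixed 2 s values stand in for the "approximately two seconds" of the text.

```python
from dataclasses import dataclass

PRE_S = 2.0      # approximate pre-activation duration (s)
STATE_S = 10.0   # recording duration per state (s)
GAP_S = 1.0      # interval between states for button-operated loads (s)
POST_S = 2.0     # approximate post-activation duration (s)
INSTANCES = 10   # measurement instances recorded per load


@dataclass
class Segment:
    label: str
    start: float  # seconds from the start of the recording
    end: float


def acquisition_plan(states, button_operated=False):
    """Return the ordered segments of one signature acquisition of an on/off or FSM load."""
    plan = [Segment("pre-activation", 0.0, PRE_S)]
    t = PRE_S
    for k, state in enumerate(states):
        if k > 0 and button_operated:
            # Interval between states for a load whose button halts a state when released.
            plan.append(Segment("transition", t, t + GAP_S))
            t += GAP_S
        plan.append(Segment(state, t, t + STATE_S))
        t += STATE_S
    plan.append(Segment("post-activation", t, t + POST_S))
    return plan


if __name__ == "__main__":
    plan = acquisition_plan(["speed 1", "speed 2"], button_operated=True)
    for seg in plan:
        print(f"{seg.label:>16}: {seg.start:5.1f} s -> {seg.end:5.1f} s")
    print(f"total for {INSTANCES} instances: {INSTANCES * plan[-1].end:.0f} s")
```

An on/off load is the special case of a single state, e.g. `acquisition_plan(["on"])`, giving a 14 s recording per instance.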

IL Ground-truth Acquisition Method
Ground-truth data acquisition is achieved in a household by connecting one load and one emonPi to a power extension cord. Each extension cord is plugged into an outlet used solely to power its corresponding devices (i.e., the load and the emonPi) for the duration of the acquisition. Synchronization between all emonPis in a household is ensured once they are connected to the Internet, as they update their internal clocks with the country's official time. Loads in a household are typically selected for monitoring based on two questions: whether the load's cables are accessible, and whether its consumption is higher than that of the monitoring system. Therefore, unmonitored loads generally include routers, phone chargers, and built-in appliances usually found in the kitchen. Figure 10 showcases the WP electricity consumption of household 5 (see Table 3). These data were annotated using events identified by processing the corresponding IL ground-truth consumption data. The unannotated portions of the WP consumption in this figure are due to the limited number of monitored loads in the premises.
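Because each emonPi logs its own samples, combining WP and IL ground-truth readings requires placing them on a common time grid. Assuming the clocks are synchronized as described above, the sketch below assigns each stream's `(timestamp, power)` samples to 5 s slots and joins the streams on the slot index, leaving `None` where a stream has no sample (as happens every other slot for a 1/10[S/s] stream). A simple residual, WP minus the sum of the monitored loads, then gives a rough idea of the consumption left unexplained by the monitored loads. The grid period and the stream names are assumptions for illustration.

```python
from collections import defaultdict

GRID_S = 5  # assumed common grid period in seconds (1/5 samples/s)


def align(streams, grid_s=GRID_S):
    """Join {name: [(unix_ts, power_w), ...]} streams on a common time grid.

    Each sample goes to slot floor(ts / grid_s); if a stream has several
    samples in one slot, the latest one is kept. Returns a list of
    (slot_start_ts, {name: power_w or None}) sorted by time.
    """
    slots = defaultdict(dict)
    for name, samples in streams.items():
        for ts, power in sorted(samples):
            slots[int(ts // grid_s)][name] = power
    return [
        (slot * grid_s, {name: slots[slot].get(name) for name in streams})
        for slot in sorted(slots)
    ]


def unmonitored_power(values, wp_key="WP"):
    """WP power not explained by the monitored loads, or None if data are missing."""
    if any(v is None for v in values.values()):
        return None
    return values[wp_key] - sum(v for k, v in values.items() if k != wp_key)


if __name__ == "__main__":
    wp = [(100.2, 850.0), (105.1, 900.0), (110.3, 880.0)]
    fridge = [(100.9, 120.0), (110.0, 118.0)]  # a 1/10 samples/s stream
    for ts, values in align({"WP": wp, "fridge": fridge}):
        print(ts, values, unmonitored_power(values))
```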

Event Data
To complement the WP data, events reflecting state changes of the monitored loads were added to MORED. For IL signatures, state events were identified directly from the load's acquisition process. For IL ground-truth measurements, on/off events were identified manually, post-acquisition, using a thresholding technique: a load is considered ON only when its active power, P, exceeds a specific threshold. These events are exported to a CSV file containing their timestamps and the corresponding equipment names. Figure 11 presents the concurrence plot of the events identified from the IL ground-truth consumption of 13 distinct loads acquired from six monitored premises. The plot shows multiple loads operating simultaneously (e.g., refrigerator and freezer, monitor and laptop).
Figure 11. Concurrence plot using events identified from the IL ground-truth consumption data from six households.
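The thresholding step and the concurrence analysis can be sketched as follows: a load is considered ON while its active power P exceeds a threshold, an event is emitted at every crossing, the events are written to CSV, and a sweep over all events finds pairs of loads that are ON at the same time. The 15 W default threshold, the column names, and the extra `state` column are assumptions; the text does not specify the threshold values or the exact CSV layout used in MORED.

```python
import csv
import io


def detect_events(samples, equipment, threshold_w=15.0):
    """Return (timestamp, equipment, state) events from (timestamp, P) samples.

    A load is considered ON only while its active power P exceeds threshold_w;
    an event is recorded every time that condition changes.
    """
    events = []
    was_on = False  # assumption: the load is OFF before the first sample
    for ts, p in samples:
        is_on = p > threshold_w
        if is_on != was_on:
            events.append((ts, equipment, "ON" if is_on else "OFF"))
            was_on = is_on
    return events


def write_events(events, fh):
    """Write events from one or more loads to CSV, ordered by timestamp."""
    writer = csv.writer(fh)
    writer.writerow(["timestamp", "equipment", "state"])
    writer.writerows(sorted(events))


def concurrent_pairs(events):
    """Return the pairs of loads that are ON at the same time at least once."""
    on_now, pairs = set(), set()
    # At equal timestamps, process OFF before ON so touching intervals do not count.
    for ts, equipment, state in sorted(events, key=lambda e: (e[0], e[2] == "ON")):
        if state == "ON":
            pairs.update(tuple(sorted((equipment, other))) for other in on_now)
            on_now.add(equipment)
        else:
            on_now.discard(equipment)
    return pairs


if __name__ == "__main__":
    fridge = detect_events([(0, 2.0), (10, 95.0), (20, 90.0), (30, 3.0)], "refrigerator")
    freezer = detect_events([(0, 1.0), (15, 110.0), (40, 1.0)], "freezer")
    buf = io.StringIO()
    write_events(fridge + freezer, buf)
    print(buf.getvalue())
    print(concurrent_pairs(fridge + freezer))
```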

Dataset Summary
MORED comprises three kinds of electricity consumption data: labeled WP and IL ground-truth consumption, WP consumption, and labeled IL signature consumption. The dataset, available for download at https://moredataset.github.io/MORED, is intended to provide more data in general and, specifically, to help alleviate the scarcity of EB electricity consumption datasets in the field. Summaries of the targeted premises and loads for the WP and IL ground-truth data, the WP data, and the IL signature data are presented in Tables 3-5, respectively.
• Metadata: These offer detailed information for all three sets of data (e.g., appliance type, prior knowledge of its typical daily usage time, and its room location within the premises) following the NILM Metadata schema [18]. They are reported in YAML text files.
• Measurement data are all provided in CSV files for easy and free access. IL signature data are additionally provided in TDMS files, since they were originally logged in this format. Figure 12 illustrates the organization of the dataset directory. Data are divided into three main subdirectories, each corresponding to a specific type of data (i.e., WP and IL ground-truth (WPILGT), WP, and IL signature (ILS)).

Conclusions and Future Work
In the first part of this article, we presented a summary of open electricity consumption datasets, providing an exhaustive overview of the current state of the art. In the second part, we presented MORED, the first public dataset of Moroccan buildings' electricity consumption. Unlike its predecessors, MORED contains three types of electricity consumption data: labeled WP and IL ground-truth consumption data acquired at 1/5[S/s], fully labeled IL signatures acquired at 50k[S/s] and 4[S/s], and WP consumption data acquired at 1/5[S/s] or 1/10[S/s]. The aim of building such a dataset is to contribute to the fields of energy disaggregation and smart energy management systems. MORED remains a work in progress, as the authors strive to continuously add new measurements from residential and industrial buildings following the approaches described in this paper.