Residential Power Traces for Five Houses: The iHomeLab RAPT Dataset

Datasets with measurements of both solar electricity production and domestic electricity consumption separated into the major loads are interesting for research focussing on (i) local optimization of solar energy consumption and (ii) non-intrusive load monitoring. To this end, we publish the iHomeLab RAPT dataset consisting of electrical power traces from five houses in the greater Lucerne region in Switzerland spanning a period from 1.5 up to 3.5 years with a sampling interval of five minutes. For each house, the electrical energy consumption of the aggregated household and specific appliances such as dishwasher, washing machine, tumble dryer, hot water boiler, or heat pump were metered. Additionally, the data includes electric production data from PV panels for all five houses, and battery power flow measurement data from two houses. Thermal metadata is also provided for the three houses with a heat pump.


Summary
Datasets with measurements of the domestic energy breakdown per appliance are of interest from different perspectives. Firstly, there is an ever-increasing amount of renewable electrical energy production and, with that, a growing interest in solutions that can handle its stochastic nature, such as demand response [1,2], optimized self-consumption [3] or smart energy trading [4]. Datasets that measure the consumption of aggregate households, as well as of the (major) appliances, are required to base the development of such solutions on actual data. Secondly, datasets consisting of domestic electrical energy consumption with an appliance breakdown are essential for Non-Intrusive Load Monitoring (NILM) research, which aims at disaggregating the domestic energy consumption to the device level [5,6]. Finally, the data can also be useful in research on, for example, energy usage prediction, energy usage feedback systems, time-series data analysis and processing, and consumer behavior.
Data 2020, 5, 17
Our motivation in publishing this work is primarily to extend the list of available open datasets and thereby foster innovation in the corresponding fields. Our dataset consists of electrical power traces from five houses in the greater Lucerne region in Switzerland spanning a period from 1.5 up to 3.5 years. For each house, the electrical energy consumption of the aggregated household and specific appliances such as dishwasher, washing machine, tumble dryer, hot water boiler, or heat pump were metered. The data additionally includes electric production data from PV panels for all five houses, and power flow measurement data from a battery in the case of two houses. Three houses had a heat pump installed. For these houses, thermal metadata is also provided in order to enable corresponding simulations.
The dataset collection started in the framework of the project Wizard for the optimal management of Electrical Energy in a prosumer household (WizEE) [7] and is ongoing in the project Swiss Competence Center for Energy Research, Future Energy Efficient Buildings and Districts (SCCER FEEB&D) [8]. We, therefore, plan to release further data in the future. Based on the collected data, the authors investigated how well algorithms can predict the usage of domestic appliances solely based on historic electrical energy consumption data [9].

Relation to Other Datasets
A recent and comprehensive overview of datasets that have been used for load disaggregation work can be found, e.g., in [10]. The dataset most similar to ours is the Electricity Consumption and Occupancy (ECO) dataset [11]. It also stems from Switzerland and comprises a similar number of houses and a similar duration. The main differences are the sampling rate (1 Hz for ECO versus one sample per 5 min for the presented dataset) and the submetered devices. Our dataset includes data of appliances that consume a major fraction of the electrical household energy, i.e., heat pumps and hot water boilers, as well as photovoltaic production and household battery power flow measurements. The Dataport [12] database, which contains data from over 1000 residential homes, is the most notable source of data that includes measurements of such heavy power-consuming appliances. However, all of the available metering data stems from the US, where appliances and electrical installations differ considerably from those in Switzerland. To our knowledge, datasets containing electrical power measurements with a breadth similar to Dataport's are only available from the Irish Commission for Energy Regulation [13] and the 'Building Data Genome Project' [14]. However, the former contains only 30 min aggregated residential data, whereas the latter contains only hourly aggregated non-residential data.

Data Description
The dataset consists of data from five houses located in the greater Lucerne region. Each house is equipped with up to seven meters, each of which measures the average power consumed per sampling period. Figure 1 gives a schematic overview of the installed (sub)metering and the corresponding wiring: the main meters of the houses are represented as thick red arrows, thin red arrows indicate submetered appliances, and other electrical connections and corresponding meters are shown in blue/green. Sensor numbers are listed in Table 1, which also contains a description of the sensors and their names in the dataset. The dataset is stored in the Hierarchical Data Format (HDF) version 5 [15]. In the Python programming language, it is easily accessible with pandas [16,17] via the 'pandas.read_hdf(path/to/file)' function. All timestamps are given in Swiss local time, i.e., UTC+1:00h during the winter half-year (November to March) and UTC+2:00h during the summer half-year (April to October).
Table 1. List of sensors installed in the different houses. Numbers in column '#' refer to those in Figure 1. 'Sensor Name' gives the name of the sensor as used in the data. 'T_orig' indicates the original sampling rate. 'M-id' lists the meter identity as given in Table 5. Abbreviations: 'pc' stands for 'power consumption', 'PV' for 'photovoltaic', 'ed' for 'event driven'.

# | Description | Sensor Name | Unit | T_orig | M-id
  | Table 3 | E_weather_* | * | 1 h | l
9 | pc of dehumidifier | E_dehumidifier_power | W | 60 s | n
1 Meter d was replaced with e on 2018-04-16.
2 Please refer to Section 4.4 for the meaning of these sensors.
3 Meter a (T_orig = 120 s) removed on 2017-09-26. Meter g (T_orig = 300 s) installed on 2017-11-01.
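Because the timestamps are stored in Swiss local time, comparing them with data recorded in UTC requires a time zone conversion. The following is a minimal sketch of such a conversion with pandas. It assumes that the stored timestamps are timezone-naive and follow the standard daylight saving rules of the 'Europe/Zurich' zone, which the text does not state explicitly; the example timestamps are illustrative and not taken from the dataset.

```python
import pandas as pd

# Illustrative timezone-naive timestamps (not taken from the dataset).
local_index = pd.DatetimeIndex(["2018-01-15 12:00:00", "2018-07-15 12:00:00"])

# Assumption: the stored local time follows the 'Europe/Zurich' rules.
# Timestamps that are ambiguous or nonexistent around the daylight saving
# switches are marked as NaT instead of being guessed.
aware_index = local_index.tz_localize(
    "Europe/Zurich", ambiguous="NaT", nonexistent="NaT"
)

# Convert to UTC: the offset is one hour in winter and two hours in summer.
utc_index = aware_index.tz_convert("UTC")
print(utc_index)
```

For a DataFrame read from one of the HDF files, the same two calls can be applied to its index.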

Electrical Power Data
Electrical power measurements are available in a raw and a processed version. The raw sensor measurements are provided as one HDF file per sensor, combined in one folder per house. The processed data is joined in one HDF file per house, as summarized in Table 2. The table also contains the logged time span for each house. The HDF files consist of tables in which each column corresponds to the readings of one sensor. For the processed power data, we down-sampled some of the measured sensor data to obtain a consistent sampling interval of 5 min across all sensors of the five houses. This mostly affected sensors of houses A, D and E. The original sampling rate of each sensor is given in column T_orig of Table 1.

Weather Data
Houses C and E were equipped with a local weather station. The corresponding sensor descriptions and names are listed in Table 3. Sampling was done on an hourly basis. The available HDF files are given in Table 2. As house D is in close proximity to house E, the outdoor weather data of house E can be used for both.
Table 3. Description of the data recorded by the 'My Weatherbox' in houses C and E. The technical specification of the weather station can be found in Table 5, sensor l.

Description | Sensor Name | Unit | T_orig

Thermal Metadata
Houses A, C and D include the electrical consumption of the heat pump. To make the dataset also useful for research questions involving the thermal aspects of the heated buildings, relevant thermal characteristics of these three houses are provided in Table 4.
Table 4. Thermal metadata of houses A, C and D.

Measurement Equipment and Data Collection
Metering was done with the sensors listed in Table 5. In cases where the number of pulses from the S0 interfaces per capturing period was counted, the measuring method introduces an uncertainty: a major fraction of the energy of the first pulse in a capturing period could actually have been consumed before that period. One consequence is that even if the actual energy consumption of an appliance was constant, the measured signal could exhibit a variation ≤ (energy/pulse). The data collection infrastructure is visualized in Figure 2: measurements were collected in files directly by the myStrom WiFi Switches, the Fronius Symo Hybrid, and the Weatherbox. All other sensors were read out by means of a Raspberry Pi (www.raspberrypi.org), and the collected data was then uploaded to an FTP server. Finally, all raw measurement files were imported into a database with custom scripts.
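The pulse-counting effect described above can be illustrated with a small simulation. The sketch below is not part of the published processing code; the function name and the chosen values (a constant 1000 W load, 10 Wh per pulse, a 300 s capture period) are hypothetical and only serve to show that the counted pulses of a perfectly constant load differ by at most one pulse between periods.

```python
def pulses_per_period(power_w, wh_per_pulse, period_s, n_periods):
    """Count S0 pulses per capture period for a constant load.

    Energy is tracked in watt-seconds with integers to avoid rounding issues.
    """
    pulse_ws = wh_per_pulse * 3600  # energy represented by one pulse
    residual_ws = 0                 # energy consumed since the last pulse
    counts = []
    for _ in range(n_periods):
        residual_ws += power_w * period_s
        n_pulses, residual_ws = divmod(residual_ws, pulse_ws)
        counts.append(n_pulses)
    return counts


# Hypothetical meter: 10 Wh per pulse, 300 s capture period, 1000 W load.
counts = pulses_per_period(power_w=1000, wh_per_pulse=10, period_s=300, n_periods=6)

# Mean power per period as it would be derived from the pulse counts.
measured_w = [n * 10 * 3600 / 300 for n in counts]
print(counts)      # [8, 8, 9, 8, 8, 9]
print(measured_w)  # [960.0, 960.0, 1080.0, 960.0, 960.0, 1080.0]
```

Although the load is constant at 1000 W, the derived power alternates between 960 W and 1080 W, i.e., by exactly the energy of one pulse per period.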

Processing of Raw Data
Multiple sensors in our dataset sampled with a higher frequency than once per five minutes and some sensors for the boilers of houses B and C were both event-driven and sampling at regular intervals. In order to provide an easily accessible dataset, we decided to publish a pre-processed dataset alongside the raw data. The corresponding code is available under https://github.com/ihomelab/RAPT-dataset. The pre-processing included the following steps:
• Round timestamps for the respective capture period, e.g., rounding '2018-10-01 18:05:08' to '2018-10-01 18:05:00'.
• Convert electrical meter measurements to mean consumed power ([W]) per period.
• Linearly interpolate missing data for periods up to 15 min.
• Down-sample everything to a capture period of 5 min using the mean, except for the weather data, where an hourly resolution was kept.
• Calculate boiler_power: the boiler_power is calculated (not measured) in the RPi Boiler Control Unit, see sensor h in Table 5. Switch-on times are provided as sensors C_boiler_heater_1/2/3. As these sensors were also event-driven, the energy consumption was redistributed on the regular capture period of 5 min during pre-processing.
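The rounding, interpolation and down-sampling steps listed above can be sketched with pandas as follows. This is an illustrative sketch and not the authors' code, which is available in the repository named above. The function name, its parameters and the synthetic input are hypothetical, and a gap is interpreted here as "up to 15 min" if it spans at most 15 consecutive missing samples on a 1 min grid.

```python
import pandas as pd


def preprocess(raw, period="1min", max_gap="15min", target="5min"):
    """Round timestamps, fill short gaps linearly and down-sample by the mean.

    raw: Series of mean power values [W] with a DatetimeIndex.
    """
    s = raw.sort_index()
    s.index = s.index.round(period)            # align to the capture period
    s = s[~s.index.duplicated(keep="first")]
    s = s.asfreq(period)                       # regular grid, gaps become NaN

    # Length of each run of consecutive missing samples.
    is_nan = s.isna()
    run_id = is_nan.astype(int).diff().ne(0).cumsum()
    run_len = is_nan.astype(int).groupby(run_id).transform("sum")

    # Interpolate only the gaps that are short enough; longer gaps stay NaN.
    max_samples = int(pd.Timedelta(max_gap) / pd.Timedelta(period))
    fillable = is_nan & (run_len <= max_samples)
    filled = s.interpolate(method="linear", limit_area="inside")
    s = s.where(~fillable, filled)

    return s.resample(target).mean()


# Synthetic 1 min data with a 3 min gap and a 20 min gap (illustrative only).
idx = pd.date_range("2018-10-01 18:00:08", periods=60, freq="1min")
raw = pd.Series(100.0, index=idx).drop(idx[10:13]).drop(idx[30:50])
processed = preprocess(raw)
print(processed)
```

In this example the 3 min gap is filled, whereas the four 5 min periods covered by the 20 min gap remain NaN. Identifying the length of each gap first avoids partially filling long gaps, which a plain interpolation limit would do.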

Reading the Dataset
Reading the dataset in Python is straightforward. Using the Python package pandas [16], HDF files can be read with read_hdf(...), for example:

    import pandas as pd

    # Path to the HDF file of house A
    pathToHouseA = "datasets/dfA_300s.hdf"
    dfA_300s = pd.read_hdf(pathToHouseA)

dfA_300s refers to a standard pandas DataFrame with the timestamp as index and available meters as columns.

Missing Data
Due to connection issues, reconstruction, and technical issues, the dataset contains missing data, which is stored as NaN. Figure 3 summarizes the corresponding information for the dataset (some weather sensors with similar missing data patterns were omitted). The different dates on which logging started for the houses are easily observable. More detailed heat plots of the houses, showing the percentage of missing data per three days, are provided alongside the data.
Using pandas DataFrames, the missing data can be easily extracted by df[df['sensor_name'].isnull()].index where df is the pandas DataFrame and 'sensor_name' is the name of the respective sensor as a string. Additionally, we extracted the start and end dates of missing data for all the sensors and stored them in CSV files missingData/<sensorName>.csv. These files are provided alongside the data.
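Start and end dates of missing data can also be derived directly from a DataFrame column. The sketch below shows one possible way to do so; the function name and the column names 'start' and 'end' are hypothetical and do not necessarily match the layout of the provided CSV files.

```python
import pandas as pd


def missing_ranges(series):
    """Return the first and last timestamp of every run of consecutive NaNs."""
    is_nan = series.isna()
    # A new run starts wherever the NaN status changes.
    run_id = is_nan.astype(int).diff().ne(0).cumsum()
    nan_runs = is_nan[is_nan].groupby(run_id[is_nan])
    rows = [(run.index[0], run.index[-1]) for _, run in nan_runs]
    return pd.DataFrame(rows, columns=["start", "end"])


# Illustrative 5 min series with two gaps (values are not from the dataset).
nan = float("nan")
index = pd.date_range("2018-01-01", periods=8, freq="5min")
series = pd.Series([1.0, nan, nan, 4.0, 5.0, nan, 7.0, 8.0], index=index)
ranges = missing_ranges(series)
print(ranges)
```

Here the first gap spans 00:05 to 00:10 and the second consists of the single sample at 00:25.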

Known Issues
A simple sanity check of the measured data consists of testing that the sum of all measured appliances is always smaller than or equal to the total consumed energy. In the case of houses B and C, there is a small portion of samples for which this condition is not met, 0.07% and 0.28% respectively. These cases are mostly small deviations that can be attributed to the normalization of the data during the pre-processing steps, see Section 3.2.
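Such a sanity check could be sketched as follows. The function name and the column names are hypothetical, and the small DataFrame only illustrates the calculation; when applying the check to a house, only appliances that are actually wired behind the total meter should be included.

```python
import pandas as pd


def fraction_violating(df, total_col, appliance_cols, tolerance_w=0.0):
    """Fraction of complete samples where the appliances exceed the total."""
    appliance_cols = list(appliance_cols)
    rows = df[[total_col] + appliance_cols].dropna()  # ignore missing data
    appliance_sum = rows[appliance_cols].sum(axis=1)
    violated = appliance_sum > rows[total_col] + tolerance_w
    return violated.mean()


# Hypothetical column names and values, for illustration only.
df = pd.DataFrame(
    {
        "X_total_cons_power": [100.0, 200.0, 50.0, 80.0],
        "X_dishwasher_power": [40.0, 100.0, 30.0, 50.0],
        "X_boiler_power": [50.0, 90.0, 30.0, 20.0],
    }
)
fraction = fraction_violating(
    df, "X_total_cons_power", ["X_dishwasher_power", "X_boiler_power"]
)
print(fraction)  # 0.25
```

The optional tolerance_w parameter allows small deviations, such as those caused by pre-processing, to be ignored.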

Particularities of the Installations and Corresponding Data
House B
• House B has a solar panel, a battery and a hot water boiler. The latter two are controlled by a custom controller, see sensor h in Table 5. Here, T_min = 40 °C, T_max = 65 °C and T_min,night = 35 °C, T_max,night = 45 °C. Depending on the current photovoltaic production, the three heating elements boiler_heater_1/2/3 are individually switched on. Remaining power will, in all cases, be used to charge the battery to its maximum. Only after that is further excess power allowed to flow back into the electrical grid.
• Installed battery: 'Fronius Solar Battery 6.0' from Fronius (www.fronius.com). Usable capacity of the battery: 4800 Wh; nominal discharging and charging power: 3200 W.
House C
• The family inhabiting house C was on an extended leave from 2016-06-24 until 2016-08-13, with correspondingly low power consumption in that period.
• The heat pump of house C is directly connected to the electrical grid, which means that the corresponding power is not subsumed in C_total_cons_power. In exchange for a lower electrical tariff, the utility has the right to control switch-on times of the heat pump by means of ripple control. Before 2016-04-01, the utility inhibited switch-on, with minor exceptions, from 11 to 12 o'clock, from 15 to 18 o'clock and from 22 to 2 o'clock the following morning. Starting with 2016-04-01, the utility is allowed to variably block the heat pump, where each blocked period has to be followed by a period of equal or longer duration within which the heat pump is allowed to pull power. The ripple control signal has been obtained from the utility and is available as sensor C_hp_on_utility, a value of 1 indicating that the pump is allowed to pull power.
• Heating of the hot water boiler was controlled in the following ways:
- Before 2016-10-17: The boiler heating is turned on if the following signals are both in the 'on' status: C_boiler_on_utility and C_boiler_on_thermostat. The former corresponds to the ripple control signal from the utility and the latter indicates if the water temperature in the boiler is below a certain threshold. The sensor C_boiler_on_relais is irrelevant for this period.
Table 5: If (C_pv_prod_power - C_total_cons_power) > 900 W, the C_boiler_heater_* are controlled according to the logic described for house B, with the following difference: in case the boiler is charged during the night, battery discharging is disallowed. The sensor C_boiler_on_relais is irrelevant for this period.
• 2017-11-29: Connected new battery to power system. The battery type is identical to that of house B, see the corresponding specifications.
• The installation of PV panels and battery had the following consequences for the electrical installation:
- Before the PV installation, the solar irradiation was logged with a solar irradiation sensor Tritec Spektron 320, see Table 5. The corresponding sensor is called C_solarlog_radiation and is only available until the retrofit.
- After the retrofit, the electrical power consumption of the boiler was calculated by the RPi Boiler Control Unit, see sensor h in Table 5. This sensor was not available before. However, the power consumption level of the boiler before the retrofit did not vary because the heating elements were always turned on jointly. The consumed power can be indirectly deduced from the power steps in the aggregate signal.
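The blocking rule that applies to the heat pump of house C from 2016-04-01 onward can be checked against the ripple control signal. The sketch below is one possible way to express that rule; the function name is hypothetical, the input is assumed to be a gap-free sequence of 0/1 values sampled at a constant interval (as C_hp_on_utility would be after removing missing data), and the example sequences are illustrative.

```python
from itertools import groupby


def blocks_are_followed_by_release(signal):
    """Check that every blocked run (0) is followed by an allowed run (1)
    of equal or longer duration.

    The last run of the signal is never used as the follow-up run, because
    it may be cut off by the end of the recording.
    """
    runs = [(value, len(list(group))) for value, group in groupby(signal)]
    for (value, length), (_, next_length) in zip(runs, runs[1:-1]):
        if value == 0 and next_length < length:
            return False
    return True


# Illustrative sequences: 1 = heat pump allowed to pull power, 0 = blocked.
compliant = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
violating = [0, 0, 0, 1, 0, 1, 1]
print(blocks_are_followed_by_release(compliant))  # True
print(blocks_are_followed_by_release(violating))  # False
```

In the second sequence, a blocked run of three samples is followed by an allowed run of only one sample, which violates the rule.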
House D
• The house is constructed as a "Zero-energy home" (Minergie-P), see Minergie (www.minergie.ch). That means that the calculated heating energy per inhabited square meter and year amounts to 33 kWh.
• The solar panels were disconnected between 2017-07-14 and 2017-10-18 because the panels were exchanged. The area and orientation of the installed panels did not change, but due to the improved efficiency, the peak power changed from 8.14 kW to 10.45 kW.
House E
• A new dishwasher was installed on 2017-09-01.
• A new washing machine was installed on 2017-12-15.
• The dehumidifier is located in the cellar. Its control logic ensures that the cellar is dehumidified at all times while maximizing the usage of excess solar power to this end. That means that every 10 min, the control logic checks if the relative humidity in the cellar exceeds an upper limit or falls below a lower limit and switches the dehumidifier on or off, respectively. Depending on the situation, different limits are applied. The limits are listed in Table 6.
Table 6. Situation dependent upper and lower limits employed in the control logic of the dehumidifier of house E.
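The two-limit switching of the dehumidifier can be sketched as a simple hysteresis controller. The sketch below is not the installed control logic: the function name, the two situations ('excess_solar' and 'no_excess') and all limit values are hypothetical placeholders and are not the values of Table 6.

```python
def dehumidifier_step(is_on, humidity, upper, lower):
    """One control step: switch on above the upper limit, off below the
    lower limit, and keep the previous state in between."""
    if humidity > upper:
        return True
    if humidity < lower:
        return False
    return is_on


# Hypothetical (upper, lower) limits in % relative humidity per situation.
LIMITS = {"excess_solar": (55.0, 50.0), "no_excess": (70.0, 65.0)}

# Illustrative readings, one per 10 min control step: (humidity, situation).
readings = [
    (60.0, "no_excess"),
    (60.0, "excess_solar"),
    (52.0, "excess_solar"),
    (49.0, "excess_solar"),
    (72.0, "no_excess"),
    (67.0, "no_excess"),
]

is_on = False
states = []
for humidity, situation in readings:
    upper, lower = LIMITS[situation]
    is_on = dehumidifier_step(is_on, humidity, upper, lower)
    states.append(is_on)
print(states)  # [False, True, True, False, True, True]
```

With lower limits during excess solar production, the dehumidifier in this sketch runs earlier when solar power is available, while the higher limits still keep the humidity bounded otherwise.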