FIKWaste: A Waste Generation Dataset from Three Restaurant Kitchens in Portugal

In the era of big data and artificial intelligence, public datasets are becoming increasingly important for researchers to build and evaluate their models. This paper presents the FIKWaste dataset, which contains time series data for the volume of waste produced in three restaurant kitchens in Portugal. Organic (undifferentiated) and inorganic (glass, paper, and plastic) waste bins were monitored for a consecutive period of four weeks. In addition to the time series measurements, the FIKWaste dataset contains labels for waste disposal events, i.e., when the waste bins are emptied, and technical and non-technical details of the monitored kitchens. Dataset: https://www.doi.org/10.17605/OSF.IO/TYAJ6 Dataset License: CC-BY-4.0

The Future Industrial Kitchen (FIK) project (see https://futurekitchen.m-iti.org/, accessed on 25 February 2021) was carried out in Portuguese luxury hotels and the food preparation sector with the strategic aim of developing a next-generation industrial kitchen (IK) concept that uses Internet of Things (IoT)-enabled interactive technologies, optimized appliance arrangements, and re-imagined spatial, lighting, and equipment layouts to maximize workflow efficiency and the pleasure of the operating staff. One of the main goals of the FIK project was to understand the interactions between the consumption of electricity and water and the generation of waste in such spaces. To this end, electricity, water, and waste monitoring technologies were deployed in three restaurants for a consecutive period of four weeks [12].
This data descriptor presents the data collected through the real-time monitoring of waste generation and waste bin disposal in the scope of this project. More precisely, organic (undifferentiated) and inorganic (glass, paper, and plastic) waste bins were monitored in three IKs for a consecutive period of four weeks.

Relation to Prior Research
One of the most studied waste management research topics is the ability to automatically detect the fill level of waste bins, as this provides valuable inputs to various stakeholders (e.g., waste collection services, building managers, and even policymakers). Broadly speaking, the vast majority of the works on waste bin level detection can be divided into one of two categories: (1) image based (e.g., [1-3]) and (2) distance based (e.g., [4,5,7,8]).
Image based approaches rely on sequences of overhead images of the waste bins and image processing algorithms. The most common algorithms are waste bin detection and waste bin level classification. The former aims at finding waste bins in new images, whereas the latter attempts to classify the waste level in the identified waste bins. Image based solutions provide very good performance, e.g., in [2], the authors reported an average bin detection rate of 97.5% and a waste level classification rate of 99.4%. Nevertheless, such solutions are considerably expensive as they require capturing overhead images and heavy processing algorithms that generally need to run on the cloud or on the edge [13].
Distance based approaches provide a less expensive solution since they rely mostly on ultrasonic range sensors whose price can be as low as 1 to 40 Euros, depending on the required accuracy. Furthermore, these approaches rely mostly on signal processing algorithms applied to the time series of measured distances, which can often run on embedded devices. Distance based solutions also provide very good accuracy values concerning distance measurements. For example, Reference [5] reported an average deviation between manual and system readings of less than 1 cm, whereas in [8], the authors reported a 2-3 cm accuracy. However, the main drawback of such solutions is that they rely on batteries to work. Therefore, it is necessary to find the right trade-off between the rate of measurements and the lifetime of the battery. In this regard, in [8], the authors reported that by taking measurements every 15 min, the theoretical lifetime of their sensor node would be up to 500 days.
With respect to IKs, the few existing works mainly focus on understanding how to reduce food waste. For example, the work in [10] reported efforts to characterize the waste generated by a restaurant in a touristic area of Central Italy. The obtained results show that food alone (organic waste) is responsible for over 28% of the total waste generation. Another example is the work from Silvennoinen et al. [11], where the authors monitored and studied food waste in 51 Finnish food service outlets. According to this research, about 17.5% of the produced food ended up as waste.
Interestingly, while these two works relied heavily on the quantification of waste generation, they did not use any automatic monitoring strategies. Instead, the amounts of generated waste were monitored following manual processes that relied on report cards. For instance, in [11], the participants had to produce daily reports of the amounts of food prepared, kitchen waste, serving waste, customer leftovers, and the number of customers. While none of these works reported the reasons for using manual strategies, this is possibly due to the lack of reliable solutions for that purpose. Thus, it is fair to assume that further research in automatic waste monitoring is necessary, particularly in industrial contexts such as IKs.

Relation to Prior Datasets
A typical dataset for image based approaches would consist of labeled waste bin images. More precisely, at least two labels would be necessary: (1) the position of the waste bin (for detection algorithms) and (2) the fill level (for waste-level classification). In contrast, a typical dataset for distance based approaches would consist of time series measurements of the distances measured by the sensor and the corresponding volume represented. Since the fill levels are obtained directly from the measurements, it is not mandatory to have labels with the waste levels.
Although several research works exist in the field of waste management, to the best of our knowledge, there are not many publicly available datasets. This situation contrasts with other fields that have seen enormous efforts to release public datasets in recent years, e.g., electricity [14] and water [15].
A search on the data.world website (see https://data.world/, accessed on 20 January 2021) for the keywords "waste", "bin", and "industrial" returned 95, 3, and 2 results, respectively. Of these, none contained the keywords "restaurant" and "kitchen". In contrast, the keyword "household" was associated with ten datasets. We thus believe that FIKWaste represents a very good and unique contribution to the waste monitoring and management research field as concerns distance based approaches, since this was the methodology used in the FIK project.

Methods
One of the critical features of waste monitoring and management is keeping track of the waste generation and informing when to clean waste bins. This implies having the ability to track, in near real time, how much waste is in the containers and to detect significant changes in this value (e.g., [16]). This section provides an overview of the data collection process that led to the creation of the FIKWaste dataset.

Data Collection Setup
In the FIK project, waste monitoring was performed using ultrasonic range finders (see https://www.acmesystems.it/HC-SR04, accessed on 25 February 2021) installed on the lids of the waste bins. This solution is widely used in waste management research (e.g., [4,7]) and keeps track of the volume of waste by measuring the distance between the containers' lids and their contents. Figure 1 shows the sensor used and an illustration of the working principle.
In order to proceed with the data collection, a bespoke monitoring platform was developed. The main components of the platform are illustrated in Figure 2. From left to right, the sensor nodes scan the waste bins and send the data to a local gateway using the MQ Telemetry Transport (MQTT) protocol (see https://mqtt.org/, accessed on 25 February 2021). The data are stored locally before being uploaded to the Internet using the standard Hypertext Transfer Protocol Secure (HTTPS).
Figure 3 shows the block diagram with the different components of the sensor nodes. In high-level terms, the data acquisition algorithm works as follows. The data acquisition software running in the NodeMCU (see https://www.nodemcu.com/index_en.html, accessed on 25 February 2021) takes distance readings from the ultrasonic sensor at a predefined interval of M minutes, during S seconds. The median of the readings taken during these S seconds is then compared to the actual distance between the sensor and the bottom of the waste bin to assess whether the lid is open or closed. Median values above this distance indicate that the waste bin was open during the measurements, and such measurements were thus discarded. In this case, new measurements were taken during the next S-second interval. Otherwise, the valid measurement was sent to the gateway using the MQTT protocol. A Real-Time Clock (RTC) was used to keep track of the time in each sensor node. By default, the values for M and S were set to 1 and 5, respectively. Nevertheless, these can be given as inputs to the data acquisition algorithm.
Figure 4 shows the block diagram with the different components of the gateway. The gateway was placed close to the sensor nodes to ensure proper communications using the MQTT protocol. This device is responsible for collecting, storing, and uploading the measurements to locations on the Internet. More precisely, every minute, the most up-to-date measurements were uploaded to an online database server to provide third-party entities with near-real-time access to the data. Moreover, every day at 12:00 AM, a Comma Separated Values (CSV) file with the daily readings was uploaded to a shared folder. Upon successful upload, the local database was cleaned to keep its footprint as light as possible at all times.
Since the gateway was connected to the Internet, it was not necessary to install an RTC for clock synchronization. Instead, the Network Time Protocol (NTP) was used. Finally, a 3S lithium battery was used to allow deployments in places without a power connection and to avoid data losses in case of a power outage since the sensor nodes did not have storage capabilities.
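The acquisition step of the sensor nodes described in this section can be sketched as follows. This is only an illustration in Python, not the firmware that ran on the NodeMCU: the function name, the `read_distance_cm` callback, the number of readings per burst (one reading per second is assumed for S = 5), and the `max_bursts` retry limit are all hypothetical.

```python
from statistics import median
from typing import Callable, Optional


def acquire_measurement(
    read_distance_cm: Callable[[], float],
    bin_depth_cm: float,
    readings_per_burst: int = 5,  # assumption: one reading per second, S = 5
    max_bursts: int = 3,          # hypothetical retry limit, not from the paper
) -> Optional[float]:
    """Return the median distance of the first burst taken with the lid closed.

    A burst whose median exceeds the known distance between the sensor and
    the bottom of the bin is treated as "lid open" and discarded, and a new
    burst is taken. Returns None if every burst was discarded.
    """
    for _ in range(max_bursts):
        burst = [read_distance_cm() for _ in range(readings_per_burst)]
        burst_median = median(burst)
        if burst_median <= bin_depth_cm:
            return burst_median  # valid reading, ready to be sent to the gateway
    return None
```

Using the median rather than the mean keeps a single spurious echo within a burst from shifting the reported distance.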

Deployments
The monitoring platform was deployed in three restaurant kitchens for up to four weeks in each kitchen. The details of each kitchen are summarized in Table 1. To extend the duration of the battery charge, in the deployments of Kitchens 2 and 3, it was decided to set the value of M to five minutes. Furthermore, the sensor nodes were programmed to only capture data during the kitchens' working hours. Using this setup, the 1S battery would last four days on average, whereas the setup used in Kitchen 1 lasted only two days on average. Figure 5 shows the hardware prototypes that were deployed and an example of the sensor node installed in one of the waste bins.

Data Preprocessing
Despite the initial assumption that waste bin disposal events would be represented by volume values very close to zero, when deploying the sensor nodes, it was found that the empty waste bags were not usually totally stretched. As such, a volume of zero was not very common even after a disposal event. Furthermore, it was found that on many occasions, a decrease in the monitored volume did not represent a disposal event, representing, instead, periods when the kitchen staff adjusted the waste bags.
Therefore, in order to collect ground-truth information on the times that the waste bins were emptied, a webcam was placed in the direction of the bins. The videos were then analyzed to label the measurement data with this information. The three authors examined the video and selected the points in time when the waste bins were emptied.
Unfortunately, due to some technical issues, the video recordings were not available all the time. Therefore, part of the labeling was performed manually. To this end, each of the three authors provided labels for the measurements where no video was available. The labels from the three authors were then compared, and only those selected at least twice were considered. The remaining labels were discarded.
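The agreement rule for the manual labels (keep a label only if at least two of the three authors selected it) can be sketched as below. This is an illustration only: the function name is hypothetical, and it assumes that each annotator's labels are timestamps taken from the measurement file, so that agreement means an exact timestamp match.

```python
from collections import Counter
from typing import Iterable, List


def consensus_labels(*annotator_labels: Iterable[str], min_votes: int = 2) -> List[str]:
    """Keep the timestamps selected by at least `min_votes` annotators."""
    votes: Counter = Counter()
    for labels in annotator_labels:
        # set() ensures an annotator counts at most once per timestamp
        votes.update(set(labels))
    return sorted(ts for ts, count in votes.items() if count >= min_votes)
```

For example, with the hypothetical selections `["10:05:00", "12:30:00"]`, `["10:05:00", "15:45:00"]`, and `["12:30:00", "10:05:00"]`, only `10:05:00` and `12:30:00` are kept.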

Data Description
The FIKWaste dataset is made available individually for each monitored kitchen, and all the data files are in CSV format. Figure 6 shows an overview of the underlying organization of the FIKWaste data. The following subsections describe the contents of the different files.

Measurements Data
The measurement files (measurements.csv) contain the measurements taken from the waste bins. These measurements are provided in raw form, i.e., as they were measured by the sensors. The respective volumes are calculated using Equation (1):
$$V = A_{base} \times (H_{bin} - H_{sensor}) \quad (1)$$
where $A_{base}$ is the area of the base, $H_{bin}$ is the height of the bin, and $H_{sensor}$ is the height measured by the sensor. The underlying fields of the measurements files are described in Table 2.

Table 2. Column descriptions for the measurements files (measurements.csv).

| Column | Description | Units |
|---|---|---|
| timestamp | The timestamp at which the sensor was activated | datetime |
| distance | The measured distance between the sensor and the waste | cm |
| volume | The corresponding volume of waste | % |

Table 3 presents a snippet of the raw waste measurements data, in this case for the undifferentiated waste bin from IK 3. Note that at 16:14:46, the volume of waste was lower than in the previous moments, which indicates a potential waste disposal event at 15:50:13.
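A minimal sketch of the volume computation, assuming the waste volume is the base area multiplied by the filled height (bin height minus the measured distance). The function names are hypothetical, and both the clamping of out-of-range readings and the percentage form (which assumes a bin with a constant cross-section) are assumptions made for illustration.

```python
def filled_height_cm(distance_cm: float, bin_height_cm: float) -> float:
    """Height of the waste column, clamped to the range [0, bin height]."""
    return min(max(bin_height_cm - distance_cm, 0.0), bin_height_cm)


def waste_volume_cm3(distance_cm: float, bin_height_cm: float, base_area_cm2: float) -> float:
    """Absolute waste volume: base area times filled height."""
    return base_area_cm2 * filled_height_cm(distance_cm, bin_height_cm)


def fill_percentage(distance_cm: float, bin_height_cm: float) -> float:
    """Fill level as a percentage of the bin height.

    For a constant cross-section, the base area cancels out, so the
    percentage depends only on the two heights.
    """
    return 100.0 * filled_height_cm(distance_cm, bin_height_cm) / bin_height_cm
```

With a hypothetical bin that is 80 cm tall and has a 1000 cm² base, a measured distance of 30 cm gives a filled height of 50 cm, i.e., 50,000 cm³ or 62.5% of the bin.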

Labels Data
The label files (labels.csv) identify the periods when the kitchen staff emptied the waste bins. The underlying fields are described in Table 4.

Table 4. Column descriptions for the label files (labels.csv).

| Column | Description | Units |
|---|---|---|
| timestamp | The corresponding timestamp in the waste measurements file | datetime |
| volume | The corresponding volume of waste at this timestamp | % |
| source | The source of this label (V: Video, H: Human) | text |

Table 5 presents the first five waste disposal labels for the undifferentiated waste bin from IK 3. As can be observed, the first record indicates a disposal event at 15:50:13.
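Since each label refers to a timestamp in the measurements file, the two files can be combined by matching on that column. The sketch below is illustrative only: it assumes comma-delimited files with a header row using the column names listed in the tables, it treats timestamps as opaque strings because their exact format is not specified here, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_csv_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def mark_disposals(
    measurements: List[Dict[str, str]],
    labels: List[Dict[str, str]],
) -> List[Dict[str, object]]:
    """Annotate each measurement row with the disposal label, if any."""
    source_by_timestamp = {row["timestamp"]: row["source"] for row in labels}
    return [
        {
            **row,
            "disposal": row["timestamp"] in source_by_timestamp,
            "label_source": source_by_timestamp.get(row["timestamp"]),
        }
        for row in measurements
    ]
```

To work with the files on disk, the text passed to `read_csv_rows` can be obtained with `open(path, newline="").read()`.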

Deployments
The deployment file (deployments.csv) contains technical and non-technical details of each deployment. The underlying fields are described in Table 6. Note that the start and end dates refer to the date of the first and last measurements in the waste bins of each kitchen, respectively. These dates do not necessarily correspond to the start and end dates in Table 1 since these correspond to the start and end of the FIK monitoring campaigns.

Table 6. Column descriptions for the deployment file (deployments.csv).

| Column | Description | Units |
|---|---|---|
| | Kitchen identifier | number |
| service | Type of service provided (breakfast, lunch, dinner) | text |
| area | Area of the kitchen floor | m² |
| capacity | Maximum number of simultaneous customers | number |
| has_glass | If the glass waste bin is monitored or not | binary |
| glass_volume | Total volume of the glass waste bin | m³ |
| has_paper | If the paper waste bin is monitored or not | binary |
| paper_volume | Total volume of the paper waste bin | m³ |
| has_plastic | If the plastic waste bin is monitored or not | binary |
| plastic_volume | Total volume of the plastic waste bin | m³ |
| has_undifferentiated | If the undifferentiated waste bin is monitored or not | binary |
| undifferentiated_volume | Total volume of the undifferentiated waste bin | m³ |
| start | Date of the first measurement across all the waste bins | datetime |
| end | Date of the last measurement across all the waste bins | datetime |

Data Exploration
The number of measurements for the different waste bins in the monitored kitchens is presented in Table 7. As can be observed, the number of samples was much higher in Kitchen 1 since the data were collected every minute. In contrast, in Kitchens 2 and 3, data were only collected every five minutes.
Table 7. Total number of records per waste bin in each of the monitored kitchens. The heatmap from blue to red indicates the data availability (dark blue: more data; dark red: less data).
Figure 7 shows the distribution of the waste volume in each of the monitored bins. As can be observed, Kitchen 1 tended to have lower volumes of waste in the bins. It is also interesting to observe that the monitored volumes for paper and plastic in Kitchens 2 and 3 never reached a value close to zero. There are two reasons for this effect. First, when placing the empty bags, it is not common to fully stretch them. As such, despite the bags being empty, the distance measured by the sensor does not correspond to a volume of zero. Second, the low weight of the waste items (particularly plastic and paper) prevents them from settling at the bottom of the bin. A potential solution to mitigate this issue would be to add additional ultrasonic sensors around the lid of the waste bin and compute a more robust distance value by combining the different measurements.

The total number of labels is presented in Table 8. Please note that since only events labeled by at least two of the authors were considered, some waste disposal events are not labeled.
Table 8. Total number of labeled waste bin disposals in each of the monitored kitchens. The heatmap from blue to red indicates the label availability (dark blue: more labels; dark red: fewer labels).

Figure 8 illustrates the waste bin volume measurements supplemented with the labeled disposal events. The dotted line indicates periods for which there were no data available. This can happen because the bin was being emptied (and therefore not sending measurements to the gateway), because the measurements were discarded due to the opening of the lid (as mentioned in Section 2.1), or because the sensor node ran out of battery (as was the case with the glass and paper bins between 12 PM on 13 March and the afternoon of 14 March). It can also be observed that the volume measurements for plastic and paper are generally more unstable than those for the other materials. In contrast, the measurements for heavier materials like glass and organic waste are much more stable. It is therefore suggested that some sort of filtering be implemented. Figures 9 and 10 illustrate the effect of different window sizes when applying rolling median filtering to the plastic and paper waste bins from Kitchens 1 and 3. In both cases, the filters did a very good job of removing the noise and highlighting edges in the signal. However, it is important to remark that for Kitchens 2 and 3, a window of 31 samples would cause significant delays in the signal.
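A rolling median filter of the kind discussed above can be sketched in a few lines. This is an illustration, not the implementation used to produce the figures: the function name is hypothetical, and a trailing (causal) window that shrinks at the start of the series is assumed, since the alignment of the window is not specified.

```python
from statistics import median
from typing import List, Sequence


def rolling_median(values: Sequence[float], window: int) -> List[float]:
    """Median of the current sample and the (window - 1) samples before it.

    At the start of the series, where fewer than `window` samples are
    available, the median of the available samples is used.
    """
    if window < 1:
        raise ValueError("window must be a positive number of samples")
    filtered = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        filtered.append(median(values[start:i + 1]))
    return filtered
```

With a trailing window, a step in the signal is reproduced roughly half a window late. A 31-sample window spans 31 minutes at one sample per minute but 155 minutes at one sample every five minutes, which is consistent with the delays noted for Kitchens 2 and 3.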