4.1. Systems Architecture
The platform architecture encompasses both the training and the deployment specifications. To train and validate the model, we used an NVIDIA Tesla T4 [41], one of Google Colab’s [40] free-tier GPUs, which can be accessed by anyone at no additional cost. To select this GPU, with or without a notebook open, we open the “Runtime” menu in the top-left corner and click “Change runtime type”, where we can choose the T4. This allows us to connect to a “hosted runtime”, a cloud instance provided by Google with the chosen GPU. Finally, to connect to it, we simply click the “Connect” button in the top-right corner.
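Once the runtime is connected, a quick sanity check can confirm that the T4 was actually assigned. Below is a minimal sketch, assuming PyTorch is available in the Colab runtime (it is preinstalled by default):

```python
import torch

# Confirm that the hosted runtime exposes the requested GPU (e.g., "Tesla T4").
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; check Runtime > Change runtime type.")
```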
To deploy the trained model and allow users to test it with their own images, we created a simple integration between a FastAPI web server [43] and a Svelte app [44]. The FastAPI server acts as the back-end, where all processing occurs, while the Svelte app is the front-end, where the results are presented to the end user. For instance, when the user uploads an image through the front-end (Svelte app), the back-end (FastAPI) processes it and returns the detection to the front-end, following the flow shown in Figure 10.
To process the image, we expose YOLOv11 through a dedicated endpoint on the FastAPI server, which receives the image file, runs the detection, and returns the annotated image encoded in base64, so it can be consumed directly by the Svelte app without transferring files between services.
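A minimal sketch of such an endpoint is shown below, assuming the Ultralytics YOLO API; the weights file best.pt and the /detect route are hypothetical names used only for illustration:

```python
import base64

import cv2
import numpy as np
from fastapi import FastAPI, File, UploadFile
from ultralytics import YOLO

app = FastAPI()
model = YOLO("best.pt")  # hypothetical path to the trained YOLOv11 weights

@app.post("/detect")  # illustrative route name
async def detect(file: UploadFile = File(...)):
    # Decode the uploaded file into an OpenCV image.
    data = await file.read()
    image = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)

    # Run detection; plot() returns the image with the predictions drawn on it.
    results = model(image)
    annotated = results[0].plot()

    # Encode the annotated image as base64 so the Svelte front-end can render it
    # directly (e.g., in a data URL) without transferring files between services.
    ok, buffer = cv2.imencode(".jpg", annotated)
    encoded = base64.b64encode(buffer.tobytes()).decode("utf-8")
    return {"image": encoded}
```

On the Svelte side, the returned string can then be displayed by prepending data:image/jpeg;base64, to it in an image tag.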
To deploy an online demo of the app, we use a server with the specifications listed in Table 4. It is important to note that the specifications do not include a GPU: although a GPU would provide a significant boost to detection, the platform can run on a CPU alone, with the only downside being lower performance.
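Because the demo server has no GPU, inference runs on the CPU. A minimal sketch of forcing CPU inference, again assuming the Ultralytics API and hypothetical file names:

```python
from ultralytics import YOLO

# Load the trained weights and run a prediction explicitly on the CPU;
# without a CUDA device available, the library falls back to the CPU anyway.
model = YOLO("best.pt")
results = model("plate.jpg", device="cpu")
```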
4.2. Exploration and Results
In the previous sections, we presented the context, the dataset, the calculation formula, the YOLO benchmark focusing on version 11, and the platform architecture. Building on this structure, we developed a web platform that enables the exploration of this work in detecting food waste on plates. We tested the platform using 50 images of plates after meals and evaluated the model. Some of these images come from the dataset presented earlier and are illustrated in the following figure.
Figure 11 presents visual examples of the detection performed by the platform on different food plates. In Figure 11a,b,d, multiple objects on the plate are detected, with clustering applied to each identified item, such as cutlery, food, and waste. These images illustrate the platform’s ability to segment and correctly classify the various elements present on the plates after the meal. In contrast, Figure 11c shows only the result of clustering applied to the food, without individual object detection, serving as a comparative baseline of the platform’s operation under different analysis modes. As an illustration, for the four images presented in Figure 11, Table 5 presents the results regarding the food waste percentage and the wasted food.
In Figure 11a, the platform correctly identified the cutlery (knife and forks) and a cup; however, it incorrectly detected a plate with traces of soup, which likely led to an inaccurate waste percentage calculation. In Figure 11b, a plate with food remnants and utensils is observed, where the platform successfully detected, in addition to the cutlery and cup, the presence of garbage beneath the plate, demonstrating the model’s ability to correctly distinguish waste. In Figure 11c, the platform confidently identified the presence of French fries and rice, as well as a plate and a partially visible knife, highlighting its accuracy in detecting uneaten food. Finally, in Figure 11d, the detection covered a wide range of elements, including rice, garbage, potatoes, cutlery, and additional items such as a cup and a spoon, reinforcing the platform’s robustness even in more complex scenarios with multiple items on the plate.
The platform output illustrated in Figure 12 presents the detected image, the clustered image, and the waste calculation result. Additionally, the interface includes a section that describes how the calculation was performed. All images were processed by the proposed platform, as exemplified in Figure 12, where the object detection and clustering outputs are generated, followed by the calculation of the corresponding waste percentage. The resulting data from this process are presented in the table below, demonstrating the platform’s effectiveness across different real-world scenarios.
Additionally, we tested the platform with food plate images from the following datasets: TossIt Plates Computer Vision Project [45], Finding Defects Computer Vision Project [46], and FoodWasteDetectionV2 Computer Vision Project [47].
Some examples of these images are illustrated in the following figure. Three images from each dataset are presented, organized in the same order as the datasets mentioned above and grouped by row. As an illustration, for the nine images presented in Figure 13, Table 6 presents the results regarding the food waste percentage and the wasted food.
The individual analysis of each image further highlights the model’s performance in various scenarios. Figure 13a showed a low waste percentage, as the plate contained only small food remnants. Figure 13b,d,f resulted in 0% detected waste. In Figure 13b, the plate contained only a napkin, which was correctly classified as garbage. In Figure 13d, the plate showed no visible food leftovers. Finally, in Figure 13f, the plate contained only bones, which were also accurately identified as inedible waste. These cases demonstrate the platform’s ability to effectively distinguish between food, garbage, and non-food objects. In contrast, Figure 13c,e,g,i presented low to moderate waste percentages, reflecting the presence of small amounts of uneaten food, which the platform was able to detect with precision. Lastly, Figure 13h stood out with a 100% waste percentage, as the plate still contained the full meal, clearly indicating a case of complete non-consumption.
The platform output provides a visual representation of the platform’s processing workflow. On the left side, the interface displays the detected image with annotated objects and their respective classifications, along with a clustered version of the same image to highlight the spatial distribution of the identified items. On the right side, the interface presents a dedicated section that explains the waste calculation formula used by the platform. This section outlines the core principles of the computation, including which elements are considered (e.g., food area, inedible items such as garbage and utensils) and how the waste percentage is derived based on pixel areas. This combination of visual and analytical feedback allows users to understand both the qualitative and quantitative aspects of the food waste detection process.
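To make the pixel-area reasoning concrete, the sketch below shows one possible structure for such a computation; the class names and the mask handling are assumptions made for illustration and do not reproduce the platform’s exact formula:

```python
import numpy as np

# Hypothetical class names; the platform may use different labels.
INEDIBLE = {"garbage", "cutlery", "cup", "bone"}

def waste_percentage(masks: dict[str, np.ndarray], plate_mask: np.ndarray) -> float:
    """Estimate waste as the share of the plate's pixel area covered by leftover food.

    `masks` maps a class name to a boolean segmentation mask;
    `plate_mask` is a boolean mask of the plate itself.
    """
    plate_area = plate_mask.sum()
    if plate_area == 0:
        return 0.0

    # Count only leftover-food pixels inside the plate; ignore inedible items.
    food_area = sum(
        np.logical_and(mask, plate_mask).sum()
        for name, mask in masks.items()
        if name not in INEDIBLE
    )
    return 100.0 * food_area / plate_area
```

In the actual platform, the per-class pixel areas would come from the detection and clustering steps described above.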
Similarly, for the nine images presented in Figure 13, which come from external datasets, the same processing pipeline was applied using the proposed platform. As illustrated in Figure 14, the platform performed object detection, clustering, and subsequently calculated the estimated food waste percentage for each image. The results of this analysis are shown in the table below, providing further evidence of the platform’s applicability and robustness when tested on data sources different from the original dataset.
With the exception of some instances of misclassification, the images were generally processed correctly by the platform, and the obtained results largely reflect the actual food waste present on each plate, demonstrating the reliability and consistency of our approach. This alignment between automatic detection and real-world conditions is a key aspect of the platform’s effectiveness. To determine whether an image was correctly classified or misclassified, we validated the results by visual observation: if the plate is full of food, the platform is expected to report 100% FW; if it is half empty, 50%; and so on.
As a research work focusing on the applicability of CV models, particularly YOLOv11, this project naturally presents limitations associated with the base dataset and the performance of the YOLOv11 model. However, we applied the project in the most realistic context possible, and we consider the results promising. Additionally, we have paved the way for future research improvements, since the dataset, code, and instructions are all available in the GitHub repository.