Article

Strategies for Automated Identification of Food Waste in University Cafeterias: A Machine Vision Recognition Approach

1 Logistics Support Department, Capital Normal University, Beijing 100048, China
2 College of Resources Environment & Tourism, Capital Normal University, Beijing 100048, China
3 Beijing Bonade Technology Co., Ltd., Beijing 100040, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(9), 5036; https://doi.org/10.3390/app15095036
Submission received: 23 December 2024 / Revised: 10 April 2025 / Accepted: 29 April 2025 / Published: 1 May 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

To ensure the effective implementation of food waste reduction in college cafeterias, Capital Normal University developed an automatic plate recognition system based on machine vision technology. The system operates by obtaining images of plates (whether clean or not) and the diners’ faces through multi-directional monitoring, then employs several deep learning models for the automatic localization and identification of the plates. Face recognition technology links the identification results of the plates to the diners. Additionally, the system incorporates innovative educational mechanisms such as online feedback and point redemption to encourage student participation and foster thrifty habits. These initiatives also provide more accurate training samples, enhancing the system’s precision and stability. Our findings indicate that machine vision technology is suitable for rapid identification and location of clean plates. Even without optimized network parameters, the U-Net network demonstrates high recognition accuracy (MIOU of 68.64% and MPA of 78.21%) and ideal convergence speed. Pilot data showed a 13% reduction in overall waste in the cafeteria and over 75% user acceptance of the mechanism. The implementation of this system has significantly improved the efficiency and accuracy of plate recognition, offering an effective solution for food waste prevention in college canteens.

1. Introduction

Food waste is a global issue [1]. The United Nations Food and Agriculture Organization estimates that approximately 1.3 billion tons of food is wasted or lost annually, accounting for about a third of global food production. Globally, countries like the United States, France, and Japan have established strategies to combat food waste. China, in its efforts to curtail this issue, enacted the ‘Anti-Food Waste Law’ on 29 April 2021, legislatively addressing the problem of ‘waste on the tip of the tongue.’ Additionally, China’s Ministry of Education issued the ‘Action Plan to Stop Catering Waste and Cultivate Thrifty Habits,’ emphasizing the innovative use of technology to promote the ‘Clean Plate Campaign’ as a critical component of the educational sector’s food waste reduction efforts. In Chinese universities, addressing food waste in cafeterias is not only a matter of social norms and traditional virtues but also a question of food security. Reports indicate that China’s annual food waste equates to about 50 billion kg of grain, sufficient to feed approximately 350 million people for a year, or one-tenth of the nation’s total grain production [2,3]. Surveys reveal that an average university student in China wastes between 36.11 and 37.22 kg of food annually in cafeterias, with the national scale of food waste in university dining reaching 1.33 to 1.37 billion kg per year [4]. Such extensive waste not only fosters a culture of extravagance within campuses but also jeopardizes national resources and hinders the improvement of citizens’ quality of life. Hence, addressing campus food waste is urgent, and the ‘Clean Plate Campaign’ represents an effective approach to achieving conservation goals.
The implementation of the “Clean Plate Campaign” in university canteens requires efficient plate recognition technology and well-designed incentive mechanisms. Traditional approaches for assessing food waste behaviors typically rely on manual observation. While straightforward to implement, these methods are inherently subjective and labor-intensive, making them unsuitable for high-traffic environments like campus dining halls. In contrast, recent advancements in deep learning have provided more convenient and energy-efficient solutions. The potential of deep learning has been extensively demonstrated across domains: Bochkovskiy et al. [5] achieved 43.5% average precision (AP) on the MS COCO dataset using YOLOv4, significantly outperforming earlier YOLO architectures [6,7]; ResNet [8] and Faster R-CNN [9] laid technical foundations for real-time detection in complex scenarios; while Fully Convolutional Networks (FCNs) by Long et al. [10] balanced precision and efficiency in pixel-wise segmentation tasks. It is worth noting that semantic segmentation has shown significant advantages in high-precision industrial quality inspection. For example, Nguyen [11] achieved a Mean IoU of 94.14% in seam defect detection by combining DeepLabV3+ with the lightweight EfficientNet encoder, demonstrating its pixel-level localization ability for small targets. Semantic segmentation is particularly suitable for tasks involving the identification of object boundaries and detailed information. Food waste recognition, in essence, requires precise identification and localization of food residues and tray shapes. Therefore, semantic segmentation can accurately capture the distribution and shape of leftover food on trays through detailed pixel-level segmentation, providing data support for subsequent food waste quantification and intervention. However, defect detection in industrial scenarios typically relies on static, standardized image acquisition environments, whereas campus tray recognition must contend with diverse tableware and irregular food residue shapes, posing greater challenges for the model’s robustness. Deep learning has also been applied to food waste reduction through optimized classification. Khoshboresh-Masouleh and Shah-Hosseini [12] systematically reviewed its potential in food-related applications, and Mazloumian et al. [13] enhanced food categorization accuracy using deep models, demonstrating technical feasibility for waste reduction. However, systematic applications in campus food waste management remain scarce, particularly in integrating plate recognition with behavioral interventions.
This study addresses this gap by developing an automated plate recognition system through China’s “Logistics Education Credit Service Management Platform”, exploring deep learning’s potential to reduce food waste in university canteens. By combining object detection algorithms with incentive mechanism design, we demonstrate the technical feasibility and social necessity of applying deep learning to the Clean Plate Campaign. It is important to note that a food waste event in this study is defined as a diner failing to completely consume the food on their plate, i.e., significant edible leftovers remain on the plate. This standard focuses on the actual amount of food consumed, without considering other subjective factors, and provides an objective basis for quantifying food waste, supporting the promotion of the Clean Plate Campaign and the national “carbon peaking and carbon neutrality” (dual-carbon) goals. Our approach establishes a closed-loop framework connecting intelligent perception with behavioral nudges, offering scalable solutions for sustainable campus management.

2. The Automatic Plates Recognition System

The automated plate recognition system comprises five main components: data acquisition, data preprocessing, plate recognition, accuracy assessment, and feedback (see Figure 1).

2.1. Data Acquisition

Data acquisition forms the backbone of the entire system, providing imagery that serves as training and validation samples for the plate recognition network (refer to Figure 2). Multiple fixed cameras are installed at the waste collection points in each cafeteria (shown in the red box in the figure). Because the cameras are fixed and shoot from a relatively stable distance, they capture clear, detailed images while covering the limited variety of trays that pass through each collection point. The cameras record video of both the plate and the diner’s face, and time synchronization is used to identify and match targets across the multiple cameras.
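As an illustration of this time-synchronization step, the sketch below pairs each plate-camera frame with the nearest face-camera frame by timestamp. The function name, the `(timestamp, frame_id)` data layout, and the 0.5 s tolerance are assumptions for exposition, not the deployed system’s implementation.

```python
from bisect import bisect_left

def match_frames(plate_frames, face_frames, max_skew=0.5):
    """Pair each plate-camera frame with the nearest face-camera frame in time.

    plate_frames / face_frames: lists of (timestamp_seconds, frame_id),
    each sorted by timestamp. max_skew is the largest time offset (s)
    accepted as "the same dining event"; 0.5 s is an illustrative value.
    """
    face_times = [t for t, _ in face_frames]
    pairs = []
    for t, plate_id in plate_frames:
        i = bisect_left(face_times, t)
        # Consider the neighbours on both sides of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(face_frames)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(face_times[k] - t))
        if abs(face_times[j] - t) <= max_skew:
            pairs.append((plate_id, face_frames[j][1]))
    return pairs
```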
As illustrated in Figure 2, one camera is positioned directly above the waste collection point shown in Figure 2d to vertically capture plate imagery. Additional cameras are mounted on tripods beside the collection point in Figure 2c, primarily to capture the facial information of diners, aiding in matching plates with faces.

2.2. Data Preprocessing

Data preprocessing involves two key steps: interpretation marker establishment and dataset annotation. Interpretation markers, used to differentiate between clean and non-clean plates, are critical in reducing the subjectivity of human judgment and ensuring the accuracy of data annotation. Nearly 40 interpretation markers have been established based on extensive image data. The food waste determination standard combines visual quantification with threshold grading: when only inedible parts (such as bones or shells), seasonings (such as garlic, ginger, and scallions), or edible residue covering less than 15% of the tray remains, the tray is classified as a “clean plate” (clean state); conversely, if the coverage of edible leftover food (such as leftover rice or vegetables) reaches or exceeds 15%, it is classified as a food waste event (see Figure 3). This design addresses the FAO’s threefold requirements for food waste monitoring: edibility definition, operability quantification, and local adaptation. The specific 15% coverage threshold was validated through local experiments. The overall classification framework follows the authoritative standards for dining scenarios outlined in the FAO’s “Global Initiative on Reducing Food Loss and Waste” published in 2015, aligning with global collaborative efforts to reduce food waste.
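A minimal sketch of how this 15% rule could be applied to a per-pixel segmentation mask is shown below. The label codes (`TRAY`, `EDIBLE_RESIDUE`) and the treatment of inedible parts and seasonings as out-of-class pixels are assumptions, since the paper specifies only the threshold itself.

```python
import numpy as np

# Hypothetical label codes for a per-pixel segmentation mask.
TRAY, EDIBLE_RESIDUE = 1, 2  # inedible parts/seasonings fall outside these codes

def classify_plate(mask: np.ndarray, threshold: float = 0.15) -> str:
    """Apply the 15% coverage rule to one predicted mask.

    mask: HxW integer array of per-pixel class codes. A plate is a food
    waste event when edible residue covers >= `threshold` of the tray area
    (tray pixels plus residue pixels lying on the tray).
    """
    tray_area = np.count_nonzero((mask == TRAY) | (mask == EDIBLE_RESIDUE))
    if tray_area == 0:
        raise ValueError("no tray detected in mask")
    coverage = np.count_nonzero(mask == EDIBLE_RESIDUE) / tray_area
    return "food waste" if coverage >= threshold else "clean plate"
```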
In the dataset labeling process, first, one frame was extracted from the video recording every 0.5 s, and the images were screened by manual visual inspection, discarding blurred or duplicated low-quality images. Because the cameras were fixed, image quality was relatively stable and few images had to be discarded. Next, the screened images were imported into the Labelme v3.16.7 labeling software, and each image was labeled with clean and non-clean plates according to the preset labeling criteria, generating the corresponding .json annotation files. The annotation work was carried out by multiple annotators, with regular training and cross-checking of results among annotators to ensure standardization and consistency. Finally, all the .json annotation files were converted into .png format annotated images for training and accuracy verification of the deep learning network. A total of 2085 images were acquired and annotated, divided 4:1 between the training set and the validation set.
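For concreteness, the following sketch shows one way the Labelme .json files could be rasterized into .png class masks and split 4:1. The class name-to-ID mapping in `CLASS_IDS` is hypothetical, and the deployed conversion pipeline may differ (Labelme also ships its own conversion utilities).

```python
import json
import random
from pathlib import Path
from PIL import Image, ImageDraw

CLASS_IDS = {"clean_plate": 1, "non_clean_plate": 2}  # hypothetical label names

def json_to_mask(json_path: Path, out_dir: Path) -> None:
    """Rasterize one Labelme .json annotation into a .png class mask."""
    ann = json.loads(json_path.read_text())
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        polygon = [tuple(pt) for pt in shape["points"]]
        draw.polygon(polygon, fill=CLASS_IDS.get(shape["label"], 0))
    mask.save(out_dir / (json_path.stem + ".png"))

def split_dataset(items, train_ratio=0.8, seed=42):
    """Shuffle and split annotated images 4:1 into train/validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```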

2.3. Clean Plate Recognition Method

The process of plate recognition in the target imagery involves locating and identifying the status of the plates (clean or not clean). The system utilizes a deep neural network based on semantic segmentation for plate recognition, which essentially follows a two-step process: (1) inputting the training dataset into the network, which iteratively derives a fitting function; (2) feeding the test dataset into the network, where the fitting function computes the recognition results. Selecting a suitable semantic segmentation network from the many available is crucial. Considering both hardware and software capabilities and practical needs, we employed three mature networks: U-Net, PSPNet, and DeepLab V3+. In this study, we first compare the performance of the three networks on the clean plate recognition task through quantitative analysis and select the best-performing network for qualitative comparison. The main indexes used for the quantitative comparison are the convergence speed and recognition accuracy of the networks, followed by qualitative analysis to further demonstrate the recognition effect of each model across different environments and tableware states. In the qualitative analysis, two typical scenarios, simple and complex, are selected for evaluation: the simple scenario includes a single empty plate arrangement, while the complex scenario involves different types of dinner plates with food residue. Due to the limited number of plate image scenes actually acquired, the selection of experimental scenes is relatively small, but it covers the typical situations commonly found in the cafeteria environment and can effectively reflect the generalization ability of the models. The structure and characteristics of the three networks are as follows:
U-Net is a symmetric network with a simple structure that is capable of recognizing multi-scale features [14]. Its structure is shown in Figure 4.
The network structure is divided into three parts: downsampling, upsampling, and skip connections. During the encoding process, the network gradually reduces the image size through convolution and downsampling while extracting low-level features, such as the shape of the tray and the boundaries of food residues. During the decoding process, the network gradually restores the image size through upsampling and convolution, capturing high-level semantic features such as the specific location and shape of the leftover food. The skip connections in the middle fuse the features from the encoding and decoding stages, effectively preserving spatial information and improving the accuracy of tray details and food waste recognition. Through this structure, U-Net can efficiently extract key features necessary for clean plate recognition, enhancing the accuracy and robustness of object detection.
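To make the encoder-skip-decoder structure concrete, here is a minimal Keras U-Net sketch in the spirit of the description above. It is shallower than the original four-level U-Net for brevity, and the input size and three-class output (background, tray, residue) are assumptions rather than the paper’s exact configuration.

```python
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D, concatenate, Input
from keras.models import Model

def conv_block(x, filters):
    """Two 3x3 convolutions: the basic unit of both U-Net paths."""
    x = Conv2D(filters, 3, activation="relu", padding="same")(x)
    return Conv2D(filters, 3, activation="relu", padding="same")(x)

def build_unet(input_shape=(512, 512, 3), n_classes=3):
    """Minimal U-Net: encoder, decoder, and the skip connections between them."""
    inputs = Input(input_shape)
    # Encoder: downsample while extracting low-level features (tray shape, edges).
    c1 = conv_block(inputs, 64)
    p1 = MaxPooling2D()(c1)
    c2 = conv_block(p1, 128)
    p2 = MaxPooling2D()(c2)
    c3 = conv_block(p2, 256)  # bottleneck
    # Decoder: upsample back, fusing encoder features via skip connections.
    u2 = concatenate([UpSampling2D()(c3), c2])
    c4 = conv_block(u2, 128)
    u1 = concatenate([UpSampling2D()(c4), c1])
    c5 = conv_block(u1, 64)
    # Per-pixel softmax over the semantic classes.
    outputs = Conv2D(n_classes, 1, activation="softmax")(c5)
    return Model(inputs, outputs)
```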
PSPNet has also been applied to the clean plate recognition task, with one of its key features being the use of auxiliary loss functions to accelerate the network’s convergence speed [15]. This network structure can integrate image features from four different scales, as shown in Figure 5. Each sub-region generates feature maps of different sizes through average pooling, which are then upsampled and merged. In this way, PSPNet enhances the understanding of global contextual information, which is particularly beneficial for accurately capturing global patterns of food residues and the overall layout of trays in clean plate recognition, reducing the likelihood of mis-segmentation. Additionally, the multi-scale feature fusion in PSPNet improves the model’s robustness when handling complex scenarios, especially in tray recognition tasks, allowing it to better handle trays and food residues of varying sizes and shapes.
DeepLab V3+: With its encoder–decoder structure and edge segmentation enhancement features, DeepLab V3+ addresses the issue of feature loss caused by excessive sampling. In the encoding phase, the network expands the receptive field by introducing atrous convolution while avoiding loss of image resolution and increase in computational burden. In clean plate recognition, this enables the network to capture a wide range of information about the tray and its surrounding environment, improving the accuracy of food residue recognition. In the decoding phase, DeepLab V3+ processes the output feature tensor from the encoding phase through bilinear interpolation upsampling, gradually restoring the feature map to the original image size, as shown in Figure 6 [11]. Cross-layer connections further enrich the semantic and detailed information of the image [16], aiding in the precise identification of food residues on the tray and mitigating the loss of details caused by excessive sampling.
The essence of the semantic segmentation model lies in mapping the image to pixel-wise semantic labels. The specific modeling process is as follows: let the input image be $I \in \mathbb{R}^{H \times W \times 3}$, an RGB tray image. The semantic segmentation model learns to map the image $I$ to a semantic label map $Y \in \mathbb{R}^{H \times W \times C}$, where $C$ represents the number of semantic categories (in this case, tray and food residues). The loss function used during training is the pixel-wise multi-class cross-entropy, given as follows:

$$L_{CE} = -\sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{c=1}^{C} y_{ijc} \log(\hat{y}_{ijc})$$

where $y_{ijc}$ is the true label indicating whether the pixel at position $(i, j)$ belongs to category $c$, and $\hat{y}_{ijc}$ is the model’s predicted probability for that category.
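A direct NumPy transcription of $L_{CE}$, useful for checking the formula against framework built-ins (Keras and TensorFlow provide equivalent categorical cross-entropy losses):

```python
import numpy as np

def pixel_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Pixel-wise multi-class cross-entropy L_CE.

    y_true: one-hot labels, shape (H, W, C); y_pred: predicted
    probabilities of the same shape (summing to 1 over the C axis).
    """
    eps = 1e-12  # numerical floor so log(0) never occurs
    return float(-np.sum(y_true * np.log(y_pred + eps)))
```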
All networks were trained using Keras 2.1.5 with TensorFlow 1.13.2 as the backend on a 64-bit Ubuntu 16.04 system, powered by a 2.40 GHz CPU and an NVIDIA GeForce GTX 1080Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA). To determine the optimal hyperparameter configuration, this study conducted ablation experiments on the three models, systematically evaluating the effects of different learning rates, batch sizes, and iteration counts on model performance. The learning rates (1 × 10−2, 1 × 10−3, 1 × 10−4), batch sizes (16, 32, 64), and training iteration counts were tested through multiple rounds of experiments, with each hyperparameter combination evaluated individually to obtain the optimal configuration for each model.
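A sketch of the grid evaluation described above is given below; `train_and_evaluate` is a hypothetical harness that trains one configuration and returns its validation MIOU, and the epoch grid is illustrative since the tested iteration counts are not listed.

```python
from itertools import product

def grid_search(build_model, train_and_evaluate,
                learning_rates=(1e-2, 1e-3, 1e-4),
                batch_sizes=(16, 32, 64),
                epoch_counts=(40, 60)):
    """Evaluate every hyperparameter combination and keep the best by MIOU."""
    best = {"miou": -1.0}
    for lr, bs, epochs in product(learning_rates, batch_sizes, epoch_counts):
        miou = train_and_evaluate(build_model(), lr=lr, batch_size=bs,
                                  epochs=epochs)
        if miou > best["miou"]:
            best = {"miou": miou, "lr": lr, "batch_size": bs, "epochs": epochs}
    return best
```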

2.4. Accuracy Assessment

To evaluate the recognition accuracy of the system, we used the Mean Intersection Over Union (MIOU) and Mean Pixel Accuracy (MPA) as the key metrics [10]. The MIOU is defined as the ratio of the intersection to the union of the predicted and true value sets [17,18]. In contrast, the MPA represents the proportion of correctly classified pixels within each class, averaged across all classes:
$$U_M = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}$$

$$A_M = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}$$

Here, the MIOU is denoted $U_M$, the MPA is denoted $A_M$, and $k$ denotes the number of classes. $p_{ij}$ indicates the number of pixels that belong to class $i$ but have been misclassified as class $j$, whereas $p_{ii}$ represents the number of pixels correctly classified as class $i$.
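Both metrics follow directly from a pixel-level confusion matrix, as the short sketch below shows:

```python
import numpy as np

def miou_mpa(conf: np.ndarray):
    """Compute MIOU (U_M) and MPA (A_M) from a (k+1)x(k+1) confusion matrix.

    conf[i, j] = number of pixels of true class i predicted as class j,
    so the diagonal holds the correctly classified pixels. Classes absent
    from both prediction and ground truth would need explicit guarding.
    """
    tp = np.diag(conf).astype(float)
    per_class_iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)
    per_class_acc = tp / conf.sum(axis=1)
    return per_class_iou.mean(), per_class_acc.mean()
```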

2.5. Recognition Result Feedback

The recognition results processed through the aforementioned procedures are imported in near-real time into the “University Logistics Education Points Service Management Platform” of Capital Normal University. This platform is designed to deeply explore the educational aspects of logistics operations. It quantifies student behavior, such as participation in anti-food-waste initiatives and civilized dormitory competitions, and allocates corresponding points. To further reduce food waste, the platform optimizes management in the following ways: (1) Personalized feedback based on recognition results: the system matches each plate recognition result with the diner identified through facial recognition, links it to the student’s ID number and name, and automatically updates the student’s e-account. For users identified as “clean plate” users, the system credits points to their e-accounts as an incentive. For users identified as exhibiting food waste behavior, the system sends reminders to their e-accounts, advising them to take reasonable portions next time and encouraging them to pack leftovers to reduce waste. (2) Cafeteria management optimization: the system feeds the plate recognition results back to cafeteria managers to help them adjust the food supply structure based on actual data. For example, they may reduce the supply of fatty foods and appropriately increase carbohydrates to better meet students’ dietary needs, while optimizing serving strategies to reduce leftover waste. In addition, the points platform provides a comprehensive feedback mechanism that allows users to verify recognition results and raise objections or suggestions in a timely manner, ensuring the fairness and accuracy of the data. Users can also upload training samples on their own to continuously improve the system’s recognition accuracy. Students can check their point details at any time, review the point rules, and use points to offset part of their logistics service fees, enhancing their campus service experience. The platform has been integrated with other school management systems to achieve data sharing and interconnection, providing strong support for logistics management and education initiatives.
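The personalized feedback rule in point (1) reduces to a simple branch on the recognition result. The sketch below is illustrative only: the `account` interface and the point amount are assumptions, not the platform’s actual API or tariff.

```python
def apply_feedback(account, recognition: str, points: int = 5):
    """Illustrative feedback rule matching the platform logic described above.

    `account` is assumed to expose add_points() and send_message(); the
    5-point reward is a placeholder, not the platform's actual tariff.
    """
    if recognition == "clean plate":
        account.add_points(points)
        account.send_message("Clean plate verified — points credited.")
    else:
        account.send_message(
            "Leftovers detected. Please take smaller portions next time "
            "or pack leftovers to reduce waste."
        )
```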

3. Results and Discussion

3.1. Hyperparameter Tuning and Experimental Setup

To optimize the model performance, hyperparameter tuning was conducted for PSPNet, U-Net, and DeepLab V3+. Three key hyperparameters were tested: learning rate, batch size, and training iteration count. For the learning rate, values of 1 × 10−4, 5 × 10−5, and 1 × 10−3 were evaluated. The learning rate of 1 × 10−4 was selected for both PSPNet and U-Net, while DeepLab V3+ performed optimally with a learning rate of 5 × 10−5. The batch sizes tested were 16, 32, and 64. PSPNet and U-Net achieved the best performance with a batch size of 32, while DeepLab V3+ showed the best results with a batch size of 64. The final hyperparameter configurations are summarized in Table 1.

3.2. Quantitative Comparison

This section employs quantitative analysis methods to compare the performance of the three networks, focusing primarily on network convergence speed and plate recognition accuracy. Network convergence is typically assessed by observing changes in network loss values. If the loss value decreases with the increase in epochs and stabilizes near a certain value, the network is considered to have reached convergence. Generally, faster network convergence indicates superior performance. The loss value trends for the three networks used in the system are shown in Figure 7, where the red line represents the loss function using the validation set, and the blue line represents the loss function using the training set.
Overall, under the parameter setting of epoch = 60, all three networks reached convergence. Regarding training set loss values, the U-Net network started with the lowest initial loss (<0.7), PSPNet had the highest (>1.3), and DeepLab V3+ was in between. The training loss values of all three networks decreased with the increase in epochs. U-Net’s loss value changed relatively gradually and remained almost constant after epoch = 40. PSPNet and DeepLab V3+ showed significant reductions in the first 10 epochs, then slowed down, similarly stabilizing after epoch = 40. The validation set loss values followed a similar pattern: U-Net again had the lowest initial loss (<0.6), PSPNet the highest (>1.0), and DeepLab V3+ fell in between; all three networks’ validation losses remained largely unchanged after epoch = 40.
In summary, the training and validation loss values of all three networks stabilized after epoch = 40, indicating that each had reached convergence, and their convergence speeds were comparable.
The accuracy of plate recognition is the most direct reflection of network performance. Comparing the plate recognition accuracy of the three networks (Table 2), it was found that all three could effectively identify clean plates (MIOU > 60%, MPA > 75%). Among them, U-Net had the highest recognition accuracy, with MIOU and MPA values of 68.64% and 78.21%, respectively. PSPNet had the lowest accuracy, with a MIOU of 63.72% and MPA of 75.66%, and DeepLab V3+ fell in between. Overall, without any parameter optimization, the three networks achieved relatively high recognition accuracy (accuracy difference < 5%), demonstrating the suitability of deep learning technology for effective plate recognition.

3.3. Qualitative Comparison

This section provides a qualitative analysis of network performance, focusing primarily on the effectiveness of plate recognition. Due to space limitations, we will only showcase the recognition results of the U-Net network, which has slightly higher plate recognition accuracy compared to PSPNet and DeepLab V3+. Figure 8 and Figure 9 display recognition outcomes in various scenarios, featuring different types of dinnerware (bowls and plates) in various states (clean or not clean).
In the first scenario, which is relatively simple and contains only a single bowl in a clean state, the comparison between the annotated image and the original shows that the U-Net network can accurately identify the position of the bowl and precisely determine its status. Although the annotated image is not a perfect circle, the U-Net network still accurately recognizes the contour of the bowl. This result demonstrates the competent performance of the U-Net network in plate recognition tasks.
The second scenario is more complex, containing two targets: a plate and a bowl, with the plate clean and the bowl not. Here, the U-Net network accurately identifies the positions of both the plate and the bowl. Although some of the bowl’s pixels are misclassified, the object-level judgment remains correct (not clean) under the principle of area dominance, whereby the class covering the largest share of an object’s pixels determines its state. Notably, even when the plate’s color closely matches the background, the U-Net network can still accurately identify the plate and achieve an ideal recognition result. The misidentification of some bowl pixels may relate to an imbalance between bowls and plates in the training samples. Overall, the U-Net network demonstrates commendable recognition performance in complex scenarios.
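The area dominance rule can be stated compactly: an object’s state is the class covering the largest share of its pixels. A minimal sketch under that reading:

```python
import numpy as np

def dominant_state(mask: np.ndarray, object_pixels: np.ndarray) -> int:
    """Resolve a plate's state by area dominance.

    mask: HxW per-pixel class predictions; object_pixels: boolean HxW map
    of the pixels belonging to one detected plate or bowl. The class
    covering the largest share of the object's area wins, which absorbs
    scattered per-pixel misclassifications.
    """
    labels, counts = np.unique(mask[object_pixels], return_counts=True)
    return int(labels[np.argmax(counts)])
```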

3.4. System Application Effectiveness

The plate recognition system developed in this study has been successfully implemented in a campus cafeteria environment, demonstrating significant effectiveness. Over one year of operation, the system covered a total of 12,000 users, with a notable increase in the proportion of users reducing food waste through point-based incentives. As a result, overall food waste in the cafeteria decreased by 13%. User feedback data indicate that over 85% of users were satisfied with the point redemption mechanism, and more than 75% actively participated in data uploads and provided improvement suggestions, highlighting broad acceptance of the system’s incentive strategy. Furthermore, the proposed approach was recognized as one of the “Top 10 Best Practices in Energy and Resource Conservation for Public Institutions (2021–2022)” by the China National Government Offices Administration, further validating its feasibility and effectiveness in promoting green campus initiatives and reducing food waste.

4. Discussion

The results affirm the effectiveness of the automated plate recognition system and demonstrate the feasibility of using deep learning in the creation of green schools. However, it is important to note that relying solely on limited manually annotated samples may not achieve high accuracy in plate recognition. In practice, issues such as misidentification can arise. Three key factors may contribute to lower recognition accuracy: (1) limitations in network selection and optimization for the “plate recognition” task; (2) an insufficient number of annotated samples and suboptimal annotation accuracy, making it difficult for the network to find the optimal parameter set; (3) challenges in deployment and expansion: environmental factors like lighting and complex backgrounds may affect recognition performance, and the system still has limitations in detecting food waste locations. To improve recognition accuracy, future research should focus on improvements in the three areas mentioned above, such as optimizing the network structure, increasing the number of high-quality annotated samples, enhancing the system’s adaptability to complex environments, and improving its scalability.
The plate recognition network based on semantic segmentation has a unique feature: it processes input plate images on a per-pixel basis, identifying each pixel on the plate. Under conditions like limited sample size and unoptimized network parameters, this approach can lead to inconsistencies in pixel classification for the same target (see Figure 8). To address this, using a target detection-based plate recognition network could be a viable solution [19,20,21]. Unlike per-pixel recognition in semantic segmentation, target detection networks classify the entire object, effectively preventing inconsistencies within the same target. These networks can not only recognize object categories but also differentiate between objects of the same category, thus effectively distinguishing between similar but differently shaped objects and reducing the frequency of misclassification. Therefore, in future model optimization, employing target detection networks could ensure more comprehensive target recognition, avoiding misidentification of plates. This study performed necessary hyperparameter adjustments and optimizations through ablation experiments to ensure fairness between different methods [8,18], but a comprehensive exploration of hyperparameter optimization has not been conducted. Future research could further explore the application of automated hyperparameter optimization techniques, such as Bayesian optimization and neural architecture search, to enhance model robustness and recognition accuracy.
The problem of insufficient sample size has long been considered a key challenge hindering the full potential of deep learning technology. To tackle this, researchers have explored data processing methods like data augmentation to overcome difficulties in data collection and annotation [16,22]. In this study, although the current dataset covers common cafeteria scenarios (such as simple and complex-shaped individual plates), the wide variety of food items and diverse plate shapes in school cafeterias pose challenges in sample acquisition. Additionally, manual annotation remains time-consuming and prone to inconsistencies. When using the coverage threshold method for labeling clean and non-clean plates, variations in food types (e.g., rice, vegetables, soup) and different forms of food residues (e.g., scattered rice grains, intact vegetable pieces, or residue with soup) may introduce discrepancies in shape and color, potentially leading to less precise annotations. The current dataset has demonstrated satisfactory performance in the canteen of Capital Normal University. However, to ensure the model’s robustness across diverse university canteen environments, further dataset expansion is necessary. This should encompass variations in tray types (e.g., material, compartment design), background conditions at collection points (e.g., window color, lighting conditions), and representative food categories (e.g., staple food preferences, residue patterns) to account for regional differences. To minimize these challenges, the system, through the logistics education platform, provides a user feedback service for the ‘Clean Plate Campaign.’ It employs incentive mechanisms like points redemption to encourage users to upload interpretation markers and plate sample data, providing suggestions for improvement of inadequate markers. This approach not only enriched the sample data and improved the network’s recognition accuracy but also fostered high user engagement. According to user feedback, over 85% of users were satisfied with the point redemption mechanism, and more than 75% actively participated in data uploads and provided improvement suggestions. The high level of user involvement enhanced the sense of participation among students and staff, further contributing to the continuous refinement of dataset annotation. Compared to traditional methods like meal card exchange, online points redemption is more aligned with student needs. Automating the points system in place of manual operations reduces human, material, and time costs while ensuring objective recognition results. The rewards, transitioning from fixed meals to both material and spiritual incentives, effectively promote the progress of green school initiatives.
In the practical deployment and expansion of the system, some issues still exist. Not only does the accuracy of tray recognition and adaptability in complex scenarios need improvement, but the system also faces new challenges when extending to broader dining and waste disposal scenarios. Variations in cafeteria lighting can cause plate edges to blur or colors to distort, reducing system performance. To address this, adaptive illumination enhancement algorithms (e.g., LIME) [10] or multispectral data fusion can be integrated to improve robustness under complex lighting conditions. Additionally, the dynamic cafeteria environment presents challenges such as changing utensil placement angles, dishware variations, and food residues obscuring plate contours, all of which impact model performance. Introducing adaptive learning mechanisms could enable the model to automatically adjust parameters in response to new data, enhancing generalization ability. To address the issue of partial occlusion caused by improper tray placement, the camera array layout can be optimized to achieve multi-angle coverage. Additionally, a guided tray placement platform can be designed to standardize user behavior. However, when the tray is at an extreme tilt angle or partially obscured by highly reflective tableware, vision-based recognition systems still face a risk of misclassification. To address this, future research could integrate weight sensing units into the visual system to construct a multimodal perception network. By combining visual features and mass variation signals, a cross-validation mechanism could be established—for instance, triggering weight change analysis when the confidence of visual recognition falls below a threshold—thereby significantly enhancing the system’s fault tolerance in complex scenarios [23]. This multi-source information fusion strategy not only compensates for the limitations of a single sensor but also provides multi-dimensional data support for subsequent statistics on tableware recycling.
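The proposed confidence-triggered cross-validation could take the following form; every threshold here is an illustrative assumption, since the mechanism is proposed without specified parameters.

```python
def fuse_decision(visual_label: str, visual_conf: float,
                  weight_delta_g: float, conf_threshold: float = 0.8,
                  residue_weight_g: float = 30.0) -> str:
    """Sketch of the proposed vision + weight cross-validation rule.

    When the segmentation network is confident, its label stands;
    otherwise the tray's measured mass change (weight_delta_g, grams
    above the known empty-tray weight) arbitrates. All thresholds are
    illustrative placeholders.
    """
    if visual_conf >= conf_threshold:
        return visual_label
    return "food waste" if weight_delta_g >= residue_weight_g else "clean plate"
```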
In terms of system expansion, the current model primarily relies on data collected from fixed cameras in the canteen and has yet to cover broader dining and waste disposal scenarios (e.g., trash bins, dormitories). In the future, cameras or sensors could be deployed in waste collection areas and dormitories to enhance recognition capabilities. Additionally, the current model mainly focuses on leftover food on plates, overlooking the impact of individual and environmental factors on waste behavior. By analyzing these factors, the system can predict waste trends across different times, food types, and groups, improving adaptability and robustness. For instance, changes in ingredient consumption rates, diner fluctuations, and the effects of events or holidays can influence waste behavior. Integrating these data allows the system to adjust strategies in real time, optimizing resources and reducing food waste more effectively. Moreover, food texture and quality may directly influence waste behavior. Future research could incorporate user feedback on food texture, as well as the types and quantities of leftover food, into model training. By analyzing consumption patterns and residual characteristics of different ingredients, a more sophisticated multimodal recognition system could be developed. Integrating visual information with behavioral data could further enhance recognition accuracy, driving the development of a more intelligent and efficient green campus.

5. Conclusions

This paper has detailed the exploration and implementation efforts of Capital Normal University towards green school initiatives, focusing on the construction background, system composition, recognition results, and policy mechanisms of the automated plate recognition system. Leveraging deep learning technology, the system enables fast, automated recognition of clean plates with an accuracy of over 70%, significantly reducing the subjectivity of manual identification while improving recognition efficiency. Although the general deep learning models used were not specifically tuned for this task, they still perform well in plate recognition, demonstrating their applicability and potential in related scenarios. The system’s implementation has effectively promoted food conservation, reducing overall cafeteria food waste by 13%. Furthermore, the reward and incentive mechanism established on the logistics education platform fosters data accumulation and provides a channel for communication and feedback among students and staff, further supporting the improvement of plate recognition accuracy with additional data.
However, the system has areas for improvement: the interpretation markers are not comprehensive enough, the recognition accuracy for meals with less distinct features is lower, and the efficiency of the student feedback sample review mechanism could be enhanced. Future developments will focus on improving the quality of data acquisition, refining interpretation markers, diversifying reward and penalty measures, and enhancing the humanized management mechanisms. These improvements aim to elevate the ‘Clean Plate Campaign’ to a new level, promoting a more intelligent and humane approach to advancing green campus initiatives.

Author Contributions

Y.L. made substantial contributions to the conception and design of the work, participated in the acquisition and analysis of data, and contributed to drafting and revising the manuscript. C.Z. contributed to the development of the methodology, data interpretation, and technical implementation, and critically revised the manuscript for important intellectual content. H.X. participated in data collection and analysis, provided technical support for the study, and contributed to revising the manuscript. Y.Y. assisted in the development of software and the processing of experimental data and contributed to the analysis and interpretation of the results. H.L. contributed to the design and implementation of experiments, data processing, and revisions to the manuscript. L.D. (corresponding author) led the overall conception, supervision, and guidance of the research; took the lead in drafting, revising, and final approval of the manuscript; and is accountable for all aspects of the work, ensuring its accuracy and integrity. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author. The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy concerns and institutional restrictions related to the use of surveillance data in a cafeteria setting.

Conflicts of Interest

Author Yuantong Yang was employed by the company Beijing Bonade Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Chaudhary, S. Sustainable chitosan nanoemulsion coatings/films with agri-food byproducts: Advances, composition, production methods and applications in food preservation. Food Meas. 2024, 18, 1627–1649. [Google Scholar] [CrossRef]
  2. Dan, Z.; Fei, L.; Shengkui, C.; Xiaojie, L.; Xiaochang, C.; Zixin, L. The nitrogen footprint of different scales of restaurant food waste: A Beijing case study. Acta Ecol. Sin. 2017, 37, 1699–1708. [Google Scholar]
  3. Lingen, W.; Shengkui, C.; Gang, L.; Xiaojie, L.; Junfei, B.; Dan, Z.; Liwei, G.; Xiaochang, C.; Yao, L. Study on theories and methods of Chinese food waste. J. Nat. Resour. 2015, 37, 715–724. [Google Scholar]
  4. Qiang, Z.; Feng, L.; Zhuang, Q. A survey of canteen food waste and its carbon footprint in universities nationwide. J. Arid. Land Resour. Environ. 2020, 34, 49–57. [Google Scholar]
  5. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  6. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar] [CrossRef]
  7. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  9. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems; Curran Associates: Brooklyn, NY, USA, 2016. [Google Scholar]
  10. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef]
  11. Nguyen, Q.T. Defective sewing stitch semantic segmentation using DeeplabV3+ and EfficientNet. Intel. Artif. 2022, 25, 64–76. [Google Scholar] [CrossRef]
  12. Khoshboresh-Masouleh, M.; Shah-Hosseini, R. Development and evaluation of a deep learning model for real-time ground vehicle semantic segmentation from UAV-based thermal infrared imagery. ISPRS J. Photogramm. Remote Sens. 2019, 155, 172–186. [Google Scholar] [CrossRef]
  13. Mazloumian, A.; Rosenthal, M.; Gelke, H. Deep Learning for Food Waste Classification; Zurich University of Applied Sciences, Institute of Embedded Systems: Winterthur, Switzerland, 2022. [Google Scholar]
  14. Zunair, H.; Hamza, A.B. Sharp U-Net: Depthwise Convolutional Network for Biomedical Image Segmentation. Comput. Biol. Med. 2021, 136, 104699. [Google Scholar] [CrossRef] [PubMed]
  15. Chen, S.; Song, Y.; Su, J.; Fang, Y.; Shen, L.; Mi, Z.; Su, B. Segmentation of field grape bunches via an improved pyramid scene parsing network. Int. J. Agric. Biol. Eng. 2021, 14, 185–194. [Google Scholar] [CrossRef]
  16. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  17. Berger, L.; Hyde, E.; Cardoso, M.J.; Ourselin, S. An Adaptive Sampling Scheme to Efficiently Train Fully Convolutional Networks for Semantic Segmentation. Med. Image Underst. Anal. 2018, 894, 277–286. [Google Scholar] [CrossRef]
  18. Boonpogmanee, I. Fully Convolutional Neural Networks for Semantic Segmentation of Polyp Images Taken During Colonoscopy. Am. J. Gastroenterol. 2018, 113 (Suppl. S2), S1532. [Google Scholar] [CrossRef]
  19. Duan, K.; Xie, L.; Qi, H.; Bai, S.; Tian, Q. Corner Proposal Network for Anchor-free, Two-stage Object Detection. Comput. Vis. ECCV 2020, 12348, 399–416. [Google Scholar] [CrossRef]
  20. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. arXiv 2014, arXiv:1311.2524. [Google Scholar]
  21. Xiao, Y.; Lepetit, V.; Marlet, R. Few-shot Object Detection and Viewpoint Estimation for Objects in the Wild. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3090–3106. [Google Scholar] [CrossRef] [PubMed]
  22. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI); Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  23. Joshua, S.R.; Shin, S.; Lee, J.H.; Kim, S.K. Health to eat: A smart plate with food recognition, classification, and weight measurement for type-2 diabetic mellitus patients’ nutrition control. Sensors 2023, 23, 1656. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Architecture of the automatic plate recognition system.
Figure 2. Real-time data acquisition platform. (a) Overview of the real-time data acquisition system deployed at cafeteria waste collection points. (b) Fixed camera setup at the collection point, showing the typical field of view. (c) Side-mounted cameras on tripods capturing diners’ facial features for identity matching. (d) Top-view camera mounted directly above the collection area, capturing clear imagery of plates.
Figure 3. Interpretation markers of clean and non-clean plates.
Figure 4. Diagram of the U-Net network architecture.
Figure 5. Diagram of the PSPNet network architecture.
Figure 6. Diagram of the DeepLab V3+ network architecture.
Figure 7. Loss functions of different methods.
Figure 8. Simple scenario.
Figure 9. A complex scenario.
Table 1. Hyperparameter configuration for each model.

Name of the Network | Learning Rate | Batch Size
U-Net               | 1 × 10−4      | 32
PSPNet              | 1 × 10−4      | 32
DeepLab V3+         | 5 × 10−5      | 64

Table 2. Comparison of recognition accuracy.

Name of the Network | MIOU (%) | MPA (%)
U-Net               | 68.64    | 78.21
PSPNet              | 63.72    | 75.66
DeepLab V3+         | 65.51    | 77.64
