1. Background and Literature Review
Rapid urban growth and shifting consumption patterns in Panama, intensified after the Canal expansion, have increased both the volume and complexity of urban waste streams [1]. Historically, disposal relied on burial and incineration when materials were largely biodegradable [2]. Current realities are more demanding: disposal infrastructure and public education on waste reduction lag behind needs, and Panama City’s Cerro Patacón landfill has faced irregular collection, fires, and environmental impacts despite gas-capture efforts [1,3,4]. Nationally, the legal framework has advanced in line with the Basel Convention and EU directives on Waste Electrical and Electronic Equipment (WEEE) and single-use plastics [5,6,7], and Panama adopted laws on hazardous-waste import bans and integrated solid-waste management that call for data-driven strategies and stakeholder participation [8,9,10,11]. However, implementation gaps persist in institutional settings such as universities [12].
Higher Education Institutions (HEIs) generate diverse waste streams from teaching, labs, offices, residences, and food services, making them strategic arenas to model sustainable practices [13]. Studies on campuses report organic residues from cafeterias, paper and e-waste from offices, and hazardous waste from labs, underscoring the need for accurate quantification and tailored interventions [14,15]. In Panama, the Universidad Tecnológica de Panamá (UTP) faces typical pressures as student numbers and activities grow across 19 buildings with intensive laboratory and food-service operations [16,17]. Local studies have explored cafeteria recycling and reuse programs, incentive-based schemes, “zero-waste” planning, and energy densification of organics [12,17,18,19,20]. These initiatives inform campus operations but generally do not deploy predictive modeling to anticipate waste dynamics.
Early university studies centered on measuring and classifying waste to provide a planning baseline [13,21]. A recent bibliometric review identified a thematic shift from “characterization/collection” toward “management,” “sustainability,” and “circular economy,” reflecting a move to integrated, evidence-based decision making [22]. In parallel, work on municipal and national systems has matured, offering methods and tools that could, in principle, be adapted to campuses [23]. Yet HEI-specific evidence, especially in Latin America, remains sparse, limiting context-appropriate design and calibration of models [22].
A diverse array of quantitative frameworks underpins modern forecasting and system design, ranging from classical statistics to advanced computational modeling:
Statistical indicators and inference: Proportion-based metrics (e.g., Waste Category Proportion) and chi-square tests help examine relations between user groups and waste composition [24,25].
Regression models: Multiple Linear Regression (MLR) is widely used to estimate generation from socioeconomic and operational variables, including event-based contexts [26,27,28].
Optimization: Mixed-integer linear programming and related models support collection-route efficiency, siting, and multi-objective planning under uncertainty [29,30,31,32,33,34].
Dynamic/system models: System-dynamics formulations capture feedback and temporal evolution for policy testing and financial analysis [23,35,36].
Machine learning and hybrid methods: Recent studies apply neural networks, ensembles, and fuzzy optimization to forecast waste with limited histories and high variability (e.g., applications in campus dining and municipal flows), often achieving high predictive accuracy but focusing mainly on operational optimization rather than social drivers [22,37,38].
A growing body of evidence demonstrates that attitudinal factors, social norms, and institutional roles are primary drivers of participation and waste separation outcomes. Structured surveys have been used to capture knowledge, habits, and perceptions, and to link them to generation or diversion [24,39]. Agent-based modeling grounded in the Theory of Planned Behavior has been applied to simulate participation under different recycling schemes [40]. Despite this progress, most models still treat behavior and operations in isolation, which limits their use for interventions in complex institutional settings like HEIs [22]. For campuses, a key opportunity is to connect recovery data (e.g., organics, e-waste) with predictive modeling to plan infrastructure and behavior-change programs jointly, an integration that remains rare in Latin America [41].
However, three critical gaps limit the practical application of existing research in real-world campus settings at Latin American HEIs.
First, there is a lack of integrated shift-level analysis. Most campus waste studies rely on daily or weekly aggregates, overlooking intra-day dynamics that directly inform operational decisions. Without understanding how waste accumulates across meal periods, such as breakfast, lunch, and dinner, timely interventions, including adaptive collection schedules or peak-period targeting, remain difficult to implement.
Second, persistent silos exist between data types. Waste characterization, behavioral surveys, and predictive modeling are often conducted independently, limiting their combined value. As a result, behavioral insights rarely inform operational models and observed waste patterns lack explanatory context regarding user behavior.
Third, many existing models are poorly suited to institutional constraints. Although advanced machine-learning approaches show promise, they typically require long time series, substantial computational resources, and specialized expertise that many universities, especially in Latin America, lack. This creates a clear need for simpler, interpretable tools that facility managers can apply directly.
This study is designed to address these gaps directly by integrating (i) 20 days of detailed waste characterization across shifts in a university cafeteria with (ii) 772 questionnaires (705 valid responses) on demographics and behavioral factors. Based on this integrated dataset, two complementary predictive tools were developed: an automated rule-based classifier (contingency tables) and a Monte Carlo simulation based on a polynomial regression function with uncertainty. The proposed approach provides low-cost, scalable decision support that links operational and behavioral dimensions in a Latin American HEI context, aligning with national mandates on integrated solid-waste management and with SDG 12 on responsible consumption and production.
Building on the above, this work aims to statistically model solid-waste flows in a university cafeteria, forecast future generation, and assess management strategies.
RQ2. Which demographic and behavioral factors (e.g., user role, faculty, meal type) are associated with variations in waste levels?
2. Materials and Methods
2.1. Study Site and Scope of Analysis
This study was carried out at the central campus of the Universidad Tecnológica de Panamá (UTP), located in Panama City. The analysis focused on the cafeteria of Building No. 1 (Figure 2), an institutional facility that provides food services to students, academic staff, administrative personnel, and external users. The cafeteria operates during three defined service shifts: breakfast (7:00–10:00), lunch (11:00–14:00), and dinner (16:00–18:00). During the semester, this space maintains a high user flow throughout the day and plays a central role in campus dynamics.
The study site meets three key design requirements: (i) a stable three-shift service schedule (breakfast, lunch, dinner); (ii) high but variable user throughput, which enables the identification of within-day peaks; and (iii) managerial access to shift records and permission to conduct repeated measurements across consecutive working days. Together, these conditions ensure the user-flow signal is observable and that the data-collection protocol can be implemented under routine operational conditions. For clarity, this case study is intended as an operational “typical-case” setting for developing and validating a replicable workflow, rather than as a statistically representative sample of all HEIs. The procedure is designed for replication in facilities with comparable structural conditions; the findings are not intended to be generalized beyond this analytical frame.
This specific cafeteria was selected after conducting preliminary visits and interviews with operational staff, who confirmed that it is one of the highest generators of solid waste on campus. Several factors supported its selection: its high and consistent activity level, a well-defined waste collection area that could be monitored regularly, and a structured service schedule that allowed observation. These conditions enabled a focused study design with controlled variables, reducing logistical noise that often complicates broader institutional characterizations. Because it functions as a high-throughput, multi-user institutional canteen with routine service patterns, the site is suitable as a sentinel facility for testing shift-level predictive tools under real operating conditions.
The scope of the study was limited to this facility and did not include informal food vendors or satellite areas. The objective was not to generalize to the entire university system, but rather to develop a detailed understanding of solid waste flows in a regulated institutional setting. By isolating one service unit with consistent operations, the study aimed to explore relationships between waste generation, cafeteria use, and user behavior—insights that can inform the design of management strategies and predictive tools. Transfer to other institutions is expected through local calibration (thresholds, coefficients, and rules), not by directly exporting parameter values from UTP.
2.2. Preparatory Activities and Team Organization
The planning and execution of the fieldwork required a structured coordination process to ensure that both the waste characterization and survey application could be carried out reliably. Preparatory activities began with a series of visits to the cafeteria of Building No. 1 to observe its operation schedule, physical layout, user flow, and the conditions of its waste collection area. These visits allowed the research team to define observation points, identify constraints, and adapt the logistics accordingly.
One of the main challenges identified early on was the lack of a designated space to handle the separation and weighing of solid waste. After evaluating several alternatives, the team secured a work area next to the waste collection room, with sufficient space to set up an electronic scale, containers, and safety barriers. Materials such as a digital scale, waste collection tanks, gloves, liners, labels, aprons, and face shields were purchased in advance.
To ensure that data collection across all three service shifts was viable, we requested support from student volunteers. An open call was launched through the university’s Student Welfare Department, resulting in the participation of 38 students organized into rotating teams. These volunteers covered a total of 60 shifts (20 for each meal period), accumulating approximately 300 h of service.
Each shift required the presence of at least four to five volunteers, with roles distributed into three functional categories:
Separation and weighing team: In charge of preparing labeled containers before each shift, checking that users disposed of their waste correctly, and weighing each waste fraction at the end of the service. They also ensured the workspace remained clean and safe.
Digital recording team: Responsible for entering the recorded weights into a structured form created with Microsoft Forms. This team also ensured that labels on the containers matched the records to avoid data mismatches.
Survey team: Approached users as they exited the cafeteria, explained the objectives of the study, and encouraged them to scan a QR code to complete the behavioral survey on their mobile devices. When necessary, volunteers assisted respondents during the process to ensure data quality.
To maintain consistency across shifts, a training session was held for all volunteers before the start of fieldwork. This included a presentation on safety measures, data collection protocols, and role assignments. Each volunteer received a digital infographic summarizing procedures and a digital copy of the full protocol. An informational banner was placed inside the cafeteria to inform the community about the project and legitimize the presence of the data collection team.
2.3. Data Collection and Waste Characterization
To capture representative weekday patterns, waste was monitored during 20 consecutive working days in April 2024, a period located in the first academic semester, when university statistics show the campus reaches its highest average foot traffic and experiences fewer public holidays [16]. This scheduling ensured that data were collected under stable service conditions and typical user demand.
The decision to monitor for 20 days was based on three considerations: (i) this period spans approximately one full academic month, encompassing four complete Monday–Friday cycles and multiple menu rotations, which allows weekday and shift-specific variations in attendance and waste composition to be repeatedly observed under comparable operating routines; (ii) it yields a sufficient number of independent shift-day observations (n = 60) to support descriptive and inferential statistical analysis while limiting the influence of seasonal effects that may arise when sampling across different months; and (iii) it balances data robustness with the logistical constraints of sustained fieldwork: extending the campaign beyond this duration would have produced diminishing returns, as menu patterns and cafeteria practices remain relatively constant within a semester, while imposing higher logistical demands on volunteers and equipment. The study used systematic consecutive-day sampling: every weekday in the chosen month was measured instead of selecting random days. In comparable waste-audit practice, characterization campaigns are commonly conducted over ~1 week (e.g., 5–8 sampling days) [42]; therefore, the 20 consecutive weekdays used here provide a conservative sampling window that exceeds typical minimums and improves stability of weekday-cycle estimates.
The selected window corresponds to a high-attendance teaching period early in the semester, which maximizes the likelihood of continuous cafeteria operation and minimizes disruptions due to public holidays or atypical schedules. Conducting the monitoring consecutively, rather than on a random subset of days, helped maintain comparability between shifts and avoided confounding factors introduced by irregular sampling intervals. This design can therefore be described as a systematic consecutive-day sampling approach, intended to represent regular teaching weeks characterized by stable campus activity. Accordingly, the results should be interpreted as representative of regular teaching weeks and not as a year-round characterization that includes exam sessions, holiday periods, or atypical academic schedules.
From a statistical perspective, the four repeated weekday cycles increase replication and reduce day-specific noise, enabling contrasts between shifts and weekdays with adequate degrees of freedom. The sample size achieved (60 shift-days) provides sufficient statistical power to detect moderate differences in waste generation patterns without requiring long-term monitoring. Although this design strengthens internal consistency, it is acknowledged that the exclusive focus on a high-demand period may over-represent peak scenarios. Consequently, periods of reduced attendance such as exam recesses or mid-year breaks were not captured and should be addressed in future campaigns to examine potential seasonal variability. Accordingly, the reported generation levels should be interpreted as representative of peak teaching-period operations; lower-demand periods (exam weeks, breaks, inter-semester) were outside the scope and may yield different baselines.
Two color-coded tanks were used for on-site segregation of waste: one for organic residues (primarily food waste) and another for inorganic materials (mainly packaging). The tanks were positioned at the standardized disposal area adjacent to the dish return station. Visual signage and supervision by trained volunteers helped minimize misclassification. This two-stream scheme follows international measurement frameworks that scope food (including inedible parts) separately from packaging and allow pragmatic category design to match study objectives and operational constraints [43,44,45]. The split is also decision-relevant because it maps to typical end-of-life pathways (e.g., composting/anaerobic digestion for organics; recycling/landfilling for inorganics) and enables standard GHG calculations when needed [46]. The two-stream classification supports shift-level load forecasting but does not resolve material-specific patterns for recyclables; future work will incorporate finer sorting when operationally feasible.
This aggregation is deliberate: the objective of the present work is shift-level load forecasting for collection logistics and capacity planning under routine conditions. In this cafeteria, the operational decision unit is the total mass requiring handling per shift (bagging, storage, and removal), which can be estimated reliably with a two-stream audit under volunteer-based field constraints. Nevertheless, recyclable fractions (e.g., PET, aluminum, paper) can follow different formation mechanisms than mixed packaging waste; therefore, the current model should not be interpreted as predicting recyclable capture potential, diversion rates, or material-specific pathways.
Organic waste was weighed three times during each shift using a calibrated digital scale with ±2 kg uncertainty. The scale has a maximum capacity of 600 kg and a readability of 0.2 kg. Prior to data collection, the instrument was checked using a zero-reading procedure and verified with known reference loads available on-site to confirm stable readings. The ±2 kg reported in this study is a conservative operational uncertainty reflecting field conditions (bag handling, platform stability, and instrument resolution), rather than a statistical confidence interval. This uncertainty has a higher relative influence during low-load periods (e.g., breakfast); therefore, results for small totals were interpreted at the shift-aggregate level and across repeated weekday cycles to reduce the influence of individual low-mass readings. The weighing times were set at one-hour intervals for each shift:
Breakfast: 8:00, 9:00, 10:00
Lunch: 12:30, 13:30, 14:30
Dinner: 17:30, 18:30, 19:30
Hourly weighing was adopted for organics because putrescible, moisture-rich residues fill and change more rapidly within service periods and, critically, because food safety and sanitation codes require the removal of refuse as often as necessary to minimize odors and conditions that attract pests [47]. Operationally, this meant removing, tying, labeling, and weighing the organic bag at each time point and immediately installing a new bag, which preserved hygiene and provided time-resolved profiles for meal rushes. The weight was recorded on a standardized form developed in Microsoft Forms, which captured the shift, time, and volunteer name.
Inorganic waste followed the same classification scheme but was only weighed once per shift, typically at the end or when the bin reached capacity. Packaging-related materials accumulated more slowly and exhibited lower intra-shift variability than organics in this cafeteria context, so a single end-of-shift measurement provided a reliable shift total while limiting handling time and avoiding unnecessary disturbance to service. This focus is consistent with sector evidence showing that food residues are a dominant mass component in hospitality and food service, whereas packaging is comparatively smaller and less volatile during service windows [48].
2.4. User Survey Design and Sampling
In parallel with the waste characterization, a behavioral survey (Appendix A) was administered to cafeteria users through a QR code displayed on banners and promoted by student volunteers. The survey was designed to gather information on user characteristics and behaviors associated with waste generation, providing complementary data to support the interpretation of observed waste patterns.
The questionnaire included 16 questions, grouped into three thematic sections:
Demographics: basic user descriptors.
Consumption and disposal behaviors: patterns of cafeteria use, leftover management, and waste handling.
Environmental awareness and preferences: attitudes towards sustainability and packaging choices.
Demographic items and behavior questions were single-choice nominal categories. Behavioral frequency items used a six-point ordinal scale coded 0 = Never to 5 = Daily/Always. Attitudinal items (satisfaction, packaging preference, awareness, knowledge, willingness, perceived importance) relied on a five-point Likert-type scale coded 1 = Very low to 5 = Very high. An optional open-ended field allowed respondents to leave an e-mail address if they wished to receive study results.
The survey was administered online, ensuring anonymity and reducing response bias. Participation was voluntary and promoted equally across all shifts and weekdays to improve time–location coverage, rather than to achieve random sampling. To ensure ethical compliance, the survey design was reviewed and approved by the university’s Bioethics Committee prior to implementation. No personally identifiable information was collected, and all responses were used strictly for research purposes.
Linkage between survey responses and measured waste: Microsoft Forms automatically records a submission timestamp for each questionnaire. For analysis purposes, each response was assigned to the corresponding service shift (breakfast, lunch, dinner) and date based on its timestamp. This allowed the response to be linked to the measured waste category (low/medium/high) and user count recorded for that same shift-day. Daily analyses that use total waste are therefore interpreted as exploratory, whereas shift-level analyses (Stage 3 and predictive models) align with the operational unit of measurement and action.
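The timestamp-based linkage described above can be sketched as follows. This is a minimal illustration in Python rather than the study's own R code. The service windows are those reported in Section 2.1; the record structure, the function names, and the choice to leave out-of-window submissions unassigned are assumptions made only for this sketch, since the source does not describe how such cases were handled.

```python
from datetime import datetime, time

# Service windows as reported for the cafeteria in Section 2.1.
SHIFT_WINDOWS = {
    "breakfast": (time(7, 0), time(10, 0)),
    "lunch": (time(11, 0), time(14, 0)),
    "dinner": (time(16, 0), time(18, 0)),
}


def assign_shift(timestamp):
    """Return (date, shift) for a submission timestamp.

    The shift is None when the timestamp falls outside every window
    (assumption: such responses are left unassigned).
    """
    t = timestamp.time()
    for shift, (start, end) in SHIFT_WINDOWS.items():
        if start <= t <= end:
            return timestamp.date(), shift
    return timestamp.date(), None


def link_to_waste(timestamp, shift_records):
    """Look up the (waste tier, user count) recorded for the response's shift-day."""
    return shift_records.get(assign_shift(timestamp))


# Hypothetical shift-day record: (date, shift) -> (waste tier, user count).
records = {(datetime(2024, 4, 8).date(), "lunch"): ("high", 410)}
print(link_to_waste(datetime(2024, 4, 8, 12, 45), records))
```

Keying the records by (date, shift) mirrors the shift-day unit used in the shift-level analyses, so every response inherits the tier measured for the service period in which it was submitted.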
Because participation occurred by scanning a QR code, the design is a non-probability, time–location convenience sample. To improve coverage and reduce selection bias, invitations were issued uniformly across all service shifts and weekdays during the study month using identical banners and standardized, neutral scripts; banner positions were rotated around the entrance and dish-return areas. No incentives were offered. This procedure improves coverage but does not create a probability sample; therefore, inferences are framed for on-site cafeteria users during the observed month.
The minimum target sample size was estimated using standard formulas as a planning benchmark for precision under conservative assumptions ($p = 0.5$, $E = 0.05$). Because the survey was non-probabilistic, this calculation does not imply probabilistic representativeness; rather, it serves to avoid under-sampling and to support stable comparisons across user subgroups. The initial sample size was calculated using Equation (1), and the finite population correction was applied using Equation (2). Both calculations were based on conservative assumptions to ensure adequate precision of the estimated parameters [49].
$$n_0 = \frac{Z^2 \, p \, (1 - p)}{E^2} \qquad (1)$$
where:
$Z = 1.96$ (95% confidence level)
$p = 0.5$ (maximum variability)
$E = 0.05$ (margin of error)
The finite population correction was applied using [49]:
$$n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} \qquad (2)$$
where $n$ is the corrected sample size, $n_0$ is the initial sample size for an infinite population, and $N$ is the total number of students from the faculties housed in the cafeteria building, obtained by aggregating enrollment data from each faculty.
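As a worked illustration of this planning calculation, the sketch below computes the initial sample size for an infinite population and then applies a finite population correction. It assumes the standard Cochran form for both steps, with Z = 1.96 for the 95% confidence level, p = 0.5, and E = 0.05. The population size used here is a hypothetical placeholder, not the enrollment figure from the study.

```python
import math


def initial_sample_size(z=1.96, p=0.5, e=0.05):
    """Initial sample size for an infinite population (standard Cochran form)."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


def finite_population_correction(n0, population):
    """Corrected sample size for a finite population of the given size."""
    return n0 / (1 + (n0 - 1) / population)


n0 = initial_sample_size()
N_HYPOTHETICAL = 5000  # placeholder; the study aggregated faculty enrollment data
n = finite_population_correction(n0, N_HYPOTHETICAL)

# Round up, since a fractional respondent is not possible.
print(math.ceil(n0), math.ceil(n))
```

Under these inputs the uncorrected benchmark is about 384 responses, and the correction lowers it as the population shrinks. The 705 valid responses reported in the study exceed the uncorrected benchmark regardless of the population size.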
A total of 772 questionnaires were received. After screening, 705 responses were retained as valid for analysis. Invalid submissions were removed due to (i) incomplete responses in key fields, (ii) inconsistent entries, and (iii) duplicates when identifiable. Because recruitment was voluntary and the number of individuals exposed to the QR invitation cannot be enumerated, a conventional response rate could not be computed. We therefore report final valid responses relative to the target sample size threshold.
2.5. Estimation of Customer Flow per Shift
The number of users served in each shift was the main driver of the predictive model and a practical proxy for waste generation. During the 20-day study, shift-level counts were obtained from the cafeteria’s administrative records (cashier/POS tallies aggregated by service window) and triangulated with field observations and time-stamped photographs collected by the research team.
On every study day and for each meal period, the team produced short observation logs (queue length at the dish-return and general seating occupancy, notes on unusual events such as group visits or promotions) and captured time-stamped photographs at anchor moments within the service window (start, midpoint, end). These materials were used to verify the internal consistency of the administrative series (e.g., identification of peak periods and relative ranking of shifts) and to flag atypical days. When the photographic/observational record suggested an anomaly, the case was annotated after brief consultation with on-duty staff; the administrative record remained the primary source for the daily counts.
This cross-check consistently reproduced the shift ranking observed throughout the campaign (lunch > dinner > breakfast). Because entry was not instrumented (no turnstiles or ticket scans), some measurement error cannot be ruled out. To account for this in the modeling stage, uncertainty in user flow was explored through sensitivity analyses that varied the administrative counts by ±5% to ±20%; these did not alter the qualitative patterns reported.
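One possible way to run such a sensitivity check is sketched below in Python. The source does not detail the exact procedure, so the independent uniform perturbation of each shift count, the function name, and the example counts are all assumptions made for illustration. The check asks how often the lunch > dinner > breakfast ranking survives a given error level.

```python
import random


def ranking_stability(counts, level, draws=2000, seed=1):
    """Share of random draws in which the shift ranking by user count is unchanged.

    Each count is perturbed independently by a uniform factor in
    [1 - level, 1 + level] (assumed error model).
    """
    rng = random.Random(seed)
    baseline = sorted(counts, key=counts.get, reverse=True)
    kept = 0
    for _ in range(draws):
        perturbed = {s: c * (1 + rng.uniform(-level, level)) for s, c in counts.items()}
        if sorted(perturbed, key=perturbed.get, reverse=True) == baseline:
            kept += 1
    return kept / draws


# Hypothetical administrative counts per shift.
counts = {"breakfast": 150, "lunch": 420, "dinner": 260}
for level in (0.05, 0.10, 0.15, 0.20):
    share = ranking_stability(counts, level)
    print(f"±{level:.0%}: ranking preserved in {share:.1%} of draws")
```

With well-separated counts such as these, the ranking holds at every level tested; shifts with closer counts would show a lower share as the error level grows.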
The visitor-count variable is used as an operational proxy for demand-driven load, not as a causal determinant of waste generation. Unobserved factors such as menu composition, pricing, promotions, or campus events may co-vary with attendance and contribute to residual variance; therefore, regression coefficients are interpreted as associative planning parameters within the observed operating regime rather than as causal effect sizes.
2.6. Variable Selection
To determine which behavioral and demographic variables were most relevant for predicting waste generation in the cafeteria, a sequential multi-stage analytical process was implemented. Rather than relying on a single statistical test, the analysis was structured to progressively refine the response representation and reduce the initial pool of predictors, retaining only variables that remained consistent across screening steps and were meaningful for operational modeling.
The three-stage design was defined a priori to match the structure of the available data and to separate: (i) exploratory signal detection, (ii) operational tier-based screening, and (iii) shift-level modeling aligned with cafeteria decision-making.
Stage 1 (daily totals): each survey response was associated with the total waste recorded on the same calendar day to screen for broad group differences.
Stage 2 (daily tiers): the screening was repeated using daily low/medium/high categories to assess whether the same patterns persisted when waste was expressed in operational tiers, using ordinal-consistent procedures.
Stage 3 (shift-level): each survey response was assigned to a shift-day using its timestamp and linked to the measured shift waste category. Associations were evaluated using Chi-square and Cramér’s V, and retained predictors were used in the multinomial model.
Normality and variance assumptions were assessed using the Shapiro–Wilk test [50] and Levene’s test [51] to guide the selection and interpretation of parametric versus nonparametric procedures during the screening stages.
The dependent variable, hourly solid-waste generation per shift (kg·h−1), was obtained from the direct weighing carried out during the characterization campaign. Using shift-based measurements rather than aggregated monthly or daily figures allows the analysis to capture short-term fluctuations linked to service peaks and operational schedules, consistent with the objective of forecasting cafeteria waste output.
The pool of independent variables was drawn from two domains, representing both user attributes and consumption behavior observed on campus. For clarity, these variables were grouped as follows: Demographic attributes (5): (a) gender; (b) age group; (c) academic role; (d) campus site; and (e) faculty affiliation. Consumption-related factors (10): (a) visit frequency; (b) most-purchased product type; (c) disposal habit factor; (d) wastage habit factor; (e) self-reported reason for leaving leftovers; (f) packaging preference; (g) environmental awareness level; (h) self-assessed knowledge level; (i) willingness to join sustainability programs; and (j) perceived relevance of individual action to campus sustainability goals.
2.6.1. Stage 1: Continuous Analysis Using Daily Totals
In Stage 1, total daily waste generation was treated as a continuous dependent variable and compared across survey-defined groups. One-way ANOVA was applied as an initial mean-comparison test. Because daily totals may deviate from parametric assumptions and group sizes may be unequal, the Kruskal–Wallis test was applied as a complementary rank-based alternative [51]. Concordance between ANOVA and Kruskal–Wallis outcomes was used to verify whether detected group differences were consistent across parametric and rank-based approaches.
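For readers less familiar with the rank-based test, the sketch below computes the Kruskal–Wallis H statistic with the usual tie correction, using only the Python standard library. It is an illustration, not the study's R code, and the grouped values are hypothetical daily totals. The p-value, which compares H with a chi-square distribution on k − 1 degrees of freedom, is not computed here.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Tied values share the mean of the rank positions they occupy.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction; equals 1 when every value is distinct.
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction


# Hypothetical daily totals (kg) associated with three survey-defined groups.
groups = [[61, 64, 70], [72, 75, 79], [83, 88, 91]]
print(round(kruskal_wallis_h(groups), 2))
```

Because the statistic depends only on ranks, it is unaffected by skewed daily totals, which is why it serves as a cross-check on the ANOVA result.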
2.6.2. Stage 2: Categorical Analysis Using Daily Groupings
In Stage 2, total daily waste generation was transformed into three ordered categories (low, medium, and high) using percentile thresholds computed from the observed distribution (Appendix B). These tiers were coded as 1, 2, and 3. Kruskal–Wallis tests were then repeated using the tiered outcome to evaluate whether the group contrasts observed in Stage 1 persisted under an operational tier representation.
As an additional exploratory check, a linear regression was fitted using the tier-coded outcome (1–3). This analysis was used only to test whether any strong linear signal emerged and did not define the final modeling decisions, which were developed under the shift-level framework (Stage 3).
2.6.3. Stage 3: Shift-Level Analysis and Categorical Modeling
In Stage 3, the analytical unit was defined at the shift-day level (breakfast, lunch, and dinner), consistent with the waste monitoring protocol and the operational structure of cafeteria service. Total waste generated per shift was categorized into three ordinal levels (low, medium, and high) using tertile thresholds (Appendix C). These categories were used as the dependent variable to evaluate associations with the survey-based independent variables.
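The tertile categorization can be illustrated with a short Python sketch. The shift totals below are hypothetical, the thresholds in Appendix C are not reproduced here, and the boundary convention (values equal to a cut point fall in the lower tier) is an assumption of this sketch.

```python
import statistics


def tertile_thresholds(values):
    """Return the two cut points that split the observed values into thirds."""
    low_cut, high_cut = statistics.quantiles(values, n=3)
    return low_cut, high_cut


def classify(value, low_cut, high_cut):
    """Map a shift total (kg) to an ordinal tier."""
    if value <= low_cut:
        return "low"
    if value <= high_cut:
        return "medium"
    return "high"


# Hypothetical shift totals (kg).
shift_totals = [10, 20, 30, 40, 50, 60, 70, 80, 90]
low_cut, high_cut = tertile_thresholds(shift_totals)
print([classify(v, low_cut, high_cut) for v in shift_totals])
```

Deriving the cut points from the observed distribution is also what makes the scheme transferable through local calibration: another facility would recompute the thresholds from its own shift totals instead of reusing these values.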
The Chi-square test of independence was applied to evaluate whether statistically significant associations existed between predictor categories and the shift-level waste tiers [24]. The expected frequencies in the contingency tables were calculated using Equation (3):
$$E_{ij} = \frac{R_i \, C_j}{N} \qquad (3)$$
where:
$E_{ij}$ is the expected frequency in cell $(i, j)$,
$R_i$ and $C_j$ are the marginal totals of row $i$ and column $j$,
$N$ is the total sample size.
For variables with statistically significant Chi-square results, Cramér’s V was calculated to evaluate the strength of association [
52], using Equation (4):
$$V = \sqrt{\frac{\chi^2}{N \cdot \min(k - 1,\, r - 1)}} \quad (4)$$
where:
$\chi^2$ is the Chi-square statistic,
$N$ is the total number of observations,
$k$ and $r$ are the number of columns and rows, respectively.
Only predictors showing both statistical significance and non-negligible association strength were retained for subsequent modeling. This dual criterion helped prioritize variables with interpretable relationships to the outcome while maintaining a parsimonious predictor set for the final specification.
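The screening step can be illustrated with a short Python sketch that computes the expected frequencies, the Chi-square statistic, and Cramér's V for a contingency table. The function name and the example counts are hypothetical; the significance decision itself (the p-value) is left to a statistical package.

```python
from math import sqrt

def chi_square_and_cramers_v(table):
    """Return (expected frequencies, chi-square statistic, Cramér's V).

    `table` is a list of rows of observed counts (rows = predictor
    categories, columns = waste tiers).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count per cell: row total times column total over N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    v = sqrt(chi2 / (n * min(len(table) - 1, len(col_totals) - 1)))
    return expected, chi2, v

# Hypothetical 3 x 3 table (predictor level by waste tier), not study data.
observed = [[12, 5, 3], [6, 10, 4], [2, 5, 13]]
expected, chi2, cramers_v = chi_square_and_cramers_v(observed)
```

A predictor would then be retained only if its Chi-square test is significant and V exceeds whatever threshold is taken to denote a non-negligible association.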
To estimate the probability of each shift belonging to a given waste-generation category, a multinomial logistic regression model was formulated [
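53]. The general form is Equation (5):
$$\ln\left(\frac{P(Y = j)}{P(Y = \text{ref})}\right) = \beta_{0j} + \beta_{1j} X_1 + \dots + \beta_{pj} X_p \quad (5)$$
where $j$ denotes a non-reference waste-generation category, $\text{ref}$ is the reference category, $X_1, \dots, X_p$ are the retained predictors, and $\beta_{0j}, \dots, \beta_{pj}$ are the category-specific coefficients.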
The reference category was defined to represent the baseline operating condition commonly observed in the cafeteria, enabling interpretable comparisons for medium and high waste-generation levels relative to this baseline. By restricting the model to predictors retained through the preceding stages, the final specification emphasizes parsimony and interpretability.
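To show how a fitted multinomial logistic model translates predictor values into category probabilities, the following Python sketch applies the standard softmax transformation with the reference category fixed at a linear score of zero. The choice of "low" as the reference, the predictor vector, and the coefficient values are all hypothetical and are used only for illustration.

```python
from math import exp

def category_probabilities(x, coefficients, reference="low"):
    """Predicted probabilities for each waste category.

    `coefficients` maps each non-reference category to a tuple
    (intercept, list of slopes); the reference category has score 0.
    """
    scores = {reference: 0.0}
    for category, (intercept, slopes) in coefficients.items():
        scores[category] = intercept + sum(b * xi for b, xi in zip(slopes, x))
    denominator = sum(exp(s) for s in scores.values())
    return {category: exp(s) / denominator for category, s in scores.items()}

# Hypothetical coefficients for two retained predictors (not fitted values).
example_coefficients = {
    "medium": (-0.4, [0.8, 0.1]),
    "high": (-1.2, [1.5, 0.3]),
}
probabilities = category_probabilities([1.0, 2.0], example_coefficients)
```

Each non-reference coefficient is read as a change in the log-odds of that category relative to the reference, which is why the choice of baseline affects interpretation but not the predicted probabilities.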
2.7. Predictive Modeling and Implementation
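This section presents the predictive modeling strategies developed to simulate and classify the quantity of solid waste generated per shift in a university cafeteria. Two approaches were designed, each responding to different analytical needs: a rule-based classification algorithm (Section 2.7.1) and a Monte Carlo predictive model based on linear regression (Section 2.7.2).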
All predictive modeling procedures were implemented using RStudio (v2024.04.2) under Windows 11. The rule-based classification algorithm was developed using custom functions written in base R, including procedures for variable categorization, construction of shift-specific contingency tables, and frequency-based rule assignment. Stratified partitioning of the dataset into training and testing subsets was performed using the caret package to preserve class distributions across subsets.
The multinomial logistic regression models were fitted using the nnet package (multinom function). Monte Carlo simulations were conducted through iterative resampling of the dataset and repeated refitting of the regression models using custom R scripts. All modeling steps were executed using fixed model structures to ensure reproducibility and consistency across simulation runs.
2.7.1. Rule-Based Classification Algorithm
This method was developed to classify the quantity of waste generated in each cafeteria shift into three ordinal categories—low, medium, or high—based on the number of users recorded.
The goal was to automate the assignment of waste generation levels using a fixed set of rules derived from historical data. By using categorized inputs and frequency tables, the model allows quick, repeatable classification without relying on regression coefficients or machine learning algorithms.
The modeling process involved five main stages:
- (a)
Variable preparation and categorization
Two independent variables were used:
Shift type, encoded as 1 (breakfast), 2 (lunch), or 3 (dinner).
Number of users per shift, categorized into three levels:
- ○
1 = Low
- ○
2 = Medium
- ○
3 = High
This categorization was based on quartiles of the user distribution and was implemented in R through custom functions (categorize_turno() and categorize_clientes()).
- (b)
Stratified partition of training and testing data
So that the model's performance could be evaluated, the dataset was split into training and testing subsets. This partitioning was performed with the createDataPartition() function from the caret package, which kept the distribution of categories balanced across both subsets.
- (c)
Construction of shift-specific contingency tables
Using only the training data, separate contingency tables were created for each shift (breakfast, lunch, and dinner), recording the frequency of each waste-generation level under each user-flow category.
To ensure consistent table structure, combinations not observed in the data were completed with zero frequencies using the custom function ajustar_tabla().
- (d)
Definition of classification rules
The core idea of the model is simple: for any given shift and level of user attendance, assign the waste-generation category most frequently observed under that condition in the training data.
This logic was implemented in the function asignar_generación().
This is a fixed rule system: it does not “learn” from new data but instead reflects the patterns established during the training phase.
- (e)
Application of the model
The classification function was then applied to the complete dataset. For each record (in both training and test sets), the algorithm retrieved the appropriate rule and assigned the predicted waste category accordingly.
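The five stages above can be sketched in Python as follows. This is an illustrative reconstruction of the described logic rather than the authors' R code: the function names, the tie-breaking choice (the lowest category wins), and the handling of unobserved combinations (no rule is assigned) are assumptions, and the training rows are hypothetical.

```python
from collections import Counter, defaultdict

LEVELS = (1, 2, 3)  # 1 = low, 2 = medium, 3 = high; shifts are coded 1-3 as well

def build_rules(training_rows):
    """Build the rule table from (shift, user_level, waste_level) rows."""
    tables = defaultdict(Counter)
    for shift, user_level, waste_level in training_rows:
        tables[(shift, user_level)][waste_level] += 1
    rules = {}
    for shift in LEVELS:
        for user_level in LEVELS:
            counts = tables[(shift, user_level)]
            # Unobserved waste levels are completed with zero frequencies.
            freq = [counts.get(level, 0) for level in LEVELS]
            if max(freq) == 0:
                rules[(shift, user_level)] = None  # assumption: no rule available
            else:
                # Most frequent category; ties go to the lowest level (assumption).
                rules[(shift, user_level)] = LEVELS[freq.index(max(freq))]
    return rules

def classify(shift, user_level, rules):
    """Look up the predicted waste category for one record."""
    return rules[(shift, user_level)]

# Hypothetical training rows: (shift, user level, waste level).
training = [
    (1, 1, 1), (1, 1, 1), (1, 1, 2),
    (2, 3, 3), (2, 3, 3), (2, 2, 2),
    (3, 2, 2), (3, 2, 3), (3, 2, 2),
]
rules = build_rules(training)
predicted = classify(2, 3, rules)
```

Because the rule table is built once from the training subset, applying it to new records is a constant-time lookup and gives identical results on every run.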
2.7.2. Development of the Monte Carlo Predictive Model
To simulate the relationship between the number of cafeteria users and the quantity of solid waste generated, a predictive model was developed based on simple linear regression using the least squares method. This technique aims to determine the best-fit line that minimizes the sum of squared differences between observed and predicted values, assuming a linear relationship between variables [
55,
56].
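The general form of the model is Equation (6):
$$\hat{y} = b + m x \quad (6)$$
where $\hat{y}$ represents the predicted value of solid waste generation (kg), $b$ is the intercept corresponding to the estimated waste generation when no users are present, $m$ is the slope of the regression line expressed in kg per user, and $x$ denotes the number of users per hour.
The residual for each observation is defined in Equation (7):
$$e_i = y_i - \hat{y}_i \quad (7)$$
and the model performance was evaluated using the mean squared error (MSE), calculated as in Equation (8) [57]:
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \quad (8)$$
The coefficients $m$ and $b$ were estimated to minimize the MSE using Equations (9) and (10):
$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad (9)$$
$$b = \bar{y} - m \bar{x} \quad (10)$$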
Once the parameters were obtained, the model equation was used to estimate waste generation based on the simulated number of users in each iteration.
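The closed-form least-squares estimates and the MSE can be computed directly, as in the following minimal Python sketch. The function names and the example values are hypothetical; the slope is written as m and the intercept as b.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for y = b + m * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

def mean_squared_error(x, y, m, b):
    """Average squared residual of the fitted line."""
    return sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)

# Hypothetical hourly observations: users and waste (kg), not study data.
users = [10, 20, 30, 40]
waste = [1.5, 2.0, 2.5, 3.0]
m, b = fit_line(users, waste)
mse = mean_squared_error(users, waste, m, b)
```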
Preliminary Analysis and Variable Definitions
To develop the predictive model, a preliminary analysis was carried out to evaluate the behavior of the main variables and to define the structure of the simulation.
- (a)
Definition of variables
Dependent variable (G): Total solid waste generation (kg) per hour, per shift, per day.
Independent variable (f): Number of users per shift, per day.
The relationship between the two variables is expressed conceptually as G = G(f), that is, waste generation as a function of the number of users.
- (b)
Assumptions regarding waste generation (G)
The dependent variable was defined as total solid waste weight per hour. While this aggregation does not distinguish between specific waste streams (e.g., kitchen waste, recyclables, or packaging), it represents the operational metric directly relevant for on-site collection logistics, storage capacity, and service scheduling. The focus on total weight was therefore intentional and aligned with the primary objective of forecasting overall waste load per shift. The authors acknowledge that disaggregated waste fractions may respond to different behavioral drivers; however, such differentiation would require parallel characterization protocols beyond the scope of the present dataset. Accordingly, results derived from G should be interpreted as shift-level load forecasts for operational planning, rather than as stream-specific predictions (e.g., recyclables vs. residuals) or source attribution (kitchen vs. consumer waste).
The waste generation variable was analyzed under two general assumptions:
EG1: Waste is generated exclusively during operational hours; no significant accumulation occurs between shifts.
EG2: If waste is generated during non-measured periods, interpolation or hourly averages may be used to estimate values, provided they remain within the bounds of occupancy time.
Depending on how continuity between shifts is treated, waste generation can be modeled as cumulative flow or as discrete segments. Accordingly, three conceptual subscenarios were defined to represent alternative interpretations of waste behavior throughout the day (
Figure 3):
Subscenario 1 (Inc. in weight E1): No waste is generated between shifts. Generation is considered stable and limited to each time block. Represented by the orange dashed line.
Subscenario 2 (Inc. in weight E2): Waste accumulates progressively, estimated through interpolation between measured values. Represented by the blue continuous line.
Subscenario 3 (Inc. in weight E3): Waste generation is only counted during specific shift periods, without interpolating between them. Represented by the green dashed line.
The definition of alternative waste-generation scenarios (E1–E3) was intended to represent different conceptual interpretations of temporal continuity between measured points, rather than to formulate statistically testable hypotheses. Due to the limited temporal resolution of disposal events, no independent observations were available to support a formal statistical comparison between scenarios.
Consequently, the selection of Subscenario E2 was based on methodological coherence with field observations and modeling requirements. This formulation preserves continuity between consecutive shifts and represents the observed lag between consumption and disposal, which is necessary for hourly simulation and probabilistic resampling. The choice therefore responds to structural and operational considerations inherent to the dataset, rather than to statistical model selection criteria.
Subscenarios E1–E3 can be interpreted as bounding assumptions about temporal continuity: E1 represents a discontinuous, shift-isolated process (lower-bound continuity), whereas E3 represents a strictly segmented accounting approach (upper-bound discontinuity), and E2 provides an intermediate continuous reconstruction suitable for hourly simulation. Importantly, the regression-based validation is performed against observed waste values in the hold-out set; therefore, the role of E2 is to enable internally consistent hourly resampling within the Monte Carlo framework rather than to claim empirical superiority over alternative continuity assumptions.
- (c)
Assumptions regarding number of users (f)
The independent variable f represents the number of clients served per shift; its hourly allocation is reconstructed for simulation. It was analyzed using the same temporal granularity as waste generation, either at the hourly level or by shift segments. This variable follows a pattern consistent with the service dynamics of the cafeteria, and its behavior resembles the trend shown in Figure 3.
The independent variable was restricted to the number of users per shift, which acts as a direct operational proxy for waste generation. Although factors such as meal type, portion size, or take-away behavior may influence waste production, these variables were not consistently available at the hourly scale required for simulation. The choice of user count prioritizes temporal consistency and data completeness, allowing the model to capture demand-driven variability while maintaining coherence with the available measurement resolution.
Since the maximum number of users per shift is known, the most appropriate approach was to use hourly averages to reconstruct realistic attendance profiles for each shift. In
Figure 3, this corresponds to the average user-flow profile (E2, secondary axis), which was selected as the most representative scenario. These averages allowed the simulation to represent typical daily or shift-based client behavior with operational consistency.
An important modeling question at this stage was whether the actual hourly pattern of user attendance coincides with the curve of waste generation. While some correlation is expected, several factors introduce divergence, including the observed lag between consumption and disposal.
These factors introduce temporal and behavioral uncertainty into the relationship between user flow and waste generation, justifying the use of probabilistic modeling and iterative simulation to capture this variability. Therefore, the model estimates average waste loads conditional on attendance, and it is not intended to explain per-user waste intensity drivers such as menu composition, portion size, or take-away practices.
Waste Generation Scenario
To simulate the behavior of solid waste generation in the university cafeteria, the dataset was divided into training and testing subsets based on hourly, shift, and daily observations. This allowed the construction of a reliable baseline for trend estimation and model validation.
- (a)
Data partitioning
The dataset was divided chronologically, selecting the first 80% of observed days as the training set and the remaining 20% as the testing set. This 80/20 hold-out split reserves an independent subset for performance evaluation while preserving temporal ordering and reducing information leakage from later observations into model fitting. A random split was also tested as a robustness check, yielding no meaningful differences.
- (b)
Data organization and cleaning
The training dataset was structured in a spreadsheet, with columns representing the timestamp (in 24 h format) and the corresponding quantity of waste generated. The 24 h format was selected to maintain chronological coherence and to facilitate the fitting of continuous curves.
To ensure data quality, outliers were removed using the interquartile range (IQR) method. Upper and lower bounds were calculated based on the standard 1.5 × IQR rule, as previously described in the statistical treatment section.
- (c)
Estimation of central tendency and dispersion
Given that the waste data did not follow a normal distribution, the median was used as a more robust indicator of central tendency. It was calculated using the 80% training subset, grouped by hour, allowing the model to represent a typical daily generation profile.
Additionally, the standard deviation was calculated as a reference measure of dispersion. Although the data were not normally distributed, this metric was still useful to estimate variability around the median and to define upper and lower empirical bounds.
- (d)
Data visualization
All waste values from the 80% training set were visualized in a single plot, along with upper and lower bounds constructed as the median ±1 standard deviation. The deviation itself was not plotted as a separate curve but rather used to define shaded boundaries within which most data points were expected to fall.
- (e)
Polynomial fitting
To model the waste trend over time, a fourth-degree polynomial regression was fitted to the median values. The fitted curve expresses the quantity of waste generated per hour ($\hat{G}(t)$) as a function of time ($t$, in hours), producing a continuous representation of expected behavior.
The fourth-degree polynomial was selected as the lowest-order smooth function able to reproduce the main intra-day inflection structure associated with the three service windows, while avoiding spurious oscillations that appear with higher-order polynomials when fitted over short time domains. Because this polynomial is used only as a deterministic shape function to reconstruct hourly dynamics within the Monte Carlo framework (not as an inferential or stand-alone predictive model), the selection objective is stability and plausibility of the reconstructed profile rather than parameter interpretability. To reduce sensitivity to noise, the fit was performed on hourly medians rather than raw observations, and variability is subsequently represented through empirical dispersion bounds.
To simulate variability in future predictions, a random component $\varepsilon$ was added to the polynomial output. The final model also incorporated upper and lower boundaries defined in Equations (11) and (12):
$$G_{\text{upper}}(t) = \hat{G}(t) + \sigma \quad (11)$$
$$G_{\text{lower}}(t) = \hat{G}(t) - \sigma \quad (12)$$
where $\hat{G}(t)$ is the fitted value and σ is the standard deviation of the corresponding hour. This structure allowed the model to project the expected value of waste generation per hour together with a realistic range of variability based on empirical dispersion, enhancing its predictive capacity and alignment with operational uncertainty.
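The cleaning and summary steps (b) to (d) can be sketched as follows in Python. The sketch assumes that the 1.5 × IQR filter is applied within each hour, which the text does not specify, and it uses the default quartile method of `statistics.quantiles`; the function names and the example records are hypothetical. The polynomial fit itself is not reproduced here.

```python
from statistics import median, quantiles, stdev

def iqr_filter(values):
    """Keep values inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]

def hourly_profile(records):
    """Map each hour to (lower bound, median, upper bound).

    `records` is a list of (hour, waste_kg) pairs from the training
    subset; the bounds are the median minus/plus one standard deviation.
    """
    by_hour = {}
    for hour, kg in records:
        by_hour.setdefault(hour, []).append(kg)
    profile = {}
    for hour, values in sorted(by_hour.items()):
        kept = iqr_filter(values)
        centre, spread = median(kept), stdev(kept)
        profile[hour] = (centre - spread, centre, centre + spread)
    return profile

# Hypothetical training records for a single hour (24 h clock), not study data.
records = [(12, v) for v in [2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 9.0]]
profile = hourly_profile(records)
```

In this example the value of 9.0 kg falls outside the upper fence and is discarded before the median and the dispersion bounds are computed.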
Client Distribution
In this stage of the simulation, the goal was to distribute the total number of clients per shift across the three hours that make up each shift. A probabilistic structure was applied to ensure that each hour received a proportion of the total, while maintaining logical consistency with observed behavior during fieldwork.
- (a)
Distribution criteria
Observations and photographic records collected during the measurement period indicated a clear decrease in user attendance toward the end of each shift. However, the number of users never dropped below 15% in the final hour. Based on this, a constraint was applied in the simulation to ensure that the third hour always received at least 10% of the total clients, thus avoiding unrealistically low values. To guarantee reproducibility in the random sampling process, a fixed random seed was used throughout the simulation.
- (b)
Methodological steps
- (1)
Data partitioning: As with the waste generation scenario, the dataset was divided into 80% training and 20% testing subsets, preserving chronological order.
- (2)
Randomization control: A fixed random seed was set to ensure that the simulation would produce the same random client distributions across executions, allowing for reproducibility and traceability of the modeling results.
- (3)
Dataset creation: A structured dataset was prepared with the dates of each measured day and the corresponding number of clients served during each shift (breakfast, lunch, and dinner). The date format was standardized to facilitate time-based manipulation.
- (4)
Generation of random probabilities: To distribute the total number of clients across the three hours of each shift, three random values were generated and sorted in descending order. This approach ensured that the first hour received the highest number of users, followed by the second and the third.
- (5)
Minimum threshold enforcement for the final hour: A validation step was included to verify that the third hour always received at least 10% of the total clients. If the initial distribution did not meet this condition, new probabilities were generated until the minimum threshold was satisfied. This rule reflects realistic operational conditions and ensures adequate data representation for modeling.
- (6)
Client allocation: Once the random proportions were validated, the total number of clients for each shift was distributed across the three hours using a multinomial logic, assigning integer quantities of users to each hour proportionally.
- (7)
Daily application of the distribution logic: This distribution procedure was applied to every row in the training dataset, meaning that for each day and shift, the total clients were redistributed across the three hours using the same logic. The outputs were stored in a new dataset containing the day, shift, and number of users per hour.
- (8)
Result verification: The resulting distributions were compiled into a summary table to verify that the logic was correctly applied across all records and that the 10% minimum rule for the final hour was consistently respected.
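Steps (4) to (6) can be illustrated with the following Python sketch. It assumes that the three random values are uniform draws normalized to sum to one and that the allocation is a multinomial draw over the three hours; the function names, the seed value, and the client total are hypothetical.

```python
import random

def hourly_shares(rng, min_last=0.10):
    """Draw three decreasing shares that sum to 1, redrawing until the
    last hour receives at least `min_last` of the total."""
    while True:
        draws = sorted((rng.random() for _ in range(3)), reverse=True)
        total = sum(draws)
        shares = [d / total for d in draws]
        if shares[2] >= min_last:
            return shares

def allocate_clients(total_clients, shares, rng):
    """Multinomial allocation of integer client counts to the three hours."""
    hours = rng.choices([0, 1, 2], weights=shares, k=total_clients)
    return [hours.count(h) for h in range(3)]

rng = random.Random(42)  # fixed seed for reproducibility (value is arbitrary)
shares = hourly_shares(rng)
counts = allocate_clients(150, shares, rng)
```

Under this assumption the 10% rule is enforced on the sampled proportions; because the integer counts come from a random draw, an individual count can fall slightly below 10% of the total, which is one reason a final verification table is useful.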
Simulation and Fitting of the Waste Generation Model Using Monte Carlo
Once the generation and client distribution scenarios were defined, an iterative simulation process was implemented using least squares linear regression to model the relationship between the number of users and the amount of waste generated. The simulation was also developed in RStudio (version 2024.04.2) under a Windows 11 environment.
- (a)
Waste generation configuration
In this case, the fourth-degree polynomial function of time was used to dynamically calculate the expected waste generation during each hour. Each iteration produced a different waste value by evaluating the polynomial and incorporating a random component, resulting in greater variability across simulations.
- (b)
Client distribution logic
Clients were distributed across the three hours of each shift using the same decreasing probabilistic logic described earlier. This ensured that each iteration included a realistic combination of waste and user data aligned to the same temporal structure.
- (c)
Merging simulation inputs
Once generated, the simulated values for waste and users were aligned and merged hour by hour to produce a unified dataset. Each record contained the number of users and the amount of waste for a specific hour. This structure enabled direct regression modeling of the user–waste relationship in each simulation cycle.
- (d)
Model fitting using least squares regression
For every iteration, a simple linear regression model was fitted to the combined dataset. A least-squares fitting function was used to estimate the slope m and the intercept b by minimizing the sum of squared residuals between observed and predicted values. This process was applied iteratively using new randomly generated inputs in each cycle.
- (e)
Monte Carlo simulation procedure
To account for variability in both the independent (clients) and dependent (waste) variables, a Monte Carlo simulation was implemented. A total of 10,000 iterations were performed, with waste values dynamically generated from the polynomial model. In each iteration, a new user–waste combination was created and a new regression model was fitted. The coefficient of determination ($R^2$) was recorded to evaluate goodness of fit.
The simulation was used to assess model stability and expected performance under repeated resampling and stochastic hourly reconstruction. Rather than imposing a fixed acceptance threshold, model performance was summarized descriptively using the distribution, mean, and dispersion of $R^2$ across iterations. As a reference benchmark in waste-related regression studies, $R^2$ values around 0.50 are often described as indicating moderate explanatory power [58]. In this study, that value was used only as a contextual benchmark and not as a pass/fail criterion.
Prediction error on the hold-out test set was quantified using the root mean squared error (RMSE), expressed in kg/h.
- (f)
Summary of model coefficients across iterations
Model coefficients were summarized across the Monte Carlo iterations to examine the stability of the fitted relationship under repeated scenario-based hourly reconstruction. Because each iteration generated a slightly different hourly profile of attendance and waste generation, the estimated regression coefficients were not interpreted from a single run alone, but rather from their overall distribution across simulations. For visualization purposes, a representative fit was selected and displayed in the validation plots. This representative fit was used only to illustrate the general model behavior, while the overall model performance was summarized using the distribution of coefficients and the mean and dispersion of $R^2$ values across iterations.
- (g)
Model validation (hold-out evaluation and simulation-based assessment)
To facilitate interpretation of prediction accuracy in the original units of the dependent variable (kg/h), the root mean squared error (RMSE) was computed as Equation (13):
$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \quad (13)$$
where $y_i$ represents the observed waste generation and $\hat{y}_i$ the corresponding predicted value.
Predictive performance was evaluated using the testing subset (20%) in a hold-out validation procedure. Predicted values were compared with observed waste generation values from this subset, and model accuracy was assessed using RMSE. In the validation plots, prediction error was visually represented as a band of ±1 RMSE around the representative fit. In addition, a ±2 kg uncertainty margin was incorporated into the predicted values to reflect the precision of the digital scale used during the field campaign. This uncertainty was propagated during the validation stage to represent measurement conditions and should not be interpreted as a probabilistic prediction interval.
For clarity, the Monte Carlo component was used to assess model stability under scenario-based hourly reconstruction. Since both hourly attendance and hourly waste generation were reconstructed within the simulation framework, the simulation outputs are interpreted as an internal consistency assessment under the assumed temporal structure rather than as an external validation against an independent observed hourly time series.
- (h)
Visualization of the simulation output
Two graphical outputs were prepared to support interpretation of the simulation and model performance:
A linear regression plot showing observed and predicted values of waste generation against the number of users, including the fitted regression line with ±2 kg uncertainty bands derived from scale resolution.
A model convergence plot illustrating the evolution of the $R^2$ coefficient across all iterations of the Monte Carlo simulation, highlighting how the model progressively approached better fit values over time.
- (i)
Flow diagrams of the predictive models
The logical structure of both predictive approaches is summarized in
Figure 4. Diagram (a) illustrates the structure of Algorithm A, the rule-based classification model that uses shift type and categorized user attendance as inputs to generate waste predictions through contingency tables. Diagram (b) corresponds to Model B, the Monte Carlo simulation framework that integrates hourly generation and user flow distributions, applies resampling, and quantifies uncertainty through multiple iterations. Together, these diagrams provide a visual overview of the distinct data flows, processing stages, and output generation in both modeling strategies.
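The overall Monte Carlo loop (steps a to g) can be outlined with the following Python sketch. It is a simplified illustration under several assumptions: the quartic `waste_curve` is a hypothetical stand-in for the fitted polynomial, the service hours and shift totals are invented, the random component is a Gaussian draw clipped to ±σ, the 10% final-hour rule is omitted for brevity, and 500 iterations are used instead of 10,000.

```python
import random
from math import sqrt

SHIFT_HOURS = [(7, 8, 9), (11, 12, 13), (17, 18, 19)]  # hypothetical hours

def waste_curve(hour):
    """Hypothetical quartic stand-in for the fitted polynomial (kg/h)."""
    u = (hour - 13) / 6
    return 4.0 - 3.0 * u ** 2 + u ** 4

def fit_line(x, y):
    """Least-squares slope m and intercept b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, y_bar - m * x_bar

def r_squared(x, y, m, b):
    """Coefficient of determination of the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def rmse(x, y, m, b):
    """Root mean squared error, in the units of y (kg/h)."""
    return sqrt(sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y)) / len(x))

def simulate_day(shift_totals, sigma, rng):
    """Reconstruct one day of hourly users and hourly waste."""
    users, waste = [], []
    for total, hours in zip(shift_totals, SHIFT_HOURS):
        draws = sorted((rng.random() for _ in hours), reverse=True)
        scale = total / sum(draws)  # decreasing attendance within the shift
        for hour, draw in zip(hours, draws):
            noise = max(-sigma, min(sigma, rng.gauss(0, sigma)))
            users.append(draw * scale)
            waste.append(waste_curve(hour) + noise)
    return users, waste

def monte_carlo(shift_totals, sigma=0.5, iterations=1000, seed=2024):
    """Refit the regression on each simulated day; return (m, b, R^2) tuples."""
    rng = random.Random(seed)
    runs = []
    for _ in range(iterations):
        users, waste = simulate_day(shift_totals, sigma, rng)
        m, b = fit_line(users, waste)
        runs.append((m, b, r_squared(users, waste, m, b)))
    return runs

runs = monte_carlo(shift_totals=[120, 300, 180], iterations=500)
r2_values = [r2 for _, _, r2 in runs]
mean_r2 = sum(r2_values) / len(r2_values)
```

The `rmse` helper would be applied to the hold-out observations with a chosen representative fit, whereas the list of (m, b, R²) tuples supports the descriptive summary of coefficient stability across iterations.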
3. Results
3.1. Waste Generation Trends per Shift
The preliminary analysis of waste characterization revealed distinct behaviors for organic and inorganic fractions, both by shift and by day of the week.
3.1.1. Organic Waste
During the 20-day monitoring period, organic waste generation showed a consistent structure and behavior across shifts and days of the week. The figures made it possible to track the day-by-day and shift-by-shift sequence of generation throughout the monitoring period. The data revealed a regular growth pattern each day, beginning with low volumes in the breakfast shift and increasing steadily until lunch, which consistently concentrated the highest waste load. Dinner maintained intermediate values, though closer to lunch than breakfast in certain cases. This daily progression is illustrated in
Figure 5, where the recurring structure of three points per day marks the rhythm of waste accumulation.
Across shifts, lunch produced the highest volumes and showed the greatest variability, likely due to fluctuations in user flow and menu complexity. Breakfast, in contrast, presented minimal variation, with consistently low values, while dinner exhibited moderate dispersion. This is evident in
Figure 6, where lunch curves display wider vertical spread compared to the other shifts. Quantitatively, breakfast values remained below 5 kg per shift throughout the monitoring period, whereas lunch frequently exceeded 10 kg and reached peaks close to 14–15 kg. Dinner values were consistently positioned between these two ranges, rarely overlapping with breakfast and only occasionally approaching lunch levels.
Figure 7 summarizes shift-level organic waste totals across the 20-day monitoring period. The dataset shows a stable shift ranking (lunch > dinner > breakfast), despite day-to-day variability. On some days, lunch or dinner totals dropped noticeably, which may be associated with reduced operations or simplified offerings. However, the separation between shifts remained evident across all monitoring days, and breakfast never surpassed dinner or lunch in total organic waste.
Looking at the total daily waste in
Figure 8, the cafeteria generated between 21 and 29 kg of organic waste per day, with no abrupt peaks or drops. The curve appears stable, though certain days such as April 15th and 18th registered slightly higher totals. Days with lower values tended to coincide with lighter operations or lower attendance. The majority of observations were concentrated within a narrow band of approximately ±3–4 kg around the mid-range of daily values, indicating limited dispersion relative to the total magnitude of generation.
Figure 9 presents a comparison of waste generation over four consecutive weeks. The labels 1 to 4 correspond to each week, making it possible to observe how the values change from Monday to Friday. Looking at the month, Mondays and Fridays tend to register higher averages, especially in the first and third weeks. This pattern could be linked to greater cafeteria attendance at the start and end of the week, or to particular routines on campus that affect food service use. Inter-week variation did not exceed the overall daily range previously described, suggesting that temporal fluctuations remained within the same operational envelope throughout the month.
In general, the results point to a structured and predictable generation pattern for organic waste, mainly shaped by user volume and service intensity. While external factors such as menu type or academic calendar may explain certain fluctuations, the dominance of the lunch shift and the stability of breakfast waste are clear and consistent throughout the sampling period.
3.1.2. Inorganic Waste
In contrast to the more structured behavior observed in organic waste, the generation of inorganic waste followed a less consistent pattern. The figures allowed the shift-by-shift sequence of inorganic waste generation to be examined across the monitoring period, making visible the greater fluctuation and weaker regularity of the observed values. Although a general upward trend was still identifiable over the course of the week, daily fluctuations were more irregular and less predictable.
Figure 10 (corresponding to the inorganic waste version of the daily progression chart) shows this variation, where certain days show unexpected peaks or dips not directly linked to overall user flow. Total daily inorganic waste generation ranged approximately between 12 and 25 kg/day, with several isolated peaks approaching the upper bound that were not systematically aligned with peak organic generation days.
The shift-based distribution also differed from that of organic waste. While lunch again registered the highest total volumes, the difference between shifts was not as marked. Breakfast and dinner often recorded similar levels, most notably on days when packaged items or beverages were more commonly consumed. This is shown in
Figure 11, where the distribution across shifts appears more balanced, especially when compared to the steep contrast seen in organic waste. Final accumulated intra-shift inorganic totals for breakfast frequently ranged between 2 and 6 kg, dinner between 4 and 10 kg, and lunch typically between 6 and 13 kg, indicating partial overlap among shift totals that was not observed in the organic fraction.
Variability across the three shifts was more pronounced, as shown in
Figure 12. Lunch remained dominant, but the margin over dinner was narrower, and in some cases, evening values surpassed the morning figures. These fluctuations could be attributed to patterns in the use of single-use packaging, the availability of pre-packaged meals, or different user disposal behaviors during each shift. Unlike the organic fraction, instances were observed where dinner values approached or temporarily exceeded lunch values, showing a less rigid hierarchical structure among shifts.
Figure 13 shows the daily totals of inorganic waste, which remained generally stable, with most values clustering within a narrow range. However, a few days registered significantly higher weights. These increases were not always aligned with those observed in organic waste, suggesting that different factors may be driving each waste stream. Notably, peaks in inorganic waste were more frequent toward the end of the week. Total daily inorganic waste typically fluctuated within a band of approximately 17 to 25 kg, with isolated maxima occurring in the latter half of the monitoring period.
This trend becomes clearer in the weekly comparison shown in
Figure 14, where Thursdays and Fridays consistently stand out with higher average weights. These peaks may be related to increased consumption of snacks or bottled beverages as users prepare for the weekend, or to operational factors such as menu rotation and packaging choices. Inter-week differences remained within the overall observed daily range, but the relative increase toward the end of the week was more pronounced for inorganic waste than for the organic fraction.
Overall, inorganic waste generation proved to be more variable and less tied to user volume alone. While lunch remained the most impactful shift in terms of quantity, the differences across shifts and days were less predictable, reinforcing the importance of evaluating behavioral and logistical influences alongside user attendance.
Across the 20-day monitoring period, total daily organic waste averaged approximately 24.0 ± 2.5 kg/day (range: 21–29 kg/day), while inorganic waste averaged approximately 20.5 ± 2.8 kg/day (range: 17–25 kg/day). Although both streams remained within relatively narrow daily ranges, inorganic waste exhibited slightly higher relative dispersion compared to its mean, supporting the previously described irregular shift-level behavior.
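The comparison of relative dispersion can be made explicit with the coefficient of variation (standard deviation divided by the mean). The short sketch below uses only the means and standard deviations reported above; the function name is illustrative and not part of the study's workflow.

```python
# Means and standard deviations as reported in the text (kg/day).
organic_mean, organic_sd = 24.0, 2.5
inorganic_mean, inorganic_sd = 20.5, 2.8


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Relative dispersion: standard deviation expressed as a fraction of the mean."""
    return sd / mean


cv_organic = coefficient_of_variation(organic_mean, organic_sd)
cv_inorganic = coefficient_of_variation(inorganic_mean, inorganic_sd)

print(f"organic CV:   {cv_organic:.1%}")    # about 10.4%
print(f"inorganic CV: {cv_inorganic:.1%}")  # about 13.7%
```

Under these reported values, the inorganic stream varies by roughly 14% of its mean versus roughly 10% for the organic stream, which is consistent with the "slightly higher relative dispersion" described above.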
3.2. Preliminary Survey-Based Profile and Behavioral Trends
A total of 705 valid surveys were collected to characterize the cafeteria user population and assess behaviors related to food consumption, waste generation, and sustainability awareness. The information gathered served as the foundation for the statistical analyses described in subsequent sections.
3.2.1. Sample Size and Representativeness
To determine the minimum number of valid responses required for this study, the sample size was calculated using standard parameters: a 95% confidence level (Z = 1.96), a proportion of p = 0.5, and a margin of error of 5% (E = 0.05). For an infinite population, the required sample size was estimated using Equation (1), resulting in a value of 384.16.
Since the total student population in the four main engineering faculties with access to the cafeteria is finite (N = 11,717), the finite population correction in Equation (2) was applied, yielding an adjusted minimum sample size of approximately 372 valid responses.
The study collected 705 valid surveys, which exceeded this threshold and provided adequate statistical precision for the analysis.
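The two reported values can be reproduced with a few lines of arithmetic. The sketch below assumes that Equation (1) is the standard Cochran formula for proportions and that Equation (2) is the common finite population correction n0 / (1 + (n0 − 1)/N); the function names are illustrative.

```python
def cochran_sample_size(z: float, p: float, e: float) -> float:
    """Sample size for an infinite population: n0 = Z^2 * p * (1 - p) / E^2."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


def finite_population_correction(n0: float, population: int) -> float:
    """Adjusted sample size for a finite population: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / population)


n0 = cochran_sample_size(z=1.96, p=0.5, e=0.05)
n_adjusted = finite_population_correction(n0, population=11_717)

print(round(n0, 2))       # 384.16
print(round(n_adjusted))  # 372
```

With 705 valid surveys, the collected sample is nearly twice the adjusted minimum.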
3.2.2. Demographic Profile
A total of 705 valid responses were collected. Of these, 65.4% corresponded to male participants and 34.6% to female participants. Most respondents (87.2%) were between 18 and 25 years old. Participants aged 26–40 accounted for 8%, while only 3% were in the 41–65 age range.
In terms of academic status, 86.5% of participants were undergraduate students. Postgraduate students represented 3.8%, professors accounted for 3.6%, and the remaining participants were distributed among administrative staff, researchers, and others with minimal representation.
Regarding campus affiliation, 97.4% of respondents belonged to the Víctor Levi Sasso main campus. Only 0.8% reported being from other campuses, and 1.8% did not specify their location.
Faculty affiliation was concentrated in the faculties with the highest enrollment and proximity to the cafeteria. Civil Engineering accounted for 27.5% of the sample, followed by Industrial Engineering with 20.8%, and Mechanical Engineering with 20.1%. Electrical Engineering contributed 10.4% of the responses, and Systems Engineering 7.8%. Other faculties were represented to a lesser extent, including a small portion of participants from the Faculty of Science and Technology or those who selected “Not applicable.”
3.2.3. Cafeteria Usage and Behavioral Patterns
The collected behavioral data serves as a basis for analyzing user interaction with food services and waste management practices in the cafeteria.
Regarding visit frequency, 40.3% of respondents reported using the cafeteria daily, and 38.5% said they visited several times per week. Occasional users (once a week or less) represented a smaller share, and only 0.5% indicated they never used the service.
In terms of consumption, full meals were the most frequently purchased item, selected by 79% of respondents. Beverages accounted for 10.9%, while fast food options made up 7.9%. Snacks and other items were rarely chosen as the primary purchase.
Concerning waste disposal, 45% of users reported disposing of their waste daily in the cafeteria, and 30.4% did so several times per week. Lower frequencies of disposal were less common but still present in the data.
When asked about food waste, 51.9% of respondents said they rarely leave food uneaten, and 33.7% indicated that they never left any food. A smaller percentage admitted to frequently or occasionally wasting food. The main reasons cited for food waste were the presence of inedible leftovers (such as bones or peels), reported by 49.7% of participants, and dissatisfaction with taste, which was mentioned by 29.7%. Other reasons, including oversized portions or changes in appetite, were less frequently reported.
3.2.4. Sustainability Awareness and Engagement
In terms of packaging preferences, 35.4% of respondents indicated that they generally prefer products with sustainable packaging, and 23.6% stated that they always choose sustainable packaging, even if it is more expensive. A smaller percentage either preferred the cheapest option regardless of packaging or expressed no specific preference.
Regarding environmental awareness, 34.2% of participants considered themselves very aware of issues related to waste management and sustainability. Additionally, 26.9% said they felt adequately informed, and 31.7% described their knowledge as moderate. A smaller group identified themselves as having low or no awareness.
When asked about their willingness to participate in sustainability-related programs, 60.8% of respondents said they were fully willing to participate, and 23.3% reported a high level of willingness. The remaining participants expressed some degree of interest, with very few indicating a lack of willingness.
Finally, 70.9% of respondents stated that they consider campus sustainability to be very important. This perspective was common across all user profiles and supports the relevance of integrating waste reduction strategies within institutional settings.
3.3. Statistical Assessment of Influencing Variables
Normality was assessed for the continuous waste-response series using the Shapiro–Wilk test. Results indicated non-normality (p < 0.05), supporting the use of distribution-free comparisons alongside parametric contrasts. Levene’s test was used to evaluate homogeneity of variances across groups. Most groupings showed homogeneous variances; however, academic profile and most frequently purchased product presented significant heterogeneity (p < 0.05).
3.3.1. Exploratory Screening Using Daily Totals
Using total daily waste as the response, ANOVA identified statistically significant differences across categories for most frequently purchased product (p = 0.000877) and environmental awareness (p = 0.026872). Faculty (p = 0.086255), frequency of cafeteria visits (p = 0.073414), and reason for food waste (p = 0.081861) showed marginal significance, while the remaining variables (including gender, age, disposal behavior, and sustainability importance) were not significant.
3.3.2. Categorical Analysis Using Daily Groupings
When daily waste totals were analyzed as low/medium/high tiers, the ANOVA test revealed significant differences for the variables user profile (p = 0.00618), most frequently purchased product (p = 0.00451), reason for food waste (p = 0.04511), and importance assigned to campus sustainability (p = 0.02665). The variable environmental awareness level showed marginal significance (p = 0.08375). Other variables, including gender, age, visit frequency, packaging preferences, knowledge level, and willingness to participate, did not present statistically significant differences.
The Kruskal–Wallis test was also applied using the categorized dependent variable. This test confirmed significant associations for user profile (p = 0.01101), faculty affiliation (p = 0.02044), most frequently purchased product (p = 0.00781), reason for food waste (p = 0.01642), and environmental awareness level (p = 0.03893). The variable importance assigned to sustainability was marginally significant (p = 0.06953). Other variables did not yield significant results under this test.
As an exploratory check, a linear regression was fitted using the tier-coded daily outcome (1–3). The model showed limited explanatory power (R² = 0.03188; adjusted R² = 0.01233) and was not used to support subsequent modeling decisions.
3.3.3. Shift-Level Analysis and Categorical Modeling
At the shift level (breakfast, lunch, and dinner), waste generation was categorized into three levels (low, medium, high) using tertile thresholds. The results of the Chi-square test are presented in
Table 1. Statistically significant associations were identified for the variables most purchased product (p < 0.001) and faculty (p < 0.001). The variables user profile (p = 0.0957) and disposal behavior (p = 0.0656) were marginally significant, while no statistically significant associations were observed for the remaining variables.
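The tertile categorization used at the shift level can be illustrated as follows. This is a minimal sketch: the shift weights are hypothetical values chosen for illustration, and the cut points rely on the default interpolation of Python's `statistics.quantiles`, which may differ slightly from the method used in the study.

```python
from statistics import quantiles


def tertile_levels(values):
    """Label each value as low/medium/high using the two tertile cut points."""
    low_cut, high_cut = quantiles(values, n=3)  # two cut points split data in thirds

    def level(value):
        if value <= low_cut:
            return "low"
        if value <= high_cut:
            return "medium"
        return "high"

    return [level(v) for v in values], (low_cut, high_cut)


# Hypothetical per-shift waste totals in kg (not data from the study).
shift_waste_kg = [3.1, 4.0, 5.2, 6.8, 7.5, 8.1, 9.4, 10.2, 11.0, 12.3, 13.1, 14.6]
levels, cuts = tertile_levels(shift_waste_kg)

for weight, label in zip(shift_waste_kg, levels):
    print(f"{weight:5.1f} kg -> {label}")
```

The resulting labels form the categorical response that is cross-tabulated against each survey variable for the Chi-square test.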
To evaluate the strength of association between each independent variable and the categorized waste levels, Cramér’s V coefficients were calculated. As shown in
Table 2, the strongest associations were observed for most purchased product (V = 0.1790) and faculty (V = 0.1552), followed by disposal behavior (V = 0.1109) and user profile (V = 0.1068). All other variables presented values below 0.10.
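For reference, Cramér's V is derived from the Chi-square statistic as V = sqrt(χ² / (n · min(r − 1, c − 1))), where n is the number of observations and r and c are the numbers of rows and columns of the contingency table. The sketch below computes both quantities for a hypothetical table; the counts are illustrative only and do not reproduce Table 1 or Table 2.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and sample size for a table of counts.

    Assumes every row and column has at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V: chi-square rescaled to the 0-1 range."""
    chi2, n = chi_square_statistic(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows = product type, columns = low / medium / high waste.
example_table = [
    [120, 180, 250],  # complete meals
    [30, 25, 20],     # beverages
    [25, 20, 10],     # fast food
]
print(f"Cramér's V = {cramers_v(example_table):.4f}")
```

Values close to 0 indicate weak association and values close to 1 indicate strong association, so coefficients in the 0.10 to 0.18 range correspond to modest effects even when the Chi-square test is highly significant.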
A multinomial logistic regression was fitted using low waste generation as the reference category. For the predictors, the baseline group for most purchased product was complete meals, and for faculty it was Electrical Engineering (
Table 3).
For most purchased product, the model indicated lower odds of belonging to the medium or high waste categories for users purchasing fast food, snacks, or beverages, relative to the complete-meals baseline. The strongest reduction was observed for snacks, followed by fast food, while beverages also showed lower odds with a smaller magnitude.
For faculty affiliation, differences across groups were observed. Systems and Civil Engineering showed higher odds of belonging to the high-waste category relative to the reference group, while Mechanical and Industrial Engineering showed lower odds. Extremely large odds ratios for Science & Technology and for the Not applicable category likely reflect sparse counts and should therefore be interpreted with caution, given that the Science & Technology faculty is not located in the study area and the Not applicable group is underrepresented in the sample.
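The sensitivity of odds ratios to sparse cells can be seen in a crude, unadjusted calculation. The sketch below is not the multinomial model fitted in the study; it computes a simple 2 × 2 odds ratio from hypothetical counts to show how a very small group can produce extreme or undefined values, and how a continuity correction of 0.5 (the Haldane-Anscombe adjustment) changes the estimate.

```python
def odds_ratio(group_high, group_low, ref_high, ref_low, correction=0.0):
    """Crude odds ratio of high vs. low waste for a group relative to a reference.

    `correction` is added to every cell; 0.5 is the Haldane-Anscombe adjustment.
    """
    a, b, c, d = (count + correction for count in (group_high, group_low, ref_high, ref_low))
    return (a / b) / (c / d)


# Hypothetical counts: a reference faculty with 90 respondents and a group with 5.
print(odds_ratio(4, 1, 30, 60))                  # 8.0 from only five respondents
print(odds_ratio(5, 0, 30, 60, correction=0.5))  # large value driven by an empty cell
```

In the second call, moving a single respondent from one cell to another more than doubles the estimate, which is why very large odds ratios in underrepresented groups carry little interpretive weight.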
3.4. Results of the Rule-Based Classification Algorithm
This section presents the results obtained from applying the rule-based classification algorithm to real-world data from the university cafeteria. The model aimed to assign each shift a waste generation category (low, medium, or high) based on shift type and number of users attended.
- (A) Contingency tables
The contingency tables generated automatically during the training phase are shown in
Figure 15. These tables summarize the frequency of observed waste generation levels for each shift and category of user flow.
- (B) Comparison of predicted and actual values
The comparison between predicted values and actual observed generation levels in the test set is shown in
Figure 16. For each record, the model used the corresponding shift and categorized number of users to assign a predicted waste category.
- (C) Distribution of correct and incorrect predictions
In the test set, 6 of the 12 predictions matched the actual waste generation category. This corresponds to an overall accuracy of 50%. The remaining 6 records were misclassified, falling into adjacent or non-matching categories.
- (D) Model performance by generation category
Model performance varied across the waste generation categories:
Low generation (category 1): All records were correctly predicted (100% accurate).
Medium generation (category 2): Accuracy was approximately 66%.
High generation (category 3): Half of the records (50%) were correctly classified.
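The general logic of a contingency-table classifier and its per-category evaluation can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the decision rule (predict the most frequent level seen in training for each shift and user-flow combination, with ties resolved toward the lower level and unseen combinations assigned to the medium level) is assumed rather than taken from the study, and all records are hypothetical.

```python
from collections import Counter, defaultdict


def train(records):
    """Build one frequency table of waste levels per (shift, user-flow category)."""
    tables = defaultdict(Counter)
    for shift, flow, level in records:
        tables[(shift, flow)][level] += 1
    return tables


def predict(tables, shift, flow, default=2):
    """Return the most frequent training level; ties go to the lower level."""
    counts = tables.get((shift, flow))
    if not counts:
        return default  # combination never seen in training (assumed fallback)
    return min(counts, key=lambda level: (-counts[level], level))


def accuracy_by_level(tables, test_records):
    """Overall accuracy and accuracy within each actual waste level."""
    hits, totals = Counter(), Counter()
    for shift, flow, actual in test_records:
        totals[actual] += 1
        hits[actual] += int(predict(tables, shift, flow) == actual)
    overall = sum(hits.values()) / sum(totals.values())
    return overall, {level: hits[level] / totals[level] for level in sorted(totals)}


# Hypothetical records: (shift, user-flow category, waste level 1-3).
training = (
    [("breakfast", "low", 1)] * 3 + [("breakfast", "low", 2)]
    + [("lunch", "high", 3)] * 3 + [("lunch", "high", 2)]
    + [("dinner", "medium", 2)] * 2 + [("dinner", "medium", 3)]
)
testing = [
    ("breakfast", "low", 1), ("breakfast", "low", 2), ("lunch", "high", 3),
    ("dinner", "medium", 2), ("dinner", "high", 3), ("lunch", "high", 2),
]

model = train(training)
overall, per_level = accuracy_by_level(model, testing)
print(overall, per_level)
```

Because the classifier is a lookup over frequency tables, retraining on new data only requires rebuilding the tables, and every prediction can be traced to the counts that produced it.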
3.5. Predictive Modeling Using Monte Carlo Simulation
This section summarizes the outputs of the Monte Carlo simulation used to assess variability and stability in predicted solid-waste generation under the defined input scenarios. The simulation was applied using two distinct input scenarios: predicted waste generation per hour and user flow by shift.
- (A) Waste generation scenario
A fourth-degree polynomial regression model was fitted to the data to describe the temporal trend in waste generation. The model was fitted to hourly summary values (median waste per hour) obtained from the training dataset, so the curve represents a typical daily profile rather than individual raw measurements. This polynomial is used only as a deterministic “shape function” to reconstruct hourly dynamics and support Monte Carlo resampling; it is not intended as a mechanistic model of waste generation nor for extrapolation beyond the observed hours.
The resulting equation is a fourth-degree polynomial of the form $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$, where $x$ is the hour of observation (24 h format), $y$ is the predicted waste generation (kg) at that hour, and $a_0, \dots, a_4$ are the fitted coefficients. The 4th-degree form was selected because it can represent multiple turning points associated with the breakfast–lunch–dinner service peaks while avoiding excessive oscillations observed with higher-degree fits. The goodness-of-fit for the fitted curve is reported in
Figure 17 (R² shown in the plot).
Predicted values were computed for each hour, corresponding to the three shifts. These estimates were then used to construct a plot combining the predicted curve, the empirical lower and upper bounds, and the median across simulations. This comparison is shown in
Figure 18.
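A fourth-degree least-squares fit to hourly medians can be sketched without external libraries. The example below is illustrative only: the hourly values are hypothetical, the hours are rescaled to roughly [−1, 1] to keep the normal equations well conditioned (a numerical convenience assumed here, not a step reported in the study), and the fitted coefficients therefore refer to the rescaled hour.

```python
def fit_polynomial(xs, ys, degree=4):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients in ascending order of power. Suitable for small,
    rescaled inputs; not a general-purpose solver.
    """
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def scale_hour(hour, center=13.0, half_width=6.0):
    """Map clock hours 7-19 onto [-1, 1] (assumed service window)."""
    return (hour - center) / half_width


# Hypothetical median waste per hour (kg) for hours 7 through 19.
hours = list(range(7, 20))
median_kg = [1.2, 2.0, 1.6, 1.1, 2.4, 4.8, 5.1, 3.0, 1.8, 1.5, 2.6, 3.4, 2.2]

coeffs = fit_polynomial([scale_hour(h) for h in hours], median_kg, degree=4)
fitted = [evaluate(coeffs, scale_hour(h)) for h in hours]
for hour, value in zip(hours, fitted):
    print(f"{hour:02d}:00  {value:.2f} kg")
```

As noted above, a curve of this kind serves as a descriptive shape function within the observed hours and should not be evaluated outside them.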
- (B) Customer flow scenario
The number of users per shift was distributed across each hour according to observed attendance patterns. Subsequently, a plot was constructed to compare hourly customer flow with predicted waste generation levels.
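The allocation of shift totals to individual hours can be expressed as a proportional split. In this minimal sketch, the shift total and the hourly attendance weights are hypothetical placeholders for the observed attendance pattern.

```python
def distribute_users(shift_total, hourly_weights):
    """Split a shift's user count across hours in proportion to attendance weights."""
    total_weight = sum(hourly_weights.values())
    return {hour: shift_total * weight / total_weight
            for hour, weight in hourly_weights.items()}


# Hypothetical lunch shift: 300 users, with attendance peaking at 12:00.
lunch_hourly = distribute_users(300, {11: 1, 12: 3, 13: 2})
print(lunch_hourly)  # {11: 50.0, 12: 150.0, 13: 100.0}
```

The hourly counts obtained this way are the values that would be paired with hourly waste estimates in the comparison described next.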
The fully automated Monte Carlo simulation produced a positive linear relationship between the number of users per hour and the amount of waste generated. The slope of the fitted model indicates that for each additional user, waste generation increases by approximately 0.0089 kg per hour. The coefficient of determination (mean R² = 0.32 across 10,000 iterations) indicates that, on average, 32% of the variability in waste generation is explained by variation in user attendance during shifts. The fitted relationship was consistently positive across iterations.
Hold-out evaluation results are shown in
Figure 19. The figure overlays observed (blue) and predicted (red) hourly waste generation as a function of the number of clients. Blue points include measurement uncertainty shown as blue I-bars (±2 kg/h). Red points are the model predictions for the same samples; red I-bars denote ±1 RMSE (1.84 kg/h), i.e., the typical prediction error. The black line is the fitted linear model, Generation = b + m·Clients, where m is expressed in kg/h per client and b in kg/h (the panel reports R², m, and b). Overall, most observations fall within the red bands or overlap with them, indicating that the typical prediction error (±1 RMSE) is comparable in magnitude to the reported measurement resolution band and that no marked bias is apparent across the client range.
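The RMSE and the share of observations falling inside a given error band can be computed as shown below. The observed and predicted values are hypothetical and serve only to illustrate the calculation; they are not the hold-out data plotted in the figure.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def share_within_band(observed, predicted, half_width):
    """Fraction of observations lying within +/- half_width of their prediction."""
    inside = sum(abs(o - p) <= half_width for o, p in zip(observed, predicted))
    return inside / len(observed)


# Hypothetical hourly values in kg/h.
observed_kg_h = [3.0, 5.0, 4.0, 8.0]
predicted_kg_h = [4.0, 4.0, 5.0, 6.0]

error = rmse(observed_kg_h, predicted_kg_h)
print(f"RMSE = {error:.2f} kg/h")
print("within ±1 RMSE:", share_within_band(observed_kg_h, predicted_kg_h, error))
print("within ±2 kg/h:", share_within_band(observed_kg_h, predicted_kg_h, 2.0))
```

Comparing the two shares gives a quick check of whether the prediction error is of the same order as the stated measurement band.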
Across the Monte Carlo simulations, the mean R² stabilized around 0.31–0.32, indicating that client attendance accounts for approximately one-third of the variability in hourly waste generation. While this shows moderate explanatory capacity, the dispersion of points around the regression line remains consistent with the magnitude of the reported RMSE, supporting the model’s internal coherence without suggesting overfitting.
The model was executed with 10,000 Monte Carlo iterations to assess predictive stability. As iterations accumulated, the running mean of R² (
Figure 20) flattened around 0.31 after ~1000 iterations, and additional iterations produced only marginal changes (<0.01). Extending to 10,000 mainly increased the best single draw to 0.36 (red marker) but did not shift the central tendency, indicating that deeper simulation improves the chance of hitting a slightly better fit rather than changing expected performance. This best-case draw is reported only to illustrate the upper envelope of fits under resampling and was not used as a selection criterion for the reported mean performance.
The narrowing of fluctuations in the running R² curve after the initial iterations suggests convergence of the simulation process, confirming that model performance metrics are stable and not driven by stochastic artifacts.
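The convergence check described above can be illustrated with a resampling loop that tracks the running mean of R². This sketch assumes bootstrap resampling of (clients, waste) pairs with replacement, which is one plausible reading of the procedure and is not confirmed by the text; the data are hypothetical.

```python
import random


def ols_fit(xs, ys):
    """Return (slope, intercept, r2) of a simple linear fit, or None if degenerate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # all resampled points identical on one axis
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


def monte_carlo_r2(xs, ys, iterations=10_000, seed=0):
    """Resample pairs with replacement and record R2 and its running mean."""
    rng = random.Random(seed)
    n = len(xs)
    r2_values, running_mean, total = [], [], 0.0
    while len(r2_values) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        fit = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])
        if fit is None:
            continue  # skip degenerate resamples
        total += fit[2]
        r2_values.append(fit[2])
        running_mean.append(total / len(r2_values))
    return r2_values, running_mean


# Hypothetical hourly observations: number of clients and waste in kg/h.
clients = [20, 35, 50, 65, 80, 95, 110, 125, 140, 155, 170, 185]
waste_kg_h = [2.1, 4.0, 2.6, 4.9, 3.0, 5.2, 3.8, 3.1, 5.9, 3.6, 4.4, 5.5]

r2_values, running_mean = monte_carlo_r2(clients, waste_kg_h)
print(f"mean R2 after 1,000 iterations:  {running_mean[999]:.3f}")
print(f"mean R2 after 10,000 iterations: {running_mean[-1]:.3f}")
print(f"best single draw:                {max(r2_values):.3f}")
```

Because each R² lies between 0 and 1, the running mean can move by at most 1/k at iteration k, so the curve necessarily flattens; the informative quantity is the level at which it settles, while the best single draw only marks the upper envelope of the resampled fits.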
4. Discussion
4.1. Waste Generation Patterns by Shift and Waste Type
The results revealed a clear and consistent pattern in organic waste generation: lunch shifts produced the highest quantities, followed by dinner and then breakfast. This order mirrors the flow of users throughout the day, suggesting that organic waste generation is closely tied to the number of people served. Across the 20-day monitoring period, the total daily amount of organic waste remained relatively stable, ranging between 21 and 29 kg. This consistency suggests a predictable trend that may facilitate future planning based on shift-level data.
In contrast, inorganic waste displayed less regularity and showed higher peaks during dinner shifts. This outcome is not entirely explained by user volume, as similar attendance levels were observed during breakfast and dinner. Instead, the increased generation of inorganic waste during the evening appears to be linked to the cafeteria’s internal logistics. Specifically, it was observed that after a certain hour, reusable dishware was no longer used, and disposable containers became the default. This operational change was consistently recorded during the fieldwork period and had a direct impact on the amount of inorganic waste generated.
Regarding the user profile, most survey participants were undergraduate students between 18 and 25 years old. This was expected, as students represent the largest proportion of the university population and are the main users of cafeteria services. This result is also in line with previous research showing that students are the primary users of shared facilities on university campuses, such as dining areas [
24]. Although this ensures that the sample reflects the dominant user group, it also means that the behaviors of other institutional roles, such as administrative staff or professors, were less represented.
4.2. Associations Between User Characteristics and Waste Generation
Beyond identifying which variables were statistically significant, the analysis process itself highlighted several issues in how the data was being handled. At first, waste generation was treated as a continuous variable, and the survey responses, which were mostly categorical, had to be converted into numeric codes to apply tests like ANOVA, Kruskal–Wallis, and correlation analysis. But assigning numbers to categories does not always capture the real differences between them, especially when the values are arbitrary. That approach did not match how people behave, and it led to problems with the results.
The continuous analysis ended up showing weak, and at times illogical, relationships. A few variables, like product type and awareness level, stood out with significant differences, but most did not—even in cases where an effect was expected. One example was a negative correlation between cafeteria visit frequency and the type of product purchased, which did not make sense based on what was observed during fieldwork. These outcomes showed that the method being used was not picking up the structure of the data properly.
To address this, waste generation was reclassified into three levels—low, medium, and high—based on daily totals. This allowed the use of tests that are more suitable for ordinal data and helped bring the analysis closer to the kind of answers the study was trying to find. While the new structure helped in some ways, there were still gaps. Grouping by day made it harder to see what was happening during each specific shift, and the results did not always match what was happening on the ground.
A clearer picture only emerged after the analysis was adjusted again, this time using waste generation by shift instead of by day. That change made a big difference. It matched how the cafeteria works and allowed the use of categorical techniques like the Chi-square test and Cramér’s V. These tests showed stronger and more consistent links, especially for variables like most purchased product and faculty.
Based on this shift-level structure, a multinomial logistic regression model was then applied. Unlike earlier methods, this model was built to work with categorical outcomes and gave a better estimate of how each user characteristic influenced the chances of generating more or less waste. The results were more aligned with what was seen in practice and gave a more solid base for the modeling work that followed.
This step-by-step adjustment—from continuous data to daily categories, and finally to a shift-by-shift approach—was necessary to get results that actually described the situation in the cafeteria. It also showed that choosing the right analysis method is just as important as collecting the data itself, especially when working with behavior-based survey responses.
Our observation that high environmental awareness does not necessarily translate into lower waste generation aligns with the ‘attitude-behavior gap’ documented in recent literature. For instance, research on Generation Z consumers indicates that while environmental attitudes are positive, actual food waste reduction behavior is more strongly influenced by subjective norms and perceived behavioral control than by attitude alone [
59]. Similarly, studies analyzing waste management practices reveal a significant disconnect where high levels of theoretical knowledge do not correlate with appropriate disposal practices due to structural or habitual barriers [
60].
This discrepancy introduces a stochastic element to waste generation: user behavior fluctuates based on social and logistical contexts rather than consistent values. Since simple descriptive statistics cannot fully account for this behavioral variability, predictive modeling becomes essential. The use of computational models allows for the translation of these irregular behavioral patterns into actionable operational data, justifying the methodological shift presented in the following section.
4.3. Performance and Comparison of Predictive Models
Two data-driven approaches were tested in this study: an automated rule-based classification algorithm and a Monte Carlo simulation model. Each offered distinct advantages and limitations depending on the level of detail required and the intended use. The Monte Carlo component is interpreted as a stability assessment under scenario-based hourly reconstruction, not as external validation against an independent hourly time series.
The rule-based algorithm used shift type and categorized user volume to assign each case to a waste generation level: low, medium, or high. Although the overall accuracy was 50%, the model performed particularly well when predicting low generation levels, which may be due to the consistency and clearer patterns in that category. In contrast, the medium generation was the most difficult to predict, and the high generation showed uneven performance, possibly because there were fewer observations in that group. These limitations affected its ability to capture more subtle or irregular patterns. Overall, these results indicate moderate predictive performance under the observed conditions; therefore, H3 is partially supported in terms of operational usefulness, but not as a high-accuracy forecasting claim.
Despite these challenges, the rule-based model is highly efficient. It can be applied to other datasets quickly and continuously without manual intervention, making it suitable for automated workflows or decision-making tools. Its simplicity also makes it accessible to users without technical expertise in modeling or statistics, which gives it practical value in institutional settings where technical capacity is limited.
The Monte Carlo simulation, on the other hand, offered a more detailed and flexible approach. It incorporated a polynomial trend line fitted to the waste data over time, allowing for hour-by-hour reconstruction within the simulation framework. When compared to client flow patterns, the simulation showed a consistent pattern: shifts with higher user attendance coincided with peaks in waste generation. These findings should be interpreted as a context-specific, exploratory demonstration intended for operational planning in the studied cafeteria. Broader deployment would require local recalibration and additional validation across sites and periods.
By integrating user flow variability and the temporal structure of cafeteria operations, the model provides an operational estimate of expected waste loads by hour. Uncertainty was visualized as empirical upper and lower bounds around the predicted values, derived from the standard deviation of the observed data.
Each model serves a different purpose. The rule-based algorithm is ideal for fast classification and routine use, while the Monte Carlo simulation is more appropriate for strategic planning, scenario analysis, or evaluating operational changes. Used together, they offer a complementary toolkit that can strengthen institutional waste management: one provides scalability, the other depth.
In recent years, demand forecasting in institutional food services has increasingly incorporated artificial intelligence–based approaches. Several studies report that machine learning models, including XGBoost and Long Short-Term Memory (LSTM) networks, can reach high predictive accuracy when trained on large historical datasets [
61]. Despite these advantages, their application often depends on substantial computational capacity, specialized technical knowledge, and continuous data availability.
In contrast, the rule-based and Monte Carlo approaches utilized in this study offer a distinct advantage in terms of implementability. While they may not capture the granular non-linear patterns identified by deep learning models [
61], they provide robust, explainable estimates for daily operations without necessitating the complex data infrastructure typical of advanced ML systems. This makes them suitable for HEIs seeking to initiate waste management planning based on empirical evidence rather than advanced AI infrastructure.
4.4. Institutional Planning and Practical Implications
This study provides actionable evidence to support improvements in waste management practices within HEIs. By analyzing waste generation at the shift level and linking it to user behavior and service conditions, it becomes possible to move beyond generic waste reduction campaigns and toward more targeted, data-driven interventions. In practice, such interventions can be timed and scoped by shift, concentrating effort where consistent peaks are observed.
First, organic waste shows a sustained peak at lunch. Interventions aimed at this period—such as adjusting portion sizes, refining menu planning based on demand forecasts, or offering take-away containers that preserve food—can reduce waste with minimal operational disruption. Because this pattern was stable over the observation window, these measures can be scheduled, monitored, and refined on a regular cadence.
Second, the analysis of inorganic waste highlights the role of internal logistics. The increase in disposables during dinner, due to the cessation of dishwashing services, points to a clear opportunity for operational improvement. Extending dishwashing hours, offering reusable packaging for evening shifts, or rethinking how meals are served late in the day could reduce single-use items. These are adjustments in how the service is organized rather than changes in user behavior. That said, extending dishwashing hours would increase the cafeteria’s water footprint; any shift toward greater use of reusables should therefore weigh the water and energy use of washing against the avoided waste, ideally using a simple water-footprint check (e.g., liters per meal) and pairing it with efficiency measures (high-efficiency washers, heat recovery, low-flow rinses).
Third, the statistical analysis showed that certain user characteristics—such as faculty affiliation and type of product purchased—were more consistently associated with higher waste levels than variables like environmental awareness or willingness to participate. This suggests that institutional planning efforts may benefit more from segmenting users based on actual cafeteria usage patterns than relying solely on outreach or awareness campaigns. For example, promoting lighter meal options in faculties with higher waste averages or revising packaging practices for full meals could be more effective than generic awareness campaigns.
These applications should be interpreted as decision-support within the studied setting; broader implementation requires local calibration and additional evaluation across periods with different operating conditions. The rule-based algorithm is suitable for quick classifications and could help cafeteria staff estimate expected waste levels for each shift in real time. Meanwhile, the Monte Carlo simulation can support long-term planning by simulating different operational scenarios, estimating peak demands, and preparing the system for variations in user flow.
The combined results from the modeling approaches and survey analysis informed the development of context-specific strategies for the cafeteria. These strategies were organized into five key areas (
Figure 21): operational adjustments, infrastructure planning, targeted educational actions, behavioral incentives, and continuous monitoring.
Operationally, one of the most direct applications is the adjustment of waste collection schedules based on predicted peaks. Since lunch and dinner shifts account for over 80% of total waste, increasing collection frequency during those periods, while reducing it during breakfast, can optimize resource use and avoid unnecessary accumulation. Similarly, separating collection routes for organic and inorganic waste by shift may help improve recovery rates, especially given the high percentage of non-edible food waste reported in the surveys.
In terms of infrastructure, placing larger bins during lunch and dinner hours, along with clear signage, could reduce overflow and support waste separation. Installing dedicated stations for composting and recycling also aligns with the observed shift-based patterns: organic waste is more common at lunch, while packaging dominates during dinner.
Shifting from disposable items to reusable dishware entails a set of environmental trade-offs that are often underestimated. Life Cycle Assessment studies published in 2025 show that single-use plastic cups contribute substantially to global warming potential, but reusable options perform better only when high reuse rates are achieved and washing systems operate efficiently [62]. Without these conditions, the environmental gains of reusables are reduced or may even disappear [63].
Beyond environmental performance, operational constraints remain a limiting factor. Recent case studies report that insufficient capacity for collecting, washing, and sanitizing dishware during peak service periods is one of the main obstacles to reducing single-use packaging in food services [64]. This evidence aligns with the results of this study, where the dinner shift concentrates higher levels of inorganic waste and functions as a logistical bottleneck. Under these conditions, transitioning to reusable systems would require parallel investments in high-capacity and time-efficient washing infrastructure to achieve a net environmental benefit [65].
From a behavioral perspective, the results showed that certain faculties and user profiles generate more waste than others. Educational campaigns that target these groups, with tailored messages about food waste and packaging impact, are likely to be more effective. The high levels of environmental awareness and willingness to participate reported in the surveys also suggest that incentive-based strategies could work well [66]. For example, offering rewards to users who consistently separate waste or use reusable items, and publicly recognizing faculties that show improvement, can help sustain motivation and engagement.
The implementation of real-time monitoring tools, such as sensors on waste bins, and the periodic updating of predictive models, would allow the system to respond to changes in user behavior, seasonal shifts, or specific events like exam weeks. This ongoing feedback loop is important to keep waste management efforts relevant and effective over time.
4.5. Comparison with Prior Studies
Shift-level results align with campus studies where service cycles shape daily accumulation: lunch concentrates organic waste, breakfast remains consistently lower, and dinner varies with operations. Characterization-based audits treat measurement as the starting point for planning; the present analysis follows that logic while adding timing at the hour/shift scale [13,15,67].
User segmentation is consistent with prior evidence. Factors tied to actual use, such as type of product purchased and user profile, show stronger links to waste than self-reported awareness or willingness to participate, an attitude–behavior gap reported in university settings [24,68]. In practice, this favors actions framed around purchase patterns and service windows over broad outreach alone.
On the inorganic side, the evening rise in disposables is closely linked to internal logistics. When dishwashing stops early, single-use items increase; where washing remains available or reusable options are offered, that effect softens. Campus proposals emphasizing procedural and infrastructural levers (3R programs, reusable systems, scheduling) are consistent with this pattern, and the shift-level results provide a quantitative basis for those levers [12,17]. Any move toward reusables should weigh waste avoided against water and energy for washing, using simple per-meal metrics.
Methodologically, this work moves beyond descriptive audits by reporting hold-out evaluation metrics (R², RMSE) and a Monte Carlo stability check. That responds to calls in the literature to integrate behavioral and operational variables into decision-support tools and complement systems-oriented approaches (e.g., system dynamics) with an empirical predictor suited to day-to-day operations [22,23].
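To make the evaluation terms concrete, the sketch below shows one way hold-out R² and RMSE can be computed for a simple attendance-to-waste regression, and how repeating the random split gives a rough stability check. It is written in Python for illustration (the study used R), the data are synthetic, and the split fraction, the helper names `fit_line` and `holdout_metrics`, and the use of repeated splits as the stability check are all assumptions rather than the study's documented procedure.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def holdout_metrics(xs, ys, test_fraction=0.3, seed=0):
    """Fit on a random training split; report R^2 and RMSE on the held-out rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(2, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    y_true = [ys[i] for i in test]
    y_pred = [a + b * xs[i] for i in test]
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_t = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))

# Synthetic data for illustration only (not the study's measurements).
rng = random.Random(1)
users = [rng.randint(50, 350) for _ in range(60)]
waste = [1.0 + 0.04 * u + rng.gauss(0.0, 2.0) for u in users]

r2, rmse = holdout_metrics(users, waste)
print(f"single split: R^2 = {r2:.2f}, RMSE = {rmse:.2f} kg")

# Repeating the split with different seeds gives a rough stability check.
r2_values = [holdout_metrics(users, waste, seed=s)[0] for s in range(200)]
print(f"mean R^2 over 200 splits = {statistics.fmean(r2_values):.2f}")
```

Reporting the mean (and spread) of R² across many splits, rather than a single split, reduces the chance that one favorable partition overstates performance on a small dataset.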
4.6. Limitations and Opportunities for Improvement
Although this study produced useful results and opened new possibilities for predicting and understanding waste generation in university cafeterias, there are several aspects that limit how broadly the findings can be applied.
First, data were collected in a single institutional cafeteria over a 20-day period. While that timeframe was enough to identify clear patterns by shift, it does not account for changes that may occur during exam weeks, academic breaks, or seasonal variations. Extending the monitoring period or repeating the study in other university settings would help confirm whether the patterns observed here are consistent elsewhere.
Second, the survey responses came mostly from undergraduate students, which is expected given that they are the main users of the cafeteria. However, this also means that the perspectives of other groups, such as administrative staff or professors, were less represented. Future studies seeking a full picture of waste-related behavior across all user types should include a more balanced sample.
Third, while the final models worked well with the available data, the input variables were relatively simple: mainly the number of users and the time of day. Including more context, like the type of menu offered and events happening on campus, could help improve prediction accuracy and reveal new patterns.
Waste was aggregated into organic and inorganic streams, which supports operational load forecasting but reduces analytical resolution for diversion planning. In particular, recyclable materials embedded in the inorganic stream were not separately quantified; therefore, the study cannot estimate recyclable-specific generation, contamination rates, or recovery performance. Future replications can extend the same workflow by adding a third stream (recyclables) or a sub-sorting protocol within inorganics, while keeping the shift-level design and calibration logic unchanged.
Another important point is that most of the statistical techniques were chosen to suit the nature of the data (categorical and limited in scale). With larger datasets, there is room to explore more advanced approaches, such as decision trees or other machine learning models, that can detect patterns not visible through traditional methods. Formal information-criterion selection (AIC/BIC) was not pursued because the polynomial is not used for inference; future implementations may compare candidate smoothers (e.g., polynomial degrees, splines) using hold-out reconstruction error when higher-resolution time series are available.
The study also brought up some clear opportunities to improve data collection going forward. For example, keeping a permanent record of waste generation over time would allow for better tracking of trends and make future analyses more robust. Installing a simple system to count cafeteria users by hour or shift would also help refine the relationship between attendance and waste levels.
Measurement uncertainty in the weighing process represents a minor methodological limitation. During low-load periods (e.g., breakfast), the relative influence of the scale uncertainty is higher than during peak periods. This effect is unlikely to alter the primary trends identified in the study, although it may have contributed marginally to the variance observed in high-resolution, fraction-specific estimates. Because between-shift disposal dynamics were not independently observed, scenario selection is conceptual; future deployments should validate continuity assumptions using higher-frequency disposal logging (e.g., bin sensors or timed bag removal records).
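The point about low-load periods can be illustrated with a short arithmetic sketch in Python: a fixed scale uncertainty is a larger share of a small measured mass than of a large one. The uncertainty value and the shift loads below are hypothetical and do not represent the scale specification or the loads measured in this study.

```python
def relative_uncertainty(scale_uncertainty_kg, measured_kg):
    """Scale uncertainty expressed as a fraction of the measured mass."""
    if measured_kg <= 0:
        raise ValueError("measured mass must be positive")
    return scale_uncertainty_kg / measured_kg

# Hypothetical values, not the study's scale specification or loads.
SCALE_UNCERTAINTY_KG = 0.05
for shift, load_kg in [("breakfast", 1.0), ("lunch", 20.0)]:
    pct = 100 * relative_uncertainty(SCALE_UNCERTAINTY_KG, load_kg)
    print(f"{shift}: {pct:.2f}% relative uncertainty")
```

With the same absolute uncertainty, the relative effect scales inversely with the load, which is why low-volume shifts are the ones where scale precision matters most.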
Attendance may co-vary with unmeasured drivers (menu, pricing, promotions, academic events), which can bias causal interpretations of coefficients. The present study treats attendance as a pragmatic predictor for operational planning; future work can incorporate daily menu type or promotion flags as additional covariates when such records are consistently available at the same temporal resolution as the waste measurements.
Finally, regarding methodological evolution, there is a clear opportunity to integrate automated data collection technologies to overcome the limitations of manual sorting. Recent advancements in computer vision demonstrate that deep learning models, such as YOLOv8 integrated with RGB-D cameras, can classify and quantify food waste types with high precision in real-time [69]. Future research should explore integrating such automated identification systems with the predictive models developed in this study, paving the way for fully autonomous, responsive waste management systems in university cafeterias.
Overall, the limitations identified in this study do not invalidate the proposed workflow but rather define the conditions under which it should be interpreted and replicated. Future implementations should prioritize longer monitoring windows, higher-frequency operational records, and expanded waste fraction resolution to improve transferability across institutional settings. In this way, the approach can evolve from a case-specific planning tool into a more robust framework for operational decision-making in university food service systems.
4.7. Transferability and Replication
This is a single-site study; therefore, results are not claimed to be statistically representative of all HEIs or institutional canteens. The contribution is the transferable model structure and data-collection workflow. Replication in other sites requires a short local calibration campaign: (i) record shift-level user counts (e.g., POS totals or standardized counts), (ii) measure waste by shift using the same organic/inorganic split or an equivalent local scheme, (iii) compute local thresholds for low/medium/high categories and rebuild the rule tables, and (iv) re-estimate regression parameters and uncertainty terms using the local scale precision and observed variability. Under this approach, coefficients are site-specific while the procedure remains consistent across institutions.
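Step (iii) of the calibration campaign can be sketched as follows in Python (illustrative only; the study used R). The sketch assumes that thresholds are set at the terciles of locally observed shift waste and that the rule table maps each shift to its most frequent category. The source does not specify either choice, and the records, function names, and values below are hypothetical.

```python
import statistics
from collections import Counter, defaultdict

def tercile_thresholds(waste_kg):
    """Cut points splitting observed waste into low/medium/high thirds.

    Terciles are an assumption; the source does not state how its
    thresholds were derived.
    """
    low_cut, high_cut = statistics.quantiles(waste_kg, n=3)
    return low_cut, high_cut

def categorize(value, thresholds):
    """Assign a waste measurement to 'low', 'medium', or 'high'."""
    low_cut, high_cut = thresholds
    if value <= low_cut:
        return "low"
    if value <= high_cut:
        return "medium"
    return "high"

def build_rule_table(records, thresholds):
    """Map each shift to its most frequently observed waste category."""
    by_shift = defaultdict(Counter)
    for shift, waste in records:
        by_shift[shift][categorize(waste, thresholds)] += 1
    return {shift: counts.most_common(1)[0][0]
            for shift, counts in by_shift.items()}

# Hypothetical local calibration records: (shift, waste in kg).
records = [
    ("breakfast", 2.0), ("breakfast", 2.5), ("breakfast", 3.0), ("breakfast", 2.2),
    ("lunch", 12.0), ("lunch", 14.5), ("lunch", 13.0), ("lunch", 15.0),
    ("dinner", 8.0), ("dinner", 9.5), ("dinner", 7.0), ("dinner", 10.0),
]
thresholds = tercile_thresholds([waste for _, waste in records])
rule_table = build_rule_table(records, thresholds)
print(thresholds, rule_table)
```

Because the thresholds are recomputed from local data, the categories stay meaningful at sites whose absolute waste volumes differ from the case-study cafeteria; step (iv), re-estimating the regression, would follow the same pattern with local attendance counts.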
5. Conclusions
The study examined solid waste generation in a university cafeteria using 20 days of shift-level waste characterization, 705 user surveys, and two complementary predictive modeling approaches. The conclusions presented here are directly derived from the empirical results and address the research questions posed at the outset of the study.
First, the analysis identified clear temporal patterns in waste generation across cafeteria shifts (RQ1). Organic waste peaked during lunch hours, coinciding with the highest user attendance, while inorganic waste showed greater variability and reached its highest levels during dinner service. These patterns indicate that waste generation in university food services reflects both user flow and service practices, not attendance alone.
Second, the statistical assessment indicated that waste generation levels were more consistently associated with observable behavioral and institutional variables than with self-reported environmental attitudes (RQ2). Chi-square tests and Cramér’s V identified associations between categorized waste levels and factors such as product type and faculty affiliation, whereas variables related to sustainability awareness showed weaker or less consistent relationships. This pattern is consistent with an attitude–behavior gap in cafeteria waste generation and indicates that management actions should prioritize observed consumption patterns.
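For readers unfamiliar with the association measures named above, the following Python sketch computes the Pearson chi-square statistic and Cramér's V from a contingency table of counts (illustrative only; the study used R). The table is hypothetical and is not taken from the survey data. The p-value is omitted because it requires a chi-square distribution function that is not in the Python standard library.

```python
import math

def chi_square_and_cramers_v(table):
    """Pearson chi-square statistic and Cramér's V for an r x c count table.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # smaller dimension minus one
    return chi2, math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = product type, columns = low/medium/high waste.
counts = [
    [30, 15, 5],
    [10, 20, 20],
]
chi2, v = chi_square_and_cramers_v(counts)
print(f"chi-square = {chi2:.2f}, Cramér's V = {v:.2f}")
```

Cramér's V rescales the chi-square statistic to the range 0 to 1, so association strength can be compared across variables with different numbers of categories and different sample sizes.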
Third, the predictive modeling results indicate moderate predictive performance under the observed conditions (RQ3). The rule-based classification algorithm achieved 50% overall accuracy, with stronger performance for low waste generation and weaker performance for medium and high categories, partly influenced by class imbalance and limited observations in the high category. The Monte Carlo model showed a positive relationship between user attendance and waste generation with moderate explanatory power (mean R² ≈ 0.32) and provided an hour-by-hour representation of variability within the simulation framework. Together, these models offer low-complexity tools that can support shift-level planning in the case-study cafeteria without requiring advanced machine-learning architectures.
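The interaction between overall accuracy and class imbalance can be shown with a small Python sketch that reports per-class recall alongside overall accuracy (illustrative only; the study used R). The confusion matrix below is hypothetical and is not the one obtained in this study.

```python
def accuracy_report(confusion, labels):
    """Overall accuracy and per-class recall from a square confusion matrix.

    confusion[i][j] counts cases of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    recall = {}
    for i, label in enumerate(labels):
        n_actual = sum(confusion[i])
        recall[label] = confusion[i][i] / n_actual if n_actual else float("nan")
    return correct / total, recall

# Hypothetical counts (rows = actual, columns = predicted), not study results.
labels = ["low", "medium", "high"]
confusion = [
    [8, 2, 0],  # actual low
    [4, 3, 1],  # actual medium
    [1, 1, 0],  # actual high (few observations)
]
overall, per_class = accuracy_report(confusion, labels)
print(f"overall accuracy = {overall:.2f}")
for label in labels:
    print(f"recall[{label}] = {per_class[label]:.2f}")
```

When one class has very few observations, its recall can be near zero while overall accuracy remains moderate, so reporting both figures gives a fairer picture of classifier performance.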
While the study was limited to a single facility and a finite monitoring period, the results provide a context-specific, exploratory demonstration of how shift-level monitoring and demand-linked predictors can be integrated into operational forecasting. Extending the approach to other cafeterias or institutions would require site-specific recalibration and additional evaluation across different operational conditions and time periods (e.g., exam weeks, holiday periods, and menu or service changes). Regular waste characterization, basic user-flow tracking, and periodic model updating are practical steps that can improve reliability when applying the approach beyond the study setting. The reported coefficients and performance metrics reflect one facility and one teaching-period window and are not claimed to hold across HEIs.
Overall, this research shows that shift-level monitoring, behavior-oriented indicators, and lightweight predictive models can provide a practical basis for anticipating solid waste generation at the local institutional scale, while acknowledging the need for broader validation before generalizing performance across institutions.
All modeling steps were implemented in R using fixed scripts and deterministic seeds for the Monte Carlo components, ensuring that the reported outputs can be reproduced from the same inputs. The workflow is designed to be transferable through local recalibration of thresholds and coefficients, rather than by exporting parameter values from this single case study.