1. Introduction
Waste management is a significant challenge worldwide. The increase in waste production, driven by urbanization, consumption, and population growth, shows the need for an organized system to deal with this problem [1,2]. To address this, Smart Waste Management (SWM) systems based on Internet of Things (IoT) technology can be used to improve waste collection in smart cities [1].
Waste collection can be modeled as a Vehicle Routing Problem (VRP) or one of its variants [3]. A study by Pranathy et al. [4] used a Capacitated VRP (CVRP) model to address the problem of maximum load capacity on each route. CVRP with Time Windows (CVRPTW) was used by Idwan et al. [5] to develop a waste collection process that must be completed within a certain time window. Additionally, a study by de Morais et al. [6] used two different approaches: the first applies a VRP with profits (VRPP) to select bins and optimize paths, while the second considers a dynamic Inventory Routing Problem (IRP).
Other variants used in the literature include the Dynamic Multi-Compartmental Vehicle Routing Problem (DM-CVRP) [7], the dynamic reverse IRP [8], multiple vehicle routing (MVR) [9], and the Multi-trip Vehicle Routing Problem with intermediate facilities (VRP-IF) [10].
With regard to the IoT technologies used, various sensor types are employed to monitor waste collection systems. Ultrasonic sensors are frequently mentioned as a means to detect bin fill levels by measuring the time it takes for the emitted wave to return to the sensor [4]. Similarly, some studies refer generically to volumetric sensors without specifying the underlying technology [8]. In addition to fill-level monitoring, other technologies are also adopted to capture complementary data, such as RFID tags for user authentication and data transmission to the system [11], temperature sensors [12], gas sensors [4], and load cells for weight measurement [12].
In other industrial contexts, IoT technologies have been applied to improve inventory tracking and supply-chain coordination [13,14,15]. In these cases, the main focus of the applications is on information visibility and process control. In contrast, IoT-enabled waste management aims to optimize collection routes and operational efficiency by using data from the sensors to support real-time decisions.
In the SWM context, several researchers have proposed different algorithms to solve VRPs. Alwabli et al. [16] used an Ant Colony (AC) algorithm, and Cao et al. [17] employed a modified version of the AC algorithm. In addition, an Improved Moth Flame Optimizer was applied by Ishaque & Florence [18]. Another algorithm, proposed by Facchini et al. [19], utilized Simulated Annealing in a Dynamic VRP context. Furthermore, the research by Sar & Ghadimi [20] used a non-dominated genetic algorithm.
To verify the quality of the algorithms, state-of-the-art VRP benchmarks can be used. An example is the traveling salesman problem library (TSPLIB) instances [21], which were used in [11,22]. Another example is the CVRP instances proposed by Christofides & Eilon [23], which were used in [24]. Moreover, Boudanga et al. [25] used the Solomon instances to evaluate the performance of the proposed algorithm.
To test the algorithms in a real-life context, different authors employed two main approaches. The first considers real latitude and longitude and simulates the fullness of the bins. For example, studies using this approach were conducted in different countries, such as India [26], Pakistan [27], and Morocco [28]. The second approach considers the real location and real-world data regarding the fullness of the bins. In this context, the data are provided by companies or municipalities in different countries, such as Ireland [29], Italy [30], and Luxembourg [3].
Although the existing body of research demonstrates a growing interest in applying IoT and routing optimization to waste management, it also presents a fragmented landscape. Studies employ a wide variety of VRP models, optimization algorithms, and dataset types (real-world vs. simulated), leading to a diverse range of reported outcomes. This variation makes it difficult to form a clear, consolidated understanding of the quantifiable benefits of these technologies. For this reason, we conducted a meta-analysis of studies that combine IoT technologies with vehicle routing techniques in SWM systems to quantify the distance saved by employing these systems. We aim to critically review the current literature in this area based on the following research questions:
RQ1: What is the reported impact of IoT-enabled smart waste management systems with vehicle routing optimization techniques on the distance traveled compared to non-IoT scenarios?
RQ2: Does the type of dataset influence performance?
RQ3: What types of vehicle routing algorithms are most commonly used in SWM and how are they evaluated?
RQ4: Does the classification of the vehicle routing problem significantly influence the reduction of the distance traveled in IoT-based SWM systems?
The remainder of this paper is structured as follows. Section 2 details the methodology for the literature screening, data extraction, and statistical analysis. Section 3 presents the results of the meta-analysis, addressing each research question. Section 4 discusses the practical implications and limitations of our findings. Finally, Section 5 concludes the study and summarizes the key contributions.
2. Methodology
This systematic review and meta-analysis was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [31]. The protocol for this systematic review was not registered in a public registry. The complete PRISMA checklist is provided in Supplementary Material Table S1.
2.1. Eligibility Criteria
This review included studies based on predefined eligibility criteria guided by the PICOC (Population, Intervention, Comparison, Outcome, Context) framework. The specific inclusion and exclusion criteria are detailed in Table 1 and Table 2, respectively. To ensure the effective implementation of this protocol, the authors used the Parsifal platform [32] to process and select articles according to the criteria.
2.2. Information Sources and Search Strategy
The literature search was initially conducted in April 2025 and updated in October 2025. It included all articles available up to that date from three electronic databases: Scopus [33], IEEE Xplore [34], and ACM Digital Library [35]. The following search string was used:
(“waste management” OR “waste collection”) AND (“IoT” OR “internet of things” OR “sensor”) AND (“vehicle routing” OR “vrp” OR “routing optimization”)
2.3. Study Selection and Data Extraction
The initial search of the databases was followed by the removal of duplicates. The remaining unique articles were then screened for relevance based on their titles and abstracts; this screening was performed by a single author. Studies that passed this initial screening were retrieved for a full-text review, during which they were assessed for final eligibility against the criteria detailed in Table 1 and Table 2. The reasons for excluding articles at the full-text stage are detailed in the PRISMA flow diagram (see Section 3).
2.4. Data Extraction, Items, and Effect Measure
A structured data extraction form was used to collect information from the final studies. The data extraction was performed by a single author. The following variables were extracted from each study:
Bibliographic details (author, year).
Vehicle routing algorithm category (e.g., Heuristic-based, Mathematical Programming).
Dataset type (Real-world or Simulated).
VRP classification (e.g., Capacity Only, Inventory Routing).
Baseline and optimized route distances.
The primary effect measure for this meta-analysis was the percentage difference in distance traveled between the IoT-enabled scenario and the non-IoT baseline scenario, calculated from the extracted distance data. All raw distance values were standardized to kilometers (km) before calculation.
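The effect measure described above can be sketched as follows. This is a minimal illustration, assuming the percentage difference is computed relative to the non-IoT baseline, so that negative values indicate a reduction (the sign convention used when the results are reported). The function names, the unit table, and the example distances are hypothetical and are not taken from the included studies.

```python
# Number of raw units in one kilometer (illustrative subset of units).
UNITS_PER_KM = {"km": 1.0, "m": 1000.0}


def to_km(value, unit):
    """Standardize a raw distance value to kilometers."""
    return value / UNITS_PER_KM[unit]


def distance_difference_pct(baseline_km, iot_km):
    """Percentage difference of the IoT-enabled route versus the baseline.

    Assumed definition: (IoT - baseline) / baseline * 100.
    Negative values mean the IoT-enabled route is shorter.
    """
    if baseline_km <= 0:
        raise ValueError("baseline distance must be positive")
    return (iot_km - baseline_km) / baseline_km * 100.0


# Hypothetical example: 120 km baseline, 90,000 m IoT-enabled route.
baseline = to_km(120.0, "km")
optimized = to_km(90_000.0, "m")
print(distance_difference_pct(baseline, optimized))  # -25.0
```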
2.5. Quality Assessment
The methodological quality of the full-text articles was assessed using a custom 7-item checklist. The specific questions used for the assessment were:
Is the objective of the study clearly defined and relevant to smart waste management using IoT and routing optimization?
Does the study describe the IoT infrastructure or technologies used in the system?
Is the vehicle routing problem (VRP) or the routing optimization technique clearly described?
Are the quantitative performance metrics used in the evaluation of the VRP clearly presented?
Does the study describe the experimental setup or simulation environment in sufficient detail?
Are the routing results based on real-world data or realistic simulation scenarios?
Is there any statistical analysis or comparison between the methods and results?
A scoring system was applied to each question, assigning 0 for “No”, 0.5 for “Partially”, and 1 for “Yes”. Articles with a total score of less than 4.0 were excluded.
This custom checklist was developed specifically for this review to ensure that studies meet a minimum methodological standard for inclusion. It is important to note that this is not a standardized risk-of-bias tool, and a formal risk-of-bias assessment for each of the final included studies was not performed.
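The scoring rule can be expressed compactly. The sketch below assumes each article is represented by a list of seven answers, one per checklist question; the function names and the example answers are hypothetical.

```python
# Points awarded per answer, as defined by the scoring system.
POINTS = {"No": 0.0, "Partially": 0.5, "Yes": 1.0}
N_ITEMS = 7        # number of checklist questions
THRESHOLD = 4.0    # articles scoring below this value are excluded


def quality_score(answers):
    """Sum the points of the seven checklist answers."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    return sum(POINTS[a] for a in answers)


def is_included(answers):
    """An article is kept when its total score is at least the threshold."""
    return quality_score(answers) >= THRESHOLD


# Hypothetical article: three "Yes", two "Partially", two "No".
example = ["Yes", "Yes", "Partially", "Yes", "No", "Partially", "No"]
print(quality_score(example), is_included(example))  # 4.0 True
```

An article scoring exactly 4.0 is retained, since only totals below 4.0 lead to exclusion.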
2.6. Quantitative Synthesis and Heterogeneity Assessment
To quantitatively assess the overall impact on distance traveled, a heterogeneity analysis was first performed using Cochran’s Q statistic [36] and Higgins’ I-squared ($I^2$) statistic [37,38]. This analysis was carried out on the indicator “distance difference %”, calculated by comparing the IoT-enabled scenario with the respective non-IoT baseline presented in each study. All raw distance values were standardized to kilometers (km) before calculation to ensure consistency between studies. This detailed approach to data preparation is critical for ensuring the replicability of our findings.
Based on the heterogeneity assessment, the appropriate model for combining effects was determined. Given the observed heterogeneity characteristics, which are detailed in Section 3.4, a fixed-effects model was considered appropriate to synthesize the data, under the assumption that the observed “distance difference %” values represent estimates of a common underlying effect. This model estimates a combined effect size. In addition, to calculate the 95% confidence intervals, we employed a bootstrap resampling technique (with 9999 iterations) to generate robust estimates for the relatively small sample size.
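A minimal sketch of a bootstrap confidence interval with 9999 resamples is given below. Two points are assumptions made only for illustration: the combined effect is taken as an unweighted mean (the weighting scheme is not stated here), and the interval uses the percentile method. The data values, the seed, and the function names are hypothetical.

```python
import random
import statistics


def pooled_effect(effects):
    """Combined effect as an unweighted mean (equal weights are an assumption)."""
    return statistics.fmean(effects)


def bootstrap_ci(effects, n_resamples=9999, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the pooled effect."""
    rng = random.Random(seed)
    n = len(effects)
    # Resample with replacement and recompute the pooled effect each time.
    stats = sorted(
        pooled_effect(rng.choices(effects, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * n_resamples)]
    upper = stats[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical distance differences (%), not the extracted study data.
effects = [-45.0, -38.5, -30.0, -22.0, -15.5, -10.0, -4.0, 3.5]
low, high = bootstrap_ci(effects)
print(pooled_effect(effects), (low, high))
```

Fixing the seed keeps the interval reproducible between runs, which matters when the number of samples is small.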
Sensitivity analyses to assess the robustness of the synthesized results were not performed. This was primarily because the majority of the included studies did not report the standard errors, confidence intervals, or other measures of variance necessary to conduct such analyses. This decision was further supported by the low statistical heterogeneity observed in the overall analysis, which indicated a high degree of consistency across the study results.
2.7. Statistical Analysis
All statistical analyses were performed using Python (version 3.12.7) [39] with the aid of the SciPy library (version 1.13.1) [40] for the core statistical functions and scikit-posthocs (version 0.11.4) [41] for post hoc analyses. To confirm the distributional properties of the distance difference data collected, the Shapiro-Wilk normality test was performed [42]. Visual inspection of histograms and Quantile-Quantile (Q-Q) plots also complemented this assessment. Given the observed distributional characteristics, non-parametric statistical tests were employed for group comparisons.
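To illustrate what a Q-Q plot compares, the sketch below pairs each ordered observation with the quantile expected under a normal distribution fitted to the sample. The plotting positions (i + 0.5) / n are one common convention and are an assumption here, as are the function name and the example values.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(values)
    ordered = sorted(values)
    # Reference normal distribution with the sample mean and standard deviation.
    reference = NormalDist(mu=fmean(values), sigma=stdev(values))
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical distance differences (%).
sample = [-50.0, -42.0, -30.0, -12.0, -5.0, 4.0]
for expected, observed in qq_points(sample):
    print(f"{expected:8.2f} {observed:8.2f}")
```

Points that stay close to the diagonal suggest approximate normality, while systematic curvature suggests a departure from it.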
2.7.1. Two-Group Comparisons
For comparing the distance difference percentage between two independent groups (e.g., defined by implementation contexts, such as the type of dataset used), the Mann-Whitney U test was applied [43]. This test assesses whether two independent samples are drawn from the same distribution or if one distribution is stochastically larger than the other.
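The U statistic itself is a simple pairwise count, as the sketch below shows. It computes only the statistic; in practice the p-value would come from a library routine such as `scipy.stats.mannwhitneyu`. The function name and the two example groups are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): U_x counts pairs where a value of x exceeds a value of y.

    Ties contribute half a count to each group.
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical distance differences (%) for two dataset types.
simulated = [-45.0, -40.0, -38.0]
real_world = [-20.0, -12.0, -5.0, 3.0]
print(mann_whitney_u(simulated, real_world))  # (0.0, 12.0)
```

In this toy example every simulated value lies below every real-world value, so one U statistic reaches its minimum of zero, the most extreme separation possible.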
2.7.2. Multiple-Group Comparisons
When comparisons involved more than two independent groups, the Kruskal-Wallis H test was used [44]. This test determines whether there are statistically significant differences between the medians of three or more independent groups.
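The H statistic is computed from the rank sums of the groups after pooling all observations. The sketch below omits the tie-correction factor that library implementations such as `scipy.stats.kruskal` apply, so it matches them only when there are no tied values. The function names and example groups are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic without the tie-correction factor."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Hypothetical, fully separated groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(kruskal_wallis_h(groups))
```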
2.7.3. Post-Hoc Analysis
Following a significant result from the Kruskal-Wallis H test, pairwise post hoc comparisons were performed using the Dunn test [45]. To mitigate the increased risk of Type I errors (false positives) associated with multiple comparisons, a Bonferroni correction was applied to the p-values.
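The Bonferroni correction multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below applies it to the pairwise comparisons among four VRP categories; the raw p-values are hypothetical, and the function name is illustrative. In scikit-posthocs the same adjustment is requested through the `p_adjust` argument of `posthoc_dunn`.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


categories = [
    "Capacity Only",
    "Capacity with Time Windows",
    "Inventory Routing",
    "Multi-trip/Intermediate",
]
pairs = list(combinations(categories, 2))  # 6 pairwise comparisons

# Hypothetical raw p-values, one per pair.
raw = [0.004, 0.03, 0.2, 0.5, 0.6, 0.9]
for (a, b), p_adj in zip(pairs, bonferroni(raw)):
    print(f"{a} vs {b}: {p_adj:.3f}")
```

With four groups there are six pairs, so a raw p-value must fall below 0.05 / 6 to remain significant after correction.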
2.8. Reporting Bias and Certainty of Evidence Assessment
To investigate potential publication bias, a funnel plot was generated. The plot visualizes the relationship between the effect sizes (distance difference %) and their precision. As standard errors were not reported in the primary literature, the inverse of the square root of the sample size ($1/\sqrt{n}$) was used as a proxy for study precision. Asymmetry in the plot was then assessed visually to detect potential publication bias or small-study effects.
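The coordinates of such a funnel plot can be derived as sketched below, assuming the sample size of a study is its number of bins. The study values and the function name are hypothetical.

```python
import math


def precision_proxy(sample_size):
    """Inverse square root of the sample size, used in place of a standard error."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return 1.0 / math.sqrt(sample_size)


# Hypothetical (distance difference %, number of bins) pairs.
studies = [(-40.0, 25), (-12.0, 400), (-30.0, 100)]
points = [(effect, precision_proxy(n)) for effect, n in studies]
print(points)
```

Values of the proxy closer to zero correspond to larger studies, so such studies are expected to cluster near the combined effect if no small-study effect is present.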
A formal certainty assessment of the cumulative evidence (e.g., using the GRADE framework) was not conducted. However, the overall strengths and limitations of the body of evidence, including the consistency of findings and the discrepancies between real-world and simulated results, are critically appraised in
Section 4.
3. Results
The study selection process is summarized in
Figure 1. The initial database search yielded 101 articles. After 15 duplicates were removed, 86 articles were screened, leading to 48 articles for full-text review. Of the 48 full-text articles assessed for eligibility, 37 were excluded for the following reasons:
Did not meet the minimum quality assessment score (n = 3).
Did not report a route distance metric (n = 21).
Did not compare the results with a baseline scenario (n = 9).
Did not compare the system with a non-IoT scenario (n = 3).
Did not report a comparable sample size (e.g., number of bins) (n = 1).
This resulted in a final set of 11 studies, which were included in the final quantitative synthesis.
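The counts in the selection process can be cross-checked with simple arithmetic. The snippet below uses only the figures reported above; the variable names are illustrative.

```python
identified = 101                      # records from the database search
duplicates = 15
screened = identified - duplicates    # titles and abstracts screened

full_text = 48                        # articles assessed for eligibility
exclusions = {
    "below minimum quality score": 3,
    "no route distance metric": 21,
    "no baseline comparison": 9,
    "no non-IoT comparison": 3,
    "no comparable sample size": 1,
}
excluded = sum(exclusions.values())
included = full_text - excluded

print(screened, excluded, included)  # 86 37 11
```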
Figure 1. PRISMA flow diagram of the study selection process.
3.1. General Characteristics of Included Studies
A total of 11 studies [5,6,7,8,9,10,12,46,47,48,49] were selected for the meta-analysis. Several of these articles used more than one approach to solve the VRP or evaluated different scenarios, resulting in a total of 21 distinct samples for the quantitative synthesis. For instance, studies such as [6,8] model multiple scenarios, which are briefly described in Table 3.
To provide a clear overview of the evidence base, the key characteristics of each of the 11 included studies are presented in
Table 4. The studies are detailed according to their primary vehicle routing algorithm, the type of dataset used, their VRP classification, and the scale of the study. A high-level summary of these characteristics is provided in
Table 5.
The quality assessment of the 11 included studies revealed that all met the predefined quality criteria. Scores ranged from 4.0 to 7.0, indicating an acceptable level of methodological rigor across the evidence base.
The October 2025 update of the search returned 4 additional articles. As these articles used IoT-enabled routing optimization, they were screened against the pre-established criteria to evaluate whether they should be added to the meta-analysis. All of these articles were excluded under criterion CE-06 of Table 2.
Of these studies, Kuraganti et al. [50] presented a CVRP framework for waste management in Varanasi, India, and reported a distance reduction of 37.5%, but did not report the sample size of the experiment, which resulted in its exclusion from the meta-analysis. In addition, Livyashree et al. [51] proposed a Backpropagation Neural Network (BNN) for route scheduling and reported only the optimized route distances; for this reason, it was removed from the meta-analysis. Finally, the studies [52,53], although related to the topic, did not report the distance reduction metric and were therefore removed from the meta-analysis.
3.2. Exploratory Analysis of Inter-Category Relationships
This section provides an exploratory visual analysis using heatmaps to illustrate the distribution and interaction of studies in different classification categories. Each heatmap presents the cross-tabulation of two categorical variables, with the intensity of the color in each cell indicating the count of samples within that specific combination. This approach helps to identify the patterns, concentrations, and absences of studies in various methodological and contextual dimensions.
Figure 2 presents the distribution of samples across different categories of routing algorithms and dataset types. The largest concentration of samples is observed in “Heuristic-based” algorithms using real-world datasets (count = 8 samples). These samples came from 4 unique studies. This highlights the prevalence of heuristic approaches in practical applications. “Mathematical Programming” algorithms also show a strong presence in real-world contexts, with 5 samples from 2 unique studies. In contrast, for simulated datasets, “Heuristic-based” algorithms contribute 2 samples from 2 unique studies, while “Graph-based/Local Search” contributes 1 sample from 1 unique study. Interestingly, the “Not Specified” category, representing a lack of algorithmic detail, shows 1 sample in real-world data but 4 samples in simulated data, indicating that simulation-based studies might sometimes lack specific algorithmic reporting. This distribution suggests a prevalent application of heuristics in real-world studies, potentially due to their practical adaptability, which contrasts with a less consistent reporting of algorithmic specifics in certain simulation contexts.
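The cross-tabulation behind this heatmap can be reproduced from the counts reported above. The sketch assumes that the cells named in the text are the only non-empty ones; the variable names are illustrative.

```python
from collections import Counter

# (routing algorithm category, dataset type) -> number of samples.
cells = {
    ("Heuristic-based", "Real-world"): 8,
    ("Mathematical Programming", "Real-world"): 5,
    ("Not Specified", "Real-world"): 1,
    ("Heuristic-based", "Simulated"): 2,
    ("Graph-based/Local Search", "Simulated"): 1,
    ("Not Specified", "Simulated"): 4,
}

by_dataset = Counter()
by_algorithm = Counter()
for (algorithm, dataset), count in cells.items():
    by_dataset[dataset] += count
    by_algorithm[algorithm] += count

print(dict(by_dataset))            # marginal totals per dataset type
print(dict(by_algorithm))          # marginal totals per algorithm category
print(sum(cells.values()))         # total number of samples
```

The marginal totals give 14 real-world and 7 simulated samples, which add up to the 21 samples used in the quantitative synthesis.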
The heatmap in Figure 3 reveals the distribution of the samples according to their VRP category and the type of dataset used. A prominent observation is the high concentration of samples employing the “Capacity Only” VRP in simulated environments (count = 6 samples). These samples were derived from 3 unique studies. This suggests a foundational focus on the basic capacitated problem within controlled simulation settings. In contrast, the VRP categories “Inventory Routing” (count = 4 samples) and “Multi-trip/Intermediate” (count = 4 samples) appear exclusively in real-world datasets within this sample. For “Inventory Routing”, these 4 samples originate from 2 unique studies. For “Multi-trip/Intermediate”, the 4 samples originate from 1 unique study. This indicates that these other VRP variants are primarily explored in practical application contexts. The “Capacity with Time Windows” category also shows a higher representation in real-world studies (count = 2 samples) compared to simulations (count = 1 sample). The “Profit-oriented” VRP is represented by a single sample in the real-world dataset. In particular, several categories, such as “Inventory Routing”, “Multi-trip/Intermediate”, and “Profit-oriented”, are absent from the simulated dataset in this sample. This distribution suggests a tendency for simulated studies to focus on more fundamental VRP structures, while real-world applications delve into more complex and integrated problem formulations, potentially reflecting the practical demands of real-world operational environments.
Figure 4, which explores the intersection of routing algorithm categories and VRP classifications, shows a notable combination of “Heuristic-based” algorithms with “Capacity with Time Windows” (count = 2 samples). These samples were derived from two unique studies that highlight the direct application of heuristics to problems incorporating temporal constraints. Additionally, “Heuristic-based” algorithms are frequently applied to “Capacity Only” (count = 4 samples), originating from 3 unique studies. A significant group also exists for “Heuristic-based” algorithms combined with “Multi-trip/Intermediate” (count = 4 samples), all derived from 1 unique study. Furthermore, “Mathematical Programming” algorithms are most associated with “Inventory Routing” (count = 4 samples) from 2 unique studies. The “Not Specified” routing algorithm, representing unstated approaches, is related to “Capacity Only” (count = 4 samples) from one unique study and to “Inventory Routing” from one unique study. In addition, “Graph-based/Local Search” methods are observed with “Capacity Only” in one unique study. These patterns illustrate the diverse pairings of algorithmic strategies with different VRP complexities.
3.3. Normality Assessment of Distance Difference Data
The normality of the 21 unique “distance difference percentage” data points was formally tested using the Shapiro-Wilk test. The results yielded a W statistic of 0.8928 and a p-value of 0.0254. With a predefined significance level ($\alpha$) of 0.05, the obtained p-value was less than $\alpha$. Consequently, the null hypothesis of normality was rejected, indicating that the “distance difference percentage” data are not normally distributed.
This conclusion was further supported by a visual inspection of the data distribution.
Figure 5a shows the histogram of the “distance difference percentage” data, with a Kernel Density Estimate (KDE) overlay. The histogram visually suggests a departure from a symmetrical bell shape, with a concentration of data points in the first and last two bins. Furthermore, the Q-Q plot, presented in
Figure 5b, shows a clear deviation of the data points from the theoretical normal distribution line, providing further graphical evidence against the assumption of normality.
The findings of this normality assessment supported the selection of non-parametric statistical tests for group comparisons, as detailed in
Section 2.7.
3.4. Overall Reported Impact
This section addresses Research Question 1 (RQ1): “What is the reported impact of IoT-enabled smart waste management systems with vehicle routing optimization techniques on the distance traveled compared to non-IoT scenarios?”.
To quantitatively answer this question, the following hypotheses were formulated:
Null Hypothesis ($H_0$): There is no impact of smart waste management systems with vehicle routing optimization techniques and IoT technologies on the distance traveled; that is, the combined mean percentage difference in the distance traveled is zero or greater than zero ($\mu \geq 0$).
Alternative Hypothesis ($H_1$): Smart waste management systems with vehicle routing optimization techniques and IoT technologies lead to a statistically significant reduction in distance traveled; that is, the combined mean percentage difference in distance traveled is less than zero ($\mu < 0$).
The evaluation of heterogeneity between the 21 extracted “distance difference (%)” data points was conducted as a preliminary step. The analysis yielded a Cochran’s Q statistic of 20.0 with 20 degrees of freedom, resulting in a Higgins’ $I^2$ value of 0%, indicating an absence of detectable heterogeneity. These results suggest that the variability observed between the reported effect sizes is entirely attributable to sampling error rather than to true heterogeneity.
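The step from Cochran’s Q to the I-squared value follows the standard definition, in which I-squared is the share of Q in excess of its degrees of freedom, truncated at zero. The sketch below applies it to the reported values; the function name is illustrative.

```python
def i_squared(q, df):
    """Higgins' I-squared in percent: max(0, (Q - df) / Q) * 100."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Reported values: Q = 20.0 with 20 degrees of freedom (21 samples).
print(i_squared(20.0, 20))  # 0.0
```

Because Q equals its degrees of freedom here, the excess is zero and the statistic evaluates to 0%.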
Given the negligible heterogeneity, quantitative synthesis was performed using a fixed-effects model, which estimated a combined distance difference of −21.51% when implementing IoT-enabled smart waste management systems with vehicle routing optimization. In addition, the bootstrap 95% confidence interval ranges from −30.66% to −13.00%. As the interval does not include zero, the null hypothesis was rejected, providing strong evidence for a reduction in the distance traveled by IoT-enabled SWM systems compared to non-IoT systems.
Figure 6 visually summarizes the effects reported in the included studies. Each point represents the reported distance difference from a single study. The dashed vertical line indicates the calculated combined effect, while the shaded area represents its 95% confidence interval. Most studies show negative values for the “distance difference (%)”, indicating a reduction in the distance traveled when IoT-based systems are adopted.
However, it should be noted that some studies reported positive values, indicating an increase in the distance traveled. These instances typically occurred in scenarios where the optimization strategy prioritized other key performance indicators (KPIs), such as the amount of waste collected in the IRP approach of [6] and the average vehicle utilization in the Outskirts scenario of [10].
3.5. Influence of Dataset Type on Distance Reduction
This section investigates Research Question 2 (RQ2): “Does the type of dataset influence performance?”
To evaluate whether the type of dataset influences the reported distance reduction achieved by IoT-enabled smart waste management systems, studies were classified into two groups: those utilizing real-world operational data (collected from companies or municipalities) and those based on simulated datasets. For this purpose, the following hypotheses were formulated:
Null Hypothesis ($H_0$): There are no significant differences in the percentage of distance reduction between studies using real-world data and those using simulated datasets.
Alternative Hypothesis ($H_1$): There is a significant difference in the percentage of distance reduction between studies using real-world data and those using simulated datasets.
The descriptive statistics presented in
Table 6 indicate a marked difference between the two groups. Studies based on simulated datasets reported a greater reduction in mean and median distances compared to studies that used real-world data. To visually reinforce this finding,
Figure 7 shows the box-plot of the distribution of the two groups of datasets.
To statistically compare the two independent groups, a Mann-Whitney U test was conducted, resulting in a U statistic of 13.0 and a p-value of 0.0056. Since the p-value is less than the predefined significance level of 0.05, the null hypothesis was rejected. This indicates a statistically significant difference in the median of the “distance difference (%)” between studies using real-world data and those using simulation data. This discrepancy may reflect the controlled and idealized conditions inherent in simulations, which often omit practical constraints such as traffic variability, imperfect IoT data reliability, and operational uncertainties encountered in real-world deployments.
To further emphasize this distinction, a separate forest plot was generated for each type of dataset.
Figure 8 presents the forest plot for only the studies that use real-world datasets. This graph indicates a combined effect of −12.37% (95% Bootstrap-CI: [−24.32%, −5.27%]), with individual studies showing a range of impacts, some even reporting an increase in the distance traveled, due to the reasons discussed earlier.
In contrast, Figure 9 illustrates the forest plot for only the studies based on simulated datasets. In that case, the combined effect is considerably larger in magnitude, at −39.79% (95% Bootstrap-CI: [−47.99%, −26.30%]). Because these studies have controlled experimental settings and can disregard some aspects of real-world environments, they are expected to report larger distance reductions.
The comparison between the subgroup of studies that used real-world datasets (see Figure 8) and the subgroup that used simulated datasets (see Figure 9) supports the statistical finding of a significant difference between them. This difference highlights the gap between controlled simulations and real-world applications, and we discuss this finding in Section 4.
3.6. Influence of Vehicle Routing Algorithms in Distance Reduction
This section addresses Research Question 3 (RQ3): “What types of vehicle routing algorithms are most commonly used in SWM and how are they evaluated?”
To investigate whether the category of the vehicle routing algorithm influences the reported distance reduction in IoT-enabled smart waste management systems, the studies were classified into four groups according to their routing approach: Heuristic-based, Mathematical Programming, Graph-based/Local Search, and Not Specified. Due to the insufficient number of observations in the graph-based/local search group (n = 1) and the lack of methodological clarity in the not specified group (n = 5), these were excluded from the hypothesis testing to ensure the reliability of the analysis. For this purpose, the following hypotheses were formulated:
Null Hypothesis ($H_0$): There is no significant difference in the percentage of distance reduction between studies employing heuristic-based algorithms and those using mathematical programming approaches.
Alternative Hypothesis ($H_1$): There is a significant difference in the percentage of distance reduction between studies employing heuristic-based algorithms and those using mathematical programming approaches.
Before delving into the comparison between the “heuristic-based” and “mathematical programming” approaches, it is worth examining the descriptive statistics for the “Not Specified” and “Graph-based/Local search” categories presented in
Table 7.
For the “Not Specified” category, which contains 2 studies with 5 samples, the average distance reduction is considerably larger, with a mean of −47.80% and a median of −50.00%. The relatively low standard deviation of 5.83 suggests that the results within this group are more consistent. However, it is important to note that four of these five samples were tested on simulated datasets, which, according to the previous analysis, showed statistically significant differences in distance reduction compared to real-world datasets.
Moreover, the category “Graph-based/Local search” is represented by a single study tested on a simulated dataset, which achieved a distance reduction of −30.28%. This value naturally serves as both its mean and median. As expected, with only one data point, the dispersion cannot be calculated.
Furthermore, the descriptive statistics presented in
Table 7 also suggest that studies using heuristic-based approaches report a higher reduction in mean and median distances than those based on mathematical programming. However, the presented standard deviation shows more dispersed data.
To statistically investigate whether the difference is significant, the Mann-Whitney U test was applied. The test yielded a p-value of 0.3097. Since the p-value is greater than the predefined significance level of 0.05, the null hypothesis was not rejected. This result indicates that the observed difference is not statistically significant.
Influence of Vehicle Routing Algorithms Only in Real-World Datasets
To further investigate the impact of the routing algorithm under practical conditions, the same analysis was performed using only the subset of studies based on real-world datasets, given the significant differences found between simulation and real-world data. The hypotheses remained the same, but now consider only samples that were tested in real-world datasets.
The descriptive statistics presented in
Table 8, as in the previous case, show a reduction in the mean and median distance of a greater magnitude for the heuristic-based category, but with considerably greater variability. To visually illustrate these findings,
Figure 10 presents the box-plot of the distribution of the two categories.
The Mann-Whitney U test yielded a U statistic of 25.0 and a p-value of 0.5237. Since the p-value is greater than the predefined significance level of 0.05, the result confirms the previous observation, and the null hypothesis was not rejected. This indicates that, using real-world datasets, no statistically significant differences were found between heuristic-based and mathematical programming approaches with respect to distance reduction.
These results suggest that, from a practical point of view, both categories are capable of achieving meaningful distance reductions in IoT-enabled waste collection systems, with no evidence favoring one method over the other in real-world contexts.
3.7. Influence of VRP Classification on Distance Reduction
This section analyzes Research Question 4 (RQ4): “Does the classification of the vehicle routing problem significantly influence the reduction of distance traveled in IoT-based SWM systems?”
To investigate whether the type of VRP influences the magnitude of distance reduction, the comparison considered the following categories: Capacity Only, Capacity with Time Windows, Inventory Routing, Multi-trip/Intermediate, and Profit-oriented. Because the Profit-oriented category contained only one observation, it was excluded from the statistical analysis. To formally assess whether distance reduction differs significantly across the VRP categories, the following hypotheses were tested:
Null Hypothesis (H0): There are no significant differences in the percentage of distance reduction between different VRP classification categories.
Alternative Hypothesis (H1): There is a significant difference in the percentage of distance reduction between the categories, which means that at least one VRP category differs from the others.
The descriptive statistics presented in
Table 9 suggest considerable variation between categories. The Capacity Only problems presented the highest mean and median reduction in distance. Similarly, Capacity with Time Windows showed a relevant reduction. On the other hand, Inventory Routing and Multi-trip/Intermediate categories exhibited notably lower reductions. In addition, the highest spread was encountered in the capacity only category. To reinforce these findings,
Figure 11 presents the box-plot for the categories.
To test these hypotheses, the nonparametric Kruskal-Wallis H test was applied, resulting in an H statistic of 8.25 and a p-value of 0.0412, leading to the rejection of the null hypothesis at the 5% significance level. This indicates that there is a statistically significant difference in distance reduction between at least some of the VRP categories.
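For readers unfamiliar with the test, the H statistic can be computed from pooled ranks as sketched below. The helper names `average_ranks` and `kruskal_wallis_h` are hypothetical, the example data are invented for illustration, and no tie correction is applied. Under the null hypothesis, H is compared against a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (assumption: no tie correction)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical distance reductions (%) for four categories, NOT Table 9 data.
example_groups = [
    [-55.0, -40.2, -12.0, -30.5, -8.1],
    [-25.0, -28.3, -22.7],
    [-9.5, -11.2, -7.8],
    [-6.0, -10.1, -13.4],
]
print(f"H = {kruskal_wallis_h(example_groups):.3f}")
```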
Following this global test, pairwise post-hoc comparisons were performed using Dunn’s test with Bonferroni correction to identify specific differences between groups. The results of Dunn’s test are presented in
Table 10.
Despite the significant overall result of the Kruskal-Wallis H test, pairwise comparisons revealed no statistically significant differences between individual categories after correction. This outcome may be attributed to the conservative nature of the Bonferroni correction and/or the relatively small number of studies in some VRP categories, which can reduce the statistical power to detect specific pairwise differences. Therefore, further research with more primary studies is recommended to confirm the patterns indicated by the Kruskal-Wallis H test and strengthen the conclusions regarding pairwise differences.
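The effect of the correction can be made concrete with a short sketch. With four categories there are six pairwise comparisons, so the Bonferroni correction multiplies each raw p-value by six (capped at 1); a raw p-value of 0.02, for example, becomes 0.12. The function `dunn_bonferroni` and the data below are hypothetical, and the z statistic omits the tie correction, so this is an illustration of the technique rather than a reproduction of Table 10.

```python
import math
from itertools import combinations


def dunn_bonferroni(groups):
    """Dunn's pairwise test with Bonferroni-adjusted two-sided p-values.

    groups: dict mapping a category name to a list of observations.
    Assumption: no tie correction in the variance term.
    """
    names = list(groups)
    pooled = sorted(v for g in groups.values() for v in g)
    n = len(pooled)
    # Average rank of each distinct value in the pooled sample.
    rank_of = {}
    for v in set(pooled):
        first = pooled.index(v) + 1
        count = pooled.count(v)
        rank_of[v] = first + (count - 1) / 2
    mean_rank = {
        name: sum(rank_of[v] for v in g) / len(g) for name, g in groups.items()
    }
    n_comparisons = len(names) * (len(names) - 1) // 2
    adjusted = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(
            n * (n + 1) / 12 * (1 / len(groups[a]) + 1 / len(groups[b]))
        )
        z = (mean_rank[a] - mean_rank[b]) / se
        raw_p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, raw_p * n_comparisons)
    return adjusted


# Hypothetical distance reductions (%), NOT the data behind Table 10.
example = {
    "Capacity Only": [-55.0, -40.2, -12.0, -30.5, -8.1],
    "Capacity with Time Windows": [-25.0, -28.3, -22.7],
    "Inventory Routing": [-9.5, -11.2, -7.8],
    "Multi-trip/Intermediate": [-6.0, -10.1, -13.4],
}
for pair, p_adj in dunn_bonferroni(example).items():
    print(pair, round(p_adj, 4))
```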
Influence of VRP Classification on Distance Reduction Only in Real-World Datasets
This subsection refines the analysis of the influence of VRP classification (addressed in RQ4) by specifically examining its impact within the subset of studies utilizing real-world data, given the significant differences found between simulation and real-world data types. The hypotheses remained the same, but only with samples that were tested in real-world datasets.
The descriptive statistics in
Table 11 and
Figure 12 highlight that the results for the capacity only and capacity with time windows categories changed relative to the full-sample analysis. The capacity only category shows a notably high standard deviation, and its median distance reduction in real-world datasets was lower than that of the inventory routing and multi-trip/intermediate categories. In addition, the capacity with time windows category shows the highest mean and median reduction in distance and the lowest standard deviation.
To statistically investigate the difference, a Kruskal-Wallis H test was performed, which produced an H statistic of 3.52 and a p-value of 0.3186. Since the p-value is greater than the predefined significance level of 0.05, the null hypothesis was not rejected. This result indicates that there are no statistically significant differences in distance reduction between the VRP categories.
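As a consistency check, the two reported p-values can be approximately recovered from the H statistics. The sketch below assumes that the p-values come from the usual chi-square approximation with k − 1 = 3 degrees of freedom (four categories), for which the survival function has a closed form. The function name `chi2_sf_df3` is hypothetical. Small discrepancies are expected because the H statistics are reported to two decimal places.

```python
import math


def chi2_sf_df3(x):
    """Survival function P(X > x) of a chi-square variable with 3 degrees
    of freedom, using the closed form available for this case."""
    return (
        math.erfc(math.sqrt(x / 2))
        + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    )


# H statistics as reported in the text (rounded to two decimals).
p_all_datasets = chi2_sf_df3(8.25)   # reported p-value: 0.0412
p_real_world = chi2_sf_df3(3.52)     # reported p-value: 0.3186

print(f"all datasets: p = {p_all_datasets:.4f}")
print(f"real-world only: p = {p_real_world:.4f}")
```

Both computed values agree with the reported ones to roughly three decimal places, which supports the assumption of three degrees of freedom in both analyses.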
To further examine potential differences between specific pairs of categories, a post-hoc Dunn test with a Bonferroni correction was applied. The results, summarized in
Table 12, revealed that none of the pairwise comparisons reached statistical significance, with all
p-values equal to or greater than 0.529, supporting the Kruskal-Wallis result.
3.8. Assessment of Reporting Biases
A visual inspection of the funnel plot (
Figure 13) reveals moderate asymmetry in the distribution of study effects around the pooled mean (−21.51%). Specifically, there is a left-skewed pattern, with a concentration of smaller studies reporting larger distance reductions. This pattern indicates the presence of a “small-study effect,” which, in this context, appears to reflect a methodological difference rather than a classical publication bias.
This interpretation is strongly supported by the findings from the subgroup analysis (
Section 3.5), which confirmed that simulated studies reported significantly larger distance savings than real-world implementations. In particular, simulation-based studies, which typically involve fewer samples and controlled experimental conditions, tend to yield higher percentage reductions in distance traveled. Conversely, real-world studies, which generally encompass larger operational scales and higher contextual variability, cluster more closely around the overall mean effect.
Therefore, while the funnel plot does not suggest substantial evidence of selective publication (i.e., missing null or unfavorable results), it highlights an important systematic difference in study design and context. Such methodological heterogeneity should be considered when interpreting the pooled effect size, as smaller studies may overestimate practical distance reductions compared with large-scale, real-world deployments.
4. Discussion
4.1. Practical Implications
The findings of this meta-analysis present significant practical implications for urban waste management, offering a strong case for investing in IoT-enabled Smart Waste Management (SWM) systems. The demonstrated average reduction of 21.51% in the distance traveled by these systems directly translates into substantial operational benefits for municipalities and waste collection companies. These benefits include reduced fuel consumption, lower CO2 emissions, reduced vehicle wear and tear, and optimized personnel hours, all of which contribute to considerable cost savings and the achievement of environmental sustainability targets, thus fostering greener urban environments.
A critical takeaway for practitioners is the observed discrepancy between the performance reported in studies using simulated datasets and those using real-world data: simulated studies reported a greater mean distance reduction (−39.79%) than real-world applications (−12.37%). This difference likely stems from the controlled and idealized conditions of the simulations, which often do not account for practical constraints such as traffic variability, imperfect IoT data reliability, and other operational uncertainties faced in actual deployments. Therefore, practitioners should consider these inherent complexities when estimating potential gains and planning implementation strategies, so as to set realistic expectations.
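To illustrate how much the choice of reference value matters when setting expectations, the short sketch below applies the two mean reductions to a baseline distance. The baseline of 1000 km per week and the function name `expected_distance` are hypothetical, chosen only for illustration. The percentages are the group means reported above, not a prediction for any specific deployment.

```python
# Mean distance reductions (%) reported in this meta-analysis.
MEAN_REDUCTION_PCT = {"simulated": 39.79, "real_world": 12.37}


def expected_distance(baseline_km, reduction_pct):
    """Distance remaining after applying a percentage reduction."""
    return baseline_km * (1 - reduction_pct / 100)


baseline_km = 1000.0  # hypothetical weekly distance of a collection fleet

for dataset_type, pct in MEAN_REDUCTION_PCT.items():
    remaining = expected_distance(baseline_km, pct)
    saved = baseline_km - remaining
    print(f"{dataset_type}: {remaining:.1f} km remaining, {saved:.1f} km saved")
```

Planning against the simulated mean would anticipate more than three times the savings implied by the real-world mean, which is why the real-world figure is the more cautious planning basis.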
Furthermore, from a practical standpoint on technology choices within real-world settings, the analysis indicated that there were no statistically significant differences in distance reduction between heuristic-based and mathematical programming algorithms. This suggests that both approaches can achieve meaningful efficiencies while offering flexibility in system design based on specific operational needs or available expertise.
From a broader perspective, the adoption of IoT-enabled SWM systems also aligns with international sustainability goals, particularly the United Nations Sustainable Development Goal 11 (Sustainable Cities and Communities), which promotes the development of inclusive, safe, resilient, and sustainable urban environments. By reducing the distance traveled in waste collection operations, these systems contribute to lowering greenhouse gas emissions, improving fuel efficiency, and optimizing resource use, which are key metrics in the Environmental, Social, and Governance (ESG) frameworks. Thus, beyond their technical effectiveness, these solutions support smarter urban planning and more sustainable public service strategies.
4.2. Limitations and Future Research Directions
Despite the valuable insights provided by this meta-analysis, it is important to acknowledge certain limitations that may influence the generalization and interpretation of our findings. These limitations also serve to delineate avenues for future research.
First, the number of primary studies available in the literature remains limited, particularly those reporting quantitative results with sufficient detail to be included in statistical analyses. In addition, most of the primary studies did not report individual variances or standard errors associated with these quantitative results.
Second, while the analysis of dataset types revealed statistically significant differences between simulated and real-world scenarios, the sample size imbalance between these groups may affect the generalization of the findings. Similarly, for the analysis involving routing algorithms and VRP classification, some categories were underrepresented, limiting the robustness of statistical comparisons.
In addition, the scope of this review was limited to studies that explicitly reported the quantitative distance traveled. This criterion might have excluded relevant research that discusses efficiency gains in qualitative terms or focuses on other KPIs without a direct mention of distance reduction. Also, a detailed classification of the studies by the precise type of volumetric sensor used for the bin fill level was not feasible due to the lack of consistent and clear descriptions in the original articles.
Furthermore, the generalization of our findings is limited by the overall number of samples included (n = 21). Although sufficient for exploratory statistical analyses and identifying broad trends, a larger and more diverse body of literature with standardized reporting metrics would strengthen the robustness of the combined effect size and allow for more sophisticated meta-analytical techniques.
It is also important to note that the funnel plot presented in this study was constructed using an approximate measure of precision, defined as the inverse square root of the study sample size. This approach was necessary because most primary studies did not report standard deviations, standard errors, or confidence intervals for their effect estimates. Consequently, the resulting funnel plot should be interpreted as an exploratory visualization rather than a formal test of publication bias. While the proxy precision allows for a reasonable qualitative assessment of asymmetry, it does not fully capture the statistical uncertainty of each study’s effect size.
Based on the insights and limitations identified in this meta-analysis, several directions for future research are suggested.
Future studies in SWM should consistently report not only mean effect sizes but also their associated measures of variability. This standardization would significantly enhance the potential for more rigorous meta-analyses and comparative studies.
Future studies should investigate how contextual factors influence the performance of routing solutions. These factors include the types and accuracy of the IoT sensors used, the scale of deployment, and the characteristics of waste generation.
Future research should focus on developing and validating simulation models that more closely replicate real-world operational complexities.
A complementary qualitative synthesis could explore the qualitative challenges and facilitators of implementing an IoT-enabled SWM system.
5. Conclusions
This meta-analysis synthesized and quantitatively evaluated the reported impact of IoT-enabled SWM systems with vehicle routing optimization techniques on the distance traveled. Our findings provide valuable information on the current state and effectiveness of these innovative solutions in urban logistics. In addition, a statistically significant combined reduction of 21.51% in the distance traveled was found.
Furthermore, the analysis revealed that solutions based on simulated datasets tend to report higher distance reductions compared to those based on real-world datasets, indicating potential discrepancies between theoretical models and practical applications. Moreover, although descriptive differences were observed between the heuristic-based and mathematical programming approaches, the statistical analysis did not find significant differences in their reported distance reductions. A similar result was observed in the comparison among different VRP classifications, where only marginal differences emerged, often constrained by the limited availability of data in some categories.
Although the number of included studies is relatively limited, the consistent statistical findings support the robustness of the observed trends and provide a valuable foundation for future research.
In conclusion, this study not only quantifies the benefits of IoT-based SWM systems but also paves the way for more informed and data-driven decisions in the design of next-generation smart urban infrastructures.