Review
Peer-Review Record

Machine Learning in Reverse Logistics: A Systematic Literature Review

Algorithms 2025, 18(10), 650; https://doi.org/10.3390/a18100650
by Abner Fernandes Souza da Silva 1, Virginia Aparecida da Silva Moris 1, João Eduardo Azevedo Ramos da Silva 1, Murilo Aparecido Voltarelli 2 and Tiago F. A. C. Sigahi 1,3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 5 September 2025 / Revised: 8 October 2025 / Accepted: 10 October 2025 / Published: 16 October 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper systematically reviews the application of machine learning in reverse logistics, focusing on its development trends, challenges, and research opportunities. Following the PRISMA guidelines, the study analyzed 52 articles retrieved from the Scopus and Web of Science databases. The paper identifies four major research gaps: lack of metadata standardization, absence of public benchmarks, insufficient model interpretability, and challenges in integrating machine learning with simulation technologies and digital twins. However, in terms of academic rigor and depth of contribution, this paper has several shortcomings, and additions and improvements are recommended.

  1. The paper fails to mention any quality assessment of the 52 original research articles ultimately included. A “Quality Assessment” section should be added, scoring articles based on research design, data transparency, model validation, and result reporting.
  2. The authors used only Scopus and Web of Science databases and did not retrieve important conference papers in computer science (e.g., NeurIPS, ICML, KDD). Expanding the search scope is recommended.
  3. The results section (Section 3.2) primarily describes and categorizes literature content without in-depth discussion. A comprehensive analysis should be added to the discussion section.
  4. Section 3.2.4 lists challenges (data scarcity, uncertainty, computational cost), while Section 3.2.5 outlines future directions (metadata standards, open benchmarks). However, the connection between these two sections is weak. It fails to clearly indicate which studies have addressed these challenges, which are attempting to solve them, and how effective their latest solutions are.
  5. In the conclusion, the proposed future directions (e.g., developing multidimensional metrics, creating public datasets, integrating digital twins) are valid but somewhat trite. A more targeted and refined research agenda should be presented, leveraging the unique findings of this review to attract researchers.
  6. Figure 6 (Distribution by model. Source: Authors (2025)) highlights an excessively high proportion of “unspecified” models. This issue requires discussion within the text to emphasize its negative implications.

Author Response

Dear Editor and Reviewers,

We would like to sincerely thank you and the reviewers for the constructive feedback and insightful comments that have significantly improved the quality of our manuscript.
We have attached a document where we provide a detailed, point-by-point response explaining how we have addressed each of the reviewers’ comments/suggestions. All improvements are highlighted in the revised version of the manuscript.

Reviewer #1

Comment

Actions

1.     The paper fails to mention any quality assessment of the 52 original research articles ultimately included. A “Quality Assessment” section should be added, scoring articles based on research design, data transparency, model validation, and result reporting.

 

 

A new subsection entitled “2.2. Quality Assessment” has been added. It introduces four criteria (research design, data transparency, model validation, and reporting completeness). Each study was rated on a 3-point scale (low, moderate, high) to ensure methodological rigor.

2.     The authors used only Scopus and Web of Science databases and did not retrieve important conference papers in computer science (e.g., NeurIPS, ICML, KDD). Expanding the search scope is recommended.

We acknowledge this limitation and have clarified it in the Methods section: “Although the search strategy was limited to the Scopus and Web of Science databases, this decision aimed to ensure methodological consistency and quality control by focusing exclusively on peer-reviewed journal articles. While this approach may have excluded relevant computer science conference papers—such as those published in NeurIPS, ICML, or KDD—it aligns with the objectives of this review, which emphasize methodological rigor and reproducibility rather than coverage breadth. Future studies may expand the search scope to include leading conference proceedings to capture cutting-edge algorithmic developments and implementation trends in reverse logistics.”

3.     The results section (Section 3.2) primarily describes and categorizes literature content without in-depth discussion. A comprehensive analysis should be added to the discussion section.

 

We expanded Sections 3.2.1 and 3.2.2 with interpretative insights explaining why certain models and objectives dominate (e.g., prevalence of ANN due to nonlinear RL data and maturity of frameworks).

4.     Section 3.2.4 lists challenges (data scarcity, uncertainty, computational cost), while Section 3.2.5 outlines future directions (metadata standards, open benchmarks). However, the connection between these two sections is weak. It fails to clearly indicate which studies have addressed these challenges, which are attempting to solve them, and how effective their latest solutions are.

 

A linking paragraph was added at the end of Section 3.2.4 explicitly connecting the challenges with the research gaps (e.g., data scarcity → metadata standards; uncertainty → real-time pipelines).

5.      In the conclusion, the proposed future directions (e.g., developing multidimensional metrics, creating public datasets, integrating digital twins) are valid but somewhat trite. A more targeted and refined research agenda should be presented, leveraging the unique findings of this review to attract researchers.

 

The Conclusion has been rewritten to present a refined, author-derived research agenda, emphasizing future directions such as hybrid RL–ML pipelines, FAIR-based datasets, and explainable models.

6.     Figure 6 (Distribution by model. Source: Authors (2025)) highlights an excessively high proportion of “unspecified” models. This issue requires discussion within the text to emphasize its negative implications.

 

A paragraph discussing this issue has been added: An important observation is that nearly one-third of the studies failed to explicitly state the type of ML model employed, classifying them as “unspecified.” In most of these cases, the papers referred to generic “predictive models” or “machine learning algorithms” without further detail. This lack of transparency hampers replicability and comparative evaluation, highlighting the urgent need for clearer reporting standards and model documentation practices in ML-based reverse logistics research.

We would like to thank the reviewers once again, as their comments/suggestions were valuable for us to improve our work.

 

Sincerely,

The authors

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

I think the article is well-structured and nicely written. It effectively answers the five research questions and plainly explains the analysis results.

The methodology is clearly explained, also with the help of the Prisma protocol block diagram.

If I could suggest a slight improvement, I would recommend including the time frame of the articles in the main text. Furthermore, although the strengths are well-highlighted, it would be helpful to pay more attention to the weaknesses, also considering the number of papers actually considered in relation to the much larger initial selection.

Overall, it is an enjoyable read with added scientific value.

Author Response

Dear Editor and Reviewers,

We would like to sincerely thank you and the reviewers for the constructive feedback and insightful comments that have significantly improved the quality of our manuscript.
We have attached a document where we provide a detailed, point-by-point response explaining how we have addressed each of the reviewers’ comments/suggestions. All improvements are highlighted in the revised version of the manuscript.

 

Reviewer #2

Comment

Actions

1.     If I could suggest a slight improvement, I would recommend including the time frame of the articles in the main text. Furthermore, although the strengths are well-highlighted, it would be helpful to pay more attention to the weaknesses, also considering the number of papers actually considered in relation to the much larger initial selection.

 

Added in Section 3.1: The reviewed studies span from 2007 to 2025, with a noticeable surge after 2020, reflecting the accelerating intersection between sustainability and artificial intelligence.
The predominance of research in China and Iran can be partially attributed to national initiatives promoting circular economy and AI-driven industry, such as China’s 14th Five-Year Plan emphasizing green manufacturing and Iran’s growing recycling and waste management programs. These institutional incentives appear to have fostered scientific output in machine learning for reverse logistics.

We would like to thank the reviewers once again, as their comments/suggestions were valuable for us to improve our work.

 

Sincerely,

The authors

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

As a reviewer, I have carefully read and reviewed the manuscript.
The manuscript focuses on the timely and relevant issue of reverse logistics.

Generally, the research performed a systematic review following the well-known PRISMA protocol. Since it presents important and interesting findings, I hope, as a reviewer, that my comments are helpful for improving the quality of the paper.

 

Major Concerns
1.  Significant Contradiction in Results (RQ1): There is a critical inconsistency between the text and a key figure in your results. On page 8, the text states, "...supervised learning models predominate, followed by reinforcement learning techniques and, to a lesser extent, unsupervised learning methods." However, Figure 5 on the same page clearly shows that Unsupervised learning (~36%) is significantly more prevalent than Reinforcement learning (~15%), second only to Supervised learning (~40%). This is a major error that calls into question the validity of the analysis for RQ1. You must correct this discrepancy and revise the entire narrative and discussion surrounding the prevalence of different ML approaches. Which is correct, the text or the figure? This needs to be thoroughly re-examined.
2. Methodological Rigor and Justification:
- Search String Scope: The search string detailed in Table 1 includes specific models like "Decision tree" and "random forest" but omits other common and powerful ML techniques such as "Support Vector Machine," "SVM," "Gradient Boosting," or "k-nearest neighbors." The choice of which specific models to include in the primary search string seems arbitrary and could introduce significant selection bias, potentially excluding a relevant body of work. Please provide a strong justification for this choice. A more robust approach would be to use broader terms (e.g., "supervised learning," "classification," "regression") and then categorize the specific models found within the retrieved articles. At a minimum, you must discuss this as a key limitation.
- Exclusion of Articles: You state that ten articles were excluded "due to the unavailability of the full text." For a systematic review, this is a methodological weakness. Please detail the steps taken to acquire these articles (e.g., contacting authors, institutional library requests, inter-library loans). If these articles could not be obtained after exhaustive efforts, you must discuss the potential impact of their exclusion on your findings. What if these ten articles contained novel applications or focused on underrepresented areas?
3. Depth of Analysis: While the classification of papers by objective (RQ2) is useful, the analysis could be deeper. For instance, why are ANNs so prevalent for forecasting in RL? Is it due to the nature of the data (e.g., non-linear time series), or is it simply a trend reflecting the broader ML field? The discussion should move beyond what was done to why it was done and how it uniquely addresses the challenges of the RL context. Similarly, the bibliometric analysis in Section 3.1 is descriptive but offers limited insight. Can you connect the leadership of China and Iran in this research area to specific national policies, industrial priorities, or data availability?

Minor Concerns
1. RQ5 (Research Gaps) Presentation: Table 4 is excellent and is the strongest part of the manuscript. However, the prose in Section 3.2.5 could be structured to more clearly mirror the table, walking the reader through each gap with a consistent structure (e.g., define the gap, explain its importance, and then discuss the promising directions). This would improve readability and impact.
2. Temporal Analysis: In Section 3.1, you note that the review was conducted in "June 2025." Given that some of the cited sources are from 2025, this implies a very current review, which is a strength. However, please ensure all such time references are consistent and accurate at the time of final submission.
3. Clarity on "Not specified" Models: Figure 6 reveals that "Not specified" is the largest single category of ML models. While you correctly identify this as a transparency issue, it would be helpful to elaborate. Did these papers use ML as a "black box"? Or did they refer to a general class of algorithm (e.g., "a predictive model") without naming the specific one? A little more texture here would be beneficial.
4. Reference Inconsistency: In the introduction (page 2), you cite "(Rolf et al., 2025)" for unsupervised learning. I could not find this reference in your bibliography. Please check all citations for accuracy and completeness.

 

I hope all of my comments are helpful to the authors.

Thanks

Comments on the Quality of English Language

The quality of English in the manuscript is suitable for the readers to follow the main logical flow as the authors intended.

Author Response

Dear Editor and Reviewers,

We would like to sincerely thank you and the reviewers for the constructive feedback and insightful comments that have significantly improved the quality of our manuscript.
We have attached a document where we provide a detailed, point-by-point response explaining how we have addressed each of the reviewers’ comments/suggestions. All improvements are highlighted in the revised version of the manuscript.

Reviewer #3

Comment

Actions

1.     Significant Contradiction in Results (RQ1): There is a critical inconsistency between the text and a key figure in your results. On page 8, the text states, "...supervised learning models predominate, followed by reinforcement learning techniques and, to a lesser extent, unsupervised learning methods." However, Figure 5 on the same page clearly shows that Unsupervised learning (~36%) is significantly more prevalent than Reinforcement learning (~15%), second only to Supervised learning (~40%). This is a major error that calls into question the validity of the analysis for RQ1. You must correct this discrepancy and revise the entire narrative and discussion surrounding the prevalence of different ML approaches. Which is correct, the text or the figure? This needs to be thoroughly re-examined.

 

The inconsistency was corrected throughout the text. Section 3.2.1 now aligns with Figure 5, clarifying that supervised learning remains predominant, followed closely by unsupervised methods, while reinforcement learning is less frequent but growing.

2.     Methodological Rigor and Justification:
- Search String Scope: The search string detailed in Table 1 includes specific models like "Decision tree" and "random forest" but omits other common and powerful ML techniques such as "Support Vector Machine," "SVM," "Gradient Boosting," or "k-nearest neighbors." The choice of which specific models to include in the primary search string seems arbitrary and could introduce significant selection bias, potentially excluding a relevant body of work. Please provide a strong justification for this choice. A more robust approach would be to use broader terms (e.g., "supervised learning," "classification," "regression") and then categorize the specific models found within the retrieved articles. At a minimum, you must discuss this as a key limitation.
- Exclusion of Articles: You state that ten articles were excluded "due to the unavailability of the full text." For a systematic review, this is a methodological weakness. Please detail the steps taken to acquire these articles (e.g., contacting authors, institutional library requests, inter-library loans). If these articles could not be obtained after exhaustive efforts, you must discuss the potential impact of their exclusion on your findings. What if these ten articles contained novel applications or focused on underrepresented areas?

 

Justification added: “The search string was designed to balance conceptual breadth with methodological precision. It combines paradigm-level terms — such as supervised learning, unsupervised learning, and reinforcement learning — with two representative algorithmic families, decision trees and random forests, which are among the most frequently applied and interpretable ML techniques in logistics-related studies. This hybrid structure ensured broad coverage of the main learning paradigms while capturing well-established algorithmic approaches without overextending the search scope. Specific models such as Support Vector Machines (SVM), Gradient Boosting, or k-nearest neighbors were not individually listed to maintain conceptual focus and avoid excessive fragmentation of results. This design aligns with PRISMA’s emphasis on conceptual inclusiveness and methodological consistency. Nonetheless, we acknowledge that this decision may have excluded some domain-specific studies, representing a limitation to be addressed in future reviews through a more algorithm-explicit search expansion.”

 

 

Clarified: “Regarding article availability, for the ten studies whose full texts were inaccessible, retrieval was attempted through institutional libraries and direct contact with the authors. Despite these efforts, the papers remained unavailable. Their exclusion may have slightly limited the representation of niche or regional applications.”

3.     Depth of Analysis: While the classification of papers by objective (RQ2) is useful, the analysis could be deeper. For instance, why are ANNs so prevalent for forecasting in RL? Is it due to the nature of the data (e.g., non-linear time series), or is it simply a trend reflecting the broader ML field? The discussion should move beyond what was done to why it was done and how it uniquely addresses the challenges of the RL context. Similarly, the bibliometric analysis in Section 3.1 is descriptive but offers limited insight. Can you connect the leadership of China and Iran in this research area to specific national policies, industrial priorities, or data availability?

Expanded discussion in 3.2.2: the prevalence of ANN models is linked to data nonlinearity and strong library support; RL models are less frequent due to high data and environment requirements.

 

Section 3.1 expanded to associate productivity with national policies on circular economy and AI adoption (e.g., China’s 14th Five-Year Plan, Iran’s recycling initiatives).

 

4.     RQ5 (Research Gaps) Presentation: Table 4 is excellent and is the strongest part of the manuscript. However, the prose in Section 3.2.5 could be structured to more clearly mirror the table, walking the reader through each gap with a consistent structure (e.g., define the gap, explain its importance, and then discuss the promising directions). This would improve readability and impact.

Section restructured to follow Table 4 format with clear subheadings for each gap (e.g., “Gap 1 – Uncertainty Modeling”).

5.     Temporal Analysis: In Section 3.1, you note that the review was conducted in "June 2025." Given that some of the cited sources are from 2025, this implies a very current review, which is a strength. However, please ensure all such time references are consistent and accurate at the time of final submission.

This paragraph was added to the text: The review was completed in July 2025, ensuring that the most recent publications were captured up to this date. This timing reinforces the contemporaneity of the analysis, as several studies published in 2025 were already indexed and included. Minor adjustments were made to ensure temporal consistency across the text and figures, thereby maintaining alignment between the reported period of data collection and the latest referenced works.

 

6.     Clarity on "Not specified" Models: Figure 6 reveals that "Not specified" is the largest single category of ML models. While you correctly identify this as a transparency issue, it would be helpful to elaborate. Did these papers use ML as a "black box"? Or did they refer to a general class of algorithm (e.g., "a predictive model") without naming the specific one? A little more texture here would be beneficial.

 

 

 

Detailed explanation added after Figure 6:

“An important observation is that nearly one-third of the studies failed to explicitly state the type of ML model employed, classifying them as ‘unspecified.’ In most of these cases, the papers referred to generic ‘predictive models’ or ‘machine learning algorithms’ without further detail. This lack of transparency hampers replicability and comparative evaluation, highlighting the urgent need for clearer reporting standards and model documentation practices in ML-based reverse logistics research.”

7.     Reference Inconsistency: In the introduction (page 2), you cite "(Rolf et al., 2025)" for unsupervised learning. I could not find this reference in your bibliography. Please check all citations for accuracy and completeness.

Citation removed to ensure reference consistency.

 

We would like to thank the reviewers once again, as their comments/suggestions were valuable for us to improve our work.

 

Sincerely,

The authors

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

This paper systematically reviews the application of machine learning in reverse logistics, focusing on its development trends, challenges, and research opportunities. Following the PRISMA guidelines, the study analyzed 52 articles retrieved from the Scopus and Web of Science databases. The paper identifies four major research gaps: lack of metadata standardization, absence of public benchmarks, insufficient model interpretability, and challenges in integrating machine learning with simulation technologies and digital twins. After revisions and refinements, the quality of the paper has been improved, and it is recommended for publication.

  1. The paper fails to mention any quality assessment of the 52 original research articles ultimately included. A “Quality Assessment” section should be added, scoring articles based on research design, data transparency, model validation, and result reporting.

Action: A new subsection entitled “2.2. Quality Assessment” has been added. It introduces four criteria (research design, data transparency, model validation, and reporting completeness). Each study was rated on a 3-point scale (low, moderate, high) to ensure methodological rigor.

Thanks, this issue has been revised and refined in the paper

  2. The authors used only Scopus and Web of Science databases and did not retrieve important conference papers in computer science (e.g., NeurIPS, ICML, KDD). Expanding the search scope is recommended.

Action: We acknowledge this limitation and have clarified it in the Methods section: “Although the search strategy was limited to the Scopus and Web of Science databases, this decision aimed to ensure methodological consistency and quality control by focusing exclusively on peer-reviewed journal articles. While this approach may have excluded relevant computer science conference papers—such as those published in NeurIPS, ICML, or KDD—it aligns with the objectives of this review, which emphasize methodological rigor and reproducibility rather than coverage breadth. Future studies may expand the search scope to include leading conference proceedings to capture cutting-edge algorithmic developments and implementation trends in reverse logistics.”

Thanks, this issue has been revised and refined in the paper

  3. The results section (Section 3.2) primarily describes and categorizes literature content without in-depth discussion. A comprehensive analysis should be added to the discussion section.

Action: We expanded Sections 3.2.1 and 3.2.2 with interpretative insights explaining why certain models and objectives dominate (e.g., prevalence of ANN due to nonlinear RL data and maturity of frameworks).

Thanks, this issue has been revised and refined in the paper

  4. Section 3.2.4 lists challenges (data scarcity, uncertainty, computational cost), while Section 3.2.5 outlines future directions (metadata standards, open benchmarks). However, the connection between these two sections is weak. It fails to clearly indicate which studies have addressed these challenges, which are attempting to solve them, and how effective their latest solutions are.

Action: A linking paragraph was added at the end of Section 3.2.4 explicitly connecting the challenges with the research gaps (e.g., data scarcity → metadata standards; uncertainty → real-time pipelines).

Thanks, this issue has been revised and refined in the paper

  5. In the conclusion, the proposed future directions (e.g., developing multidimensional metrics, creating public datasets, integrating digital twins) are valid but somewhat trite. A more targeted and refined research agenda should be presented, leveraging the unique findings of this review to attract researchers.

Action: The Conclusion has been rewritten to present a refined, author-derived research agenda, emphasizing future directions such as hybrid RL–ML pipelines, FAIR-based datasets, and explainable models.

Thanks, this issue has been revised and refined in the paper

  6. Figure 6 (Distribution by model. Source: Authors (2025)) highlights an excessively high proportion of “unspecified” models. This issue requires discussion within the text to emphasize its negative implications.

Action: A paragraph discussing this issue has been added: “An important observation is that nearly one-third of the studies failed to explicitly state the type of ML model employed, classifying them as ‘unspecified.’ In most of these cases, the papers referred to generic ‘predictive models’ or ‘machine learning algorithms’ without further detail. This lack of transparency hampers replicability and comparative evaluation, highlighting the urgent need for clearer reporting standards and model documentation practices in ML-based reverse logistics research.”

Thanks, this issue has been revised and refined in the paper
