Peer-Review Record

Information Bottleneck-Enhanced Reinforcement Learning for Solving Operation Research Problems

Sensors 2025, 25(24), 7572; https://doi.org/10.3390/s25247572
by Ruozhang Xi 1, Yao Ni 2,* and Wangyu Wu 3,*
Submission received: 4 November 2025 / Revised: 6 December 2025 / Accepted: 10 December 2025 / Published: 13 December 2025
(This article belongs to the Section Intelligent Sensors)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

(these comments are also provided in the attached file)

Review Comments

  1. Strengthen Literature Review
    The following paper related to reinforcement learning should be reviewed:
    Zhang, J., Li, X., Yuan, Y., Yang, D., Xu, P. and Au, F.T., 2024. A multi-agent ranking proximal policy optimization framework for bridge network life-cycle maintenance decision-making. Structural and Multidisciplinary Optimization, 67(11), p.194.
    Zhang, Y., Zhao, W., Wang, J. and Yuan, Y., 2024. Recent progress, challenges and future prospects of applied deep reinforcement learning: A practical perspective in path planning. Neurocomputing, 608, p.128423.
    Chen, X., Liu, S., Zhao, J., Wu, H., Xian, J. and Montewka, J., 2024. Autonomous port management based AGV path planning and optimization via an ensemble reinforcement learning framework. Ocean & Coastal Management, 251, p.107087.

  2. Complete the Algorithmic Description
    Algorithm 1 currently shows only the actor update, but PPO requires both actor and critic components. Please add the critic loss function, the critic update step, clarification of whether and how the bottleneck objectives affect the critic, and details of how the value targets and advantage estimates are computed (see the first sketch following this list). Without these details, the method is incompletely specified and difficult to reproduce.
  3. Provide Complete Technical Details
    (1) For the state bottleneck, specify the exact form of pθ(T|S), including the mean parameterization, whether σ is fixed or learned, whether the covariance is diagonal or full, and the closed-form KL divergence formula used (see the KL expression following this list).
    (2) Clarify whether the encoder is shared between the actor and the critic.
    (3) For the policy bottleneck with masked actions, explain how the prior p(A) is defined over the feasible action set only (renormalized after masking) and how infeasible actions are handled in the logits/softmax and in the KL computation (see the masking sketch following this list).
  4. Provide More Detailed Hyperparameter Settings and Results
    Add: instance generation details (distributions, scaling, normalization); hardware specifications and wall-clock time; all PPO hyperparameters (GAE parameters, clipping ε, batch size, update epochs, entropy coefficient, learning rates); and mean ± standard deviation over multiple random seeds with statistical significance tests (see the reporting sketch following this list).
  5. Fix Inconsistencies and Presentation Issues
    Correct the following:
    (1) "Travel Sales Man" → "Traveling Salesman Problem";
    (2) IBE vs. IBE-RL: double-check whether IBE is mistakenly used in the text where IBE-RL is meant;
    (3) ensure all figures have properly labeled axes with units and consistent scales across subfigures;
    (4) double-check for other grammatical problems.
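
For concreteness regarding item 2, a minimal sketch of the missing critic specification, assuming standard PPO with GAE; all names are illustrative placeholders, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards, dones: float tensors of shape [T] (dones[t] = 1.0 if the
    episode ended at step t); values: shape [T + 1], where the final
    entry bootstraps the value of the last state.
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:-1]  # value targets for the critic
    return advantages, returns

def critic_loss(value_pred, returns):
    """Standard PPO critic objective: MSE against the GAE-based targets."""
    return F.mse_loss(value_pred, returns.detach())
```

The revision should state whether the critic is trained in essentially this way and, in particular, whether the bottleneck KL terms enter this loss or only the actor's objective.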
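Also, for item 3(1): under the common choice of a diagonal-Gaussian encoder pθ(T|S) = N(μθ(S), diag(σθ²(S))) with a standard-normal prior N(0, I), the KL expression referred to above has the closed form

```latex
\mathrm{KL}\left( \mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\middle\|\, \mathcal{N}(0, I) \right)
  = \frac{1}{2} \sum_{i=1}^{d} \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right).
```

Whether this is the parameterization actually used (fixed vs. learned σ, diagonal vs. full covariance, choice of prior) is exactly what the paper should state.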
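For item 3(3), a minimal sketch of one standard way to combine action masking with a renormalized prior, again purely illustrative and not necessarily the authors' scheme:

```python
import torch

NEG = -1.0e9  # effectively -inf for softmax, while keeping arithmetic finite

def masked_policy_kl(logits, feasible_mask, prior_logits=None):
    """Masked softmax plus KL to a prior renormalized over feasible actions.

    logits: [B, A] raw action scores; feasible_mask: [B, A] boolean.
    Infeasible actions are pushed to (near) -inf before the softmax, so
    they receive zero probability and contribute nothing to the KL sum.
    """
    log_pi = torch.log_softmax(logits.masked_fill(~feasible_mask, NEG), dim=-1)

    if prior_logits is None:  # uniform prior over the feasible set
        prior_logits = torch.zeros_like(logits)
    # Masking before the softmax renormalizes the prior over feasible actions.
    log_prior = torch.log_softmax(
        prior_logits.masked_fill(~feasible_mask, NEG), dim=-1)

    pi = log_pi.exp()  # exactly 0 at masked entries (exp underflows)
    kl = (pi * (log_pi - log_prior)).sum(dim=-1)  # masked terms contribute 0
    return log_pi, kl
```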
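Finally, for item 4, the reporting sketch mentioned above; the numbers are placeholders only, meant to show the expected format (mean ± standard deviation over seeds plus a paired significance test):

```python
import numpy as np
from scipy import stats

# Placeholder per-seed results for two methods on the same instance set.
method_a = np.array([7.82, 7.79, 7.85, 7.80, 7.83])  # e.g., IBE-RL
method_b = np.array([7.95, 7.91, 7.98, 7.93, 7.96])  # e.g., PPO baseline

print(f"method A: {method_a.mean():.3f} ± {method_a.std(ddof=1):.3f}")
print(f"method B: {method_b.mean():.3f} ± {method_b.std(ddof=1):.3f}")

t, p = stats.ttest_rel(method_a, method_b)  # paired across seeds
print(f"paired t-test: t = {t:.2f}, p = {p:.4f}")
```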

 

Comments for author File: Comments.pdf

Author Response

We sincerely appreciate the valuable comments from the reviewer. Please find our reply in the attached document.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Please refer to the attached pdf.

Comments for author File: Comments.pdf

Author Response

We sincerely appreciate the valuable comments from the reviewer. Please find our reply in the attached document.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Minor comments

  1. Language and minor wording
  • The language is much improved overall. Please do one more light pass (or professional copy-editing) to remove residual minor issues and to ensure consistent, polished phrasing (e.g., unify "travel sales man task", "Travel Sales Man Problem", "TSP task", etc.).
  2. Small clarifications
  • When you discuss the policy bottleneck versus entropy regularization, you might add one concrete sentence summarizing how a non-uniform prior p(A) could encode domain knowledge (for example, prioritizing feasible or shorter routes), further emphasizing the practical value of the framework (a toy illustration follows these comments).
  • In the limitations/future work, consider briefly mentioning potential extensions to more complex constraints or larger-scale industrial instances.
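
To make the first suggestion concrete, a toy example of how a non-uniform prior p(A) could encode domain knowledge in TSP; the construction below is hypothetical, not something the paper currently does:

```python
import torch

NEG = -1.0e9  # effectively -inf for softmax

def distance_biased_prior(dist_to_current, unvisited_mask, tau=1.0):
    """Hypothetical non-uniform prior for TSP: nearer unvisited cities
    receive higher prior mass, p(a) proportional to exp(-d(current, a) / tau),
    renormalized over the feasible (unvisited) set."""
    prior_logits = (-dist_to_current / tau).masked_fill(~unvisited_mask, NEG)
    return torch.softmax(prior_logits, dim=-1)
```

Pulling the policy toward such a prior injects a nearest-neighbor-like bias, which is exactly the kind of practical value worth one sentence in the text.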

Author Response

We thank the reviewer for the valuable comments. Please find our reply to these concerns in the attached document.

Author Response File: Author Response.pdf
