Peer-Review Record

Higher-Order Markov Model-Based Analysis of Reinforcement Learning in 6G Mobile Retrial Queueing Systems

Sensors 2025, 25(23), 7245; https://doi.org/10.3390/s25237245
by Djamila Talbi and Zoltan Gal
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 October 2025 / Revised: 23 November 2025 / Accepted: 26 November 2025 / Published: 27 November 2025
(This article belongs to the Special Issue Feature Papers in Communications Section 2025)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper proposes an integrated queueing system for 6G technology, combining deep Q-network reinforcement learning with a retrial queueing system to provide a new approach to adaptive queue management and service optimization; the performance of this approach is verified through experiments. Suggestions for the paper are as follows:

  1. In Section 1.2, regarding the "Markovian queueing systems" and "Reinforcement Learning (RL) based optimization for network management" covered in References 2, 3, and 4, it is recommended to describe these studies in more detail and to add a comparative analysis between them and the paper's work.
  2. In the description of the loss function in Formula 8, some symbols are not explained; supplementary explanations are suggested (a generic reference form of the DQN loss is sketched after this list).
  3. Some symbols appearing in Figures 4 and 5 are not defined; please add their meanings.
  4. In Algorithm 1, "Pseudo-code of the integrated RL-RQS framework for dynamic queue management", is the maximum queue length "k" in the third line the same as the queue size "K" in Table 3? If so, it is recommended to unify the notation. Similarly, "Action 1/2" is mentioned in Lines 284 and 285 of the paper, but the last two lines of Algorithm 1 describe it as "Action = 0/1"; please unify the notation.
  5. In the neural network architecture of Figure 6, the input layer is labeled "imageInputLayer": is this an error? Are the inputs of the neural network used in the paper image data? Additionally, it is recommended to describe the architecture in the figure caption, including the specific dimensions of the input layer, hidden layers, and output layer.
  6. Line 336 of the paper states that a high weight value is used for the scaling factor, but the actual setting is 0.1, which is not a high weight value; please explain this inconsistency.
  7. Line 413 of the paper states that "These results meet the 6G smart technologies requirements", but the paper does not specify the requirements that 6G smart technologies impose on the experiment, nor does the analysis of the experimental results explain how those requirements are met.
  8. In the conclusion section, it is recommended to give a targeted summary of the three research gaps identified by the authors: fixed parameters, the limited application scenarios of RL, and the lack of integration between retrial queueing and RL. This would demonstrate that the identified problems have been solved and strengthen the logical closure of the paper.
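
For context on point 2, a generic reference form of the DQN loss is sketched below. This is a standard textbook expression, not necessarily the paper's Formula 8; every symbol in the legend is an assumption introduced here for illustration, to show the kind of legend the comment asks for.

```latex
% Generic DQN loss (reference sketch; the paper's Formula 8 may differ):
%   \theta       online-network parameters
%   \theta^{-}   target-network parameters (periodically copied from \theta)
%   \gamma       discount factor
%   \mathcal{D}  experience-replay buffer of transitions (s, a, r, s')
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}
  \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```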

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

See the attached file.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors
  1. The introduction gives an overview of the challenges of 6G, but it does not clearly explain the specific research gap about combining retrial queueing systems with reinforcement learning.
  2. Only 10 training episodes were used in the study, which is not enough for the DQN to reach a stable result and may yield findings that are not statistically significant.
  3. There is no comparison to measure performance improvement, such as against classical Q-learning or a queueing system without reinforcement learning.
  4. Using Dynamic Time Warping (DTW) to compare steady-state distributions needs a sound justification: why not use standard probabilistic distance measures? (A sketch of such measures follows this list.)
  5. No tables show key performance metrics, such as throughput, latency, or average queue size.
  6. Figures 7–13 are conceptually informative but lack clear axis labels, units, and legends.
  7. The claim that the study “provides a strong basis for improving 6G queueing strategies” is not supported.
  8. The conclusions should be tempered in light of the limited experimental evidence.
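
To illustrate point 4, the sketch below compares two discrete distributions using two standard probabilistic distances (total variation and Jensen-Shannon) alongside a plain DTW. The vectors p and q are synthetic placeholders, not the paper's distributions; the point is that for distributions indexed over the same state space, TV and JS compare states directly, while DTW allows index warping, which requires explicit justification.

```python
# Sketch for point 4: standard probabilistic distances versus DTW for
# comparing two steady-state distributions. The vectors p and q are
# synthetic placeholders, not the paper's actual distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.40, 0.30, 0.15, 0.10, 0.05])  # hypothetical analytic distribution
q = np.array([0.35, 0.32, 0.18, 0.09, 0.06])  # hypothetical simulated distribution

tv = 0.5 * np.abs(p - q).sum()  # total variation distance
js = jensenshannon(p, q)        # Jensen-Shannon distance

def dtw(x, y):
    """Plain O(n*m) dynamic time warping with absolute-difference cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(f"TV = {tv:.4f}, JS = {js:.4f}, DTW = {dtw(p, q):.4f}")
```
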
Comments on the Quality of English Language

The English language is mostly clear, but needs some editing. There are several grammatical and spelling mistakes, and some technical terms are used incorrectly.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors
  1. It would be helpful to add a clear statement about the scope in the Abstract and Introduction: the focus is on convergence (policy stabilization, spectral gap, minimal training), while benchmarking is outside the scope of this paper.
  2. In the conclusion and abstract, soften the claims of "enhanced performance/low energy consumption" to indicate that they hold only under the simulation conditions (no benchmarking was performed).
Comments on the Quality of English Language

Acceptable

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
