Efficient Delay-Sensitive Task Offloading to Fog Computing with Multi-Agent Twin Delayed Deep Deterministic Policy Gradient
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors present a conceptually rich integration of Multi-Agent Deep Reinforcement Learning (MADRL) with a constrained optimisation framework for dynamic task offloading in IoT-Fog-Cloud environments. The problem formulation includes realistic constraints, and the proposed cooperative MADRL approach aims to adapt dynamically to varying network and task conditions. While the topic is timely and the effort appears rigorous, the paper exhibits several limitations that significantly affect clarity, coherence, and technical robustness.
- The abstract is overly dense, relying heavily on undefined acronyms and technical jargon. It lacks a structured articulation of the problem, fails to highlight the novelty of the approach, and does not communicate the real-world impact or significance of the contribution.
- Sections I and II are excessively wordy and repetitive. They blend background, challenges, methods, and contributions without clear delineation or focus, making it hard to follow the paper’s key direction.
- Section III lacks a cohesive optimisation strategy suitable for dynamic, resource-constrained, latency-sensitive environments in heterogeneous multi-agent fog systems. The authors should also cite prior relevant work, such as [https://onlinelibrary.wiley.com/doi/full/10.1155/2017/2363240], to ground their approach in existing literature.
- The described Fog-to-Fog (F2F) cooperation model assumes decentralised decision-making by fog nodes (Fm, Fj), yet it omits any conflict resolution or coordination protocol. This poses a risk of inefficient or conflicting offloading and resource allocation strategies.
- There is no defined mechanism for distributed consensus or control among agents: no reinforcement learning logic, game-theoretic approach, blockchain, or auction-based strategy is included to synchronise decisions in real-time.
- Although task constraints such as deadline and energy consumption are considered, these are not integrated into a unified cost function or optimisation policy, limiting the model's practical applicability.
- The absence of a multi-objective optimisation framework to jointly balance latency, energy usage, bandwidth constraints, and battery health under dynamic and uncertain network conditions is a major gap.
- The model mentions partial offloading but provides no policy or mechanism to determine task partitioning. It does not address inter-task dependencies, data splitting overhead, or result reassembly delays at the fog layer.
- No task scheduling strategy is described (e.g., FCFS, priority-based, deadline-aware) to efficiently manage task queues at fog nodes during varying load conditions.
- The proposed model does not account for realistic wireless impairments like dynamic fading, interference from concurrent users, or wireless link load balancing, all of which are critical in fog networks.
- There is no fallback mechanism to offload tasks to the cloud when both Fm and Fj nodes are overloaded or unavailable, which raises reliability concerns in mission-critical scenarios.
- The model lacks a well-defined utility or reward function governing fog agent decisions. In multi-agent reinforcement learning, this is crucial for convergence and cooperative behaviour.
- There is no evidence of adaptive learning (e.g., Q-learning or deep RL) that would enable agents to optimise task decisions based on past performance or environmental feedback.
- Section IV lacks clarity due to overly complex technical language, inconsistent notation, and minimal explanation of key equations. This makes understanding the MADRL optimisation process challenging.
- The MAFCTORA framework and its associated MAPTO and MAFCRA algorithms are presented without any computational complexity analysis. There is no discussion of time or space complexity, memory usage, or convergence guarantees. This omission severely limits the assessment of scalability, feasibility, and real-time deployment of the solution.
Overall, my point is that while the paper introduces an innovative perspective on cooperative MADRL in fog computing, it requires significant revisions in clarity, organisation, technical completeness, and methodological rigour. The authors should particularly improve structural presentation, integrate formal optimisation techniques, and address practical deployment concerns before the work can be considered for publication.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript examined, although interesting from a scientific point of view, presents within it the following specific issues that must necessarily be addressed:
It is not clear how the "fully cooperative" setting has been hypothesized through a rigorous characterization that distinguishes it from partially cooperative and adversarial multi-agent systems.
The description that refers to the adoption of Dec-POMDP for fog computing agents is not very clear and should be integrated with a specific discussion that explains how the system satisfies the conditions of partial observability required for Dec-POMDP modeling.
The optimization framework reported in Section 4 is too simplistic and, therefore, requires a more detailed description of the queuing or scheduling model because, for example, constraints such as C3 are apparent and do not realistically reflect resource contention phenomena.
The paper flow is not very logical, and it frequently oscillates between problem statement and solution proposal without establishing all the assumptions, goals, and constraints in a logical, sequential, and coherent way that must be outlined clearly before introducing the MADRL approach.
The reward formulation and hyperparameters need to be specified in more detail, because the relative weighting between latency and energy (ωₜ, ωₑ) is not clearly provided, which prevents the results from being reproduced.
An experimental validation section that contains a comprehensive performance evaluation with multiple traffic load scenarios and ablation studies on key parameters is totally missing and should be provided.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
- It is inappropriate to model the problem as an MDP because the optimization objective is to minimize the energy consumption and delay at the current moment. The optimization objective is a single moment rather than a long-term average value.
- Where does the algorithm run, on the user's device or fog layer?
- Will the algorithm bring additional communication overhead? What communication overheads will multi-agent collaboration bring?
- What is the complexity of the algorithm?
- The author needs to cite some key literature, such as: H. Hao, C. Xu, W. Zhang, S. Yang and G.-M. Muntean, "Joint Task Offloading, Resource Allocation, and Trajectory Design for Multi-UAV Cooperative Edge Computing With Task Priority," in IEEE Transactions on Mobile Computing, vol. 23, no. 9, pp. 8649-8663, Sept. 2024, doi: 10.1109/TMC.2024.3350078.
- In Section 5, the author does not provide detailed definitions of the action space and the state space.
- The authors only used the time slot in Eq. (3). But in other places, there is no concept of time slots. Is this a mistake? Please provide a detailed explanation.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Comment 4: Section III has been improved but missed the suggested reference as an input to a cohesive optimisation strategy suitable for dynamic, resource-constrained, latency-sensitive environments in heterogeneous multi-agent fog systems. The authors should also cite relevant work, such as [https://onlinelibrary.wiley.com/doi/full/10.1155/2017/2363240], to ground their approach in existing literature, especially in Spine-leaf architectures.
Author Response
Comment 1: Section III has been improved but missed the suggested reference as an input to a cohesive optimisation strategy suitable for dynamic, resource-constrained, latency-sensitive environments in heterogeneous multi-agent fog systems. The authors should also cite relevant work, such as [https://onlinelibrary.wiley.com/doi/full/10.1155/2017/2363240], to ground their approach in existing literature, especially in Spine-leaf architectures.
Response: We sincerely apologize for the oversight in omitting the suggested reference. This occurred because the manuscript was being edited concurrently by two parties, which led to the unintended omission. We fully acknowledge that the referenced work is foundational and offers important insights into cohesive optimization strategies in dynamic, resource-constrained, and latency-sensitive environments, particularly in the context of heterogeneous multi-agent fog systems and spine-leaf architectures. Accordingly, we have now incorporated the reference [https://onlinelibrary.wiley.com/doi/full/10.1155/2017/2363240].
Reviewer 2 Report
Comments and Suggestions for Authors
Considering my previous review concerns, the following should be addressed more clearly:
The described C3 optimization constraint only considers the maximum capacity, not the contention dynamics (e.g., queues or priorities), so the scheduling model is still generic in proposing a more realistic contention modeling. It would be appropriate to provide a detailed analysis of delays from specific queues or scheduling processes.
Section 6 presents an evaluation of multiple methods (MAPPO, MASAC, MAIDDPG, MATD3); however, no ablation study on key hyperparameters such as ωₜ/ωₑ has been presented, nor an explicit variation of loads (λ, µ) in multiple scenarios. Table 5 mentions loads, but lacks graphs or comparative analyses made under variable traffic conditions.
Author Response
Comment 1: Considering my previous review concerns, the following should be addressed more clearly: The described C3 optimization constraint only considers the maximum capacity, not the contention dynamics (e.g., queues or priorities), so the scheduling model is still generic in proposing a more realistic contention modeling. It would be appropriate to provide a detailed analysis of delays from specific queues or scheduling processes.
Response: Thank you for the valuable comment. In this work, we adopt a simple First-Come, First-Served (FCFS) scheduling approach where each fog node accepts tasks only if buffer space is available; otherwise, the task is rejected. We do not explicitly model contention dynamics such as queue interference or task prioritization in this version of the paper. However, to partially address this challenge, fog nodes (agents) cooperate with neighboring nodes through an advertisement mechanism, as shown in Table 4. This enables nodes to offload tasks horizontally when local resources are constrained.
We did not address contention dynamics in the current manuscript, as doing so would require significant changes to the overall framework and analysis. A more detailed investigation incorporating contention dynamics, queue delays, and advanced scheduling policies is planned for future work.
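The admission rule described in this response (accept a task only if buffer space is available, otherwise reject it, and serve in arrival order) can be illustrated with a minimal sketch. The class name `FogNodeQueue`, the `buffer_size` parameter, and the task identifiers are hypothetical and are not taken from the manuscript; the sketch also leaves out the advertisement-based offloading to neighbouring nodes.

```python
from collections import deque


class FogNodeQueue:
    """FCFS task queue with a fixed buffer; a task arriving at a full buffer is rejected."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size  # hypothetical capacity, counted in tasks
        self.queue = deque()

    def offer(self, task_id: str) -> bool:
        """Accept the task if buffer space is available; otherwise reject it."""
        if len(self.queue) >= self.buffer_size:
            return False
        self.queue.append(task_id)
        return True

    def serve_next(self):
        """Return the oldest queued task (first come, first served), or None if idle."""
        return self.queue.popleft() if self.queue else None


# Illustrative run: a buffer of two tasks rejects the third arrival.
node = FogNodeQueue(buffer_size=2)
accepted = [node.offer(t) for t in ("t1", "t2", "t3")]
first_served = node.serve_next()
```

Because rejection depends only on the instantaneous buffer occupancy, a model of this kind captures capacity limits but not waiting-time effects, which is the gap the reviewer points to.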
Comment 2: Section 6 presents an evaluation of multiple methods (MAPPO, MASAC, MAIDDPG, MATD3); however, no ablation study on key hyperparameters such as ωₜ/ωₑ has been presented, nor an explicit variation of loads (λ, µ) in multiple scenarios. Table 5 mentions loads, but lacks graphs or comparative analyses made under variable traffic conditions.
Response: Thank you for the valuable comment. We appreciate the suggestion to include a more detailed ablation study. Based on our evaluation, the proposed MAFCPTORA algorithm demonstrated the best performance when integrated with the MATD3 framework, outperforming MAPPO, MASAC, and MAIDDPG. Accordingly, we conducted an ablation study on the MATD3 configuration by varying the key weight coefficient hyperparameters (ωₜ and ωₑ) and analyzed their impact on average reward, latency, and energy consumption.
In the context of fog nodes, latency is often more critical than energy consumption because energy is presumed to be available, so we prioritize latency in our trade-off analysis. We have also updated the manuscript to reflect these results, including a new figure illustrating the effects of varying ωₜ/ωₑ.
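As a rough illustration of how the two weights enter a scalar reward, the sketch below evaluates a negative weighted cost for several ωₜ/ωₑ ratios. The linear form of the reward, the normalisation ωₜ + ωₑ = 1, and the sample latency and energy values are all assumptions made for illustration; the manuscript's actual reward may differ, and a real ablation would retrain the agents for each setting rather than re-score fixed measurements.

```python
def weighted_reward(latency_s: float, energy_j: float, w_t: float, w_e: float) -> float:
    """Negative weighted cost: a larger reward means lower latency and energy."""
    return -(w_t * latency_s + w_e * energy_j)


def sweep_weights(latency_s: float, energy_j: float, ratios):
    """Score one (latency, energy) pair for several w_t / w_e ratios, with w_t + w_e = 1."""
    results = {}
    for ratio in ratios:
        w_t = ratio / (1.0 + ratio)  # share of the weight given to latency
        w_e = 1.0 - w_t              # remaining share given to energy
        results[ratio] = weighted_reward(latency_s, energy_j, w_t, w_e)
    return results


# Hypothetical measurements: 0.2 s latency and 1.0 J energy per task.
rewards = sweep_weights(latency_s=0.2, energy_j=1.0, ratios=[1.0, 3.0])
```

Raising the ratio shifts weight toward latency, so the same energy figure contributes less to the cost.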
Reviewer 3 Report
Comments and Suggestions for Authors
- Ref Comment 1, the author's answer is too far-fetched. The optimization problem in the paper is obviously only aimed at the current moment and does not consider the impact on subsequent moments. The energy consumption and latency are truly influenced by the system's evolving state in time steps. But in this paper, the authors only optimize a single moment, not the system's whole evolving state in time steps. So this is a simple optimization problem, not an MDP.
- Ref Comment 2, if the algorithm is deployed independently at fog node, how to determine the offloading strategy after a user generates a task? Is it necessary to first send the request to the fog node and then return the decision to the user? This will cause a delay that cannot be ignored. So, I do not think this is a good idea.
- Ref Comment 5, most of the references cited by the author are from two years ago, and the proportion of the latest references is too low. Besides, authors did not read or cite the suggested literature either.
- Ref Comment 7, the author's reply is obviously full of loopholes. Since the signal-to-noise ratio is related to time slots, the transmission rate must also be related to time slots. Thus, a series of indicators such as transmission delay and optimization objectives are all related to time slots. But the time slot only appears in Eq.3. This reply is so ridiculous.
Author Response
Ref Comment 1, the author's answer is too far-fetched. The optimization problem in the paper is obviously only aimed at the current moment and does not consider the impact on subsequent moments. The energy consumption and latency are truly influenced by the system's evolving state in time steps. But in this paper, the authors only optimize a single moment, not the system's whole evolving state in time steps. So this is a simple optimization problem, not an MDP.
Response: Thank you for your insightful comment. We understand your concern regarding the temporal nature of decision-making in fog environments and agree that our initial formulation did not explicitly model the evolving state across time steps. In the revised manuscript, we have reformulated the optimization problem as a Markov Decision Process (MDP), where the system evolves over discrete time steps. The state now captures the current task queue, resource availability, and channel conditions; the action represents the offloading decisions; and the reward reflects both energy consumption and latency over time.
This MDP formulation allows the learning agent to optimize long-term cumulative rewards instead of immediate performance, aligning better with the dynamic nature of fog computing environments. All relevant updates are reflected in the updated manuscript in Section III, pages 6-13. We sincerely appreciate your comment, which helped improve the rigor of our formulation.
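The difference between the single-moment objective raised by the reviewer and the long-term objective described in this response can be written out in generic form. This is a sketch in assumed notation, not the manuscript's formulation: the step index k, the discount factor γ, the per-step latency T and energy E, and the transition kernel P are introduced here for illustration only.

```latex
% Single-moment (myopic) problem: each step is optimised in isolation.
\min_{a_k} \; \omega_t\, T(s_k, a_k) + \omega_e\, E(s_k, a_k)

% MDP objective: the policy \pi is chosen to maximise the expected discounted return,
% and each action also shapes the next state through the transition kernel P.
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r(s_k, a_k) \right],
\qquad r(s_k, a_k) = -\bigl( \omega_t\, T(s_k, a_k) + \omega_e\, E(s_k, a_k) \bigr),
\qquad s_{k+1} \sim P(\cdot \mid s_k, a_k), \quad 0 \le \gamma < 1 .
```

The two problems coincide when γ = 0 or when the action has no influence on the next state; if task queues carry over from one step to the next, the action does affect the next state and the long-term formulation is the relevant one.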
Ref Comment 2: if the algorithm is deployed independently at fog node, how to determine the offloading strategy after a user generates a task? Is it necessary to first send the request to the fog node and then return the decision to the user? This will cause a delay that cannot be ignored. So, I do not think this is a good idea.
Response: Thank you for your insightful comment. In our proposed architecture, we do not assume a three-way handshake between the user device and the fog node. Rather, we consider a scenario in which user devices are computationally constrained and unable to meet the task's QoS requirements. Therefore, tasks are offloaded directly to the nearest fog node Fm by default.
The autonomous decision-making process for further offloading is handled entirely within the fog layer. Once the task reaches Fm, the fog node evaluates its own load and local conditions to decide whether to execute the task locally or offload it to a neighboring fog node (fog-to-fog offloading). This design avoids additional communication overhead and delay between the user and the fog layer, as the user is not involved in the offloading decision beyond the initial handoff.
Ref Comment 5: Most of the references cited by the author are from two years ago, and the proportion of the latest references is too low. Besides, the authors did not read or cite the suggested literature either.
Response: We sincerely apologize for the oversight in not including the suggested and recent relevant references. During the revision process, the manuscript was edited collaboratively from multiple ends, which unfortunately led to a communication gap and some inconsistencies in the final submission. We truly appreciate the reviewer’s feedback and have now carefully reviewed and incorporated the suggested references, along with several other recent and relevant works published within the last two years, to strengthen the quality and currency of the manuscript.
Ref Comment 7: The author's reply is obviously full of loopholes. Since the signal-to-noise ratio is related to time slots, the transmission rate must also be related to time slots. Thus, a series of indicators, such as transmission delay and optimization objectives, are all related to time slots. But the time slot only appears in Eq.3. This reply is so ridiculous.
Response: Thank you for your insightful and critical comment. We acknowledge the reviewer’s observation that the signal-to-noise ratio (SNR), transmission rate, transmission delay, and optimization objectives should be time slot–dependent, given the dynamics in the system.
In response, we have revised the manuscript accordingly: the signal-to-noise ratio and the corresponding transmission rate are now explicitly modeled as functions of the time slot t, and this is reflected in the updated equations. Consequently, transmission delay and the optimization objective have also been modified to incorporate time slot dependencies, as shown in Equations 12 and 24. These changes are also discussed in Section III, where we describe the system model and the optimization formulation. We believe this revision addresses the concerns raised and improves the clarity and rigor of the model.
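The chain of dependence the reviewer describes (SNR, then rate, then transmission delay) can be made explicit with a standard Shannon-capacity link model. The symbols below are assumed notation, not the manuscript's Equations 12 and 24: p is the transmit power, h(t) the channel gain in slot t, σ² the noise power, B the bandwidth, and D the task size in bits; the sketch also assumes the rate stays constant while the task is transmitted.

```latex
% Time-slot-dependent link model (assumed notation).
\mathrm{SNR}(t) = \frac{p\, h(t)}{\sigma^{2}}, \qquad
R(t) = B \log_{2}\!\bigl( 1 + \mathrm{SNR}(t) \bigr), \qquad
T^{\mathrm{tx}}(t) = \frac{D}{R(t)} .
```

Once h(t) varies with the slot, every quantity derived from the rate inherits the index t, including any latency or energy term that enters the objective.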
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
None