1. Introduction
Wireless sensor networks (WSNs) and Internet of Things (IoT) sensing systems have become the data-generation front end of modern digital infrastructures [1]. They support a wide range of applications, including environmental monitoring, industrial inspection, intelligent transportation, emergency response, and infrastructure health surveillance. In these applications, large numbers of energy-constrained sensor nodes continuously generate heterogeneous data streams, such as scalar measurements, images, vibration signals, and event-triggered alarms. Although cloud computing provides powerful processing capability, directly uploading all sensing data to remote cloud servers often leads to excessive latency, unstable quality of service, and unnecessary energy expenditure. Mobile edge computing (MEC) offers an effective alternative by moving computation closer to sensing devices and enabling delay-sensitive analytics near the network edge [2].
Nevertheless, directly applying conventional MEC formulations to sensor networks is insufficient, as they fail to capture the sensing-driven characteristics, energy constraints, and bidirectional communication requirements inherent in practical WSNs. Conventional MEC studies mainly focus on communication-oriented devices, while sensor nodes are fundamentally constrained by sensing tasks, residual battery energy, and mission-oriented reliability requirements [3]. Practical sensing systems also involve not only uplink transmission of raw or preprocessed measurements but also downlink dissemination of control commands, model parameters, decision feedback, and reconfiguration instructions [4]. Ignoring downlink overhead may underestimate end-to-end delay and obscure the resource competition between communication and computation. In addition, dense sensor deployments intensify the competition for uplink bandwidth, transmission power, and edge-computing resources under rapidly changing channel conditions. These factors make centralized optimization difficult to scale in practical sensing environments [5].
User-centric or cell-free access architectures provide an attractive foundation for edge-assisted sensor systems [6]. By allowing sensor nodes to be flexibly served by cooperative access points (APs) rather than a single fixed cell, such architectures can improve coverage reliability and alleviate service degradation in edge regions [7]. They also provide favorable support for distributed edge processing in dense wireless sensing environments [8]. However, integrating sensor-aware traffic, bidirectional transmission, and dynamic edge coordination into such architectures introduces a challenging long-term joint optimization problem. The decision process must determine whether a sensing task should be processed locally, partially preprocessed and then offloaded, or fully offloaded. It must also determine how much transmission power should be used and how edge resources should be allocated across competing sensors, while preserving network lifetime and sensing fidelity.
To address these issues, this paper substantially reformulates the original generic MEC problem into a sensor-aware optimization framework for edge-assisted WSNs. Instead of treating devices merely as communication users, we explicitly model them as sensing agents that produce heterogeneous data with different urgency and mission value. We further incorporate bidirectional sensing-related traffic, where uplink channels carry sensing data and downlink channels support edge feedback and control signaling. Building on this formulation, we develop a joint optimization method that combines multi-agent reinforcement learning (MARL) for distributed per-node decision-making [9] with centralized edge-resource coordination. On the sensor side, proximal policy optimization with clipping (PPO-Clip) is employed to stabilize distributed policy learning [10]. On the network side, the alternating direction method of multipliers (ADMM) is used to coordinate edge resources under shared capacity constraints, following the idea of coordinated optimization in user-centric MEC systems [11].
Although energy-aware computation offloading for IoT sensors and WSN-assisted MEC has been studied in prior works [12,13], most existing studies focus on energy-efficient task execution under specific cloudlet, SWIPT, or MEC settings. It should also be noted that power efficiency in WSNs is influenced by multiple factors. For example, node mobility may introduce additional routing overhead and congestion, which affects energy efficiency [14]. Communication strategies such as gossiping-based routing can help reduce redundant transmissions and improve energy usage [15]. In addition, data management mechanisms, including energy-efficient cache placement, also play an important role in reducing transmission cost [16]. In contrast, this paper jointly considers heterogeneous sensing-task priority, bidirectional sensing feedback, residual-energy constraints, packet-loss penalty, uplink power control, and coordinated edge-resource allocation.
This work develops a sensor-aware joint optimization framework for edge-assisted WSNs by incorporating sensing-task priority, bidirectional feedback, residual-energy constraints, packet-loss penalty, and edge-resource competition into the MEC offloading formulation. The resulting mixed discrete–continuous decision problem is addressed through a MARL–ADMM structure, where PPO-Clip enables sensor nodes to learn distributed offloading and power-control policies, and ADMM ensures feasible edge-resource allocation under shared AP-capacity constraints.
The main contributions of this work are summarized as follows.
We redesign the original MEC framework as a sensor-aware three-layer architecture for edge-assisted WSNs, where sensor nodes, AP-side edge servers, and a cloud coordination layer jointly support sensing-data processing, transmission, and feedback.
We reformulate task offloading and power control as a joint sensing–communication–computing optimization problem. Unlike conventional MEC studies that treat devices as generic communication users, our formulation accounts not only for delay and energy but also for sensing-task priority, bidirectional traffic, residual-energy dynamics, and packet-loss penalty in a unified objective, making the model more consistent with practical sensor networks.
We develop a sensor-aware MARL–ADMM framework for edge-assisted WSNs that is not a trivial integration of existing methods. The MARL layer learns distributed offloading and power-control policies from local observations, avoiding the curse of dimensionality of centralized optimization, while the ADMM layer provides mathematically guaranteed feasible edge-resource allocation under shared AP-capacity constraints. This principled separation of distributed learning and constrained optimization is specifically designed for energy-constrained sensor networks where centralized optimization is infeasible.
We revise the evaluation from a generic wireless setting to a sensor-network monitoring scenario and assess the proposed method using not only average delay and energy consumption, but also sensing delivery ratio and network lifetime, which are key performance indicators for practical sensing applications.
The remainder of this paper is organized as follows. Section 2 reviews related work on MEC-enabled sensing networks, user-centric edge systems, and reinforcement learning-based offloading. Section 3 presents the sensor-aware system model and problem formulation. Section 4 details the MARL–ADMM solution. Section 5 describes the experimental settings and performance evaluation. Section 6 concludes the paper.
4. Proposed MARL–ADMM Algorithm
4.1. Multi-Agent Reinforcement Learning for Sensor Nodes
Each sensor node is modeled as an autonomous agent that observes local sensing, channel, and battery states and outputs an offloading and power-control decision. The observation of node
n at time slot
t is defined as
where
is the queue state of the serving AP and
is the historical average delay.
The action of sensor node
n is defined as
where
denotes the binary offloading decision,
denotes the uplink transmission power, and
denotes the requested edge-resource ratio.
The PPO actor network outputs a hybrid action distribution for the mixed discrete–continuous action space. Specifically, the offloading decision
is sampled from a Bernoulli distribution. For the requested edge-resource ratio, the actor network generates two positive parameters through a softplus activation, and
is sampled from a Beta distribution:
where
and
are the distribution parameters produced by the actor network. During training,
is sampled from the Beta distribution to encourage exploration. During evaluation, the deterministic mean of the distribution is used:
This design allows each sensor node to express a flexible resource demand while ensuring that the requested ratio remains within the feasible interval [0, 1].
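The hybrid sampling rule described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the PyTorch implementation used in the experiments. The function names, the logit input for the offloading decision, the small constant `eps` that keeps the Beta parameters strictly positive, and the 0.5 threshold used for the offloading decision at evaluation time are all assumptions made for this sketch. The transmission-power component of the action is omitted because its distribution is not specified in this section.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus log(1 + exp(x)); the result is non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hybrid_action(offload_logit, raw_alpha, raw_beta,
                         training=True, rng=random, eps=1e-6):
    """Return (offload, ratio) from hypothetical raw actor outputs.

    offload is a Bernoulli sample; ratio is a Beta(alpha, beta) sample
    during training and the Beta mean alpha / (alpha + beta) at evaluation.
    """
    p_offload = sigmoid(offload_logit)
    # Softplus maps the raw outputs to positive Beta parameters (eps is assumed).
    alpha = softplus(raw_alpha) + eps
    beta = softplus(raw_beta) + eps
    if training:
        offload = 1 if rng.random() < p_offload else 0
        ratio = rng.betavariate(alpha, beta)
    else:
        # Assumption: threshold the Bernoulli probability at evaluation time.
        offload = 1 if p_offload >= 0.5 else 0
        ratio = alpha / (alpha + beta)
    return offload, ratio
```

Because the Beta distribution is supported on the unit interval, no additional clipping of the requested ratio is needed in either mode.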
Before ADMM-based coordination, each AP first collects the offloading decisions and requested resource ratios from its associated sensor nodes. For AP
m, the raw requested edge-computing demand of sensor node
n is calculated as
When the aggregate requested demand does not exceed the AP capacity, the requests are preserved. When the aggregate requested demand exceeds the available AP capacity, the AP constructs a feasible initial allocation by proportional normalization:
The normalized allocation is then used as the feasible initialization for the subsequent ADMM-based resource-allocation procedure.
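As a rough illustration of this pre-coordination step, the following sketch computes a feasible initial allocation at a single AP. It assumes that the raw demand of a node equals its binary offloading decision times its requested ratio times the AP capacity; this product form, as well as the function and argument names, is an assumption of the sketch rather than a statement of the exact expression used in the paper.

```python
def initial_allocation(offload, ratios, capacity):
    """Build a feasible initial edge-resource allocation at one AP.

    offload:  list of 0/1 offloading decisions of the associated nodes
    ratios:   list of requested edge-resource ratios in [0, 1]
    capacity: available computing capacity of the AP (e.g., cycles/s)
    """
    # Assumed raw demand: decision * requested ratio * AP capacity.
    raw = [o * r * capacity for o, r in zip(offload, ratios)]
    total = sum(raw)
    if total <= capacity:
        # Aggregate demand fits: keep the requests unchanged.
        return raw
    # Otherwise scale every request by the same factor so the sum equals capacity.
    scale = capacity / total
    return [scale * d for d in raw]
```

Proportional scaling preserves the relative sizes of the requests, so a node that asked for twice as much as another still receives twice as much in the initial point.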
PPO-Clip is adopted to improve training stability in this multi-agent environment [10]. For each sensor agent, both the actor and critic are implemented as two-layer multilayer perceptrons with ReLU activation functions. The actor network outputs the parameters of the hybrid action distribution, while the critic network estimates the scalar state-value function.
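For readers unfamiliar with PPO-Clip, the standard clipped surrogate objective can be written compactly as below. This is a generic sketch of the well-known PPO-Clip objective, not code from the evaluated implementation; the function name is hypothetical, and the default clip ratio of 0.2 is a commonly used value assumed here for illustration only.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates for the same samples
    clip_eps:   clipping range (assumed value, not taken from the paper)
    """
    terms = []
    for r, adv in zip(ratios, advantages):
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(r * adv, clipped * adv))
    return sum(terms) / len(terms)
```

Taking the minimum removes any incentive to push the probability ratio outside the clipped interval, which is what limits the size of each policy update.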
The reward function is constructed to reflect sensing-specific objectives:
where
is a positive reward associated with preserving residual energy and
is the indicator function. This reward explicitly discourages strategies that reduce short-term delay at the expense of sensor lifetime.
4.2. ADMM-Based Edge-Resource Coordination
Given the requested offloading loads from all agents, the cloud layer coordinates edge-computing allocations by solving
subject to the AP-capacity constraints. For AP
m, define
and
. The feasible resource-allocation set is given by
To solve the above quadratic allocation problem, we introduce an auxiliary variable
for each AP and rewrite the problem as
where
is the indicator function of the feasible set
. Using the scaled dual variable
, the ADMM updates at iteration
k are given by
where
denotes the Euclidean projection onto
. For a vector
, this projection is computed as
where
is applied element-wise. If
, then
; otherwise,
is chosen such that
A fixed penalty parameter is used for ADMM, and its value is set to
in all simulations. The primal and dual residuals are defined as
The ADMM iterations terminate when
or when the maximum number of ADMM iterations is reached. In the simulations,
and the maximum number of ADMM iterations is set to 50. The ADMM resource-allocation procedure is executed once in each decision slot after the sensor agents generate their offloading and resource-request actions.
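The update and projection steps described above can be illustrated with the following single-AP sketch. It assumes a least-squares objective that keeps the allocation close to the requested demand, a feasible set consisting of per-node upper bounds plus a total-capacity constraint, and a bisection search for the projection threshold. The objective form, the function names, and the default values of `rho` and `eps` are assumptions of this sketch; only the iteration cap of 50 is taken from the text.

```python
import math


def project_capped(v, upper, capacity, iters=60):
    """Euclidean projection onto {z : 0 <= z_i <= upper_i, sum(z) <= capacity}."""
    def clip_shift(tau):
        return [min(max(vi - tau, 0.0), ui) for vi, ui in zip(v, upper)]

    z = clip_shift(0.0)
    if sum(z) <= capacity:
        return z  # capacity constraint inactive: threshold is zero
    # Otherwise find the threshold tau by bisection so the sum meets the capacity.
    lo, hi = 0.0, max(v)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(clip_shift(mid)) > capacity:
            lo = mid
        else:
            hi = mid
    return clip_shift(hi)


def admm_allocate(demand, upper, capacity, rho=1.0, eps=1e-4, max_iter=50):
    """Scaled-form ADMM for min 0.5 * ||x - demand||^2 subject to x in the feasible set."""
    z = project_capped(demand, upper, capacity)
    u = [0.0] * len(demand)
    for k in range(1, max_iter + 1):
        # x-update: closed-form minimizer of the quadratic term plus the penalty.
        x = [(d + rho * (zi - ui)) / (1.0 + rho) for d, zi, ui in zip(demand, z, u)]
        z_old = z
        # z-update: projection of x + u onto the feasible set.
        z = project_capped([xi + ui for xi, ui in zip(x, u)], upper, capacity)
        # Scaled dual update.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
        primal = math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))
        dual = rho * math.sqrt(sum((zi - zo) ** 2 for zi, zo in zip(z, z_old)))
        if primal <= eps and dual <= eps:
            break
    return z, k
```

Returning the projected variable `z` rather than `x` means the reported allocation satisfies the capacity constraints even when the loop stops at the iteration cap.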
4.3. Integrated Operation
The complete optimization procedure operates as follows:
(1) Each sensor node senses its local state and generates an action through the PPO-based policy network.
(2) The proposed action is projected onto the feasible region defined by power and energy constraints.
(3) APs aggregate offloading requests and forward resource demands to the cloud coordinator.
(4) The cloud solves the ADMM resource-allocation subproblem and returns edge-resource decisions.
(5) Sensor nodes receive feedback, update task execution, observe rewards, and continue policy learning.
4.4. Complexity and Convergence Analysis
The proposed MARL–ADMM framework consists of two computational components: PPO-based policy learning at the sensor-agent side and ADMM-based edge-resource coordination at the cloud or AP coordination layer. This subsection discusses the computational complexity and convergence properties of these two components.
For the PPO component, the training complexity depends on the number of sensor agents, the number of sampled transitions, the number of PPO update epochs, and the size of the actor–critic networks. Let
N denote the number of sensor nodes,
B the batch size,
the number of PPO update epochs per training iteration, and
the computational cost of one forward–backward update of the actor–critic networks. The complexity of one PPO training iteration can be approximated as
If parameter sharing is adopted among homogeneous sensor agents, the number of trainable network parameters does not increase linearly with N, although the sampling and inference cost still increases with the number of active sensor nodes.
During online execution, each sensor node only performs a forward pass through the trained actor network. Let
denote the computational cost of one actor–network inference. The online policy-inference cost of all sensor nodes in one decision slot is therefore
This is substantially lighter than solving the original mixed discrete–continuous optimization problem at each sensor node and is therefore more suitable for energy-constrained WSN deployment.
For the ADMM component, the resource-allocation subproblem is executed after sensor agents generate their offloading and resource-request actions. For AP
m, let
denote the number of associated sensor nodes and
denote the maximum number of ADMM iterations. Since the edge-resource allocation problem has a quadratic objective and linear AP-capacity constraints, the main computational cost in each ADMM iteration comes from the primal update and the projection onto the feasible AP-resource set. Therefore, the complexity of ADMM coordination at AP
m can be approximated as
where the logarithmic term is associated with projection onto the box-constrained simplex. Across all APs, the total coordination complexity is
Regarding convergence, the PPO-based MARL component is a stochastic policy-gradient method applied to a non-convex multi-agent learning problem. Therefore, the global optimality of the overall long-term stochastic control problem cannot be theoretically guaranteed. Nevertheless, PPO-Clip improves empirical training stability by restricting the policy update ratio within a clipped interval, which prevents excessively large policy updates during learning.
In contrast, the ADMM-based edge-resource allocation subproblem is convex under the quadratic objective and linear AP-capacity constraints. Specifically, for a fixed set of PPO-generated resource requests, the per-slot resource-allocation problem minimizes a convex quadratic function over a closed and convex feasible set. Under standard ADMM assumptions, including convexity, closedness, and nonempty feasibility of the constraint set, the ADMM iterations converge to a primal-dual optimal solution of the per-slot resource-allocation subproblem. Therefore, although the full MARL–ADMM framework does not guarantee global optimality for the original non-convex long-term decision problem, the ADMM layer guarantees feasibility and convergence for the edge-resource coordination problem in each decision slot.
5. Results
5.1. Experimental Setup
The proposed sensor-aware MEC framework is evaluated through extensive simulations using a hybrid simulation platform. OMNeT++ is employed for network-level simulation, while Python 3.8 with PyTorch 1.13 is used to implement the proposed MARL–ADMM algorithm. The test environment utilizes an NVIDIA GeForce RTX 3080 GPU and a 12-core Intel Xeon E5-2680 v4 CPU @ 2.4 GHz with 96 GB RAM.
Table 2 summarizes the key simulation parameters derived from the system model in Section 3.
The simulation scenario considers an edge-assisted wireless sensor network deployed in a m2 monitoring area with 3 CPUs positioned at fixed locations. Sensor nodes randomly distributed in the sensing field periodically upload sensed data and may additionally generate event-triggered sensing tasks under abnormal conditions. The average sensing-task arrival rate is set to tasks/s/node. Each task requires [100, 200] KB data transmission and [1000, 2000] cycles/bit computation density. In this way, the simulation emulates practical sensing scenarios such as environmental monitoring, industrial condition sensing, and anomaly-aware edge intelligence, where both sensing data upload and downlink feedback are involved in the task execution process.
5.2. Comparison Schemes
To evaluate the effectiveness of the proposed optimization scheme, the following benchmark methods are implemented for comparative analysis:
Clipped Objective Policy Optimization (CBO) [17]: A clipped-policy-based learning benchmark adopted for comparison with the proposed method under similar policy-constrained optimization logic.
Trust Region Policy Optimization (TRPO) [27]: TRPO iteratively updates a stochastic policy by maximizing a surrogate objective subject to a KL-divergence constraint, ensuring stable policy improvement.
Maximum a Posteriori Policy Optimization (MPO) [28]: MPO optimizes policies using a probabilistic update mechanism that balances exploration and exploitation according to observed rewards.
Local Processing (LP): All sensing tasks are processed locally at the sensor nodes, thus avoiding communication overhead while suffering from limited on-device computation capability.
All Offloading (AO): All sensing tasks are offloaded to the edge/cloud side, which may reduce local processing burden but often causes transmission congestion and resource mismatch under dense deployments.
Sensing-Aware Offloading (SAO) [5]: This scheme performs collaborative sensing-aware task offloading by jointly considering sensing-task processing and communication-computing resource allocation. It can effectively improve the efficiency of sensing-oriented task execution, but its adaptability may be limited under highly dynamic network conditions.
Multi-Agent Proximal Policy Optimization (MAPPO) [
32]: MAPPO is an on-policy multi-agent reinforcement learning method with centralized training and decentralized execution and is adopted as a representative cooperative MARL baseline for dynamic task offloading and resource allocation. The actor and critic networks use the same MLP architecture (128–64, ReLU) as the proposed method, with a learning rate of
, discount factor
, clip ratio
, GAE
, and batch size 256.
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [
33]: MADDPG is an off-policy actor–critic MARL algorithm designed for continuous multi-agent decision-making and is used to evaluate the proposed method against a deterministic policy-gradient-based baseline. The actor network uses an MLP with 128–64 hidden units (ReLU), and the critic network uses 256–128 hidden units (ReLU). The learning rate is
, with a replay buffer of size
, batch size 128, and OU exploration noise with
.
Graph Reinforcement Learning (Graph RL) [
34]: Graph RL combines graph-based representation learning with reinforcement learning to capture task or network dependencies and is selected as a topology-aware learning baseline for task offloading. It uses a GCN with two graph convolution layers (hidden dimension 64) followed by an MLP (64–32, ReLU). The learning rate is
, and the discount factor is
.
Federated Reinforcement Learning (Federated RL) [
35]: Federated RL integrates federated learning with reinforcement learning to support distributed policy learning without sharing raw local data and is used as a privacy-preserving distributed learning baseline. PPO training is distributed across sensor nodes, with FedAvg aggregation every 20 episodes and a participation ratio of
. The local PPO hyperparameters are identical to those of the proposed method.
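To make the aggregation step of the Federated RL baseline concrete, the sketch below averages the policy parameters of a randomly selected subset of sensor nodes. It is a simplified illustration under several assumptions: parameters are represented as flat lists of floats, all selected clients are weighted equally, and the default participation ratio of 0.5 is a placeholder rather than the value used in the experiments.

```python
import random


def fedavg(client_params, participation=0.5, rng=random):
    """Average the parameters of a random subset of clients.

    client_params: list of equal-length lists of floats, one per sensor node
    participation: fraction of clients aggregated per round (assumed value)
    """
    n = len(client_params)
    # At least one client always participates.
    k = max(1, round(participation * n))
    chosen = rng.sample(range(n), k)
    dim = len(client_params[0])
    # Equal-weight average (assumption: no weighting by local sample count).
    return [sum(client_params[i][j] for i in chosen) / k for j in range(dim)]
```

In the baseline described above, such an aggregation would be invoked once every 20 episodes, after which the averaged parameters are sent back to the participating nodes.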
5.3. Performance Evaluation
To evaluate the reproducibility and statistical stability of the experimental results, all compared methods were independently executed over five random seeds (28, 42, 1024, 2025, and 3407) under the same simulation configuration. For a given seed, all baselines were trained and evaluated under identical simulation settings, including the same task-arrival process, channel conditions, and sensor–node deployment, differing only in the offloading and power-control algorithm. Across seeds, the neural-network initialization, task-arrival process, wireless channel condition, and sensor–node deployment were randomly regenerated. Therefore, the learning curves in Figure 2, Figure 3 and Figure 4 represent the averaged results over independent runs rather than a single deterministic trial.
Table 3 further reports the statistical results of delay, power consumption, and cumulative reward, including the mean value, standard deviation, 95% confidence interval, and p-value compared with the proposed MARL–ADMM framework. The p-values are computed using a two-tailed independent two-sample t-test that compares each baseline method against the proposed MARL–ADMM framework under the null hypothesis that the two methods have equal means. For delay and power consumption, lower values indicate better performance, while for cumulative reward, higher values are preferred.
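The test described above can be reproduced with the standard library alone, as in the following sketch. It assumes the equal-variance (pooled) form of the two-sample t-test, since the text does not state whether the pooled or Welch variant was used, and it obtains the two-tailed p-value by numerically integrating the Student t density with Simpson's rule. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(t, df):
    """Probability density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-tailed p-value via Simpson integration of the density on [0, |t|]."""
    a = abs(t_stat)
    if a == 0.0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(x, y):
    """Return (t statistic, degrees of freedom, two-tailed p-value)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance (assumption: both methods share a common variance).
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t_stat = (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / nx + 1.0 / ny))
    return t_stat, df, two_sided_p(t_stat, df)
```

With five seeds per method, each comparison has eight degrees of freedom, so the per-seed metric values of a baseline and of the proposed method would be passed as the two five-element lists.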
The statistical results in Table 3 provide quantitative evidence for the stability and reproducibility of the proposed method. The detailed performance comparisons are discussed in the following subsections according to task delay, power consumption, and cumulative reward, respectively.
In addition, to complement the theoretical convergence discussion, we measure the empirical convergence behavior of the ADMM component used for edge-resource coordination. The ADMM layer achieves convergence within iterations on average (maximum 25 iterations) across all evaluation time slots. The average computation time for ADMM coordination per time slot is 4.2 ms. Across all 50 APs, the total coordination time remains below 12 ms per slot, which is well within typical sensing-task latency deadlines (100–500 ms in the evaluated scenario).
5.3.1. Task Delay
As illustrated by the sensing-task delay comparison in Figure 2, the MARL–ADMM framework minimizes latency while exhibiting stable training convergence. The statistical evidence underscores its capacity to buffer dynamic workloads and coordinate edge resources effectively. Crucially, this latency reduction does not come at the expense of other metrics; instead, the framework maintains a co-optimized balance across power consumption and cumulative reward.
The MARL agents adaptively determine offloading and power-control strategies according to workload variations and channel conditions, while the ADMM-based coordination layer efficiently allocates edge-side resources among APs. Consequently, the proposed framework effectively reduces queue accumulation and alleviates resource mismatch under dynamic sensing scenarios, leading to more stable and efficient delay performance.
5.3.2. Power Consumption
Figure 3 confirms that the MARL–ADMM framework yields the lowest average power consumption. This efficiency stems from its adaptive offloading and power-control policies, which eliminate superfluous transmission overhead and local computational redundancy.
The improvement is mainly attributed to the interaction between local policy learning and ADMM-based resource coordination. The MARL agents adjust their actions according to workload and channel states, while the ADMM layer mitigates resource conflicts among APs. As a result, the proposed method avoids both excessive local computation and inefficient full offloading, leading to more energy-efficient sensing-task execution.
5.3.3. Reward
Figure 4 shows that the proposed MARL–ADMM framework achieves the highest cumulative reward. Since the reward jointly reflects task delay, power consumption, and queue stability, this result demonstrates that the proposed method optimizes the overall system objective rather than a single performance metric.
The reward gain confirms the benefit of combining MARL-based local decision-making with ADMM-based global coordination. While some baselines may perform well on one metric, they usually sacrifice energy efficiency or long-term reward. In contrast, the proposed framework achieves a better balance among latency, energy consumption, and sensing-task execution efficiency.
5.4. Ablation Study
To further validate the effectiveness of each design component in the proposed sensor-aware MARL–ADMM framework, we conduct an ablation study in the context of edge-assisted wireless sensor networks. In practical sensing applications, the end-to-end performance depends not only on task offloading itself but also on coordinated resource allocation, bidirectional transmission modeling, and task-aware reward design. Therefore, we consider the following ablated variants: (1) removing the ADMM-based coordination module, (2) removing downlink transmission modeling, (3) removing the queue-aware reward term, and (4) removing the task-priority mechanism. The corresponding results are summarized in Table 4.
Table 4 demonstrates that the complete framework consistently outperforms all ablated variants across all metrics. Notably, omitting the ADMM-based coordination module triggers the most severe performance degradation, with a 26.0% increase in delay and a 13.0% increase in power consumption. This highlights that global coordination among APs and cloud processors is indispensable in dense network deployments: without it, uncoordinated local decisions made by agents are more likely to exacerbate channel contention and to cause queue accumulation and resource mismatch at the edge layer.
Similarly, excluding downlink transmission modeling also leads to a noticeable increase in end-to-end delay of 19.1% and a 9.0% rise in power consumption. This result is especially important from the perspective of practical sensing applications because sensing systems usually involve not only uplink data reporting but also downlink feedback, control instructions, acknowledgment signaling, or lightweight model updates. Ignoring the downlink part causes the framework to underestimate the actual service latency of sensing tasks and may produce overly optimistic offloading decisions.
When the queue-aware reward term is removed, performance deteriorates, with a 12.8% increase in delay and a 5.7% increase in power consumption. This demonstrates that queue-state information plays a key role in stabilizing the sensing workload across APs, especially when event-triggered sensor traffic bursts occur. In addition, removing task-priority awareness also weakens the overall performance, resulting in a 7.1% delay increase. Since sensing applications often contain heterogeneous tasks with different urgency levels, such as routine environmental monitoring and emergency anomaly reporting, task-priority modeling is beneficial for protecting delay-sensitive sensing information.
Overall, the ablation study confirms that the performance improvement of the proposed framework does not arise from a single module. Instead, it is the result of the joint effect of hierarchical coordination, bidirectional sensing-communication modeling, and sensor-aware reward shaping, with ADMM-based global coordination and downlink modeling being the most critical contributors.
5.5. Sensitivity Analysis
To further examine the adaptability of the proposed framework to different wireless sensing environments, we conduct a sensitivity analysis from three aspects: the number of sensor nodes, the sensing-task arrival rate, and the reward weights. These factors are highly relevant to practical sensing scenarios, where node density, sensing frequency, and system objectives may vary significantly across application domains such as industrial monitoring, intelligent transportation, and environmental surveillance.
5.5.1. Impact of the Number of Sensor Nodes
We first investigate the impact of sensor–node density on the proposed framework. Table 5 reports the performance under different numbers of sensor nodes.
It can be seen from Table 5 that as the number of sensor nodes increases, the average delay and power consumption gradually rise, while the cumulative reward decreases. Notably, the tightly bounded standard deviations across scales confirm the framework’s operational consistency. Although dense deployments intensify intra-cell interference and computational contention, the resultant performance degradation remains marginal. This scalability validates the framework’s viability for high-density WSNs characterized by concurrent, large-scale data generation.
5.5.2. Impact of Task-Arrival Rate
We then evaluate the influence of the sensing-task arrival rate. In practical sensor systems, task arrivals may vary from periodic low-rate reporting to bursty event-driven uploads. Table 6 summarizes the corresponding results.
As the task-arrival rate increases, both delay and power consumption rise, whereas the cumulative reward decreases. This indicates that heavier sensing workloads lead to higher communication contention and more severe queue accumulation at edge servers. Nevertheless, the proposed framework still exhibits relatively stable performance under moderate and high traffic loads, and the low variance within each workload tier indicates that it tracks dynamic traffic changes consistently across runs. This ability to adapt to dynamic sensing workloads is particularly important for sensing scenarios involving event-triggered reporting, where sudden bursts of sensing traffic may occur due to detected abnormalities or environmental changes.
5.5.3. Impact of Reward Weights
Finally, we analyze the sensitivity of the proposed framework to different reward weights associated with delay and energy consumption. Since different sensing applications may have different operational priorities, such as timeliness in emergency detection or energy efficiency in long-term environmental monitoring, this analysis is necessary to demonstrate the flexibility of the proposed framework. The results are shown in Table 7.
It can be observed that when a larger weight is assigned to delay, the framework tends to adopt more aggressive offloading and transmission decisions, thereby achieving lower latency at the cost of slightly higher power consumption. In contrast, when energy is emphasized, the framework becomes more conservative in transmission and offloading, which reduces power usage but leads to a moderate increase in delay. The small and stable standard deviations across all reward-weight settings indicate that these trade-off results are repeatable. Together, these results demonstrate that the proposed sensor-aware MARL–ADMM framework can flexibly support different sensing applications with different service requirements. Therefore, the framework is applicable not only to low-latency sensor systems but also to energy-constrained monitoring scenarios requiring long-term and sustainable operation.
5.6. Discussion and Limitations
The experimental evaluations across different traffic scenarios and node scales indicate that the proposed MARL–ADMM framework is both effective and stable. Taken together, the baseline comparisons and the statistical stability analyses point to a complementary relationship between the two components. While pure deep reinforcement learning baselines frequently suffer from severe resource violations and training instability under tight multi-access constraints, coupling local MARL inference with the edge ADMM consensus layer helps bound the network-wide delay and power consumption. The consistently higher total reward suggests that introducing mathematical decomposition into multi-agent decision-making can effectively prevent resource mismatch and queue congestion under the considered simulation conditions.
From the perspective of simulation-based computational analysis, the computational footprint of the framework appears well balanced within the evaluated scenario. On the client side, the computational overhead is lightweight. Each sensor node executes only the decentralized forward inference of its local actor network, which is configured as a basic multilayer perceptron (MLP). The per-step computational complexity is determined by the hidden-layer dimensions and requires only standard matrix-vector multiplications, which low-power commercial microcontrollers can handle within a few milliseconds with negligible power consumption. On the edge side, the computational complexity is governed by the centralized ADMM resource coordination loop. Since the dual updates are linear and the primal allocation subproblems admit analytical closed-form solutions, the coordination layer converges reliably within a deterministic maximum iteration bound, with a practical complexity that depends on the number of sensors N and access points M. Hosted on dedicated edge servers with ample processing capability, this coordination achieves real-time scheduling responsiveness without draining the limited batteries of the sensor nodes.
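The structure of such a coordination loop, with closed-form primal updates, a linear dual update, and a fixed iteration cap, can be sketched on a toy problem. The example below is not the allocation problem solved in this paper: it assumes a hypothetical quadratic cost per sensor and a single shared capacity constraint, and the names admm_allocate, a, d, capacity, rho, and iters are introduced only for illustration.

```python
def admm_allocate(a, d, capacity, rho=1.0, iters=300):
    """Toy ADMM: minimize sum_i 0.5*a[i]*(x[i]-d[i])**2 subject to sum(x) == capacity."""
    n = len(a)
    z = [capacity / n] * n  # consensus copy that satisfies the capacity constraint
    u = [0.0] * n           # scaled dual variables
    x = list(z)
    for _ in range(iters):  # fixed iteration cap, so the run time is deterministic
        # Primal x-update: closed-form minimizer of each local quadratic subproblem.
        x = [(a[i] * d[i] + rho * (z[i] - u[i])) / (a[i] + rho) for i in range(n)]
        # Primal z-update: closed-form projection onto {z : sum(z) == capacity}.
        v = [x[i] + u[i] for i in range(n)]
        shift = (sum(v) - capacity) / n
        z = [vi - shift for vi in v]
        # Dual update: linear in the constraint residual x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return x

def closed_form_optimum(a, d, capacity):
    """Reference solution of the same toy problem via its Lagrange multiplier."""
    lam = (sum(d) - capacity) / sum(1.0 / ai for ai in a)
    return [d[i] - lam / a[i] for i in range(len(a))]

x = admm_allocate([1.0, 2.0, 4.0], [4.0, 3.0, 5.0], capacity=6.0)
print([round(xi, 4) for xi in x])
```

In this toy setting, each iteration costs a number of operations linear in the number of sensors, so the total cost is the iteration cap times the problem size. The actual subproblems and constants of the proposed framework may differ.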
To further quantify the practical feasibility, we report the measured execution-time statistics under the baseline configuration. For offline training, the MARL component requires approximately 8.5 h. The training is performed on the edge server side and is a one-time cost before deployment. For online execution, the per-sensor decision time (actor-network forward pass) averages 0.35 ms per decision slot on the sensor side, which is negligible compared with typical sensing-task processing delays. These statistics indicate that the proposed framework is computationally feasible for real-time deployment in resource-constrained edge-assisted WSNs.
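A per-decision timing of this kind can be obtained with a simple measurement loop. The following is a minimal sketch that assumes a small fully connected actor with ReLU hidden layers; the layer sizes, the random weights, and the helper names are hypothetical, and a time measured on a development host is not comparable to one measured on a microcontroller.

```python
import random
import time

def make_mlp(sizes, seed=0):
    """Build random dense layers for the given layer widths (hypothetical weights)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, x):
    """Forward pass: one matrix-vector product per layer, ReLU on hidden layers only."""
    for idx, (weights, bias) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def mac_count(sizes):
    """Multiply-accumulate operations per forward pass of a dense MLP."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def mean_forward_time_ms(layers, x, repeats=200):
    """Average wall-clock time of one forward pass, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        forward(layers, x)
    return (time.perf_counter() - start) * 1e3 / repeats

sizes = [8, 64, 64, 3]  # hypothetical: 8 state features, two hidden layers, 3 outputs
actor = make_mlp(sizes)
state = [0.5] * sizes[0]
print(mac_count(sizes), "MACs per decision")
print(f"{mean_forward_time_ms(actor, state):.3f} ms per decision on this host")
```

For a dense MLP, the operation count is the sum of the products of consecutive layer widths, which is why the hidden-layer dimensions dominate the per-step cost.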
Despite these advantages, several specific limitations of the current framework should be noted. First, the current model assumes relatively accurate system state information, whereas practical sensing environments inevitably involve estimation errors and dynamic reporting delays, which may lead to sub-optimal ADMM convergence. Moreover, the scalability of the MARL-based approach may be affected as the number of sensor nodes increases significantly, because the expanding joint state-action space increases the offline training time and memory footprint on the edge servers during the centralized training phase. Lastly, the framework adopts simplified sensing-task representations and does not explicitly incorporate domain-specific sensing-quality metrics such as data fidelity or information freshness.
6. Conclusions
This paper proposes a sensor-aware joint optimization framework for edge-assisted wireless sensor networks. The framework models devices as sensing nodes with heterogeneous task priorities, limited battery energy, bidirectional communication needs, and mission-oriented delivery requirements. Based on this framework, a MARL–ADMM solution is developed to jointly optimize sensing-task offloading, transmission power, and coordinated edge-resource allocation.
By embedding sensing semantics into system modeling, optimization objectives, and performance evaluation, the proposed framework provides a more realistic abstraction for intelligent sensor networks than conventional uplink-only MEC formulations. The simulation results demonstrate that the proposed method significantly reduces task delay, improves energy efficiency, and enhances the sustainability of sensing-network operation compared with several representative baselines in the evaluated scenario, indicating its potential for practical applications such as environmental monitoring, industrial sensing, intelligent inspection, and infrastructure surveillance.
Future work may incorporate explicit sensing-quality models, such as coverage fidelity, event-detection accuracy, or information freshness, and extend the current framework to mobile sensors, UAV-assisted relays, mobile sinks, and integrated sensing, communication, and computing scenarios. Although the proposed framework has been evaluated under different sensor-network scales and dynamic workload settings, the current validation remains simulation-based. Future work will further examine the proposed MARL–ADMM framework using real sensing datasets, embedded edge devices, and hardware testbeds, so as to assess the impact of practical factors such as hardware heterogeneity, protocol overhead, packet retransmission, and environmental interference.