1. Introduction
With the rapid development of the Industrial Internet of Things (IIoT) and embodied intelligence technologies, intelligent warehousing systems are evolving from conventional equipment automation toward the integration of perception, computation, decision making, and execution [1]. In this context, automated guided vehicles (AGVs) have become key mobile execution units that connect physical warehouse operations with information-driven decision-making systems [2,3]. Unlike conventional AGVs that mainly undertake material-handling tasks, AGVs in intelligent warehouses are generally equipped with cameras, radio-frequency identification (RFID) readers, and various onboard sensors. During inbound, outbound, inventory checking, inspection, and service-area operations, they can continuously collect information on cargo status, inventory state, traffic conditions, and equipment operating states, and further generate computation tasks such as cargo-status recognition, inventory-state monitoring, obstacle detection, anomaly identification, and operation control. These tasks are typically characterized by large data volumes, computation-intensive workloads, and stringent latency requirements. If the processing results cannot be returned in time, warehouse operation efficiency may degrade, task execution may be blocked, and operational safety may even be threatened. Therefore, an AGV-assisted intelligent warehousing system not only requires efficient mobility scheduling and task allocation mechanisms, but also needs a communication and computing architecture capable of supporting near-real-time data processing.
However, constrained by onboard computing capability, battery capacity, and vehicle-mounted resources, AGVs can hardly satisfy the real-time requirements of complex perception and control tasks through purely local computation. Offloading all tasks to remote cloud servers may also incur excessive transmission latency, backhaul burden, and service-latency fluctuations. By deploying computing resources close to the wireless access side, mobile edge computing (MEC) can alleviate the tension between limited terminal resources and long-distance cloud transmission, while cloud servers can still provide strong processing capability for computation-intensive or less latency-sensitive tasks [4]. Therefore, cloud-edge-end collaborative computing offers a feasible paradigm for AGV task processing in intelligent warehousing [5].
Nevertheless, in multi-AGV warehousing scenarios, task offloading decisions are tightly coupled with AGV service-point selection. On the one hand, the selected service point affects wireless channel quality, associated wireless access point (WAP)/MEC resources, movement delay, and mobility energy consumption, and may further cause service-point conflicts among multiple AGVs. On the other hand, the offloading mode in each time slot directly determines task upload latency, local/edge/cloud computation latency, and AGV-side energy consumption. Since AGV service-point migration usually occurs at a slower operational-stage time scale, whereas task generation and offloading decisions are made at a faster slot-level time scale, separate optimization may lead to myopic or energy-inefficient strategies. Therefore, for multi-AGV intelligent warehousing, it is necessary to jointly optimize slow-time-scale service-point migration and fast-time-scale cloud-edge-end task offloading, so as to reduce long-term accumulated system latency while satisfying task latency constraints and AGV energy budgets.
Existing studies have investigated AGV path planning, cloud-edge-end task offloading, MEC scheduling, and learning-based resource optimization from different perspectives. However, most of them focus on mobility scheduling, computation offloading, or resource allocation separately. The coupling between stage-wise AGV service-point migration and slot-level task offloading in multi-AGV intelligent warehousing has not been sufficiently investigated. A detailed discussion of related studies is provided in Section 2.
Motivated by the above limitations, this paper investigates the dual-time-scale joint optimization of AGV service-point migration and task offloading under a cloud-edge-end collaborative computing architecture for multi-AGV intelligent warehousing in embodied-intelligence IIoT. Unlike existing studies that mainly focus on fixed-route offloading or single-time-scale resource scheduling, this work jointly considers the coupling among stage-wise AGV service-point migration, wireless access-state variations, edge/cloud computing resources, local computing capability, and AGV energy constraints. The objective is to minimize the long-term accumulated system delay by jointly optimizing slow-time-scale service-point migration and fast-time-scale multi-type task offloading. The main contributions are summarized as follows:
(1) A cloud-edge-end collaborative computing model is established for multi-AGV intelligent warehousing, and a dual-time-scale joint optimization problem of service-point migration and task offloading is formulated. The model captures stage-wise AGV migration among candidate service points, task generation during service periods, wireless transmission, local/edge/cloud computation, and AGV-side energy consumption, including movement, uplink transmission, and local computing energy. At the slow time scale, each AGV selects its next service point under candidate-point, maximum movement distance, and multi-AGV service-conflict constraints. At the fast time scale, each AGV makes offloading decisions among local, edge, and cloud execution according to task attributes, communication states, MEC resources, and residual energy. The problem minimizes the long-term accumulated system delay subject to task latency and AGV energy-budget constraints, thereby characterizing the coupling between mobility decisions and computation offloading in multi-AGV intelligent warehousing.
(2) A DPSO–MAPPO dual-time-scale solution algorithm is proposed. For the discrete combinatorial structure of service-point selection at the slow time scale, discrete particle swarm optimization (DPSO) is employed to search feasible migration plans, with feasibility correction for movement-distance and service-conflict constraints. For fast-time-scale multi-AGV cooperative offloading, multi-agent proximal policy optimization (MAPPO) is adopted with a centralized-training and decentralized-execution mechanism to learn offloading policies under decentralized observations. By feeding the delay and energy information obtained from MAPPO into the DPSO fitness evaluation, the proposed algorithm realizes coordinated optimization between service-point planning and task offloading.
(3) Extensive numerical simulations are conducted to verify the effectiveness of the proposed DPSO–MAPPO algorithm. The evaluation covers convergence behavior, system scale, time-scale parameter, MAPPO training parameters, DPSO population size, number of AGVs, number of candidate service points, and task data size, with comparisons against Random + MAPPO, DPSO + Dueling DQN, and Random + Greedy benchmarks. Results show that the proposed method converges stably and generates effective AGV service-point migration trajectories and cloud-edge-end offloading strategies. In a typical setting, it reduces system delay by 13.55% over the benchmarks and achieves better total energy consumption and energy-violation control.
The remainder of this paper is organized as follows. Section 2 reviews related studies on AGV scheduling, MEC scheduling, cloud-edge-end task offloading, and learning-based optimization methods. Section 3 presents the system model and problem formulation. Section 4 introduces the proposed DPSO–MAPPO dual-time-scale joint solution algorithm. Section 5 provides numerical simulations and analyzes the convergence performance and optimization effectiveness of the proposed algorithm. Finally, Section 6 concludes this paper.
3. System Model and Problem Formulation
3.1. System Architecture
This paper considers an embodied-intelligence industrial Internet of Things (IIoT) intelligent warehousing scenario and constructs a cloud–edge–end collaborative computing framework, as shown in Figure 1. The system consists of multiple AGVs that execute environmental perception and operation-control tasks, multiple MEC servers providing edge computing resources, and a cloud server [23,24]. Each AGV is equipped with cameras and other onboard sensors to continuously acquire environmental information and operational states during warehouse operations, and generates corresponding computation tasks, such as cargo-status perception, inventory-status monitoring, and anomaly detection in service areas [25]. Since such tasks typically feature large data volumes, computation-intensive workloads, and stringent delay requirements, relying solely on cloud processing may incur substantial data-return overhead and unstable latency. Therefore, MEC servers are deployed at the access side to provide near-end computing support. Let the set of AGVs be
, the set of MEC servers be
, and denote the cloud server by
. AGVs access the network via wireless access points (WAPs), and task data can be offloaded through WAPs to either an edge-side MEC server or the cloud server for processing.
To capture the different decision frequencies between the stage-wise service-point migration of AGVs and the slot-level task processing, we adopt a two-time-scale modeling approach. Specifically, the system operation is divided into discrete small time-scale slots, with the slot set denoted by and each slot having duration . Meanwhile, every H consecutive small time-scale slots form one large time-scale decision interval, i.e., the n-th large time-scale decision is triggered when , where . At the large time scale, each AGV determines its next service point; between two adjacent large time-scale decision instants, the AGV stays within the current service area to process tasks, and in each small time-scale slot it generates tasks and makes the corresponding offloading decisions. Let the set of task types be , where k is the task-type index. The task of type k generated by AGV i in slot t is denoted by . For each task type k, represents a slot-level aggregated computation task generated by AGV i in slot t, with time-varying workload attributes. The task attributes are denoted by , where is the input data size, is the required CPU cycles per bit, and is the maximum tolerable delay for tasks of type k.
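The relation between small time-scale slots and large time-scale decision intervals can be illustrated with a minimal Python sketch. The exact trigger condition did not survive in the text above, so the sketch assumes 0-based slot indices and a decision at the first slot of every block of H slots; the function names are hypothetical.

```python
def interval_index(t: int, H: int) -> int:
    """Return the 0-based large-time-scale interval that contains slot t."""
    if H <= 0 or t < 0:
        raise ValueError("H must be positive and t non-negative")
    return t // H


def is_decision_slot(t: int, H: int) -> bool:
    """True if a service-point decision is triggered at slot t.

    Assumption: the decision is made at the first slot of each interval.
    """
    return interval_index(t, H) * H == t


def slots_of_interval(n: int, H: int) -> range:
    """Slots during which an AGV stays at the point chosen for interval n."""
    return range(n * H, (n + 1) * H)
```

Under this convention, with H = 4 the slots 0, 4, 8, ... are decision instants, and offloading decisions are made in every slot in between.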
3.2. Mobility Model
We consider AGVs performing embodied-intelligence operations in a warehouse environment. Each AGV migrates between candidate service points in a stage-wise manner, and after arriving at a target service point, it stays for H consecutive small time-scale slots to complete task generation and processing within the corresponding service stage [26]. Let the service-point position of AGV
i at the
n-th large time-scale decision interval be
, and let
denote its set of candidate service points, such that
. Considering that multiple AGVs coexist in the system, selecting the same service point may lead to operation conflicts. Therefore, we impose that the service-point sets of different AGVs do not overlap, i.e.,
, where
denotes the service-point set of AGV
i over the operation horizon.
Accordingly, when AGV
i moves from the current service point
to the next service point
with speed
, the moving latency is defined as
Furthermore, we adopt a constant-traction power model to characterize the mobility energy consumption of AGV
i, and the corresponding moving energy consumption is expressed as
where
is the constant moving power of AGV
i. We assume a sequential service process in which task generation, uploading, and computation are performed only after the AGV arrives at the selected service point; therefore, no task data upload is carried out during migration.
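A minimal Python sketch of the mobility model described above: the moving latency is taken as the Euclidean distance between two service points divided by the AGV speed, and the moving energy as the constant moving power multiplied by that latency. The use of Euclidean distance in the plane, and all names, are assumptions made for illustration.

```python
import math


def moving_latency(src, dst, speed):
    """Time to travel from service point src to dst at constant speed.

    Assumption: straight-line (Euclidean) distance between 2-D points.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return math.dist(src, dst) / speed


def moving_energy(src, dst, speed, power):
    """Constant-traction-power model: energy = moving power * moving latency."""
    return power * moving_latency(src, dst, speed)
```

For example, moving 5 m at 2 m/s with 100 W of traction power takes 2.5 s and consumes 250 J. In a real warehouse the travel distance would follow aisle geometry rather than a straight line.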
3.3. Communication Model
As discussed above, each AGV performs task uploading and processing only at service points; therefore, the communication process at the small time scale takes place within the current service stage. For task , if it is executed locally, at the edge, or at the cloud, the corresponding offloading decision variables are denoted by , , and , respectively.
In a two-dimensional Cartesian coordinate system, we assume that in each time slot the AGV associates with the nearest wireless access point (WAP) [27]. Let the coordinates of the WAP associated with AGV
i in time slot
t be
. Considering distance-dependent propagation and path loss, the wireless channel gain of AGV
i is given by
where
X is the channel-gain constant at a reference distance of 1 m. Accordingly, the uplink transmission rate of AGV
i in time slot
t can be written as
where
B is the uplink bandwidth allocated to the AGV,
is the transmit power of AGV
i, and
is the receiver noise power.
When task
is offloaded for edge execution, the wireless uplink transmission latency is
When task
is offloaded for cloud execution, the task data is first uploaded to the associated WAP via the wireless link and then forwarded to the cloud server through the backhaul link, and thus the uplink transmission latency is expressed as
where
denotes the equivalent backhaul transmission rate from the WAP associated with AGV
i to the cloud server.
Accordingly, the wireless uplink energy consumption at the AGV side can be written as
Since the size of the returned result is typically much smaller than that of the input data, we neglect the transmission latency and reception energy consumption in the result-return stage.
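The communication model can be sketched in Python as follows. The sketch assumes a power-law path loss with a configurable exponent (the value 2 is an assumption, not taken from the text), a Shannon-type rate expression, and a 1 m lower bound on distance so that the gain never exceeds the reference gain X. All function and parameter names are hypothetical.

```python
import math


def channel_gain(agv_xy, wap_xy, gain_ref, path_loss_exp=2.0, min_dist=1.0):
    """Distance-dependent gain: gain_ref (gain at 1 m) divided by d**exponent."""
    d = max(math.dist(agv_xy, wap_xy), min_dist)
    return gain_ref / d ** path_loss_exp


def uplink_rate(bandwidth_hz, tx_power_w, gain, noise_w):
    """Uplink rate in bit/s: B * log2(1 + p * h / sigma^2)."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain / noise_w)


def edge_uplink_latency(bits, rate):
    """Wireless upload time when the task is executed at the edge."""
    return bits / rate


def cloud_uplink_latency(bits, rate, backhaul_rate):
    """Wireless upload to the WAP plus backhaul forwarding to the cloud."""
    return bits / rate + bits / backhaul_rate


def uplink_energy(tx_power_w, bits, rate):
    """AGV-side energy: transmit power times the wireless upload time."""
    return tx_power_w * bits / rate
```

Only the wireless hop is charged to the AGV in `uplink_energy`, since the backhaul segment is powered by the infrastructure.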
3.4. Computation Model
Due to the limited computing capability at the AGV side, task
can be executed locally, at an edge-side MEC server, or at the cloud. When the task is executed locally on AGV
i, the corresponding computation latency is
where
denotes the computing frequency of AGV
i.
When the task is offloaded for edge execution, we assume that it is completed at the MEC server currently associated with the AGV, and the corresponding edge computation latency is
where
denotes the computing frequency of the MEC server associated with AGV
i in time slot
t.
When the task is offloaded for cloud execution, the cloud computation latency is given by
Here,
denotes the computing frequency of the cloud server.
For the energy model, we consider only the local computation energy consumption at the AGV side. The computation energy consumption of task
when executed locally can be written as
where
is the energy coefficient of AGV
i. Since the computation energy consumption at the edge and cloud is supplied by the infrastructure, it is not included in the AGV energy budget in this paper.
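A short Python sketch of the computation model follows. The latency form (data size times CPU cycles per bit, divided by the computing frequency) follows directly from the task attributes defined in Section 3.1. The energy form, coefficient times frequency squared times the number of CPU cycles, is a common dynamic-power model and is an assumption here, since the original expression is not recoverable from the text.

```python
def compute_latency(bits, cycles_per_bit, freq_hz):
    """Execution time of a task on a processor with frequency freq_hz.

    The same form applies to local, edge, and cloud execution; only the
    frequency differs.
    """
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    return bits * cycles_per_bit / freq_hz


def local_compute_energy(bits, cycles_per_bit, freq_hz, kappa):
    """AGV-side computation energy (assumed model: kappa * f^2 * cycles)."""
    return kappa * freq_hz ** 2 * bits * cycles_per_bit
```

For instance, a 1 Mbit task requiring 100 cycles per bit takes 0.1 s on a 1 GHz processor; with a hypothetical coefficient of 1e-27 it would consume about 0.1 J.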
3.5. Latency and Energy Consumption Model
The uplink transmission latency of task
consists of the edge-uplink latency and the cloud-uplink latency, i.e.,
Correspondingly, the computation latency of task
consists of local computation, edge computation, and cloud computation, i.e.,
Since task uploading and computation are performed after the AGV arrives at the current service point, the end-to-end processing latency of task
at the small time scale can be expressed as
Furthermore, considering the combined impact of the small time-scale task-processing latency and the large time-scale service-point migration latency over the entire operation horizon, the long-term accumulated system latency is defined as
where
N denotes the number of large time-scale decision intervals in the operation horizon, and
denotes the latency weight of task type
k.
For energy consumption, we consider both the mobility energy at the large time scale and the task uplink energy and local computation energy at the small time scale [28]. Accordingly, the total energy consumption of AGV
i over the operation horizon can be written as
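The latency and energy bookkeeping of this subsection can be sketched in Python as below. The sketch assumes that the latency weights apply only to task-processing latencies and that migration latencies are added unweighted; that choice, and all names, are assumptions for illustration.

```python
from enum import Enum


class Mode(Enum):
    LOCAL = "local"
    EDGE = "edge"
    CLOUD = "cloud"


def task_latency(mode, t_local, t_up_edge, t_edge, t_up_cloud, t_cloud):
    """End-to-end latency of one task: upload (if any) plus computation."""
    if mode is Mode.LOCAL:
        return t_local
    if mode is Mode.EDGE:
        return t_up_edge + t_edge
    return t_up_cloud + t_cloud


def accumulated_latency(move_latencies, task_latencies, weights):
    """Migration latencies plus weighted task latencies over the horizon.

    task_latencies is an iterable of (task_type, latency) pairs and
    weights maps each task type to its latency weight.
    """
    total = sum(move_latencies)
    total += sum(weights[k] * latency for k, latency in task_latencies)
    return total


def agv_total_energy(move_energies, uplink_energies, local_energies):
    """Total AGV energy: movement + uplink transmission + local computing."""
    return sum(move_energies) + sum(uplink_energies) + sum(local_energies)
```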
3.6. Dual-Time-Scale Joint Optimization Problem
This subsection formulates the dual-time-scale joint decision problem in the intelligent warehousing scenario. The objective is to minimize the long-term accumulated system latency while satisfying the energy budget constraint of each AGV. At the small time scale, we optimize the task-offloading decision variables
; at the large time scale, we optimize the service-point migration decision variables
. Accordingly, the dual-time-scale joint optimization problem of interest can be expressed as
Here, constraint (17b) ensures that each task chooses one among local, edge, and cloud execution; constraint (17c) is the task latency constraint; constraints (17d)–(17f) characterize the feasibility of service points, the feasibility of stage-wise migration, and the operation-conflict avoidance among multiple AGVs, respectively; constraint (17g) is the energy budget constraint of each AGV; and constraint (17h) specifies the domains of the offloading decision variables.
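For orientation, one possible way to write the problem so that it matches the constraint descriptions above is sketched below. The notation is assumed for illustration only: $x^{\mathrm{loc}}_{i,k}[t]$, $x^{\mathrm{edge}}_{i,k}[t]$, $x^{\mathrm{cloud}}_{i,k}[t]$ are the offloading indicators, $\mathbf{q}_i[n]$ is the service point of AGV $i$ in interval $n$, $\mathcal{Q}_i$ its candidate set, $\mathcal{S}_i$ its set of visited service points, $d^{\max}$ the maximum migration distance, $T^{\mathrm{sys}}$ the long-term accumulated system latency, and $E_i^{\max}$ the energy budget.

```latex
\begin{align}
\min_{\{x\},\,\{\mathbf{q}\}} \quad & T^{\mathrm{sys}} \tag{17a} \\
\text{s.t.} \quad
& x^{\mathrm{loc}}_{i,k}[t] + x^{\mathrm{edge}}_{i,k}[t] + x^{\mathrm{cloud}}_{i,k}[t] = 1,
  && \forall i, k, t, \tag{17b} \\
& T_{i,k}[t] \le T^{\max}_{k}, && \forall i, k, t, \tag{17c} \\
& \mathbf{q}_i[n] \in \mathcal{Q}_i, && \forall i, n, \tag{17d} \\
& \lVert \mathbf{q}_i[n+1] - \mathbf{q}_i[n] \rVert \le d^{\max}, && \forall i, n, \tag{17e} \\
& \mathcal{S}_i \cap \mathcal{S}_j = \varnothing, && \forall i \ne j, \tag{17f} \\
& E_i \le E_i^{\max}, && \forall i, \tag{17g} \\
& x^{\mathrm{loc}}_{i,k}[t],\, x^{\mathrm{edge}}_{i,k}[t],\, x^{\mathrm{cloud}}_{i,k}[t] \in \{0, 1\},
  && \forall i, k, t. \tag{17h}
\end{align}
```

The binary offloading variables together with the discrete service-point variables make this a combinatorial problem, which motivates the decomposition used in Section 4.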
4. DPSO–MAPPO Dual-Time-Scale Joint Solution Algorithm
Problem
involves both the large time-scale service-point migration decisions of AGVs and the small time-scale task offloading decisions [19,21]. The former affects the wireless access conditions, moving latency, and mobility energy consumption during the subsequent service period, while the latter determines the execution mode of each task, i.e., local, edge, or cloud execution. Since these two types of decisions differ in time scale and variable type and are coupled through the system latency objective and the energy constraints, directly solving
is challenging.
To address this issue, we develop a DPSO–MAPPO dual-time-scale joint solution framework, as illustrated in Figure 2. At the large time scale, the service-point selection is a discrete decision problem over a finite set of candidate service points, which is solved by a discrete particle swarm optimization (DPSO) algorithm. At the small time scale, the task offloading of multiple AGVs is modeled as a multi-agent decision-making problem, and MAPPO is adopted to learn the offloading policy. Specifically, the outer-layer DPSO generates service-point migration plans, while the inner-layer MAPPO performs task offloading under a given service-point plan. The accumulated latency and the satisfaction of energy constraints are then fed back to the outer layer as evaluation criteria.
4.1. Large-Time-Scale Service-Point Migration Optimization via DPSO
At the large time scale, each AGV selects its next service point from a candidate service-point set. Since
is a discrete selection variable, we employ an integer-encoded discrete particle swarm optimization (DPSO) algorithm to solve the service-point migration subproblem [19]. Let the DPSO population size be
D. The position of the
d-th particle in the
r-th search iteration is denoted by
, where
indicates that AGV
i selects the
-th service point in
. Accordingly, the next service-point plan corresponding to particle
d can be expressed as
To evaluate the quality of different service-point plans, we adopt the system accumulated latency under a given plan as the primary fitness metric, and impose penalties for violations of the energy constraints. The fitness function of particle
d is defined as
where
denotes the system accumulated latency obtained by the small-time-scale offloading policy under the service-point plan represented by the current particle, and
is the penalty coefficient for energy-budget violations. A smaller fitness value indicates a better service-point plan in terms of latency and energy feasibility.
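A minimal Python sketch of a penalized fitness of this kind is given below. It assumes that the energy-budget violation is measured as the summed excess of each AGV's energy over its budget; the exact violation measure is an assumption, and the names are hypothetical.

```python
def particle_fitness(accumulated_latency, energies, budgets, penalty):
    """Latency under the plan plus a penalty on energy-budget violations.

    energies[i] and budgets[i] are the consumed energy and budget of AGV i.
    A smaller value indicates a better service-point plan.
    """
    violation = sum(max(0.0, e - b) for e, b in zip(energies, budgets))
    return accumulated_latency + penalty * violation
```

With a latency of 10, energies (5, 8), budgets (6, 6), and a penalty coefficient of 3, only the second AGV exceeds its budget (by 2), giving a fitness of 16.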
In the
r-th search iteration, let the velocity of particle
d be
, and let its personal best position and the global best position be
and
, respectively. The particle velocity is updated as
where
is the inertia weight,
and
are learning factors, and
and
are random numbers in
. Since the service-point selection is discrete, the particle position is updated with an integer projection as
where
denotes the rounding operation, and
projects the updated index onto the admissible index range of candidate service points in
.
To ensure that the service-point plan represented by each particle satisfies the mobility constraints and service-point conflict constraints, we perform feasibility corrections after each position update. Specifically, if , it is remapped to a feasible candidate point in ; if or , the next service point is reselected from candidates satisfying the migration-distance constraint; if multiple AGVs select the same service point, the selection with a better fitness contribution is retained, while the remaining conflicting AGVs are reassigned to feasible service points. After the above updates and corrections, DPSO yields a large-time-scale service-point migration plan, which provides the location and access-state inputs for the subsequent small-time-scale task-offloading optimization.
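The update and repair steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: all AGVs share one candidate index range 0..n_points-1, the migration-distance check is omitted, and conflicts are resolved by keeping the first AGV and moving later ones to the nearest free index, rather than by comparing fitness contributions as described above. The default coefficients are hypothetical.

```python
import random


def dpso_step(position, velocity, pbest, gbest, n_points,
              w=0.7, c1=1.5, c2=1.5, rng=random):
    """One integer-projected PSO update for a single particle.

    position[i] is the service-point index chosen for AGV i.
    """
    new_pos, new_vel = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        v_next = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        # Round to an integer index (Python rounds halves to even),
        # then project onto the admissible range.
        x_next = min(max(round(x + v_next), 0), n_points - 1)
        new_pos.append(x_next)
        new_vel.append(v_next)
    return new_pos, new_vel


def repair_conflicts(position, n_points):
    """Make all selected indices distinct (simplified first-come repair)."""
    taken, repaired = set(), []
    for x in position:
        if x in taken:
            free = [j for j in range(n_points) if j not in taken]
            if not free:
                raise ValueError("more AGVs than candidate service points")
            x = min(free, key=lambda j: abs(j - x))
        taken.add(x)
        repaired.append(x)
    return repaired
```

A full implementation would call `repair_conflicts` (and a distance check) after every `dpso_step`, so that each particle always encodes a feasible plan before its fitness is evaluated.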
4.2. Small-Time-Scale Task Offloading Optimization via MAPPO
Given a large-time-scale service-point plan
, each AGV makes offloading decisions during the service period according to the current tasks, communication conditions, available edge computing resources, and its remaining energy budget. We model each AGV as an agent and formulate the small-time-scale task offloading process as a multi-agent MDP, denoted by
, where
,
,
r, and
represent the state space, action space, reward function, and discount factor, respectively [20,21].
At time slot
t, the local state of AGV
i is defined as
where
denotes the remaining energy budget of AGV
i at time slot
t. The global state is the aggregation of all local states, i.e.,
The action of AGV
i at time slot
t represents its offloading choice for each task type, i.e.,
For any task
, the action satisfies
. The joint action is given by
To guide the agents to reduce task processing latency while satisfying the constraints, the instantaneous reward at the small time scale is defined as
where
and
are penalty coefficients, and
denotes the accumulated energy consumption of AGV
i up to time slot
t. The state transition is jointly determined by task generation, channel variations, offloading actions, and energy consumption.
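One plausible shape for such a reward is sketched below in Python: the negative weighted task latency, minus a penalty on deadline violations and a penalty on exceeding the energy budget. Whether the reward is per AGV or shared, and the exact form of the two penalty terms, are assumptions; the names are hypothetical.

```python
def slot_reward(latencies, deadlines, weights, energy_used, energy_budget,
                pen_latency, pen_energy):
    """Instantaneous reward of one AGV in one slot (assumed form).

    latencies, deadlines, and weights are indexed by task type.
    """
    weighted = sum(w * t for w, t in zip(weights, latencies))
    late = sum(max(0.0, t - d) for t, d in zip(latencies, deadlines))
    over = max(0.0, energy_used - energy_budget)
    return -(weighted + pen_latency * late + pen_energy * over)
```

Because both penalties vanish when the constraints hold, a feasible policy is rewarded purely for low latency.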
To solve the above multi-agent MDP, we adopt MAPPO with centralized training and decentralized execution [29]. During training, the critic network evaluates the state value based on the global state
, while each actor network outputs an offloading action based on the corresponding local state
. During execution, each AGV makes its offloading decision independently according to its own local observation.
Let the actor and critic network parameters be
and
, respectively. Under the old policy
and the current policy
, the policy probability ratio for AGV
i at time slot
t is defined as
Based on sampled trajectories, the discounted return at time slot
t is defined as
Accordingly, the advantage estimate is
The MAPPO actor networks are updated using the PPO clipped objective [30]. For AGV
i, the actor optimization objective is
where
is the clipping coefficient. All AGV actor networks update their policies according to (30). The critic network is updated by minimizing the value estimation error, with the loss function given by
In each training iteration, all AGVs interact with the environment under the current policies and store the resulting transition samples in an experience buffer; then, the actor and critic networks are updated using mini-batch samples. After training, MAPPO yields the small-time-scale task offloading policy, which provides task-processing decisions under a given service-point plan.
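The quantities used in the actor and critic updates can be illustrated with a framework-free Python sketch. It assumes the advantage is the discounted return minus the critic's value estimate and that the critic loss is a mean squared error; the original expressions are not recoverable from the text, and many MAPPO implementations use generalized advantage estimation instead.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one trajectory."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def advantages(returns, values):
    """Assumed advantage estimate: return minus critic value."""
    return [g - v for g, v in zip(returns, values)]


def clipped_surrogate(ratios, advs, eps):
    """PPO clipped objective averaged over samples (to be maximized)."""
    total = 0.0
    for rho, a in zip(ratios, advs):
        clipped = min(max(rho, 1.0 - eps), 1.0 + eps)
        total += min(rho * a, clipped * a)
    return total / len(ratios)


def critic_loss(returns, values):
    """Mean squared value-estimation error (to be minimized)."""
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(returns)
```

The clipping removes any incentive to move the probability ratio outside the interval [1 - eps, 1 + eps], which keeps each policy update close to the policy that collected the samples.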
4.3. DPSO–MAPPO Joint Solving Procedure
To jointly solve the large-time-scale service-point migration decisions and the small-time-scale task offloading decisions, we develop a DPSO–MAPPO joint solving framework. DPSO searches for candidate service-point migration plans at the large time scale; given a service-point plan, MAPPO performs multi-AGV task offloading decisions at the small time scale, and feeds back the resulting system cumulative latency and energy-budget satisfaction to DPSO for particle fitness evaluation.
Specifically, within the n-th large-time-scale decision interval, DPSO first initializes particle positions and velocities, and maps each particle to a candidate service-point plan. To ensure the comparability of particle fitness values, each particle is evaluated via simulation starting from the same system state at the current large-time-scale decision instant. For a given particle, the system determines the current AGV locations, wireless access conditions, and edge computing conditions according to the corresponding service-point plan. Then, MAPPO executes small-time-scale offloading decisions during the service period using the current policy, collects per-slot offloading actions, rewards, and state-transition samples, and computes the system cumulative latency and energy consumption associated with this particle. DPSO then calculates the particle fitness according to (19), updates the personal-best and global-best positions, and generates new candidate service-point plans via position updating and feasibility repair.
It should be noted that MAPPO is not retrained or updated during particle evaluation. While the fitness of all DPSO particles is evaluated within the same large-time-scale decision interval, the MAPPO policy parameters are kept fixed, so that different candidate service-point plans are evaluated under the same offloading policy. During this evaluation, MAPPO interacts with the environment at the small time scale using the fixed current policy and stores the generated samples in an experience buffer. After the DPSO search for the current large-time-scale decision interval is completed, the actor and critic networks are updated using the collected samples. The above procedure is repeated across large-time-scale decision intervals until the training termination condition is satisfied. Finally, DPSO outputs the service-point migration plan, and MAPPO outputs the corresponding task offloading policy. Algorithm 1 summarizes the overall DPSO–MAPPO joint solving procedure.
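The control flow described above, with the policy frozen during the particle search and updated once afterwards, can be sketched as a small Python skeleton. The callables `evaluate`, `step`, and `policy_update` are hypothetical placeholders for the rollout, the DPSO position update with repair, and the actor/critic update; personal-best tracking is omitted for brevity.

```python
def joint_interval_step(particles, evaluate, policy_update, n_iters, step):
    """One large-time-scale interval of the joint procedure (simplified).

    evaluate(plan) -> (fitness, samples): rollout under the frozen policy,
        always starting from the same system state.
    step(plan, best_plan) -> new plan: DPSO update plus feasibility repair.
    policy_update(buffer): single actor/critic update after the search.
    """
    buffer = []
    best_plan, best_fit = None, float("inf")
    for _ in range(n_iters):
        for plan in particles:
            fit, samples = evaluate(plan)  # policy parameters unchanged here
            buffer.extend(samples)
            if fit < best_fit:
                best_plan, best_fit = list(plan), fit
        particles = [step(p, best_plan) for p in particles]
    policy_update(buffer)  # update only after all particles are evaluated
    return best_plan, best_fit
```

Keeping the policy fixed inside the double loop is what makes the fitness values of different particles comparable within one interval.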
4.4. Computational Complexity and Deployment Discussion
Let
denote the number of AGVs,
denote the number of task types,
D denote the DPSO population size,
denote the maximum number of DPSO iterations, and
H denote the number of small time-scale slots within each large time-scale decision interval. In the proposed DPSO–MAPPO framework, the main computational cost comes from the outer-layer DPSO particle evaluation and the inner-layer MAPPO offloading decision. For each large time-scale interval, DPSO evaluates
D particles over
iterations, and each particle evaluation involves MAPPO-based offloading decisions for
all AGVs over
H small time-scale slots. Therefore, the dominant online evaluation complexity can be expressed as
where
denotes the computational cost of one actor-network forward inference. Since the MAPPO actor network is a lightweight multilayer perceptron, the online inference cost is much lower than the offline training cost.
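As a rough illustration of this count, the Python sketch below multiplies the population size, the number of DPSO iterations, the number of AGVs, and the number of slots per interval to obtain the number of actor forward passes in one large time-scale interval. The function names are hypothetical, and the estimate ignores environment simulation and DPSO bookkeeping costs.

```python
def actor_inferences(pop_size, max_iters, n_agvs, slots_per_interval):
    """Number of actor forward passes during one DPSO search."""
    return pop_size * max_iters * n_agvs * slots_per_interval


def search_inference_time_ms(pop_size, max_iters, slots_per_interval,
                             joint_latency_ms):
    """Inference time if one joint action for all AGVs costs joint_latency_ms."""
    return pop_size * max_iters * slots_per_interval * joint_latency_ms
```

For example, with hypothetical values of 20 particles, 30 iterations, 4 AGVs, and 10 slots per interval, one search requires 24,000 actor forward passes.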
To evaluate the online execution efficiency, we measured the CPU-only inference latency of the trained MAPPO policy on the simulation platform. The actor network contains two hidden layers with 64 neurons each, and each AGV actor has 5705 trainable parameters. The measurement was conducted with 100 warm-up runs and 5000 repeated runs. The results are summarized in Table 1.
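The reported parameter count can be cross-checked with a few lines of Python. The input and output dimensions of the actor are not stated above; the sketch shows that a fully connected network with biases, 14 inputs, two hidden layers of 64 units, and 9 outputs has exactly 5705 parameters. These two dimensions are an inference from the count, not a statement from the source.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))


# Assumed dimensions: 14 inputs, two hidden layers of 64 units, 9 outputs.
print(mlp_param_count([14, 64, 64, 9]))
```

Nine outputs would be consistent with, for example, three task types with three offloading choices each, although other action encodings are possible.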
Algorithm 1: DPSO–MAPPO Dual-Time-Scale Joint Optimization Algorithm.
As shown in Table 1, the average latency for generating the joint offloading action of all AGVs is 0.2392 ms, and the average end-to-end online decision latency is 1.3793 ms. These values are much smaller than the duration of a small time-scale slot in the considered warehouse scheduling scenario. Therefore, the trained MAPPO policy can support real-time offloading decisions. In practical deployment, MAPPO training can be completed offline, while online execution only requires lightweight actor inference. The DPSO-based service-point optimization is performed at the larger time scale and can be executed by an edge scheduler or warehouse control server. Thus, the proposed DPSO–MAPPO framework is suitable for hierarchical deployment in intelligent warehousing systems.
6. Conclusions
This paper studies the dual-time-scale joint optimization of AGV service-point migration and task offloading in a multi-AGV intelligent warehousing scenario under a cloud–edge–end collaborative computing architecture. A system model integrating AGV migration, wireless transmission, and local/edge/cloud computing is established, and the objective is to minimize the long-term accumulated system delay subject to task latency, AGV energy-budget, and service-point conflict constraints. To solve the resulting mixed discrete decision-making problem, a DPSO–MAPPO joint algorithm is proposed, where DPSO searches service-point migration plans on the slow time scale and MAPPO learns coordinated multi-AGV offloading on the fast time scale. Simulation results show that the proposed method generates effective migration trajectories and offloading strategies, reducing the system delay by 13.55% in a typical setting while improving total energy consumption and energy-violation control.
The results indicate that the proposed dual-time-scale design can effectively coordinate service-point migration and task offloading, thereby improving both delay performance and energy-constraint control. However, several issues remain for future work. First, more flexible time-indexed service-point sharing and dynamic collision avoidance can be considered to better support high-demand warehouse areas. Second, the communication and computation models can be extended by incorporating multi-user interference, dynamic bandwidth allocation, MEC queue evolution, and fine-grained CPU scheduling. Finally, larger-scale simulations and digital-twin or real-world AGV testbeds can be further used to validate the engineering applicability of the proposed framework.