Energy-Efficient Scheduling of Multi-Load AGVs Based on the SARSA-TTAO Algorithm

Tang, Hongtao; Wang, Hanyue; Zhan, Yan; Xu, Xuesong

doi:10.3390/su17167353


by Hongtao Tang *, Hanyue Wang, Yan Zhan and Xuesong Xu

College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310000, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(16), 7353; https://doi.org/10.3390/su17167353

Submission received: 29 June 2025 / Revised: 4 August 2025 / Accepted: 5 August 2025 / Published: 14 August 2025

(This article belongs to the Section Energy Sustainability)


Abstract

The Multi-load Automated Guided Vehicle (M-AGV) has emerged as a key enabling technology for intelligent and sustainable workshop logistics owing to its potential to enhance transportation efficiency and reduce system costs. To address the limitations in energy optimization caused by simplified AGV speed and payload modeling in existing scheduling models, this study develops a multi-factor coupled energy consumption model—integrating vehicle speed, travel distance, and dynamic payload—to minimize the total energy consumption of M-AGV systems. To effectively solve the model, a hybrid optimization algorithm that combines the State–Action–Reward–State–Action (SARSA) learning algorithm with the Triangulation Topology Aggregation Optimizer (TTAO), complemented by a similarity-based individual generation strategy, is designed to jointly enhance the algorithm’s exploration and exploitation capabilities. Comparative experiments were conducted across task scenarios involving three different handling task scales and three levels of M-AGV fleet heterogeneity, demonstrating that the proposed SARSA-TTAO algorithm outperforms Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and the Hybrid Genetic Algorithm with Large Neighborhood Search (GA-LNS) in terms of solution accuracy and convergence performance. The study also reveals the differences between homogeneous and heterogeneous M-AGV fleets in task allocation and resource utilization under energy-optimal conditions.

Keywords: multi-load automated guided vehicle; energy-efficient scheduling; multi-factor energy consumption modeling; SARSA; TTAO

1. Introduction

In the context of Industry 4.0 and green manufacturing, production systems are rapidly evolving toward greater flexibility and intelligence, driving the widespread adoption of Automated Guided Vehicles (AGVs) in the manufacturing sector. This has led to the gradual emergence of energy consumption in material handling systems as a critical bottleneck hindering the advancement of green manufacturing [1,2,3]. With the continuous iteration and upgrading of AGVs, their material handling flexibility has significantly improved, and their configurations have become increasingly diverse, thereby introducing new challenges for scheduling strategies and energy optimization.

In the development of AGV systems, multi-load capacity and fleet heterogeneity offer new avenues for reducing energy consumption and improving operational efficiency [4]. Compared to single-load AGV systems, M-AGVs can significantly reduce the frequency of vehicle trips, mitigate traffic conflicts, and lower system energy consumption through Task Aggregation Scheduling [5,6]. Furthermore, heterogeneous fleets, by configuring M-AGVs with varying load capacities and performance parameters, can more precisely match task requirements, thus enhancing transportation flexibility and resource utilization efficiency [4]. However, this also leads to a substantial increase in the complexity of task scheduling, placing higher demands on energy consumption modeling and algorithm design.

To this end, this study focuses on the energy-efficient scheduling problem for M-AGV systems and establishes a multi-factor coupled energy consumption model that integrates AGV speed, travel distance, and dynamic load variations. A hybrid optimization algorithm is proposed, combining the SARSA learning mechanism with the TTAO metaheuristic search strategy to enhance search capability and convergence stability. Experiments were conducted across task scenarios involving three different task scales and three levels of AGV fleet heterogeneity to evaluate the algorithm’s performance. Furthermore, the differences in energy consumption and vehicle usage between heterogeneous and homogeneous fleets under energy-optimal conditions were analyzed.

The paper is organized as follows. Section 2 provides a review of related research. Section 3 defines the energy-efficient scheduling problem for M-AGVs and develops the energy consumption model. Section 4 presents the design of the optimization algorithm. Section 5 discusses the experimental study and algorithm performance. Finally, Section 6 provides concluding remarks and outlines directions for future research.

2. Literature Review

As a major component of overall production energy consumption, the energy usage of material handling systems has a profound impact on the sustainability of production logistics [3,7]. Existing studies have shown that the energy consumption of M-AGVs is significantly influenced by various factors such as effective payload, travel speed, and distance. For example, Huo et al. validated through experimental data that for every additional 1 kg of load, the energy consumption rate of an M-AGV increases by approximately 9.2 × 10⁻⁶% per meter [8]. Gurel et al. indicated that the energy consumption of handling robots primarily depends on their operating speed, payload, and travel distance, and through case analysis, they highlighted the significant impact of speed on energy usage [9]. In addition, Briand et al. developed a time-varying function for vehicle mass and effective payload, which was incorporated into the energy consumption model, thereby improving the model’s prediction accuracy [5].

However, most existing studies adopt simplified energy consumption models, often overlooking the dynamic effects of real-time payload and vehicle operating states such as speed variations [7,10,11]. In energy optimization for homogeneous M-AGV fleets, the high consistency of vehicle parameters has led research to focus primarily on the integrated optimization of task allocation and path planning [12,13], or on joint scheduling between workshops and AGVs [14], with the modeling of energy consumption often being relatively simplified. For example, Hang et al. proposed a bi-objective model for hybrid M-AGV scheduling, simultaneously optimizing task assignments and routing to reduce total energy consumption and equipment costs. However, the model assumes constant speed and fixed payload, thus simplifying the energy formulation process [7]. Zacharia et al. introduced a fuzzy effective load model for analyzing vehicle energy consumption; however, it still assumes constant speed, overlooking the dynamic effects of speed variation [14].

In contrast, energy optimization for heterogeneous M-AGV fleets is inherently more complex and challenging due to the diversity of vehicle performance parameters. Nevertheless, existing models in this area also tend to be overly simplified [15,16,17,18]. For example, Zhou and He addressed the sustainable material handling scheduling problem of mixed-load AGVs by optimizing task allocation across different vehicle types to reduce handling energy consumption [15]. Zhou and Zhao proposed a dual-assignment hybrid supply strategy for the material delivery problem in mixed-model assembly lines, in which the path planning and scheduling of multi-load AGVs were optimized to reduce both line-side inventory and overall system energy consumption [16]. Dang et al. addressed the heterogeneous multi-load AGV scheduling problem with battery constraints and formulated a mixed-integer linear programming model aiming to minimize delay costs and travel costs [4]. Although some models have begun to incorporate load variability, the omission of vehicle speed and routing considerations still limits the accuracy of energy predictions and the effectiveness of scheduling strategies.

Various optimization approaches have been applied to the scheduling of M-AGV material handling systems. However, two major limitations remain. First, most studies focus on homogeneous fleets, and the proposed algorithms are often inadequate for addressing the more complex scheduling requirements in heterogeneous environments. Second, the majority of methods rely on traditional heuristic and metaheuristic algorithms, which often suffer from low efficiency in exploiting evolutionary information and a tendency to fall into local optima. In existing studies, traditional heuristic rules have been widely adopted due to their low computational cost and ease of implementation. For example, Ho and Chien proposed task and delivery scheduling rules for M-AGVs and explored optimal rule combinations under various performance metrics [19]. Similarly, Ho and Liu developed multiple rules for load selection and pickup scheduling, verifying the optimal combinations through simulation experiments [20]. However, these rule-based approaches lack flexibility in complex and dynamic environments, making them less adaptable to real-time load variations. To overcome the limitations of traditional heuristics, metaheuristic algorithms—by simulating natural evolution or neighborhood exploration—have significantly improved the quality of scheduling solutions. For instance, Xu et al. integrated local search with a Genetic Algorithm to minimize the travel distance of multi-load AGVs [21]; Huo et al. employed NSGA-II to effectively solve a multi-objective scheduling model aiming to minimize both energy consumption and delay [8]; and Gao et al. introduced a large neighborhood search-enhanced Genetic Algorithm to address the green vehicle routing problem with time windows for a heterogeneous fleet [18]. 
Nevertheless, these metaheuristic approaches are still based on conventional single- or dual-population evolutionary frameworks, which are prone to premature convergence and limited in optimization efficiency.

Reinforcement Learning (RL), which operates based on a state–action–reward mechanism, dynamically optimizes policies to enhance scheduling efficiency and has shown broad applicability in the optimization of M-AGV scheduling problems. Among the commonly used temporal difference learning methods, SARSA is an on-policy approach that emphasizes feedback from actual actions during the policy update process, resulting in more stable convergence and making it well-suited for dynamically changing scheduling scenarios [22]. In contrast, Q-learning is an off-policy method whose updates are independent of the current behavior policy, making it more amenable to parallel training; however, it carries a higher risk of policy divergence in complex environments [23]. Additionally, although deep reinforcement learning methods such as Deep Q-Networks (DQN) possess powerful feature representation capabilities, they typically involve high training costs and require large amounts of data, which limits their feasibility for deployment in industrial settings [1]. Recent studies have explored the integration of reinforcement learning with metaheuristic algorithms to further enhance performance. For instance, Zhou et al. proposed a multi-objective Quantum-inspired Metaheuristic Archimedes Optimization Algorithm (QMQAOA), which combines Q-learning with the Archimedes Optimization Algorithm (AOA), demonstrating promising results in reducing line-side inventory and the energy consumption of heterogeneous M-AGVs [16]. However, the AOA component of this algorithm still relies on single- or dual-population evolutionary models, which are susceptible to premature convergence and local optima.

Therefore, to address the energy-efficient scheduling problem of M-AGV fleets, this study aims to minimize the total energy consumption by constructing a multi-factor energy optimization model that considers the coupled effects of vehicle speed, effective payload, and travel distance. In addition, the Triangulation Topology Aggregation Optimizer (TTAO) [24] is introduced. This algorithm integrates the geometric principles of similar triangles into the search process of metaheuristic optimization, overcoming the limitations of conventional population-based evolutionary mechanisms and significantly improving the efficiency of individual solution generation. Building upon the TTAO framework, a similarity-based learning strategy is proposed to enhance the diversity of the search space and strengthen the algorithm’s exploration capabilities. Furthermore, the SARSA learning algorithm is incorporated for policy selection, guiding the search process with a better balance between exploration and exploitation, thereby significantly improving the performance of TTAO in solving complex engineering problems.

3. Problem Description and Mathematical Modelling

The scheduling problem for homogeneous M-AGV fleets can be regarded as a special case of that for heterogeneous fleets. Therefore, this study focuses on the energy-efficient scheduling of heterogeneous M-AGV fleets in production and manufacturing environments. The core objective is to minimize the total system energy consumption by optimizing vehicle assignment, task allocation, and path planning, while satisfying vehicle load constraints. The aim is to fully exploit the potential of M-AGV fleets to maximize material handling efficiency and reduce energy usage. Section 3.1 provides a detailed description of the energy-efficient scheduling problem for heterogeneous M-AGVs along with its fundamental assumptions. Section 3.2 defines the notations used in the mathematical model, and Section 3.3 presents the formulation of the mathematical model.

3.1. Problem Description

The scheduling problem for heterogeneous M-AGV fleets involves the following elements: a distribution center (i.e., the material warehouse), a heterogeneous fleet of M-AGVs with different load capacities, and material request workstations (i.e., workstations within the workshop that require material delivery). A simplified layout of the manufacturing workshop is illustrated in Figure 1.

In this study, the distribution center and the n workstations with material demands are defined as a set of n + 1 nodes, denoted as P = {p_i|i = 0, 1, …, n}, where node p₀ represents the distribution center (also the pickup location for AGVs), and the remaining nodes constitute the set of material request workstations, denoted as P′ = P\{p₀}. Each node p_i contains information including its location in the workshop (x_i, y_i) and its material demand r_i (with r₀ at the distribution center defined as 0). A fleet of m heterogeneous M-AGVs with different load capacities is defined as W = {w_k|k = 1, …, m}, where w_k denotes the maximum load capacity of vehicle k.

Against this background, the scheduling problem of a heterogeneous multi-load AGV fleet involves assigning AGVs to execute delivery tasks, allocating specific material handling tasks to the assigned AGVs, and planning their delivery routes. The objective is to deliver the required materials from the distribution center to the designated workstations in sequence, while minimizing total energy consumption and satisfying vehicle load constraints. In the heterogeneous fleet, vehicle assignment is determined based on a priority rule that selects the AGV with the lowest unit energy consumption, which is calculated as follows:

$$\text{Unit energy consumption} = \frac{\text{total energy consumed to complete the tasks}}{\text{total quantity of the tasks}} \tag{1}$$
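As a small illustration of the priority rule in Equation (1), the sketch below compares two candidate vehicle types on the batch each could carry and picks the one with the lower energy per unit of material. The function names, the type labels, and the numbers are hypothetical and are not taken from the paper.

```python
def unit_energy(total_energy, total_quantity):
    """Equation (1): energy consumed per unit of delivered material."""
    if total_quantity <= 0:
        raise ValueError("total quantity must be positive")
    return total_energy / total_quantity


def pick_vehicle_type(candidates):
    """Return the label of the candidate with the lowest unit energy consumption.

    candidates maps a type label to (total energy, total quantity).
    Labels and values are illustrative assumptions only.
    """
    return min(candidates, key=lambda name: unit_energy(*candidates[name]))


# Hypothetical batch: (energy in J, delivered mass in kg) per vehicle type.
batch = {"large": (5400.0, 120.0), "small": (4200.0, 80.0)}
chosen = pick_vehicle_type(batch)  # "large": 45.0 J/kg versus 52.5 J/kg
```

Here the larger vehicle wins because its higher absolute energy is spread over more delivered material.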

In this study, time constraints are not considered. The flowcharts for scheduling problems of heterogeneous and homogeneous M-AGV fleets are shown in Figure 2 and Figure 3, respectively.

The following assumptions are made in constructing the mathematical model for the scheduling problem:

(1) The M-AGV fleet consists of two types of M-AGV with different load capacities, and the number of available vehicles is always sufficient;
(2) Each loaded AGV departs from the distribution center, visits a series of workstations in a specific sequence to fulfill its material demands, and returns to the distribution center upon completing all deliveries;
(3) The material demand of each workstation in a single delivery does not exceed the maximum load capacity of the larger AGV, and each workstation’s demand is fulfilled by only one AGV per trip;
(4) AGVs are assumed to operate without breakdowns or path conflicts during the scheduling process;
(5) It is assumed that the AGVs have sufficient battery capacity to complete all delivery tasks;
(6) It is assumed that all material deliveries are completed within their respective time windows;
(7) The width of all aisles in the workshop is assumed to accommodate the movement of both types of AGVs.

3.2. Notation Definitions

The definitions of the key notations used in the mathematical model for the M-AGV fleet scheduling problem are provided in Table 1.

3.3. Energy Consumption Optimization Model

During AGV operation, mechanical power must overcome various forms of resistance, including rolling resistance, air resistance, gradient resistance, acceleration resistance, and gravitational effects [3,25]. However, in practical workshop environments, the floor is typically level, air movement is minimal, and AGVs generally operate at low speeds, falling into the category of low-speed guided vehicles. As a result, the impact of air resistance and gradient resistance on energy consumption during AGV travel is negligible [26]. Based on these considerations, this study excludes air and gradient resistance from the energy consumption model. Inspired by the work of Gao et al. (2022) [18], the AGV’s movement process within the workshop is divided into four stages (acceleration, constant speed, deceleration, and turning), with the energy consumption of each stage calculated separately. The detailed energy consumption model is presented as follows.

3.3.1. Acceleration Stage

This study establishes a multi-factor coupled energy consumption model that captures the synergistic effects of M-AGV driving state variations (including speed and travel distance) and changes in effective payload, as shown in Equation (3). Additionally, inspired by the work of Briand et al. (2018) [5], the real-time total mass of materials carried by the M-AGV is modeled as a piecewise constant function over time and integrated into the energy consumption formulation.

$$V(t) = v_{0} + a \times t_{acc} \tag{2}$$

$$E_{acc} = \sum_{k=1}^{m} \sum_{i=0}^{n} \sum_{j=0}^{n} \frac{(m_{k} + w_{ij}^{k}(t)) \times (a + g \times C_{r}) \times d_{acc}^{ij}}{\eta_{k}} \tag{3}$$

where $m_{k}$ denotes the self-weight of vehicle k, $w_{ij}^{k}(t)$ represents the real-time load weight of vehicle k, and $d_{acc}^{ij}$ is the distance traveled by the M-AGV during acceleration.

3.3.2. Deceleration Stage

Since the kinetic energy of an AGV during deceleration is primarily dissipated through the braking system, with a sharp drop in (or even complete cessation of) motor output, this study neglects the energy consumption during the deceleration phase. Accordingly, $E_{dec}$ is simplified to zero.

$$E_{dec} = 0 \tag{4}$$

3.3.3. Constant-Speed Stage

Once the M-AGV accelerates to its maximum speed, it enters the constant-speed phase. In this study, the distance required for the M-AGV to reach its maximum speed is denoted as $\Delta d$, and the travel path within the workshop is simplified using the Manhattan distance $d_{ij}$. Based on this, the energy consumption during the constant-speed phase, accounting for the multi-factor coupling effects, is calculated as follows.

$$\Delta d = \frac{v_{max}^{2} - v_{0}^{2}}{2 \times a} \tag{5}$$

$$d_{ij} = |x_{i} - x_{j}| + |y_{i} - y_{j}| \tag{6}$$

$$d_{uni}^{ij} = \max\{|x_{i} - x_{j}| - 2 \times \Delta d, 0\} + \max\{|y_{i} - y_{j}| - 2 \times \Delta d, 0\} \tag{7}$$

$$E_{uni} = \sum_{k=1}^{m} \sum_{i=0}^{n} \sum_{j=0}^{n} \frac{(m_{k} + w_{ij}^{k}(t)) \times (a + g \times C_{r}) \times d_{uni}^{ij}}{\eta_{k}} \tag{8}$$

where $x_{i}$, $x_{j}$, $y_{i}$, and $y_{j}$ represent the coordinates of workstations $i$ and $j$, respectively; $d_{uni}^{ij}$ is the distance traveled by the AGV at constant speed; and $\Delta d$ is the distance traveled by the vehicle during acceleration to its maximum speed, derived from the kinematic equation that relates velocity and displacement in physics.
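The distance terms of this stage can be checked with a few lines of code. The sketch below uses the standard kinematic relation $v_{max}^{2} = v_{0}^{2} + 2 a \Delta d$ for the acceleration distance, together with the Manhattan and constant-speed distances of Equations (6) and (7). The function names and the sample values are illustrative assumptions.

```python
def accel_distance(v_max, a, v0=0.0):
    """Distance needed to go from v0 to v_max at constant acceleration a."""
    return (v_max ** 2 - v0 ** 2) / (2.0 * a)


def manhattan(p, q):
    """Equation (6): Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def cruise_distance(p, q, delta_d):
    """Equation (7): per-axis distance left after speeding up and slowing down."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return max(dx - 2.0 * delta_d, 0.0) + max(dy - 2.0 * delta_d, 0.0)


# Hypothetical values: 1 m/s top speed, 0.5 m/s^2 acceleration, start from rest.
delta_d = accel_distance(v_max=1.0, a=0.5)                   # 1.0 m
d_total = manhattan((0.0, 0.0), (10.0, 1.5))                 # 11.5 m
d_cruise = cruise_distance((0.0, 0.0), (10.0, 1.5), delta_d) # 8.0 m
```

In this example the 1.5 m segment along the y-axis is shorter than $2 \times \Delta d$, so it contributes nothing to the constant-speed distance.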

3.3.4. Turning Stage

This study treats the AGV’s turning process as a constant-speed phase and calculates the vehicle’s turning displacement using the AGV’s average turning radius [3]. The energy consumption formula during the AGV’s turning process is as follows.

$$E_{tur} = \sum_{k=1}^{m} \sum_{i=0}^{n} \sum_{j=0}^{n} \frac{(m_{k} + w_{ij}^{k}) \times (a + g \times C_{r}) \times (\pi / 2 \times R_{k})}{\eta_{k}} \times (n_{t_{1}}^{k} + 2 \times n_{t_{2}}^{k}) \tag{9}$$

where $n_{t_{1}}^{k}$ and $n_{t_{2}}^{k}$ represent the numbers of $90^{\circ}$ and $180^{\circ}$ turns made by vehicle k, respectively.

3.3.5. Total Energy Consumption Model

Based on the analysis of the various energy consumption processes of M-AGVs, the total energy consumption during the operation of M-AGVs is ultimately calculated as follows.

$$E_{sum} = E_{acc} + E_{dec} + E_{uni} + E_{tur} \tag{10}$$
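To make the piecewise-constant payload concrete, the following sketch evaluates the total energy of one depot-to-depot trip. It reuses the common term of Equations (3), (8), and (9) as printed, sets the deceleration energy to zero, and reduces the payload after each delivery. The function names, the per-leg tuple layout, and the value of g are assumptions made for illustration; the per-leg distances and turn counts are supplied by the caller rather than derived from a layout.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def stage_energy(vehicle_mass, payload, distance, a, c_r, eta):
    """Common term of Equations (3), (8) and (9) for one travelled distance."""
    return (vehicle_mass + payload) * (a + G * c_r) * distance / eta


def route_energy(route, demand, legs, vehicle_mass, a, c_r, eta, turn_radius):
    """Sum E_acc + E_dec + E_uni + E_tur over one depot-to-depot trip.

    route: workstation ids in visiting order (the depot is implicit at both ends)
    demand: workstation id -> delivered mass
    legs: one (d_acc, d_uni, n_turn90, n_turn180) tuple per leg, len(route) + 1
    """
    payload = sum(demand[j] for j in route)  # fully loaded when leaving the depot
    total = 0.0
    stops = list(route) + [None]  # None marks the return leg to the depot
    for stop, (d_acc, d_uni, n90, n180) in zip(stops, legs):
        d_turn = (math.pi / 2.0) * turn_radius * (n90 + 2 * n180)
        for distance in (d_acc, d_uni, d_turn):
            total += stage_energy(vehicle_mass, payload, distance, a, c_r, eta)
        # E_dec = 0 by Equation (4), so braking adds nothing.
        if stop is not None:
            payload -= demand[stop]  # payload drops after each delivery
    return total


# Hypothetical trip: two deliveries, three legs.
legs = [(1.0, 8.0, 1, 0), (1.0, 3.0, 0, 0), (1.0, 10.0, 1, 0)]
e_trip = route_energy([1, 2], {1: 30.0, 2: 20.0}, legs,
                      vehicle_mass=100.0, a=0.5, c_r=0.02, eta=0.8,
                      turn_radius=0.5)
```

Because the payload enters every stage linearly, visiting the workstation with the largest demand early tends to lower the energy of the remaining legs.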

In addition, the constraints to be satisfied for this problem are as follows:

$$q_{ij}^{k} = \begin{cases} 1, & \text{vehicle } k \text{ travels from workstation } i \text{ to } j \\ 0, & \text{vehicle } k \text{ does not travel from workstation } i \text{ to } j \end{cases} \tag{11}$$

$$\sum_{k=1}^{m} \sum_{i,j=0}^{n} q_{ij}^{k} = 1 \tag{12}$$

$$\sum_{i,j=0}^{n} q_{ij}^{k} - \sum_{i,h=0}^{n} q_{ih}^{k} = 0 \tag{13}$$

$$\sum_{i=0}^{n} q_{0i}^{k} = 1 \tag{14}$$

$$\sum_{j=0}^{n} q_{j0}^{k} = 1 \tag{15}$$

$$0 \leq w_{ij}^{k} \leq w_{k} \tag{16}$$

$$r_{j} \leq w_{ij}^{k} \tag{17}$$

Constraints (12) and (13) specify that the material delivery task for each workstation must be completed by a single vehicle in a single trip. Constraints (14) and (15) require that each vehicle departs from the distribution center and returns to it after completing deliveries to a sequence of workstations. Constraint (16) ensures that the total weight of materials transported in a single trip does not exceed the vehicle’s maximum load capacity. Constraint (17) stipulates that the material demand of any workstation must not exceed the vehicle’s real-time load upon arrival during the delivery process.
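A candidate plan can be screened against these constraints before its energy is evaluated. The sketch below is one possible check under a simplified trip representation, in which each trip is a capacity plus an ordered list of workstations and the depot is implicit at both ends (which covers Constraints (14) and (15) by construction). The function name and data layout are hypothetical.

```python
from collections import Counter


def plan_violations(trips, demand):
    """Return a list of human-readable constraint violations (empty if feasible).

    trips: list of (capacity, [workstation ids]); node 0 (depot) is implicit
    demand: workstation id -> requested mass
    """
    problems = []
    # Each workstation must be served exactly once (Constraints (12) and (13)).
    visits = Counter(j for _, stops in trips for j in stops)
    for j in demand:
        if visits[j] != 1:
            problems.append(f"workstation {j} served {visits[j]} times")
    # The load of a trip must not exceed the vehicle capacity (Constraint (16)).
    for idx, (capacity, stops) in enumerate(trips):
        load = sum(demand.get(j, 0.0) for j in stops)
        if load > capacity:
            problems.append(f"trip {idx} carries {load} > capacity {capacity}")
    return problems


demand = {1: 30.0, 2: 20.0, 3: 40.0}
ok = plan_violations([(60.0, [1, 2]), (60.0, [3])], demand)   # feasible plan
bad = plan_violations([(60.0, [1, 3])], demand)               # two violations
```

Constraint (17) holds automatically in this representation, since a trip leaves the depot carrying the sum of the demands it will deliver.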

4. SARSA-TTAO Algorithm

To efficiently solve the energy-efficient scheduling problem for M-AGVs, this study proposes a novel optimization algorithm, the SARSA-TTAO algorithm. In the generic aggregation phase of the Triangulation Topology Aggregation Optimizer (TTAO), a similarity-based individual generation strategy is introduced to enhance population diversity and mitigate premature convergence. Meanwhile, the dynamic reward feedback mechanism of the State–Action–Reward–State–Action (SARSA) algorithm is integrated to enable adaptive adjustment of the generation strategy, thereby significantly improving the algorithm’s optimization performance and convergence efficiency.

4.1. Encoding and Decoding

The energy-efficient scheduling of heterogeneous M-AGVs involves vehicle type selection, task assignment, and path planning. To address these elements, this study employs two encoding schemes: random number encoding and binary “0–1” encoding. Specifically, a set of random real numbers within the range [0, 1] is generated to correspond one-to-one with the material handling tasks, while AGVs are encoded based on their load capacities—“0” represents an M-AGV with a larger load capacity, and “1” represents an M-AGV with a smaller load capacity. The encoding structure is illustrated in Figure 4.

Subsequently, the array representing the task sequence is input into the SARSA-TTAO algorithm for iterative optimization. The output is a newly generated array that reflects the optimized task sequence.

The decoding process involves transforming the task sequence generated by the algorithm into a feasible AGV scheduling solution. During this process, a priority rule based on the unit energy consumption per kilogram of material is introduced to guide vehicle type selection. Specifically, the lower the unit energy consumption for transporting one kilogram of material, the better the vehicle–task matching. The decoding procedure is illustrated in Figure 5.

The specific decoding steps illustrated in Figure 5 are as follows:

Step 1: Task Mapping. The real-coded sequence generated by the SARSA-TTAO algorithm is decoded into a specific sequence of material handling tasks, thereby completing the initial mapping from the algorithm space to the task space.

Step 2: Vehicle Assignment and Task Allocation. Considering the heterogeneity of AGVs (e.g., different load capacities and unit energy consumption parameters), a two-stage scheduling strategy is employed:

Stage 1: Tasks are sequentially assigned to AGVs based on their load capacities.
Stage 2: For each task batch, the AGV type with the optimal unit energy efficiency (as defined in Equation (1)) is selected to minimize overall energy consumption.

These two stages are executed iteratively until all tasks are fully allocated.

Step 3: Path Planning. Based on the task allocation results, the task execution sequence for each AGV is determined, and the travel paths are planned using the Manhattan distance minimization principle.

Step 4: Objective Evaluation. Taking into account AGV types, task allocation, and path planning results, the total energy consumption of all AGVs (i.e., the objective function value) is calculated to assess the effectiveness of the scheduling solution.
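Steps 1 and 2 (Stage 1) can be sketched as follows. The example assumes a rank-order decoding, in which the task with the smallest random key is served first, and a greedy split that closes a trip as soon as the next task would exceed the capacity. Both choices, as well as the function names, are assumptions for illustration; the vehicle-type selection of Stage 2 and the path planning of Step 3 are left out.

```python
def decode_keys(keys):
    """Map random keys in [0, 1] to a task sequence (tasks are numbered from 1).

    Assumption: tasks are ordered by ascending key value.
    """
    return sorted(range(1, len(keys) + 1), key=lambda task: keys[task - 1])


def split_into_trips(sequence, demand, capacity):
    """Cut the task sequence into consecutive trips that respect the capacity."""
    trips, current, load = [], [], 0.0
    for task in sequence:
        if current and load + demand[task] > capacity:
            trips.append(current)       # close the trip before it overloads
            current, load = [], 0.0
        current.append(task)
        load += demand[task]
    if current:
        trips.append(current)
    return trips


keys = [0.62, 0.15, 0.88, 0.40]                  # one key per task
demand = {1: 30.0, 2: 20.0, 3: 40.0, 4: 25.0}    # hypothetical demands in kg
sequence = decode_keys(keys)                      # [2, 4, 1, 3]
trips = split_into_trips(sequence, demand, capacity=60.0)  # [[2, 4], [1], [3]]
```

Because any real-valued vector decodes to a valid permutation, the optimizer can move freely in continuous space without a repair step for the task order.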

4.2. Design of a Novel Individual Generation Strategy

In recent years, swarm intelligence optimization techniques have garnered significant attention due to their exceptional performance in solving NP-hard problems. In 2023, Zhao et al. first proposed the Triangulation Topology Aggregation Optimizer (TTAO) [24], whose operation consists of three key phases: algorithm initialization, the generic aggregation process, and the local aggregation process, as illustrated in Figure 6. This algorithm integrates the mathematical principles of similar triangle topology with the search mechanisms of metaheuristic algorithms, breaking through the limitations of traditional single-/dual-population evolutionary paradigms and significantly improving individual optimization efficiency. However, in the generic aggregation phase of TTAO, the generation of new individuals relies solely on learning from the best individual within the topological unit, which tends to bias the search process and increase the risk of premature convergence to local optima [27]. To address this limitation, this study proposes a novel similarity-based generic aggregation strategy built upon the TTAO framework. The proposed strategy considers both individual similarity and fitness, leveraging the multi-dimensional characteristics of individuals in TTAO. The triangular topology unit in the TTAO algorithm is illustrated in Figure 7.

The proposed similarity-based generic aggregation strategy comprehensively considers both the similarity and fitness among individuals. It is designed based on the multi-dimensional characteristics of individuals in the TTAO algorithm (as shown in Figure 7, individual $X$ is a D-dimensional variable). This strategy calculates the inner product between individuals to measure similarity, retains a set of individuals A with low similarity to the current best individual $X_{i,best}^{t}$, and employs a greedy selection approach to choose the best-performing individual in set A for information exchange with $X_{i,best}^{t}$. The specific interaction mechanism is defined in Equation (20). By promoting interaction between less similar individuals, this strategy effectively exploits the potential of elite individuals and significantly enhances the diversity of the search space, thereby improving the algorithm’s exploration capability.

$$I = \langle X_{i,best}^{t}, X_{rand,best}^{t} \rangle \tag{18}$$

$$A = \{X_{i-dissim}^{t}\} \tag{19}$$

$$X_{i,new1}^{t+1} = r_{4} \times X_{i,best}^{t} + (1 - r_{4}) \times X_{i-dissim,f-min}^{t} \tag{20}$$

where $X_{i,best}^{t}$ is the best individual in the triangular topology unit i at iteration t; $I$ denotes the inner product between the individual $X_{i,best}^{t}$ and the best individual $X_{rand,best}^{t}$ within a random triangulation topology unit at the t-th iteration, which serves as a measure of similarity between the two vectors; $X_{i-dissim}^{t}$ denotes an individual with low similarity to the current best individual $X_{i,best}^{t}$, and a certain number of such $X_{i-dissim}^{t}$ individuals constitute the set A; $X_{i-dissim,f-min}^{t}$ represents the individual with the best fitness in set A; $X_{i,new1}^{t+1}$ refers to the new individual generated in the generic aggregation phase of unit i at iteration t + 1; and $r_{4}$ is a random number within the range [0, 1].
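One way to read Equations (18) to (20) is sketched below. The text does not state how many individuals enter set A or how "low similarity" is thresholded, so the sketch assumes that the candidates are the best individuals of other units, that a smaller inner product means lower similarity, and that A holds the `n_dissim` least similar candidates. These choices and all names are hypothetical.

```python
import random


def inner(u, v):
    """Inner product of two equally long vectors (Equation (18))."""
    return sum(a * b for a, b in zip(u, v))


def similarity_generation(x_best, candidates, fitness, n_dissim, rng):
    """Generate a new individual from x_best and a dissimilar, fit partner.

    candidates: best individuals of other units (assumption)
    fitness: objective value per candidate, lower is better
    n_dissim: size of set A (assumption, not given in the text)
    """
    # Rank candidates from least to most similar and keep the first n_dissim.
    order = sorted(range(len(candidates)),
                   key=lambda c: inner(x_best, candidates[c]))
    set_a = order[:n_dissim]                        # Equation (19)
    partner = min(set_a, key=lambda c: fitness[c])  # greedy pick within A
    r4 = rng.random()
    # Equation (20): convex combination of x_best and the chosen partner.
    return [r4 * xb + (1.0 - r4) * xp
            for xb, xp in zip(x_best, candidates[partner])]


rng = random.Random(0)
x_best = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fitness = [1.0, 5.0, 3.0]
x_new = similarity_generation(x_best, candidates, fitness, n_dissim=2, rng=rng)
```

In this toy case set A contains the second and third candidates, and the third is chosen because its fitness is better, so the new individual lies on the segment between `[1, 0]` and `[-1, 0]`.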

4.3. Integration of TTAO and the SARSA Algorithm

The SARSA learning algorithm is a representative on-policy reinforcement learning method, in which the actions ultimately executed by the agent are consistent with those selected during the learning process [28]. Leveraging this characteristic, this study integrates the SARSA algorithm into the generic aggregation phase of the Triangulation Topology Aggregation Optimizer (TTAO) to enhance its optimization capability. In this phase, SARSA dynamically selects the most appropriate aggregation strategy based on feedback from a reward function to generate new individuals. This strategy selection mechanism helps the algorithm avoid local optima while enabling more effective exploration and exploitation of the search space, thereby significantly improving the efficiency of solving complex optimization problems. The integrated framework of the SARSA-TTAO algorithm is illustrated in Figure 8.

The design of the state variables, action space, and the reward function in the SARSA learning algorithm is described as follows.

The state variable provides the necessary information for the agent’s learning and decision-making process. In this study, the state is designed to represent the information contained in the top $\lfloor N/3 \rfloor$ individuals during the generic aggregation phase. Specifically, $S_{t}$ denotes the state of the population individuals at iteration t, which includes information on the best and second-best individuals within the currently updating triangular topology unit i, as well as their fitness values.

$$S_{t} = [X_{i,best}^{t}, X_{i,sbest}^{t}, f(X_{i,best}^{t}), f(X_{i,sbest}^{t})] \tag{21}$$

$$i \in [1, \lfloor N/3 \rfloor] \tag{22}$$

where $X_{i,sbest}^{t}$ denotes the second-best (suboptimal) individual of the triangular topology unit i at iteration t, and $f(X_{i,best}^{t})$ and $f(X_{i,sbest}^{t})$ represent the corresponding fitness function values of the two individuals.

The action space represents the complete set of actions available to the agent. In this study, three generic aggregation strategies are defined as action variables within the generic aggregation phase of the TTAO algorithm: (1) $a_{1}$, a strategy inspired by the traditional Genetic Algorithm, corresponding to the original generic aggregation strategy; (2) $a_{2}$, a retrospective learning-based strategy [27]; and (3) $a_{3}$, the strategy proposed in this study, which simultaneously considers individual similarity and fitness. The formal definition of the action space is provided in Equation (23).

A_{t} = \{a_{1}, a_{2}, a_{3}\}

(23)
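Selecting one of the three actions in Equation (23) can be sketched with an ε-greedy rule, where ε plays the role of the exploration factor listed in Table 5. This is a minimal sketch under stated assumptions: the Q-table is a plain dictionary keyed by (state, action), states are hashable labels, unseen entries default to 0, and the greedy choice follows the usual convention of preferring the largest Q-value. The names `ACTIONS` and `select_action` are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional, Tuple

ACTIONS = ("a1", "a2", "a3")  # the three generic aggregation strategies
QTable = Dict[Tuple[Hashable, str], float]


def select_action(
    q: QTable,
    state: Hashable,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Epsilon-greedy choice among the aggregation strategies."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: pick any strategy uniformly at random.
        return rng.choice(ACTIONS)
    # Exploit: pick the strategy with the largest Q-value (assumed convention).
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


q_table: QTable = {("s0", "a2"): 1.0}
greedy_choice = select_action(q_table, "s0", epsilon=0.0)
random_choice = select_action(q_table, "s0", epsilon=1.0, rng=random.Random(1))
```

With ε = 0 the rule is purely greedy, and with ε = 1 it is purely random; the values 0.1, 0.3, and 0.5 in Table 5 lie between these extremes.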

The reward function evaluates the reward obtained from taking a specific action in a given state. In this study, the energy consumption objective function from the energy-efficient scheduling model for M-AGVs (as defined in Equation (10)) is adopted as the reward evaluation function $f(\cdot)$, which evaluates the energy performance of each individual in a specific state during the iterative process. Based on this evaluation, the corresponding reward value R is computed and utilized for updating the Q-values in the SARSA learning algorithm. The mathematical expression of the reward function R is presented in Equation (24).

$$R = \begin{cases} -2, & f(X_{i,new1}^{t+1}) < f(X_{i,best}^{t}) \\ -1, & f(X_{i,new1}^{t+1}) < f(X_{i,sbest}^{t}) \\ 0, & \text{otherwise} \end{cases} \tag{24}$$
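The piecewise reward and the Q-value update it feeds can be sketched as follows. The reward values are copied as printed in Equation (24), with the cases checked in order. The update is the standard on-policy SARSA rule, Q(s, a) ← Q(s, a) + α[R + γQ(s′, a′) − Q(s, a)], where α and γ are the learning rate and discount factor of Table 5. The dictionary Q-table, the string state labels, and the function names are assumptions made for illustration.

```python
from typing import Dict, Hashable, Tuple

QTable = Dict[Tuple[Hashable, str], float]


def reward(f_new: float, f_best: float, f_sbest: float) -> float:
    """Reward values as printed in Equation (24); cases are tested in order."""
    if f_new < f_best:
        return -2.0
    if f_new < f_sbest:
        return -1.0
    return 0.0


def sarsa_update(
    q: QTable,
    s: Hashable,
    a: str,
    r: float,
    s_next: Hashable,
    a_next: str,
    alpha: float,
    gamma: float,
) -> float:
    """Standard SARSA update; missing Q-table entries are treated as 0."""
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


# Hypothetical fitness values: the new individual beats the unit's best.
q: QTable = {}
r = reward(f_new=6.3, f_best=6.4, f_sbest=6.6)
new_q = sarsa_update(q, "s0", "a3", r, "s1", "a1", alpha=0.1, gamma=0.9)
```

Because the update uses the action actually chosen in the next state (a′) rather than the best available one, it stays consistent with the on-policy character of SARSA described above.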

5. Experimental Validation

This section aims to validate the effectiveness of the proposed SARSA-TTAO algorithm in addressing the energy-efficient scheduling problem for M-AGV fleets through simulation experiments. All algorithms were implemented in MATLAB R2022b and executed on a computer equipped with a 13th Gen Intel(R) Core(TM) i5-13500H 2.60 GHz processor, 16.0 GB RAM, and a 64-bit Windows 11 operating system.

5.1. Effectiveness Verification

To evaluate the effectiveness of the proposed algorithm, this study employs benchmark test functions from the CEC2017 suite. Specifically, three representative functions were selected: the unimodal function F1, the simple multimodal function F4, and the hybrid function F11, with the problem dimension set to 30. The selected benchmark functions encompass a range of typical scenarios observed in real-world energy-efficient scheduling problems for M-AGVs, from simple task structures and localized complexity to high-dimensional, multi-constraint coupling. They are intended to systematically evaluate the algorithm’s local search accuracy, global optimization capability, and its adaptability and robustness in high-dimensional nonlinear problem settings.

The algorithms used for comparison are consistent with those selected in Section 5.3, including the Genetic Algorithm (GA) [21], Particle Swarm Optimization (PSO) [29], and the Hybrid Genetic Algorithm with Large Neighborhood Search (GA-LNS) [18]. To ensure fairness, each algorithm was independently run 20 times on each test function, and the average solution value and average computational time were recorded. Furthermore, to ensure a comparable number of fitness function evaluations across algorithms, the relationship between the number of evaluations, population size, and the number of iterations was considered during parameter setting. Specifically, for the SARSA-TTAO algorithm, the population size was set to N and the maximum number of iterations to M. Accordingly, the population sizes of PSO and GA were set to 4N/3, with a maximum of M iterations. For GA-LNS, the population size was set to N, with an inner neighborhood search loop of 4N/3 iterations and M outer iterations. In this study, N and M were set to 60 and 300, respectively. The results of the experiments are presented in Figure 9 and Table 2.
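The matching of evaluation budgets for GA and PSO can be checked with a short calculation. The sketch below assumes one fitness evaluation per individual per iteration for these two algorithms; the function name `matched_budget` is hypothetical, and the accounting for GA-LNS (which also has an inner search loop) is not reproduced.

```python
from typing import Tuple


def matched_budget(n: int, m: int) -> Tuple[int, int]:
    """Return (population size 4N/3, total evaluations over M iterations)."""
    if (4 * n) % 3 != 0:
        raise ValueError("4N/3 must be an integer population size")
    population = 4 * n // 3
    return population, population * m


# Settings reported for the benchmark tests: N = 60, M = 300.
population, evaluations = matched_budget(60, 300)
# population == 80, evaluations == 24000
```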

Based on the results shown in Figure 9 and Table 2, the proposed SARSA-TTAO algorithm demonstrates clear performance advantages across all three benchmark functions (F1, F4, and F11), which differ in complexity. The algorithm converges rapidly to high-quality solutions in the early stages of iteration and remains relatively stable thereafter. Specifically, for all three functions, SARSA-TTAO approaches near-optimal solutions within approximately 25 iterations. Moreover, it consistently outperforms GA-LNS, GA, and PSO in terms of solution quality, and for the highly complex F11 function it reaches the global optimum (reported as 0 in Table 2) and demonstrates a high degree of robustness. Although SARSA-TTAO requires more computational time than GA and GA-LNS (0.47 to 0.54 s, compared with 0.22 to 0.49 s), its runtime is lower than that of PSO and remains under 1 s, suggesting good practical feasibility. These results collectively verify the effectiveness of the SARSA-TTAO algorithm.

5.2. Experimental Setup

Research on M-AGV fleet scheduling remains limited, and there is a lack of publicly available benchmark instances. To evaluate the performance of the SARSA-TTAO algorithm on this problem, this study simulates a workshop environment with 100 workstations having material demands. Three test instances of different scales (small, medium, and large) were generated by randomly selecting 20, 30, and 50 workstations, respectively. Each workstation node includes location coordinates $(x_{i}, y_{i})$ in meters and a material demand $r_{i}$ in kilograms. The detailed configuration of the small-scale case with n = 20 is shown in Table 3.
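The instance construction can be sketched in Python as follows. The three named nodes are copied from Table 3, and the distance is the Manhattan distance $d_{ij}$ defined in Table 1. The 100-workstation pool, its coordinate and demand ranges, the seeds, and all identifiers (`Workstation`, `manhattan`, `sample_instance`) are hypothetical, since the full pool used in the study is not listed.

```python
import random
from typing import List, NamedTuple


class Workstation(NamedTuple):
    node: int
    x: float       # meters
    y: float       # meters
    demand: float  # kilograms


def manhattan(a: Workstation, b: Workstation) -> float:
    """Manhattan distance d_ij between two nodes."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def sample_instance(pool: List[Workstation], n: int, seed: int = 0) -> List[Workstation]:
    """Randomly select n distinct workstations from the pool."""
    return random.Random(seed).sample(pool, n)


# Nodes 1, 2, and 7 as listed in Table 3.
n1 = Workstation(1, 48, 48, 86)
n2 = Workstation(2, 48, 32, 151)
n7 = Workstation(7, 5, 21, 132)
d12 = manhattan(n1, n2)  # |48 - 48| + |48 - 32| = 16
d17 = manhattan(n1, n7)  # |48 - 5| + |48 - 21| = 70

# Hypothetical pool of 100 workstations (ranges chosen for illustration only).
rng = random.Random(42)
pool = [
    Workstation(i, rng.randint(0, 70), rng.randint(0, 50), rng.randint(30, 160))
    for i in range(1, 101)
]
small = sample_instance(pool, 20)
```

Calling `sample_instance(pool, 30)` or `sample_instance(pool, 50)` would give the medium- and large-scale counterparts under the same assumptions.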

Two types of M-AGVs with different payload capacities and self-weights are used in the experimental validation. The AGV parameters are derived from the official technical specifications of Geek+ products and the related literature [3,18], as shown in Table 4. Based on these two AGV types, this study conducts three simulation experiments: (1) energy-efficient scheduling with a heterogeneous M-AGV fleet, (2) with a homogeneous light-load AGV fleet, and (3) with a homogeneous heavy-load AGV fleet. The experiments were conducted to validate the effectiveness of the proposed SARSA-TTAO algorithm and to assess the application potential of heterogeneous and homogeneous AGV fleets in terms of optimal energy consumption and vehicle usage statistics.

5.3. Algorithm Parameter Settings

In the experiments, the proposed SARSA-TTAO algorithm is compared with three widely used algorithms in the field of M-AGV scheduling: GA [21], PSO [29], and GA-LNS [18]. The six key parameters of the SARSA-TTAO algorithm were configured based on the results of Taguchi experiments, with the candidate levels of each parameter presented in Table 5. Based on these levels, 27 distinct parameter combinations were designed, and each was independently executed 20 times on the small-, medium-, and large-scale instances for Taguchi analysis. The final parameter design scheme is summarized in Table 6.

To ensure fair comparisons, the number of fitness evaluations is fixed across all algorithms, following the same approach used in Section 5.1.

Since the objective function in this study aims to minimize energy consumption, the “smaller-the-better” signal-to-noise (S/N) ratio formula was adopted in the Taguchi experiments. Taking the energy-efficient scheduling problem of heterogeneous multi-load AGVs as an example, the S/N response diagrams of parameter factors for the SARSA-TTAO algorithm are shown in Figure 10.
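For reference, the standard smaller-the-better S/N ratio in Taguchi analysis is $S/N = -10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} y_{i}^{2}\right)$, where $y_{i}$ is the response of the i-th repetition. The sketch below evaluates this standard form; the function name and the sample energy values are hypothetical and are not results from the study.

```python
import math
from typing import Sequence


def sn_smaller_the_better(responses: Sequence[float]) -> float:
    """Smaller-the-better S/N ratio in decibels for positive responses."""
    if not responses:
        raise ValueError("at least one response is required")
    mean_square = sum(y * y for y in responses) / len(responses)
    return -10.0 * math.log10(mean_square)


# Hypothetical energy values (kJ) from repeated runs of two parameter settings.
sn_low_energy = sn_smaller_the_better([6.4, 6.5, 6.6])
sn_high_energy = sn_smaller_the_better([8.4, 8.5, 8.6])
```

A setting with lower energy consumption yields a larger S/N ratio, so the preferred level of each factor is the one with the highest mean S/N value.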

As shown in the figure, the signal-to-noise (S/N) ratio values obtained from the Taguchi experiments exhibit relatively small variations across heterogeneous M-AGV scheduling problems of three different scales. In addition, the results from the Taguchi experiments on homogeneous light-load and heavy-load M-AGV scheduling problems display a consistent trend, indicating that the algorithm has low sensitivity to parameter perturbations and demonstrates strong robustness.

5.4. Comparative Experiments

Each test case was independently run 20 times at each scale, and the average values were used to reduce the influence of random error. Figure 11, Figure 12 and Figure 13 illustrate the iteration trends of the four algorithms (SARSA-TTAO, GA-LNS, GA, and PSO) when solving the energy-efficient scheduling problems for heterogeneous, homogeneous light-load, and homogeneous heavy-load M-AGV fleets, respectively. All experiments include small-, medium-, and large-scale instances. Table 7 presents the best energy consumption results (in kilojoules) for each problem type and the corresponding vehicle usage.

As seen in Figure 11, Figure 12 and Figure 13, the proposed SARSA-TTAO algorithm exhibits superior performance in both heterogeneous and homogeneous M-AGV scheduling problems. In terms of solution quality, SARSA-TTAO consistently achieves the best results across all instance scales, demonstrating good global search capability. In terms of convergence speed, SARSA-TTAO typically converges faster to better solutions in the early stages of iteration, reflecting its high search efficiency. In terms of stability, SARSA-TTAO displays a relatively smooth downward trend across most iterations, indicating consistent performance. Overall, SARSA-TTAO shows distinct advantages in key performance indicators such as solution quality and convergence speed, validating its effectiveness and reliability in solving complex M-AGV fleet scheduling problems. These findings provide algorithmic support for intelligent AGV scheduling in real-world industrial applications.

According to the data presented in Table 7, the heterogeneous M-AGV fleet consumes less energy than the homogeneous fleets under the same task requirements and environmental conditions. In the small-, medium-, and large-scale scenarios, the energy consumption of the heterogeneous fleet is reduced by 24.1%, 29.7%, and 22.0%, respectively, compared to the homogeneous fleet composed of light-load M-AGVs, and by 2.2%, 15.8%, and 15.5%, respectively, compared to the homogeneous heavy-load fleet. These results indicate the strong potential of heterogeneous M-AGVs in energy optimization and confirm the positive role of fleet heterogeneity in promoting energy efficiency. It is also observed that the advantage of the heterogeneous fleet over the heavy-load fleet is less pronounced in the small-scale scenario, which may be attributed to the underutilization of the synergy between different types of AGVs within the heterogeneous fleet.
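The percentage reductions can be recomputed directly from the energy values in Table 7 as (baseline − heterogeneous) / baseline × 100. The following sketch does this with the tabulated values; the dictionary layout and the names `ENERGY_KJ` and `reduction_pct` are illustrative choices.

```python
from typing import Dict

# Optimal energy consumption (kJ) copied from Table 7.
ENERGY_KJ: Dict[str, Dict[str, float]] = {
    "small": {"hetero": 6.412, "light": 8.443, "heavy": 6.557},
    "medium": {"hetero": 10.385, "light": 14.773, "heavy": 12.336},
    "large": {"hetero": 18.916, "light": 24.236, "heavy": 22.392},
}


def reduction_pct(hetero: float, baseline: float) -> float:
    """Relative energy saving of the heterogeneous fleet, in percent."""
    return 100.0 * (baseline - hetero) / baseline


reductions = {
    scale: {
        "vs_light": round(reduction_pct(e["hetero"], e["light"]), 1),
        "vs_heavy": round(reduction_pct(e["hetero"], e["heavy"]), 1),
    }
    for scale, e in ENERGY_KJ.items()
}

for scale, r in reductions.items():
    print(f"{scale}: {r['vs_light']}% vs light-load, {r['vs_heavy']}% vs heavy-load")
```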

Regarding the total number of vehicles used, the homogeneous heavy-load fleet performs best, followed closely by the heterogeneous fleet. In contrast, the homogeneous light-load fleet consistently requires a significantly larger number of vehicles. This finding suggests that using heavy-load or heterogeneous fleets can effectively reduce traffic complexity in the shop floor logistics system, decrease the likelihood of path congestion, and improve the overall material handling efficiency. Notably, in the large-scale scenario, the advantages of the heterogeneous fleet become more apparent; although its total vehicle usage is slightly higher than that of the homogeneous heavy-load fleet, it uses fewer heavy-load vehicles. This implies that in practical workshop applications, heterogeneous fleets could reduce equipment procurement costs by minimizing the number of heavy-load AGVs needed.

6. Conclusions

This study focuses on the energy-efficient scheduling of M-AGV fleets within workshop material handling systems. A multi-factor energy consumption model is developed with the objective of minimizing total energy consumption, which incorporates the coupled effects of AGV driving states (i.e., speed and distance) and real-time payload variations, thereby enhancing the model’s practical applicability. To efficiently address this problem, a novel SARSA-TTAO algorithm is proposed. This algorithm integrates multiple generic aggregation strategies to increase population diversity and incorporates SARSA’s dynamic decision-making mechanism for strategy selection, which effectively balances exploration and exploitation while avoiding premature convergence. A series of simulation experiments involving AGV fleets of different sizes and configurations was conducted to evaluate the algorithm. The results demonstrate that the proposed SARSA-TTAO algorithm significantly outperforms GA-LNS, GA, and PSO in terms of convergence and solution quality. Furthermore, the experiments reveal the dual advantages of heterogeneous M-AGV fleets in both energy consumption and the number of vehicles used. However, in industrial practice, practical limitations such as insufficient charging infrastructure and dynamic demand fluctuations, along with industrial constraints like time window requirements, path conflicts, and equipment failures, may restrict the model’s applicability. Therefore, future research will further explore the scheduling of heterogeneous M-AGVs under multiple real-world constraints, including battery limitations, time windows, path conflicts, and equipment failures, and will validate the practicality and robustness of the SARSA-TTAO algorithm in more complex scheduling scenarios. This effort aims to provide theoretical support and technical guidance for the development of green and energy-efficient AGV-based material handling systems.

Author Contributions

Conceptualization, H.T. and Y.Z.; data curation, H.W.; methodology, H.T., H.W., Y.Z., and X.X.; software, H.T.; validation, H.W.; writing—original draft, H.T. and H.W.; writing—review and editing, Y.Z. and X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key R&D Project in Zhejiang Province under Grant 2023C01063.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this study are either generated through simulation or derived from publicly available sources.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study.

Abbreviations

The following abbreviations are used in this manuscript:

SARSA	State–Action–Reward–State–Action Learning Algorithm
TTAO	Triangulation Topology Aggregation Optimizer
M-AGV	Multi-load Automated Guided Vehicle
GA	Genetic Algorithm
PSO	Particle Swarm Optimization
GA-LNS	Hybrid Genetic Algorithm with Large Neighborhood Search

References

1. Hu, H.; Jia, X.; He, Q.; Fu, S.; Liu, K. Deep Reinforcement Learning Based AGVs Real-Time Scheduling with Mixed Rule for Flexible Shop Floor in Industry 4.0. Comput. Ind. Eng. 2020, 149, 106749.
2. Barak, S.; Moghdani, R.; Maghsoudlou, H. Energy-Efficient Multi-Objective Flexible Manufacturing Scheduling. J. Clean. Prod. 2021, 283, 124610.
3. Zhang, Z.; Wu, L.; Zhang, W.; Peng, T.; Zheng, J. Energy-Efficient Path Planning for a Single-Load Automated Guided Vehicle in a Manufacturing Workshop. Comput. Ind. Eng. 2021, 158, 107397.
4. Dang, Q.-V.; Singh, N.; Adan, I.; Martagan, T.; Van De Sande, D. Scheduling Heterogeneous Multi-Load AGVs with Battery Constraints. Comput. Oper. Res. 2021, 136, 105517.
5. Briand, C.; He, Y.; Ngueveu, S.U. Energy-Efficient Planning for Supplying Assembly Lines with Vehicles. EURO J. Transp. Logist. 2018, 7, 387–414.
6. Bányai, T. Optimization of Material Supply in Smart Manufacturing Environment: A Metaheuristic Approach for Matrix Production. Machines 2021, 9, 220.
7. Huang, Y.; Zhao, L.; Zhou, B. An Adaptive Melody Search Algorithm Based on Low-Level Heuristics for Material Feeding Scheduling Optimization in a Hybrid Kitting System. Adv. Eng. Inform. 2024, 62, 102855.
8. Huo, X.; He, X.; Xiong, Z.; Wu, X. Multi-Objective Optimization for Scheduling Multi-Load Automated Guided Vehicles with Consideration of Energy Consumption. Transp. Res. Part C Emerg. Technol. 2024, 161, 104548.
9. Gürel, S.; Gultekin, H.; Akhlaghi, V.E. Energy Conscious Scheduling of a Material Handling Robot in a Manufacturing Cell. Robot. Comput. Integr. Manuf. 2019, 58, 97–108.
10. Singh, N.; Dang, Q.-V.; Akcay, A.; Adan, I.; Martagan, T. A Matheuristic for AGV Scheduling with Battery Constraints. Eur. J. Oper. Res. 2022, 298, 855–873.
11. Li, J.; Cheng, W.; Lai, K.K.; Ram, B. Multi-AGV Flexible Manufacturing Cell Scheduling Considering Charging. Mathematics 2022, 10, 3417.
12. Li, G.; Li, X.; Gao, L.; Zeng, B. Tasks Assigning and Sequencing of Multiple AGVs Based on an Improved Harmony Search Algorithm. J. Ambient Intell. Human Comput. 2019, 10, 4533–4546.
13. Liu, Z.; Park, M.; Bae, J. A Heuristic for Multiple Heterogeneous Mobile Robots Task Assignment under Various Loading Conditions Considering Workload Balance. In Proceedings of the 2023 IEEE International Conference on Electro Information Technology, Romeoville, IL, USA, 18 May 2023; pp. 294–299.
14. Zacharia, P.; Drosos, C.; Piromalis, D.; Papoutsidakis, M. The Vehicle Routing Problem with Fuzzy Payloads Considering Fuel Consumption. Appl. Artif. Intell. 2021, 35, 1755–1776.
15. Zhou, B.; He, Z. A Novel Hybrid-Load AGV for JIT-Based Sustainable Material Handling Scheduling with Time Window in Mixed-Model Assembly Line. Int. J. Prod. Res. 2023, 61, 796–817.
16. Zhou, B.; Zhao, L. A Quantum-Inspired Archimedes Optimization Algorithm for Hybrid-Load Autonomous Guided Vehicle Scheduling Problem. Appl. Intell. 2023, 53, 27725–27778.
17. Zhou, B.; Wen, M. A Mutli-Objective Artificial Electric Field Algorithm with Reinforcement Learning for Milk-Run Assembly Line Feeding and Scheduling Problem. Comput. Ind. Eng. 2024, 190, 110080.
18. Gao, J.; Zheng, X.; Gao, F.; Tong, X.; Han, Q. Heterogeneous Multitype Fleet Green Vehicle Path Planning of Automated Guided Vehicle with Time Windows in Flexible Manufacturing System. Machines 2022, 10, 197.
19. Ho, Y.-C.; Chien, S.-H. A Simulation Study on the Performance of Task-Determination Rules and Delivery-Dispatching Rules for Multiple-Load AGVs. Int. J. Prod. Res. 2006, 44, 4193–4222.
20. Ho, Y.-C.; Liu, H.-C. The Performance of Load-Selection Rules and Pickup-Dispatching Rules for Multiple-Load AGVs. J. Manuf. Syst. 2009, 28, 1–10.
21. Xu, R.; Yan, L.; Li, Y.; Jie, B. Research on Multi-Load AGV Scheduling Based on Improved Genetic Algorithm. In Proceedings of the 2023 6th International Conference on Computer Network, Electronic and Automation, Xi’an, China, 22–24 September 2023; pp. 110–114.
22. Yao, G.; Zhang, N.; Duan, Z.; Tian, C. Improved SARSA and DQN Algorithms for Reinforcement Learning. Theor. Comput. Sci. 2025, 1027, 115025.
23. Bouhamed, O.; Ghazzai, H.; Besbes, H.; Massoud, Y. Q-Learning Based Routing Scheduling For a Multi-Task Autonomous Agent. In Proceedings of the 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS), Dallas, TX, USA, 4–7 August 2019; pp. 634–637.
24. Zhao, S.; Zhang, T.; Cai, L.; Yang, R. Triangulation Topology Aggregation Optimizer: A Novel Mathematics-Based Meta-Heuristic Algorithm for Continuous Optimization and Engineering Applications. Expert Syst. Appl. 2024, 238, 121744.
25. Zhang, S.; Gajpal, Y.; Appadoo, S.S.; Abdulkader, M.M.S. Electric Vehicle Routing Problem with Recharging Stations for Minimizing Energy Consumption. Int. J. Prod. Econ. 2018, 203, 404–413.
26. Leng, J.; Peng, J.; Liu, J.; Zhang, Y.; Ji, J.; Zhang, Y. Profiling Power Consumption in Low-Speed Autonomous Guided Vehicles. IEEE Robot. Autom. Lett. 2024, 9, 6027–6034.
27. Dahou, A.; Abd Elaziz, M.; Mohamed, H.; Dahou, A.H.; Al-qaness, M.A.A.; Ghetas, M.; Ewess, A.; Zheng, Z. Linguistic Feature Fusion for Arabic Fake News Detection and Named Entity Recognition Using Reinforcement Learning and Swarm Optimization. Neurocomputing 2024, 598, 128078.
28. Mondal, A.; Mishra, D.; Prasad, G.; Hossain, A. Joint Optimization Framework for Minimization of Device Energy Consumption in Transmission Rate Constrained UAV-Assisted IoT Network. IEEE Internet Things J. 2022, 9, 9591–9607.
29. Qiu, L.; Wang, J.; Chen, W.; Wang, H. Heterogeneous AGV Routing Problem Considering Energy Consumption. In Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics, Zhuhai, China, 6–9 December 2015; pp. 1894–1899.

Figure 1. Simplified manufacturing workshop layout.

Figure 2. Flowchart of the scheduling process for heterogeneous M-AGVs.

Figure 3. Flowchart of the scheduling process for homogeneous M-AGVs.

Figure 4. Encoding process of the heterogeneous M-AGVs scheduling problem.

Figure 5. Decoding process of the heterogeneous M-AGV scheduling problem.

Figure 6. Flowchart of the TTAO Algorithm.

Figure 7. Triangular topology unit in the TTAO Algorithm.

Figure 8. Framework diagram of the SARSA-TTAO algorithm.

Figure 9. Comparison of the performance of four algorithms on benchmark test functions. (a) Computation results on the unimodal function F1; (b) computation results on the simple multimodal function F4; (c) computation results on the hybrid function F11.

Figure 10. Signal-to-noise (S/N) response diagrams for the heterogeneous M-AGV scheduling problem. Panels (a–c) present the experimental results on small-, medium-, and large-scale instances, respectively.

Figure 11. Performance comparison of different algorithms for energy-efficient scheduling of heterogeneous M-AGVs. Panels (a–c) present the experimental results on small-, medium-, and large-scale instances, respectively.

Figure 12. Performance comparison of different algorithms for energy-efficient scheduling of homogeneous (light-load) M-AGVs. Panels (a–c) present the experimental results on small-, medium-, and large-scale instances, respectively.

Figure 13. Performance comparison of different algorithms for energy-efficient scheduling of homogeneous (heavy-load) M-AGVs. Panels (a–c) present the experimental results on small-, medium-, and large-scale instances, respectively.

Table 1. Key notations.

Notations	Definitions
Indices:
$i, j$	Distribution center and workstation node indices, $(i, j = 0,1, 2, \dots, n)$
$k$	Vehicle type index, $(k = 1,2, \dots, m)$
Parameters:
$r_{i}$	Material requirement at node $i$
$w_{k}$	Maximum capacity limit of vehicle $k$
$m_{k}$	Own weight of vehicle $k$
$R_{k}$	Mean turning radius of vehicle $k$
$a$	Vehicle acceleration
$C_{r}$	Coefficient of rolling friction of vehicle
$η_{k}$	Power factor of vehicle $k$
$v_{0}$	Initial speed of the vehicle
$v_{m a x}$	Maximum vehicle speed
Variables:
$w_{i j}^{k} (t)$	Actual load of vehicle $k$ from workstation $i$ to workstation $j$
$n_{t_{1}}^{k}$	Number of 90-degree turns during the route of vehicle $k$
$n_{t_{2}}^{k}$	Number of 180-degree turns during the route of vehicle $k$
$F_{i j}^{k}$	Travel resistance of vehicle $k$ from workstation $i$ to $j$
$d_{i j}$	Manhattan distance between node $i$ and node $j$
$V (t)$	Vehicle travel speed
$t_{a c c}$	Time spent on vehicle acceleration
$t_{d e c}$	Time spent on vehicle deceleration
$d_{u n i}^{i j}$	The distance traveled by the vehicle from node $i$ to node $j$ at uniform speed
$d_{a c c}^{i j}$	Accelerated distance traveled by the vehicle from node $i$ to node $j$
$E_{a c c}$	Total energy consumption of all vehicles during the acceleration phase
$E_{d e c}$	Total energy consumption of all vehicles during the deceleration phase
$E_{u n i}$	Total energy consumption of all vehicles during the uniform driving phase
$E_{t u r}$	Total energy consumption during turning for all vehicles
$E_{s u m}$	Total energy consumption of all AGVs

Table 2. Comparison of optimal solutions for test functions calculated by four algorithms.

		SARSA-TTAO	GA-LNS	GA	PSO
F1	Value	6.10 × 10⁻⁸³	6.20 × 10⁻¹⁵	5.22	0.04
F1	Time	0.47	0.38	0.22	0.76
F4	Value	7.32 × 10⁻³⁹	1.67	7.09	2.86
F4	Time	0.47	0.42	0.22	0.72
F11	Value	0	0.01	1.251	0.39
F11	Time	0.54	0.49	0.27	0.71

Table 3. Data for small-scale experiments.

Node NO.	Location	Material Demand	Node NO.	Location	Material Demand
1	(48, 48)	86	11	(42, 37)	110
2	(48, 32)	151	12	(24, 21)	98
3	(24, 42)	73	13	(54, 21)	123
4	(54, 16)	89	14	(61, 21)	119
5	(24, 26)	70	15	(11, 37)	161
6	(48, 37)	100	16	(67, 10)	144
7	(5, 21)	132	17	(42, 32)	55
8	(61, 26)	34	18	(36, 5)	30
9	(5, 37)	129	19	(17, 10)	154
10	(5, 48)	101	20	(54, 42)	77

Table 4. Parameters of M-AGV.

Parameters	Light-Load Vehicles	Heavy-Load Vehicles
Sizes	740 × 500 × 210 mm	1100 × 700 × 210 mm
Maximum capacity limit ( $w_{k}$ )	200 kg	600 kg
Own weight of vehicle ( $m_{k}$ )	124 kg	175 kg
Turning radius ( $R_{k}$ )	350 mm	450 mm
Turning speed	90°/1.5 s, 180°/2 s	90°/1.5 s, 180°/2 s
Power factor ( $η_{k}$ )	95	90
Maximum vehicle speed ( $v_{m a x}$ )	1.5 m/s
Vehicle acceleration ( $a$ )	1.5 m/s²
Coefficient of rolling friction ( $C_{r}$ )	0.04

Table 5. Parameter range settings in the Taguchi experiment.

Parameter Description	Parameter Symbol	Value Range
Number of Iterations	M	[150, 200, 300]
Population Size	N	[240, 270, 300]
Retention Size of Individuals in the Proposed Strategy	NA	[N/4, N/3, N/2]
Learning Rate	$α$	[0.001, 0.01, 0.1]
Discount Factor	$γ$	[0.9, 0.95, 0.99]
Exploration Factor	$ε$	[0.1, 0.3, 0.5]

Table 6. Parameter design of the SARSA-TTAO algorithm for simulation experiments.

Type of Fleet	Example Scale	Parameter Value
Heterogeneous (Light- and Heavy-Load)	Small	$M = 300, N = 270, N A = N / 3, α = 0.1, γ = 0.9, ε = 0.5$
	Medium	$M = 300, N = 300, N A = N / 3, α = 0.01, γ = 0.95, ε = 0.5$
	Large	$M = 300, N = 300, N A = N / 3, α = 0.001, γ = 0.99, ε = 0.3$
Homogeneous Light-Load (W = 200 kg)	Small	$M = 300, N = 300, N A = N / 3, α = 0.1, γ = 0.95, ε = 0.1$
	Medium	$M = 200, N = 270, N A = N / 2, α = 0.001, γ = 0.95, ε = 0.3$
	Large	$M = 300, N = 270, N A = N / 2, α = 0.1, γ = 0.9, ε = 0.3$
Homogeneous Heavy-Load (W = 600 kg)	Small	$M = 300, N = 240, N A = N / 2, α = 0.01, γ = 0.9, ε = 0.5$
	Medium	$M = 300, N = 300, N A = N / 2, α = 0.1, γ = 0.99, ε = 0.3$
	Large	$M = 200, N = 300, N A = N / 2, α = 0.01, γ = 0.9, ε = 0.3$

Table 7. Optimal scheduling results and vehicle usage statistics for heterogeneous and homogeneous M-AGVs.

Example	Fleet Composition	Optimal Energy Consumption (kJ)	Light-Load Vehicle Usage (Times)	Heavy-Load Vehicle Usage (Times)	Total Vehicle Usage (Times)
Small Scale (N = 20)	Heterogeneous	6.412	1	4	5
	Homogeneous—Light Load	8.443	12	-	12
	Homogeneous—Heavy Load	6.557	-	4	4
Medium Scale (N = 30)	Heterogeneous	10.385	3	6	9
	Homogeneous—Light Load	14.773	19	-	19
	Homogeneous—Heavy Load	12.336	-	6	6
Large Scale (N = 50)	Heterogeneous	18.916	5	9	14
	Homogeneous—Light Load	24.236	34	-	34
	Homogeneous—Heavy Load	22.392	-	11	11

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
