1. Introduction
In the context of Industry 4.0 and green manufacturing, production systems are rapidly evolving toward greater flexibility and intelligence, driving the widespread adoption of Automated Guided Vehicles (AGVs) in the manufacturing sector. As a result, the energy consumption of material handling systems has gradually emerged as a critical bottleneck hindering the advancement of green manufacturing [1,2,3]. As AGVs have been continuously iterated and upgraded, their material handling flexibility has improved significantly and their configurations have become increasingly diverse, which introduces new challenges for scheduling strategies and energy optimization.
In the development of AGV systems, multi-load capacity and fleet heterogeneity offer new avenues for reducing energy consumption and improving operational efficiency [4]. Compared with single-load AGV systems, multi-load AGVs (M-AGVs) can significantly reduce the number of vehicle trips, mitigate traffic conflicts, and lower system energy consumption through task aggregation scheduling [5,6]. Furthermore, heterogeneous fleets, which combine M-AGVs with different load capacities and performance parameters, can match task requirements more precisely, thus enhancing transportation flexibility and resource utilization [4]. However, these features also substantially increase the complexity of task scheduling, placing higher demands on energy consumption modeling and algorithm design.
To this end, this study focuses on the energy-efficient scheduling problem for M-AGV systems and establishes a multi-factor coupled energy consumption model that integrates AGV speed, travel distance, and dynamic load variations. A hybrid optimization algorithm is proposed that combines the SARSA (State–Action–Reward–State–Action) reinforcement learning mechanism with the search strategy of the Triangulation Topology Aggregation Optimizer (TTAO) to enhance search capability and convergence stability. The algorithm’s performance is evaluated on task scenarios of three different scales, each solved with three fleet configurations (one heterogeneous fleet and two homogeneous fleets). Furthermore, the differences in energy consumption and vehicle usage between heterogeneous and homogeneous fleets under energy-optimal conditions are analyzed.
The paper is organized as follows. Section 2 reviews related research. Section 3 defines the energy-efficient scheduling problem for M-AGVs and develops the energy consumption model. Section 4 presents the design of the optimization algorithm. Section 5 discusses the experimental study and algorithm performance. Finally, Section 6 provides concluding remarks and outlines directions for future research.
2. Literature Review
As a major component of overall production energy consumption, the energy usage of material handling systems has a profound impact on the sustainability of production logistics [3,7]. Existing studies have shown that the energy consumption of M-AGVs is significantly influenced by factors such as effective payload, travel speed, and travel distance. For example, Huo et al. showed with experimental data that each additional 1 kg of load increases the energy consumption rate of an M-AGV by approximately 9.2 × 10⁻⁶% per meter [8]. Gurel et al. indicated that the energy consumption of handling robots depends primarily on their operating speed, payload, and travel distance and, through case analysis, highlighted the significant impact of speed on energy usage [9]. In addition, Briand et al. modeled vehicle mass and effective payload as a time-varying function and incorporated it into the energy consumption model, thereby improving the model’s prediction accuracy [5].
However, most existing studies adopt simplified energy consumption models that often overlook the dynamic effects of real-time payload and of vehicle operating states such as speed variations [7,10,11]. In energy optimization for homogeneous M-AGV fleets, the high consistency of vehicle parameters has led research to focus primarily on the integrated optimization of task allocation and path planning [12,13] or on the joint scheduling of workshops and AGVs [14], while energy consumption is often modeled in a relatively simplified way. For example, Hang et al. proposed a bi-objective model for hybrid M-AGV scheduling that simultaneously optimizes task assignment and routing to reduce total energy consumption and equipment costs; however, the model assumes constant speed and fixed payload, which simplifies the energy formulation [7]. Zacharia et al. introduced a fuzzy effective load model for analyzing vehicle energy consumption, but it still assumes constant speed and therefore overlooks the dynamic effects of speed variation [14]. In contrast, energy optimization for heterogeneous M-AGV fleets is inherently more complex and challenging because of the diversity of vehicle performance parameters. Nevertheless, existing models in this area also tend to be overly simplified [15,16,17,18]. For example, Zhou and He addressed the sustainable material handling scheduling problem of mixed-load AGVs by optimizing task allocation across different vehicle types to reduce handling energy consumption [15]. Zhou and Zhao proposed a dual-assignment hybrid supply strategy for material delivery in mixed-model assembly lines, optimizing the path planning and scheduling of multi-load AGVs to reduce both line-side inventory and overall system energy consumption [16]. Dang et al. addressed the heterogeneous multi-load AGV scheduling problem with battery constraints and formulated a mixed-integer linear programming model to minimize delay and travel costs [4]. Although some models have begun to incorporate load variability, the omission of vehicle speed and routing considerations still limits the accuracy of energy predictions and the effectiveness of scheduling strategies.
Various optimization approaches have been applied to the scheduling of M-AGV material handling systems. However, two major limitations remain. First, most studies focus on homogeneous fleets, and the proposed algorithms are often inadequate for the more complex scheduling requirements of heterogeneous environments. Second, most methods rely on traditional heuristic and metaheuristic algorithms, which often exploit evolutionary information inefficiently and tend to become trapped in local optima. Traditional heuristic rules have been widely adopted in existing studies because of their low computational cost and ease of implementation. For example, Ho and Chien proposed task and delivery scheduling rules for M-AGVs and explored optimal rule combinations under various performance metrics [19]. Similarly, Ho and Liu developed multiple rules for load selection and pickup scheduling and identified the best combinations through simulation experiments [20]. However, these rule-based approaches lack flexibility in complex and dynamic environments and adapt poorly to real-time load variations. To overcome the limitations of traditional heuristics, metaheuristic algorithms, which simulate natural evolution or explore solution neighborhoods, have significantly improved the quality of scheduling solutions. For instance, Xu et al. integrated local search with a Genetic Algorithm to minimize the travel distance of multi-load AGVs [21]; Huo et al. employed NSGA-II to solve a multi-objective scheduling model that minimizes both energy consumption and delay [8]; and Gao et al. introduced a Genetic Algorithm enhanced with large neighborhood search to address the green vehicle routing problem with time windows for a heterogeneous fleet [18]. Nevertheless, these metaheuristic approaches are still based on conventional single- or dual-population evolutionary frameworks, which are prone to premature convergence and limited in optimization efficiency.
Reinforcement learning (RL) dynamically optimizes policies through a state–action–reward mechanism to improve scheduling efficiency and has been widely applied to M-AGV scheduling problems. Among the commonly used temporal difference learning methods, SARSA is an on-policy approach that updates its estimates using feedback from the actions actually taken, which yields more stable convergence and makes it well suited to dynamically changing scheduling scenarios [22]. In contrast, Q-learning is an off-policy method whose updates are independent of the current behavior policy, making it more amenable to parallel training; however, it carries a higher risk of policy divergence in complex environments [23]. Although deep reinforcement learning methods such as Deep Q-Networks (DQN) have powerful feature representation capabilities, they typically incur high training costs and require large amounts of data, which limits their feasibility for deployment in industrial settings [1]. Recent studies have also integrated RL with metaheuristic algorithms to further enhance performance. For instance, Zhou et al. proposed a multi-objective Quantum-inspired Metaheuristic Archimedes Optimization Algorithm (QMQAOA) that combines Q-learning with the Archimedes Optimization Algorithm (AOA) and achieved promising reductions in line-side inventory and in the energy consumption of heterogeneous M-AGVs [16]. However, the AOA component of this algorithm still relies on single- or dual-population evolutionary models, which are susceptible to premature convergence and local optima.
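For reference, the on-policy/off-policy distinction described above corresponds to the standard tabular temporal-difference update rules below. Here α is the learning rate, γ the discount factor, r the reward, and a′ the action that the current behavior policy (for example, ε-greedy) actually selects in the next state s′; this is generic RL notation, not the notation of Table 1.

```latex
\begin{aligned}
\text{SARSA (on-policy):}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\left[\, r + \gamma\, Q(s',a') - Q(s,a) \,\right] \\
\text{Q-learning (off-policy):}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\left[\, r + \gamma \max_{b} Q(s',b) - Q(s,a) \,\right]
\end{aligned}
```

SARSA bootstraps from the value of the action it will actually execute, whereas Q-learning bootstraps from the greedy action in s′ regardless of what the behavior policy does.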
Therefore, to address the energy-efficient scheduling problem of M-AGV fleets, this study aims to minimize total energy consumption by constructing a multi-factor energy optimization model that considers the coupled effects of vehicle speed, effective payload, and travel distance. In addition, the Triangulation Topology Aggregation Optimizer (TTAO) [24] is introduced. TTAO integrates the geometric principle of similar triangles into the metaheuristic search process, overcoming limitations of conventional population-based evolutionary mechanisms and substantially improving the efficiency of generating individual solutions. Building on the TTAO framework, a similarity-based learning strategy is proposed to increase search diversity and strengthen the algorithm’s exploration capability. Furthermore, the SARSA learning algorithm is incorporated for strategy selection, guiding the search toward a better balance between exploration and exploitation and thereby improving the performance of TTAO on complex engineering problems.
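The following minimal Python sketch shows how a tabular SARSA agent can choose among search strategies inside an iterative optimizer. It is an illustration under stated assumptions, not the SARSA-TTAO implementation: the two-state encoding (whether the previous move improved the best solution), the two hypothetical strategies (large and small Gaussian perturbation steps), the reward values, and the toy sphere objective are all chosen only for demonstration.

```python
import random
from collections import defaultdict


class SarsaStrategySelector:
    """Tabular SARSA agent that picks one of several search strategies (actions)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(lambda: [0.0] * n_actions)  # state -> action values
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice under the current Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, s, a, r, s_next, a_next):
        """On-policy update: bootstrap from the action actually chosen in s_next."""
        target = r + self.gamma * self.q[s_next][a_next]
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def sphere(x):
    """Toy objective used only for this illustration (not the M-AGV energy model)."""
    return sum(v * v for v in x)


def toy_search(iterations=200, seed=0):
    rng = random.Random(seed)
    step_sizes = [1.0, 0.05]  # hypothetical strategies: 0 = wide exploration, 1 = local refinement
    best = [rng.uniform(-5.0, 5.0) for _ in range(5)]
    initial_f = best_f = sphere(best)
    selector = SarsaStrategySelector(len(step_sizes), seed=seed)
    state = 0  # state 0: last move did not improve; state 1: it did
    action = selector.choose(state)
    for _ in range(iterations):
        candidate = [v + rng.gauss(0.0, step_sizes[action]) for v in best]
        cand_f = sphere(candidate)
        improved = cand_f < best_f
        if improved:  # greedy acceptance: the best value never gets worse
            best, best_f = candidate, cand_f
        reward = 1.0 if improved else -0.1
        next_state = 1 if improved else 0
        next_action = selector.choose(next_state)
        selector.update(state, action, reward, next_state, next_action)
        state, action = next_state, next_action
    return initial_f, best_f, selector


initial_f, best_f, selector = toy_search()
print(best_f <= initial_f)  # True
```

In a full algorithm, the actions would correspond to the actual candidate search operators and the state to some measure of search progress; this sketch does not assume how SARSA-TTAO defines either.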
3. Problem Description and Mathematical Modelling
The scheduling problem for homogeneous M-AGV fleets can be regarded as a special case of that for heterogeneous fleets. Therefore, this study focuses on the energy-efficient scheduling of heterogeneous M-AGV fleets in production and manufacturing environments. The core objective is to minimize the total system energy consumption by optimizing vehicle assignment, task allocation, and path planning, while satisfying vehicle load constraints. The aim is to fully exploit the potential of M-AGV fleets to maximize material handling efficiency and reduce energy usage.
Section 3.1 provides a detailed description of the energy-efficient scheduling problem for heterogeneous M-AGVs along with its fundamental assumptions, Section 3.2 defines the notation used in the mathematical model, and Section 3.3 presents the formulation of the mathematical model.
3.1. Problem Description
The scheduling problem for heterogeneous M-AGV fleets involves the following elements: a distribution center (i.e., the material warehouse), a heterogeneous fleet of M-AGVs with different load capacities, and material request workstations (i.e., workstations within the workshop that require material delivery). A simplified layout of the manufacturing workshop is illustrated in Figure 1.
In this study, the distribution center and the n workstations with material demands are defined as a set of n + 1 nodes, denoted as P = {p_i | i = 0, 1, …, n}, where node p_0 represents the distribution center (also the pickup location for AGVs), and the remaining nodes constitute the set of material request workstations, denoted as P′ = P \ {p_0}. Each node p_i contains its location in the workshop (x_i, y_i) and its material demand r_i (with r_0 = 0 at the distribution center). A fleet of m heterogeneous M-AGVs with different load capacities is defined as W = {w_k | k = 1, …, m}, where w_k denotes the maximum load capacity of vehicle k.
Against this background, the scheduling problem of a heterogeneous multi-load AGV fleet involves assigning AGVs to execute delivery tasks, allocating specific material handling tasks to the assigned AGVs, and planning their delivery routes. The objective is to deliver the required materials from the distribution center to the designated workstations in sequence, while minimizing total energy consumption and satisfying vehicle load constraints. In the heterogeneous fleet, vehicle assignment is determined based on a priority rule that selects the AGV with the lowest unit energy consumption, which is calculated as follows:
In this study, time constraints are not considered. The scheduling flowcharts for heterogeneous and homogeneous M-AGV fleets are shown in Figure 2 and Figure 3, respectively.
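The node and fleet definitions above, together with the lowest-unit-energy priority rule, can be sketched in Python as follows. Because the unit energy consumption formula is not reproduced in this text, the sketch assumes each vehicle type carries a precomputed unit_energy value; the class names, field names, and numeric values are hypothetical and are not taken from Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Node p_i: coordinates (x_i, y_i) in m and material demand r_i in kg (r_0 = 0)."""
    x: float
    y: float
    demand: float


@dataclass(frozen=True)
class AgvType:
    """Vehicle type with maximum load w_k (kg) and a precomputed unit energy value."""
    name: str
    capacity: float
    unit_energy: float  # assumption: supplied by the caller, not computed from the paper's formula


def select_vehicle(types, required_load):
    """Return the feasible vehicle type with the lowest unit energy consumption."""
    feasible = [t for t in types if t.capacity >= required_load]
    if not feasible:
        raise ValueError("no vehicle type can carry the required load")
    return min(feasible, key=lambda t: t.unit_energy)


# Hypothetical values for illustration only (not the parameters in Table 4).
fleet = [AgvType("light", capacity=300.0, unit_energy=1.0),
         AgvType("heavy", capacity=600.0, unit_energy=1.4)]
trip = [Node(10.0, 5.0, 120.0), Node(20.0, 15.0, 180.0)]
load = sum(node.demand for node in trip)
print(select_vehicle(fleet, load).name)   # light (300 kg fits the light AGV)
print(select_vehicle(fleet, 400.0).name)  # heavy
```

Filtering by capacity before taking the minimum keeps the rule consistent with the load constraint: a cheaper vehicle is never chosen for a trip it cannot carry.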
The following assumptions are made in constructing the mathematical model for the scheduling problem:
(1) The M-AGV fleet consists of two types of M-AGV with different load capacities, and the number of available vehicles is always sufficient;
(2) Each loaded AGV departs from the distribution center, visits a series of workstations in a specific sequence to fulfill their material demands, and returns to the distribution center after completing all deliveries;
(3) The material demand of each workstation in a single delivery does not exceed the maximum load capacity of the larger AGV, and each workstation’s demand is fulfilled by a single AGV in a single trip;
(4) AGVs operate without breakdowns or path conflicts during the scheduling process;
(5) AGVs have sufficient battery capacity to complete all delivery tasks;
(6) All material deliveries are completed within their respective time windows;
(7) All aisles in the workshop are wide enough to accommodate both types of AGV.
3.2. Notation Definitions
The definitions of the key notations used in the mathematical model for the M-AGV fleet scheduling problem are provided in Table 1.
3.3. Energy Consumption Optimization Model
During AGV operation, mechanical power must overcome various forms of resistance, including rolling resistance, air resistance, gradient resistance, acceleration resistance, and gravitational effects [3,25]. However, in practical workshop environments the floor is typically level, air movement is minimal, and AGVs generally operate at low speeds, placing them in the category of low-speed guided vehicles. As a result, air resistance and gradient resistance have a negligible effect on energy consumption during AGV travel [26], and this study excludes them from the energy consumption model. Following Gao et al. (2022) [18], the AGV’s movement within the workshop is divided into four stages (acceleration, constant speed, deceleration, and turning), and the energy consumption of each stage is calculated separately. The detailed energy consumption model is presented below.
3.3.1. Acceleration Stage
This study establishes a multi-factor coupled energy consumption model that captures the coupled effects of changes in the M-AGV driving state (speed and travel distance) and in effective payload, as shown in Equation (3). In addition, following Briand et al. (2018) [5], the real-time total mass of materials carried by the M-AGV is modeled as a piecewise constant function of time and integrated into the energy consumption formulation.
where
denotes the self-weight of vehicle k,
represents the real-time load weight of vehicle k, and
is the distance traveled by the M-AGV during acceleration.
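As an illustration of the piecewise-constant load idea, the sketch below computes the load carried on each leg of a single trip and the work needed to accelerate from rest. It assumes a standard longitudinal-dynamics force model, F = (m0 + q)(a + μg), with a rolling-resistance coefficient μ and no drivetrain losses; this is an assumption for demonstration, not a reproduction of Equation (3), and the function and parameter names are hypothetical.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def load_profile(route_demands):
    """Load (kg) carried on each leg of a depot -> stations -> depot trip.

    The load is piecewise constant: it changes only when a station is served.
    """
    load = sum(route_demands)
    legs = [load]
    for demand in route_demands:
        load -= demand
        legs.append(load)
    return legs  # one value per leg; the final return leg carries nothing


def acceleration_energy(self_mass, load, accel, v_max, mu):
    """Work (J) to accelerate from rest to v_max, assuming F = (m0 + q) * (a + mu * g)."""
    s_acc = v_max ** 2 / (2 * accel)  # from v^2 = 2 a s
    return (self_mass + load) * (accel + mu * G) * s_acc


print(load_profile([100, 50, 30]))                      # [180, 80, 30, 0]
print(acceleration_energy(100.0, 0.0, 0.5, 1.0, 0.0))  # 50.0
```

With μ = 0 the result reduces to the kinetic energy ½mv² (here ½ · 100 · 1² = 50 J), which is a quick sanity check on the force model.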
3.3.2. Deceleration Stage
Since the kinetic energy of an AGV during deceleration is primarily dissipated through the braking system, with a sharp drop—or even complete cessation—of motor output, this study neglects the energy consumption during the deceleration phase. Accordingly,
is simplified to zero.
3.3.3. Constant-Speed Stage
Once the M-AGV accelerates to its maximum speed, it enters the constant-speed phase. In this study, the distance required for the M-AGV to reach its maximum speed is denoted as
, and the travel path within the workshop is simplified using the Manhattan distance
. Based on this, the energy consumption during the constant-speed phase—accounting for the multi-factor coupling effects—is calculated as follows.
where
and
represent the coordinates of workstations
and
, respectively;
is the distance traveled by the AGV at constant speed; and
is the distance traveled by the vehicle while accelerating to its maximum speed, derived from the kinematic relation between velocity and displacement (for uniform acceleration from rest, v² = 2as).
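A minimal sketch of the constant-speed stage under stated assumptions: the leg length is the Manhattan distance between two nodes, the acceleration and deceleration distances are both v_max²/(2a) (a symmetric speed profile), legs too short to reach v_max contribute no constant-speed distance, and the only resisting force at constant speed is rolling resistance μ(m0 + q)g. These modeling choices and the function names are assumptions, not the paper’s exact formulation.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def manhattan(p, q):
    """Manhattan distance |x_i - x_j| + |y_i - y_j| between two (x, y) points, in m."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def constant_speed_distance(d, accel, v_max):
    """Distance covered at v_max, assuming symmetric acceleration and deceleration."""
    s_acc = v_max ** 2 / (2 * accel)
    return max(d - 2 * s_acc, 0.0)  # assumption: short legs never reach v_max


def constant_speed_energy(self_mass, load, d_const, mu):
    """Work (J) against rolling resistance at constant speed: F = (m0 + q) * mu * g."""
    return (self_mass + load) * mu * G * d_const


d = manhattan((0.0, 0.0), (3.0, 4.0))
print(d)                                     # 7.0
print(constant_speed_distance(d, 0.5, 1.0))  # 5.0
```

The clamp at zero is one simple way to handle short legs; a triangular speed profile (accelerate, then immediately decelerate) would be a more faithful alternative.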
3.3.4. Turning Stage
This study treats the AGV’s turning process as constant-speed travel and calculates the vehicle’s turning displacement from the AGV’s average turning radius [3]. The energy consumed during turning is calculated as follows.
where
and
represent the number of
and
turns.
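Assuming the two turn counts refer to 90° and 180° turns (the symbols are not legible in this text), the turning displacement is the arc length on a circle with the average turning radius R, that is, πR/2 per 90° turn and πR per 180° turn. The sketch below treats that arc as constant-speed travel against rolling resistance, as the text describes; the force model and the function names are illustrative assumptions.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def turning_distance(radius, n_quarter, n_half):
    """Arc length (m): each 90-degree turn covers pi*R/2, each 180-degree turn covers pi*R."""
    return n_quarter * math.pi * radius / 2 + n_half * math.pi * radius


def turning_energy(self_mass, load, radius, n_quarter, n_half, mu):
    """Turning treated as constant-speed travel against rolling resistance (assumed model)."""
    return (self_mass + load) * mu * G * turning_distance(radius, n_quarter, n_half)


print(turning_distance(1.0, 2, 0))  # two 90-degree turns equal one half circle: pi
```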
3.3.5. Total Energy Consumption Model
Based on the analysis of the various energy consumption processes of M-AGVs, the total energy consumption during the operation of M-AGVs is ultimately calculated as follows.
In addition, the constraints to be satisfied for this problem are as follows:
Constraints (12) and (13) specify that the material delivery task for each workstation must be completed by a single vehicle in a single trip. Constraints (14) and (15) require that each vehicle departs from the distribution center and returns to it after completing deliveries to a sequence of workstations. Constraint (16) ensures that the total weight of materials transported in a single trip does not exceed the vehicle’s maximum load capacity. Constraint (17) stipulates that the material demand of any workstation must not exceed the vehicle’s real-time load upon arrival during the delivery process.
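The constraints described above can be checked for a candidate solution with a short routine such as the following. It is a hypothetical helper, not part of the paper’s model: routes are lists of node indices with 0 as the distribution center, and the check covers single assignment of each workstation, departure from and return to the depot, and the capacity limit. Constraint (17) holds automatically in this representation because each vehicle departs carrying exactly the total (non-negative) demand of its route.

```python
def check_solution(routes, capacities, demand):
    """Return constraint violations for a candidate schedule (empty list if feasible).

    routes[k]: node sequence of vehicle k, e.g. [0, 3, 1, 0], with 0 the depot p_0.
    capacities[k]: maximum load w_k of vehicle k (kg).
    demand[i]: material demand r_i of node i (kg), with demand[0] == 0.
    """
    problems = []
    visits = [0] * len(demand)
    for k, (route, cap) in enumerate(zip(routes, capacities)):
        # Each vehicle leaves the depot and returns to it.
        if len(route) < 2 or route[0] != 0 or route[-1] != 0:
            problems.append(f"vehicle {k}: route must start and end at the depot")
        stations = [i for i in route if i != 0]
        for i in stations:
            visits[i] += 1
        # Load carried on departure must not exceed the vehicle's capacity.
        load = sum(demand[i] for i in stations)
        if load > cap:
            problems.append(f"vehicle {k}: load {load} exceeds capacity {cap}")
    # Each workstation is served by exactly one vehicle in one trip.
    for i in range(1, len(demand)):
        if visits[i] != 1:
            problems.append(f"station {i} served {visits[i]} times, expected 1")
    return problems


demand = [0, 100, 200, 150]  # hypothetical r_i values
print(check_solution([[0, 1, 2, 0], [0, 3, 0]], [300, 300], demand))  # []
print(len(check_solution([[0, 1, 2, 3, 0]], [300], demand)))          # 1 (overload)
```

Returning a list of messages rather than a single boolean makes the helper usable both as a feasibility test and as a debugging aid when a decoding step produces invalid routes.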
5. Experimental Validation
This section aims to validate the effectiveness of the proposed SARSA-TTAO algorithm in addressing the energy-efficient scheduling problem for M-AGV fleets through simulation experiments. All algorithms were implemented in MATLAB R2022b and executed on a computer equipped with a 13th Gen Intel(R) Core(TM) i5-13500H 2.60 GHz processor, 16.0 GB RAM, and a 64-bit Windows 11 operating system.
5.1. Effectiveness Verification
To evaluate the effectiveness of the proposed algorithm, this study employs benchmark test functions from the CEC2017 suite. Specifically, three representative functions were selected: the unimodal function F1, the simple multimodal function F4, and the hybrid function F11, with the problem dimension set to 30. These functions are intended to represent typical characteristics of real-world energy-efficient scheduling problems for M-AGVs, ranging from simple task structures and localized complexity to high-dimensional coupling of multiple constraints. They are used to systematically evaluate the algorithm’s local search accuracy, global optimization capability, and adaptability and robustness in high-dimensional nonlinear problems.
The algorithms used for comparison are the same as those selected in Section 5.3: the Genetic Algorithm (GA) [21], Particle Swarm Optimization (PSO) [29], and the Hybrid Genetic Algorithm with Large Neighborhood Search (GA-LNS) [18]. To ensure fairness, each algorithm was run independently 20 times on each test function, and the average solution value and average computational time were recorded. Furthermore, to keep the number of fitness function evaluations comparable across algorithms, the relationship among the number of evaluations, the population size, and the number of iterations was taken into account when setting the parameters. Specifically, for the SARSA-TTAO algorithm, the population size was set to N and the maximum number of iterations to M. Accordingly, the population sizes of PSO and GA were set to 4N/3, with a maximum of M iterations. For GA-LNS, the population size was set to N, with an inner neighborhood search loop of 4N/3 iterations and M outer iterations. In this study, N and M were set to 60 and 300, respectively. The results of the experiments are presented in Figure 9 and Table 2.
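Assuming GA and PSO evaluate each individual once per iteration, the stated settings translate into the following evaluation budget; the helper name is hypothetical.

```python
def ga_pso_budget(n, m):
    """Population 4N/3 and total evaluations over M iterations for GA and PSO.

    Assumes one fitness evaluation per individual per iteration.
    """
    pop = 4 * n // 3
    return pop, pop * m


pop, budget = ga_pso_budget(60, 300)
print(pop, budget)  # 80 24000
```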
Based on the results shown in Figure 9 and Table 2, the proposed SARSA-TTAO algorithm demonstrates clear performance advantages on all three benchmark functions (F1, F4, and F11), which differ in complexity. The algorithm converges rapidly to high-quality solutions in the early iterations and remains relatively stable throughout the process. Specifically, for all three functions, SARSA-TTAO approaches near-optimal solutions within approximately 25 iterations. Moreover, it consistently outperforms GA-LNS, GA, and PSO in terms of best solution quality, and on the highly complex F11 function it identifies the global optimum with a high degree of robustness. Although SARSA-TTAO requires relatively more computational time, its total runtime remains under 1 s, suggesting good practical feasibility. These results collectively verify the effectiveness of the SARSA-TTAO algorithm.
5.2. Experimental Setup
Research on M-AGV fleet scheduling remains limited, and publicly available benchmark instances are lacking. To evaluate the performance of the SARSA-TTAO algorithm on this problem, this study simulates a workshop environment with 100 workstations that have material demands. Three test instances of different scales (small, medium, and large) were generated by randomly selecting 20, 30, and 50 workstations, respectively. Each workstation node includes location coordinates (x_i, y_i) in meters and a material demand r_i in kilograms. The detailed configuration of the small-scale case with n = 20 is shown in Table 3.
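The instance-generation step can be sketched as sampling workstations without replacement from a pool of 100 candidates. The pool contents, the depot position, and the choice to draw the three instances independently (rather than as nested subsets) are assumptions for illustration; the actual small-scale configuration is the one given in Table 3.

```python
import random

DEPOT = (0.0, 0.0, 0.0)  # hypothetical distribution center p_0: (x, y, demand) with r_0 = 0


def make_instances(pool, sizes=(20, 30, 50), seed=None):
    """Build one instance per size by sampling workstations without replacement."""
    rng = random.Random(seed)
    return {n: [DEPOT] + rng.sample(pool, n) for n in sizes}


# Hypothetical pool of 100 workstations: (x in m, y in m, demand in kg).
gen = random.Random(42)
pool = [(gen.uniform(0, 100), gen.uniform(0, 60), gen.randint(10, 100)) for _ in range(100)]
instances = make_instances(pool, seed=1)
print({n: len(nodes) for n, nodes in instances.items()})  # {20: 21, 30: 31, 50: 51}
```

Fixing the seed makes the generated instances reproducible, which matters when each algorithm is run 20 times on the same instance.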
Two types of M-AGVs with different payload capacities and self-weights are used in the experimental validation. The AGV parameters are derived from the official technical specifications of Geek+ products and the related literature [3,18], as shown in Table 4. Based on these two AGV types, three simulation experiments are conducted: energy-efficient scheduling with (1) a heterogeneous M-AGV fleet, (2) a homogeneous light-load M-AGV fleet, and (3) a homogeneous heavy-load M-AGV fleet. These experiments validate the effectiveness of the proposed SARSA-TTAO algorithm and assess the application potential of heterogeneous and homogeneous AGV fleets in terms of optimal energy consumption and vehicle usage.
5.3. Algorithm Parameter Settings
In the experiments, the proposed SARSA-TTAO algorithm is compared with three algorithms widely used in M-AGV scheduling: GA [21], PSO [29], and GA-LNS [18]. The six key parameters of the SARSA-TTAO algorithm were configured based on Taguchi experiments, with the parameter ranges listed in Table 5. Based on these ranges, 27 distinct parameter combinations were designed, and each was run independently 20 times on the small-, medium-, and large-scale instances for Taguchi analysis. The final parameter settings are summarized in Table 6.
To ensure fair comparisons, the number of fitness evaluations is kept comparable across all algorithms, following the same approach used in Section 5.1.
Since the objective function in this study minimizes energy consumption, the “smaller-the-better” signal-to-noise (S/N) ratio was adopted in the Taguchi experiments. Taking the energy-efficient scheduling problem of heterogeneous M-AGVs as an example, the S/N response diagrams of the parameter factors of the SARSA-TTAO algorithm are shown in Figure 10.
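The “smaller-the-better” S/N ratio is conventionally defined as S/N = −10 log₁₀((1/n) Σ yᵢ²), where yᵢ are the observed responses (here, energy consumption values from repeated runs); a higher S/N indicates a better setting. The sketch below computes it and picks the level of one factor with the highest mean S/N; the function names and input values are hypothetical.

```python
import math


def sn_smaller_the_better(values):
    """Taguchi smaller-the-better S/N ratio in dB: -10 * log10(mean of y^2)."""
    if not values:
        raise ValueError("need at least one observation")
    mean_sq = sum(y * y for y in values) / len(values)
    return -10.0 * math.log10(mean_sq)


def best_level(sn_by_level):
    """Return the factor level with the highest mean S/N ratio (higher is better)."""
    return max(sn_by_level, key=lambda lvl: sum(sn_by_level[lvl]) / len(sn_by_level[lvl]))


# Hypothetical energy values (kJ) from repeated runs of one parameter combination.
print(sn_smaller_the_better([10.0, 10.0]))  # -20.0
print(best_level({1: [-20.0, -22.0], 2: [-19.0, -19.5], 3: [-25.0]}))  # 2
```

Because the S/N ratio penalizes both a high mean and a high spread of energy values, the selected levels favor settings that are good and stable across repeated runs.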
As shown in the figure, the S/N ratios obtained from the Taguchi experiments vary relatively little across the three scales of the heterogeneous M-AGV scheduling problem. In addition, the Taguchi results for the homogeneous light-load and heavy-load M-AGV scheduling problems show a consistent trend, indicating that the algorithm has low sensitivity to parameter perturbations and strong robustness.
5.4. Comparative Experiments
Each test instance was run independently 20 times, and the average values were used to reduce the effect of randomness. Figure 11, Figure 12 and Figure 13 show the convergence curves of the four algorithms (SARSA-TTAO, GA-LNS, GA, and PSO) when solving the energy-efficient scheduling problems for heterogeneous, homogeneous light-load, and homogeneous heavy-load M-AGV fleets, respectively. All experiments include small-, medium-, and large-scale instances. Table 7 presents the best energy consumption results (in kilojoules) for each problem type and the corresponding vehicle usage.
As seen in Figure 11, Figure 12 and Figure 13, the proposed SARSA-TTAO algorithm performs best on both heterogeneous and homogeneous M-AGV scheduling problems. In terms of solution quality, SARSA-TTAO achieves the best results at all instance scales, demonstrating good global search capability. In terms of convergence speed, SARSA-TTAO typically reaches better solutions earlier in the iterative process, reflecting high search efficiency. In terms of stability, SARSA-TTAO shows a relatively smooth downward trend over most iterations, indicating consistent performance. Overall, SARSA-TTAO shows distinct advantages in solution quality and convergence speed, validating its effectiveness and reliability for complex M-AGV fleet scheduling problems. These findings provide algorithmic support for intelligent AGV scheduling in real-world industrial applications.
According to the data in Table 7, the heterogeneous M-AGV fleet consumes less energy than both homogeneous fleets under the same task requirements and environmental conditions. In the small-, medium-, and large-scale scenarios, the energy consumption of the heterogeneous fleet is 24.1%, 29.7%, and 22.0% lower, respectively, than that of the homogeneous light-load fleet, and 2.0%, 15.8%, and 21.5% lower, respectively, than that of the homogeneous heavy-load fleet. These results indicate the strong potential of heterogeneous M-AGV fleets for energy optimization and confirm the positive role of fleet heterogeneity in improving energy efficiency. The advantage of the heterogeneous fleet is less pronounced in the small-scale scenario, possibly because the synergy between different AGV types within the heterogeneous fleet is underutilized at this scale.
Regarding the total number of vehicles used, the homogeneous heavy-load fleet performs best, followed closely by the heterogeneous fleet, whereas the homogeneous light-load fleet consistently requires a significantly larger number of vehicles. This finding suggests that heavy-load or heterogeneous fleets can reduce traffic complexity in shop-floor logistics, decrease the likelihood of path congestion, and improve overall material handling efficiency. Notably, the advantages of the heterogeneous fleet become more apparent in the large-scale scenario: although its total vehicle usage is slightly higher than that of the homogeneous heavy-load fleet, it uses fewer heavy-load vehicles. This implies that, in practical workshop applications, heterogeneous fleets could lower equipment procurement costs by reducing the number of heavy-load AGVs required.
6. Conclusions
This study focuses on the energy-efficient scheduling of M-AGV fleets within workshop material handling systems. A multi-factor energy consumption model is developed with the objective of minimizing total energy consumption; it incorporates the coupled effects of AGV driving states (i.e., speed and distance) and real-time payload variations, thereby enhancing the model’s practical applicability. To solve this problem efficiently, a novel SARSA-TTAO algorithm is proposed. The algorithm integrates multiple global aggregation strategies to increase population diversity and uses SARSA’s dynamic decision-making mechanism for strategy selection, which balances exploration and exploitation and helps avoid premature convergence. A series of simulation experiments involving different task scales and fleet configurations was conducted to evaluate the algorithm. The results demonstrate that SARSA-TTAO significantly outperforms GA-LNS, GA, and PSO in terms of convergence and solution quality. Furthermore, the experiments reveal the advantages of heterogeneous M-AGV fleets in terms of both energy consumption and vehicle usage. However, in industrial practice, limitations such as insufficient charging infrastructure and dynamic demand fluctuations, together with constraints such as time window requirements, path conflicts, and equipment failures, may restrict the model’s applicability. Future research will therefore explore the scheduling of heterogeneous M-AGVs under multiple real-world constraints, including battery limitations, time windows, path conflicts, and equipment failures, and will further validate the practicality and robustness of the SARSA-TTAO algorithm in more complex scheduling scenarios. This effort aims to provide theoretical support and technical guidance for the development of green and energy-efficient AGV-based material handling systems.