1. Introduction
Bridge cranes, commonly referred to as overhead cranes, are critical equipment widely employed in manufacturing plants for transporting, loading, and unloading products and raw materials. Their structure comprises three main components (as depicted in Figure 1): a bridge spanning parallel overhead runways, a hoist and trolley that traverse the bridge and move loads up and down, and parallel runways fixed to the top of the building structure. In manufacturing plants, it is typical to operate multiple bridge cranes on the same runway to facilitate the swift transfer of products during the production process [1,2]. However, because the cranes share runways, they cannot pass each other, which can lead to interference and production interruptions and thereby reduce overall efficiency.
Effective scheduling of bridge cranes is paramount for hybrid-flow assembly shops. These workshops represent a complex production layout where different types of products or components are assembled or processed in the same production shop [3]. Given the potential variations in the manufacturing processes among different products, crane scheduling must account for these differences to ensure smooth production flow while minimizing production downtime [4].
Therefore, research on bridge crane scheduling in hybrid-flow assembly workshops is of significant importance [5]. Optimizing crane schedules can improve production efficiency, reduce production downtime, and mitigate interference among cranes. Through effective scheduling, management can better plan production, optimize the utilization of production resources, and enhance overall production efficiency and quality [6].
In practice, studying crane scheduling requires consideration of the actual conditions in production workshops, the equipment characteristics, and the complexity of production processes. In-depth research and analysis can therefore provide effective solutions for production scheduling in hybrid-flow assembly workshops, improving production efficiency and resource utilization [7,8].
Crane scheduling is a scheduling problem with spatial constraints, namely crane interference, which makes crane scheduling more complex than typical job-shop problems [9]. A significant amount of literature focuses on gantry crane scheduling problems. Gantry cranes are supported by separate rigid structures and operated by individual trolleys on fixed tracks. Unlike overhead cranes in manufacturing plants, gantry cranes are primarily used for loading and unloading at construction sites rather than transporting items from one track position to another [10]. Therefore, it is challenging to apply the results of gantry crane scheduling problems to the scheduling of overhead cranes in manufacturing plants.
Specifically, Du et al. considered a flexible job shop with a single crane and fixed transport speed, optimizing only makespan [3]. Liu et al. focused on energy consumption but assumed a single crane and static job release [5]. In contrast, our work simultaneously addresses: (i) distributed factories with heterogeneous machine portfolios; (ii) multiple cranes operating on shared runways with no-crossing constraints; (iii) crane speed selection as a decision variable affecting both makespan and energy; and (iv) dynamic job release controlled by a high-level reinforcement learning (RL) agent. These differences make the distributed hybrid flow shop scheduling problem with multi-crane transportation (DHFSP-MCT) substantially more challenging and motivate the proposed hierarchical RL approach.
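In this paper, a hierarchical reinforcement learning-based bi-population collaborative meta-heuristic algorithm (HRL-BCMA) is designed to solve the DHFSP-MCT problem. The main contributions of this paper are as follows.
A model of the DHFSP with multi-crane interference is formulated.
A hierarchical RL framework with a graph isomorphism network (GIN) is designed for operator selection.
A bi-population co-evolutionary strategy is proposed to handle the multi-objective trade-off.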
The remainder of the paper is organized as follows. Section 2 reviews the recent literature on this subject. Section 3 details the proposed DHFSP-MCT problem. Section 4 details the HRL-BCMA algorithm, including its encoding and decoding techniques. The experimental results are presented in Section 5. Finally, the conclusions and perspectives for future research are summarized in Section 6.
2. Literature Review
In recent decades, we have witnessed increasing amounts of attention being paid to scheduling problems in manufacturing systems, particularly those involving material handling equipment. This section reviews the relevant literature based on three perspectives: crane scheduling, integrated production and transportation scheduling, and reinforcement learning-based optimization methods.
2.1. Crane Scheduling Problems
Crane scheduling has been extensively studied in various contexts, including container terminals, manufacturing workshops, and construction sites. Fibrianto et al. [11] investigated the job sequencing problem of overhead shuttle cranes in automated container terminals, proposing a heuristic approach that minimizes tardiness by separating jobs into main and marshaling tasks. Vallada et al. [12] examined yard crane scheduling in automated container yards, considering the complex interactions with other terminal systems. They developed heuristics with local search procedures that demonstrated the limitations of exact methods for large-scale instances. Xie et al. [13] addressed the interference issue between two cranes by sequencing loading operations and analyzing the computational complexity of the proposed model. More recently, Zhang et al. [14] optimized the coordinated operations of automated guided vehicles (AGVs) and double yard cranes in automated terminals using a mixed-integer programming model that explicitly considers crane interference.
Despite these contributions, most existing crane scheduling studies share a common limitation: they assume a predetermined production schedule and adjust crane operations accordingly [15,16,17]. This hierarchical approach treats crane capacity as unlimited during production scheduling, often leading to job waiting times and reduced overall efficiency [18,19]. The interdependence between production processing and material transportation is largely overlooked.
2.2. Integrated Production and Crane Scheduling
Recognizing the limitations of hierarchical approaches, some researchers have attempted to integrate crane transportation into production scheduling. Liu et al. [5] addressed the integrated optimization of flexible job shop scheduling and crane transportation, considering comprehensive energy consumption. Li et al. [7] proposed a hybrid iterated greedy algorithm for a flexible job shop problem with crane transportation, demonstrating the benefits of integrated scheduling. Du et al. [4] investigated the distributed flexible job shop scheduling problem with crane transportation, developing a hybrid estimation-of-distribution algorithm.
However, these integrated approaches predominantly focus on single-crane scenarios or assume that multiple cranes operate independently without mutual interference [20]. In real-world manufacturing environments, particularly in hybrid flow shops, multiple overhead cranes share the same runway and cannot pass each other, creating complex spatial constraints and interference patterns [1,2]. The distributed nature of modern manufacturing, with multiple factories operating in parallel, further compounds this complexity. To date, the distributed hybrid flow shop scheduling problem with multi-crane transportation (DHFSP-MCT) remains understudied, with no existing work simultaneously addressing factory assignment, job sequencing, machine selection, and multi-crane coordination under interference constraints.
2.3. Reinforcement Learning in Scheduling
Meta-heuristics have been widely applied to production scheduling problems, including local search [21], tabu search [22], simulated annealing [23], genetic algorithms [24], ant colony optimization [25], and particle swarm optimization [26]. While these methods can find near-optimal solutions within reasonable timeframes, they typically rely on fixed search strategies that do not adapt to problem characteristics during the optimization process [27].
Recently, reinforcement learning (RL) has emerged as a promising direction for adaptive scheduling. Du et al. [3] proposed a reinforcement learning approach for flexible job shop scheduling with crane transportation and setup times, demonstrating the potential of learning-based methods. Zhang et al. [9] developed a Q-learning-based hyper-heuristic evolutionary algorithm for distributed flexible job shop scheduling with crane transportation, where RL is used to select low-level heuristics dynamically. Zhao et al. [17] presented a reinforcement learning-driven cooperative meta-heuristic algorithm for energy-efficient distributed no-wait flow-shop scheduling [28].
Despite these advances, existing RL-based scheduling methods typically employ a single-layer RL agent that makes decisions at a fixed level of abstraction. This flat architecture struggles to capture the hierarchical nature of complex scheduling decisions, such as the interplay between factory assignment, job sequencing, machine selection, and crane coordination. Furthermore, most approaches focus on single-objective optimization, whereas real-world applications require the simultaneous optimization of multiple conflicting objectives, such as makespan and energy consumption.
2.4. Research Gaps and Contributions
While existing studies have made significant progress in integrating crane transportation with shop scheduling, several critical gaps remain unaddressed. First, most works assume a single crane or non-interfering cranes, neglecting the realistic constraints of multiple cranes sharing overlapping runways, where cranes cannot pass each other and must maintain safety distances. Second, crane speed is typically treated as constant, overlooking the trade-off between transport duration and energy consumption. Third, existing methods are primarily designed for single-factory settings; distributed hybrid flow shops with heterogeneous factory structures remain largely unexplored in the context of crane scheduling. Fourth, dynamic job release—where jobs arrive over time rather than being all available at time zero—has rarely been considered, despite its practical relevance in just-in-time production environments.
To bridge these gaps, this paper makes the following contributions. First, we formally define the DHFSP-MCT problem with a mixed-integer linear programming model that captures the complex interactions between production processing and multi-crane transportation. Second, we propose a hierarchical reinforcement learning-based bi-population collaborative meta-heuristic algorithm (HRL-BCMA), where a bi-level Deep Q-Network (DQN) framework naturally aligns with the hierarchical decision structure of the problem. Unlike flat RL approaches, our high-level agent learns when to release jobs for processing, while the low-level agent learns to select improvement operators based on solution states. Third, we introduce a bi-population co-evolutionary strategy that maintains separate populations for leaders and followers, enabling balanced optimization of makespan and total energy consumption. Fourth, we design a knowledge-informed strategy that leverages problem-specific features, such as crane positions and job due dates, to guide the search process. This integrated framework represents a fundamental advancement over existing methods, which combine existing ideas incrementally without addressing the hierarchical nature of multi-crane scheduling problems.
3. The Proposed DHFSP Problem with Multi-Crane Transportation
In modern manufacturing industries, overhead cranes are widely used in places such as manufacturing plants and ports; they are responsible for loading and unloading goods, as well as transporting items. However, with the increasingly complex demands of material scheduling, effectively scheduling and managing these overhead cranes has become a prominent issue.
To address the shop scheduling problem with crane transportation, a model of a multi-crane scheduling problem involving overhead crane transportation is proposed. A schematic diagram of the DHFSP-MCT problem is shown in Figure 2. The model is abstracted from an aluminum production enterprise. The multi-crane system consists of one large crane and one small crane, which work together to load, unload, and transport jobs. Combining multiple cranes gives the crane system greater flexibility and efficiency, enabling it to adapt to logistics transportation needs of various scales and types. The multi-crane transportation system is also subject to constraints. For example, the cranes cannot cross over one another.
The movements of the small crane and the large crane affect each other. Therefore, job allocation and scheduling must take this interaction into account and sequence the crane transports so as to avoid conflicts and delays. Although the two cranes constrain each other, they are not mutually exclusive: both can transport processed jobs at the same time. This parallel transportation makes full use of the resources of the crane system and improves transportation efficiency.
The complexity of multi-crane scheduling for overhead cranes is non-negligible. A detailed analysis of the problem makes it possible to derive a reasonable scheduling model and solution method, which in turn improves the efficiency of logistics transportation and production scheduling. Multi-crane shop scheduling problems arise in many production scenarios in modern industry.
In the DHFSP-MCT, jobs are processed on machines and transported by overhead cranes, and multiple cranes are available for transportation.
If path conflicts arise between cranes, one must wait for the other to finish. Thus, the crane assignment for each job must be planned to minimize unnecessary movements. Job processing sequences and crane transportation paths are interdependent. Both machining and transportation impact makespan and total energy consumption. Higher crane speeds shorten makespan but increase energy use.
This paper aims to simultaneously minimize makespan and total energy consumption from machining and transportation.
The DHFSP-MCT involves four sub-problems: (1) assigning jobs to factories; (2) sequencing jobs within each factory; (3) selecting machines at each stage; and (4) assigning cranes and speeds for job transportation.
Assumptions of the DHFSP-MCT:
Assumption 1. Jobs arrive dynamically and are held in a buffer. The release of jobs into the shop floor is a decision variable. All jobs must follow a predefined sequence through all stages.
Assumption 2. All machines are available at time zero and remain operational throughout the production horizon.
Assumption 3. All operations of a job must be completed in the factory to which the job is assigned.
Assumption 4. The structures of factories are heterogeneous.
Assumption 5. Each job is processed on one machine at a time from the available set.
Assumption 6. Each machine processes one job at a time.
Assumption 7. Preemption is not allowed.
Assumption 8. Two overhead cranes transport jobs between machines.
Assumption 9. The first operation stage does not require crane service.
Assumption 10. Cranes handle one transportation operation at a time (no overlapping).
Assumption 11. The machine location for the first operation is the crane’s initial position.
Assumption 12. Cranes operate continuously without shutdown.
Assumption 13. Sufficient intermediate buffers exist for completed jobs awaiting cranes.
Assumption 14. The arrival time for the next job must match or follow the next machine’s idle time; otherwise, the crane waits.
Assumption 15. Energy consumption includes machine processing and transportation.
Assumption 16. Different crane speeds result in different energy consumption levels. Higher transport speed increases motor power draw and dynamic friction, leading to higher energy consumption per unit time. The speed can be reduced during waiting times to lower energy consumption.
The above assumptions reflect a balance between modeling fidelity and computational tractability. Assumptions (1)–(3) and (5)–(7) are standard in distributed scheduling and are reasonable for make-to-order production environments where job preemption is not permitted. Assumption (4) (heterogeneous factories) captures real-world scenarios where different plants possess distinct machine portfolios. Assumption (8) (two cranes) is specific to the target aluminum manufacturing plant; however, the model can be extended to more cranes by generalizing the safety distance constraint. Assumption (9) (no crane at first stage) holds when raw materials are already present at machine locations, which is a common setup in assembly shops. Assumptions (10)–(14) define operational constraints that prevent deadlocks and ensure deterministic behavior. Notably, Assumption (16) (speed-dependent energy) is critical for green scheduling; while it assumes that energy increases monotonically with speed, real cranes may exhibit non-linear efficiency curves at very low speeds—this simplification is acceptable for the typical operating range considered here. Assumptions (11) and (14) introduce idle waiting, which is realistic but may slightly overestimate energy consumption when opportunistic repositioning could occur. Future work could relax Assumptions (8) and (13) to consider varying numbers of cranes and finite buffer capacities.
Notation: The notation is shown in Table 1.
It is assumed that the positions of all machines in each factory are fixed and known.
The energy model distinguishes between the energy consumption of the production machines and the energy consumption of crane transportation. Machine energy consumption is described by the processing energy consumption per unit of time and the idle energy consumption per unit of time for a given stage, which yield the processing energy consumption and the machine idle energy consumption, respectively. The corresponding process is shown in Figure 3.
Crane energy consumption is defined for the scenario in which a stage of one job and a stage of another job are both transported by crane c. It involves the energy consumption of crane c for transportation at a certain speed, the energy consumption of crane c for idle running, and the transportation time of crane c.
The crane transport itself is described by the time at which the crane lifts job i at stage j, the time at which the crane drops the job, the position of the processing machine for job i at stage j, the position of the processing machine for the previous stage of job i, and the transportation speeds in the x and y directions.
Assume that the energy consumption during crane transitions is included in the idle energy consumption of the machines.
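The interaction between machine positions, axis speeds, transport time, and energy can be illustrated with a minimal sketch. It assumes that the bridge (x direction) and the trolley (y direction) move simultaneously, so the travel time is the larger of the two axis times; this is an illustrative assumption, not the paper's stated formula. The names `Position`, `transport_time`, and `transport_energy`, as well as all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    x: float  # coordinate along the runway (bridge travel)
    y: float  # coordinate across the bridge (trolley travel)


def transport_time(src: Position, dst: Position, vx: float, vy: float) -> float:
    """Travel time, ASSUMING bridge and trolley move at the same time."""
    if vx <= 0 or vy <= 0:
        raise ValueError("speeds must be positive")
    return max(abs(dst.x - src.x) / vx, abs(dst.y - src.y) / vy)


def transport_energy(duration: float, energy_per_unit_time: float) -> float:
    """Energy of one move as duration times a speed-dependent rate."""
    return duration * energy_per_unit_time


previous_machine = Position(x=0.0, y=2.0)
next_machine = Position(x=12.0, y=5.0)

# Hypothetical speed levels: (vx, vy, energy per unit time).
fast = (2.0, 1.0, 9.0)
slow = (1.0, 0.5, 4.0)

t_fast = transport_time(previous_machine, next_machine, fast[0], fast[1])
t_slow = transport_time(previous_machine, next_machine, slow[0], slow[1])
e_fast = transport_energy(t_fast, fast[2])
e_slow = transport_energy(t_slow, slow[2])

print(f"fast: time={t_fast}, energy={e_fast}")
print(f"slow: time={t_slow}, energy={e_slow}")
```

With these hypothetical values, the fast level halves the transport time but consumes more energy, which is the trade-off that makes crane speed a meaningful decision variable.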
The total completion time for job i:
Each job can only be assigned to one factory for processing:
Each job can only be processed on one machine at each processing stage:
For each job, the processing start time at any stage must not be earlier than the completion time of the previous stage:
The following two equations ensure that the processing time of jobs processed on the same machine does not overlap:
The processing start time for any job at any stage is a positive number:
The distance between two adjacent cranes is greater than the safety distance d, and a crane cannot cross over another crane that is behind it:
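The crane constraint can be checked on a candidate schedule with a short sketch such as the following. It assumes that the runway positions of the two cranes have been sampled at the same time instants and that the first crane is the one that starts at the smaller runway coordinate; the function names and the sample trajectories are hypothetical. Because the gap must stay greater than the positive safety distance d, a feasible trajectory also never crosses.

```python
from typing import Optional, Sequence


def first_conflict(left_x: Sequence[float],
                   right_x: Sequence[float],
                   d: float) -> Optional[int]:
    """Return the first sample index at which the gap is not greater than d.

    left_x and right_x hold the runway positions of the two cranes at the
    same sampled instants. Returns None if the constraint always holds.
    """
    if len(left_x) != len(right_x):
        raise ValueError("trajectories must have the same number of samples")
    if d <= 0:
        raise ValueError("safety distance must be positive")
    for t, (left, right) in enumerate(zip(left_x, right_x)):
        if right - left <= d:
            return t
    return None


def cranes_feasible(left_x: Sequence[float],
                    right_x: Sequence[float],
                    d: float) -> bool:
    """True if the safety distance (and hence no-crossing) holds throughout."""
    return first_conflict(left_x, right_x, d) is None


# Hypothetical trajectories: the cranes approach each other.
left = [0.0, 2.0, 4.0, 6.0]
right = [10.0, 9.0, 8.0, 7.0]

print(cranes_feasible(left, right, d=2.0))   # gap shrinks to 1.0 at index 3
print(first_conflict(left, right, d=2.0))
print(cranes_feasible(left, right, d=0.5))
```

In a scheduler, an index returned by `first_conflict` marks the moment at which one crane would have to wait for the other.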
5. Experimental Results and Analysis
The experimental instances are designed to confirm HRL-BCMA’s effectiveness in addressing the DHFSP-MCT issue. The number of processed jobs is selected from 10, 20, 50, 100, 150, and 200. The number of processing stages is selected from 3, 4, and 5. The number of factories is selected from 2, 3, 4, 5, and 6. For each stage, the number of unrelated machines is generated between 1 and 4. For each combination, five test instances are generated. The processing time obeys a uniform distribution U[1,99] and is generated randomly.
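The instance generation described above can be sketched as follows. This is a minimal illustration, not the authors' generator: it assumes integer processing times, draws the number of machines per stage separately for each factory (in line with the heterogeneous-factory assumption), and uses a hypothetical dictionary layout and function name.

```python
import random


def generate_instance(n_jobs, n_stages, n_factories, seed):
    """Generate one random instance (illustrative layout only)."""
    rng = random.Random(seed)
    # machines[f][s]: number of unrelated machines at stage s of factory f.
    machines = [[rng.randint(1, 4) for _ in range(n_stages)]
                for _ in range(n_factories)]
    # proc_time[i][f][s][k]: processing time of job i at stage s on
    # machine k of factory f, drawn from U[1, 99] (integers assumed).
    proc_time = [[[[rng.randint(1, 99) for _ in range(machines[f][s])]
                   for s in range(n_stages)]
                  for f in range(n_factories)]
                 for _ in range(n_jobs)]
    return {"machines": machines, "proc_time": proc_time}


instance = generate_instance(n_jobs=10, n_stages=3, n_factories=2, seed=0)
print(instance["machines"])
```

Seeding a private `random.Random` object keeps each instance reproducible without touching the global random state.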
The energy consumption parameters (, , , ) were obtained from the cooperating aluminum manufacturing plant’s equipment datasheets and validated through on-site measurements over five production days. For speed level 1 (fast), the measured crane energy consumption was kWh/h; for speed level 3 (slow), it was kWh/h. The machine’s idle power was measured at kWh/h across all stages. For hypothetical instances with , values were linearly scaled based on machine power ratings. Sensitivity analysis confirms that the relative algorithm ranking remains stable for variations in these parameters.
All experiments were conducted on a PC equipped with an Intel (R) Xeon (R) W-2123 CPU at 3.6 GHz, 16.00 GB of RAM, and a Windows 10 ×64 operating system. The proposed HRL-BCMA and all baseline algorithms were implemented in Python 3.9 using PyTorch 1.12. For each instance, the maximum computation time was set to s (where n is the number of jobs and m is the number of stages), with a minimum of 500 iterations for small instances () and 200 iterations for large instances (); early stopping was triggered if no improvement in Hypervolume (HV) or Inverted Generational Distance (IGD) was observed for 100 consecutive iterations. Each algorithm was run 10 independent times per instance using fixed random seeds , and the same seed set was used across all algorithms to ensure fair comparison. Parameter tuning for HRL-BCMA was performed using an orthogonal array with five independent runs per configuration, and the same orthogonal design was applied to tune the baseline algorithms (CMA, CBMA, IMOEA/D, MOHIG) on a representative subset of instances (20% of the dataset, stratified by n and m).
5.1. Parameter Calibration of the HRL-BCMA
The hyperparameters of our method fall into two categories: (1) GIN architecture hyperparameters (number of layers, hidden dimension, and learning rate) and (2) RL algorithm hyperparameters (population size, transfer ratio, learning rate, and discount factor).
The four RL algorithm parameters are calibrated with an orthogonal array, with each parameter combination run independently five times across all instances.
Unlike single-objective parameter tuning, multi-objective optimization requires metrics that capture both convergence and diversity. We adopt two complementary metrics for calibration: Hypervolume (HV) to assess solution set quality, and Inverted Generational Distance (IGD) to measure proximity to the reference Pareto front. The final parameter configuration is selected based on the average ranking across both metrics.
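For readers unfamiliar with the two metrics, the following minimal sketch computes them for a bi-objective minimization problem such as makespan versus total energy consumption. The function names, the reference point, and the sample points are hypothetical; in practice the objectives would normally be normalized first, and the paper's exact normalization and reference front are not specified here.

```python
import math


def pareto_filter(points):
    """Keep the non-dominated points of a bi-objective minimization set."""
    unique = set(points)
    front = [p for p in unique
             if not any(q != p and q[0] <= p[0] and q[1] <= p[1]
                        for q in unique)]
    return sorted(front)  # ascending f1, hence descending f2


def hypervolume_2d(points, ref):
    """Area dominated by the points and bounded by the reference point."""
    front = [p for p in pareto_filter(points)
             if p[0] < ref[0] and p[1] < ref[1]]
    hv = 0.0
    prev_f2 = ref[1]
    for f1, f2 in front:
        # Horizontal strip between the previous f2 level and this one.
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv


def igd(reference_front, obtained):
    """Mean distance from each reference point to its nearest obtained point."""
    return sum(min(math.dist(r, s) for s in obtained)
               for r in reference_front) / len(reference_front)


reference_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
obtained = [(1.0, 3.0), (3.0, 1.0)]

print(hypervolume_2d(reference_front, ref=(4.0, 4.0)))
print(hypervolume_2d(obtained, ref=(4.0, 4.0)))
print(igd(reference_front, obtained))
```

A larger hypervolume and a smaller IGD both indicate a better solution set, which is why the two metrics are ranked jointly during calibration.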
The ANOVA results (Table 3) confirm that population size and learning rate are the most influential parameters. The optimal configuration identified by the calibration is used in all subsequent experiments. The main effect plot of the parameters is shown in Figure 6, and the parameter interaction plots are shown in Figure 7 and Figure 8.
5.2. Efficiency Analysis of the HRL-BCMA
The improvements in HRL-BCMA comprise three main components: the population initialization method, the bi-population co-evolutionary strategy, and the hierarchical reinforcement learning strategy. To verify the effectiveness of these components, HRL-BCMA is compared with four variants: HRL-BCMA with random initialization (HRL-BCMA-RI), HRL-BCMA without the bi-population co-evolutionary strategy (HRL-BCMA-CE), HRL-BCMA without the low-level agent (HRL-BCMA-LL), and HRL-BCMA without the high-level agent (HRL-BCMA-HL). In each variant, the corresponding strategy is replaced by a random operation. Each variant is run independently 10 times on each instance.
The results of the comprehensive metric IGD for the compared algorithms are shown in Figure 9 and Table 4. The pairwise CM results are shown in Figure 10, where S1 represents HRL-BCMA and SS1 represents HRL-BCMA-RI. The ONVG results are shown in Table 4. The Pareto fronts of the compared algorithms are shown in Figure 11. In these experiments, HRL-BCMA outperforms all of the variants, which verifies the effectiveness of the strategies.
To further verify the necessity of the GIN architecture, we compare HRL-BCMA with a variant where the GIN policy network is replaced by a three-layer multi-layer perceptron (MLP) operating on flattened node features (denoted as HRL-BCMA-MLP). As shown in Table 5, HRL-BCMA significantly outperforms HRL-BCMA-MLP in both IGD and HV metrics across all instance scales. This confirms that capturing the graph structure via GIN is essential for effective operator selection in the low-level agent.
To ensure fair comparison, all benchmark algorithms (CMA, CBMA, IMOEA/D, MOHIG) were implemented and executed under identical conditions.
All algorithms were allocated the same maximum computation time, set to s, where n is the number of jobs and m is the number of stages. This scaling ensures that larger instances receive proportionally more computational resources.
For each benchmark algorithm, we performed parameter tuning using the same orthogonal experimental design methodology. The tuning was conducted on a representative subset of instances (20% of the dataset), and the optimal parameter settings reported in the original papers were used as initial reference points. This two-stage tuning process ensures that each algorithm performs near its best on the DHFSP-MCT problem.
Each algorithm was run 10 independent times on each instance with different random seeds. Statistical results (mean, standard deviation) are reported for all metrics.
5.3. Comparison Results and Analysis
None of the selected baseline algorithms (CMA, CBMA, IMOEA/D, MOHIG) were originally designed for the DHFSP-MCT problem with multiple cranes, interference constraints, speed selection, and dynamic job release. To ensure a fair comparison, we adapted each baseline as follows:
Crane assignment: For algorithms without native crane handling, we added a greedy crane assignment heuristic (Rule 1 from Section 4.5.4) as a post-processing step after each job sequencing operation. The heuristic selects the crane that minimizes waiting time while respecting safety distance constraints.
Crane speed selection: All baseline algorithms were extended with the same energy-aware speed selection rule (Rule 2) used in HRL-BCMA, with a fixed speed level for each run to avoid additional complexity.
Dynamic job release: For algorithms assuming static job release, we modified the problem input to assume all jobs are available at time zero (upper bound performance). This gives these baselines an advantage, making the comparison conservative for HRL-BCMA.
Parameter re-tuning: Each baseline was re-tuned using the same orthogonal experimental design ( with 5 runs per configuration) on a representative subset of 30 instances (20% of the dataset, stratified by n and m).
HRL-BCMA is compared with the state-of-the-art algorithms CMA, CBMA, IMOEA/D, and MOHIG to verify its performance in solving the DHFSP-MCT. The parameter values recommended in the original articles serve as starting points, and the parameters of each algorithm are then fine-tuned to optimize its performance on the DHFSP-MCT problem.
Figure 12 shows that the stability of the HRL-BCMA algorithm in terms of the IGD indicator is significantly better than that of the comparison algorithms. Figure 13 demonstrates that the Pareto front of the HRL-BCMA algorithm dominates the results of the comparison algorithms, further highlighting its superiority in solving the DHFSP-MCT problem. These results indicate that, under the given parameter settings, the HRL-BCMA algorithm exhibits better convergence and performance on this problem.
5.4. The Results of the Statistical Experiment
A comprehensive analysis of the HRL-BCMA, CMA, CBMA, IMOEA/D, and MOHIG algorithms based on Hypervolume (HV), Inverted Generational Distance (IGD), and the Pareto front indicates that the HRL-BCMA algorithm ranks highest, followed by the CBMA algorithm. The HV indicator reflects the coverage of the solution set in the objective space; a higher HV value implies that the solution set is closer to the true Pareto front and covers it better. The IGD indicator measures the distance between the solution set and the reference Pareto front, so a lower value is better. The analysis shows that the solution set generated by the HRL-BCMA algorithm has a smaller distance to the reference Pareto front, further confirming its performance on this multi-objective problem. Furthermore, the Pareto front comparison shows that the solutions of the HRL-BCMA algorithm dominate those of the other algorithms, indicating that its solution set is of higher quality and explores the solution space more effectively.
The Wilcoxon signed-rank test was used for pairwise comparisons between HRL-BCMA and each baseline algorithm. This non-parametric test was chosen because the performance metrics (IGD, HV) do not always follow a normal distribution, as verified by Shapiro-Wilk tests (for most instances). A single significance level was used for all tests. The effect size (Cohen’s d) was computed for significant results to quantify the magnitude of improvement. When more than two algorithms were compared, the Holm–Bonferroni method was applied to control the family-wise error rate. The 95% confidence intervals for the mean IGD and HV values were computed using bootstrapping (10,000 resamples). The results of the Wilcoxon test are shown in Figure 14. In summary, considering the HV, IGD, and Pareto front indicators, the HRL-BCMA algorithm performs best, with the CBMA algorithm following closely behind. HRL-BCMA is therefore the preferred choice among the compared algorithms for the DHFSP-MCT.
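The Holm–Bonferroni step-down correction mentioned above can be sketched in a few lines. The function name is hypothetical, and the p-values and the significance level in the example are made-up illustrative inputs, not results or settings from this study.

```python
def holm_bonferroni(p_values, alpha):
    """Return one boolean per hypothesis: True if it is rejected.

    Step-down procedure: sort the p-values in ascending order and compare
    the k-th smallest (k = 0, 1, ...) with alpha / (m - k). Stop at the
    first p-value that fails; it and all larger ones are not rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical p-values from four pairwise comparisons.
example_p = [0.001, 0.04, 0.012, 0.03]
print(holm_bonferroni(example_p, alpha=0.05))
```

In this example the two smallest p-values pass their thresholds (0.0125 and about 0.0167), while 0.03 fails against 0.025, so the procedure stops and the remaining hypotheses are retained even though both would pass an uncorrected 0.05 threshold.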
To further demonstrate the effectiveness of HRL-BCMA, we compare it with two recent reinforcement learning-based scheduling algorithms:
RL-HH: A DQN-based hyper-heuristic for distributed flexible job shop scheduling with crane transportation. We re-implemented this method with the same state features adapted to DHFSP-MCT.
Flat-DQN: A single-agent deep Q-network that directly selects improvement operators without hierarchical decomposition. The state space and action space are kept identical to the low-level agent of HRL-BCMA for fair comparison.
Both baselines were tuned using the same orthogonal design protocol described in Section 5.1 and run under identical computational budgets.
Table 6 reports the IGD and HV results averaged over all instances. HRL-BCMA outperforms RL-HH and Flat-DQN by 15.3% and 22.7% in IGD, respectively, confirming the advantage of the hierarchical architecture and the GIN-based policy.
5.5. Computational Efficiency Analysis
HRL-BCMA exhibits near-linear scaling with instance size, with runtime increasing from approximately 12 s for instances to 1240 s for instances. This scaling behavior is comparable to CBMA (the second-best algorithm) and significantly better than CMA and IMOEA/D, which show superlinear growth for .
The computational overhead of HRL-BCMA stems primarily from two sources:
The remaining 20% is attributed to initialization and Pareto set maintenance. While HRL-BCMA is computationally more intensive than simple meta-heuristics, the trade-off is justified by its superior solution quality.
5.6. Validation on Real-World-Inspired Case Study
To substantiate the practical effectiveness of HRL-BCMA beyond randomly generated instances, we constructed a high-fidelity simulation case study based on an aluminum manufacturing plant, which was the source of the problem abstraction.
The simulated workshop consists of three factories, each containing five processing stages with 2–4 machines per stage. Two overhead cranes (one large, one small) operate on shared runways. A total of 150 jobs were processed over a 72 h simulation horizon. Crane speeds, processing times, and energy consumption rates were derived from actual equipment specifications provided by the collaborating manufacturer.
The simulation was run 10 times for each algorithm, with performance measured by makespan and total energy consumption. The results show that HRL-BCMA achieved an average makespan of 2847 min (8.6% improvement over CBMA) and average energy consumption of 18,342 kWh (12.3% improvement over CBMA). Notably, the algorithm successfully avoided 94% of potential crane interference events compared to 78% for the best benchmark.
The computational overhead of HRL-BCMA (approximately 18 min per simulation run) is within acceptable limits for offline scheduling in this manufacturing context, where schedules are generated once per shift. The solution quality improvements translate to estimated annual savings of 285,000 kWh (approximately $22,800 at current industrial electricity rates) and a 7.5% increase in throughput.
To assess the robustness of HRL-BCMA, we varied three key parameters in the real-world case:
Job arrival rate: varied by from the nominal value.
Crane speed energy coefficients: varied by (representing uncertainty in energy measurements).
Safety distance d: varied by to test crane interference sensitivity.
For each variation, we re-ran all algorithms 10 times. HRL-BCMA maintains a makespan improvement of at least
and an energy improvement of at least
over CBMA across all tested variations. Notably, when the job arrival rate increases by
(stress condition), the improvement of HRL-BCMA over CBMA increases to
, indicating that the hierarchical RL agent adapts better to congestion. These results demonstrate the robustness of the proposed method under realistic uncertainties (
Table 7).