5.3. Experimental Results
For each workflow application, we ran scheduling experiments at different workflow sizes to assess how the algorithms perform across scales and levels of complexity. The results are as follows.
Table 2 reports the makespan and cost of the scheduling algorithms on the Cybershake workflow at two task scales (100 and 1000 tasks). HICA achieves the shortest makespan (314.38) for 100 tasks but ranks third for 1000 tasks with a makespan of 1305.88, behind Greedy-Ant (1293.68) and ICA (1297.95). In terms of cost, HICA achieves the lowest values at both scales: 20,066.17 for 100 tasks and 100,200.61 for 1000 tasks. These results indicate that HICA has a clear advantage in cost optimization while remaining competitive in execution time.
Table 3 shows the makespan and cost of the scheduling algorithms on the Montage workflow at 100 and 1000 tasks. For 100 tasks, HICA achieves the shortest makespan (103.13), whereas PEFT-GA reaches 104.02. For 1000 tasks, HICA again achieves the lowest makespan (906.89), closely followed by Greedy-Ant (911.57). In terms of cost, HICA is the cheapest algorithm at the 100-task scale; at the 1000-task scale, its cost (36,148.09) is marginally higher than that of GA-PSO (36,129.15), a difference of about 0.05%. Overall, HICA combines low cost with a short makespan on Montage, highlighting its all-round advantage.
Table 4 compares the scheduling algorithms on the LIGO workflow at different scales. HICA achieves the lowest makespan and cost at both the 100-task and 1000-task scales. As the workflow grows from 100 to 1000 tasks, HICA's makespan rises from 1689.97 to 11,193.08 and its cost from 62,989.91 to 686,825.01, whereas the other algorithms (PEFT-GA, GA-PSO, Greedy-Ant, and ICA) degrade more markedly with scale, as reflected in their higher makespan and cost values. This analysis shows that HICA balances time and cost effectively, making it the most suitable of the tested algorithms for large-scale LIGO workflows.
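To make the scaling behavior concrete, the ratios below are computed directly from the HICA values quoted above for Table 4. This is only arithmetic on the reported figures, not part of the HICA method, and it covers HICA alone because the other algorithms' values are not quoted in the text. A tenfold increase in tasks raises HICA's makespan by roughly 6.6x (sublinear growth), while its cost grows by roughly 10.9x, close to proportional to the task count.

```python
# Growth of HICA's LIGO makespan and cost from 100 to 1000 tasks (values from Table 4).
hica_ligo = {
    100: {"makespan": 1689.97, "cost": 62_989.91},
    1000: {"makespan": 11_193.08, "cost": 686_825.01},
}


def growth_factor(results, metric, small=100, large=1000):
    """Ratio of a metric at the large scale to the same metric at the small scale."""
    return results[large][metric] / results[small][metric]


task_factor = 1000 / 100
for metric in ("makespan", "cost"):
    g = growth_factor(hica_ligo, metric)
    print(f"{metric}: x{g:.2f} for x{task_factor:.0f} tasks")
# makespan: x6.62 for x10 tasks
# cost: x10.90 for x10 tasks
```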
Table 5 presents the results on the SIPHT workflow at 100 and 1000 tasks, comparing the makespan and cost of HICA with those of the other algorithms. In terms of makespan, GA-PSO achieves the shortest execution time at both scales, with 4471.01 for 100 tasks and 9568.78 for 1000 tasks, while HICA ranks second for 1000 tasks with a makespan of 9801.55. In terms of cost, HICA achieves the lowest value at both scales: 50,637.07 for 100 tasks and 520,693.86 for 1000 tasks. These results indicate that HICA is the most cost-effective algorithm on SIPHT while keeping its makespan competitive, whereas GA-PSO is the strongest at minimizing execution time on this workflow.
Table 6 presents the results of five scheduling algorithms (PEFT-GA, GA-PSO, Greedy-Ant, ICA, and HICA) on the Epigenomics workflow at two task scales (100 and 997 tasks). In terms of makespan, Greedy-Ant achieves the shortest time for 100 tasks (32,216.08), closely followed by HICA (32,286.09). For 997 tasks, HICA achieves the shortest makespan (211,805.01). In terms of cost, HICA achieves the lowest values at both scales: 1,217,159.95 for 100 tasks and 11,611,202.10 for 997 tasks. These results indicate that HICA excels in cost optimization while maintaining a strong makespan, particularly at the larger scale. Greedy-Ant performs well in makespan at the smaller scale, while PEFT-GA generally has higher makespan and cost values than the other algorithms.
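The claim that HICA stays competitive where it does not rank first can be quantified from the numbers already quoted for Tables 2 to 6. The sketch below computes the relative gap (HICA - best) / best for the cases where the text names a better algorithm and gives both values; the list is collected from the paragraphs above and is not an exhaustive reading of the tables (for example, HICA's 100-task SIPHT makespan is not quoted, so that case is omitted).

```python
# Relative gap between HICA and the best algorithm in cases where HICA is not first.
# Values are those quoted in the Table 2-6 discussion; lower is better for both metrics.
close_calls = [
    # (workflow, scale, metric, best algorithm, best value, HICA value)
    ("Cybershake", 1000, "makespan", "Greedy-Ant", 1293.68, 1305.88),
    ("Montage", 1000, "cost", "GA-PSO", 36_129.15, 36_148.09),
    ("SIPHT", 1000, "makespan", "GA-PSO", 9568.78, 9801.55),
    ("Epigenomics", 100, "makespan", "Greedy-Ant", 32_216.08, 32_286.09),
]


def relative_gap(best, hica):
    """Percentage by which HICA's value exceeds the best value."""
    return (hica - best) / best * 100.0


for workflow, scale, metric, algo, best, hica in close_calls:
    gap = relative_gap(best, hica)
    print(f"{workflow:<12} {scale:>5} {metric:<9} HICA is {gap:.2f}% above {algo}")
```

All four gaps are below 2.5%; the largest is the SIPHT makespan at 1000 tasks (about 2.43%), and the smallest is the Montage cost at 1000 tasks (about 0.05%).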
Figures 7 and 8 compare the makespan of the five scheduling algorithms (PEFT-GA, GA-PSO, Greedy-Ant, ICA, and HICA) on five typical scientific workflows (Cybershake, Montage, LIGO, SIPHT, and Epigenomics) at the 100-task and 1000-task scales, and they reveal clear differences among the algorithms. At the 100-task scale, the differences are relatively small, particularly for Cybershake and Montage, where the makespan values of all algorithms are very close. Even so, HICA achieves the shortest makespan on most workflows, including Cybershake, Montage, and LIGO, which highlights its strength at smaller task scales. On Epigenomics, Greedy-Ant is marginally faster than HICA (32,216.08 versus 32,286.09), while PEFT-GA and ICA fall slightly behind, and on SIPHT, GA-PSO achieves the shortest makespan. As the task scale increases to 1000 tasks, the differences between the algorithms become more pronounced. On Cybershake, Greedy-Ant achieves the shortest makespan, and HICA drops to third place behind Greedy-Ant and ICA. On SIPHT, GA-PSO remains the fastest, with HICA in second place. On Montage and LIGO, by contrast, HICA retains the shortest makespan, with Greedy-Ant close behind on Montage. For Epigenomics (997 tasks), HICA again achieves the shortest makespan among all algorithms, highlighting its capability in handling complex workflows.
Figures 9 and 10 compare the cost of the scheduling algorithms on the five scientific workflow applications at the 100-task and 1000-task scales. At the 100-task scale, HICA achieves the lowest cost on every workflow, demonstrating strong cost optimization, and its advantage is most pronounced on Cybershake and Epigenomics. Although Greedy-Ant and GA-PSO are occasionally competitive on certain workflows, their overall cost efficiency remains below that of HICA, while PEFT-GA and ICA generally incur higher costs, indicating room for improvement on smaller workflows. At the 1000-task scale, HICA continues to lead in cost optimization, achieving the lowest cost on every workflow except Montage, where GA-PSO is marginally cheaper (36,129.15 versus 36,148.09). HICA's advantage is particularly evident on Epigenomics, where its cost is substantially lower than that of the other algorithms, demonstrating its ability to handle large-scale workflows. The other algorithms are less consistent: PEFT-GA and ICA incur relatively high costs, especially on complex workflows such as Epigenomics and SIPHT, suggesting limited scalability, and although Greedy-Ant and GA-PSO improve in some cases, HICA outperforms them in all remaining scenarios. These findings underline HICA's strength in cost optimization and scalability across task scales. Its consistent performance across workflow types and scales makes it a reliable choice for cost-sensitive scientific workflow applications, whereas the other algorithms may suit specific workflows or scales but lack HICA's overall robustness and adaptability. HICA is therefore a strong choice when cost, scalability, and adaptability must be balanced in diverse, large-scale workflow scenarios.
5.4. An Actual Application Scenario of HICA: Earth System Model (ESM) Parameter Tuning
To further test the scheduling performance of HICA, we extend the experiments of the previous section to a real-world scientific workflow application: parameter tuning of Earth system models in atmospheric science, a workflow widely used in climate research. Running Earth system models is a typical data-intensive process: global climate simulations may cover decades or even centuries of simulated time and generate terabytes to petabytes of data. This not only consumes substantial storage resources but also requires many computational nodes working in parallel to accelerate the complex computations. For scientific applications of this scale, workflows can manage both computational and storage resources effectively while modularizing and formalizing complex scientific computing processes, thereby improving overall efficiency.
A complete Earth system model parameter tuning workflow consists of the following parts:
Preprocessing: This involves the collection, cleaning, and preparation of input data, such as climate variables and initial model parameters, for model simulations.
Model Simulation: The Earth system model (ESM) is executed using the prepared input data. This step may involve multiple iterations to simulate various climate scenarios over extended time periods, often spanning decades or centuries.
Parameter Tuning: This step adjusts the model parameters to improve the accuracy of the simulation results. Various optimization techniques, such as surrogate models and genetic algorithms, may be used to find the best parameter settings.
Postprocessing: After the simulations, the results are analyzed and compared against observational data or benchmarks to evaluate the performance of the model. This may include error analysis, visualization of results, and statistical testing.
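The four stages above can be represented as a directed acyclic graph (DAG) of tasks, the usual input form for workflow schedulers. The sketch below builds such a DAG under loudly stated assumptions: the ensemble size, the number of tuning iterations, the task names, and all durations are hypothetical and are not taken from the paper's actual workflow, and repeated tuning iterations are unrolled into successive simulate/tune stages because a DAG cannot contain cycles. It computes the critical-path length, i.e., the makespan achievable with unlimited parallel resources, which is a lower bound for any schedule, including one produced by HICA.

```python
from graphlib import TopologicalSorter


def build_tuning_dag(n_members, n_iterations):
    """Return {task: set_of_predecessors} for a hypothetical, unrolled ESM tuning workflow."""
    deps = {"preprocess": set()}
    previous = "preprocess"
    for it in range(n_iterations):
        # Ensemble members of one iteration run in parallel after the previous stage.
        sims = [f"simulate_{it}_{m}" for m in range(n_members)]
        for s in sims:
            deps[s] = {previous}
        # Tuning waits for every simulation of this iteration.
        tune = f"tune_{it}"
        deps[tune] = set(sims)
        previous = tune
    deps["postprocess"] = {previous}
    return deps


def duration(task):
    """Hypothetical per-stage runtimes in arbitrary time units."""
    if task.startswith("simulate"):
        return 10.0
    if task.startswith("tune"):
        return 3.0
    if task == "preprocess":
        return 2.0
    return 4.0  # postprocess


def critical_path_length(deps, duration):
    """Longest path through the DAG: earliest finish time with unlimited resources."""
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[p] for p in deps[task]), default=0.0)
        finish[task] = start + duration(task)
    return max(finish.values())


deps = build_tuning_dag(n_members=4, n_iterations=2)
print(len(deps), critical_path_length(deps, duration))  # 12 32.0
```

With four ensemble members and two iterations, the DAG has 12 tasks, and the critical path is 2 + 10 + 3 + 10 + 3 + 4 = 32 time units; a real scheduler must additionally map these tasks onto a finite set of priced resources, which is where makespan and cost trade off.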
We apply the proposed HICA algorithm to the Earth system model parameter tuning workflow and evaluate how the workflow's makespan and cost change once the scheduling algorithm is introduced. The makespan and cost of the initial scenario, executed without any scheduling algorithm, serve as the baseline for normalization, and the results after applying HICA are compared against it. The results are shown in Figure 11. HICA improves both metrics: makespan is reduced by 13% and cost by a more substantial 21% relative to the baseline. These results show that HICA not only performs well on the scientific workflow scheduling problems simulated in WorkflowSim but also improves the execution efficiency of workflows in a concrete scientific application scenario.
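The normalization can be written explicitly. Assuming, as the description of Figure 11 suggests, that each metric is divided by its baseline value and that the reported improvement is the relative reduction from the baseline, with M and C denoting makespan and cost:

```latex
\tilde{M} = \frac{M_{\mathrm{HICA}}}{M_{\mathrm{base}}}, \qquad
\tilde{C} = \frac{C_{\mathrm{HICA}}}{C_{\mathrm{base}}}, \qquad
\Delta_M = \left(1 - \tilde{M}\right) \times 100\%, \qquad
\Delta_C = \left(1 - \tilde{C}\right) \times 100\%
```

Under this reading, \Delta_M = 13% corresponds to a normalized makespan of \tilde{M} = 0.87, and \Delta_C = 21% to a normalized cost of \tilde{C} = 0.79.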