Loop-Block-Level Automatic Parallelization in Compilers
Abstract
1. Introduction
2. Loop Block Thread Allocation Method
2.1. Limitations of Automatic Parallel Design
2.2. Loop Block Thread Allocation Strategy
2.3. Experimental Analysis
3. Thread Selection Strategy Combined with Iterative Compilation
3.1. Selection of Iterative Algorithm
3.2. Algorithm Design
3.2.1. Chromosome Representation and Fitness Function
3.2.2. Initial Population Generation
3.2.3. Parent Selection and Crossover Operation
3.2.4. Mutation Operation and Genetic Factor Weighting
3.2.5. Algorithm Convergence
3.3. The Overall Structure of the Algorithm
4. Experimental Results
4.1. Test Environment and Results
- Serial execution (thread count = 1);
- Baseline automatic parallelization with four fixed thread counts (8, 16, 24, and 32);
- The parallel thread allocation algorithm proposed in this paper (GA); a minimal sketch of the parameter being tuned follows this list.
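For concreteness, the minimal C/OpenMP sketch below (illustrative only; it is not the compiler implementation described in this paper) shows the knob these configurations vary: the number of threads applied to one parallelized loop block, which the baseline configurations pin at a fixed value and the proposed method selects per loop block.

```c
/* Minimal C/OpenMP sketch (illustrative only, not the paper's code):
 * the thread count applied to a single loop block is the parameter
 * that the baseline configurations fix at 1/8/16/24/32 and that the
 * proposed allocation algorithm selects per loop block. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

/* One parallelized loop block; `threads` is the tunable parameter. */
static void loop_block(const double *a, const double *b, double *c,
                       int n, int threads)
{
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i] + c[i];
}

int main(void)
{
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0; c[i] = 0.0; }

    /* Baseline-style runs with fixed thread counts. */
    int fixed[] = { 1, 8, 16, 24, 32 };
    for (int k = 0; k < 5; k++) {
        double t0 = omp_get_wtime();
        loop_block(a, b, c, N, fixed[k]);
        printf("threads=%2d  time=%.4f s\n", fixed[k], omp_get_wtime() - t0);
    }
    return 0;
}
```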
| Benchmark | 1 Thread | 8 Threads | 16 Threads | 24 Threads | 32 Threads | GA | Improvement Ratio |
|---|---|---|---|---|---|---|---|
| 403.gcc | 8.31 | 8.47 | 8.21 | 8.37 | 8.35 | 8.50 | 1.00 |
| 410.bwaves | 5.35 | 11.50 | 10.04 | 8.45 | 7.00 | 12.70 | 1.10 |
| 429.mcf | 4.60 | 4.88 | 4.60 | 6.65 | 4.64 | 6.77 | 1.02 |
| 433.milc | 6.23 | 6.30 | 6.29 | 5.11 | 6.23 | 6.30 | 1.00 |
| 435.gromacs | 9.22 | 9.22 | 9.22 | 9.17 | 9.12 | 9.26 | 1.00 |
| 436.cactusADM | 10.30 | 31.50 | 34.50 | 38.40 | 48.40 | 57.40 | 1.19 |
| 459.GemsFDTD | 7.03 | 20.60 | 21.20 | 22.02 | 20.80 | 23.70 | 1.08 |
| 471.omnetpp | 10.60 | 9.27 | 9.76 | 9.23 | 9.65 | 10.60 | 1.00 |
| 473.astar | 7.67 | 8.78 | 8.67 | 8.53 | 8.76 | 8.92 | 1.02 |
| 481.wrf | 9.76 | 11.30 | 9.08 | 9.11 | 8.57 | 11.33 | 1.00 |
| 483.xalancbmk | 9.37 | 9.49 | 10.10 | 9.89 | 10.10 | 10.20 | 1.01 |
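Read this way (an inference from the numbers above, not stated explicitly in the table), the last column is the GA result divided by the best result among the five fixed-thread-count configurations:

$$
\text{Improvement Ratio} = \frac{\text{GA result}}{\max\{\text{results at } 1, 8, 16, 24, 32 \text{ threads}\}},
\qquad \text{e.g., for 436.cactusADM: } \frac{57.40}{48.40} \approx 1.19 .
$$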
4.2. Analysis of Experimental Results
5. Conclusions
- Inspired by iterative compilation, a genetic algorithm is integrated as the iterator to solve the adaptive selection of thread parameters (a compact sketch follows this list).
- Combining the two yields the loop-block-level automatic parallelization method, which the experiments validate as achieving effective speedups.
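To make the first point concrete, the compact C sketch below illustrates the kind of GA iterator referred to above, under assumptions of ours rather than details taken from the paper: a chromosome holds one thread-count gene per loop block, fitness is execution time (simulated here by a stand-in cost model so the sketch runs on its own), and the operators are tournament selection, single-point crossover, per-gene mutation, and elitism. The paper's actual representation, operators, and parameters may differ.

```c
/* Compact GA sketch (illustration of the idea only, not the paper's
 * implementation): a chromosome is one candidate thread count per loop
 * block, and fitness is the execution time of the program run with that
 * assignment.  The measurement is replaced by a stand-in cost model so
 * the sketch is self-contained. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_BLOCKS   4      /* parallelizable loop blocks in the program */
#define POP_SIZE     16
#define GENERATIONS  30
#define MUTATE_PROB  0.1

static const int choices[] = { 1, 2, 4, 8, 16, 24, 32 };
#define NUM_CHOICES (int)(sizeof choices / sizeof choices[0])

typedef struct { int genes[NUM_BLOCKS]; double time; } Individual;

/* Stand-in for "run the program and time it": assumes each block has a
 * hypothetical ideal thread count and penalizes distance from it. */
static double measure(const int *genes)
{
    static const int ideal[NUM_BLOCKS] = { 8, 32, 4, 16 };
    double t = 1.0;
    for (int b = 0; b < NUM_BLOCKS; b++)
        t += abs(genes[b] - ideal[b]) * 0.05;
    return t;
}

static void randomize(Individual *ind)
{
    for (int b = 0; b < NUM_BLOCKS; b++)
        ind->genes[b] = choices[rand() % NUM_CHOICES];
    ind->time = measure(ind->genes);
}

/* Tournament selection: pick the faster of two random individuals. */
static const Individual *select_parent(const Individual *pop)
{
    const Individual *a = &pop[rand() % POP_SIZE];
    const Individual *b = &pop[rand() % POP_SIZE];
    return a->time < b->time ? a : b;
}

int main(void)
{
    srand(42);
    Individual pop[POP_SIZE], next[POP_SIZE];
    for (int i = 0; i < POP_SIZE; i++) randomize(&pop[i]);

    for (int g = 0; g < GENERATIONS; g++) {
        /* Elitism: keep the current best individual unchanged. */
        int best = 0;
        for (int i = 1; i < POP_SIZE; i++)
            if (pop[i].time < pop[best].time) best = i;
        next[0] = pop[best];

        for (int i = 1; i < POP_SIZE; i++) {
            const Individual *p1 = select_parent(pop);
            const Individual *p2 = select_parent(pop);
            /* Single-point crossover over the loop blocks. */
            int cut = rand() % NUM_BLOCKS;
            for (int b = 0; b < NUM_BLOCKS; b++)
                next[i].genes[b] = (b < cut) ? p1->genes[b] : p2->genes[b];
            /* Mutation: occasionally replace one block's thread count. */
            if ((double)rand() / RAND_MAX < MUTATE_PROB)
                next[i].genes[rand() % NUM_BLOCKS] = choices[rand() % NUM_CHOICES];
            next[i].time = measure(next[i].genes);
        }
        for (int i = 0; i < POP_SIZE; i++) pop[i] = next[i];
    }

    int best = 0;
    for (int i = 1; i < POP_SIZE; i++)
        if (pop[i].time < pop[best].time) best = i;
    printf("best thread counts:");
    for (int b = 0; b < NUM_BLOCKS; b++) printf(" %d", pop[best].genes[b]);
    printf("  (time %.2f)\n", pop[best].time);
    return 0;
}
```

In an iterative-compilation setting, measure() would instead compile and run the program with the candidate per-block thread counts and return the observed execution time.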
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References