Article

Performance Analysis of Thread Block Schedulers in GPGPU and Its Implications

1 Embedded Software Research Center, Ewha University, Seoul 03760, Korea
2 Department of Computer Engineering, Ewha University, Seoul 03760, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(24), 9121; https://doi.org/10.3390/app10249121
Received: 30 November 2020 / Revised: 14 December 2020 / Accepted: 17 December 2020 / Published: 20 December 2020
(This article belongs to the Special Issue Recent Advances in Sustainable Process Design and Optimization)
A GPGPU (General-Purpose Graphics Processing Unit) provides hardware resources that can execute tens of thousands of threads simultaneously. In practice, however, parallelism is limited because resources are allocated in units of thread blocks, which current GPGPU systems do not manage judiciously. To schedule threads, a specialized hardware scheduler allocates thread blocks to computing units called SMs (Streaming Multiprocessors) in a Round-Robin manner. Although hardware scheduling is simple and fast, we observe that Round-Robin scheduling is inefficient in GPGPU because it considers neither the workload characteristics of threads nor the resource balance among SMs. In this article, we present a new thread block scheduling model that can analyze and quantify the performance of thread block scheduling. We implement our model as a GPGPU scheduling simulator and show that the conventional thread block scheduling provided in GPGPU hardware does not perform well as the workload becomes heavy. Specifically, we observe that the performance degradation of Round-Robin can be eliminated by adopting DFA (Depth First Allocation), which is simple but scalable. Moreover, because our simulator has a modular, framework-based design and is publicly available to other researchers, various scheduling policies can be incorporated into it to evaluate the performance of GPGPU schedulers.
Keywords: thread block; GPGPU; thread block scheduling; Round-Robin
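
To make the contrast between the two policies concrete, the following is a minimal Python sketch of how thread blocks might be placed on SMs. It is not the authors' simulator. The abstract does not define DFA in detail, so the sketch assumes it means filling one SM to capacity before moving on to the next. The function names and the `slots_per_sm` parameter are hypothetical, and a fixed slot count per SM is a simplification of real resource limits such as registers and shared memory.

```python
from typing import List


def round_robin(num_blocks: int, num_sms: int, slots_per_sm: int) -> List[List[int]]:
    """Place block i on SM (i mod num_sms), spreading blocks across all SMs."""
    sms: List[List[int]] = [[] for _ in range(num_sms)]
    # Only as many blocks as there are free slots can be resident at once.
    resident = min(num_blocks, num_sms * slots_per_sm)
    for block in range(resident):
        sms[block % num_sms].append(block)
    return sms


def depth_first(num_blocks: int, num_sms: int, slots_per_sm: int) -> List[List[int]]:
    """Assumed reading of DFA: fill one SM completely before using the next."""
    sms: List[List[int]] = [[] for _ in range(num_sms)]
    resident = min(num_blocks, num_sms * slots_per_sm)
    for block in range(resident):
        sms[block // slots_per_sm].append(block)
    return sms


if __name__ == "__main__":
    # Six blocks, four SMs, four (hypothetical) block slots per SM.
    print(round_robin(6, 4, 4))  # [[0, 4], [1, 5], [2], [3]]
    print(depth_first(6, 4, 4))  # [[0, 1, 2, 3], [4, 5], [], []]
```

Under these assumptions, Round-Robin touches every SM even for a small grid, while the depth-first variant concentrates blocks on as few SMs as possible. Which placement performs better depends on the workload, which is the question the article's simulator is built to evaluate.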

MDPI and ACS Style

Cho, K.; Bahn, H. Performance Analysis of Thread Block Schedulers in GPGPU and Its Implications. Appl. Sci. 2020, 10, 9121. https://doi.org/10.3390/app10249121

AMA Style

Cho K, Bahn H. Performance Analysis of Thread Block Schedulers in GPGPU and Its Implications. Applied Sciences. 2020; 10(24):9121. https://doi.org/10.3390/app10249121

Chicago/Turabian Style

Cho, KyungWoon, and Hyokyung Bahn. 2020. "Performance Analysis of Thread Block Schedulers in GPGPU and Its Implications" Applied Sciences 10, no. 24: 9121. https://doi.org/10.3390/app10249121
