Open Access Review
Computers 2018, 7(4), 66; https://doi.org/10.3390/computers7040066

Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree

Department of Computers and Software Engineering, Politehnica University of Timișoara, 300006 Timișoara, Romania
Received: 22 September 2018 / Revised: 28 November 2018 / Accepted: 29 November 2018 / Published: 3 December 2018

Abstract

The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access - Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Știrb, 2018). The algorithm determines the type of each thread through a static analysis of the source code. After assigning a type to each thread, NUMA-BTLP (Știrb, 2018) calls the NUMA-BTDM mapping algorithm (Știrb, 2016), which uses the PThreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. Together, the algorithms improve thread mapping on NUMA systems by mapping threads that share data to the same core(s), allowing fast access to L1 cache data. The paper shows that PThreads-based task-level parallel code optimized at compile time by NUMA-BTLP (Știrb, 2018) and NUMA-BTDM (Știrb, 2016) runs efficiently on NUMA systems in terms of both time and energy. The results show that energy consumption is reduced by up to 5% at the same execution time for one of the tested real benchmarks, and by up to 15% for another benchmark running in an infinite loop. The algorithms can be used in real-time control systems such as client/server applications that require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism in that of the client.
Keywords: thread mapping; NUMA systems; data locality; static code analysis; PThreads Library
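
The affinity mechanism named in the abstract can be illustrated with a minimal sketch, assuming Linux with glibc (pthread_setaffinity_np is a non-portable GNU extension). Two threads that share a counter are pinned to the same core, which is the placement the abstract describes for data-sharing threads. The helper names pin_thread_to_core, first_allowed_core and run_pair_on_core are hypothetical and are not taken from NUMA-BTLP or NUMA-BTDM; the sketch does not reproduce the static analysis, the thread typing, or the communication tree.

```c
#define _GNU_SOURCE            /* exposes pthread_setaffinity_np and CPU_* macros */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: restrict `thread` to one core.
   Returns 0 on success, an error number otherwise. */
int pin_thread_to_core(pthread_t thread, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/* Hypothetical helper: lowest-numbered core the calling process
   is allowed to run on, or -1 if it cannot be determined. */
int first_allowed_core(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Data shared by the two worker threads. */
struct shared {
    atomic_long counter;
    long iterations;
};

static void *worker(void *arg)
{
    struct shared *s = arg;
    for (long i = 0; i < s->iterations; ++i)
        atomic_fetch_add(&s->counter, 1);
    return NULL;
}

/* Starts two threads that share one counter and asks the kernel to
   run both on `core`. Threads are pinned right after creation, so a
   thread may briefly run elsewhere before it migrates.
   Returns the final counter value, or -1 if a thread could not be
   created; *pin_errors receives the number of failed pin requests. */
long run_pair_on_core(int core, long iterations, int *pin_errors)
{
    struct shared s;
    pthread_t t[2];
    int started = 0;

    atomic_init(&s.counter, 0);
    s.iterations = iterations;
    *pin_errors = 0;

    for (int i = 0; i < 2; ++i) {
        if (pthread_create(&t[i], NULL, worker, &s) != 0)
            break;
        ++started;
        if (pin_thread_to_core(t[i], core) != 0)
            ++*pin_errors;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(t[i], NULL);

    return started == 2 ? atomic_load(&s.counter) : -1;
}
```

A caller would typically pick a core with first_allowed_core() and pass it to run_pair_on_core(); build with the -pthread flag. Pinning failures are reported separately from the result because some restricted environments reject affinity requests while still running the threads correctly.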

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Știrb, I. Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree. Computers 2018, 7, 66.

Computers, EISSN 2073-431X, published by MDPI AG, Basel, Switzerland.