Next Article in Journal / Special Issue
Design of a Wideband Antenna for Wireless Network-On-Chip in Multimedia Applications
Previous Article in Journal
A Novel Design Flow for a Security-Driven Synthesis of Side-Channel Hardened Cryptographic Modules
Article Menu

Export Article

Open AccessArticle
J. Low Power Electron. Appl. 2017, 7(1), 5; doi:10.3390/jlpea7010005

Energy Efficiency Effects of Vectorization in Data Reuse Transformations for Many-Core Processors—A Case Study †

1
Department of Computer Science, Norwegian University of Science and Technology (NTNU), Trondheim NO-7491, Norway
2
Department of Electronic Systems, Norwegian University of Science and Technology (NTNU), Trondheim NO-7491, Norway
3
Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
*
Author to whom correspondence should be addressed.
Academic Editor: Davide Patti
Received: 8 November 2016 / Revised: 8 February 2017 / Accepted: 10 February 2017 / Published: 22 February 2017
(This article belongs to the Special Issue Emerging Network-on-Chip Architectures for Low Power Embedded Systems)
View Full-Text   |   Download PDF [1178 KB, uploaded 22 February 2017]   |  

Abstract

Thread-level and data-level parallel architectures have become the design of choice in many of today’s energy-efficient computing systems. However, these architectures put substantially higher requirements on the memory subsystem than scalar architectures, making memory latency and bandwidth critical in their overall efficiency. Data reuse exploration aims at reducing the pressure on the memory subsystem by exploiting the temporal locality in data accesses. In this paper, we investigate the effects on performance and energy from a data reuse methodology combined with parallelization and vectorization in multi- and many-core processors. As a test case, a full-search motion estimation kernel is evaluated on Intel® CoreTM i7-4700K (Haswell) and i7-2600K (Sandy Bridge) multi-core processors, as well as on an Intel® Xeon PhiTM many-core processor (Knights Landing) with Streaming Single Instruction Multiple Data (SIMD) Extensions (SSE) and Advanced Vector Extensions (AVX) instruction sets. Results using a single-threaded execution on the Haswell and Sandy Bridge systems show that performance and EDP (Energy Delay Product) can be improved through data reuse transformations on the scalar code by a factor of ≈3× and ≈6×, respectively. Compared to scalar code without data reuse optimization, the SSE/AVX2 version achieves ≈10×/17× better performance and ≈92×/307× better EDP, respectively. These results can be improved by 10% to 15% using data reuse techniques. Finally, the most optimized version using data reuse and AVX512 achieves a speedup of ≈35× and an EDP improvement of ≈1192× on the Xeon Phi system. While single-threaded execution serves as a common reference point for all architectures to analyze the effects of data reuse on both scalar and vector codes, scalability with thread count is also discussed in the paper. View Full-Text
Keywords: performance; energy efficiency; data reuse transformation methodology; vectorization; parallel programming; Xeon Phi processor; KNL; SIMD performance; energy efficiency; data reuse transformation methodology; vectorization; parallel programming; Xeon Phi processor; KNL; SIMD
Figures

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).

Scifeed alert for new publications

Never miss any articles matching your research from any publisher
  • Get alerts for new papers matching your research
  • Find out the new papers from selected authors
  • Updated daily for 49'000+ journals and 6000+ publishers
  • Define your Scifeed now

SciFeed Share & Cite This Article

MDPI and ACS Style

Al Hasib, A.; Natvig, L.; Kjeldsberg, P.G.; Cebrián, J.M. Energy Efficiency Effects of Vectorization in Data Reuse Transformations for Many-Core Processors—A Case Study †. J. Low Power Electron. Appl. 2017, 7, 5.

Show more citation formats Show less citations formats

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers. See further details here.

Article Metrics

Article Access Statistics

1

Comments

[Return to top]
J. Low Power Electron. Appl. EISSN 2079-9268 Published by MDPI AG, Basel, Switzerland RSS E-Mail Table of Contents Alert
Back to Top