
Search Results (9)

Search Parameters:
Keywords = cache size allocation

25 pages, 15500 KB  
Article
Optimizing CNN Hardware Acceleration with Configurable Vector Units and Feature Layout Strategies
by Jinzhong He, Ming Zhang, Jian Xu, Lina Yu and Weijun Li
Electronics 2024, 13(6), 1050; https://doi.org/10.3390/electronics13061050 - 12 Mar 2024
Cited by 2 | Viewed by 2498
Abstract
Convolutional neural network (CNN) hardware acceleration is critical to improving performance and facilitating the deployment of CNNs in edge applications. Due to its efficiency and simplicity, channel group parallelism has become a popular method for CNN hardware acceleration. However, when processing data involving small channels, there is a mismatch between the feature data and the computing units, resulting in low utilization of the computing units. When processing the middle layers of a convolutional neural network, the mismatch between the feature-usage order and the feature-loading order leads to a low input feature cache hit rate. To address these challenges, this paper proposes an innovative method inspired by data reordering technology, aiming to achieve CNN hardware acceleration that reuses the same multiplier resources. This method transforms the hardware acceleration process into feature organization, feature block scheduling and allocation, and feature calculation subtasks to ensure the efficient mapping of continuous loading and the calculation of feature data. Specifically, this paper introduces a convolutional algorithm mapping strategy and a configurable vector operation unit to enhance multiplier utilization for different feature map sizes and channel numbers. In addition, an off-chip address mapping and on-chip cache management mechanism is proposed to effectively improve feature access efficiency and the on-chip feature cache hit rate. Furthermore, a configurable feature block scheduling policy is proposed to strike a balance between weight reuse and feature writeback pressure. Experimental results demonstrate the effectiveness of this method. When using 512 multipliers and accelerating VGG16 at 100 MHz, the actual computing performance reaches 102.3 giga operations per second (GOPS). Compared with other CNN hardware acceleration methods, the average computing array utilization is as high as 99.88% and the computing density is higher. Full article
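The small-channel mismatch this abstract describes is easy to quantify: with a vector unit of W multiplier lanes and C input channels, a rigid channel-group mapping wastes lanes whenever C is much smaller than W. A minimal sketch (an illustration of the utilization arithmetic, not the paper's actual mapping strategy):

```python
import math

def lane_utilization(channels: int, vector_width: int) -> float:
    """Fraction of multiplier lanes doing useful work when `channels`
    input channels are packed into groups of `vector_width` lanes."""
    groups = math.ceil(channels / vector_width)
    return channels / (groups * vector_width)

# A 3-channel RGB input layer on a fixed 64-lane unit wastes most lanes ...
print(lane_utilization(3, 64))   # ~0.047
# ... while a configurable unit that narrows to 4 lanes recovers most of it.
print(lane_utilization(3, 4))    # 0.75
```

This is why a configurable vector unit that adapts its effective width to the layer's channel count can push average array utilization close to 100%.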

23 pages, 663 KB  
Article
Joint Trajectory Design and Resource Optimization in UAV-Assisted Caching-Enabled Networks with Finite Blocklength Transmissions
by Yang Yang and Mustafa Cenk Gursoy
Drones 2024, 8(1), 12; https://doi.org/10.3390/drones8010012 - 4 Jan 2024
Cited by 4 | Viewed by 2818
Abstract
In this study, we design and analyze a reliability-oriented downlink wireless network assisted by unmanned aerial vehicles (UAVs). This network employs non-orthogonal multiple access (NOMA) transmission and finite blocklength (FBL) codes. In the network, ground user equipments (GUEs) request content from a remote base station (BS), and there are no direct connections between the BS and the GUEs. To address this, we employ a UAV with a limited caching capacity to assist the BS in completing the communication. The UAV can either request uncached content from the BS and then serve the GUEs or directly transmit cached content to the GUEs. In this paper, we first introduce the decoding error rate within the FBL regime and explore caching policies for the UAV. Subsequently, we formulate an optimization problem aimed at minimizing the average maximum end-to-end decoding error rate across all GUEs while considering the coding length and maximum UAV transmission power constraints. We propose a two-step alternating optimization scheme embedded within a deep deterministic policy gradient (DDPG) algorithm to jointly determine the UAV trajectory and transmission power allocations, as well as the blocklength of the downloading phase, and our numerical results show that the combined learning-optimization algorithm efficiently addresses the considered problem. In particular, it is shown that a well-designed UAV trajectory, relaxing the FBL constraint, increasing the cache size, and providing a higher UAV transmission power budget all lead to improved performance. Full article
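The FBL decoding error rate the abstract refers to is commonly modeled with the normal approximation. A sketch for a single AWGN link (the paper's actual channel model and joint optimization are more involved; SNR, blocklength, and info-bit values below are illustrative):

```python
import math

def fbl_error_rate(snr: float, blocklength: int, info_bits: int) -> float:
    """Normal-approximation decoding error rate in the finite-blocklength
    regime for an AWGN channel at the given SNR (linear scale)."""
    c = math.log2(1 + snr)                                    # Shannon capacity, bits/use
    v = (1 - 1 / (1 + snr) ** 2) * (math.log2(math.e)) ** 2   # channel dispersion
    arg = (blocklength * c - info_bits) / math.sqrt(blocklength * v)
    return 0.5 * math.erfc(arg / math.sqrt(2))                # Gaussian Q-function

# At a fixed coding rate k/n, longer blocklengths drive the error rate down,
# which is the "relaxing the FBL constraint improves performance" effect.
for n in (100, 200, 400):
    print(n, fbl_error_rate(snr=4.0, blocklength=n, info_bits=int(0.8 * n)))
```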

24 pages, 6198 KB  
Article
Resource Allocation in UAV-Enabled NOMA Networks for Enhanced Six-G Communications Systems
by Mostafa Mahmoud El-Gayar and Mohammed Nasser Ajour
Electronics 2023, 12(24), 5033; https://doi.org/10.3390/electronics12245033 - 17 Dec 2023
Cited by 17 | Viewed by 4326
Abstract
Enhancing energy efficiency, content distribution, latency, and transmission speeds are vital components of communication systems. Multiple access methods hold great promise for boosting these performance indicators. This manuscript evaluates the effectiveness of Non-Orthogonal Multiple Access (NOMA) and Orthogonal Multiple Access (OMA) systems within a single cell, where users are scattered randomly and rely on relays for dependability. This paper presents a model for improving energy efficiency, content distribution, latency, and transmission speeds in communication systems using NOMA and OMA systems within a single cell. Additionally, this paper proposes a caching strategy using unmanned aerial vehicles (UAVs) as aerial base stations for ground users. These UAVs distribute cached content to minimize the overall latency of content demands from ground users while modifying their positions. We carried out simulations using various cache capacities and user counts linked to their respective UAVs. Furthermore, we evaluated OMA and NOMA in terms of the achievable rate and energy efficiency. The proposed model achieved noteworthy enhancements across various scenarios, including different sum rates, numbers of mobile users, diverse cache sizes, and amounts of power allocation. Full article
(This article belongs to the Special Issue Advances in 5G Wireless Edge Computing)

20 pages, 661 KB  
Article
Beamsteering-Aware Power Allocation for Cache-Assisted NOMA mmWave Vehicular Networks
by Wei Cao, Jinyuan Gu, Xiaohui Gu and Guoan Zhang
Electronics 2023, 12(12), 2653; https://doi.org/10.3390/electronics12122653 - 13 Jun 2023
Cited by 1 | Viewed by 1602
Abstract
Cache-enabled networks with non-orthogonal multiple access (NOMA) integration have been shown to decrease wireless network traffic congestion and content delivery latency. This work investigates optimal power control in cache-assisted NOMA millimeter-wave (mmWave) vehicular networks, where mmWave channels experience double-Nakagami fading and the mmWave beamforming is subject to beamsteering errors. We aim to optimize vehicular quality of service while maintaining fairness among vehicles, through the maximization of the successful signal decoding probability for paired vehicles. A comprehensive analysis is carried out to understand the decoding success probabilities under various caching scenarios, leading to the development of optimal power allocation strategies for diverse caching conditions. Moreover, an optimal power allocation is proposed for the single-antenna case, exploiting the cached data as side information to cancel interference. The robustness of our proposed scheme against variations in beamforming orientation is assessed by studying the influence of beamsteering errors. Numerical results demonstrate the effectiveness of the proposed cache-assisted NOMA scheme in enhancing cache utility and NOMA efficiency, while underscoring the performance gains achievable with larger cache sizes. Full article
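The benefit of cached side information can be illustrated with a toy two-user power-domain NOMA model (a generic textbook sketch with unit noise; not the paper's double-Nakagami mmWave setup, and all parameter values are illustrative):

```python
def noma_decode_snrs(p_near: float, p_far: float, g_near: float,
                     cached_far: bool = False, noise: float = 1.0) -> dict:
    """Effective SINRs at the near user in two-user power-domain NOMA.
    If the far user's content is already cached at the near user, it is
    cancelled as known side information instead of being decoded first."""
    if cached_far:
        # Cached signal reconstructed and subtracted: no intra-pair interference.
        return {"own": p_near * g_near / noise}
    # Classic SIC: decode the higher-power far-user signal first (seeing the
    # near-user signal as interference), subtract it, then decode own signal.
    sinr_far = p_far * g_near / (p_near * g_near + noise)
    return {"far_first": sinr_far, "own": p_near * g_near / noise}

# Caching removes the "far_first" decoding step entirely, so the near user's
# reliability no longer depends on successfully decoding the partner's signal.
print(noma_decode_snrs(p_near=1.0, p_far=4.0, g_near=2.0))
print(noma_decode_snrs(p_near=1.0, p_far=4.0, g_near=2.0, cached_far=True))
```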

18 pages, 1067 KB  
Article
RCM: A Remote Cache Management Framework for Spark
by Yixin Song, Junyang Yu, Bohan Li, Han Li, Xin He, Jinjiang Wang and Rui Zhai
Appl. Sci. 2022, 12(22), 11491; https://doi.org/10.3390/app122211491 - 12 Nov 2022
Cited by 2 | Viewed by 2308
Abstract
With the rapid growth of Internet data, the performance of big data processing platforms is attracting more and more attention. In Spark, cache data are replaced by the Least Recently Used (LRU) algorithm. LRU cannot identify the cost of cache data, which leads to the replacement of some important cache data. In addition, the placement of cache data is random, which lacks a measure to find efficient cache servers. Focusing on the above problems, a remote cache management framework (RCM) for the Spark platform was proposed, including a cache weight generation module (CWG), a cache replacement module (CREP), and a cache placement module (CPL). CWG establishes initial weights from three main factors: the response time of the query database, the number of queries, and the data size. Then, CWG reduces the weight of old data through a time loss function. CREP ensures that the sum of cache data weights is maximized by a greedy strategy. CPL allocates the best cache server for data based on the Kuhn-Munkres matching algorithm to improve cooperation efficiency. To verify its effectiveness, RCM was implemented on Redis and deployed on eight computing nodes and four cache servers. Three groups of benchmark jobs, PageRank, K-means, and WordCount, were tested. The experimental results confirmed that, compared with MCM, SACM, and DMAOM, the execution time of RCM is reduced by up to 42.1%. Full article
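A greedy weight-maximizing replacement policy of the kind CREP uses can be sketched in a few lines: rank blocks by weight-per-byte density and keep the densest set that fits the budget. This is an illustration of the greedy idea only; the names, weights, and sizes below are made up, and RCM's actual weight function combines query latency, query count, data size, and time decay.

```python
def greedy_keep(items, capacity):
    """Keep the set of cache blocks that approximately maximizes total
    weight within `capacity`, preferring high weight-per-size density.
    `items` is a list of (name, weight, size) tuples."""
    chosen, used = [], 0
    for name, weight, size in sorted(items, key=lambda t: t[1] / t[2], reverse=True):
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen

# Hypothetical RDD blocks: LRU would ignore these weights entirely.
blocks = [("rdd_a", 9.0, 3), ("rdd_b", 8.0, 4), ("rdd_c", 2.0, 1), ("rdd_d", 1.0, 2)]
print(greedy_keep(blocks, capacity=5))   # → ['rdd_a', 'rdd_c']
```

Unlike LRU, this keeps `rdd_a` even if it was touched least recently, because its weight density (3.0 per unit size) makes it the most expensive block to recompute per byte of cache spent.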

13 pages, 3226 KB  
Article
Establishment of the Optimal Common Data Model Environment for EMR Data Considering the Computing Resources of Medical Institutions
by Tong Min Kim, Taehoon Ko, Yoon-sik Yang, Sang Jun Park, In-Young Choi and Dong-Jin Chang
Appl. Sci. 2021, 11(24), 12056; https://doi.org/10.3390/app112412056 - 17 Dec 2021
Cited by 1 | Viewed by 3494
Abstract
Electronic medical record (EMR) data vary between institutions. These data should be converted into a common data model (CDM) for multi-institutional joint research. To build the CDM, it is essential to integrate the EMR data of each hospital and load it according to the CDM, considering the computing resources of each hospital. Accordingly, this study attempts to share experiences and recommend computing resource-allocation designs. Here, two types of servers were defined: combined and separated servers. In addition, three database (DB) setting types were selected: desktop application (DA), online transaction processing (OLTP), and data warehouse (DW). Scale, TPS, average latency, 90th percentile latency, and maximum latency were compared across various settings. Virtual memory (vmstat) and disk input/output (disk) statuses were also described. Transactions per second (TPS) decreased as the scale increased in all DB types; however, the average, 90th percentile, and maximum latencies exhibited no tendency according to scale. When compared at the maximum number of clients (DA clients = 5, OLTP clients = 20, DW clients = 10), the TPS, average latency, 90th percentile latency, and maximum latency values were highest in the order of OLTP, DW, and DA. In vmstat, the amount of memory used for the page cache field and the free memory currently available for DA, OLTP, and DW were large compared to other fields. In the disk statistics, DA, OLTP, and DW all recorded the largest value in the average size of write requests, followed by the number of write requests per second. In summary, this study presents recommendations for configuring CDM settings. The configuration must be tuned carefully, considering the hospital's resources and environment, and the size of the database must account for concurrent client connections and the system architecture. Full article
(This article belongs to the Special Issue New Trends in Medical Informatics II)

19 pages, 1103 KB  
Article
A Hierarchical Cache Size Allocation Scheme Based on Content Dissemination in Information-Centric Networks
by Hongyu Liu and Rui Han
Future Internet 2021, 13(5), 131; https://doi.org/10.3390/fi13050131 - 15 May 2021
Cited by 7 | Viewed by 3224
Abstract
With the rapid growth of mass content retrieval on the Internet, the Information-Centric Network (ICN) has become one of the hotspots in the field of future network architectures. The in-network cache is an important feature of ICN. For better network performance in ICN, the cache size on each node should be allocated in proportion to its importance. However, in some current studies, the importance of cache nodes is usually determined by their location in the network topology, ignoring their roles in the actual content transmission process. In this paper, we focus on the allocation of cache size for each node within a given total cache space budget. We explore the impact of heterogeneous cache allocation on content dissemination under the same ICN infrastructure, and we quantify the importance of nodes from both the content dissemination and network topology perspectives. To this end, we implement a hierarchy partitioning method based on content dissemination, then formulate a set of weight calculation methods for these hierarchies, providing a per-node allocation that distributes the total cache space budget across the nodes in the network. The performance of the scheme is evaluated on the Garr topology, and the average hit ratio, latency, and load are compared to show that the proposed scheme performs better in these aspects than other schemes. Full article
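The final allocation step described above (splitting a fixed total budget across nodes in proportion to importance weights) can be sketched generically. The node names and weights below are hypothetical, and the paper's hierarchy-based weight formula is replaced here by plain proportional rounding:

```python
def allocate_cache(total_budget: int, node_weights: dict) -> dict:
    """Split an integer cache budget across nodes in proportion to their
    importance weights, rounding down and handing leftover units to the
    heaviest nodes so the budget is spent exactly."""
    total_w = sum(node_weights.values())
    alloc = {n: int(total_budget * w / total_w) for n, w in node_weights.items()}
    leftover = total_budget - sum(alloc.values())
    for n in sorted(node_weights, key=node_weights.get, reverse=True)[:leftover]:
        alloc[n] += 1      # distribute rounding remainder to heaviest nodes
    return alloc

# A node on many dissemination paths gets proportionally more cache.
print(allocate_cache(100, {"core": 5.0, "edge1": 2.0, "edge2": 2.0, "leaf": 1.0}))
# → {'core': 50, 'edge1': 20, 'edge2': 20, 'leaf': 10}
```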
(This article belongs to the Section Network Virtualization and Edge/Fog Computing)

23 pages, 1401 KB  
Article
Polymorphic Memory: A Hybrid Approach for Utilizing On-Chip Memory in Manycore Systems
by Seung-Ho Lim, Hyunchul Seok and Ki-Woong Park
Electronics 2020, 9(12), 2061; https://doi.org/10.3390/electronics9122061 - 3 Dec 2020
Viewed by 2953
Abstract
The key challenges of manycore systems are the large amount of memory and high bandwidth required to run many applications. Three-dimensional integrated on-chip memory is a promising candidate for addressing these challenges. The advent of on-chip memory has provided new opportunities to rethink traditional memory hierarchies and their management. In this study, we propose a polymorphic memory as a hybrid approach when using on-chip memory. In contrast to previous studies, we use the on-chip memory as both a main memory (called M1 memory) and a Dynamic Random Access Memory (DRAM) cache (called M2 cache). The main memory consists of M1 memory and a conventional DRAM memory called M2 memory. To achieve high performance when running many applications on this memory architecture, we propose management techniques for the main memory with M1 and M2 memories and for polymorphic memory with dynamic memory allocations for many applications in a manycore system. The first technique is to move frequently accessed pages to M1 memory via hardware monitoring in a memory controller. The second is M1 memory partitioning to mitigate contention problems among many processes. Finally, we propose a method to use M2 cache between a conventional last-level cache and M2 memory, and we determine the best cache size for improving the performance with polymorphic memory. The proposed schemes are evaluated with the SPEC CPU2006 benchmark, and the experimental results show that the proposed approaches can improve the performance under various workloads of the benchmark. The performance evaluation confirms that the average performance improvement of polymorphic memory is 21.7%, with a 0.026 standard deviation for the normalized results, compared to the previous method of using on-chip memory as a last-level cache. Full article
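The first technique (promoting frequently accessed pages into fast M1 memory based on hardware access counters) can be modeled with a toy simulator. The M1/M2 names follow the abstract, but the threshold value and coldest-page eviction rule here are illustrative assumptions, not the paper's controller design:

```python
from collections import Counter

class PolymorphicMemorySim:
    """Toy promotion policy: a page whose access count crosses `threshold`
    migrates from slow M2 memory into fast on-chip M1 memory, evicting the
    coldest resident M1 page when M1 is full."""
    def __init__(self, m1_capacity: int, threshold: int):
        self.m1 = set()                # pages currently resident in M1
        self.m1_capacity = m1_capacity
        self.threshold = threshold
        self.hits = Counter()          # per-page access counts (the "monitor")

    def access(self, page: int) -> str:
        self.hits[page] += 1
        if page in self.m1:
            return "M1"
        if self.hits[page] >= self.threshold:
            if len(self.m1) >= self.m1_capacity:
                coldest = min(self.m1, key=self.hits.__getitem__)
                self.m1.remove(coldest)    # demote the coldest page back to M2
            self.m1.add(page)              # promote the hot page into M1
        return "M2"                        # this access was still served from M2

mem = PolymorphicMemorySim(m1_capacity=2, threshold=3)
for p in [1, 1, 1, 2, 2, 2, 1]:
    print(p, mem.access(p))   # pages 1 and 2 become M1 hits once they turn hot
```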
(This article belongs to the Special Issue Storage Systems with Non-volatile Memory Devices)

17 pages, 7072 KB  
Article
Individual Tree Detection from Unmanned Aerial Vehicle (UAV) Derived Canopy Height Model in an Open Canopy Mixed Conifer Forest
by Midhun Mohan, Carlos Alberto Silva, Carine Klauberg, Prahlad Jat, Glenn Catts, Adrián Cardil, Andrew Thomas Hudak and Mahendra Dia
Forests 2017, 8(9), 340; https://doi.org/10.3390/f8090340 - 11 Sep 2017
Cited by 357 | Viewed by 28373
Abstract
Advances in Unmanned Aerial Vehicle (UAV) technology and data processing capabilities have made it feasible to obtain high-resolution imagery and three-dimensional (3D) data which can be used for forest monitoring and assessing tree attributes. This study evaluates the applicability of consumer-grade cameras attached to UAVs and a structure-from-motion (SfM) algorithm for automatic individual tree detection (ITD) using a local-maxima-based algorithm on UAV-derived Canopy Height Models (CHMs). This study was conducted in a private forest at Cache Creek, located east of Jackson city, Wyoming. Based on the UAV imagery, we allocated 30 field plots of 20 m × 20 m. For each plot, the number of trees was counted manually using the UAV-derived orthomosaic for reference. A total of 367 reference trees were counted as part of this study and the algorithm detected 312 trees, resulting in an accuracy higher than 85% (F-score of 0.86). Overall, the algorithm missed 55 trees (omission errors) and falsely detected 46 trees (commission errors), resulting in a total count of 358 trees. We further determined the impact of fixed tree window sizes (FWS) and fixed smoothing window sizes (SWS) on the ITD accuracy, and detected an inverse relationship between tree density and FWS. From our results, it can be concluded that ITD can be performed with an acceptable accuracy (F > 0.80) from UAV-derived CHMs in an open canopy forest, and has the potential to supplement future research directed towards estimation of above ground biomass and stem volume from UAV imagery. Full article
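The local-maxima detection at the core of this ITD pipeline can be sketched directly: a CHM cell is a treetop candidate if it is the strict maximum of its fixed window and taller than a minimum height. A minimal illustration (the window size, height threshold, and tiny CHM grid below are made up, and the study additionally smooths the CHM first, which is the SWS parameter):

```python
def detect_trees(chm, window=1, min_height=2.0):
    """Fixed-window local-maxima detection on a canopy height model (CHM):
    a cell is a treetop if it strictly exceeds every neighbour in its
    (2*window+1)^2 window and is taller than `min_height` metres."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue                      # below canopy threshold: skip
            neighbours = [chm[i][j]
                          for i in range(max(0, r - window), min(rows, r + window + 1))
                          for j in range(max(0, c - window), min(cols, c + window + 1))
                          if (i, j) != (r, c)]
            if all(h > n for n in neighbours):
                tops.append((r, c))
    return tops

chm = [[0.5, 1.0, 0.8],
       [1.2, 6.3, 1.1],
       [0.9, 1.0, 5.0]]
print(detect_trees(chm))   # → [(1, 1)]: the 5.0 m cell is shaded by the 6.3 m peak
```

The inverse relationship the study reports falls out of this structure: a larger fixed window (FWS) suppresses nearby maxima, so dense stands with closely spaced crowns need smaller windows to avoid omission errors.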
