Search Results (15)

Search Parameters:
Keywords = High-Bandwidth memory (HBM)

17 pages, 9736 KB  
Article
Development and Optimization of Fine-Pitch RDL for RDL Interposer and Embedded Bridge Die Interposer Fabrication Using Fan-Out Wafer-Level Packaging Technology
by Jung Won Lee, Sung Hyuk Lee, Jay Kim, Lewis Kang, Han Ju Yu, Min Ji Lee, Seong Hwan Han, Jae Kyung Lee, Hailey Hwang, Jung Gi Kim, Chan Young Hong, Jade Park, Su Hyun Kim, Myeung Jin Kim and Moon Jung Kim
Microelectronics 2026, 2(1), 3; https://doi.org/10.3390/microelectronics2010003 - 11 Feb 2026
Viewed by 1453
Abstract
Fine-pitch redistribution layers (RDLs) are key enabling technologies for fan-out wafer-level packaging (FOWLP)-based interposers used in chiplet and high-bandwidth memory (HBM) integration. In this study, a CAR-based photolithography process optimized for fine-pitch RDL fabrication was evaluated to realize 2 μm/2 μm line/space (L/S) RDL structures in an FOWLP environment. Key lithographic parameters, including exposure energy, focus offset, and thermal processing conditions, were systematically optimized to establish a stable and reproducible process window. Cross-sectional analysis confirmed the structural integrity of the electroplated RDL features formed under the optimized conditions. To assess functional feasibility, channel-level electrical simulations were performed using JEDEC-defined HBM3 signal assignments. Simulated eye diagrams indicate that the fabricated fine-pitch RDL interconnects are capable of supporting HBM3-class signal transmission with a moderate level of signal integrity. The presence of jitter and noise suggests that further optimization of RDL transmission line impedance is required. Rather than presenting a fully optimized interposer solution, this work provides an engineering-level assessment of lithographic and process constraints associated with implementing 2 μm class RDLs in FOWLP-based interposers, offering practical insight into fine-pitch RDL process window definition for advanced packaging applications. This work uniquely combines systematic CAR-based lithography optimization with cross-sectional structural validation and HBM3-class channel-level simulations to define a practical process window for 2 μm/2 μm RDLs in an FOWLP environment. Full article

28 pages, 3812 KB  
Article
Vertical vs. Horizontal Integration in HBM and Market-Implied Valuation: A Text-Mining Study
by Hyang Ja Yang and Cheong Kim
Appl. Sci. 2025, 15(22), 12127; https://doi.org/10.3390/app152212127 - 15 Nov 2025
Viewed by 2601
Abstract
High-bandwidth memory (HBM) has become a strategic bottleneck in AI-centric systems, shifting competitive advantage from computing power alone to a design that is orchestrated by memory and packaging. We investigate whether publicly available information about companies’ integration decisions—vertical integration by Samsung Electronics and horizontal partnerships by SK Hynix—is included in market-expected valuation. We create a Korean-language news corpus spanning January 2023 to September 2025 and use seed-guided topic models to obtain firms’ vertical and horizontal integration. We verify qualitative distinguishability with t-SNE embeddings and use firm-specific ordinary least squares specifications to link topic intensities to equity prices. According to research findings, for Samsung, consolidation-oriented vertical indicators (M&A and risk ring-fencing) positively correlate with valuation, whereas supplier-enablement or operational vertical topics are not reliably factored into their valuation. Vendor-assisted scale-up and joint development topics support positive valuation for SK Hynix. This study provides a scalable framework for text evaluation, which distinguishes between general sentiment and strategic architecture, as well as evidence that capital markets reward consolidation and alliance execution differently depending on the management of the HBM bottleneck. Full article
(This article belongs to the Special Issue Big Data Technology and Its Applications)

23 pages, 6275 KB  
Article
Effects of Hydrolysis Reaction and Abrasive Drag Force Accelerator on Enhancing Si-Wafer Polishing Rate and Improving Si-Wafer Surface Roughness
by Min-Uk Jeon, Pil-Su Kim, Man-Hyup Han, Se-Hui Lee, Hye-Min Lee, Su-Bin Kim, Jin-Hyung Park, Kyoo-Chul Cho, Jinsub Park and Jea-Gun Park
Nanomaterials 2025, 15(16), 1248; https://doi.org/10.3390/nano15161248 - 14 Aug 2025
Viewed by 2119
Abstract
To satisfy the superior surface quality requirements in the fabrication of HBM (High-Bandwidth Memory) and 3D NAND Flash Memory, high-efficiency Si chemical mechanical planarization (CMP) is essential. In this study, a colloidal silica abrasive-based Si-wafer CMP slurry was developed to simultaneously achieve a high polishing rate (≥10 nm/min) and low surface roughness (≤0.2 nm) without inducing CMP-induced scratches. The proposed Si-wafer CMP slurry incorporates two functional components: triammonium phosphate (TAP) as a hydrolysis reaction accelerator and hydroxyethyl cellulose (HEC) as an abrasive drag force accelerator. The polishing rate enhancement mechanism of TAP was analyzed by monitoring the OH mol concentration, surface adsorption behavior, and XPS spectra. The results showed that increasing the TAP concentration raised the OH mol concentration and converted Si–Si and Si–O–Si bonds to Si–OH via a hydrolysis reaction, thereby increasing the polishing rate. However, excessive hydrolysis also led to increased surface roughness. On the other hand, HEC influenced slurry viscosity, abrasive dispersibility, and drag force. At low HEC concentrations, increased abrasive drag force improved the polishing rate. At high concentrations, however, HEC formed a hindrance layer on the Si surface via hydrogen bonding and condensation reactions, reducing the effective contact area of abrasives and thus decreasing the polishing rate. By optimizing the concentrations of TAP (0.0037 wt%) and HEC (≤0.0024 wt%), the proposed slurry formulation achieved high-performance Si-wafer CMP, satisfying both surface roughness and polishing rate targets required for advanced memory packaging applications. Full article
(This article belongs to the Section Nanocomposite Materials)

16 pages, 3521 KB  
Article
HBM Package Interconnection Pseudo All-Channel Signal Integrity Simulation and Implementation Method of the Synchronous Current Load Research
by Wen-Xue Tang, Cong-Jian Mai, Li-Yan Zhou, Ying Sun, Xin-Ran Zhao, Shu-Li Liu, Gang Wang, Da-Wei Wang and Cheng-Qian Wang
Micromachines 2025, 16(8), 896; https://doi.org/10.3390/mi16080896 - 31 Jul 2025
Cited by 1 | Viewed by 3735
Abstract
This paper proposes a pseudo full-channel signal integrity (SI) simulation method tailored for high-bandwidth memory (HBM) interconnects. In this approach, real interconnect models are applied to selected portions of the channel, while the remaining sections are replaced with synchronized current loads that emulate the electrical behavior of actual signal transmission. This technique enables accurate modeling of the HBM interface under full-channel parallel data transfer conditions. In addition to the simulation methodology itself, this study focuses on three specific implementation schemes for the synchronized current loads and explores their practical applications. Comparative analysis demonstrates the necessity and effectiveness of using synchronized current loads as substitutes for real transmission loads, offering a viable and efficient solution for SI analysis in HBM interconnect systems. Full article
(This article belongs to the Section E:Engineering and Technology)

37 pages, 5280 KB  
Review
Thermal Issues Related to Hybrid Bonding of 3D-Stacked High Bandwidth Memory: A Comprehensive Review
by Seung-Hoon Lee, Su-Jong Kim, Ji-Su Lee and Seok-Ho Rhi
Electronics 2025, 14(13), 2682; https://doi.org/10.3390/electronics14132682 - 2 Jul 2025
Cited by 16 | Viewed by 23595
Abstract
High-Bandwidth Memory (HBM) enables the bandwidth required by modern AI and high-performance computing, yet its three-dimensional stack traps heat and amplifies thermomechanical stress. We first review how conventional solutions such as heat spreaders, microchannels, high-density Through-Silicon Vias (TSVs), and Mass Reflow Molded Underfill (MR-MUF) lower but do not eliminate the internal thermal resistance that rises sharply beyond 12-layer stacks. We then synthesize recent hybrid bonding studies, showing that optimized Cu pad density, interface characteristics, and mechanical treatments can cut junction-to-junction thermal resistance by between 22.8% and 47%, raise vertical thermal conductivity by up to three times, and shrink the stack height by more than 15%. A meta-analysis identifies design thresholds, such as at least 20% Cu coverage, that balance heat flow, interfacial stress, and reliability. The review next traces the chain from Coefficient of Thermal Expansion (CTE) mismatch to Cu protrusion, delamination, and warpage and classifies mitigation strategies into (i) material selection, including SiCN dielectrics, nanotwinned Cu, and polymer composites, (ii) process technologies such as sub-200 °C plasma-activated bonding and Chemical Mechanical Polishing (CMP) anneal co-optimization, and (iii) structural design, including staggered stacks and filleted corners. Integrating these levers suppresses stress hotspots and extends fatigue life in stacks of more than 16 layers. Finally, we outline a research roadmap combining multiscale simulation with high-layer prototyping to co-optimize thermal, mechanical, and electrical metrics for next-generation 20-layer HBM. Full article
(This article belongs to the Section Semiconductor Devices)

20 pages, 4631 KB  
Article
An On-Chip Architectural Framework Design for Achieving High-Throughput Multi-Channel High-Bandwidth Memory Access in Field-Programmable Gate Array Systems
by Xiangcong Kong, Zixuan Zhu, Chujun Feng, Yongxin Zhu and Xiaoying Zheng
Electronics 2025, 14(3), 466; https://doi.org/10.3390/electronics14030466 - 24 Jan 2025
Cited by 2 | Viewed by 3780
Abstract
The integration of High-Bandwidth Memory (HBM) into Field-Programmable Gate Arrays (FPGAs) has significantly enhanced data processing capabilities. However, the segmentation of HBM into 32 pseudo-channels, each managed by a performance-limited crossbar, imposes a significant bottleneck on data throughput. To overcome this challenge, we propose a transparent HBM access framework that integrates a non-blocking network-on-chip (NoC) module and fine-grained burst control transmission, enabling efficient multi-channel memory access in HBM. Our Omega-based NoC achieves a throughput of 692 million packets per second, surpassing state-of-the-art solutions. When implemented on the Xilinx Alveo U280 FPGA board, the proposed framework attains near-maximum single-channel write bandwidth, delivering 12.94 GB/s in many-to-many unicast communication scenarios, demonstrating its effectiveness in optimizing memory access for high-performance applications. Full article
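
The abstract above mentions an Omega-based NoC. As a rough illustration only, the textbook destination-tag routing rule for an Omega network can be sketched as below. This is not the authors' design: a plain Omega network is blocking, and the abstract does not describe how the proposed non-blocking module works. The function name `omega_route` and the 8-port size are assumptions chosen for the example.

```python
def omega_route(src: int, dst: int, n_bits: int) -> list[int]:
    """Return the line index occupied after each stage of an Omega network.

    The network has 2**n_bits ports and n_bits stages. Each stage applies a
    perfect shuffle (rotate the address left by one bit), then a 2x2 switch
    selects its upper or lower output according to one destination bit,
    taken most-significant bit first.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit line index left by one position.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The switch output is chosen by the current destination bit.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


# Hypothetical 8-port example: route a packet from port 5 to port 2.
print(omega_route(5, 2, 3))
```

After the last stage the line index always equals the destination, which is why each packet only needs to carry its destination address.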

15 pages, 7305 KB  
Article
Contact Hole Shrinkage: Simulation Study of Resist Flow Process and Its Application to Block Copolymers
by Sang-Kon Kim
Micromachines 2024, 15(9), 1151; https://doi.org/10.3390/mi15091151 - 13 Sep 2024
Cited by 2 | Viewed by 5569
Abstract
For vertical interconnect access (VIA) in three-dimensional (3D) structure chips, including those with high bandwidth memory (HBM), shrinking contact holes (C/Hs) using the resist flow process (RFP) represents the most promising technology for low-k1 (where CD = k1·λ/NA, CD is the critical dimension, λ is the wavelength, and NA is the numerical aperture). This method offers a way to reduce dimensions without additional complex process steps and is independent of optical technologies. However, most empirical models are heuristic methods and use linear regression to predict the critical dimension of the reflowed structure but do not account for intermediate shapes. In this research, the resist flow process (RFP) was modeled using the evolution method, the finite-element method, machine learning, and deep learning under various reflow conditions to imitate experimental results. Deep learning and machine learning have proven to be useful for physical optimization problems without analytical solutions, particularly for regression and classification tasks. In this application, the self-assembly of cylinder-forming block copolymers (BCPs), confined in prepatterns of the resist reflow process (RFP) to produce small contact hole (C/H) dimensions, was described using the self-consistent field theory (SCFT). This research paves the way for the shrink modeling of the enhanced resist reflow process (RFP) for random contact holes (C/Hs) and the production of smaller contact holes. Full article
(This article belongs to the Special Issue Recent Advances in Micro/Nano-Fabrication)
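
The relation CD = k1·λ/NA quoted in the abstract above can be evaluated directly. The sketch below is only a worked example of that formula; the numbers (k1 = 0.30, a 193 nm source, NA = 1.35) are illustrative assumptions and do not come from the article.

```python
def critical_dimension(k1: float, wavelength_nm: float, numerical_aperture: float) -> float:
    """Critical dimension in nm from CD = k1 * lambda / NA."""
    return k1 * wavelength_nm / numerical_aperture


# Assumed values for illustration only (not taken from the article).
cd = critical_dimension(k1=0.30, wavelength_nm=193.0, numerical_aperture=1.35)
print(f"CD = {cd:.1f} nm")  # CD = 42.9 nm
```

For a fixed wavelength and numerical aperture, the printable dimension scales linearly with k1, which is why low-k1 techniques are of interest.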

21 pages, 640 KB  
Article
A High-Performance Non-Indexed Text Search System
by Binh Kieu-Do-Nguyen, Tuan-Kiet Dang, Nguyen The Binh, Cuong Pham-Quoc, Huynh Phuc Nghi, Ngoc-Thinh Tran, Katsumi Inoue, Cong-Kha Pham and Trong-Thuc Hoang
Electronics 2024, 13(11), 2125; https://doi.org/10.3390/electronics13112125 - 29 May 2024
Cited by 1 | Viewed by 2299
Abstract
Full-text search has a wide range of applications, including tracking systems, computer vision, and natural language processing. Standard methods usually implement a two-phase procedure: indexing and retrieving, with the retrieval performance entirely dependent on the index efficiency. In most cases, the more powerful the index algorithm, the more memory and processing time are required. The amount of time and memory required to index a collection of documents is proportional to its overall size. In this paper, we propose a full-text search hardware implementation without the indexing phase, thus removing the time and memory requirements for indexing. Additionally, we propose an efficient design to leverage the parallel architecture of High Bandwidth Memory (HBM). To our knowledge, few (if not zero) researchers have integrated their full-text search system with an effective data access control on HBM. The functionality of the proposed system is verified on the Xilinx Alveo U50 Field-Programmable Gate Array (FPGA). The experimental results show that our system achieved a throughput of 8 Gigabytes per second, about 6697× speed-up compared to other software-based approaches. Full article
(This article belongs to the Section Microelectronics)

18 pages, 7810 KB  
Article
A Statistical Approach for Signal and Power Integrity Co-Design in High-Speed Interconnects Considering Non-Linear Power/Ground Noise and Bit-Patterns
by Youngwoo Kim
Micromachines 2023, 14(9), 1654; https://doi.org/10.3390/mi14091654 - 22 Aug 2023
Cited by 2 | Viewed by 4172
Abstract
In this article, a novel statistical approach is proposed and applied to co-design signal and power integrity (SI/PI) in high-speed interconnects considering the non-linear power/ground noise generated by parallel buffers and bit-patterns. With increased data rates and decreased operating voltages, the allowed noise margin in high-speed interconnects is continuously reduced, and this trend requires SI/PI co-design. Specifically, non-linear power/ground noise associated with simultaneous switching circuits sharing a power delivery network (PDN) and bit-patterns must be carefully considered during the interconnects’ design and analysis phase. In many cases, conventional electromagnetic (EM) and transient circuit simulators require heavy computational resources or even fail to deliver an accurate result. The proposed statistical method estimates the statistical eye-diagram in the high-speed interconnect considering power/ground noise and bit-patterns such as data bus inversion (DBI) coding. The accuracy and computational efficiency of the proposed method are validated by comparing the result with HSPICE transient simulation result. The proposed method is also compared with conventional statistical methods, such as peak distortion analysis (PDA) and statistical channel simulation in the transient simulator. Lastly, the proposed method is applied to the SI/PI co-design and co-analysis in the high bandwidth memory (HBM) interposer channel. Impacts of decoupling capacitors on hierarchical PDN impedance, statistical eye-diagram of the HBM channel, and bit error rate (BER) Bathtub curves are summarized. Finally, the BER eye-diagram is derived from the estimated statistical eye-diagram for timing and voltage analysis. The impacts of hierarchical PDN design and bit-patterns on SI/PI are discussed. Full article
(This article belongs to the Special Issue Advanced Interconnect and Packaging, 2nd Edition)

17 pages, 1621 KB  
Article
FPGA-Based High-Throughput Key-Value Store Using Hashing and B-Tree for Securities Trading System
by Sunil Puranik, Mahesh Barve, Swapnil Rodi and Rajendra Patrikar
Electronics 2023, 12(1), 183; https://doi.org/10.3390/electronics12010183 - 30 Dec 2022
Cited by 2 | Viewed by 4259
Abstract
Field-Programmable Gate Array (FPGA) technology is extensively used in finance. This paper describes a high-throughput key-value store (KVS) for securities trading system applications using an FPGA. The design uses a combination of hashing and B-Tree techniques and supports a large number of keys (40 million) as required by the trading system. We have used a novel technique of using buckets of different capacities to reduce the amount of Block-RAM (BRAM) and perform a high-speed lookup. The design uses high-bandwidth memory (HBM), an on-chip memory available in Virtex UltraScale+ FPGAs, to support a large number of keys. Another feature of this design is the replication of the database and lookup logic to increase the overall throughput. By implementing multiple lookup engines in parallel and replicating the database, we could achieve high throughput (up to 6.32 million search operations/second) as specified by our client, which is a major stock exchange. The design has been implemented with a combination of Verilog and high-level-synthesis (HLS) flow to reduce the implementation time. Full article
(This article belongs to the Special Issue Applications Enabled by FPGA-Based Technology)

17 pages, 7084 KB  
Article
Temperature Estimation of HBM2 Channels with Tail Distribution of Retention Errors in FPGA-HBM2 Platform
by Junhyeong Kwon, Shi-Jie Wen, Rita Fung and Sanghyeon Baeg
Electronics 2023, 12(1), 32; https://doi.org/10.3390/electronics12010032 - 22 Dec 2022
Cited by 4 | Viewed by 4163
Abstract
High-bandwidth memory 2 (HBM2) vertically stacks multiple dynamic random-access memory (DRAM) dies to achieve a small form factor and high capacity. However, it is difficult to diagnose HBM2 issues owing to their structural complexity and 2.5D integration with heterogeneous chips. The effects of the temperature at the base logic die (TL), and the refresh interval at the stacked DRAM dies, were experimentally investigated by counting the dynamic retention errors in the eight channels in an HBM2. TL was indirectly controlled by the heatsink temperature (TS). The lognormal distribution represents the distribution of the cell counts with varying refresh times. All Z-magnitudes (multiples of the distribution standard deviation) over the various refresh cycle times (RCTs) up to 2.045 s in a single channel at TL of 70 °C appeared below 4.4, which means that the error bits belong to the tail distribution. The Z-differences in the eight channels were distinctively larger than the Z-differences of the same channels at a constant temperature, demonstrating that the temperature difference in the stacked dies resulted in larger Z-differences. The largest Z-difference was 0.091 for all the channels at an RCT of 1.406 s, which was approximately 4.82 times smaller than the Z-difference between the TL temperatures of 70 °C and 80 °C in a single channel. The Z-difference between the TL temperatures of 70 °C and 72 °C in a single channel was approximately the same as the Z-difference in all the channels at an RCT of 2.045 s. Full article
(This article belongs to the Section Semiconductor Devices)
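
The abstract above describes Z-magnitudes as multiples of the distribution's standard deviation. One common way to obtain such a value, sketched below, is to convert the fraction of failing bits into a standard normal quantile; with lognormally distributed retention times, that quantile corresponds to the position of ln(time) in units of sigma. This is an assumed reading, not the authors' stated procedure, and the function name `z_magnitude` and the bit counts are hypothetical.

```python
from statistics import NormalDist


def z_magnitude(error_bits: int, total_bits: int) -> float:
    """Distance from the mean, in standard deviations, of the tail that
    contains `error_bits` failing cells out of `total_bits`."""
    if not 0 < error_bits < total_bits:
        raise ValueError("error_bits must be between 1 and total_bits - 1")
    fraction = error_bits / total_bits
    # inv_cdf is negative for fractions below 0.5, so negate it to report
    # a positive magnitude for the low-retention tail.
    return -NormalDist().inv_cdf(fraction)


# Hypothetical counts: 10 failing bits out of 1,000,000 tested bits.
print(round(z_magnitude(10, 1_000_000), 2))
```

A larger Z-magnitude means the failing bits sit further out in the tail, so fewer cells fail at the given refresh interval.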

17 pages, 7546 KB  
Article
A Novel Interposer Channel Structure with Vertical Tabbed Vias to Reduce Far-End Crosstalk for Next-Generation High-Bandwidth Memory
by Hyunwoong Kim, Seonghi Lee, Kyunghwan Song, Yujun Shin, Dongyrul Park, Jongcheol Park, Jaeyong Cho and Seungyoung Ahn
Micromachines 2022, 13(7), 1070; https://doi.org/10.3390/mi13071070 - 5 Jul 2022
Cited by 7 | Viewed by 5089
Abstract
In this paper, we propose and analyze a novel interposer channel structure with vertical tabbed vias to achieve high-speed signaling and low-power consumption in high-bandwidth memory (HBM). An analytical model of the self- and mutual capacitance of the proposed interposer channel is suggested and verified based on a 3D electromagnetic (EM) simulation. We thoroughly analyzed the electrical characteristics of the novel interposer channel considering various design parameters, such as the height and pitch of the vertical tabbed via and the gap of the vertical channel. Based on the frequency-dependent lumped circuit resistance, inductance, and capacitance, we analyzed the channel characteristics of the proposed interposer channel. In terms of impedance, insertion loss, and far-end crosstalk, we analyzed how much the proposed interposer channel improved the signal integrity characteristics compared to a conventional structure consisting of micro-strip and strip lines together. Compared to the conventional worst case, which is the strip line, the eye-width, the eye-height, and eye-jitter of the proposed interposer channel were improved by 17.6%, 29%, and 9.56%, respectively, at 8 Gbps. The proposed interposer channel can reduce dynamic power consumption by about 28% compared with the conventional interposer channel by minimizing the self-capacitance of the off-chip channel. Full article
(This article belongs to the Special Issue Advanced Interconnect and Packaging)

13 pages, 831 KB  
Article
Application-Oriented Data Migration to Accelerate In-Memory Database on Hybrid Memory
by Wenze Zhao, Yajuan Du, Mingzhe Zhang, Mingyang Liu, Kailun Jin and Rachata Ausavarungnirun
Micromachines 2022, 13(1), 52; https://doi.org/10.3390/mi13010052 - 29 Dec 2021
Cited by 6 | Viewed by 3327
Abstract
With the advantage of faster data access than traditional disks, in-memory database systems, such as Redis and Memcached, have been widely applied in data centers and embedded systems. The performance of an in-memory database greatly depends on the access speed of memory. To meet the requirements of high bandwidth and low energy, die-stacked memory (e.g., High Bandwidth Memory (HBM)) has been developed to extend the channel number and width. However, the capacity of die-stacked memory is limited due to the interposer challenge. Thus, hybrid memory systems that combine traditional Dynamic Random Access Memory (DRAM) with die-stacked memory have emerged. Existing works have proposed placing and managing data on hybrid memory architectures from the hardware's point of view. This paper considers managing in-memory database data in hybrid memory from the application's point of view. We first perform a preliminary study on the hotness distribution of client requests on Redis. From the results, we observe that most requests target a small portion of the data objects in the in-memory database. Then, we propose Application-oriented Data Migration (ADM) to accelerate in-memory databases on hybrid memory. We design a hotness management method and two migration policies to migrate data into or out of HBM. We take Redis under comprehensive benchmarks as a case study for the proposed method. The experimental results verify that our proposed method can effectively improve performance and reduce energy consumption compared with the existing Redis database. Full article
(This article belongs to the Special Issue Microprocessors)

19 pages, 2537 KB  
Article
High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory
by Changdao Du and Yoshiki Yamaguchi
Electronics 2020, 9(8), 1275; https://doi.org/10.3390/electronics9081275 - 8 Aug 2020
Cited by 7 | Viewed by 5365
Abstract
Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGA can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale-up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can take broad parallelization strategies based on specific FPGA resources to improve performance. Full article
(This article belongs to the Special Issue Emerging Applications of Recent FPGA Architectures)

12 pages, 874 KB  
Article
Power-Time Exploration Tools for NMP-Enabled Systems
by Chae Eun Rhee, Seung-Won Park, Jungwoo Choi, Hyunmin Jung and Hyuk-Jae Lee
Electronics 2019, 8(10), 1096; https://doi.org/10.3390/electronics8101096 - 28 Sep 2019
Viewed by 2870
Abstract
Recently, dramatic improvements in memory performance have been required for data-demanding application services such as deep learning, big data, and immersive videos. To this end, throughput-oriented memories such as high bandwidth memory (HBM) and hybrid memory cube (HMC) have been introduced to provide high bandwidth. Various research efforts have been made to use them effectively. Among them, near-memory processing (NMP) is a concept that improves bandwidth utilization and power consumption by placing computation logic near the memory. In an NMP-enabled system, a processor hierarchy consisting of hosts and NMPs is formed based on the distance from the main memory. In this paper, an evaluation tool is proposed to obtain the optimal design decision considering the power-time trade-off in the processor hierarchy. Every time the operating conditions and constraints change, the task-level offloading decision is made dynamically. For a realistic NMP-enabled system environment, the relationship among the HBM, host, and NMP should be carefully considered. Hosts and NMPs are almost hidden from each other, and the communications between them are extremely limited. In the simulation results, popular benchmarks and a machine learning application are used to demonstrate power-time trade-offs depending on applications and system conditions. Full article
(This article belongs to the Section Computer Science & Engineering)