Search Results (165)

Search Parameters:
Keywords = DRAM2

19 pages, 4017 KB  
Article
LACX: Locality-Aware Shared Data Migration in NUMA + CXL Tiered Memory
by Hayong Jeong, Binwon Song, Minwoo Jo and Heeseung Jo
Electronics 2025, 14(21), 4235; https://doi.org/10.3390/electronics14214235 - 29 Oct 2025
Viewed by 137
Abstract
In modern high-performance computing (HPC) and large-scale data processing environments, the efficient utilization and scalability of memory resources are critical determinants of overall system performance. Architectures such as non-uniform memory access (NUMA) and tiered memory systems frequently suffer performance degradation due to remote accesses stemming from shared data among multiple tasks. This paper proposes LACX, a shared data migration technique leveraging Compute Express Link (CXL), to address these challenges. LACX preserves the migration cycle of automatic NUMA balancing (AutoNUMA) while identifying shared data characteristics and migrating such data to CXL memory instead of DRAM, thereby maximizing DRAM locality. The proposed method utilizes existing kernel structures and data to efficiently identify and manage shared data without incurring additional overhead, and it effectively avoids conflicts with AutoNUMA policies. Evaluation results demonstrate that, although remote accesses to shared data can degrade performance in low-tier memory scenarios, LACX significantly improves overall memory bandwidth utilization and system performance in high-tier memory and memory-intensive workload environments by distributing DRAM bandwidth. This work presents a practical, lightweight approach to shared data management in tiered memory environments and highlights new directions for next-generation memory management policies.
(This article belongs to the Special Issue Future Technologies for Data Management, Processing and Application)
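As a rough illustration of the placement policy sketched in the abstract, the fragment below steers pages shared by multiple tasks to CXL memory while leaving task-private pages on the accessor's local DRAM node. The node names, data structures, and single-sharer threshold are illustrative assumptions, not the paper's kernel implementation.

```python
# Toy model of a shared-data placement rule in a NUMA + CXL tiered system:
# shared pages go to CXL to preserve DRAM locality for private data.
# "cxl0" and the input shapes are hypothetical placeholders.

CXL_NODE = "cxl0"

def pick_target(page_sharers, accessor_nodes):
    """page_sharers: set of task ids touching the page;
    accessor_nodes: DRAM node of each accessing task (first is the faulting task).
    Returns the migration target node for the page."""
    if len(page_sharers) > 1:
        return CXL_NODE            # shared data -> CXL, freeing DRAM bandwidth
    return accessor_nodes[0]       # private data -> accessor's local DRAM
```

A real implementation would hook this decision into the existing AutoNUMA migration cycle rather than run it standalone, as the abstract describes.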

11 pages, 1602 KB  
Article
DLL Design with Wide Input Duty Cycle Range and Low Output Clock Duty Cycle Error
by Binyu Qin, Haoyu Qin, Chenyu Fang, Leilei Zhao and Peter Poechmueller
Micromachines 2025, 16(11), 1223; https://doi.org/10.3390/mi16111223 - 27 Oct 2025
Viewed by 257
Abstract
This paper presents the design of a Delay-Locked Loop (DLL) with a simple architecture and a wide input clock duty cycle range. The design is tailored to meet the increasing data rate and stringent clock requirements of modern semiconductor chips, with particular applicability to dynamic random-access memory (DRAM) systems. The structure features two Bang-Bang Phase Detectors (BBPDs) to adjust the rising and falling edges of the divided clock. Implemented using a 65 nm CMOS process, the design was verified through simulation. At a working frequency of 3.2 GHz, the input clock duty cycle range spans from 18% to 72%, with a maximum output clock duty cycle error of just 0.6%, a peak-to-peak jitter of 15.73 ps, and a power consumption of 12.7 mW.

31 pages, 2573 KB  
Article
Hardware Design of DRAM Memory Prefetching Engine for General-Purpose GPUs
by Freddy Gabbay, Benjamin Salomon, Idan Golan and Dolev Shema
Technologies 2025, 13(10), 455; https://doi.org/10.3390/technologies13100455 - 8 Oct 2025
Viewed by 574
Abstract
General-purpose graphics processing units (GPGPUs) face significant performance limitations due to memory access latencies, particularly when traditional memory hierarchies and thread-switching mechanisms prove insufficient for complex access patterns in data-intensive applications such as machine learning (ML) and scientific computing. This paper presents a novel hardware design for a memory prefetching subsystem targeted at DDR (Double Data Rate) memory in GPGPU architectures. The proposed prefetching subsystem features a modular architecture comprising multiple parallel prefetching engines, each handling distinct memory address ranges with dedicated data buffers and adaptive stride detection algorithms that dynamically identify recurring memory access patterns. The design incorporates robust system integration features, including context flushing, watchdog timers, and flexible configuration interfaces, for runtime optimization. Comprehensive experimental validation using real-world workloads examined critical design parameters, including block sizes, prefetch outstanding limits, and throttling rates, across diverse memory access patterns. Results demonstrate significant performance improvements with average memory access latency reductions of up to 82% compared to no-prefetch baselines, and speedups in the range of 1.240–1.794. The proposed prefetching subsystem successfully enhances memory hierarchy efficiency and provides practical design guidelines for deployment in production GPGPU systems, establishing clear parameter optimization strategies for different workload characteristics.
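The adaptive stride detection the abstract describes is, in its general form, a table-based scheme: track the last address and stride per access stream, and issue prefetches once the stride repeats. The sketch below is a minimal software model of that general technique; the stream-indexed table, confidence threshold, and prefetch degree are illustrative choices, not parameters from the paper's hardware design.

```python
# Minimal software model of table-based stride detection, the generic
# technique behind stride prefetching engines. Thresholds are illustrative.

class StrideDetector:
    def __init__(self, confirm_threshold=2, degree=2):
        self.table = {}            # stream id -> (last_addr, last_stride, confidence)
        self.confirm = confirm_threshold
        self.degree = degree       # blocks prefetched once the stride is confirmed

    def access(self, stream, addr):
        """Record one demand access; return the list of prefetch addresses."""
        last_addr, last_stride, conf = self.table.get(stream, (addr, 0, 0))
        stride = addr - last_addr
        if stride != 0 and stride == last_stride:
            conf += 1              # same non-zero stride seen again
        else:
            conf = 0               # pattern broken; restart confirmation
        self.table[stream] = (addr, stride, conf)
        if conf >= self.confirm:
            return [addr + stride * i for i in range(1, self.degree + 1)]
        return []
```

For a stream touching addresses 0, 64, 128, 192, the detector stays quiet until the 64-byte stride has repeated, then prefetches the next two blocks ahead.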

20 pages, 1644 KB  
Article
P-HNSW: Crash-Consistent HNSW for Vector Databases on Persistent Memory
by Haena Lee, Taeyoon Park, Yedam Na and Wook-Hee Kim
Appl. Sci. 2025, 15(19), 10554; https://doi.org/10.3390/app151910554 - 29 Sep 2025
Viewed by 764
Abstract
The rapid growth of Large Language Models (LLMs) has generated massive amounts of high-dimensional feature vectors extracted from diverse datasets. Efficient storage and retrieval of such data are critical for enabling accurate and fast query responses. Vector databases (Vector DBs) provide efficient storage and retrieval for high-dimensional vectors. These systems rely on Approximate Nearest Neighbor Search (ANNS) indexes, such as HNSW, to handle large-scale data efficiently. However, the original HNSW is implemented on DRAM, which is both costly and vulnerable to crashes. Therefore, we propose P-HNSW, a crash-consistent HNSW on persistent memory. To guarantee crash consistency, P-HNSW introduces two logs, NLog and NlistLog. We describe the logging process during normal operation and the recovery process in the event of a system crash. Our experimental results demonstrate that the overhead of the proposed logging mechanism is negligible, while P-HNSW achieves superior performance compared with SSD-based recovery mechanisms.
(This article belongs to the Special Issue Resource Management for Emerging Computing Systems)

14 pages, 3363 KB  
Article
Selective Etching of Multi-Stacked Epitaxial Si1-xGex on Si Using CF4/N2 and CF4/O2 Plasma Chemistries for 3D Device Applications
by Jihye Kim, Joosung Kang, Dongmin Yoon, U-in Chung and Dae-Hong Ko
Materials 2025, 18(18), 4417; https://doi.org/10.3390/ma18184417 - 22 Sep 2025
Viewed by 549
Abstract
The SiGe/Si multilayer is a critical component for fabricating stacked Si channel structures for next-generation three-dimensional (3D) logic and 3D dynamic random-access memory (3D-DRAM) devices. Achieving these structures necessitates highly selective SiGe etching. Herein, CF4/O2 and CF4/N2 gas chemistries were employed to elucidate and enhance the selective etching mechanism. To clarify the contribution of radicals to the etching process, a nonconducting plate (roof) was placed just above the samples in the plasma chamber to block ion bombardment on the sample surface. The CF4/N2 gas chemistries demonstrated superior etch selectivity and profile performance compared with the CF4/O2 gas chemistries. When etching was performed using CF4/O2 chemistry, the SiGe etch rate decreased compared to that obtained with pure CF4. This reduction is attributed to surface oxidation induced by O2, which suppressed the etch rate. By using the roof to minimize ion collisions on the samples, higher selectivity and a better etch profile were obtained even with the CF4/N2 gas chemistries. Under high-N2-flow conditions, X-ray photoelectron spectroscopy revealed increased surface concentrations of GeFx species and confirmed the presence of Si–N bonds, which inhibited Si etching by fluorine radicals. A higher concentration of GeFx species enhanced SiGe layer etching, whereas Si–N bonds inhibited etching on the Si layer. The passivation of the Si layer and the promotion of adhesion of etching species such as F on the SiGe layer are crucial for highly selective etching in addition to etching with pure radicals. This study provides valuable insights into the mechanisms governing selective SiGe etching, offering practical guidance for optimizing fabrication processes of next-generation Si channel and complementary field-effect transistor (CFET) devices.
(This article belongs to the Section Materials Physics)

52 pages, 15058 KB  
Article
Optimizing Autonomous Vehicle Navigation Through Reinforcement Learning in Dynamic Urban Environments
by Mohammed Abdullah Alsuwaiket
World Electr. Veh. J. 2025, 16(8), 472; https://doi.org/10.3390/wevj16080472 - 18 Aug 2025
Viewed by 1244
Abstract
Autonomous vehicle (AV) navigation in dynamic urban environments faces challenges such as unpredictable traffic conditions, varying road user behaviors, and complex road networks. This study proposes a novel reinforcement learning-based framework that enhances AV decision making through spatial-temporal context awareness. The framework integrates Proximal Policy Optimization (PPO) and Graph Neural Networks (GNNs) to effectively model urban features like intersections, traffic density, and pedestrian zones. A key innovation is the urban context-aware reward mechanism (UCARM), which dynamically adapts the reward structure based on traffic rules, congestion levels, and safety considerations. Additionally, the framework incorporates a Dynamic Risk Assessment Module (DRAM), which uses Bayesian inference combined with Markov Decision Processes (MDPs) to proactively evaluate collision risks and guide safer navigation. The framework’s performance was validated across three datasets—Argoverse, nuScenes, and CARLA. Results demonstrate significant improvements: an average travel time of 420 ± 20 s, a collision rate of 3.1%, and energy consumption of 11,833 ± 550 J in Argoverse; 410 ± 20 s, 2.5%, and 11,933 ± 450 J in nuScenes; and 450 ± 25 s, 3.6%, and 13,000 ± 600 J in CARLA. The proposed method achieved an average navigation success rate of 92.5%, consistently outperforming baseline models in safety, efficiency, and adaptability. These findings indicate the framework’s robustness and practical applicability for scalable AV deployment in real-world urban traffic conditions.
(This article belongs to the Special Issue Modeling for Intelligent Vehicles)

19 pages, 12556 KB  
Article
Energy Management for Microgrids with Hybrid Hydrogen-Battery Storage: A Reinforcement Learning Framework Integrated Multi-Objective Dynamic Regulation
by Yi Zheng, Jinhua Jia and Dou An
Processes 2025, 13(8), 2558; https://doi.org/10.3390/pr13082558 - 13 Aug 2025
Cited by 1 | Viewed by 1848
Abstract
The integration of renewable energy resources (RES) into microgrids (MGs) poses significant challenges due to the intermittent nature of generation and the increasing complexity of multi-energy scheduling. To enhance operational flexibility and reliability, this paper proposes an intelligent energy management system (EMS) for MGs incorporating a hybrid hydrogen-battery energy storage system (HHB-ESS). The system model jointly considers the complementary characteristics of short-term and long-term storage technologies. Three conflicting objectives are defined: economic cost (EC), system response stability (SRS), and battery life loss (BLO). To address the challenges of multi-objective trade-offs and heterogeneous storage coordination, a novel deep-reinforcement-learning (DRL) algorithm, termed MOATD3, is developed based on a dynamic reward adjustment mechanism (DRAM). Simulation results under various operational scenarios demonstrate that the proposed method significantly outperforms baseline methods, achieving a maximum improvement of 31.4% in SRS and a reduction of 46.7% in BLO.

17 pages, 3604 KB  
Article
Binary-Weighted Neural Networks Using FeRAM Array for Low-Power AI Computing
by Seung-Myeong Cho, Jaesung Lee, Hyejin Jo, Dai Yun, Jihwan Moon and Kyeong-Sik Min
Nanomaterials 2025, 15(15), 1166; https://doi.org/10.3390/nano15151166 - 28 Jul 2025
Cited by 1 | Viewed by 1060
Abstract
Artificial intelligence (AI) has become ubiquitous in modern computing systems, from high-performance data centers to resource-constrained edge devices. As AI applications continue to expand into mobile and IoT domains, the need for energy-efficient neural network implementations has become increasingly critical. To meet this requirement of energy-efficient computing, this work presents a BWNN (binary-weighted neural network) architecture implemented using FeRAM (Ferroelectric RAM)-based synaptic arrays. By leveraging the non-volatile nature and low-power computing of FeRAM-based CIM (computing in memory), the proposed CIM architecture achieves significant reductions in both dynamic and standby power consumption. Simulation results in this paper demonstrate that scaling the ferroelectric capacitor size can reduce dynamic power by up to 6.5%, while eliminating DRAM-like refresh cycles allows standby power to drop by over 258× under typical conditions. Furthermore, the combination of binary weight quantization and in-memory computing enables energy-efficient inference without significant loss in recognition accuracy, as validated using MNIST datasets. Compared to prior CIM architectures of SRAM-CIM, DRAM-CIM, and STT-MRAM-CIM, the proposed FeRAM-CIM exhibits superior energy efficiency, achieving 230–580 TOPS/W in a 45 nm process. These results highlight the potential of FeRAM-based BWNNs as a compelling solution for edge-AI and IoT applications where energy constraints are critical.
(This article belongs to the Special Issue Neuromorphic Devices: Materials, Structures and Bionic Applications)
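Binary weight quantization, the scheme the abstract pairs with in-memory computing, can be summarized in a few lines: weights collapse to signs plus a per-neuron scale factor, so multiplications reduce to sign flips and additions. The sketch below models the arithmetic only; it is a generic software illustration of the standard BWNN formulation, not the paper's FeRAM array.

```python
# Generic software model of binary-weight inference: quantize real-valued
# weights to {-1, +1} with a per-neuron scale (mean absolute weight).

def binarize(weights):
    """Return (sign_weights, scale) for one neuron's weight vector."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale

def bwnn_neuron(x, weights):
    """Dot product with binarized weights; multiplies become sign flips."""
    signs, scale = binarize(weights)
    return scale * sum(xi if s > 0 else -xi for xi, s in zip(x, signs))
```

In a CIM array the sign flips and summation happen along the bitlines; here they are ordinary Python arithmetic.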

19 pages, 8345 KB  
Article
A Generalized Optimization Scheme for Memory-Side Prefetching to Enhance System Performance
by Yuzhi Zhuang, Ming Zhang and Binghao Wang
Electronics 2025, 14(14), 2811; https://doi.org/10.3390/electronics14142811 - 12 Jul 2025
Viewed by 849
Abstract
In modern multi-core processors, memory request latency critically constrains overall performance. Prefetching, a promising technique, mitigates memory access latency by pre-loading data into faster cache structures. However, existing core-side prefetchers lack visibility into the DRAM state and may issue suboptimal requests, while conventional memory-side prefetchers often default to simple next-line policies that miss complex access patterns. We propose a comprehensive memory-side prefetch optimization scheme, which includes a prefetcher that utilizes advanced prefetching algorithms and an optimization module. Our prefetcher is capable of detecting more complex memory access patterns, thereby improving both prefetch accuracy and coverage. Additionally, considering the characteristics of DRAM memory access, the optimization module minimizes the negative impact of prefetch requests on DRAM by enhancing coordination with memory operations. Simulation results using Gem5 and SPEC CPU2017 workloads show that our approach delivers an average performance improvement of 10.5% and reduces memory access latency by 61%. Our prefetcher also operates in conjunction with core-side prefetchers to form a multi-level prefetching hierarchy, enabling further performance gains through coordinated and complementary prefetching strategies.
(This article belongs to the Special Issue Computer Architecture & Parallel and Distributed Computing)

18 pages, 2290 KB  
Article
Improving MRAM Performance with Sparse Modulation and Hamming Error Correction
by Nam Le, Thien An Nguyen, Jong-Ho Lee and Jaejin Lee
Sensors 2025, 25(13), 4050; https://doi.org/10.3390/s25134050 - 29 Jun 2025
Viewed by 870
Abstract
With the rise of the Internet of Things (IoT), smart sensors are increasingly being deployed as compact edge processing units, necessitating continuously writable memory with low power consumption and fast access times. Magnetic random-access memory (MRAM) has emerged as a promising non-volatile alternative to conventional DRAM and SDRAM, offering advantages such as faster access speeds, reduced power consumption, and enhanced endurance. However, MRAM is subject to challenges including process variations and thermal fluctuations, which can induce random bit errors and result in imbalanced probabilities of 0 and 1 bits. To address these issues, we propose a novel sparse coding scheme characterized by a minimum Hamming distance of three. During the encoding process, three check bits are appended to the user data and processed using a generator matrix. If the resulting codeword fails to satisfy the sparsity constraint, it is inverted to comply with the coding requirement. This method exploits the error characteristics inherent in MRAM to facilitate effective error correction. Furthermore, we introduce a dynamic threshold detection technique that updates bit probability estimates in real time during data transmission. Simulation results demonstrate substantial improvements in both error resilience and decoding accuracy, particularly as MRAM density increases.
(This article belongs to the Section Electronic Sensors)

22 pages, 1159 KB  
Article
Compaction-Aware Flash Memory Remapping for Key–Value Stores
by Jialin Wang, Zhen Yang, Yi Fan and Yajuan Du
Micromachines 2025, 16(6), 699; https://doi.org/10.3390/mi16060699 - 11 Jun 2025
Viewed by 1642
Abstract
With the rapid development of big data and artificial intelligence, the demand for memory has exploded. As a key data structure in modern databases and distributed storage systems, the Log-Structured Merge Tree (LSM-tree) has been widely employed (such as LevelDB, RocksDB, etc.) in systems based on key–value pairs due to its efficient writing performance. In LSM-tree-based KV stores, typically deployed on systems with DRAM-SSD storage, the KV items are first organized into a MemTable, an in-memory buffer for SSTables. When the buffer size exceeds a threshold, the MemTable is flushed to the SSD and reorganized into an SSTable, which is then passed down level by level through compaction. However, the compaction degrades write performance and SSD endurance due to significant write amplification. To address this issue, recent proposals have mostly focused on redesigning the structure of LSM trees. We discover the prevalence of unchanged data blocks (UDBs) in the LSM-tree compaction process, i.e., blocks that are written back to the SSD identical to how they were read into memory, which induces extra write amplification and degrades I/O performance. In this paper, we propose a KV store design in SSD, called RemapCom, to exploit remapping on these UDBs. RemapCom first identifies UDBs with a lightweight state machine integrated into the compaction merge process. To increase the ratio of UDBs, RemapCom also includes a UDB retention method that extends the benefit of remapping. Moreover, we implement a prototype of RemapCom on LevelDB by providing two primitives for the remapping. Compared to the state of the art, the evaluation results demonstrate that RemapCom can reduce write amplification by up to 53% and improve write throughput by up to 30%.
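A toy model of UDB identification: during a compaction merge, a block from one input run whose key range receives no overlapping entries from the other run comes out of the merge byte-identical, so it can be remapped instead of rewritten. The block layout and overlap test below are simplified assumptions for illustration, not RemapCom's actual state machine.

```python
# Simplified detection of unchanged data blocks (UDBs) in a two-run merge:
# a block is unchanged if no key from the other run falls inside its range.

def find_udbs(lower_blocks, upper_keys):
    """lower_blocks: list of sorted key lists, one per data block.
    upper_keys: keys being merged in from the upper level.
    Returns indices of blocks the merge would leave unchanged."""
    udbs = []
    for i, block in enumerate(lower_blocks):
        lo, hi = block[0], block[-1]
        if not any(lo <= k <= hi for k in upper_keys):
            udbs.append(i)       # nothing merges into this block: remap it
    return udbs
```

Remapping such blocks updates flash mapping metadata instead of rewriting the data, which is where the write-amplification savings come from.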

14 pages, 3791 KB  
Article
Deposition of HfO2 by Remote Plasma ALD for High-Aspect-Ratio Trench Capacitors in DRAM
by Jiwon Kim, Inkook Hwang, Byungwook Kim, Wookyung Lee, Juha Song, Yeonwoong Jung and Changbun Yoon
Nanomaterials 2025, 15(11), 783; https://doi.org/10.3390/nano15110783 - 23 May 2025
Cited by 1 | Viewed by 2868
Abstract
Dynamic random-access memory (DRAM) is a vital component in modern computing systems. Enhancing memory performance requires maximizing capacitor capacitance within DRAM cells, which is achieved using high-k dielectric materials deposited as thin, uniform films via atomic layer deposition (ALD). Precise film deposition that minimizes electronic defects caused by charged vacancies is essential for reducing leakage current and ensuring high dielectric strength. In this study, we fabricated metal–insulator–metal (MIM) capacitors in high-aspect-ratio trench structures using remote plasma ALD (RP-ALD) and direct plasma ALD (DP-ALD). The trenches, etched into silicon, featured a 7:1 aspect ratio, 76 nm pitch, and 38 nm critical dimension. We evaluated the electrical characteristics of HfO2-based capacitors with TiN top and bottom electrodes, focusing on leakage current density and equivalent oxide thickness. Capacitance–voltage analysis and X-ray photoelectron spectroscopy (XPS) revealed that RP-ALD effectively suppressed plasma-induced damage, reducing defect density and leakage current. While DP-ALD offered excellent film properties, it suffered from degraded lateral uniformity due to direct plasma exposure. Given its superior lateral uniformity, lower leakage, and defect suppression, RP-ALD shows strong potential for improving DRAM capacitor performance and serves as a promising alternative to the currently adopted thermal ALD process.

6 pages, 1831 KB  
Proceeding Paper
Voltage Regulation of Data Strobe Inputs in Mobile Dynamic Random Access Memory to Prevent Unintended Activations
by Yao-Zhong Zhang, Chiung-An Chen, Powen Hsiao, Bo-Yi Li and Van-Khang Nguyen
Eng. Proc. 2025, 92(1), 81; https://doi.org/10.3390/engproc2025092081 - 23 May 2025
Viewed by 376
Abstract
In mobile dynamic random access memory (DRAM) receivers, the data strobe complement (DQS_c) and data strobe true (DQS_t) signals must be maintained at high and low voltage levels in the write data strobe off (WDQS_OFF) mode. Therefore, we developed a voltage regulation circuit to optimize the differential voltage signals of DQS_c and DQS_t, ensuring a high voltage level above 0.9 V and a low voltage level below 0.3 V. Experimental results showed that the circuit stably maintained DQS_c above 0.9 V and DQS_t below 0.3 V before the write preamble time (tWPRE) and in WDQS_OFF mode. This configuration effectively prevents unintended activation in the mobile DRAM DQS input receiver.
(This article belongs to the Proceedings of 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering)

32 pages, 911 KB  
Article
TB-Collect: Efficient Garbage Collection for Non-Volatile Memory Online Transaction Processing Engines
by Jianhao Wei, Qian Zhang, Yiwen Xiang and Xueqing Gong
Electronics 2025, 14(10), 2080; https://doi.org/10.3390/electronics14102080 - 21 May 2025
Viewed by 592
Abstract
Existing databases supporting Online Transaction Processing (OLTP) workloads based on non-volatile memory (NVM) almost all use the Multi-Version Concurrency Control (MVCC) protocol to ensure data consistency. MVCC allows multiple transactions to execute concurrently without lock conflicts, reducing the wait time between read and write operations, and thereby significantly increasing the throughput of NVM OLTP engines. However, it requires garbage collection (GC) to clean up the obsolete tuple versions to prevent storage overflow, which consumes additional system resources. Furthermore, existing GC approaches in NVM OLTP engines are inefficient because they are based on methods designed for dynamic random access memory (DRAM) OLTP engines, without considering the significant differences in read/write bandwidth and cache line size between NVM and DRAM. These approaches either involve excessive random NVM access (traversing tuple versions) or lead to too many additional NVM write operations, both of which degrade the performance and durability of NVM. In this paper, we propose TB-Collect, a high-performance GC approach specifically designed for NVM OLTP engines. On the one hand, TB-Collect separates tuple headers and contents, storing data in an append-only manner, which greatly reduces NVM writes. On the other hand, TB-Collect performs GC at the block level, eliminating the need to traverse tuple versions and improving the utilization of reclaimed space. We have implemented TB-Collect on DBx1000 and MySQL. Experimental results show that TB-Collect achieves 1.15 to 1.58 times the throughput of existing methods when running TPCC and YCSB workloads.
(This article belongs to the Section Computer Science & Engineering)
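Block-level GC as described above can be caricatured in a few lines: rather than traversing version chains, the collector reclaims a whole block once every tuple version it holds is invisible to all active transactions. The header layout and timestamp convention below are illustrative assumptions, not the structures of DBx1000 or TB-Collect itself.

```python
# Toy model of block-level garbage collection under MVCC: each block holds
# (begin_ts, end_ts) headers; end_ts is None while a version is still current.

def reclaimable_blocks(blocks, min_active_ts):
    """blocks: per-block lists of (begin_ts, end_ts) version headers.
    min_active_ts: oldest timestamp any active transaction might read.
    A block is reclaimed only when every version in it is obsolete."""
    return [i for i, block in enumerate(blocks)
            if all(end is not None and end < min_active_ts
                   for _begin, end in block)]
```

The appeal of the block granularity is visible even here: the decision touches only the headers of each block, never the version chains of individual tuples.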

12 pages, 2096 KB  
Article
Low-Power-Management Engine: Driving DDR Towards Ultra-Efficient Operations
by Zhuorui Liu, Yan Li and Xiaoyang Zeng
Micromachines 2025, 16(5), 543; https://doi.org/10.3390/mi16050543 - 30 Apr 2025
Viewed by 633
Abstract
To address the performance and power concerns in Double-Data-Rate SDRAM (DDR) subsystems, this paper presents an innovative method for the DDR memory controller scheduler. This design aims to strike a balance between power consumption and performance for the DDR subsystem. Our approach entails a critical reassessment of established mechanisms and the introduction of a quasi-static arbitration protocol for the DDR’s low-power mode (LPM) transition processes. Central to our proposed DDR power-management framework is the Low-Power-Management Engine (LPME), complemented by a suite of statistical algorithms tailored for implementation within the architecture. Our research strategy encompasses real-time monitoring of the DDR subsystem’s operational states, traffic intervals, and Quality of Service (QoS) metrics. By dynamically fine-tuning the DDR subsystem’s power-management protocols to transition in and out of identical power modes, our method promises substantial enhancements in both energy efficiency and operational performance across a spectrum of practical scenarios. To substantiate the efficacy of our proposed design, an array of experiments was conducted. These rigorous tests evaluated the DDR subsystem’s performance and energy consumption under a diverse set of workloads and system configurations. The findings are compelling: the LPME-driven architecture delivers power savings of over 41% while limiting the latency increase to no more than 22% in a high-performance operational context.
(This article belongs to the Section E:Engineering and Technology)
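The LPM transition trade-off the abstract alludes to is commonly framed as a break-even decision: entering a deeper low-power mode only pays off if the predicted idle interval covers the mode's entry/exit cost. The mode names, power numbers, and thresholds below are generic DDR-style placeholders, not figures from the paper.

```python
# Illustrative break-even policy for DDR low-power mode (LPM) entry:
# pick the deepest mode whose break-even time fits the predicted idle window.
# All numbers are hypothetical placeholders.

MODES = [  # (name, power_mw, resume_latency_ns, break_even_ns), shallow -> deep
    ("active_idle",   350,   0,    0),
    ("power_down",    150,  20,  100),
    ("self_refresh",   30, 200, 1500),
]

def pick_mode(predicted_idle_ns):
    """Return the deepest mode whose break-even time the idle window covers."""
    best = MODES[0][0]
    for name, _power, _resume, break_even in MODES:
        if predicted_idle_ns >= break_even:
            best = name          # deeper mode still amortizes its cost
    return best
```

A statistics-driven engine like the LPME refines exactly this decision by predicting the idle interval from monitored traffic rather than assuming it is known.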
