Search Results (31)

Search Parameters:
Keywords = Memory-Centric Computing

34 pages, 2320 KB  
Article
Research on a Computing First Network Based on Deep Reinforcement Learning
by Qianwen Xu, Jingchao Wang, Shuangyin Ren, Zhongbo Li and Wei Gao
Electronics 2026, 15(3), 638; https://doi.org/10.3390/electronics15030638 - 2 Feb 2026
Abstract
The joint optimization of computing resources and network routing constitutes a central challenge in Computing First Networks (CFNs). However, existing research has predominantly focused on computation offloading decisions, whereas the cooperative optimization of computing power and network routing remains underexplored. Therefore, this study investigates the joint routing optimization problem within the CFN framework. We first propose a computing resource scheduling architecture for CFN, termed SICRSA, which integrates Software-Defined Networking (SDN) and Information-Centric Networking (ICN). Building upon this architecture, we further introduce an ICN-based hierarchical naming scheme for computing services, design a computing service request packet format that extends the IP header, and detail the corresponding service request identification process and workflow. Furthermore, we propose Computing-Aware Routing via Graph and Long-term Dependency Learning (CRGLD), a Graph Neural Network (GNN)- and Long Short-Term Memory (LSTM)-based routing optimization algorithm within the SICRSA framework, to address the computing-aware routing (CAR) problem. The algorithm incorporates a decision-making framework grounded in spatiotemporal feature learning, thereby enabling the joint and coordinated selection of computing nodes and transmission paths. Simulation experiments conducted on real-world network topologies demonstrate that CRGLD enhances both the quality of service and the intelligence of routing decisions in dynamic network environments. Moreover, CRGLD exhibits strong generalization capability when confronted with unfamiliar topologies and topological changes, effectively mitigating the poor generalization performance typical of traditional Deep Reinforcement Learning (DRL)-based routing models in dynamic settings. Full article
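The core mechanism, a GNN encoding the topology plus an LSTM capturing load history that together score where to place and route a service, can be sketched in a few lines of PyTorch. Everything below (layer sizes, the mean-aggregation GNN, the toy topology and features) is an assumed illustration rather than the authors' CRGLD implementation.

    import torch
    import torch.nn as nn

    class SpatioTemporalRouter(nn.Module):
        """Toy stand-in for a GNN+LSTM computing-aware routing policy."""
        def __init__(self, node_feats: int, hidden: int = 64):
            super().__init__()
            self.gnn_lin = nn.Linear(node_feats, hidden)     # per-node transform
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.score = nn.Linear(hidden, 1)                # routing score per node

        def forward(self, x_seq, adj):
            # x_seq: (T, N, F) node features over T snapshots; adj: (N, N) topology.
            deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
            h_seq = [torch.relu(self.gnn_lin((adj @ x) / deg + x)) for x in x_seq]
            h_seq = torch.stack(h_seq, dim=1)                # (N, T, hidden)
            out, _ = self.lstm(h_seq)                        # long-term load dynamics
            return self.score(out[:, -1]).squeeze(-1)        # one score per node

    # Toy usage: 6-node topology, 8 snapshots, 4 features (e.g., CPU, queue, delay, BW).
    adj = torch.eye(6)                                       # placeholder adjacency
    scores = SpatioTemporalRouter(node_feats=4)(torch.rand(8, 6, 4), adj)
    print(scores.shape)                                      # torch.Size([6])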

19 pages, 3742 KB  
Article
HBEVOcc: Height-Aware Bird’s-Eye-View Representation for 3D Occupancy Prediction from Multi-Camera Images
by Chuandong Lyu, Wenkai Li, Iman Yi Liao, Fengqian Ding, Han Liu and Hongchao Zhou
Sensors 2026, 26(3), 934; https://doi.org/10.3390/s26030934 - 1 Feb 2026
Viewed by 60
Abstract
Due to the ability to perceive fine-grained 3D scenes and recognize objects of arbitrary shapes, 3D occupancy prediction plays a crucial role in vision-centric autonomous driving and robotics. However, most existing approaches rely on voxel-based representations, which inevitably demand a large amount of memory and computing resources. To address this challenge and facilitate more efficient 3D occupancy prediction, we propose HBEVOcc, a Bird’s-Eye-View (BEV)-based method for 3D scene representation with a novel height-aware deformable attention module, which can effectively leverage latent height information within the BEV framework to compensate for the missing height dimension, significantly reducing computing resource consumption while enhancing performance. Specifically, our method first extracts multi-camera image features and lifts these 2D features into 3D BEV occupancy features via explicit and implicit view transformations. The BEV features are then further processed by a BEV feature extraction network and height-aware deformable attention module, with the final 3D occupancy prediction results obtained through a prediction head. To further enhance voxel supervision along the height axis, we introduce a height-aware voxel loss with adaptive vertical weighting. Extensive experiments on the Occ3D-nuScenes and OpenOcc datasets demonstrate that HBEVOcc can achieve state-of-the-art results in terms of both mIoU and RayIoU metrics with less training memory (even when trained on a 2080Ti). Full article
(This article belongs to the Section Sensing and Imaging)
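Of the components described, the height-aware voxel loss with adaptive vertical weighting is the easiest to illustrate. The sketch below is one assumed reading of that idea; the tensor shapes and the linear weighting schedule are placeholders, not the paper's code.

    import torch
    import torch.nn.functional as F

    def height_aware_voxel_loss(logits, target, height_weights):
        # logits: (B, C, X, Y, Z) class scores; target: (B, X, Y, Z) voxel labels;
        # height_weights: (Z,) weight per vertical slice, so sparse upper layers
        # are not drowned out by the dominant ground plane.
        per_voxel = F.cross_entropy(logits, target, reduction="none")   # (B, X, Y, Z)
        return (per_voxel * height_weights.view(1, 1, 1, -1)).mean()

    # Toy usage: 2 classes on a 4x4x8 grid, weight rising linearly with height.
    logits = torch.randn(1, 2, 4, 4, 8)
    target = torch.randint(0, 2, (1, 4, 4, 8))
    print(height_aware_voxel_loss(logits, target, torch.linspace(1.0, 2.0, 8)))
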
30 pages, 1176 KB  
Article
Towards Secure and Adaptive AI Hardware: A Framework for Optimizing LLM-Oriented Architectures
by Sabya Shtaiwi and Dheya Mustafa
Computers 2026, 15(1), 10; https://doi.org/10.3390/computers15010010 - 25 Dec 2025
Viewed by 963
Abstract
With the increasing computational demands of large language models (LLMs), there is a pressing need for more specialized hardware architectures capable of supporting their dynamic and memory-intensive workloads. This paper examines recent studies on hardware acceleration for AI, focusing on three critical aspects: energy efficiency, architectural adaptability, and runtime security. While notable advancements have been made in accelerating convolutional and deep neural networks using ASICs, FPGAs, and compute-in-memory (CIM) approaches, most existing solutions remain inadequate for the scalability and security requirements of LLMs. Our comparative analysis highlights two key limitations: restricted reconfigurability and insufficient support for real-time threat detection. To address these gaps, we propose a novel architectural framework grounded in modular adaptivity, memory-centric processing, and security-by-design principles. The paper concludes with a proposed evaluation roadmap and outlines promising future research directions, including RISC-V-based secure accelerators, neuromorphic co-processors, and hybrid quantum-AI integration. Full article

23 pages, 3559 KB  
Article
From Static Prediction to Mindful Machines: A Paradigm Shift in Distributed AI Systems
by Rao Mikkilineni and W. Patrick Kelly
Computers 2025, 14(12), 541; https://doi.org/10.3390/computers14120541 - 10 Dec 2025
Viewed by 1158
Abstract
A special class of complex adaptive systems—biological and social—thrives not by passively accumulating patterns, but by engineering coherence, i.e., the deliberate alignment of prior knowledge, real-time updates, and teleonomic purposes. By contrast, today’s AI stacks—Large Language Models (LLMs) wrapped in agentic toolchains—remain rooted in a Turing-paradigm architecture: statistical world models (opaque weights) bolted onto brittle, imperative workflows. They excel at pattern completion, but they externalize governance, memory, and purpose, thereby accumulating coherence debt—a structural fragility manifested as hallucinations, shallow and siloed memory, ad hoc guardrails, and costly human oversight. The shortcoming of current AI relative to human-like intelligence is therefore less about raw performance or scaling, and more about an architectural limitation: knowledge is treated as an after-the-fact annotation on computation, rather than as an organizing substrate that shapes computation. This paper introduces Mindful Machines, a computational paradigm that operationalizes coherence as an architectural property rather than an emergent afterthought. A Mindful Machine is specified by a Digital Genome (encoding purposes, constraints, and knowledge structures) and orchestrated by an Autopoietic and Meta-Cognitive Operating System (AMOS) that runs a continuous Discover–Reflect–Apply–Share (D-R-A-S) loop. Instead of a static model embedded in a one-shot ML pipeline or deep learning neural network, the architecture separates (1) a structural knowledge layer (Digital Genome and knowledge graphs), (2) an autopoietic control plane (health checks, rollback, and self-repair), and (3) meta-cognitive governance (critique-then-commit gates, audit trails, and policy enforcement). We validate this approach on the classic Credit Default Prediction problem by comparing a traditional, static Logistic Regression pipeline (monolithic training, fixed features, external scripting for deployment) with a distributed Mindful Machine implementation whose components can reconfigure logic, update rules, and migrate workloads at runtime. The Mindful Machine not only matches the baseline on the predictive task, but also achieves autopoiesis (self-healing services and live schema evolution), explainability (causal, event-driven audit trails), and dynamic adaptation (real-time logic and threshold switching driven by knowledge constraints), thereby reducing the coherence debt that characterizes contemporary ML- and LLM-centric AI architectures. The case study demonstrates “a hybrid, runtime-switchable combination of machine learning and rule-based simulation, orchestrated by AMOS under knowledge and policy constraints”. Full article
(This article belongs to the Special Issue Cloud Computing and Big Data Mining)
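The D-R-A-S loop and its critique-then-commit gate can be conveyed with a short sketch. The class names, constraint keys, and toy credit-default numbers below are illustrative assumptions, not the AMOS implementation.

    from dataclasses import dataclass, field

    @dataclass
    class DigitalGenome:
        purpose: str
        constraints: dict                 # e.g., {"max_default_rate": 0.05}
        knowledge: dict = field(default_factory=dict)

    class MindfulLoop:
        def __init__(self, genome: DigitalGenome):
            self.genome = genome
            self.audit_trail = []

        def discover(self, events):       # ingest runtime observations
            return {"observed_rate": sum(events) / max(len(events), 1)}

        def reflect(self, finding):       # critique-then-commit gate against constraints
            return finding["observed_rate"] <= self.genome.constraints["max_default_rate"]

        def apply(self, finding, ok):     # only commit updates that stay coherent
            if ok:
                self.genome.knowledge.update(finding)
            self.audit_trail.append((finding, ok))   # causal, event-driven audit

        def share(self):                  # expose accepted knowledge to peer services
            return self.genome.knowledge

    loop = MindfulLoop(DigitalGenome("limit credit risk", {"max_default_rate": 0.05}))
    finding = loop.discover([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])   # one default in ten loans
    loop.apply(finding, loop.reflect(finding))
    print(loop.share(), loop.audit_trail)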

28 pages, 3812 KB  
Article
Vertical vs. Horizontal Integration in HBM and Market-Implied Valuation: A Text-Mining Study
by Hyang Ja Yang and Cheong Kim
Appl. Sci. 2025, 15(22), 12127; https://doi.org/10.3390/app152212127 - 15 Nov 2025
Viewed by 1610
Abstract
High-bandwidth memory (HBM) has become a strategic bottleneck in AI-centric systems, shifting competitive advantage from computing power alone to a design that is orchestrated by memory and packaging. We investigate whether publicly available information about companies’ integration decisions—vertical integration by Samsung Electronics and horizontal partnerships by SK Hynix—is reflected in market valuations. We create a Korean-language news corpus spanning January 2023 to September 2025 and use seed-guided topic models to measure the intensity of firms’ vertical and horizontal integration. We verify qualitative distinguishability with t-SNE embeddings and use firm-specific ordinary least squares specifications to link topic intensities to equity prices. The findings show that, for Samsung, consolidation-oriented vertical indicators (M&A and risk ring-fencing) positively correlate with valuation, whereas supplier-enablement or operational vertical topics are not reliably factored into its valuation. Vendor-assisted scale-up and joint development topics support positive valuation for SK Hynix. This study provides a scalable framework for text evaluation, which distinguishes between general sentiment and strategic architecture, as well as evidence that capital markets reward consolidation and alliance execution differently depending on the management of the HBM bottleneck. Full article
(This article belongs to the Special Issue Big Data Technology and Its Applications)
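The measurement pipeline (seed-guided topic intensities from news text, then a per-firm OLS linking intensities to prices) can be sketched as follows; the seed words, documents, and prices are invented placeholders rather than the study's Korean-language corpus.

    import numpy as np

    VERTICAL_SEEDS = {"acquisition", "in-house", "packaging", "consolidation"}
    HORIZONTAL_SEEDS = {"partnership", "alliance", "joint", "supplier"}

    def topic_intensity(doc: str, seeds: set) -> float:
        # Share of tokens that hit the seed list: a crude seed-guided topic score.
        tokens = doc.lower().split()
        return sum(t in seeds for t in tokens) / max(len(tokens), 1)

    docs = ["Samsung pursues in-house advanced packaging consolidation",
            "SK Hynix expands joint development partnership with supplier"]
    X = np.array([[topic_intensity(d, VERTICAL_SEEDS),
                   topic_intensity(d, HORIZONTAL_SEEDS), 1.0] for d in docs])
    prices = np.array([70000.0, 180000.0])        # placeholder closing prices (KRW)

    # OLS: price ~ vertical intensity + horizontal intensity + intercept.
    beta, *_ = np.linalg.lstsq(X, prices, rcond=None)
    print(dict(zip(["vertical", "horizontal", "intercept"], beta)))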

18 pages, 1906 KB  
Article
Generalizable Interaction Recognition for Learning from Demonstration Using Wrist and Object Trajectories
by Jagannatha Charjee Pyaraka, Mats Isaksson, John McCormick, Sheila Sutjipto and Fouad Sukkar
Electronics 2025, 14(21), 4297; https://doi.org/10.3390/electronics14214297 - 31 Oct 2025
Viewed by 742
Abstract
Learning from Demonstration (LfD) enables robots to acquire manipulation skills by observing human actions. However, existing methods often face challenges such as high computational cost, limited generalizability, and a loss of key interaction details. This study presents a compact representation for interaction recognition in LfD that encodes human–object interactions using 2D wrist trajectories and 3D object poses. A lightweight extraction pipeline combines MediaPipe-based wrist tracking with FoundationPose-based 6-DoF object estimation to obtain these trajectories directly from RGB-D video without specialized sensors or heavy preprocessing. Experiments on the GRAB and FPHA datasets show that the representation effectively captures task-relevant interactions, achieving 94.6% accuracy on GRAB and 96.0% on FPHA with well-calibrated probability predictions. Both Bidirectional Long Short-Term Memory (Bi-LSTM) with attention and Transformer architectures deliver consistent performance, confirming robustness and generalizability. The method achieves sub-second inference, a memory footprint under 1 GB, and reliable operation on both GPU and CPU platforms, enabling deployment on edge devices such as NVIDIA Jetson. By bridging pose-based and object-centric paradigms, this approach offers a compact and efficient foundation for scalable robot learning while preserving essential spatiotemporal dynamics. Full article
(This article belongs to the Section Artificial Intelligence)
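The compact representation reduces each frame to a short feature vector fed to a sequence classifier. The Bi-LSTM-with-attention sketch below uses assumed dimensions (a 2D wrist point plus a 7-value object pose) rather than the paper's exact configuration.

    import torch
    import torch.nn as nn

    class InteractionClassifier(nn.Module):
        def __init__(self, in_dim=2 + 7, hidden=128, n_classes=10):
            super().__init__()  # 7 = 3D translation + unit quaternion for the object pose
            self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)
            self.head = nn.Linear(2 * hidden, n_classes)

        def forward(self, traj):                        # traj: (B, T, 9)
            h, _ = self.lstm(traj)                      # (B, T, 2*hidden)
            alpha = torch.softmax(self.attn(h), dim=1)  # attention over time steps
            context = (alpha * h).sum(dim=1)            # weighted temporal summary
            return self.head(context)                   # interaction class logits

    logits = InteractionClassifier()(torch.rand(4, 60, 9))   # 4 clips of 60 frames
    print(logits.shape)                                      # torch.Size([4, 10])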

45 pages, 10628 KB  
Review
Driving for More Moore on Computing Devices with Advanced Non-Volatile Memory Technology
by Hei Wong, Weidong Li, Jieqiong Zhang, Wenhan Bao, Lichao Wu and Jun Liu
Electronics 2025, 14(17), 3456; https://doi.org/10.3390/electronics14173456 - 29 Aug 2025
Cited by 2 | Viewed by 3708
Abstract
As CMOS technology approaches its physical and economic limits, further advancement of Moore’s Law for enhanced computing performance can no longer rely solely on smaller transistors and higher integration density. Instead, the computing landscape is poised for a fundamental transformation that transcends hardware scaling to embrace innovations in architecture, software, application-specific algorithms, and cross-disciplinary integration. Among the most promising enablers of this transition is non-volatile memory (NVM), which provides new technological pathways for restructuring the future of computing systems. Recent advancements in NVM technologies, such as flash memory, Resistive Random-Access Memory (RRAM), and magneto-resistive RAM (MRAM), have significantly narrowed longstanding performance gaps while introducing transformative capabilities, including instant-on functionality, ultra-low standby power, and persistent data retention. These characteristics pave the way for developing more energy-efficient computing systems, heterogeneous memory hierarchies, and novel computational paradigms, such as in-memory and neuromorphic computing. Beyond isolated hardware improvements, integrating NVM at both the architectural and algorithmic levels would foster the emergence of intelligent computing platforms that transcend the limitations of traditional von Neumann architectures and device scaling. Driven by these advances, next-generation computing platforms powered by NVM are expected to deliver substantial gains in computational performance, energy efficiency, and scalability for emerging data-centric architectures. These improvements align with the broader vision of both “More Moore” and “More than Moore”—extending beyond MOS device miniaturization to encompass architectural and functional innovation that redefines how performance is achieved at the end of CMOS device downsizing. Full article
(This article belongs to the Section Microelectronics)

26 pages, 642 KB  
Article
Bayesian Input Compression for Edge Intelligence in Industry 4.0
by Handuo Zhang, Jun Guo, Xiaoxiao Wang and Bin Zhang
Electronics 2025, 14(17), 3416; https://doi.org/10.3390/electronics14173416 - 27 Aug 2025
Viewed by 692
Abstract
In Industry 4.0 environments, edge intelligence plays a critical role in enabling real-time analytics and autonomous decision-making by integrating artificial intelligence (AI) with edge computing. However, deploying deep neural networks (DNNs) on resource-constrained edge devices remains challenging due to limited computational capacity and strict latency requirements. While conventional methods primarily focus on structural model compression, we propose an adaptive input-centric approach that reduces computational overhead by pruning redundant features prior to inference. A Bayesian network is employed to quantify the influence of each input feature on the model output, enabling efficient input reduction without modifying the model architecture. A bidirectional chain structure facilitates robust feature ranking, and an automated algorithm optimizes input selection to meet predefined constraints on model accuracy and size. Experimental results demonstrate that the proposed method significantly reduces memory usage and computation cost while maintaining competitive performance, making it highly suitable for real-time edge intelligence in industrial settings. Full article
(This article belongs to the Special Issue Intelligent Cloud–Edge Computing Continuum for Industry 4.0)
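The input-centric idea can be illustrated as rank-then-prune. The sketch below substitutes mutual information for the paper's Bayesian-network influence scores and uses a stock scikit-learn dataset, so it is a conceptual analogue rather than the proposed method.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    scores = mutual_info_classif(Xtr, ytr, random_state=0)   # per-feature influence proxy
    min_accuracy = 0.90                                      # predefined accuracy constraint
    keep = np.ones(X.shape[1], dtype=bool)

    for idx in np.argsort(scores)[:-1]:                      # prune weakest inputs first
        keep[idx] = False
        clf = LogisticRegression(max_iter=2000).fit(Xtr[:, keep], ytr)
        if clf.score(Xte[:, keep], yte) < min_accuracy:
            keep[idx] = True                                 # revert: constraint violated
            break
    print(f"kept {keep.sum()} of {X.shape[1]} input features")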

30 pages, 2417 KB  
Article
Hardware-Accelerated SMV Subscriber: Energy Quality Pre-Processed Metrics and Analysis
by Mihai-Alexandru Pisla, Bogdan-Adrian Enache, Vasilis Argyriou, Panagiotis Sarigiannidis and George-Calin Seritan
Electronics 2025, 14(16), 3297; https://doi.org/10.3390/electronics14163297 - 19 Aug 2025
Viewed by 589
Abstract
The paper presents an FPGA-based, hardware-accelerated IEC 61850-9-2 Sampled Measured Values (SMV) subscriber—termed the high-speed SMV subscriber (HS3)—by integrating real-time energy-quality (EQ) analytics directly into the subscriber pipeline while preserving a deterministic, microsecond-scale operation under high stream counts. Building on a prior hardware decoder that achieved sub-3 μs SMV parsing for up to 512 subscribed svIDs with modest logic utilization (<8%), the proposed design augments the pipeline with fixed-point RTL modules for single-bin DFT frequency estimation, windowed true-RMS computation, and per-sample active power evaluation, all operating in a streaming fashion with configurable windows and resolutions. A lightweight software layer performs only residual scalar combinations (e.g., apparent power, form factor) on pre-aggregated hardware outputs, thereby minimizing CPU load and memory traffic. The paper’s aim is to bridge the gap between software-centric analytics—common in toolkit-based deployments—and fixed-function commercial firmware, by delivering an open, modular architecture that co-locates SMV subscription and EQ pre-processing in the same hardware fabric. Implementation on an MPSoC platform demonstrates that integrating EQ analytics does not compromise the efficiency or accuracy of the primary decoding path and sustains the latency targets required for protection-and-control use cases, with accuracy consistent with offline references across representative test waveforms. In contrast to existing solutions that either compute PQ metrics post-capture in software or offer limited in-FPGA analytics, the main contributions lie in a cohesive, resource-efficient integration that exposes continuous, per-channel EQ metrics at microsecond granularity, together with an implementation-level characterization (latency, resource usage, and error against reference calculations) evidencing suitability for real-time substation automation. Full article
(This article belongs to the Section Circuit and Signal Processing)
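A floating-point software reference for the pre-processed metrics (single-bin DFT, windowed true RMS, per-sample active power, with apparent power and power factor combined afterwards) fits in a few lines; the sampling rate and synthetic waveform below are assumptions, not the fixed-point RTL.

    import numpy as np

    FS, F0, N = 4000, 50.0, 4000              # 4 kHz sampling, 50 Hz nominal, 1 s window
    t = np.arange(N) / FS
    v = 230 * np.sqrt(2) * np.sin(2 * np.pi * F0 * t)              # voltage [V]
    i = 10 * np.sqrt(2) * np.sin(2 * np.pi * F0 * t - np.pi / 6)   # current [A], 30 deg lag

    def single_bin_dft(x, f0=F0, fs=FS):
        # Complex amplitude at the nominal frequency only (no full FFT needed).
        n = np.arange(len(x))
        return 2 * np.sum(x * np.exp(-2j * np.pi * f0 * n / fs)) / len(x)

    def true_rms(x):
        return np.sqrt(np.mean(x ** 2))

    V, I = true_rms(v), true_rms(i)
    P = np.mean(v * i)                        # per-sample active power, window average
    S = V * I                                 # apparent power (software layer)
    print(f"|V1|={abs(single_bin_dft(v)):.1f} V peak  Vrms={V:.1f} V  Irms={I:.1f} A")
    print(f"P={P:.0f} W  S={S:.0f} VA  PF={P / S:.3f}")   # PF ~ cos(30 deg) = 0.866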

14 pages, 1648 KB  
Article
Memory-Efficient Feature Merging for Residual Connections with Layer-Centric Tile Fusion
by Hao Zhang, Jianheng He, Yupeng Gui, Shichen Peng, Leilei Huang, Xiao Yan and Yibo Fan
Electronics 2025, 14(16), 3269; https://doi.org/10.3390/electronics14163269 - 18 Aug 2025
Viewed by 811
Abstract
Convolutional neural networks (CNNs) have achieved remarkable success in computer vision tasks, driving the rapid development of hardware accelerators. However, memory efficiency remains a key challenge, as conventional accelerators adopt layer-by-layer processing, leading to frequent external memory accesses (EMAs) of intermediate feature data, which increase energy consumption and latency. While layer fusion has been proposed to enhance inter-layer feature reuse, existing approaches typically rely on fixed data management tailored to specific architectures, introducing on-chip memory overhead and requiring trade-offs with EMAs. Moreover, prevalent residual connections further weaken fusion benefits due to diverse data reuse distances. To address these challenges, we propose layer-centric tile fusion, which integrates residual data loading with feature merging by leveraging receptive field relationships among feature tiles. A reuse distance-aware caching strategy is introduced to support flexible storage for various data types. We also develop a modeling framework to analyze the trade-off between on-chip memory usage and EMA-induced energy-delay product (EDP). Experimental results demonstrate that our method achieves 5.04–43.44% EDP reduction and 20.28–58.33% memory usage reduction compared to state-of-the-art designs on ResNet-18 and SRGAN. Full article
(This article belongs to the Special Issue Research on Key Technologies for Hardware Acceleration)
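The receptive-field bookkeeping that drives tile fusion is easy to illustrate: producing an output tile requires a slightly larger input tile at every fused layer, and that halo is what must stay cached on chip. The layer shapes below are arbitrary examples, not the evaluated networks.

    def input_tile_size(out_tile, layers):
        """layers: (kernel, stride) pairs in forward order; walk backwards to the input."""
        size = out_tile
        for k, s in reversed(layers):
            size = (size - 1) * s + k
        return size

    layers = [(3, 1), (3, 1), (3, 2)]          # two fused 3x3/s1 convs, then a 3x3/s2 conv
    total_stride = 1
    for _, s in layers:
        total_stride *= s

    for out_tile in (8, 16, 32):
        need = input_tile_size(out_tile, layers)
        halo = need - out_tile * total_stride  # extra rows/cols shared with neighbour tiles
        print(f"{out_tile}x{out_tile} output tile needs {need}x{need} input (+{halo} halo)")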

23 pages, 3828 KB  
Article
SARAC4N: Socially and Resource-Aware Caching in Clustered Content-Centric Networks
by Amir Raza Khan, Umar Shoaib and Hannan Bin Liaqat
Future Internet 2025, 17(8), 341; https://doi.org/10.3390/fi17080341 - 29 Jul 2025
Viewed by 1649
Abstract
The Content-Centric Network (CCN) presents an alternative to the conventional TCP/IP network, where IP is fundamental for communication between the source and destination. Instead of relying on IP addresses, CCN emphasizes content to enable efficient data distribution through caching and delivery. The increasing demand for graphics-intensive applications requires minimal response times and optimized resource utilization. Therefore, the CCN plays a vital role due to its efficient architecture and content management approach. To reduce data retrieval delays in CCNs, traditional methods improve caching mechanisms through clustering. However, these methods do not address the optimal use of resources, including CPU, memory, storage, and available links, along with the incorporation of social awareness. This study proposes SARAC4N, a socially and resource-aware caching framework for clustered Content-Centric Networks that integrates dual-head clustering and popularity-driven content placement. It enhances caching efficiency, reduces retrieval delays, and improves resource utilization across heterogeneous network topologies. This approach helps resolve congestion issues while enhancing social awareness, lowering error rates, and ensuring efficient content delivery. The proposed framework enhances caching effectiveness by optimally utilizing resources and placing content with social awareness within the cluster. Furthermore, it improves data retrieval time, reduces computation and memory usage, minimizes data redundancy, optimizes network usage, and lowers storage requirements, all while maintaining a very low error rate. Full article
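Popularity-driven, resource-aware placement inside a cluster can be sketched as a greedy assignment. The node names, content sizes, and request counts below are invented for illustration and omit the social-awareness and dual-head clustering details.

    def plan_cache(contents, popularity, nodes):
        """contents: {name: size}; popularity: {name: requests};
           nodes: {node: {"free": capacity left, "load": 0..1 CPU/memory utilisation}}."""
        placement = {n: [] for n in nodes}
        for name in sorted(contents, key=popularity.get, reverse=True):
            # Prefer the least-loaded node that still has space (resource awareness).
            candidates = [n for n in nodes if nodes[n]["free"] >= contents[name]]
            if not candidates:
                continue
            best = min(candidates, key=lambda n: nodes[n]["load"])
            placement[best].append(name)
            nodes[best]["free"] -= contents[name]
        return placement

    nodes = {"head_A": {"free": 300, "load": 0.2}, "head_B": {"free": 200, "load": 0.7}}
    print(plan_cache({"v1": 150, "v2": 120, "v3": 100},
                     {"v1": 900, "v2": 500, "v3": 100}, nodes))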

19 pages, 1536 KB  
Article
A Study on Energy Consumption in AI-Driven Medical Image Segmentation
by R. Prajwal, S. J. Pawan, Shahin Nazarian, Nicholas Heller, Christopher J. Weight, Vinay Duddalwar and C.-C. Jay Kuo
J. Imaging 2025, 11(6), 174; https://doi.org/10.3390/jimaging11060174 - 26 May 2025
Cited by 1 | Viewed by 2522
Abstract
As artificial intelligence advances in medical image analysis, its environmental impact remains largely overlooked. This study analyzes the energy demands of AI workflows for medical image segmentation using the popular Kidney Tumor Segmentation-2019 (KiTS-19) dataset. It examines how training and inference differ in energy consumption, focusing on factors that influence resource usage, such as computational complexity, memory access, and I/O operations. To address these aspects, we evaluated three variants of convolution—Standard Convolution, Depthwise Convolution, and Group Convolution—combined with optimization techniques such as Mixed Precision and Gradient Accumulation. While training is energy-intensive, the recurring nature of inference often results in significantly higher cumulative energy consumption over a model’s life cycle. Depthwise Convolution with Mixed Precision achieves the lowest energy consumption during training while maintaining strong performance, making it the most energy-efficient configuration among those tested. In contrast, Group Convolution fails to achieve energy efficiency due to significant input/output overhead. These findings emphasize the need for GPU-centric strategies and energy-conscious AI practices, offering actionable guidance for designing scalable, sustainable innovation in medical image analysis. Full article
(This article belongs to the Special Issue Imaging in Healthcare: Progress and Challenges)
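The compute gap between the convolution variants studied follows directly from standard MAC-count formulas. The layer dimensions below are arbitrary, and MAC counts are only a proxy for the measured energy.

    def standard_conv_macs(h, w, c_in, c_out, k):
        return h * w * c_in * c_out * k * k

    def depthwise_separable_macs(h, w, c_in, c_out, k):
        depthwise = h * w * c_in * k * k        # one kxk filter per input channel
        pointwise = h * w * c_in * c_out        # 1x1 conv mixes the channels
        return depthwise + pointwise

    std = standard_conv_macs(128, 128, 64, 128, 3)
    dws = depthwise_separable_macs(128, 128, 64, 128, 3)
    print(f"standard: {std:,} MACs, depthwise separable: {dws:,} MACs "
          f"({std / dws:.1f}x fewer)")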

20 pages, 1735 KB  
Article
Efficient AI-Driven Query Optimization in Large-Scale Databases: A Reinforcement Learning and Graph-Based Approach
by Najla Sassi and Wassim Jaziri
Mathematics 2025, 13(11), 1700; https://doi.org/10.3390/math13111700 - 22 May 2025
Cited by 6 | Viewed by 3863
Abstract
As data-centric applications become increasingly complex, understanding effective query optimization in large-scale relational databases is crucial for managing this complexity. Yet, traditional cost-based and heuristic approaches simply do not scale, adapt, or remain accurate in highly dynamic multi-join queries. This research work proposes the reinforcement learning and graph-based hybrid query optimizer (GRQO), the first ever to apply reinforcement learning and graph theory for optimizing query execution plans, specifically in join order selection and cardinality estimation. By employing proximal policy optimization for adaptive policy learning and using graph-based schema representations for relational modeling, GRQO effectively traverses the combinatorial optimization space. Based on TPC-H (1 TB) and IMDB (500 GB) workloads, GRQO runs 25% faster in query execution time, scales 30% better, reduces CPU and memory use by 20–25%, and reduces the cardinality estimation error by 47% compared to traditional cost-based optimizers and machine learning-based optimizers. These findings highlight the ability of GRQO to optimize performance and resource efficiency in database management in cloud computing, data warehousing, and real-time analytics. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
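Framing join ordering as a reinforcement learning problem can be shown with a toy environment whose reward is the negative estimated intermediate-result size. The relations, cardinalities, and selectivities below are made up, and the PPO agent and graph schema encoder are omitted.

    class JoinOrderEnv:
        def __init__(self, card, join_selectivity):
            self.card = card                      # relation -> cardinality estimate
            self.sel = join_selectivity           # frozenset({a, b}) -> selectivity
            self.reset()

        def reset(self):
            self.joined, self.rows = set(), 1.0
            return frozenset()

        def step(self, rel):
            # Apply the selectivity of every join edge between `rel` and joined relations.
            factor = self.card[rel]
            for other in self.joined:
                factor *= self.sel.get(frozenset({rel, other}), 1.0)
            self.joined.add(rel)
            self.rows *= factor
            done = len(self.joined) == len(self.card)
            return frozenset(self.joined), -self.rows, done   # reward = -intermediate size

    env = JoinOrderEnv({"orders": 1e6, "customer": 1e5, "nation": 25},
                       {frozenset({"orders", "customer"}): 1e-5,
                        frozenset({"customer", "nation"}): 4e-2})
    for rel in ("nation", "customer", "orders"):              # one candidate plan
        state, reward, done = env.step(rel)
    print(env.rows)   # a PPO agent would learn to minimise this across many plans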

15 pages, 13605 KB  
Article
Dynamic Performance and Power Optimization with Heterogeneous Processing-in-Memory for AI Applications on Edge Devices
by Sangmin Jeon, Kangju Lee, Kyeongwon Lee and Woojoo Lee
Micromachines 2024, 15(10), 1222; https://doi.org/10.3390/mi15101222 - 30 Sep 2024
Cited by 5 | Viewed by 4120
Abstract
The rapid advancement of artificial intelligence (AI) technology, combined with the widespread proliferation of Internet of Things (IoT) devices, has significantly expanded the scope of AI applications, from data centers to edge devices. Running AI applications on edge devices requires a careful balance between data processing performance and energy efficiency. This challenge becomes even more critical when the computational load of applications dynamically changes over time, making it difficult to maintain optimal performance and energy efficiency simultaneously. To address these challenges, we propose a novel processing-in-memory (PIM) technology that dynamically optimizes performance and power consumption in response to real-time workload variations in AI applications. Our proposed solution consists of a new PIM architecture and an operational algorithm designed to maximize its effectiveness. The PIM architecture follows a well-established structure known for effectively handling data-centric tasks in AI applications. However, unlike conventional designs, it features a heterogeneous configuration of high-performance PIM (HP-PIM) modules and low-power PIM (LP-PIM) modules. This enables the system to dynamically adjust data processing based on varying computational load, optimizing energy efficiency according to the application’s workload demands. In addition, we present a data placement optimization algorithm to fully leverage the potential of the heterogeneous PIM architecture. This algorithm predicts changes in application workloads and optimally allocates data to the HP-PIM and LP-PIM modules, improving energy efficiency. To validate and evaluate the proposed technology, we implemented the PIM architecture and developed an embedded processor that integrates this architecture. We performed FPGA prototyping of the processor, and functional verification was successfully completed. Experimental results from running applications with varying workload demands on the prototype PIM processor demonstrate that the proposed technology achieves up to 29.54% energy savings. Full article
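The data placement idea (predict the next interval's access pattern, put the hottest data on HP-PIM and the rest on LP-PIM) can be shown with a greedy sketch. Tensor names, sizes, and access predictions below are illustrative, not the prototype's algorithm.

    def place_data(tensors, predicted_accesses, hp_capacity):
        """tensors: {name: size_bytes}; predicted_accesses: {name: accesses/interval}."""
        # Hottest-per-byte first: maximise accesses served from HP-PIM per byte used.
        ranked = sorted(tensors, key=lambda n: predicted_accesses[n] / tensors[n],
                        reverse=True)
        placement, used = {}, 0
        for name in ranked:
            if used + tensors[name] <= hp_capacity:
                placement[name] = "HP-PIM"
                used += tensors[name]
            else:
                placement[name] = "LP-PIM"        # cold or overflow data goes low power
        return placement

    tensors = {"weights_l1": 512_000, "weights_l2": 2_048_000, "activations": 256_000}
    predicted = {"weights_l1": 900, "weights_l2": 150, "activations": 4000}
    print(place_data(tensors, predicted, hp_capacity=1_000_000))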

21 pages, 8281 KB  
Article
Novel Low Power Cross-Coupled FET-Based Sense Amplifier Design for High-Speed SRAM Circuits
by G. Lakshmi Priya, Puneet Saran, Shikhar Kumar Padhy, Prateek Agarwal, A. Andrew Roobert and L. Jerart Julus
Micromachines 2023, 14(3), 581; https://doi.org/10.3390/mi14030581 - 28 Feb 2023
Cited by 11 | Viewed by 7594
Abstract
We live in a technologically advanced society where we all use semiconductor chips in the majority of our gadgets, and the basic criteria for data storage and memory are a small footprint and low power consumption. SRAM is a very important part of this and can be used to meet all the above criteria. In this study, LTSpice software is used to design a high-performance sense amplifier circuit for low-power SRAM applications. Throughout this research, various power reduction approaches were explored, and the optimal solution was implemented in our own modified SRAM design. In this article, the power consumption and response time of the proposed sense amplifier were also examined by adjusting the width-to-length (W/L) ratio of the transistors, the power supply, and the technology node. The exact power consumption and transistor count of each approach are also provided to help identify the ideal technique. Our proposed design of a low-power sense amplifier has shown promising results, and we employ three variations of VLSI power reduction techniques to improve efficiency. Low-power SRAMs are central to future memory-centric neuromorphic computing applications. Full article
(This article belongs to the Special Issue Recent Advances in CMOS Devices and Applications)