Search Results (4)

Search Parameters:
Keywords = CPU front-end optimization

19 pages, 38545 KB  
Article
Improving Dynamic Visual SLAM in Robotic Environments via Angle-Based Optical Flow Analysis
by Sedat Dikici and Fikret Arı
Electronics 2026, 15(1), 223; https://doi.org/10.3390/electronics15010223 - 3 Jan 2026
Viewed by 315
Abstract
Dynamic objects present a major challenge for visual simultaneous localization and mapping (Visual SLAM), as feature measurements originating from moving regions can corrupt camera pose estimation and lead to inaccurate maps. In this paper, we propose a lightweight, semantic-free front-end enhancement for ORB-SLAM that detects and suppresses dynamic features using optical flow geometry. The key idea is to estimate a global motion direction point (MDP) from optical flow vectors and to classify feature points based on their angular consistency with the camera-induced motion field. Unlike magnitude-based flow filtering, the proposed strategy exploits the geometric consistency of optical flow with respect to a motion direction point, providing robustness not only to depth variation and camera speed changes but also to different camera motion patterns, including pure translation and pure rotation. The method is integrated into the ORB-SLAM front-end without modifying the back-end optimization or cost function. Experiments on public dynamic-scene datasets demonstrate that the proposed approach reduces absolute trajectory error by up to approximately 45% compared to baseline ORB-SLAM, while maintaining real-time performance on a CPU-only platform. These results indicate that reliable dynamic feature suppression can be achieved without semantic priors or deep learning models.
(This article belongs to the Section Computer Science & Engineering)

26 pages, 1958 KB  
Article
Real-Time Heartbeat Classification on Distributed Edge Devices: A Performance and Resource Utilization Study
by Eko Sakti Pramukantoro, Kasyful Amron, Putri Annisa Kamila and Viera Wardhani
Sensors 2025, 25(19), 6116; https://doi.org/10.3390/s25196116 - 3 Oct 2025
Viewed by 910
Abstract
Early detection is crucial for preventing heart disease. Advances in health technology, particularly wearable devices for automated heartbeat detection and machine learning, can enhance early diagnosis efforts. However, previous studies on heartbeat classification inference systems have primarily relied on batch processing, which introduces delays. To address this limitation, a real-time system utilizing stream processing with a distributed computing architecture is needed for continuous, immediate, and scalable data analysis. Real-time ECG inference is particularly crucial for immediate heartbeat classification, as human heartbeats occur with durations between 0.6 and 1 s, requiring inference times significantly below this threshold for effective real-time processing. This study implements a real-time heartbeat classification inference system using distributed stream processing with LSTM-512, LSTM-256, and FCN models, incorporating RR-interval, morphology, and wavelet features. The system is developed as a distributed web-based application using the Flask framework with distributed backend processing, integrating Polar H10 sensors via Bluetooth and Web Bluetooth API in JavaScript. The implementation consists of a frontend interface, distributed backend services, and coordinated inference processing. The frontend handles sensor pairing and manages real-time streaming for continuous ECG data transmission. The backend processes incoming ECG streams, performing preprocessing and model inference. Performance evaluations demonstrate that LSTM-based heartbeat classification can achieve real-time performance on distributed edge devices by carefully selecting features and models. Wavelet-based features with an LSTM-Sequential architecture deliver optimal results, achieving 99% accuracy with balanced precision-recall metrics and an inference time of 0.12 s, well below the 0.6–1 s heartbeat duration requirement.
Resource analysis on Jetson Orin devices reveals that Wavelet-FCN models offer exceptional efficiency with 24.75% CPU usage, minimal GPU utilization (0.34%), and 293 MB memory consumption. The distributed architecture’s dynamic load balancing ensures resilience under varying workloads, enabling effective horizontal scaling.
(This article belongs to the Special Issue Advanced Sensors for Human Health Management)

22 pages, 762 KB  
Article
BTIP: Branch Triggered Instruction Prefetcher Ensuring Timeliness
by Wenhai Lin, Yiquan Lin, Yiquan Chen, Shishun Cai, Zhen Jin, Jiexiong Xu, Yuzhong Zhang and Wenzhi Chen
Electronics 2024, 13(21), 4323; https://doi.org/10.3390/electronics13214323 - 4 Nov 2024
Viewed by 2348
Abstract
In CPU microarchitecture, caches store frequently accessed instructions and data by exploiting their locality, reducing memory access latency and improving application performance. However, contemporary applications with large code footprints often experience frequent Icache misses, which significantly degrade performance. Although Fetch-Directed Instruction Prefetching (FDIP) has been widely adopted in commercial processors to reduce Icache misses, our analysis reveals that FDIP still suffers from Icache misses caused by branch mispredictions and late prefetch, leaving considerable opportunity for performance optimization. Priority-Directed Instruction Prefetching (PDIP) has been proposed to reduce Icache misses caused by branch mispredictions in FDIP. However, it neglects Icache misses due to late prefetch and suffers from high storage overhead. In this paper, we propose a branch-triggered instruction prefetcher (BTIP), which aims to prefetch Icache lines that FDIP cannot efficiently handle, including the Icache misses due to branch misprediction and late prefetch. We also introduce a novel Branch Target Buffer (BTB) organization, BTIP BTB, which stores prefetch metadata and reuses information from existing BTB entries, effectively reducing storage overhead. We implemented BTIP on the ChampSim simulator and evaluated BTIP in detail using traces from the 1st Instruction Prefetching Championship (IPC-1). Our evaluation shows that BTIP outperforms both FDIP and PDIP. Specifically, BTIP reduces Icache misses by 38.0% and improves performance by 5.1% compared to FDIP. Additionally, BTIP outperforms PDIP by 1.6% while using only 41.9% of the storage space required by PDIP.
(This article belongs to the Special Issue Computer Architecture & Parallel and Distributed Computing)

19 pages, 755 KB  
Article
A Fresh View on the Microarchitectural Design of FPGA-Based RISC CPUs in the IoT Era
by Giovanni Scotti and Davide Zoni
J. Low Power Electron. Appl. 2019, 9(1), 9; https://doi.org/10.3390/jlpea9010009 - 19 Feb 2019
Cited by 19 | Viewed by 10578
Abstract
The Internet-of-Things (IoT) revolution has shaped a new application domain where low-power RISC architectures constitute the standard computational backbone. The current de-facto design practice for such architectures is to extend the ISA and the corresponding microarchitecture with custom instructions to efficiently manage the complex tasks imposed by IoT applications, e.g., augmented reality, artificial intelligence and autonomous driving, within narrow energy and area budgets. However, the new IoT application domain also offers a unique opportunity to revisit and optimize the RISC microarchitectural design flow from a more communication- and memory-centric viewpoint. This manuscript critically explores and optimizes the design of a RISC CPU front-end for IoT with a two-fold objective: (i) provide an optimized CPU microarchitecture; and (ii) present a set of three design guidelines to steer the implementation of IoT CPUs. The exploration builds on a newly proposed System-on-Chip (SoC) and RISC CPU implementing the RISC-V/IMF ISA and accounting for area, timing, and performance design metrics. This SoC offers a reference design to evaluate the pros and cons of different microarchitectural solutions. A wide combination of microarchitectures considering different branch prediction schemes, cache design architectures and on-chip bus solutions has been evaluated. The entire exploration is focused on the FPGA-based implementation due to the renewed interest in this technology demonstrated by both the research community and companies. We note that ARM launched the DesignStart FPGA program to make available the Cortex-M microcontrollers on Xilinx FPGAs in the form of IP blocks.
(This article belongs to the Special Issue Ultra-low Power Embedded Systems)