Search Results (358)

Search Parameters:
Keywords = hardware overhead

24 pages, 2879 KB  
Article
Skeleton-Based Real-Time Hand Gesture Recognition Using Data Fusion and Ensemble Multi-Stream CNN Architecture
by Maki K. Habib, Oluwaleke Yusuf and Mohamed Moustafa
Technologies 2025, 13(11), 484; https://doi.org/10.3390/technologies13110484 (registering DOI) - 26 Oct 2025
Abstract
Hand Gesture Recognition (HGR) is a vital technology that enables intuitive human–computer interaction in various domains, including augmented reality, smart environments, and assistive systems. Achieving both high accuracy and real-time performance remains challenging due to the complexity of hand dynamics, individual morphological variations, and computational limitations. This paper presents a lightweight and efficient skeleton-based HGR framework that addresses these challenges through an optimized multi-stream Convolutional Neural Network (CNN) architecture and a trainable ensemble tuner. Dynamic 3D gestures are transformed into structured, noise-minimized 2D spatiotemporal representations via enhanced data-level fusion, supporting robust classification across diverse spatial perspectives. The ensemble tuner strengthens semantic relationships between streams and improves recognition accuracy. Unlike existing solutions that rely on high-end hardware, the proposed framework achieves real-time inference on consumer-grade devices without compromising accuracy. Experimental validation across five benchmark datasets (SHREC2017, DHG1428, FPHA, LMDHG, and CNR) confirms consistent or superior performance with reduced computational overhead. Additional validation on the SBU Kinect Interaction Dataset highlights generalization potential for broader Human Action Recognition (HAR) tasks. This advancement bridges the gap between efficiency and accuracy, supporting scalable deployment in AR/VR, mobile computing, interactive gaming, and resource-constrained environments. Full article
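
The abstract leaves the encoding unspecified, but a common way to realize this kind of data-level fusion is to flatten each 3D skeleton sequence into an image-like array whose rows are joints, columns are frames, and channels are normalized coordinates, which a 2D CNN stream can then consume. A minimal sketch under that assumption:

```python
import numpy as np

def skeleton_to_image(seq):
    """Encode a dynamic gesture as a 2D spatiotemporal "image".

    seq: (T, J, 3) array of T frames x J hand joints x (x, y, z).
    Returns a (J, T, 3) float32 image: rows = joints, columns = time,
    channels = coordinates normalized to [0, 1].
    """
    img = np.transpose(seq, (1, 0, 2)).astype(np.float32)  # (J, T, 3)
    mins = img.min(axis=(0, 1), keepdims=True)             # per-channel min
    maxs = img.max(axis=(0, 1), keepdims=True)             # per-channel max
    return (img - mins) / (maxs - mins + 1e-8)             # remove offset/scale

# Toy gesture: 32 frames of 22 joints drifting smoothly in space.
gesture = np.cumsum(np.random.randn(32, 22, 3) * 0.01, axis=0)
print(skeleton_to_image(gesture).shape)  # (22, 32, 3)
```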

26 pages, 573 KB  
Article
Mutual V2I Multifactor Authentication Using PUFs in an Unsecure Multi-Hop Wi-Fi Environment
by Mohamed K. Elhadad and Fayez Gebali
Electronics 2025, 14(21), 4167; https://doi.org/10.3390/electronics14214167 (registering DOI) - 24 Oct 2025
Viewed by 148
Abstract
Secure authentication in vehicular ad hoc networks (VANETs) remains a fundamental challenge due to their dynamic topology, susceptibility to attacks, and scalability constraints in multi-hop communication. Existing approaches based on elliptic curve cryptography (ECC), blockchain, and fog computing have achieved partial success but suffer from latency, resource overhead, and limited adaptability, leaving a gap for lightweight and hardware-rooted trust models. To address this, we propose a multi-hop mutual authentication protocol leveraging Physical Unclonable Functions (PUFs), which provide tamper-evident, device-specific responses for cryptographic key generation. Our design introduces a structured sequence of phases, including pre-deployment, registration, login, authentication, key establishment, and session maintenance, with optional multi-hop extension through relay vehicles. Unlike prior schemes, our protocol integrates fuzzy extractors for error tolerance, employs both inductive and game-based proofs for security guarantees, and maps BAN-logic reasoning to specific attack resistances, ensuring robustness against replay, impersonation, and man-in-the-middle attacks. The protocol achieves mutual trust between vehicles and RSUs while preserving anonymity via temporary identifiers and achieving forward secrecy through non-reused CRPs. Conceptual comparison with state-of-the-art PUF-based and non-PUF schemes highlights the potential for reduced latency, lower communication overhead, and improved scalability via cloud-assisted CRP lifecycle management, while pointing to the need for future empirical validation through simulation and prototyping. This work not only provides a secure and efficient solution for VANET authentication but also advances the field by offering the first integrated taxonomy-driven evaluation of PUF-enabled V2X protocols in multi-hop Wi-Fi environments. Full article
(This article belongs to the Special Issue Privacy and Security Vulnerabilities in 6G and Beyond Networks)
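
As a rough illustration of the challenge-response idea the protocol builds on, the sketch below models a PUF as a device-unique keyed function and uses a nonce to prove freshness. The fuzzy extractor, multi-hop relaying, and temporary identifiers are omitted, and a real PUF response is a noisy physical measurement rather than an HMAC, so this is a toy analogy only:

```python
import hmac, hashlib, os

class SimulatedPUF:
    """Stand-in for a hardware PUF: a device-unique keyed function.
    A real PUF derives responses from physical variation and needs a
    fuzzy extractor to correct noisy bits; both are idealized away."""
    def __init__(self):
        self._device_secret = os.urandom(32)  # models physical uniqueness
    def response(self, challenge: bytes) -> bytes:
        return hmac.new(self._device_secret, challenge, hashlib.sha256).digest()

# Registration: the server stores a challenge-response pair (CRP).
vehicle_puf = SimulatedPUF()
challenge = os.urandom(16)
stored_response = vehicle_puf.response(challenge)      # kept server-side

# Authentication: server sends (challenge, nonce); the vehicle proves
# possession of the PUF without ever transmitting the response itself.
nonce = os.urandom(16)
proof = hmac.new(vehicle_puf.response(challenge), nonce, hashlib.sha256).digest()
expected = hmac.new(stored_response, nonce, hashlib.sha256).digest()
assert hmac.compare_digest(proof, expected)
# Forward secrecy in the paper comes from never reusing a CRP: each
# challenge would be consumed and discarded after one session.
```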

35 pages, 3920 KB  
Article
Towards Memory-Efficient and High-Performance Branch Prediction: The LXOR Architecture for Control Flow Optimization in Embedded and General-Purpose RISC-V Processors
by Devendra G. Sutar and Nitesh B. Guinde
J. Low Power Electron. Appl. 2025, 15(4), 64; https://doi.org/10.3390/jlpea15040064 (registering DOI) - 24 Oct 2025
Viewed by 47
Abstract
Accurate branch prediction is crucial for achieving high instruction throughput and minimizing control hazards in modern pipelines. This paper presents a novel LXOR (Local eXclusive-OR) branch predictor, which enhances prediction accuracy while reducing hardware complexity and memory usage. Unlike traditional predictors (GAg, GAp, PAg, PAp, Gshare, Gselect) that rely on large Pattern History Tables (PHTs) or intricate global/local history combinations, the LXOR predictor employs complemented local history and XOR-based indexing, optimizing table access and reducing aliasing. Implemented and evaluated using the MARSS-RISCV simulator on a 64-bit in-order RISC-V core, the LXOR's performance was compared against traditional predictors using CoreMark and SPEC CPU2017 benchmarks. The LXOR consistently achieved competitive results, with a prediction accuracy of up to 83.92%, lower misprediction rates, and instruction flushes as low as 5.83%. It also attained an IPC rate of up to 0.83, all while maintaining a compact memory footprint of approximately 2 KB, significantly smaller than current alternatives. These findings demonstrate that the LXOR predictor not only matches the performance of more complex predictors but does so with less memory and logic overhead, making it ideal for embedded systems, low-power RISC-V processors, and resource-constrained IoT and edge devices. By balancing prediction accuracy with simplicity, the LXOR offers a scalable and cost-effective solution for next-generation microprocessors. Full article
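
The paper defines the exact tables and bit widths; the toy model below only illustrates the stated indexing idea — complemented local history XORed with branch-address bits selecting a 2-bit saturating counter — with table sizes that are illustrative guesses, not the published configuration:

```python
class LXORToyPredictor:
    """Sketch of LXOR-style indexing: complemented local history XORed
    with PC bits indexes a table of 2-bit saturating counters."""
    def __init__(self, hist_bits=10, table_bits=10):
        self.hist_bits = hist_bits
        self.mask = (1 << table_bits) - 1
        self.local_hist = {}                    # PC -> local history bits
        self.pht = [1] * (1 << table_bits)      # init weakly not-taken

    def _index(self, pc):
        h = self.local_hist.get(pc, 0)
        comp = (~h) & ((1 << self.hist_bits) - 1)   # complemented history
        return (comp ^ pc) & self.mask              # XOR-based index

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2       # 2, 3 => predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        h = (self.local_hist.get(pc, 0) << 1) | int(taken)
        self.local_hist[pc] = h & ((1 << self.hist_bits) - 1)

bp, hits = LXORToyPredictor(), 0
for taken in [True, True, True, False] * 100:   # a biased toy branch
    hits += bp.predict(0x40) == taken
    bp.update(0x40, taken)
print(f"accuracy: {hits / 400:.2%}")
```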

27 pages, 1008 KB  
Article
Efficient Reliability Block Diagram Evaluation Through Improved Algorithms and Parallel Computing
by Gloria Gori, Marco Papini and Alessandro Fantechi
Appl. Sci. 2025, 15(21), 11397; https://doi.org/10.3390/app152111397 (registering DOI) - 24 Oct 2025
Viewed by 167
Abstract
Quantitative reliability evaluation is essential for optimizing control policies and maintenance strategies in complex industrial systems. While Reliability Block Diagrams (RBDs) are a natural formalism for modeling these hierarchical systems, modern applications require highly efficient, online reliability assessment on resource-constrained embedded hardware. This demand presents two fundamental challenges: developing algorithmically efficient RBD evaluation methods that can handle diverse custom distributions while preserving numerical accuracy, and ensuring platform-agnostic performance across diverse multicore architectures. This paper investigates these issues by developing a new version of the librbd open-source RBD library. This version includes advances in efficiency of evaluation algorithms, as well as restructured computation sequences, cache-aware data structures to minimize memory overhead, and an adaptive parallelization framework that scales automatically from embedded processors to high-performance systems. Comprehensive validation demonstrates that these advances significantly reduce computational complexity and improve performance over the original implementation, enabling real-time analysis of substantially larger systems. Full article
(This article belongs to the Special Issue Uncertainty and Reliability Analysis for Engineering Systems)
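
For orientation, the core of any RBD evaluation is repeated series/parallel composition of block reliabilities; the library's contribution lies in doing this efficiently for time-dependent distributions and large structures, but the underlying arithmetic is compact:

```python
import math

def series(rs):    # system works only if every block works
    return math.prod(rs)

def parallel(rs):  # system works if at least one block works
    return 1.0 - math.prod(1.0 - r for r in rs)

# Two redundant pumps (R = 0.95 each) feeding one controller (R = 0.99):
system = series([parallel([0.95, 0.95]), 0.99])
print(f"{system:.6f}")  # 0.987525
```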

20 pages, 495 KB  
Article
Efficient Single-Server Private Information Retrieval Based on LWE Encryption
by Hai Huang, Zhibo Guan, Bin Yu, Xiang Li, Mengmeng Ge, Chao Ma and Xiangyu Ma
Mathematics 2025, 13(21), 3373; https://doi.org/10.3390/math13213373 - 23 Oct 2025
Viewed by 208
Abstract
Private Information Retrieval (PIR) is a cryptographic protocol that allows users to retrieve data from one or more databases without revealing any information about their queries. Among existing PIR protocols, single-server schemes based on the Learning With Errors (LWE) assumption currently constitute the most practical class of constructions. However, existing schemes continue to suffer from high client-side preprocessing complexity and significant server-side storage overhead, leading to degraded overall performance. We propose ShufflePIR, a single-server protocol that marks the first introduction of an SM3-based pseudorandom function into the PIR framework for shuffling during preprocessing and utilizes cryptographic hardware to accelerate computation, thereby improving both efficiency and security. In addition, the adoption of a parallel encryption scheme based on the LWE assumption significantly enhances the client’s computational efficiency when processing long-bit data. We evaluate the performance of our protocol against the latest state-of-the-art PIR schemes. Simulation results demonstrate that ShufflePIR achieves a throughput of 9903 MB/s on a 16 GB database with 1 MB records, outperforming existing single-server PIR schemes. Overall, ShufflePIR provides an efficient and secure solution for privacy-preserving information retrieval in a wide range of applications. Full article
(This article belongs to the Special Issue Mathematical Models in Information Security and Cryptography)
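
ShufflePIR's SM3-based shuffling and cryptographic-hardware acceleration do not fit a short sketch, but the Regev-style LWE core that single-server schemes of this class build on does: the client encrypts a one-hot selection vector and the server answers with a homomorphic dot product against the database. The parameters below are toy-sized and offer no real security:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, p = 64, 2**20, 16          # toy LWE dimensions; insecure by design
delta = q // p                   # plaintext scaling factor

s = rng.integers(0, q, n)        # client's secret key

def enc(m):
    a = rng.integers(0, q, n)                     # random mask
    e = rng.integers(-1, 2)                       # tiny noise term
    return a, (a @ s + e + delta * m) % q

def dec(a, b):
    return round(((b - a @ s) % q) / delta) % p   # strip mask, round off noise

db = [7, 3, 14, 0, 9, 5, 11, 2]  # server's database, entries < p
target = 5

# Client: encrypt a one-hot query; the server sees only LWE ciphertexts.
query = [enc(int(i == target)) for i in range(len(db))]

# Server: homomorphic dot product  sum_i db[i] * Enc(onehot[i]).
A = sum(d * a for (a, _), d in zip(query, db)) % q
B = sum(d * b for (_, b), d in zip(query, db)) % q

assert dec(A, B) == db[target]   # client recovers db[5] without revealing 5
```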

31 pages, 1516 KB  
Article
Federated Quantum Machine Learning for Distributed Cybersecurity in Multi-Agent Energy Systems
by Kwabena Addo, Musasa Kabeya and Evans Eshiemogie Ojo
Energies 2025, 18(20), 5418; https://doi.org/10.3390/en18205418 - 14 Oct 2025
Viewed by 425
Abstract
The increasing digitization and decentralization of modern energy systems have heightened their vulnerability to sophisticated cyber threats, necessitating advanced, scalable, and privacy-preserving detection frameworks. This paper introduces a novel Federated Quantum Machine Learning (FQML) framework tailored for anomaly detection in multi-agent energy environments. By integrating parameterized quantum circuits (PQCs) at the local agent level with secure federated learning protocols, the framework enhances detection accuracy while preserving data privacy. A trimmed-mean aggregation scheme and differential privacy mechanisms are embedded to defend against Byzantine behaviors and data-poisoning attacks. The problem is formally modeled as a constrained optimization task, accounting for quantum circuit depth, communication latency, and adversarial resilience. Experimental validation on synthetic smart grid datasets demonstrates that FQML achieves high detection accuracy (≥96.3%), maintains robustness under adversarial perturbations, and reduces communication overhead by 28.6% compared to classical federated baselines. These results substantiate the viability of quantum-enhanced federated learning as a practical, hardware-conscious approach to distributed cybersecurity in next-generation energy infrastructures. Full article
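
The trimmed-mean aggregation step is easy to isolate (the quantum circuits and differential-privacy noise are omitted here): per model coordinate, the most extreme client values are discarded before averaging, which blunts poisoned updates from Byzantine agents. A minimal sketch:

```python
import numpy as np

def trimmed_mean(updates, trim=1):
    """Per coordinate, drop the `trim` largest and smallest client
    values, then average the survivors."""
    u = np.sort(np.stack(updates), axis=0)        # (clients, params)
    return u[trim:len(updates) - trim].mean(axis=0)

# Five clients, one of them poisoned with a huge update.
updates = [np.array([0.10, -0.20]), np.array([0.12, -0.18]),
           np.array([0.09, -0.22]), np.array([0.11, -0.19]),
           np.array([9.00, -9.00])]               # the outlier
print(trimmed_mean(updates))                      # stays near the honest mean
```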

37 pages, 2048 KB  
Article
TrackRISC: An Implicit Attack Flow Model and Hardware Microarchitectural Mitigation for Speculative Cache-Based Covert Channels
by Zhewen Zhang, Abdurrashid Ibrahim Sanka, Yuhan She, Jinfa Hong, Patrick S. Y. Hung and Ray C. C. Cheung
Electronics 2025, 14(20), 3973; https://doi.org/10.3390/electronics14203973 - 10 Oct 2025
Viewed by 529
Abstract
Speculative execution attacks significantly compromise the security of modern processors by enabling information leakage. These well-known attacks exploit speculative cache-based covert channels to effectively exfiltrate secret data by altering cache states. Existing hardware defenses specifically designed to prevent cache-based covert channels are effective at blocking explicit channels. However, their protection against implicit attack variants remains limited, since these hardware defenses do not fully eliminate secret-dependent microarchitectural changes in caches. In this paper, we propose TrackRISC, a framework which comprises (i) a refined implicit attack flow model specifically for the exploration and analysis of implicit cache-based speculative execution attacks which severely compromise the security of existing hardware defenses, and (ii) a security-enhanced tracking and mitigation microarchitecture, termed TrackRISC-Defense, designed to mitigate both implicit and explicit attack variants that use speculative cache-based covert channels. To obtain realistic hardware evaluation results, we implement and evaluate both TrackRISC-Defense and a representative existing defense on top of Berkeley's out-of-order RISC-V processor core (SonicBOOM) using the VCU118 FPGA platform running Linux. Compared to the representative existing defense which incurs a performance overhead of 13.8%, TrackRISC-Defense ensures stronger security guarantees with a performance overhead of 19.4%. In addition, TrackRISC-Defense can mitigate both explicit and implicit speculative cache-based covert channels with a register-based hardware resource overhead of 0.4%. Full article
(This article belongs to the Special Issue Secure Hardware Architecture and Attack Resilience)
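
A heavily simplified mental model of such a defense: treat data returned by loads under unresolved speculation as tainted and defer the cache-state changes it would cause until the speculation resolves, so a squashed transmission leaves no footprint. The Python below is a conceptual sketch only, not the paper's microarchitecture:

```python
class TrackingCacheModel:
    """Conceptual model: cache fills driven by speculative (tainted)
    data are buffered, committed only if the speculation was correct,
    and dropped on a squash, so neither explicit nor implicit
    transmitters leave a secret-dependent trace."""
    def __init__(self):
        self.lines = set()       # architecturally visible cache state
        self.deferred = []       # fills awaiting speculation resolution
    def access(self, addr, tainted):
        if tainted:
            self.deferred.append(addr)   # no visible state change yet
        else:
            self.lines.add(addr)
    def resolve(self, squashed):
        if not squashed:                 # correct speculation: commit fills
            self.lines.update(self.deferred)
        self.deferred.clear()            # mis-speculation: no trace remains

cache = TrackingCacheModel()
cache.access(0x1000, tainted=True)   # secret-dependent fill under speculation
cache.resolve(squashed=True)         # attack squashed: cache unchanged
assert 0x1000 not in cache.lines
```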

36 pages, 3753 KB  
Article
Energy Footprint and Reliability of IoT Communication Protocols for Remote Sensor Networks
by Jerzy Krawiec, Martyna Wybraniak-Kujawa, Ilona Jacyna-Gołda, Piotr Kotylak, Aleksandra Panek, Robert Wojtachnik and Teresa Siedlecka-Wójcikowska
Sensors 2025, 25(19), 6042; https://doi.org/10.3390/s25196042 - 1 Oct 2025
Viewed by 383
Abstract
Excessive energy consumption of communication protocols in IoT/IIoT systems constitutes one of the key constraints for the operational longevity of remote sensor nodes, where radio transmission often incurs higher energy costs than data acquisition or local computation. Previous studies have remained fragmented, typically focusing on selected technologies or specific layers of the communication stack, which has hindered the development of comparable quantitative metrics across protocols. The aim of this study is to design and validate a unified evaluation framework enabling consistent assessment of both wired and wireless protocols in terms of energy efficiency, reliability, and maintenance costs. The proposed approach employs three complementary research methods: laboratory measurements on physical hardware, profiling of SBC devices, and simulations conducted in the COOJA/Powertrace environment. A Unified Comparative Method was developed, incorporating bilinear interpolation and weighted normalization, with its robustness confirmed by a Spearman rank correlation coefficient exceeding 0.9. The analysis demonstrates that MQTT-SN and CoAP (non-confirmable mode) exhibit the highest energy efficiency, whereas HTTP/3 and AMQP incur the greatest energy overhead. Results are consolidated in the ICoPEP matrix, which links protocol characteristics to four representative RS-IoT scenarios: unmanned aerial vehicles (UAVs), ocean buoys, meteorological stations, and urban sensor networks. The framework provides well-grounded engineering guidelines that may extend node lifetime by up to 35% through the adoption of lightweight protocol stacks and optimized sampling intervals. The principal contribution of this work is the development of a reproducible, technology-agnostic tool for comparative assessment of IoT/IIoT communication protocols. The proposed framework addresses a significant research gap in the literature and establishes a foundation for further research into the design of highly energy-efficient and reliable IoT/IIoT infrastructures, supporting scalable and long-term deployments in diverse application environments. Full article
(This article belongs to the Collection Sensors and Sensing Technology for Industry 4.0)
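
The weighted-normalization stage of such a unified comparison can be sketched as below; the metric values and weights are hypothetical placeholders rather than the paper's measurements, and the bilinear-interpolation stage is omitted:

```python
import numpy as np

# Hypothetical per-protocol metrics: energy (mJ/msg), loss (%), latency (ms).
metrics = {"MQTT-SN": [1.2, 0.8, 120], "CoAP-NON": [1.4, 1.5, 90],
           "HTTP/3":  [6.8, 0.3, 60],  "AMQP":     [5.9, 0.2, 70]}
weights = np.array([0.5, 0.3, 0.2])       # scenario-specific importance

M = np.array(list(metrics.values()), dtype=float)
# Min-max normalize each column; all three metrics are lower-is-better.
norm = (M - M.min(0)) / (M.max(0) - M.min(0))
scores = norm @ weights                    # weighted composite cost
for name, sc in sorted(zip(metrics, scores), key=lambda x: x[1]):
    print(f"{name:8s} {sc:.3f}")           # lower score = better fit
```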

20 pages, 4498 KB  
Article
Vessel Traffic Density Prediction: A Federated Learning Approach
by Amin Khodamoradi, Paulo Alves Figueiras, André Grilo, Luis Lourenço, Bruno Rêga, Carlos Agostinho, Ruben Costa and Ricardo Jardim-Gonçalves
ISPRS Int. J. Geo-Inf. 2025, 14(9), 359; https://doi.org/10.3390/ijgi14090359 - 18 Sep 2025
Viewed by 540
Abstract
Maritime safety, environmental protection, and efficient traffic management increasingly rely on data-driven technologies. However, leveraging Automatic Identification System (AIS) data for predictive modelling faces two major challenges: the massive volume of data generated in real-time and growing privacy concerns associated with proprietary vessel information. This paper proposes a novel, privacy-preserving framework for vessel traffic density (VTD) prediction that addresses both challenges. The approach combines the European Marine Observation and Data Network's (EMODnet) grid-based VTD calculation method with Convolutional Neural Networks (CNN) to model spatiotemporal traffic patterns and employs Federated Learning to collaboratively build a global predictive model without the need for explicit sharing of proprietary AIS data. Three geographically diverse AIS datasets were harmonized, processed, and used to train local CNN models on hourly VTD matrices. These models were then aggregated via a Federated Learning framework under a lifelong learning scenario. Evaluation using Sparse Mean Squared Error shows that the federated global model achieves promising accuracy in sparse data scenarios and maintains performance parity when compared with local CNN-based models, all while preserving data privacy and minimizing hardware performance needs and data communication overheads. The results highlight the approach's effectiveness and scalability for real-world maritime applications in traffic forecasting, safety, and operational planning. Full article
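
A VTD matrix of the kind fed to the local CNNs can be approximated by binning AIS position reports into grid cells per time window; note that EMODnet's actual method counts vessel-hours per cell rather than raw points, so this is only a simplified stand-in:

```python
import numpy as np

def vtd_grid(lats, lons, bounds, cell_deg=0.1):
    """Count AIS position reports per grid cell for one time window."""
    lat0, lat1, lon0, lon1 = bounds
    lat_edges = np.arange(lat0, lat1 + cell_deg, cell_deg)
    lon_edges = np.arange(lon0, lon1 + cell_deg, cell_deg)
    H, _, _ = np.histogram2d(lats, lons, bins=[lat_edges, lon_edges])
    return H                      # one (lat, lon) density matrix

# One toy hour of traffic; stacking such matrices hourly yields the
# spatiotemporal tensor each federated client trains on locally.
rng = np.random.default_rng(1)
lats = rng.uniform(38.0, 39.0, 500)
lons = rng.uniform(-9.5, -8.5, 500)
print(vtd_grid(lats, lons, (38.0, 39.0, -9.5, -8.5)).shape)
```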

24 pages, 2650 KB  
Article
Memory Management Strategies for Software Quantum Simulators
by Gilberto Díaz, Luiz Steffenel, Carlos Barrios and Jean Couturier
Quantum Rep. 2025, 7(3), 41; https://doi.org/10.3390/quantum7030041 - 9 Sep 2025
Viewed by 957
Abstract
Software quantum simulators are essential tools for designing and testing quantum algorithms on classical computing architectures, especially given the current limitations of physical quantum hardware. This work focuses on studying and evaluating memory management strategies for scalable quantum state simulation. We examine full-state representation, dynamic state pruning, shared-memory parallelization with OpenMP, distributed memory execution using MPI, and error-bounded floating-point compression with ZFP. These techniques are implemented in a prototype simulator and assessed using the quantum Fourier transform as a benchmark, with performance compared against leading open-source simulators such as Intel-QS, QuEST, and qsim. The results show the trade-offs between computational overhead and memory efficiency, and demonstrate that hybrid approaches combining distributed memory and compression can significantly extend the number of qubits that can be simulated. This work contributes practical insights for improving the scalability of software quantum simulators on classical hardware through optimized memory usage. Full article
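
The memory wall motivating these strategies follows directly from full-state representation: n qubits require 2^n complex amplitudes. A quick helper makes the scaling concrete:

```python
import numpy as np

def full_state_bytes(n_qubits, dtype=np.complex128):
    """A full state vector stores 2**n complex amplitudes."""
    return (1 << n_qubits) * np.dtype(dtype).itemsize

for n in (20, 30, 34):
    print(f"{n} qubits: {full_state_bytes(n) / 2**30:.2f} GiB")
# 20 qubits fit anywhere (16 MiB), 30 need 16 GiB, 34 already 256 GiB --
# hence pruning, MPI distribution, and ZFP compression to go further.
```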

24 pages, 32280 KB  
Article
Spectral Channel Mixing Transformer with Spectral-Center Attention for Hyperspectral Image Classification
by Zhenming Sun, Hui Liu, Ning Chen, Haina Yang, Jia Li, Chang Liu and Xiaoping Pei
Remote Sens. 2025, 17(17), 3100; https://doi.org/10.3390/rs17173100 - 5 Sep 2025
Viewed by 1002
Abstract
In recent years, research on HSI classification has focused on integrating deep learning with Transformer architectures to enhance classification performance through multi-scale feature extraction, attention mechanism optimization, and spectral–spatial collaborative modeling. However, the Transformer's high computational complexity and large parameter count create a scaling bottleneck on long-sequence tasks, requiring co-optimization of algorithm and hardware. To address this issue, this paper proposes a method that integrates RWKV linear attention with the Transformer through a novel TC-Former framework combining TimeMixFormer and HyperMixFormer architectures. Specifically, TimeMixFormer reduces computational complexity through time-decay weights and a gating design, significantly improving the processing efficiency of long sequences. HyperMixFormer employs a gated WKV mechanism and dynamic channel weighting, combined with Mish activation and time-shift operations, to curb computational overhead while achieving efficient cross-channel interaction, significantly enhancing the discriminative representation of spectral features. The pivotal characteristic of the proposed method is its integration of linear attention mechanisms, which improve HSI classification accuracy at lower computational complexity. Evaluation experiments on three public hyperspectral datasets confirm that the framework outperforms previous state-of-the-art algorithms in classification accuracy. Full article
(This article belongs to the Section Remote Sensing Image Processing)
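
The O(T) advantage of RWKV-style time mixing comes from replacing the T×T attention matrix with a decayed running sum. The sketch below implements a simplified scalar-key WKV recurrence; the per-channel decays, gating, Mish activation, and time-shift of the paper's blocks are omitted:

```python
import numpy as np

def wkv(keys, values, w=0.5, u=0.3):
    """Exponential-decay weighted average over past tokens, computed as
    an O(T) recurrence instead of an O(T^2) attention matrix.
    w: decay rate; u: bonus weight for the current token."""
    T, d = values.shape
    num, den = np.zeros(d), 0.0        # decayed running sums
    out = np.empty_like(values)
    for t in range(T):
        a = np.exp(keys[t])
        out[t] = (num + np.exp(u) * a * values[t]) / (den + np.exp(u) * a)
        num = np.exp(-w) * num + a * values[t]   # decay, then absorb token t
        den = np.exp(-w) * den + a
    return out

k = np.random.randn(16)       # one toy key per token
v = np.random.randn(16, 4)    # toy value vectors
print(wkv(k, v).shape)        # (16, 4)
```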

19 pages, 13244 KB  
Article
MWR-Net: An Edge-Oriented Lightweight Framework for Image Restoration in Single-Lens Infrared Computational Imaging
by Xuanyu Qian, Xuquan Wang, Yujie Xing, Guishuo Yang, Xiong Dun, Zhanshan Wang and Xinbin Cheng
Remote Sens. 2025, 17(17), 3005; https://doi.org/10.3390/rs17173005 - 29 Aug 2025
Viewed by 849
Abstract
Infrared video imaging is a cornerstone technology for environmental perception, particularly in drone-based remote sensing applications such as disaster assessment and infrastructure inspection. Conventional systems, however, rely on bulky optical architectures that limit deployment on lightweight aerial platforms. Computational imaging offers a promising alternative by integrating optical encoding with algorithmic reconstruction, enabling compact hardware while maintaining imaging performance comparable to sophisticated multi-lens systems. Nonetheless, achieving real-time video-rate computational image restoration on resource-constrained unmanned aerial vehicles (UAVs) remains a critical challenge. To address this, we propose Mobile Wavelet Restoration-Net (MWR-Net), a lightweight deep learning framework tailored for real-time infrared image restoration. Built on a MobileNetV4 backbone, MWR-Net leverages depthwise separable convolutions and an optimized downsampling scheme to minimize parameters and computational overhead. A novel wavelet-domain loss enhances high-frequency detail recovery, while the modulation transfer function (MTF) is adopted as an optics-aware evaluation metric. With only 666.37 K parameters and 6.17 G MACs, MWR-Net achieves a PSNR of 37.10 dB and an SSIM of 0.964 on a custom dataset, outperforming a pruned U-Net baseline. Deployed on an RK3588 chip, it runs at 42 FPS. These results demonstrate MWR-Net's potential as an efficient and practical solution for UAV-based infrared sensing applications. Full article
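
A wavelet-domain loss of the general kind described can be sketched with PyWavelets: penalize differences in Haar subbands so that errors in high-frequency detail are weighted explicitly. The weights and wavelet choice are illustrative assumptions, and a trainable version would need a differentiable transform inside the network framework:

```python
import numpy as np
import pywt

def wavelet_loss(pred, target, w_pix=1.0, w_wav=0.5):
    """L1 in pixel space plus L1 over Haar DWT subbands."""
    loss = w_pix * np.abs(pred - target).mean()
    ca_p, details_p = pywt.dwt2(pred, "haar")
    ca_t, details_t = pywt.dwt2(target, "haar")
    loss += w_wav * np.abs(ca_p - ca_t).mean()          # low-frequency band
    for dp, dt in zip(details_p, details_t):            # (cH, cV, cD) bands
        loss += w_wav * np.abs(dp - dt).mean()          # edges and texture
    return loss

pred = np.random.rand(64, 64)     # toy restored infrared frame
target = np.random.rand(64, 64)   # toy ground truth
print(wavelet_loss(pred, target))
```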

24 pages, 2736 KB  
Article
Hybrid Precision Gradient Accumulation for CNN-LSTM in Sports Venue Buildings Analytics: Energy-Efficient Spatiotemporal Modeling
by Lintian Lu, Zhicheng Cao, Xiaolong Chen, Hongfeng Zhang and Cora Un In Wong
Buildings 2025, 15(16), 2926; https://doi.org/10.3390/buildings15162926 - 18 Aug 2025
Viewed by 631
Abstract
We propose a hybrid CNN-LSTM architecture for energy-efficient spatiotemporal modeling in sports venue analytics, addressing the dual challenges of computational efficiency and prediction accuracy in dynamic environments. The proposed method integrates layered mixed-precision training with gradient accumulation, dynamically allocating bitwidths across the spatial (CNN) and temporal (LSTM) layers while maintaining robustness through a computational memory unit. The CNN feature extractor employs higher precision for early layers to preserve spatial details, whereas the LSTM reduces the precision for temporal sequences, optimizing energy consumption under a hardware-aware constraint. Furthermore, the gradient accumulation over micro-batches simulates large-batch training without memory overhead, and the computational memory unit mitigates precision loss by storing the intermediate gradients in high-precision buffers before quantization. The system is realized as a ResNet-18 variant with mixed-precision convolutions and a two-layer bidirectional LSTM, deployed on edge devices for real-time processing with sub-5 ms latency. Our theoretical analysis predicts a 35–45% energy reduction versus fixed-precision models while maintaining <2% accuracy degradation, crucial for large-scale deployment. The experimental results demonstrate a 40% reduction in energy consumption compared to fixed-precision models while achieving over 95% prediction accuracy in tasks such as occupancy forecasting and HVAC control. This work bridges the gap between energy efficiency and model performance, offering a scalable solution for large-scale venue analytics. Full article
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)
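
The accumulation-with-high-precision-buffer idea can be isolated in a few lines: compute micro-batch gradients in float16, sum them into a float32 buffer (the role the abstract assigns to the computational memory unit), and apply one update per window, simulating a large batch without its memory cost. The paper's per-layer bitwidth allocation is not modeled here:

```python
import numpy as np

def train(w32, micro_batches, grad_fn, lr=0.5, accum=4):
    buf = np.zeros_like(w32, dtype=np.float32)        # high-precision buffer
    for i, batch in enumerate(micro_batches, 1):
        g16 = grad_fn(w32.astype(np.float16), batch)  # low-precision compute
        buf += g16.astype(np.float32)                 # accumulate in fp32
        if i % accum == 0:
            w32 -= lr * buf / accum                   # one "large-batch" step
            buf[:] = 0.0
    return w32

# Toy objective 0.5*||w - 1||^2 per micro-batch; its gradient is w - 1.
grad_fn = lambda w16, batch: w16 - np.float16(1.0)
w = train(np.zeros(8, dtype=np.float32), [np.ones(4)] * 16, grad_fn)
print(w[:3])   # converging toward the optimum at 1.0
```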

23 pages, 2709 KB  
Article
Fusion of k-Means and Local Search Approach: An Improved Angular Bisector Insertion Algorithm for Solving the Traveling Salesman Problem
by Xiangfei Zeng, Jeng-Shyang Pan, Shu-Chuan Chu, Rui Wang, Xianquan Luo and Jiaqian Huang
Symmetry 2025, 17(8), 1345; https://doi.org/10.3390/sym17081345 - 18 Aug 2025
Viewed by 787
Abstract
The Angular Bisector Insertion Constructive Heuristic Algorithm (ABIA), though effective for small-scale TSPs, suffers from reduced solution quality and high computational complexity in larger instances due to the degradation of its geometric properties. To address this, two enhanced variants—k-ABIA and k-ABIA-3opt—are proposed. k-ABIA employs k-means clustering to decompose large-scale problems into subgroups, each solved via ABIA, with designed inter-cluster connections to reduce global search cost. k-ABIA-3opt further integrates 3-opt local search and ATSP-specific refinement strategies to avoid local optima. Both algorithms were benchmarked against GA, AACO-LST, and the original ABIA on instances ranging from 100 to 1200 nodes, considering solution quality, stability, runtime, and ATSP performance. k-ABIA-3opt achieved the best overall solution quality, with a total deviation of 28.75%, outperforming AACO-LST (44.86%) and ABIA (144.93%). Meanwhile, k-ABIA, with its O(n²) complexity and low constant overhead, was the fastest, solving 1000-node problems within seconds on standard hardware. Both variants exhibit strong robustness due to minimal stochasticity. For ATSP, k-ABIA-3opt further incorporates directed graph-specific optimization strategies, yielding the best solution quality among all tested algorithms. In summary, k-ABIA-3opt is well-suited for scenarios demanding high-quality solutions within tight time constraints, while k-ABIA provides an efficient option for rapid large-scale TSP solving. Together, they offer scalable and effective solutions for both symmetric and asymmetric TSP instances. Full article
(This article belongs to the Section Computer)
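
The decompose-and-stitch structure of k-ABIA is easy to sketch. In the toy version below a nearest-neighbour tour stands in for ABIA inside each cluster (the real algorithm inserts cities along angular bisectors of the current tour), and cluster tours are simply concatenated instead of using the paper's designed inter-cluster connections:

```python
import numpy as np

def kmeans(pts, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        labels = ((pts[:, None] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([pts[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

def nearest_neighbour_tour(pts):
    """Stand-in for ABIA within one cluster."""
    todo, tour = list(range(1, len(pts))), [0]
    while todo:
        nxt = min(todo, key=lambda i: np.linalg.norm(pts[i] - pts[tour[-1]]))
        tour.append(nxt)
        todo.remove(nxt)
    return tour

pts = np.random.default_rng(2).random((200, 2))   # toy cities
labels = kmeans(pts, k=4)
full_tour = []
for j in range(4):                                # solve clusters, then stitch
    idx = np.where(labels == j)[0]
    full_tour += [idx[i] for i in nearest_neighbour_tour(pts[idx])]
print(len(full_tour), "cities visited")           # 200
```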

14 pages, 1648 KB  
Article
Memory-Efficient Feature Merging for Residual Connections with Layer-Centric Tile Fusion
by Hao Zhang, Jianheng He, Yupeng Gui, Shichen Peng, Leilei Huang, Xiao Yan and Yibo Fan
Electronics 2025, 14(16), 3269; https://doi.org/10.3390/electronics14163269 - 18 Aug 2025
Viewed by 479
Abstract
Convolutional neural networks (CNNs) have achieved remarkable success in computer vision tasks, driving the rapid development of hardware accelerators. However, memory efficiency remains a key challenge, as conventional accelerators adopt layer-by-layer processing, leading to frequent external memory accesses (EMAs) of intermediate feature data, which increase energy consumption and latency. While layer fusion has been proposed to enhance inter-layer feature reuse, existing approaches typically rely on fixed data management tailored to specific architectures, introducing on-chip memory overhead and requiring trade-offs with EMAs. Moreover, prevalent residual connections further weaken fusion benefits due to diverse data reuse distances. To address these challenges, we propose layer-centric tile fusion, which integrates residual data loading with feature merging by leveraging receptive field relationships among feature tiles. A reuse distance-aware caching strategy is introduced to support flexible storage for various data types. We also develop a modeling framework to analyze the trade-off between on-chip memory usage and EMA-induced energy-delay product (EDP). Experimental results demonstrate that our method achieves 5.04–43.44% EDP reduction and 20.28–58.33% memory usage reduction compared to state-of-the-art designs on ResNet-18 and SRGAN. Full article
(This article belongs to the Special Issue Research on Key Technologies for Hardware Acceleration)
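
The receptive-field bookkeeping at the heart of such tile fusion is a backward walk over the fused layers: each output tile determines the input tile, plus halo, that must stay on-chip, and residual connections extend how long those tiles must be cached. A minimal sketch (padding ignored, sizes illustrative):

```python
def input_tile_size(out_size, layers):
    """in = (out - 1) * stride + kernel, applied layer by layer backward."""
    size = out_size
    for kernel, stride in reversed(layers):
        size = (size - 1) * stride + kernel
    return size

# Three fused 3x3, stride-1 conv layers: a 16-wide output tile needs a
# 22-wide input tile; the 6 extra columns are the halo that fusion must
# keep on-chip between layers.
print(input_tile_size(16, [(3, 1), (3, 1), (3, 1)]))  # 22
```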
