Search Results (484)

Search Parameters:
Keywords = Xilinx

23 pages, 2228 KB  
Article
Incremental Coding Testing and LT-Net Bit Error Prediction for Aircraft Pod LVDS Links
by Ting Wang, Peilei Xiao, Yong Tang and Ao Pang
Electronics 2026, 15(2), 339; https://doi.org/10.3390/electronics15020339 - 12 Jan 2026
Abstract
Aircraft pod Low-Voltage Differential Signalling (LVDS) links frequently suffer from transmission errors in adverse environments, compromising reliability. We propose a comprehensive ‘real-time detection—precise prediction—dynamic adaptation’ solution. Firstly, a testing system based on the Xilinx Artix-7 Field Programmable Gate Array (FPGA) was developed using incremental coding, verified across diverse hardware with quantitative physical parameters. Secondly, a Long Short-Term Memory (LSTM)-Transformer fusion network (LT-Net) with weighted loss and dynamic regularization was designed to optimize prediction in critical high Bit Error Rate (BER) regimes. To address distribution drift, an online adaptive mechanism utilizing Elastic Weight Consolidation (EWC) was integrated. Results show LT-Net reduces Mean Squared Error (MSE) by 41.7% and maintains superior Mean Absolute Error (MAE) compared to baseline Transformers, with drift-induced degradation kept within 8%. With an inference latency under 0.28 s, the system meets hard real-time requirements for aircraft pod reliability in complex scenarios. Full article
(This article belongs to the Topic AI-Driven Wireless Channel Modeling and Signal Processing)
28 pages, 1828 KB  
Article
Edge Detection on a 2D-Mesh NoC with Systolic Arrays: From FPGA Validation to GDSII Proof-of-Concept
by Emma Mascorro-Guardado, Susana Ortega-Cisneros, Francisco Javier Ibarra-Villegas, Jorge Rivera, Héctor Emmanuel Muñoz-Zapata and Emilio Isaac Baungarten-Leon
Appl. Sci. 2026, 16(2), 702; https://doi.org/10.3390/app16020702 - 9 Jan 2026
Viewed by 68
Abstract
Edge detection is a key building block in real-time image-processing applications such as drone-based infrastructure inspection, autonomous navigation, and remote sensing. However, its computational cost remains a challenge for resource-constrained embedded systems. This work presents a hardware-accelerated edge detection architecture based on a homogeneous 2D-mesh Network-on-Chip (NoC) integrating systolic arrays to efficiently perform the convolution operations required by the Sobel filter. The proposed architecture was first developed and validated as a 3 × 3 mesh prototype on FPGA (Xilinx Zynq-7000, Zynq-7010, XC7Z010-CLG400A, Zybo board, utilizing 26,112 LUTs, 24,851 flip-flops, and 162 DSP blocks), achieving a throughput of 8.8 Gb/s with a power consumption of 0.79 W at 100 MHz. Building upon this validated prototype, a reduced 2 × 2 node cluster with 14-bit word width was subsequently synthesized at the physical level as a proof-of-concept using the OpenLane RTL-to-GDSII open-source flow targeting the SkyWater 130 nm PDK (sky130A). Post-layout analysis confirms the manufacturability of the design, with a total power consumption of 378 mW and compliance with timing constraints, demonstrating the feasibility of mapping the proposed architecture to silicon and its suitability for drone-based infrastructure monitoring applications. Full article
(This article belongs to the Special Issue Advanced Integrated Circuit Design and Applications)
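The Sobel filtering that the NoC's systolic arrays accelerate can be modeled in plain software. The following is a minimal Python sketch, assuming the standard 3×3 Sobel kernels and the hardware-friendly |Gx| + |Gy| magnitude approximation; the paper's actual RTL datapath and word widths may differ.

```python
# Software model of 3x3 Sobel edge detection. Illustrative only: kernels are
# the standard Sobel operators, not taken from the paper's implementation.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]

def convolve3x3(image, kernel, r, c):
    """Apply a 3x3 kernel centred at (r, c); caller guarantees a valid window."""
    return sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def sobel_magnitude(image):
    """Gradient magnitude approximated as |Gx| + |Gy| over interior pixels."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = convolve3x3(image, SOBEL_X, r, c)
            gy = convolve3x3(image, SOBEL_Y, r, c)
            out[r][c] = abs(gx) + abs(gy)  # cheap norm, avoids sqrt in hardware
    return out
```

In hardware, each systolic array evaluates one such 3×3 window per cycle; the |Gx| + |Gy| form avoids the square root of the exact Euclidean norm.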
25 pages, 6136 KB  
Article
Design and Implementation of a Decentralized Node-Level Battery Management System Chip Based on Deep Neural Network Algorithms
by Muh-Tian Shiue, Yang-Chieh Ou, Chih-Feng Wu, Yi-Fong Wang and Bing-Jun Liu
Electronics 2026, 15(2), 296; https://doi.org/10.3390/electronics15020296 - 9 Jan 2026
Viewed by 85
Abstract
As Battery Management Systems (BMSs) continue to expand in both scale and capacity, conventional state-of-charge (SOC) estimation methods—such as Coulomb counting and model-based observers—face increasing challenges in meeting the requirements for cell-level precision, scalability, and adaptability under aging and operating variability. To address these limitations, this study integrates a Deep Neural Network (DNN)–based estimation framework into a node-level BMS architecture, enabling edge-side computation at each individual battery cell. The proposed architecture adopts a decentralized node-level structure with distributed parameter synchronization, in which each BMS node independently performs SOC estimation using shared model parameters. Global battery characteristics are learned through offline training and subsequently synchronized to all nodes, ensuring estimation consistency across large battery arrays while avoiding centralized online computation. This design enhances system scalability and deployment flexibility, particularly in high-voltage battery strings with isolated measurement requirements. The proposed DNN framework consists of two identical functional modules: an offline training module and a real-time estimation module. The training module operates on high-performance computing platforms—such as in-vehicle microcontrollers during idle periods or charging-station servers—using historical charge–discharge data to extract and update battery characteristic parameters. These parameters are then transferred to the real-time estimation chip for adaptive SOC inference. The decentralized BMS node chip integrates preprocessing circuits, a momentum-based optimizer, a first-derivative sigmoid unit, and a weight update module. The design is implemented using the TSMC 40 nm CMOS process and verified on a Xilinx Virtex-5 FPGA. 
Experimental results using real BMW i3 battery data demonstrate a Root Mean Square Error (RMSE) of 1.853%, with an estimation error range of [4.324%, −4.346%]. Full article
(This article belongs to the Special Issue New Insights in Power Electronics: Prospects and Challenges)
16 pages, 2077 KB  
Article
Cross Comparison Between Thermal Cycling and High Temperature Stress on I/O Connection Elements
by Mamta Dhyani, Tsuriel Avraham, Joseph B. Bernstein and Emmanuel Bender
Micromachines 2026, 17(1), 88; https://doi.org/10.3390/mi17010088 - 9 Jan 2026
Viewed by 115
Abstract
This work examines resistance drift in FPGA I/O paths subjected to combined electrical and thermal stress, using a Xilinx Spartan-6 device as a representative platform. A multiplexed measurement approach was employed, in which multiple I/O pins were externally shorted and sequentially activated, enabling precise tracking of voltage, current, and effective series resistance over time under controlled bias conditions. Two accelerated stress modes were investigated: high-temperature dwell in the range of 80–120 °C and thermal cycling between 80 and 140 °C. Both stress modes exhibited a similar sub-linear (power-law) time dependence of the resistance change, indicating cumulative degradation behavior. However, Arrhenius analysis revealed a strong contrast in effective activation energy: approximately 0.62 eV for high-temperature dwell and approximately 1.3 eV for thermal cycling. This divergence indicates that distinct physical mechanisms dominate under each stress regime. The lower activation energy is consistent with electrically and thermally driven on-die degradation within the FPGA I/O macro, including bias-related aging of output drivers and pad-level structures. In contrast, the higher activation energy observed under thermal cycling is characteristic of diffusion- and creep-dominated thermo-mechanical damage in package-level interconnects, such as solder joints. These findings demonstrate that resistance-based monitoring of FPGA I/O paths can discriminate between device-dominated and package-dominated aging mechanisms, providing a practical foundation for reliability assessment and self-monitoring methodologies in complex electronic systems. Full article
(This article belongs to the Special Issue Emerging Packaging and Interconnection Technology, Second Edition)
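The Arrhenius contrast above (≈0.62 eV vs. ≈1.3 eV) comes down to fitting degradation rate against inverse temperature. A two-point version of that extraction is sketched below; the rate values in the usage note are invented for illustration, and the paper presumably fits across several temperatures rather than just two.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def activation_energy(rate1, temp1_c, rate2, temp2_c):
    """Two-point Arrhenius fit. Assuming rate = A * exp(-Ea / (kB * T)),
    taking the log ratio of two rates gives
        Ea = kB * ln(rate2 / rate1) / (1/T1 - 1/T2)
    with temperatures in kelvin."""
    t1, t2 = temp1_c + 273.15, temp2_c + 273.15
    return K_B * math.log(rate2 / rate1) / (1.0 / t1 - 1.0 / t2)
```

For example, a tenfold increase in degradation rate between 80 °C and 120 °C corresponds to an effective activation energy of roughly 0.69 eV, i.e. in the neighbourhood of the dwell-mode value reported above.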
25 pages, 7245 KB  
Article
A Hardware-Friendly Joint Denoising and Demosaicing System Based on Efficient FPGA Implementation
by Jiqing Wang, Xiang Wang and Yu Shen
Micromachines 2026, 17(1), 44; https://doi.org/10.3390/mi17010044 - 29 Dec 2025
Viewed by 259
Abstract
This paper designs a hardware-implementable joint denoising and demosaicing acceleration system. Firstly, a lightweight network architecture with multi-scale feature extraction based on partial convolution is proposed at the algorithm level. The partial convolution scheme reduces the redundancy of filters and feature maps, thereby reducing memory accesses, and achieves excellent visual quality at smaller model complexity. In addition, multi-scale extraction expands the receptive field while reducing model parameters. Then, we apply separable convolution and partial convolution to reduce the parameters of the model. Compared with the standard convolutional solution, the parameters and MACs are reduced by 83.38% and 77.71%, respectively. Moreover, different networks involve different memory access patterns and computing methods; thus, we introduce a unified and flexibly configurable hardware acceleration platform and implement it on a Xilinx Zynq UltraScale+ FPGA board. Finally, compared with the state-of-the-art neural network solution on the Kodak24 set, the peak signal-to-noise ratio and the structural similarity index measure are improved by approximately 2.36 dB and 0.0806, respectively, and the computing efficiency is improved by 2.09×. Furthermore, the hardware architecture supports multiple degrees of parallelism and can adapt to different edge-embedded scenarios. Overall, the proposed solution offers clear advantages for joint denoising and demosaicing systems. Full article
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))
24 pages, 2830 KB  
Article
Real-Time Radar-Based Hand Motion Recognition on FPGA Using a Hybrid Deep Learning Model
by Taher S. Ahmed, Ahmed F. Mahmoud, Magdy Elbahnasawy, Peter F. Driessen and Ahmed Youssef
Sensors 2026, 26(1), 172; https://doi.org/10.3390/s26010172 - 26 Dec 2025
Viewed by 341
Abstract
Radar-based hand motion recognition (HMR) presents several challenges, including sensor interference, clutter, and the limitations of small datasets, which collectively hinder the performance and real-time deployment of deep learning (DL) models. To address these issues, this paper introduces a novel real-time HMR framework that integrates advanced signal pre-processing, a hybrid convolutional neural network–support vector machine (CNN–SVM) architecture, and efficient hardware deployment. The pre-processing pipeline applies filtration, squared absolute value computation, and normalization to enhance radar data quality. To improve the robustness of DL models against noise and clutter, time-series radar signals are transformed into binarized images, providing a compact and discriminative representation for learning. A hybrid CNN-SVM model is then utilized for hand motion classification. The proposed model achieves a high classification accuracy of 98.91%, validating the quality of the extracted features and the efficiency of the proposed design. Additionally, it reduces the number of model parameters by approximately 66% relative to the most accurate recurrent baseline (CNN–GRU–SVM) and by up to 86% relative to CNN–BiLSTM–SVM, while achieving the highest SVM test accuracy of 92.79% across all CNN–RNN variants that use the same binarized radar images. For deployment, the model is quantized and implemented on two System-on-Chip (SoC) FPGA platforms—the Xilinx Zynq ZCU102 Evaluation Kit and the Xilinx Kria KR260 Robotics Starter Kit—using the Vitis AI toolchain. The system achieves end-to-end accuracies of 96.13% (ZCU102) and 95.42% (KR260). On the ZCU102, the system achieved a 70% reduction in execution time and a 74% improvement in throughput compared to the PC-based implementation. On the KR260, it achieved a 52% reduction in execution time and a 10% improvement in throughput relative to the same PC baseline. 
Both implementations exhibited minimal accuracy degradation relative to a PC-based setup—approximately 1% on ZCU102 and 2% on KR260. These results confirm the framework’s suitability for real-time, accurate, and resource-efficient radar-based hand motion recognition across diverse embedded environments. Full article
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
23 pages, 19868 KB  
Article
Pipelined Divider with Precomputed Multiples of Divisor
by Dauren Zhexebay, Symbat Mamanova, Beibit Karibayev, Alisher Skabylov, Nursultan Meirambekuly, Gulfeiruz Ikhsan, Timur Namazbayev and Sakhybay Tynymbayev
Electronics 2026, 15(1), 110; https://doi.org/10.3390/electronics15010110 - 25 Dec 2025
Viewed by 254
Abstract
Division remains one of the most computationally demanding operations in digital arithmetic. Traditional algorithms, such as restoring, non-restoring, and SRT (Sweeney–Robertson–Tocher) division, are limited by sequential dependencies that reduce throughput in hardware implementations. To overcome these constraints, this work proposes a pipelined integer divider architecture that employs precomputed divisor multiples and comparator-based logic to eliminate the need for full binary adders in the quotient selection stages. The proposed design consists of a three-stage pipeline, where each stage compares the shifted partial remainder with stored multiples of the divisor (B, 2B, 3B) to generate two quotient bits per clock cycle. This approach achieves a 2× reduction in the number of computation stages compared with conventional radix-2 dividers and ensures continuous operation after an initial pipeline latency. The architecture was described in Verilog hardware description language (HDL) and implemented on a Xilinx Artix-7 (XC7A100T-1CSG324C) field-programmable gate array (FPGA) using the Xilinx ISE Design Suite 14.4. Post-synthesis simulation confirmed correct quotient and remainder generation with a maximum operating frequency of 208 MHz. The implementation occupied less than 0.3% of the look-up table (LUT) resources, achieving over a twofold performance improvement compared with a non-pipelined baseline. These results demonstrate that the proposed divider provides an efficient trade-off between speed and hardware cost, making it suitable for digital signal processing and embedded computation systems. Full article
(This article belongs to the Section Microelectronics)
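The quotient-selection scheme described above — compare the shifted partial remainder against precomputed multiples B, 2B, 3B and emit two quotient bits per step — can be sketched as a behavioural software model. This is radix-4 restoring division for unsigned integers; the actual Verilog's three-stage pipelining and parallel comparator structure are not modeled here.

```python
def divide_radix4(dividend, divisor, n_bits=16):
    """Radix-4 restoring division: each step shifts in two dividend bits and
    subtracts the largest precomputed multiple (0, B, 2B, 3B) that fits,
    producing two quotient bits per step. dividend must fit in n_bits."""
    assert divisor > 0 and 0 <= dividend < (1 << n_bits)
    multiples = (0, divisor, 2 * divisor, 3 * divisor)  # precomputed once
    remainder, quotient = 0, 0
    for step in range(n_bits // 2, 0, -1):
        # Shift in the next two dividend bits (MSB first).
        two_bits = (dividend >> (2 * (step - 1))) & 0b11
        remainder = (remainder << 2) | two_bits
        # Select the largest multiple not exceeding the partial remainder.
        # In hardware this is three parallel comparators, no adder chain;
        # the remainder stays below the divisor, so the digit is always <= 3.
        q = 3
        while multiples[q] > remainder:
            q -= 1
        remainder -= multiples[q]
        quotient = (quotient << 2) | q
    return quotient, remainder
```

Because the partial remainder is always less than the divisor after each subtraction, shifting in two bits keeps it below 4B, so a quotient digit of 0–3 always suffices; this is what lets the hardware resolve two bits per cycle with only three comparisons.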
13 pages, 2634 KB  
Article
A Rate-Adaptive MAC Protocol for Flexible OFDM-PONs
by Zhe Zheng, Yingying Chi, Xin Wang and Junjie Zhang
Sensors 2026, 26(1), 133; https://doi.org/10.3390/s26010133 - 24 Dec 2025
Viewed by 278
Abstract
The practical deployment of Orthogonal Frequency Division Multiplexing Passive Optical Networks (OFDM-PONs) is hindered by the lack of a Medium Access Control (MAC) protocol capable of managing their flexible, distance-dependent data rates, despite their high spectral efficiency. This paper proposes and validates a novel rate-adaptive, Time Division Multiplexing (TDM)-based MAC protocol for OFDM-PON systems. A key contribution is the design of a three-layer header frame structure that supports multi-ONU data scheduling with heterogeneous rate profiles. Furthermore, the protocol incorporates a unique channel probing mechanism to dynamically determine the optimal transmission rate for each Optical Network Unit (ONU) during activation. The proposed Optical Line Terminal (OLT)-side MAC protocol has been fully implemented in hardware on a Xilinx VCU118 FPGA platform, featuring a custom-designed ring buffer pool for efficient multi-ONU data management. Experimental results demonstrate robust upstream and downstream data transmission and confirm the system’s ability to achieve flexible net data rate switching on the downlink from 8.1 Gbit/s to 32.8 Gbit/s, contingent on the assigned rate stage. Full article
(This article belongs to the Special Issue Advances in Optical Fibers Sensing and Communication)
18 pages, 60646 KB  
Article
XORSFRO: A Resource-Efficient XOR Self-Feedback Ring Oscillator-Based TRNG Architecture for Securing Distributed Photovoltaic Systems
by Wei Guo, Rui Xia, Jingcheng Wang, Bosong Ding, Chao Xiong, Yuning Zhao and Jinping Li
Electronics 2026, 15(1), 71; https://doi.org/10.3390/electronics15010071 - 23 Dec 2025
Viewed by 162
Abstract
The performance of true random number generators (TRNGs) fundamentally depends on the quality of their entropy sources (ESs). However, many FPGA-friendly designs still rely on a single mechanism and struggle to achieve both high throughput and low resource cost. To address this challenge, we propose the exclusive OR (XOR) Self-Feedback Ring Oscillator (XORSFRO), an XORNOT-style TRNG that integrates two cross-connected XOR gates with a short inverter delay chain and clocked sampling. A unified timing model is developed to describe how arrival-time skew and gate inertial delay lead to cancellation, narrow-pulse generation, and inversion events, thereby enabling effective entropy extraction. Experimental results on Xilinx Spartan-6 and Artix-7 FPGAs demonstrate that XORSFRO maintains stable operation across standard process–voltage–temperature (PVT) variations, while achieving higher throughput and lower hardware overhead compared with recent FPGA-based TRNGs. The generated bitstreams pass both the NIST SP 800-22 and NIST SP 800-90B test suites without post-processing. Full article
(This article belongs to the Special Issue New Trends in Cybersecurity and Hardware Design for IoT)
29 pages, 29485 KB  
Article
FPGA-Based Dual Learning Model for Wheel Speed Sensor Fault Detection in ABS Systems Using HIL Simulations
by Farshideh Kordi, Paul Fortier and Amine Miled
Electronics 2026, 15(1), 58; https://doi.org/10.3390/electronics15010058 - 23 Dec 2025
Viewed by 215
Abstract
The rapid evolution of modern vehicles into intelligent and interconnected systems presents new complexities in both functional safety and cybersecurity. In this context, ensuring the reliability and integrity of critical sensor data, such as wheel speed inputs for anti-lock brake systems (ABS), is essential. Effective detection of wheel speed sensor faults not only improves functional safety but also plays a vital role in maintaining system resilience against potential cyber–physical threats. Although data-driven approaches have gained popularity for system development due to their ability to extract meaningful patterns from historical data, a major limitation is the lack of diverse and representative faulty datasets. This study proposes a novel dual learning model, based on Temporal Convolutional Networks (TCNs), designed to accurately distinguish between normal and faulty wheel speed sensor behavior within a hardware-in-the-loop (HIL) simulation platform implemented on an FPGA. To address dataset limitations, a TruckSim–MATLAB/Simulink co-simulation environment is used to generate realistic datasets under normal operation and eight representative fault scenarios, yielding up to 5000 labeled sequences (balanced between normal and faulty behaviors) at a sampling rate of 60 Hz. Two TCN models are trained independently to learn normal and faulty dynamics, and fault decisions are made by comparing the reconstruction errors (MSE and MAE) of both models, thus avoiding manually tuned thresholds. On a test set of 1000 sequences (500 normal and 500 faulty) from the 5000-sample configuration, the proposed dual TCN framework achieves a detection accuracy of 97.8%, a precision of 96.5%, a recall of 98.2%, and an F1-score of 97.3%, outperforming a single TCN baseline, which achieves 91.4% accuracy and an 88.9% F1-score.
The complete dual TCN architecture is implemented on a Xilinx ZCU102 FPGA evaluation kit (AMD, Santa Clara, CA, USA), while supporting real-time inference in the HIL loop. These results demonstrate that the proposed approach provides accurate, low-latency fault detection suitable for safety-critical ABS applications and contributes to improving both functional safety and cyber-resilience of braking systems. Full article
(This article belongs to the Special Issue Artificial Intelligence and Microsystems)
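The threshold-free decision rule described above — label a sequence by whichever of the two models reconstructs it with lower error — can be sketched independently of the TCN internals. The toy Python version below uses MSE only for brevity (the paper combines MSE and MAE); `normal_model` and `fault_model` are stand-ins for the trained networks, and any callable that returns a reconstructed sequence works.

```python
def classify_sequence(seq, normal_model, fault_model):
    """Dual-model fault decision: reconstruct the sequence with both models
    and label it by whichever fits better. No manually tuned threshold is
    needed, since only the two errors are compared against each other."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    err_normal = mse(seq, normal_model(seq))  # model trained on normal data
    err_fault = mse(seq, fault_model(seq))    # model trained on faulty data
    return "faulty" if err_fault < err_normal else "normal"
```

The design choice worth noting is that each model acts as a density proxy for its own class, so the comparison adapts automatically as both models improve, which is what removes the need for a hand-tuned error threshold.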
28 pages, 2463 KB  
Article
Design of an Energy-Efficient SHA-3 Accelerator on Artix-7 FPGA for Secure Network Applications
by Abdulmunem A. Abdulsamad and Sándor R. Répás
Computers 2026, 15(1), 3; https://doi.org/10.3390/computers15010003 - 21 Dec 2025
Viewed by 248
Abstract
As the demand for secure communication and data integrity in embedded and networked systems continues to grow, there is an increasing need for cryptographic solutions that provide robust security while efficiently using energy and hardware resources. Although software-based implementations of SHA-3 provide design flexibility, they often struggle to meet the performance and power limitations of constrained environments. This study introduces a hardware-accelerated SHA-3 solution tailored for the Xilinx Artix-7 FPGA. The architecture includes a fully pipelined Keccak-f [1600] core and incorporates design strategies such as selective loop unrolling, clock gating, and pipeline balancing to enhance overall efficiency. Developed in VHDL and synthesised using Vivado 2024.2.2, the design achieves a throughput of 1.35 Gbps at 210 MHz, with a power consumption of 0.94 W—yielding an energy efficiency of 1.44 Gbps/W. Validation using NIST SHA-3 vectors confirms its reliable performance, making it a promising candidate for secure embedded systems, including IoT platforms, edge devices, and real-time authentication applications. Full article
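The quoted energy-efficiency figure follows directly from the throughput and power numbers in the abstract, as a quick arithmetic check shows:

```python
# Cross-check of the reported efficiency: throughput divided by power.
throughput_gbps = 1.35  # at 210 MHz, from the abstract
power_w = 0.94
efficiency_gbps_per_w = throughput_gbps / power_w  # ~1.436, reported as 1.44
```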
19 pages, 2020 KB  
Article
A Low-Power SNN Processor Supporting On-Chip Learning for ECG Detection
by Jiada Mao, Youneng Hu, Fan Song, Yitao Li and De Ma
Electronics 2025, 14(24), 4923; https://doi.org/10.3390/electronics14244923 - 15 Dec 2025
Viewed by 295
Abstract
Traditional ECG detection devices are limited in their development by the constraints of power consumption and differences in data sources. Currently, spiking neural networks (SNNs) have quickly attracted widespread attention owing to their low power consumption, enabled by their event-driven nature, and their efficient learning capability inspired by the biological brain. This paper proposes a low-power SNN processor that supports on-chip learning. By implementing an efficient on-chip learning algorithm in hardware, adopting a two-layer dynamic neural network architecture, and utilizing an asynchronous communication interface for data transmission, the processor achieves excellent inference and learning performance while maintaining outstanding power efficiency. The proposed design was implemented and verified on a Xilinx XC7Z045-FFG900 device. On the MIT-BIH database for ECG applications, it achieved an accuracy of 91.4%, with an inference power consumption of 62 mW and 215.53 μJ per classification. The designed processor is well suited for ECG applications that demand low power consumption and environmental adaptability. Full article
(This article belongs to the Section Semiconductor Devices)
15 pages, 632 KB  
Article
Efficient Fine-Grained LuT-Based Optimization of AES MixColumns and InvMixColumns for FPGA Implementation
by Oussama Azzouzi, Mohamed Anane, Mohamed Chahine Ghanem, Yassine Himeur and Hamza Kheddar
Electronics 2025, 14(24), 4912; https://doi.org/10.3390/electronics14244912 - 14 Dec 2025
Viewed by 278
Abstract
This paper presents fine-grained Field-Programmable Gate Array (FPGA) architectures for the Advanced Encryption Standard (AES) MixColumns and InvMixColumns transformations, targeting improved performance and resource utilization. The proposed method reformulates these operations as Boolean functions directly mapped onto FPGA Lookup-Table (LuT) primitives, replacing conventional XOR-based arithmetic with memory-level computation. A custom MATLAB-R2019a-based pre-synthesis optimization algorithm performs algebraic simplification and shared subexpression extraction at the polynomial level of the Galois field GF(2^8), reducing redundant logic and memory. This LuT-level optimization minimizes the delay of the complex InvMixColumns stage and narrows the delay gap between encryption (1.305 ns) and decryption (1.854 ns), resulting in a more balanced and power-efficient AES pipeline. Hardware implementation on a Xilinx Virtex-5 FPGA confirms the efficiency of the design, demonstrating competitive performance compared to state-of-the-art FPGA realizations. Its fast performance and minimal hardware requirements make it well suited for real-time secure communication systems and embedded platforms with limited resources that need reliable bidirectional data processing. Full article
(This article belongs to the Special Issue Cryptography and Computer Security)
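For context, the GF(2^8) arithmetic that the paper's LuT mapping replaces is the standard MixColumns computation from FIPS-197. The Python sketch below shows that reference arithmetic, not the paper's optimized Boolean decomposition: each output byte mixes the column through the circulant matrix with coefficients 1, 2, and 3.

```python
def xtime(b):
    """Multiply by x (i.e. by 2) in GF(2^8), reducing by the AES polynomial
    x^8 + x^4 + x^3 + x + 1 (0x11B) when the result overflows 8 bits."""
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b

def gf_mul(a, b):
    """General GF(2^8) multiply via shift-and-xor; MixColumns only needs
    the small constants 2 and 3, but this works for any b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result

def mix_single_column(col):
    """AES MixColumns on one 4-byte column: multiply by the circulant
    matrix [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2] over GF(2^8)."""
    c0, c1, c2, c3 = col
    return [
        gf_mul(c0, 2) ^ gf_mul(c1, 3) ^ c2 ^ c3,
        c0 ^ gf_mul(c1, 2) ^ gf_mul(c2, 3) ^ c3,
        c0 ^ c1 ^ gf_mul(c2, 2) ^ gf_mul(c3, 3),
        gf_mul(c0, 3) ^ c1 ^ c2 ^ gf_mul(c3, 2),
    ]
```

Since every output bit is a fixed XOR of input bits, each one is a Boolean function of the column bytes, which is what makes the direct LuT-level mapping described in the abstract possible.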
14 pages, 1586 KB  
Article
Efficient Error Correction Coding for Physically Unclonable Functions
by Sreehari K. Narayanan, Ramesh Bhakthavatchalu and Remya Ajai Ajayan Sarala
J. Low Power Electron. Appl. 2025, 15(4), 70; https://doi.org/10.3390/jlpea15040070 - 12 Dec 2025
Viewed by 291
Abstract
Physically unclonable functions (PUFs) generate keys for cryptographic applications, eliminating the need for conventional key storage mechanisms. Since PUF responses are inherently noise-sensitive, their reliability can decrease under varying conditions. Integrating channel coding can enhance response stability and consistency. This work presents an efficient scheme that integrates a delay-based PUF with a Low-Density Parity-Check (LDPC) code. Specifically, a feed-forward PUF is combined with LDPC coding to reliably regenerate the cryptographic key. Our design reproduces the key with minimal error using channel coding. The scheme achieves 96% key-generation reliability, representing a notable improvement over PUF-based key generation without error-correction coding. LDPC decoding with the min-sum algorithm provides better error correction than the bit-flipping algorithm, but it is more computationally intensive. The proposed scheme was designed with minimal hardware resource utilization using Xilinx Vivado 2018.2 and Cadence Genus tools. Full article
16 pages, 434 KB  
Article
Flexible and Area-Efficient Codesign Implementation of AES on FPGA
by Oussama Azzouzi, Mohamed Anane, Mohamed Chahine Ghanem, Yassine Himeur and Dominik Wojtczak
Cryptography 2025, 9(4), 78; https://doi.org/10.3390/cryptography9040078 - 1 Dec 2025
Cited by 1 | Viewed by 531
Abstract
As embedded and IoT systems demand secure and compact encryption, developing cryptographic solutions that are both lightweight and efficient remains a major challenge. Many existing AES implementations either lack flexibility or consume excessive hardware resources. This paper presents an area-efficient and flexible AES-128 implementation based on a hardware/software (HW/SW) co-design, specifically optimized for platforms with limited hardware resources, resulting in reduced power consumption. In this approach, key expansion is performed in software on a lightweight MicroBlaze processor, while encryption and decryption are accelerated by dedicated hardware IP cores optimized at the Look-up Table (LuT) level. The design is implemented on a Xilinx XC5VLX50T Virtex-5 FPGA, synthesized using Xilinx ISE 14.7, and tested at a 100 MHz system clock. It achieves a throughput of 13.3 Gbps and an area efficiency of 5.44 Gbps per slice, requiring only 2303 logic slices and 7 BRAMs. It is particularly well suited for resource-constrained applications such as IoT nodes, secure mobile devices, and smart cards. Since key expansion is executed only once per session, the runtime is dominated by AES core operations, enabling efficient processing of large data volumes. Although the present implementation targets AES-128, the HW/SW partitioning allows straightforward extension to AES-192 and AES-256 by modifying only the software key expansion module, ensuring practical scalability with no hardware changes. Moreover, the architecture offers a balanced trade-off between performance, flexibility, and resource utilization without relying on complex pipelining. Experimental results demonstrate the effectiveness and flexibility of the proposed lightweight design. Full article