Search Results (137)

Search Parameters:
Keywords = single-chip computer

17 pages, 5543 KB  
Article
TASNet-YOLO: An Identification and Classification Model for Surface Defects of Rough Planed Bamboo Strips
by Yitong Zhang, Rui Gao, Min Ji, Wei Zhang, Wenquan Yu and Xiangfeng Wang
Forests 2025, 16(10), 1595; https://doi.org/10.3390/f16101595 - 17 Oct 2025
Viewed by 204
Abstract
After rough planing, defects such as wormholes and small patches of green bark residue and decay are often missed or misclassified. Strip-like defects, including splinters and chipped edges, are easily confused with the natural bamboo grain, and a single elongated defect is frequently fragmented into multiple detection boxes. This study proposes TASNet-YOLO, an improved detector built on YOLO11n. Unlike prior YOLO-based bamboo defect detectors, TASNet-YOLO is a mechanism-guided redesign that jointly targets two persistent failure modes, namely the limited visibility of small, low-contrast defects and the fragmentation of elongated defects, while remaining feasible for real-time production settings. In the backbone, a newly designed TriMAD_Conv module serves as the core unit, improving the detection of wormholes and of small-area defects such as green bark residue and decay. The additive-gated C3k2_AddCGLU module is further integrated at selected C3k2 stages; the combination of additive interaction and CGLU improves channel selection and detail retention, sharpening the distinction between splinters or chipped edges and bamboo grain strips, thereby reducing false positives and improving precision. In the neck, SNI-GSNeck replaces nearest-neighbor upsampling and CBS blocks to improve cross-scale alignment and fusion. Within an acceptable real-time budget, predictions for splinters and chipped edges become more contiguous and better aligned with edges, while wormhole predictions are more circular and less noisy. Experiments on our in-house dataset (8445 bamboo-strip defect images) show that, compared with YOLO11n, the proposed model improves detection accuracy by 5.1%, runs at 106.4 FPS, and reduces computational cost by 0.4 GFLOPs per forward pass.
These properties meet the throughput demand of 2 m/s conveyor lines, and the compact model size and compute footprint make edge deployment straightforward for fast online screening and preliminary quality grading in industrial production.
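As a rough check on the throughput claim, the two figures quoted above (106.4 FPS and a 2 m/s conveyor) fix how far a strip moves between consecutive inferences. The sketch below is plain arithmetic on those numbers; the 200 mm camera field of view is a hypothetical value, not taken from the paper, and pre- and post-processing latency is ignored.

```python
# Arithmetic on two figures quoted in the abstract: conveyor speed and model FPS.
CONVEYOR_SPEED_M_S = 2.0   # conveyor line speed (from the abstract)
MODEL_FPS = 106.4          # reported TASNet-YOLO inference rate (from the abstract)


def travel_per_frame_mm(speed_m_s: float, fps: float) -> float:
    """Distance in mm that a strip moves between two consecutive inferences."""
    return speed_m_s * 1000.0 / fps


def frames_per_view(field_of_view_mm: float, speed_m_s: float, fps: float) -> float:
    """Number of inferences that see a given point while it crosses the camera view."""
    return field_of_view_mm / travel_per_frame_mm(speed_m_s, fps)


if __name__ == "__main__":
    step = travel_per_frame_mm(CONVEYOR_SPEED_M_S, MODEL_FPS)
    print(f"{step:.1f} mm of travel per frame")  # 18.8 mm of travel per frame
    # Hypothetical 200 mm field of view along the conveyor (not from the paper).
    views = frames_per_view(200.0, CONVEYOR_SPEED_M_S, MODEL_FPS)
    print(f"{views:.1f} frames per point")  # 10.6 frames per point
```

A wider field of view or a slower line increases the number of frames that observe each point on the strip.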

12 pages, 4545 KB  
Article
Wearable Flexible Wireless Pressure Sensor Based on Poly(vinyl alcohol)/Carbon Nanotube/MXene Composite for Health Monitoring
by Lei Zhang, Junqi Pang, Xiaoling Lu, Xiaohai Zhang and Xinru Zhang
Micromachines 2025, 16(10), 1132; https://doi.org/10.3390/mi16101132 - 30 Sep 2025
Viewed by 461
Abstract
Accurate pressure monitoring is crucial for both human-body applications and intelligent robotic arms, particularly for whole-body motion monitoring in human–machine interfaces. Conventional wearable electronic devices, however, often suffer from rigid connections, poor conformity, and limited accuracy. In this study, we propose a high-precision wireless flexible sensor that uses a poly(vinyl alcohol)/single-walled carbon nanotube/MXene composite as the sensitive material, combined with a randomly distributed wrinkle structure, to accurately monitor pressure. To validate its performance, the sensor was used to monitor vocal-cord movement, finger bending, and the human pulse. The sensor exhibits a pressure measurement range of approximately 0–130 kPa and a minimum resolution of 20 Pa. Below 1 kPa, the sensor is highly sensitive, enabling the detection of transient pressure changes; within the 1–10 kPa range, the sensitivity decreases to approximately 54.71 kPa⁻¹. The sensor also shows a response time of 12.5 ms at 10 kPa. For wireless signal acquisition, the pressure sensor was integrated with a Bluetooth chip, enabling real-time, high-precision pressure monitoring. A deep-learning model was trained that achieves over 98% accuracy in motion recognition without additional computing equipment. This advance is significant for streamlined human-motion monitoring systems and intelligent components.
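Pressure sensitivity is commonly defined as the slope S = Δ(ΔX/X₀)/ΔP of the relative signal change versus applied pressure, fitted separately over each pressure window because S differs below and above 1 kPa. The abstract does not say which electrical quantity X is used, so the sketch below simply assumes a dimensionless relative change and fits S by ordinary least squares; the sample data are synthetic.

```python
from typing import Sequence


def fit_sensitivity(pressures_kpa: Sequence[float],
                    relative_change: Sequence[float]) -> float:
    """Least-squares slope of relative signal change vs pressure, in kPa^-1.

    Assumes S = d(dX/X0)/dP, where X is the sensor's output quantity
    (an assumption: the abstract does not specify current or resistance).
    """
    n = len(pressures_kpa)
    if n != len(relative_change) or n < 2:
        raise ValueError("need at least two matching samples")
    mean_p = sum(pressures_kpa) / n
    mean_r = sum(relative_change) / n
    cov = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(pressures_kpa, relative_change))
    var = sum((p - mean_p) ** 2 for p in pressures_kpa)
    return cov / var


if __name__ == "__main__":
    # Synthetic, perfectly linear data inside the 1-10 kPa window (illustrative only).
    p = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0]
    r = [54.71 * x + 3.0 for x in p]
    print(round(fit_sensitivity(p, r), 2))  # 54.71
```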

13 pages, 2717 KB  
Article
Learning Dynamics of Solitonic Optical Multichannel Neurons
by Alessandro Bile, Arif Nabizada, Abraham Murad Hamza and Eugenio Fazio
Biomimetics 2025, 10(10), 645; https://doi.org/10.3390/biomimetics10100645 - 24 Sep 2025
Viewed by 363
Abstract
This study provides an in-depth analysis of the learning dynamics of multichannel optical neurons based on spatial solitons generated in lithium niobate crystals. Single-node and multi-node configurations with different topological complexities (3 × 3, 4 × 4, and 5 × 5) were compared, assessing how the number of channels, geometry, and optical parameters affect the speed and efficiency of learning. The simulations indicate that single-node neurons achieve the desired imbalance more rapidly and with lower energy expenditure, whereas multi-node structures require higher intensities and longer timescales, yet yield a greater variety of responses, more accurately reproducing the functional diversity of biological neural tissues. The results highlight how the plasticity of these devices can be entirely modulated through optical parameters, paving the way for fully optical photonic neuromorphic networks in which memory and computation are co-localized, with potential applications in on-chip learning, adaptive routing, and distributed decision-making.

19 pages, 3327 KB  
Article
Design and Research of High-Energy-Efficiency Underwater Acoustic Target Recognition System
by Ao Ma, Wenhao Yang, Pei Tan, Yinghao Lei, Liqin Zhu, Bingyao Peng and Ding Ding
Electronics 2025, 14(19), 3770; https://doi.org/10.3390/electronics14193770 - 24 Sep 2025
Viewed by 500
Abstract
With the rapid growth of underwater resource exploration and other underwater activities, underwater acoustic (UA) target recognition has become crucial for marine resource exploration. However, traditional UA recognition systems suffer from low energy efficiency, poor accuracy, and slow response times. Deep-learning-based UA target recognition systems have attracted widespread attention, but convolutional neural networks (CNNs) consume significant computational resources and energy in their convolution operations, which aggravates energy consumption and complicates edge deployment. This paper presents a high-energy-efficiency UA target recognition system. Built on a DenseNet CNN, the system uses fine-grained pruning for sparsification and performs sparse convolution computations. The UA target recognition CNN was deployed on FPGAs and chips to achieve low-power recognition. On the noise-disturbed ShipsEar dataset, the system reaches a recognition accuracy of 98.73% at 0 dB signal-to-noise ratio (SNR); after 50% fine-grained pruning, the accuracy is 96.11%. The FPGA circuit prototype achieves an accuracy of 95% at 0 dB SNR. This work also implements the circuit design and layout of the UA target recognition chip in a 65 nm CMOS process. DC synthesis results show a power consumption of 90.82 mW and a single-target recognition time of 7.81 ns.
(This article belongs to the Special Issue Digital Intelligence Technology and Applications)
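Fine-grained (unstructured) pruning removes individual weights rather than whole filters or channels. The abstract does not name the pruning criterion; the sketch below assumes the common magnitude criterion, zeroing the smallest |w| until the target sparsity (50% in the abstract) is reached, and shows the zero-skipping dot product that a sparse convolution datapath exploits. It is an illustration, not the authors' pruning code.

```python
from typing import List, Sequence


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by increasing |w|; ties broken by position for determinism.
    order = sorted(range(len(weights)), key=lambda i: (abs(weights[i]), i))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparse_dot(weights: Sequence[float], activations: Sequence[float]) -> float:
    """Dot product that skips zero weights, as a sparse datapath would."""
    return sum(w * a for w, a in zip(weights, activations) if w != 0.0)


if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2, -0.4, 0.07]
    w_sparse = magnitude_prune(w, 0.5)
    print(w_sparse)  # [0.8, 0.0, 0.3, -0.9, 0.0, 0.0, -0.4, 0.0]
    print(sparse_dot(w_sparse, [1.0] * len(w)))
```

In hardware, the saving comes from never fetching or multiplying the zeroed weights, which is why sparsity is paired with a dedicated sparse convolution engine.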

16 pages, 2816 KB  
Article
Hardware-Encrypted System for Storage of Collected Data Based on Reconfigurable Architecture
by Vasil Gatev, Valentin Mollov and Adelina Aleksieva-Petrova
Appl. Syst. Innov. 2025, 8(5), 136; https://doi.org/10.3390/asi8050136 - 22 Sep 2025
Viewed by 434
Abstract
This work focuses on the implementation of a system, built on a chip with a reconfigurable architecture, that acquires data from various types of sensors, encrypts them, and stores them securely. The system encrypts the input data with a single secret cryptographic key that is stored only within the system's own hardware, so once the system has been synthesized, the key cannot be identified by any unauthorized user. Because the key is embedded in the overall system architecture, it cannot be extracted; it is not stored separately in system RAM or any other supported memory, so the collected data remain fully protected. The reported work presents a data acquisition system that measures temperature with high precision, converts it to degrees Celsius, stores the collected data, and transfers them over a serial interface on request. Before storage, the data are encrypted with the AES algorithm using a 256-bit key. Neither the data stored in system memory nor the UART packets sent to the main computer contain the cryptographic key, so the key cannot be retrieved from them. We show the flexibility of such data acquisition systems for sensing different types of signals, including meteorological sensor data and highly confidential or biometric data, with an emphasis on secure storage and transfer.

33 pages, 4561 KB  
Review
Smartphone-Integrated Electrochemical Devices for Contaminant Monitoring in Agriculture and Food: A Review
by Sumeyra Savas and Seyed Mohammad Taghi Gharibzahedi
Biosensors 2025, 15(9), 574; https://doi.org/10.3390/bios15090574 - 2 Sep 2025
Cited by 2 | Viewed by 1999
Abstract
Recent progress in microfluidic technologies has led to the development of compact and highly efficient electrochemical platforms, including lab-on-a-chip (LoC) systems, that integrate multiple testing functions into a single, portable device. Combined with smartphone-based electrochemical devices, these systems enable rapid and accurate on-site detection of food contaminants, including pesticides, heavy metals, pathogens, and chemical additives at farms, markets, and processing facilities, significantly reducing the need for traditional laboratories. Smartphones improve the performance of these platforms by providing computational power, wireless connectivity, and high-resolution imaging, making them ideal for in-field food safety testing with minimal sample and reagent requirements. At the core of these systems are electrochemical biosensors, which convert specific biochemical reactions into electrical signals, ensuring highly sensitive and selective detection. Advanced nanomaterials and integration with Internet of Things (IoT) technologies have further improved performance, delivering cost-effective, user-friendly food monitoring solutions that meet regulatory safety and quality standards. Analytical techniques such as voltammetry, amperometry, and impedance spectroscopy increase accuracy even in complex food samples. Moreover, low-cost engineering, artificial intelligence (AI), and nanotechnology enhance the sensitivity, affordability, and data analysis capabilities of smartphone-integrated electrochemical devices, facilitating their deployment for on-site monitoring of food and agricultural contaminants. This review explains how these technologies address global food safety challenges through rapid, reliable, and portable detection, supporting food quality, sustainability, and public health.

25 pages, 7796 KB  
Article
Time-Dependent Optothermal Performance Analysis of a Flexible RGB-W LED Light Engine
by Md Shafiqul Islam and Mehmet Arik
Micromachines 2025, 16(9), 1007; https://doi.org/10.3390/mi16091007 - 31 Aug 2025
Viewed by 710
Abstract
The wide application of light-emitting diodes (LEDs) in lighting systems has made spectral tunability, achieved with multi-color LED chips, a necessity. Since the lighting requirement depends on the specific application, flexibility in the driving conditions is very important. While many applications use a single color, typically white, some recent applications, especially in agriculture and medical treatment, require multi-spectral lighting systems. These systems are underexplored and pose specific challenges. In this paper, a mixture of red, green, blue, and white (RGB-W) LED chips was used to develop a compact light engine specifically for agricultural applications. A computational study was first performed to understand the optical distribution. Prototype light engines were then developed and experimentally validated for both their thermal and optical characteristics. Each LED string was driven separately at different current levels, making a virtually unlimited range of colors available for numerous applications. Each LED string on the developed light engine was driven at 300 mA, 500 mA, 700 mA, and 900 mA, and the optical and thermal parameters were recorded simultaneously. Computational models and experiments were combined to characterize the optical and thermal behavior together.

22 pages, 6033 KB  
Article
High-Density Neuromorphic Inference Platform (HDNIP) with 10 Million Neurons
by Yue Zuo, Ning Ning, Ke Cao, Rui Zhang, Cheng Fu, Shengxin Wang, Liwei Meng, Ruichen Ma, Guanchao Qiao, Yang Liu and Shaogang Hu
Electronics 2025, 14(17), 3412; https://doi.org/10.3390/electronics14173412 - 27 Aug 2025
Viewed by 690
Abstract
Modern neuromorphic processors exhibit neuron densities that are orders of magnitude lower than those of the biological cortex, hindering the deployment of large-scale spiking neural networks (SNNs) on single chips. To bridge this gap, we propose HDNIP, a 40 nm high-density neuromorphic inference platform with a density-first architecture. By eliminating area-intensive on-chip SRAM and using 1280 compact cores with a time-division multiplexing (TDM) factor of up to 8192, HDNIP integrates 10 million neurons and 80 billion synapses within a 44.39 mm² synthesized area. This achieves an unprecedented neuron density of 225 k neurons/mm², over 100 times greater than prior art. The resulting bandwidth challenges are mitigated by a ReRAM-based near-memory computation strategy combined with input reuse, which reduces off-chip data transfer by approximately 95%. Furthermore, adaptive TDM and dynamic core fusion ensure high hardware utilization across diverse network topologies. Emulator-based validation using large SNNs demonstrates a throughput of 13 GSOP/s at a low power consumption of 146 mW. HDNIP establishes a scalable pathway towards single-chip, low-SWaP (size, weight, and power) neuromorphic systems for complex edge intelligence applications.
(This article belongs to the Special Issue Feature Papers in Artificial Intelligence)
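The headline figures above are mutually consistent, which a few lines of arithmetic confirm. The sketch assumes, as our own reading rather than a statement in the abstract, that each core time-multiplexes one neuron unit by at most the quoted factor of 8192, and that the 146 mW is drawn at the 13 GSOP/s operating point, so their ratio is an average energy per synaptic operation.

```python
# Consistency checks on HDNIP figures quoted in the abstract.
CORES = 1280
MAX_TDM = 8192                  # maximum time-division multiplexing factor
NEURONS = 10_000_000            # "10 million neurons"
SYNAPSES = 80_000_000_000       # "80 billion synapses"
AREA_MM2 = 44.39                # synthesized area
THROUGHPUT_SOP_S = 13e9         # 13 GSOP/s
POWER_W = 0.146                 # 146 mW


def summary() -> dict:
    return {
        # Upper bound if each core multiplexes one neuron unit 8192 times (assumption).
        "max_virtual_neurons": CORES * MAX_TDM,
        "neurons_per_mm2": NEURONS / AREA_MM2,
        "synapses_per_neuron": SYNAPSES / NEURONS,
        # Average energy per synaptic operation, in picojoules (assumes same operating point).
        "pj_per_sop": POWER_W / THROUGHPUT_SOP_S * 1e12,
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value:,.1f}")
```

The results land at about 10.5 million virtual neurons, roughly 225 k neurons/mm², 8000 synapses per neuron, and about 11.2 pJ per synaptic operation.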

23 pages, 11925 KB  
Article
Design and Field Experiment of Synchronous Hole Fertilization Device for Maize Sowing
by Feng Pan, Jincheng Chen, Baiwei Wang, Ziheng Fang, Jinxin Liang, Kangkang He and Chao Ji
Agriculture 2025, 15(13), 1400; https://doi.org/10.3390/agriculture15131400 - 29 Jun 2025
Viewed by 4137
Abstract
The disadvantages of traditional strip fertilization for corn planting in China include low fertilizer utilization rates, unstable operation quality, and environmental pollution. This study therefore designs a synchronous hole fertilization device for corn planting based on real-time intelligent control, aiming to reduce fertilizer application and increase efficiency through precise seed–fertilizer alignment. The device integrates an electrically driven precision seeding unit, a slot-wheel hole fertilization unit, and a multi-sensor coordinated closed-loop control system. An STM32 single-chip microcomputer dynamically analyzes the seed–fertilizer timing signal, and a double closed-loop control strategy, in which the position loop takes priority over the speed loop, corrects the spatial phase difference between seed and fertilizer in real time to ensure precise control of their longitudinal distance (40–70 mm) and lateral distance (50–80 mm). Using the Box–Behnken response surface method, a multi-factor field test was carried out to analyze how the implement's forward speed (A), target fertilizer amount per hole (B), and plant spacing (fertilizer hole interval) (C) affect the seed–fertilizer alignment qualification rate (Y1) and the coefficient of variation of the per-hole fertilizer amount (Y2). The factors ranked in order of influence as A > C > B for Y1 and C > B > A for Y2. The device performed best with the optimal parameter combination of A = 4.2 km/h, B = 4.4 g, and C = 30 cm, with Y1 as high as 94.024 ± 0.694% and Y2 as low as 3.147 ± 0.058%, significantly better than the traditional strip application method.
The device precisely regulates the application rate within 2–6 g per hole by optimizing the structural parameters of the outer groove wheel (arc center distance of 25 mm, cross-sectional area of 201.02 mm², effective filling length of 2.73–8.19 mm), which can meet the differing agronomic needs of ordinary corn, silage corn, and popcorn. Field verification shows that the device significantly improves the spatial distribution of fertilizer, effectively reduces the amount of fertilizer applied, and improves operational stability and reliability across multiple environments. This provides technical support for the regional application of precision agricultural equipment.
(This article belongs to the Section Agricultural Technology)
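The timing targets that the STM32 controller must track follow from simple kinematics: at forward speed v and hole spacing s, one dose is due every s/v seconds, and a longitudinal seed–fertilizer offset d corresponds to a delay of d/v. The sketch evaluates these quantities at the optimal parameters reported above; it models neither the double closed-loop controller nor the groove-wheel mechanics, and the function names are illustrative.

```python
def hole_timing(speed_km_h: float, spacing_m: float, dose_g: float) -> dict:
    """Kinematic targets for one row of synchronous hole fertilization."""
    v = speed_km_h / 3.6                  # forward speed, m/s
    period_s = spacing_m / v              # time between successive holes
    return {
        "speed_m_s": v,
        "hole_period_s": period_s,
        "holes_per_s": 1.0 / period_s,
        "fertilizer_g_per_s": dose_g / period_s,   # mean mass flow the metering wheel must deliver
    }


def offset_delay_s(longitudinal_offset_m: float, speed_km_h: float) -> float:
    """Time delay that places the fertilizer hole a given distance from the seed."""
    return longitudinal_offset_m / (speed_km_h / 3.6)


if __name__ == "__main__":
    t = hole_timing(4.2, 0.30, 4.4)       # optimal A, C, B from the abstract
    for key, value in t.items():
        print(f"{key}: {value:.3f}")
    # The 40-70 mm longitudinal offset expressed as a timing delay at 4.2 km/h.
    print(f"{offset_delay_s(0.040, 4.2):.4f} s to {offset_delay_s(0.070, 4.2):.4f} s")
```

At 4.2 km/h and 30 cm spacing, a dose is due roughly every 0.26 s (about 3.9 holes per second), and the 40–70 mm offset corresponds to a delay of roughly 34–60 ms, which gives a sense of the timing resolution the position loop must hold.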

13 pages, 5874 KB  
Article
Fano Resonance Mach–Zehnder Modulator Based on a Single Arm Coupled with a Photonic Crystal Nanobeam Cavity for Silicon Photonics
by Enze Shi, Guang Chen, Lidan Lu, Yingjie Xu, Jieyu Yang and Lianqing Zhu
Sensors 2025, 25(10), 3240; https://doi.org/10.3390/s25103240 - 21 May 2025
Viewed by 1510
Abstract
Recently, Fano resonance modulators and photonic crystal nanobeam cavities (PCNCs) have attracted increasing attention owing to their superior performance, such as high modulation efficiency and high extinction ratio (ER). In this paper, a silicon Fano resonance Mach–Zehnder modulator (MZM) based on a single arm coupled with a PCNC is theoretically analyzed, designed, and numerically simulated. By optimizing the coupling length, lattice constant, coupling gap, and number of holes in the mirror and taper regions, the ER of our MZM reaches 34 dB. When the MZM is biased at 4.3 V and modulated with a 10 Gbit/s non-return-to-zero on–off keying (NRZ-OOK) signal, the sharpest asymmetric resonant peak and the most pronounced Fano line shape are obtained around a wavelength of 1550.68 nm. Unlike traditional nanobeam cavities with varying hole radii, our PCNC design uses holes with a fixed radius of 90 nm, which makes it suitable for fabrication in a 180 nm passive silicon photonic multi-project wafer (MPW) process. Our compact, on-chip, resonance-based silicon photonic MZM coupled with a PCNC therefore combines superior performance with easy fabrication; it supports photonic integrated circuit design and may in the future benefit various silicon photonic application fields, including photonic computing, photonic convolutional neural networks, and optical communications.
(This article belongs to the Special Issue Advances in Microwave Photonics)
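The asymmetric line shape mentioned above is conventionally described by the Fano formula F(ε) = (q + ε)²/(1 + ε²), where q is the asymmetry parameter and ε = 2(λ − λ₀)/Γ is the detuning normalized to the resonance linewidth Γ (used here as a wavelength-domain approximation). The profile has a zero at ε = −q and a maximum of 1 + q² at ε = 1/q, which is why a Fano dip next to a steep peak can give a large extinction ratio over a small wavelength shift. The sketch uses the 1550.68 nm resonance from the abstract; Γ and q are hypothetical, and the full MZM interferometer is not modeled.

```python
import math
from typing import Tuple


def fano(epsilon: float, q: float) -> float:
    """Normalized Fano profile (q + eps)^2 / (1 + eps^2)."""
    return (q + epsilon) ** 2 / (1.0 + epsilon ** 2)


def detuning(wavelength_nm: float, center_nm: float, linewidth_nm: float) -> float:
    """Reduced detuning eps = 2 (lambda - lambda0) / Gamma."""
    return 2.0 * (wavelength_nm - center_nm) / linewidth_nm


def dip_and_peak_nm(center_nm: float, linewidth_nm: float, q: float) -> Tuple[float, float]:
    """Wavelengths of the zero (eps = -q) and the maximum (eps = 1/q)."""
    return center_nm - q * linewidth_nm / 2.0, center_nm + linewidth_nm / (2.0 * q)


def extinction_ratio_db(p_max: float, p_min: float) -> float:
    return 10.0 * math.log10(p_max / p_min)


def db_to_ratio(er_db: float) -> float:
    return 10.0 ** (er_db / 10.0)


if __name__ == "__main__":
    center, gamma, q = 1550.68, 0.05, 1.5   # gamma (nm) and q are hypothetical
    dip, peak = dip_and_peak_nm(center, gamma, q)
    print(f"dip at {dip:.4f} nm, peak at {peak:.4f} nm")
    print(f"34 dB ER is a power ratio of {db_to_ratio(34):.0f}")  # 2512
```

An ideal Fano profile reaches zero transmission, so in a real device the achievable extinction ratio, such as the 34 dB reported above, is set by loss, coupling, and arm imbalance rather than by the line shape alone.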

22 pages, 3466 KB  
Article
Hardware-Efficient Phase Demodulation for Digital ϕ-OTDR Receivers with Baseband and Analytic Signal Processing
by Shangming Du, Tianwei Chen, Can Guo, Yuxing Duan, Song Wu and Lei Liang
Sensors 2025, 25(10), 3218; https://doi.org/10.3390/s25103218 - 20 May 2025
Viewed by 1338
Abstract
This paper presents hardware-efficient phase demodulation schemes for FPGA-based digital phase-sensitive optical time-domain reflectometry (ϕ-OTDR) receivers. We first derive a signal model for the heterodyne ϕ-OTDR frontend, then propose and analyze three demodulation methods: (1) a baseband reconstruction approach via zero-IF downconversion, (2) an analytic signal generation technique using the Hilbert transform (HT), and (3) a wavelet transform (WT)-based alternative for analytic signal extraction. Algorithm-hardware co-design implementations are detailed for both RFSoC and conventional FPGA platforms, with resource utilization comparisons. Additionally, we introduce an incremental DC-rejected phase unwrapper (IDRPU) algorithm to jointly address phase unwrapping and DC drift removal, minimizing computational overhead while avoiding numerical overflow. Experiments on simulated and real-world ϕ-OTDR data show that the HT method matches the performance of zero-IF demodulation with simpler hardware and lower resource usage, while the WT method offers enhanced robustness against fading noise (3.35–22.47 dB SNR improvement in fading conditions), albeit with slightly ambiguous event boundaries and higher hardware utilization. These findings provide actionable insights for demodulator design in distributed acoustic sensing (DAS) applications and advance the development of single-chip DAS systems.
(This article belongs to the Special Issue Advances in Optical Sensing, Instrumentation and Systems: 2nd Edition)
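Method (1), zero-IF downconversion, can be illustrated in its simplest form: multiply the real IF samples by a local cosine and negative sine, low-pass the products, and take atan2 of the resulting I/Q pair. The sketch below uses a plain block average as a stand-in low-pass filter and a conventional incremental unwrapper that adds ±2π whenever consecutive phases jump by more than π. It is not the paper's IDRPU (there is no DC-drift rejection or fixed-point overflow handling), and the sampling rate and IF are hypothetical.

```python
import math
from typing import List, Sequence


def zero_if_phase(block: Sequence[float], f_if: float, fs: float) -> float:
    """Mix a real IF block to baseband and return its phase in (-pi, pi].

    The block average is a crude low-pass filter: it cancels the 2*f_if
    mixing product exactly when the block spans whole periods of it.
    """
    i_acc = 0.0
    q_acc = 0.0
    for n, x in enumerate(block):
        w = 2.0 * math.pi * f_if * n / fs
        i_acc += x * math.cos(w)     # in-phase product
        q_acc -= x * math.sin(w)     # quadrature product
    return math.atan2(q_acc, i_acc)


def unwrap(phases: Sequence[float]) -> List[float]:
    """Incremental unwrapping: add +/-2*pi whenever a step exceeds pi."""
    if not phases:
        return []
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        if step > math.pi:
            offset -= 2.0 * math.pi
        elif step < -math.pi:
            offset += 2.0 * math.pi
        out.append(cur + offset)
    return out


if __name__ == "__main__":
    fs, f_if, n = 100.0, 10.0, 100          # hypothetical; each block holds 20 periods of 2*f_if
    true_phase = [0.5 * k for k in range(20)]   # slowly growing phase that exceeds pi
    blocks = [[math.cos(2 * math.pi * f_if * i / fs + p) for i in range(n)]
              for p in true_phase]
    recovered = unwrap([zero_if_phase(b, f_if, fs) for b in blocks])
    print(max(abs(a - b) for a, b in zip(recovered, true_phase)))  # well below 1e-9
```

The HT method replaces the explicit mixing with an analytic signal, but the final atan2 and unwrapping stages are the same in spirit.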

14 pages, 1084 KB  
Article
Advancing Mapping Strategies and Circuit Optimization for Signed Operations in Compute-in-Memory Architecture
by Zhenjiao Chen, Binghe Ma, Feng Liang, Qi Cao, Yongqiang Wang, Hang Chen, Bin Lu and Shang Wang
Electronics 2025, 14(7), 1340; https://doi.org/10.3390/electronics14071340 - 27 Mar 2025
Viewed by 481
Abstract
Compute-in-memory (CIM) is a key focus of chip design, and mapping strategies are gaining attention. However, many studies overlook the arrangement of significant bits in weights and the influence of the input order of activation bits, which are key aspects of bit-level mapping strategies. Although the three existing bit-level mapping strategies each have their own application scenarios and, used in combination, can address the majority of cases, a major challenge remains: they do not support signed computations, which limits their applicability in many practical scenarios. This work improves the three existing mapping strategies to support signed weights and activations, optimizing the CIM peripheral circuits with minimal overhead. Experimental results on Yolov3-tiny show a 68.4% improvement in energy efficiency and a 56.2% improvement in speed with an area increase of less than 1%, and on a single layer the input-side parallel (ISP) and input- and output-side parallel (IOSP) mapping strategies boost energy efficiency by 4× and 3.59×, respectively. The proposed work has the potential to significantly advance the field of CIM-based neural network accelerators by enabling efficient signed computations and enhancing flexibility, paving the way for broader adoption in real-time and energy-constrained applications.
(This article belongs to the Section Circuit and Signal Processing)
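Signed support in bit-level CIM schemes usually relies on a standard two's-complement identity: the most significant bit of an n-bit value carries weight −2ⁿ⁻¹, so a signed dot product can still be assembled from unsigned 1-bit × 1-bit column sums, with the sign applied during shift-and-add. The sketch below illustrates only that arithmetic; the assignment of bits to columns and cycles is a hypothetical mapping, not one of the paper's ISP or IOSP strategies or its peripheral circuits.

```python
from typing import List, Sequence


def bits_lsb_first(value: int, nbits: int) -> List[int]:
    """Two's-complement bits of `value`, least significant bit first."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {nbits} signed bits")
    # Python's >> on negative ints is arithmetic, so this yields two's-complement bits.
    return [(value >> i) & 1 for i in range(nbits)]


def bit_weight(i: int, nbits: int) -> int:
    """Place value of bit i; the most significant (sign) bit weighs -2**(nbits-1)."""
    return -(1 << i) if i == nbits - 1 else (1 << i)


def signed_bitwise_dot(acts: Sequence[int], weights: Sequence[int],
                       a_bits: int = 4, w_bits: int = 4) -> int:
    """Signed dot product assembled from unsigned 1-bit x 1-bit column sums."""
    a_planes = [bits_lsb_first(a, a_bits) for a in acts]
    w_planes = [bits_lsb_first(w, w_bits) for w in weights]
    total = 0
    for ia in range(a_bits):        # hypothetical: activation bits applied one cycle at a time
        for iw in range(w_bits):    # hypothetical: each weight bit stored in its own column
            # Column readout: AND of input and stored bits, summed along the column.
            column_sum = sum(ab[ia] & wb[iw] for ab, wb in zip(a_planes, w_planes))
            # Shift-and-add in the digital accumulator; sign bits contribute negatively.
            total += bit_weight(ia, a_bits) * bit_weight(iw, w_bits) * column_sum
    return total


if __name__ == "__main__":
    acts, weights = [3, -2, 7, -8], [-1, 4, -5, 2]
    print(signed_bitwise_dot(acts, weights))          # -62
    print(sum(a * w for a, w in zip(acts, weights)))  # -62
```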

22 pages, 675 KB  
Article
Enhancing CuFP Library with Self-Alignment Technique
by Fahimeh Hajizadeh, Tarek Ould-Bachir and Jean Pierre David
Computers 2025, 14(4), 118; https://doi.org/10.3390/computers14040118 - 24 Mar 2025
Viewed by 610
Abstract
High-Level Synthesis (HLS) tools have transformed FPGA development by streamlining digital design and improving efficiency. Meanwhile, advances in semiconductor technology now allow hundreds of floating-point units to be integrated on a single chip, enabling more resource-intensive computations. CuFP, an HLS library, facilitates the creation of customized floating-point operators with configurable exponent and mantissa bit widths, providing greater flexibility and resource efficiency. This paper integrates the self-alignment technique (SAT) into the CuFP library, extending it to customized addition-related floating-point operations with improved precision and resource utilization. Our findings show that incorporating SAT into CuFP enables efficient FPGA deployment of complex floating-point operators, significantly reducing computational latency and improving resource efficiency. Specifically, for a vector size of 64, CuFPSAF reduces execution cycles by 29.4% compared with CuFP and by 81.5% compared with vendor IP, while maintaining the same DSP utilization as CuFP and reducing it by 59.7% compared with vendor IP. These results highlight the efficiency of SAT in FPGA-based floating-point computations.
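The abstract does not describe how SAT works internally. As a loosely related illustration only, the sketch below emulates the general dataflow of a multi-operand floating-point adder with a single shared alignment step: split each operand into an integer mantissa and an exponent, shift every mantissa once to the largest exponent, add the integers exactly, and normalize once at the end. The configurable `mant_bits` echoes CuFP's configurable widths, but the function name and details are our assumptions, not CuFP's code.

```python
import math
from typing import Sequence


def aligned_sum(values: Sequence[float], mant_bits: int = 53) -> float:
    """Sum floats with one shared alignment step (hypothetical illustration).

    mant_bits sets the integer mantissa width; below 53 bits, mantissas are
    truncated, emulating a narrower custom floating-point format.
    """
    if not values:
        return 0.0
    parts = [math.frexp(v) for v in values]        # v = m * 2**e with 0.5 <= |m| < 1
    e_max = max(e for _, e in parts)
    acc = 0
    for m, e in parts:
        mant = int(math.ldexp(m, mant_bits))       # integer mantissa (truncated toward zero)
        acc += mant >> (e_max - e)                 # single alignment shift (floor for negatives)
    return math.ldexp(acc, e_max - mant_bits)      # single normalization at the end


if __name__ == "__main__":
    print(aligned_sum([1.5, 2.25, -0.75, 8.0]))        # 11.0
    print(aligned_sum([1.0, 2.0 ** -20], mant_bits=8))  # 1.0: the tiny term is shifted out
```

The second call shows the usual trade-off of aligning everything to the largest exponent: small operands lose low-order bits unless extra guard bits are carried.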

16 pages, 5213 KB  
Article
Real-Time Temperature Prediction for Large-Scale Multi-Core Chips Based on Graph Convolutional Neural Networks
by Dengbao Miao, Gaoxiang Duan, Danyan Chen, Yongyin Zhu and Xiaoying Zheng
Electronics 2025, 14(6), 1223; https://doi.org/10.3390/electronics14061223 - 20 Mar 2025
Viewed by 1150
Abstract
Real-time temperature prediction for chips is a critical issue in the semiconductor field. As chip designs evolve towards 3D structures and higher integration, traditional analysis methods such as finite element software and HotSpot face bottlenecks (difficult modeling, costly computation, and slow inference) when handling large-scale, multi-hotspot chip thermal analysis. To address these challenges, this paper proposes a real-time temperature prediction model for multi-core chips based on Graph Convolutional Neural Networks (GCNs), built in three steps. First, the multi-core chip and its temperature and power information are represented as a graph that follows the physical pattern of heat transfer. Second, three strategies (full connection, a truncation radius, and clustering) are proposed for constructing the graph's adjacency matrix, allowing the model to balance computational complexity against accuracy. Third, the GCN model is improved by assigning learnable weights to the adjacency matrix, enhancing its ability to represent the temperature distribution across multiple cores. Experimental results show that, under different node numbers and distributions, the proposed method keeps the mean squared error (MSE) of temperature prediction within 0.5, while a single inference takes no more than 2 ms, at least an order of magnitude faster than traditional methods such as HotSpot, meeting the requirements for real-time prediction.
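The truncation-radius strategy can be sketched as follows: each core is a node, and two cores are linked when their centers lie within a chosen radius, so distant cores no longer interact directly and the adjacency matrix becomes sparse. The propagation step uses the standard symmetric GCN normalization D^-1/2 (A + I) D^-1/2 applied to a per-core feature such as power. Core coordinates and powers are hypothetical; the paper's learnable adjacency weights would replace the constant 1.0 edge entries and are not trained here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def radius_adjacency(coords: Sequence[Point], radius: float) -> List[List[float]]:
    """Truncation-radius graph: link two cores if their centers are within `radius`."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                adj[i][j] = 1.0  # a learnable edge weight could replace this constant
    return adj


def gcn_propagate(adj: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    """One step of D^-1/2 (A + I) D^-1/2 x, without weights or nonlinearity."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [
        sum(a_hat[i][j] / math.sqrt(deg[i] * deg[j]) * features[j] for j in range(n))
        for i in range(n)
    ]


if __name__ == "__main__":
    cores = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]   # hypothetical core centers (mm)
    power = [2.0, 4.0, 7.0]                        # hypothetical per-core power (W)
    adj = radius_adjacency(cores, radius=2.0)
    print(gcn_propagate(adj, power))  # [3.0, 3.0, 7.0]
```

The third core lies outside the radius of the other two, so it keeps its own value; shrinking the radius trades accuracy for a sparser, cheaper graph, which is the balance the abstract describes.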

16 pages, 696 KB  
Article
Optimizing Lattice Basis Reduction Algorithm on ARM V8 Processors
by Ronghui Cao, Julong Wang, Liming Zheng, Jincheng Zhou, Haodong Wang, Tiaojie Xiao and Chunye Gong
Appl. Sci. 2025, 15(4), 2021; https://doi.org/10.3390/app15042021 - 14 Feb 2025
Viewed by 1029
Abstract
The LLL (Lenstra–Lenstra–Lovász) algorithm is an important method for lattice basis reduction with broad applications in computer algebra, cryptography, number theory, and combinatorial optimization. However, current LLL implementations face challenges such as inadequate adaptation to Chinese-built supercomputers and low efficiency. To improve the practical efficiency of the LLL algorithm, this research parallelizes and optimizes the LLL_FP (double-precision floating-point LLL) algorithm from the NTL library on the Tianhe supercomputer, which uses the Phytium ARM V8 processor. The optimization begins by vectorizing the Gram–Schmidt coefficient calculation and the row transformation with the SIMD instruction set of the Phytium chip, which significantly improves computational efficiency. Further assembly-level optimization fully exploits the low-level instructions of the Phytium processor to increase execution speed. For memory access, data prefetching is then employed to load required data before computation, reducing cache misses and accelerating data processing. To further enhance performance, the core loop is unrolled so that more operations are performed per iteration. Experimental results show that, compared with the serial LLL_FP algorithm, the optimized LLL_FP algorithm improves single-core efficiency by 34% to 42%, with an average improvement of 38%. This study provides a more efficient solution for large-scale lattice basis reduction and demonstrates the potential of the LLL algorithm in ARM V8 high-performance computing environments.
(This article belongs to the Special Issue Parallel Computing and Grid Computing: Technologies and Applications)
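For orientation, the sketch below is the textbook floating-point LLL with Lovász parameter δ = 0.75, written for clarity rather than speed: it recomputes the full Gram–Schmidt data after every update. The two kernels named in the abstract are visible in it, namely the Gram–Schmidt coefficients μ and the row transformation b_k ← b_k − q·b_j, which are the parts an optimized implementation vectorizes. It is not NTL's LLL_FP and contains none of the Phytium-specific optimizations.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(b: Sequence[Sequence[int]]) -> Tuple[List[Vector], List[Vector]]:
    """Orthogonal vectors b* and coefficients mu[i][j] = <b_i, b*_j> / <b*_j, b*_j>."""
    n = len(b)
    bstar: List[Vector] = []
    mu = [[0.0] * n for _ in range(n)]
    for i in range(n):
        v = [float(x) for x in b[i]]
        for j in range(i):
            mu[i][j] = dot(b[i], bstar[j]) / dot(bstar[j], bstar[j])
            v = [vi - mu[i][j] * bj for vi, bj in zip(v, bstar[j])]
        bstar.append(v)
    return bstar, mu


def lll(basis: Sequence[Sequence[int]], delta: float = 0.75) -> List[List[int]]:
    """Textbook LLL reduction of an integer basis given as row vectors."""
    b = [list(row) for row in basis]
    n = len(b)
    bstar, mu = gram_schmidt(b)
    k = 1
    while k < n:
        # Size reduction: the row transformation b_k <- b_k - q * b_j.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                bstar, mu = gram_schmidt(b)
        # Lovász condition: advance if satisfied, otherwise swap and step back.
        if dot(bstar[k], bstar[k]) >= (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1]):
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            bstar, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b


if __name__ == "__main__":
    print(lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]))  # [[0, 1, 0], [1, 0, 1], [-1, 0, 2]]
```

Production code updates μ incrementally instead of recomputing it, which is where most of the speed of implementations such as LLL_FP comes from.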
