Search Results (34)

Search Parameters:
Keywords = on-chip training

24 pages, 4131 KB  
Article
A Novel SRAM In-Memory Computing Accelerator Design Approach with R2R-Ladder for AI Sensors and Eddy Current Testing
by Kevin Becker, Martin Zimmerling, Matthias Landwehr, Dirk Koster, Hans-Georg Herrmann and Wolf-Joachim Fischer
AI Sens. 2026, 2(1), 2; https://doi.org/10.3390/aisens2010002 - 15 Jan 2026
Viewed by 134
Abstract
This work presents a 6T-SRAM-based in-memory computing (IMC) system fabricated in a 180 nm CMOS technology. A total of 128 polysilicon R2R-DACs for fully analog wordline control and performance analysis are integrated into the system. The proposed architecture enables analog computation directly inside the memory array and introduces a compact 1-bit per-column comparator scheme for energy-efficient classification without requiring ADCs. A dedicated pull-down-dominant SRAM sizing and an analog activation scheme ensure stable analog discharge behavior and precise control of the computation through time-dependent bitline dynamics. The system integrates a complete sensor front-end, which allows real eddy current data to be classified directly on-chip. Measurements demonstrate a performance density of 3.2 TOPS/mm², a simulated energy efficiency of 45 TOPS/W at 50 MHz, and a measured efficiency of 3.4 TOPS/W at 5 MHz on silicon. The implemented online training mechanism further improves classification accuracy by adapting the SRAM cell states during operation. These results highlight the suitability of the presented IMC architecture for compact, low-power edge intelligence and sensor-driven machine learning applications. Full article
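As a purely behavioral illustration of the readout scheme this abstract describes (an R-2R ladder DAC driving analog wordline levels, bitline discharge acting as the multiply-accumulate, and a 1-bit comparator in place of an ADC), the following Python sketch uses assumed device values and a toy discharge constant; it is not derived from the article's circuit.

```python
import numpy as np

def r2r_dac(code: int, n_bits: int = 6, v_ref: float = 1.8) -> float:
    """Ideal R-2R ladder output: V_out = V_ref * code / 2**n_bits (assumed resolution)."""
    assert 0 <= code < 2 ** n_bits
    return v_ref * code / (2 ** n_bits)

def bitline_voltage(weights, wl_voltages, v_precharge=1.8, k=0.01):
    """Toy model: discharge grows with the weighted sum of wordline voltages."""
    return max(0.0, v_precharge - k * float(np.dot(weights, wl_voltages)))

def classify(weights, activation_codes, v_threshold=0.9):
    """1-bit per-column comparator: threshold the final bitline voltage."""
    wl = np.array([r2r_dac(c) for c in activation_codes])
    return int(bitline_voltage(np.asarray(weights, dtype=float), wl) < v_threshold)

rng = np.random.default_rng(0)
codes = rng.integers(0, 64, size=128)      # one activation code per wordline (128 rows)
weights = rng.integers(0, 2, size=128)     # stored SRAM cell states (0/1)
print("comparator decision:", classify(weights, codes))
```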

14 pages, 61684 KB  
Article
A CMOS-Compatible Silicon Nanowire Array Natural Light Photodetector with On-Chip Temperature Compensation Using a PSO-BP Neural Network
by Mingbin Liu, Xin Chen, Jiaye Zeng, Jintao Yi, Wenhe Liu, Xinjian Qu, Junsong Zhang, Haiyan Liu, Chaoran Liu, Xun Yang and Kai Huang
Micromachines 2026, 17(1), 23; https://doi.org/10.3390/mi17010023 - 25 Dec 2025
Viewed by 280
Abstract
Silicon nanowire (SiNW) photodetectors exhibit high sensitivity for natural light detection but suffer from significant performance degradation due to thermal interference. To overcome this limitation, this paper presents a high-performance, CMOS-compatible SiNW array natural light photodetector with monolithic integration of an on-chip temperature sensor and an embedded intelligent compensation system. The device, fabricated via microfabrication techniques, features a dual-array architecture that enables simultaneous acquisition of optical and thermal signals, thereby simplifying peripheral circuitry. To achieve high-precision decoupling of the optical and thermal signals, we propose a hybrid temperature compensation algorithm that combines Particle Swarm Optimization (PSO) with a Back Propagation (BP) neural network. The PSO algorithm optimizes the initial weights and thresholds of the BP network, effectively preventing the network from getting trapped in local minima and accelerating the training process. Experimental results demonstrate that the proposed PSO-BP model achieves superior compensation accuracy and a significantly faster convergence rate compared to the traditional BP network. Furthermore, the optimized model was successfully implemented on an STM32 microcontroller. This embedded implementation validates the feasibility of real-time, high-accuracy temperature compensation, significantly enhancing the stability and reliability of the photodetector across a wide temperature range. This work provides a viable strategy for developing highly stable and integrated optical sensing systems. Full article
(This article belongs to the Special Issue Emerging Trends in Optoelectronic Device Engineering, 2nd Edition)
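To make the PSO-BP combination concrete, here is a minimal numerical sketch of the idea described above: a particle swarm searches for good initial weights and biases of a small back-propagation network, which is then refined by gradient descent. The synthetic drift model, network size, and all hyper-parameters are assumptions for illustration, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: inputs = (raw photoresponse, temperature),
# target = temperature-compensated photoresponse (assumed linear drift model).
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = (X[:, 0] - 0.3 * (X[:, 1] - 0.5)).reshape(-1, 1)

n_in, n_hid, n_out = 2, 8, 1
n_params = n_in * n_hid + n_hid + n_hid * n_out + n_out

def unpack(theta):
    i = 0
    W1 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    return W1, b1, W2, theta[i:]

def forward(theta, X):
    W1, b1, W2, b2 = unpack(theta)
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mse(theta):
    return float(np.mean((forward(theta, X) - y) ** 2))

# --- PSO over the initial weight vector ----------------------------------
n_particles, n_iter = 30, 100
w, c1, c2 = 0.7, 1.5, 1.5                       # inertia, cognitive, social terms
pos = rng.uniform(-1.0, 1.0, size=(n_particles, n_params))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([mse(p) for p in pos])
gbest = pbest[np.argmin(pbest_fit)].copy()
for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    fit = np.array([mse(p) for p in pos])
    better = fit < pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[np.argmin(pbest_fit)].copy()

# --- BP refinement starting from the PSO-found initialization ------------
def backprop_step(theta, lr=0.1):
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)
    err = (h @ W2 + b2) - y                     # dMSE/d(output), constant folded into lr
    n = X.shape[0]
    dW2, db2 = h.T @ err / n, err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)
    dW1, db1 = X.T @ dh / n, dh.mean(axis=0)
    return theta - lr * np.concatenate([dW1.ravel(), db1, dW2.ravel(), db2])

theta = gbest.copy()
for _ in range(300):
    theta = backprop_step(theta)
print(f"MSE after PSO init: {mse(gbest):.5f}, after BP refinement: {mse(theta):.5f}")
```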

28 pages, 2888 KB  
Article
Decoding Coherent Patterns from Arrayed Waveguides for Free-Space Optical Angle-of-Arrival Estimation
by Jinwen Zhang, Haitao Zhang and Zhuoyi Yang
Sensors 2025, 25(23), 7231; https://doi.org/10.3390/s25237231 - 27 Nov 2025
Viewed by 531
Abstract
This paper presents a novel free-space optical Angle-of-Arrival (AOA) estimation method based on arrayed waveguide coherent mode decoding, aiming to surpass the inherent limitations of traditional AOA detection technologies, which face significant challenges in achieving miniaturization, low complexity, and high reliability. The method utilizes the AOA-related phase differences generated by the propagation and interference of incident light in an arrayed input waveguide, forming multi-beam interference fringes at the output end of the slab waveguide. This pattern is then sampled by an arrayed output waveguide to produce an intensity sequence, which is then fed into a trained CNN–Attention Regressor for AOA estimation. This study innovatively applies the method to decoding the spatial angular information of optical signals. Simulation results demonstrate the exceptional performance of our approach, achieving a Mean Absolute Error (MAE) of 0.0142° and a Root Mean Square Error (RMSE) of 0.0193° over a 40° field of view. This precision is significantly superior to conventional peak–linear calibration methods and other common neural network architectures, and exhibits remarkable robustness against simulated phase noise and manufacturing tolerances. This research demonstrates the powerful synergy between integrated photonics and deep learning, paving the way for a new class of highly integrated, robust, and high-performance on-chip optical sensors. Full article
(This article belongs to the Special Issue Advances in Optical Sensing, Instrumentation and Systems: 2nd Edition)
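The core signal model in this abstract (an incident angle imprinting a phase ramp across the input waveguides, whose interference pattern is sampled by the output waveguides) can be illustrated with a simple array-factor calculation. The geometry, wavelength, and sampling below are assumed values, the slab-waveguide propagation is reduced to an idealized coherent sum, and the regressor itself is omitted.

```python
import numpy as np

wavelength = 1.55e-6          # m, assumed telecom wavelength
pitch = 3.0e-6                # m, assumed input-waveguide spacing
n_inputs, n_outputs = 16, 64  # assumed array sizes

def intensity_sequence(theta_deg: float) -> np.ndarray:
    """Sampled multi-beam interference pattern for a plane wave at angle theta."""
    k = 2 * np.pi / wavelength
    theta = np.deg2rad(theta_deg)
    # Phase of each input waveguide induced by the incidence angle
    input_phases = k * pitch * np.arange(n_inputs) * np.sin(theta)
    # Normalized output coordinate at which the pattern is sampled
    u = np.linspace(-0.5, 0.5, n_outputs)
    # Coherent sum over all inputs at each output sample (idealized array factor)
    field = np.exp(1j * (input_phases[None, :] + 2 * np.pi * u[:, None] * np.arange(n_inputs)))
    intensity = np.abs(field.sum(axis=1)) ** 2
    return intensity / intensity.max()

for angle in (-10.0, 0.0, 10.0):
    seq = intensity_sequence(angle)
    print(f"theta = {angle:+.1f} deg -> fringe peak at output index {int(seq.argmax())}")
```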

5739 KB  
Proceeding Paper
Smart Cattle Behavior Sensing with Embedded Vision and TinyML at the Edge
by Jazzie R. Jao, Edgar A. Vallar and Ibrahim Hameed
Eng. Proc. 2025, 118(1), 81; https://doi.org/10.3390/ECSA-12-26519 - 7 Nov 2025
Viewed by 197
Abstract
Accurate real-time monitoring of cattle behavior is essential for enabling data-driven decision-making in precision livestock farming. However, existing monitoring solutions often rely on cloud-based processing or high-power hardware, which are impractical for deployment in remote or low-infrastructure agricultural environments. There is a critical need for low-cost, energy-efficient, and autonomous sensing systems capable of operating independently at the edge. This paper presents a compact, sensor-integrated system for real-time cattle behavior monitoring using an embedded vision sensor and a TinyML-based inference pipeline. The system is designed for low-power deployment in field conditions and integrates the OV2640 image sensor with the Sipeed Maixduino platform, which features the Kendryte K210 RISC-V processor and an on-chip neural network accelerator (KPU). The platform supports fully on-device classification of cattle postures using a quantized convolutional neural network trained on the publicly available cattle behavior dataset, covering standing and lying behavioral states. Sensor data is captured via the onboard camera and preprocessed in real time to meet model input specifications. The trained model is quantized and converted into a K210-compatible .kmodel using the NNCase toolchain, and deployed using MaixPy firmware. System performance was evaluated based on inference latency, classification accuracy, memory usage, and energy efficiency. Results demonstrate that the proposed TinyML-enabled system can accurately classify cattle behaviors in real time while operating within the constraints of a low-power, embedded platform, making it a viable solution for smart livestock monitoring in remote or under-resourced environments. Full article
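The quantization step mentioned here can be illustrated with a generic 8-bit affine quantizer; this is a textbook post-training scheme, not the NNCase/K210 toolchain itself, and the kernel shape is an assumption.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Affine quantization: w ~= scale * (q - zero_point), q in int8."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(3, 3, 16, 32)).astype(np.float32)   # an example conv kernel
q, s, zp = quantize_int8(w)
err = np.abs(dequantize(q, s, zp) - w).max()
print(f"max abs quantization error: {err:.5f}, scale = {s:.6f}, zero_point = {zp}")
```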

15 pages, 2516 KB  
Article
Energy-Efficient Training of Memristor Crossbar-Based Multi-Layer Neural Networks
by Raqibul Hasan, Md Shahanur Alam and Tarek M. Taha
Chips 2025, 4(3), 38; https://doi.org/10.3390/chips4030038 - 5 Sep 2025
Viewed by 2940
Abstract
Memristor crossbar-based neural network systems offer high throughput with low energy consumption. A key advantage of on-chip training in these systems is their ability to mitigate the effects of device variability and faults. This paper presents an efficient on-chip training circuit for memristor crossbar-based multi-layer neural networks. We propose a novel method for storing the product of two analog signals directly in a memristor device, eliminating the need for ADC and DAC converters. Experimental results show that the proposed system is approximately twice as energy efficient and 1.5 times faster than existing memristor-based systems for training multi-layer neural networks. Full article
(This article belongs to the Special Issue IC Design Techniques for Power/Energy-Constrained Applications)
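A behavioral sketch of the training idea summarized above: the crossbar's forward pass is a conductance-weighted sum, and each update writes a change proportional to the product of two analog quantities (input and error) into the corresponding memristor, i.e. an outer-product delta-rule step. The conductance limits, normalization, and toy task are assumptions; the article's circuit-level mechanism for storing that product is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(2)
G_MIN, G_MAX = 1e-6, 1e-4                     # assumed conductance range (siemens)

# Single crossbar layer mapping 4 input voltages to 2 output currents.
G = rng.uniform(G_MIN, G_MAX, size=(2, 4))    # device conductances
X = rng.uniform(0.0, 1.0, size=(64, 4))       # normalized input voltages
W_target = np.array([[0.5, 0.0], [0.0, 0.5], [0.3, 0.2], [0.0, 0.4]])
T = X @ W_target                              # desired (normalized) outputs

def mse(G):
    return float(np.mean(((G @ X.T).T / G_MAX - T) ** 2))

mse_before = mse(G)
eta = 1e-6                                    # assumed write step size
for _ in range(50):
    for x, t in zip(X, T):
        err = t - (G @ x) / G_MAX             # output error in weight units
        # Analog outer-product write: delta-G proportional to err_i * x_j
        G = np.clip(G + eta * np.outer(err, x), G_MIN, G_MAX)

print(f"MSE before: {mse_before:.4f}, after on-array training: {mse(G):.4f}")
```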

12 pages, 7716 KB  
Article
Hardware Accelerator Design by Using RT-Level Power Optimization Techniques on FPGA for Future AI Mobile Applications
by Achyuth Gundrapally, Yatrik Ashish Shah, Sai Manohar Vemuri and Kyuwon (Ken) Choi
Electronics 2025, 14(16), 3317; https://doi.org/10.3390/electronics14163317 - 20 Aug 2025
Cited by 2 | Viewed by 1563
Abstract
In resource-constrained edge environments such as mobile devices, IoT systems, and electric vehicles, energy-efficient Convolutional Neural Network (CNN) accelerators on mobile Field Programmable Gate Arrays (FPGAs) are gaining significant attention for real-time object detection tasks. This paper presents a low-power implementation of the Tiny YOLOv4 object detection model on the Xilinx ZCU104 FPGA platform using Register Transfer Level (RTL) optimization techniques. We propose three RTL techniques: (i) Local Explicit Clock Enable (LECE), (ii) operand isolation, and (iii) Enhanced Clock Gating (ECG). A novel low-power design of the Multiply-Accumulate (MAC) operation, one of the main components of the AI algorithm, is proposed to eliminate redundant signal switching activity. The Tiny YOLOv4 model, trained on the COCO dataset, was quantized and compiled using the Tensil tool-chain for fixed-point inference deployment. Post-implementation evaluation using Vivado 2022.2 demonstrates an approximately 29.4% reduction in total on-chip power. Our design supports real-time detection throughput while maintaining high accuracy, making it ideal for deployment in battery-constrained environments such as drones, surveillance systems, and autonomous vehicles. These results highlight the effectiveness of RTL-level power optimization for scalable and sustainable edge AI deployment. Full article
(This article belongs to the Special Issue Hardware Acceleration for Machine Learning)
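The power-saving intuition behind operand isolation and clock gating can be shown with a simple functional model in Python (not RTL): when an input operand is zero, the multiplier's inputs are held, so that cycle contributes no new switching. Cycle counts stand in for dynamic power, and the activation sparsity level is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
activations = rng.choice([0.0, 1.0, 2.0, 3.0], size=10_000, p=[0.6, 0.2, 0.1, 0.1])
weights = rng.normal(size=10_000)

def mac_baseline(a, w):
    acc, active_cycles = 0.0, 0
    for x, y in zip(a, w):
        acc += x * y                 # multiplier toggles every cycle
        active_cycles += 1
    return acc, active_cycles

def mac_operand_isolated(a, w):
    acc, active_cycles = 0.0, 0
    for x, y in zip(a, w):
        if x != 0.0:                 # enable signal: isolate operands when the input is zero
            acc += x * y
            active_cycles += 1       # only these cycles toggle the multiplier
    return acc, active_cycles

acc0, c0 = mac_baseline(activations, weights)
acc1, c1 = mac_operand_isolated(activations, weights)
assert np.isclose(acc0, acc1)        # same result, fewer active cycles
print(f"active multiplier cycles: {c0} -> {c1} ({100 * (1 - c1 / c0):.1f}% fewer)")
```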

21 pages, 3746 KB  
Article
DCP: Learning Accelerator Dataflow for Neural Networks via Propagation
by Peng Xu, Wenqi Shao and Ping Luo
Electronics 2025, 14(15), 3085; https://doi.org/10.3390/electronics14153085 - 1 Aug 2025
Cited by 1 | Viewed by 1378
Abstract
Deep neural network (DNN) hardware (HW) accelerators have achieved great success in improving DNNs' performance and efficiency. One key reason is the dataflow used in executing a DNN layer, including on-chip data partitioning, computation parallelism, and scheduling policy, which has a large impact on latency and energy consumption. Unlike prior works that required considerable effort from HW engineers to design suitable dataflows for different DNNs, this work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort. It has several attractive benefits that prior studies lack, including the following: (i) We translate the HW dataflow configuration into a code representation in a unified dataflow coding space, which can be optimized by back-propagating gradients given a DNN layer or network. (ii) DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives, e.g., latency and energy. (iii) It can be easily generalized to unseen HW configurations in a zero-shot or few-shot learning manner, for example, without using additional training data. Extensive experiments on several representative models such as MobileNet, ResNet, and ViT show that DCP outperforms its counterparts in various settings. Full article
(This article belongs to the Special Issue Applied Machine Learning in Data Science)
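The mechanism in points (i)-(ii) can be sketched as follows: a neural predictor maps a continuous "dataflow code" vector to a predicted cost, and the gradient of that prediction with respect to the code is used to update the code itself. The predictor below is a randomly weighted stand-in rather than a trained DCP model, and the code dimensionality is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
d_code, d_hidden = 8, 32                     # assumed code length and predictor width

# Stand-in for a trained cost predictor (dataflow code -> predicted latency/energy)
W1 = rng.normal(scale=0.5, size=(d_code, d_hidden))
b1 = rng.normal(scale=0.1, size=d_hidden)
W2 = rng.normal(scale=0.5, size=(d_hidden, 1))
b2 = rng.normal(scale=0.1, size=1)

def predict(code):
    return float(np.tanh(code @ W1 + b1) @ W2 + b2)

def grad_wrt_code(code):
    """Analytic gradient of the predicted cost w.r.t. the input code vector."""
    h = np.tanh(code @ W1 + b1)
    return W1 @ ((1.0 - h ** 2) * W2[:, 0])

code = rng.uniform(-1.0, 1.0, size=d_code)   # initial dataflow code
start_cost = predict(code)
for _ in range(200):
    code -= 0.1 * grad_wrt_code(code)        # back-propagate to the code, not the weights
    code = np.clip(code, -1.0, 1.0)          # keep the code inside its valid range
print(f"predicted cost: {start_cost:.3f} -> {predict(code):.3f}")
```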

24 pages, 6840 KB  
Article
A Tree Crown Segmentation Approach for Unmanned Aerial Vehicle Remote Sensing Images on Field Programmable Gate Array (FPGA) Neural Network Accelerator
by Jiayi Ma, Lingxiao Yan, Baozhe Chen and Li Zhang
Sensors 2025, 25(9), 2729; https://doi.org/10.3390/s25092729 - 25 Apr 2025
Cited by 1 | Viewed by 1355
Abstract
Tree crown detection in high-resolution UAV forest remote sensing images using computer technology has been widely studied over the last ten years. In forest resource inventory management based on remote sensing data, crown detection is the most important and essential step. Deep learning has achieved good results in tree crown segmentation and species classification, but because it relies on high-performance computing platforms, edge computing and real-time processing cannot be realized. In this paper, UAV images of coniferous Pinus tabuliformis and broad-leaved Salix matsudana collected at Jingyue Ecological Forest Farm in Changping District, Beijing, are used as the dataset, and a lightweight neural network, U-Net-Light, based on U-Net and VGG16 is designed and trained. At the same time, the IP core and SoC architecture of the neural network accelerator are designed and implemented on the Xilinx ZYNQ 7100 SoC platform. The results show that U-Net-Light uses only 1.56 MB of parameters to classify and segment the crown images of the two tree species, reaching an accuracy of 85%. The designed SoC architecture and accelerator IP core achieve a 31× speedup over the ZYNQ hard core and a 1.3× speedup over a high-end CPU (Intel Core™ i9-10900K). The hardware resource overhead is less than 20% of the deployment platform, and the total on-chip power consumption is 2.127 W. The shorter prediction time and higher energy efficiency demonstrate the effectiveness and rationality of the architecture design and IP development. This work departs from conventional canopy segmentation methods that rely heavily on ground-based high-performance computing. Instead, it proposes a lightweight neural network model deployed on an FPGA for real-time inference on unmanned aerial vehicles (UAVs), thereby significantly lowering both latency and system resource consumption. The proposed approach provides meaningful references for the automation and intelligent development of forest resource monitoring and precision agriculture. Full article
(This article belongs to the Section Sensor Networks)

19 pages, 4018 KB  
Article
Research on Weather Recognition Based on a Field Programmable Gate Array and Lightweight Convolutional Neural Network
by Liying Chen, Fan Luo, Fei Wang and Liangfu Lv
Electronics 2025, 14(9), 1740; https://doi.org/10.3390/electronics14091740 - 24 Apr 2025
Cited by 2 | Viewed by 851
Abstract
With the rapid development of deep learning, weather recognition has become a research hotspot in computer vision, and field programmable gate array (FPGA) acceleration of deep learning algorithms has received increasing attention. Building on this, we propose a method for implementing deep neural networks for weather recognition on a small-scale FPGA. First, we train a depthwise separable convolutional neural network model for weather recognition to reduce the number of parameters and speed up the hardware implementation. However, large-scale computation also brings excessive power consumption, which greatly limits the deployment of high-performance network models on mobile platforms. We therefore use a lightweight convolutional neural network approach to reduce the scale of computation; the main idea of the lightweighting is to store the weights with fewer bits. In addition, a hardware implementation of this model is proposed to speed up the operation and save on-chip resources. Finally, the network model is deployed on a Xilinx ZYNQ xc7z020 FPGA to verify the accuracy of the recognition results, and the accelerated solution achieves excellent performance with a speed of 108 FPS at a power consumption of 3.256 W. The purpose of this design is to recognize the weather accurately and to deliver current environmental weather information to UAV (unmanned aerial vehicle) pilots and other staff who need to consider the weather, so that they can grasp the current conditions at any time, obtain timely information when the weather changes, make correct judgments, and avoid weather-related equipment damage and mission failure, allowing UAV flight missions to be completed safely. Full article
(This article belongs to the Section Artificial Intelligence)

20 pages, 2239 KB  
Article
A Novel Lightweight Deep Learning Approach for Drivers’ Facial Expression Detection
by Jia Uddin
Designs 2025, 9(2), 45; https://doi.org/10.3390/designs9020045 - 3 Apr 2025
Cited by 3 | Viewed by 2245
Abstract
Drivers’ facial expression recognition systems play a pivotal role in Advanced Driver Assistance Systems (ADASs) by monitoring emotional states and detecting fatigue or distractions in real time. However, deploying such systems in resource-constrained environments like vehicles requires lightweight architectures to ensure real-time performance, efficient model updates, and compatibility with embedded hardware. Smaller models significantly reduce communication overhead in distributed training. For autonomous vehicles, lightweight architectures also minimize the data transfer required for over-the-air updates. Moreover, they are crucial for their deployability on hardware with limited on-chip memory. In this work, we propose a novel Dual Attention Lightweight Deep Learning (DALDL) approach for drivers’ facial expression recognition. The proposed approach combines the SqueezeNext architecture with a Dual Attention Convolution (DAC) block. Our DAC block integrates Hybrid Channel Attention (HCA) and Coordinate Space Attention (CSA) to enhance feature extraction efficiency while maintaining minimal parameter overhead. To evaluate the effectiveness of our architecture, we compare it against two baselines: (a) Vanilla SqueezeNet and (b) AlexNet. Compared with SqueezeNet, DALDL improves accuracy by 7.96% and F1-score by 7.95% on the KMU-FED dataset. On the CK+ dataset, it achieves 8.51% higher accuracy and 8.40% higher F1-score. Against AlexNet, DALDL improves accuracy by 4.34% and F1-score by 4.17% on KMU-FED. Lastly, on CK+, it provides a 5.36% boost in accuracy and a 7.24% increase in F1-score. These results demonstrate that DALDL is a promising solution for efficient and accurate emotion recognition in real-world automotive applications. Full article
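As a rough reference point for the kind of lightweight attention the DAC block combines, the sketch below implements a generic squeeze-and-excitation-style channel attention over a (C, H, W) feature map. It is not the paper's HCA or CSA definition, and the reduction ratio and random weights are placeholders for trained parameters.

```python
import numpy as np

def channel_attention(x: np.ndarray, reduction: int = 4, seed: int = 0) -> np.ndarray:
    """x: feature map of shape (C, H, W). Returns channel-reweighted features."""
    rng = np.random.default_rng(seed)
    C = x.shape[0]
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excite: tiny bottleneck MLP (random weights stand in for trained ones)
    W1 = rng.normal(scale=0.1, size=(C, C // reduction))
    W2 = rng.normal(scale=0.1, size=(C // reduction, C))
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ W1, 0.0) @ W2)))   # sigmoid channel gates
    # Reweight each channel by its attention score
    return x * s[:, None, None]

feat = np.random.default_rng(1).normal(size=(16, 8, 8))
out = channel_attention(feat)
print("input", feat.shape, "-> output", out.shape)
```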

16 pages, 5947 KB  
Data Descriptor
Stimulated Microcontroller Dataset for New IoT Device Identification Schemes through On-Chip Sensor Monitoring
by Alberto Ramos, Honorio Martín, Carmen Cámara and Pedro Peris-Lopez
Data 2024, 9(5), 62; https://doi.org/10.3390/data9050062 - 28 Apr 2024
Cited by 3 | Viewed by 2466
Abstract
Legitimate identification of devices is crucial to ensure the security of present and future IoT ecosystems. In this regard, AI-based systems that exploit intrinsic hardware variations have gained notable relevance. Within this context, on-chip sensors included for monitoring purposes in a wide range of SoCs remain almost unexplored, despite their potential as a valuable source of both information and variability. In this work, we introduce and release a dataset comprising data collected from the on-chip temperature and voltage sensors of 20 microcontroller-based boards from the STM32L family. These boards were stimulated with five different algorithms, as workloads to elicit diverse responses. The dataset consists of five acquisitions (1.3 billion readouts) that are spaced over time and were obtained under different configurations using an automated platform. The raw dataset is publicly available, along with metadata and scripts developed to generate pre-processed T–V sequence sets. Finally, a proof of concept consisting of training a simple model is presented to demonstrate the feasibility of the identification system based on these data. Full article
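In the spirit of the proof of concept mentioned at the end of this abstract, the sketch below fits a trivially simple model (per-board feature centroids with nearest-centroid matching) on synthetic temperature-voltage sequences that carry small per-board offsets. The data, features, and offset magnitudes are invented placeholders; the released dataset and its baseline model differ.

```python
import numpy as np

rng = np.random.default_rng(5)
n_boards, seq_len = 20, 256                     # 20 boards, as in the dataset description

# Synthetic stand-in: each board gets a small fixed T/V bias (its "fingerprint").
bias = rng.normal(scale=0.2, size=(n_boards, 2))

def read_sequence(board: int) -> np.ndarray:
    base = np.array([45.0, 3.30])               # assumed nominal temperature (C) and voltage (V)
    return base + bias[board] + rng.normal(scale=0.05, size=(seq_len, 2))

def features(seq: np.ndarray) -> np.ndarray:
    return np.concatenate([seq.mean(axis=0), seq.std(axis=0)])

# "Train": average feature vector per board over a few acquisitions.
centroids = np.stack([
    np.mean([features(read_sequence(b)) for _ in range(5)], axis=0)
    for b in range(n_boards)
])

# "Test": identify fresh sequences by the nearest centroid.
correct = sum(
    int(np.argmin(np.linalg.norm(centroids - features(read_sequence(b)), axis=1)) == b)
    for b in range(n_boards)
)
print(f"identified {correct}/{n_boards} boards correctly")
```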

22 pages, 4830 KB  
Article
Hybrid Precision Floating-Point (HPFP) Selection to Optimize Hardware-Constrained Accelerator for CNN Training
by Muhammad Junaid, Hayotjon Aliev, SangBo Park, HyungWon Kim, Hoyoung Yoo and Sanghoon Sim
Sensors 2024, 24(7), 2145; https://doi.org/10.3390/s24072145 - 27 Mar 2024
Cited by 5 | Viewed by 3307
Abstract
The rapid advancement in AI requires efficient accelerators for training on edge devices, which often face challenges related to the high hardware costs of floating-point arithmetic operations. To tackle these problems, efficient floating-point formats inspired by block floating-point (BFP), such as Microsoft Floating Point (MSFP) and FlexBlock (FB), are emerging. However, they have a limited dynamic range and precision for the smaller-magnitude values within a block due to the shared exponent. This limits BFP's ability to train deep neural networks (DNNs) with diverse datasets. This paper introduces hybrid precision floating-point (HPFP) selection algorithms, designed to systematically reduce precision and implement hybrid precision strategies, thereby balancing layer-wise arithmetic operations and data path precision to address the shortcomings of traditional floating-point formats. Reducing the data bit width with HPFP allows more read/write operations from memory per cycle, thereby decreasing off-chip data access and the size of on-chip memories. Unlike traditional reduced-precision formats that use BFP for calculating partial sums and accumulate those partial sums in 32-bit floating point (FP32), HPFP leads to significant hardware savings by performing all multiply and accumulate operations in a reduced floating-point format. For evaluation, two training accelerators for the YOLOv2-Tiny model were developed, employing distinct mixed precision strategies, and their performance was benchmarked against an accelerator utilizing a conventional 16-bit brain floating-point format (Bfloat16). The HPFP selection, employing 10 bits for the data path of all layers and for the arithmetic of layers requiring low precision, along with 12 bits for layers requiring higher precision, results in a 49.4% reduction in energy consumption and a 37.5% decrease in memory access. This is achieved with only a marginal mean Average Precision (mAP) degradation of 0.8% when compared to an accelerator based on Bfloat16. This comparison demonstrates that the proposed accelerator based on HPFP can be an efficient approach to designing compact and low-power accelerators without sacrificing accuracy. Full article
(This article belongs to the Special Issue Edge Computing in Sensors Networks)
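The shared-exponent limitation this abstract refers to is easy to demonstrate numerically: in a block floating-point format, every value in a block is quantized with the exponent of the block's largest value, so small values next to a large one lose most of their precision. The block size and mantissa width below are example assumptions, not MSFP, FlexBlock, or HPFP parameters.

```python
import numpy as np

def bfp_quantize(x: np.ndarray, mantissa_bits: int = 4, block: int = 16) -> np.ndarray:
    """Quantize each block of `block` values to a shared power-of-two exponent."""
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        blk = x[i:i + block].astype(np.float64)
        shared_exp = np.floor(np.log2(np.max(np.abs(blk)) + 1e-30))
        step = 2.0 ** (shared_exp - (mantissa_bits - 1))     # LSB for this block
        out[i:i + block] = np.round(blk / step) * step
    return out

rng = np.random.default_rng(6)
x = rng.normal(scale=0.01, size=64)
x[0] = 4.0                                  # one large value dominates its block
xq = bfp_quantize(x)
rel_err_outlier_block = np.abs(xq[1:16] - x[1:16]).mean() / np.abs(x[1:16]).mean()
rel_err_other_blocks = np.abs(xq[16:] - x[16:]).mean() / np.abs(x[16:]).mean()
print(f"relative error with outlier in the block: {rel_err_outlier_block:.2f}")
print(f"relative error in blocks without one   : {rel_err_other_blocks:.2f}")
```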

11 pages, 2783 KB  
Article
Multimode Optical Interconnects on Silicon Interposer Enable Confidential Hardware-to-Hardware Communication
by Qian Zhang, Sujay Charania, Stefan Rothe, Nektarios Koukourakis, Niels Neumann, Dirk Plettemeier and Juergen W. Czarske
Sensors 2023, 23(13), 6076; https://doi.org/10.3390/s23136076 - 1 Jul 2023
Cited by 5 | Viewed by 2670
Abstract
Following Moore’s law, the density of integrated circuits is increasing in all dimensions, for instance, in 3D stacked chip networks. Amongst other electro-optic solutions, multimode optical interconnects on a silicon interposer promise to enable high throughput for modern hardware platforms in a restricted space. Such integrated architectures require confidential communication between multiple chips as a key factor for high-performance infrastructures in the 5G era and beyond. Physical layer security is an approach providing information theoretic security among network participants, exploiting the uniqueness of the data channel. We experimentally project orthogonal and non-orthogonal symbols through 380 μm long multimode on-chip interconnects by wavefront shaping. These interconnects are investigated for their uniqueness by repeating these experiments across multiple channels and samples. We show that the detected speckle patterns resulting from modal crosstalk can be recognized by training a deep neural network, which is used to transform these patterns into a corresponding readable output. The results showcase the feasibility of applying physical layer security to multimode interconnects on silicon interposers for confidential optical 3D chip networks. Full article
(This article belongs to the Special Issue Emerging Multimode Fiber Technologies for Communications and Beyond)

20 pages, 2698 KB  
Article
Automated Signal Quality Assessment of Single-Lead ECG Recordings for Early Detection of Silent Atrial Fibrillation
by Markus Lueken, Michael Gramlich, Steffen Leonhardt, Nikolaus Marx and Matthias D. Zink
Sensors 2023, 23(12), 5618; https://doi.org/10.3390/s23125618 - 15 Jun 2023
Cited by 16 | Viewed by 3560
Abstract
Atrial fibrillation (AF) is an arrhythmic cardiac disorder with a high and increasing prevalence in aging societies, and it is associated with a risk of stroke and heart failure. However, early detection of onset AF can be cumbersome since it often manifests in an asymptomatic and paroxysmal form, also known as silent AF. Large-scale screenings can help identify silent AF and allow for early treatment to prevent more severe implications. In this work, we present a machine learning-based algorithm for assessing the signal quality of hand-held diagnostic ECG devices to prevent misclassification due to insufficient signal quality. A large-scale community pharmacy-based screening study was conducted on 7295 older subjects to investigate the performance of a single-lead ECG device in detecting silent AF. Classification (normal sinus rhythm or AF) of the ECG recordings was initially performed automatically by an internal on-chip algorithm. The signal quality of each recording was assessed by clinical experts and used as a reference for the training process. Signal processing stages were explicitly adapted to the individual electrode characteristics of the ECG device, since its recordings differ from conventional ECG tracings. With respect to the clinical expert ratings, the artificial intelligence-based signal quality assessment (AISQA) index yielded a strong correlation of 0.75 during validation and a high correlation of 0.60 during testing. Our results suggest that large-scale screenings of older subjects would greatly benefit from an automated signal quality assessment to repeat measurements where applicable, to suggest an additional human overread, and to reduce automated misclassifications. Full article
(This article belongs to the Special Issue ECG Signal Processing Techniques and Applications)

20 pages, 4320 KB  
Article
Research and Implementation of High Computational Power for Training and Inference of Convolutional Neural Networks
by Tianling Li, Bin He and Yangyang Zheng
Appl. Sci. 2023, 13(2), 1003; https://doi.org/10.3390/app13021003 - 11 Jan 2023
Cited by 10 | Viewed by 3884
Abstract
Algorithms and computing power have consistently been the two driving forces behind the development of artificial intelligence. The computational power of a platform has a significant impact on the implementation cost, performance, power consumption, and flexibility of an algorithm. Currently, AI models are mainly trained on high-performance GPU platforms, and their inference can be implemented on GPUs, CPUs, and FPGAs. On the one hand, due to its high power consumption and cost, the GPU is not suitable for power- and cost-sensitive application scenarios. On the other hand, because training and inference of a neural network use different computing platforms, the neural network model's data must be transmitted between platforms with varying computing power, which affects the data processing capability, real-time performance, and flexibility of the network. This paper focuses on a high-computing-power implementation method that integrates convolutional neural network (CNN) training and inference, and proposes implementing both processes on high-performance heterogeneous architecture (HA) devices with a field programmable gate array (FPGA) as the core. The numerous repeated multiply-accumulate operations in CNN training and inference are implemented in programmable logic (PL), which significantly improves the speed of CNN training and inference and reduces the overall power consumption, providing a practical implementation method for neural networks in application fields that are sensitive to power, cost, and footprint. First, based on the data stream containing the training and inference process of the CNN, this study investigates methods to merge the training and inference data streams. Second, a high-level language was used to describe the merged data stream structure, the high-level description was converted to a hardware register transfer level (RTL) description by a high-level synthesis (HLS) tool, and the intellectual property (IP) core was generated. The processing system (PS) was used for overall control, data preprocessing, and result analysis, and was connected to the IP core via an on-chip AXI bus interface in the HA device. Finally, the integrated implementation method was tested and validated on a Xilinx HA device using the MNIST handwritten digit validation set. According to the test results, compared with a GPU, the model trained in the PL of the HA device achieves the same convergence rate in only 78.04% of the training time. With processing times of only 3.31 ms and 0.65 ms per frame, an average recognition accuracy of 95.697%, and an overall power consumption of only 3.22 W at 100 MHz, the two convolutional neural networks described in this paper are suitable for deployment in lightweight, power-constrained domains. Full article
(This article belongs to the Special Issue Intelligent Computing and Remote Sensing)
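One way to see why the training and inference data streams can share the same PL engine, as described above, is that both the forward convolution and the weight-gradient computation of training reduce to the same sliding-window multiply-accumulate loop. The plain-Python sketch below (single channel, "valid" padding, assumed shapes) illustrates that structural similarity; it is not the article's HLS code.

```python
import numpy as np

def conv2d_forward(x, w):
    """Valid 2-D convolution (cross-correlation), single channel."""
    H, W = x.shape
    k, _ = w.shape
    y = np.zeros((H - k + 1, W - k + 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            y[i, j] = np.sum(x[i:i + k, j:j + k] * w)      # MAC over one window
    return y

def conv2d_weight_grad(x, dy, k):
    """Gradient of the loss w.r.t. the kernel: the same MAC loop, with the
    output-gradient map playing the role of the sliding kernel."""
    dw = np.zeros((k, k))
    for u in range(k):
        for v in range(k):
            dw[u, v] = np.sum(x[u:u + dy.shape[0], v:v + dy.shape[1]] * dy)
    return dw

rng = np.random.default_rng(7)
x, w = rng.normal(size=(8, 8)), rng.normal(size=(3, 3))
y = conv2d_forward(x, w)
dy = np.ones_like(y)                     # stand-in for the upstream gradient
dw = conv2d_weight_grad(x, dy, 3)
print("forward output", y.shape, "| weight gradient", dw.shape)
```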
