
Search Results (127)

Search Parameters:
Keywords = DNN accelerators

15 pages, 1838 KB  
Article
Rational Design of High-Performance Viscosifying Polymers in Confined Systems via a Machine-Learning-Accelerated Multiscale Framework for Enhanced Hydrocarbon Recovery
by Arturo Alvarez-Cruz, Estela Mayoral-Villa, Alfonso Ramón García-Márquez and Jaime Klapp
Fluids 2026, 11(4), 86; https://doi.org/10.3390/fluids11040086 - 26 Mar 2026
Abstract
Rational design of high-performance viscosifying polymers is critical for enhancing supercritical CO₂ flooding efficiency in enhanced oil recovery (EOR). Traditional experimental and simulation approaches are limited in exploring the vast design space of polymer architecture, flexibility, and intermolecular interactions. This work presents an integrated machine learning (ML) and mesoscopic simulation framework using Dissipative Particle Dynamics (DPD) to accelerate the development of tailored polymeric thickeners. We systematically investigate synergistic effects of linear and branched polymer blends on solvent viscosity under Poiseuille flow, representative of flow in micro-fractures and pore throats. Key molecular descriptors are varied to generate a comprehensive rheological database, which is used to train a deep neural network (DNN) surrogate model linking molecular parameters to macroscopic viscosity. The DNN is coupled with gradient ascent optimization for inverse design, enabling rapid virtual screening of thousands of formulations. A focused case study demonstrates that star-like architectures with associative cores and semi-flexible backbones outperform linear analogs for supercritical CO₂ viscosity enhancement. The optimal candidate—a four-arm star polymer with linear side chains—was validated by DPD simulation. This multiscale “simulation-to-surrogate” methodology bridges molecular design with continuum-scale flow behavior, offering a transformative tool for formulating cost-effective, efficient, and sustainable next-generation EOR chemicals.
(This article belongs to the Special Issue Pipe Flow: Research and Applications, 2nd Edition)
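
A minimal sketch of the surrogate-plus-gradient-ascent loop described in this abstract, assuming PyTorch; the descriptor count, network width, and bounds are illustrative assumptions, not the authors' actual model:

```python
import torch
import torch.nn as nn

# Toy surrogate: maps molecular descriptors (e.g., arm count, backbone
# stiffness, association strength) to a predicted viscosity. In the paper
# such a network is trained on a DPD-generated rheological database.
surrogate = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Inverse design by gradient ascent: optimize the descriptors themselves
# to maximize the surrogate's predicted viscosity.
x = torch.tensor([[2.0, 0.5, 0.5]], requires_grad=True)  # initial candidate
opt = torch.optim.Adam([x], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = -surrogate(x).sum()  # ascend on viscosity = descend on its negative
    loss.backward()
    opt.step()
    with torch.no_grad():       # keep descriptors inside physical bounds
        x.clamp_(0.0, 4.0)

print("optimized descriptors:", x.detach().numpy())
```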

16 pages, 686 KB  
Article
Design of Network Traffic Analysis Models Based on Deep Neural Networks
by Jiantao Cui and Yixiang Zhao
Future Internet 2026, 18(3), 152; https://doi.org/10.3390/fi18030152 - 16 Mar 2026
Viewed by 160
Abstract
The proliferation of next-generation Internet infrastructures and the Internet of Things (IoT) has exponentially increased network traffic complexity. While deep learning (DL)-based intrusion detection systems (IDSs) show immense potential, they persistently suffer from challenges including high computational overhead, vanishing gradients in deep architectures, and acute sensitivity to noise. Consequently, these issues impede their real-time deployment in resource-constrained edge computing environments. To overcome these limitations, we propose a novel, lightweight, and robust intrusion detection framework based on deep neural networks (DNNs). Initially, we employ a Robust Scaler-based statistical preprocessing strategy to supersede traditional Z-score standardization, effectively mitigating the adverse impacts of outliers and burst traffic noise. Subsequently, we design an advanced architecture that integrates self-normalizing residual blocks with a channel attention mechanism. Leveraging compressed hidden layers alongside the Scaled Exponential Linear Unit (SELU) activation function, this architecture not only mitigates the vanishing gradient problem but also amplifies critical traffic features. Concurrently, it achieves a substantial reduction in both parameter count and inference latency. Furthermore, we introduce a cosine annealing strategy to dynamically adjust the learning rate during training, thereby facilitating the model’s escape from local optima and accelerating convergence. Extensive experiments on standard benchmark datasets demonstrate that our proposed framework achieves superior detection accuracy while maintaining exceptional computational efficiency compared to state-of-the-art baselines.
(This article belongs to the Section Cybersecurity)
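
The named ingredients (Robust Scaler preprocessing, self-normalizing SELU residual blocks with channel attention, cosine-annealed learning rate) can be sketched as follows, assuming PyTorch and scikit-learn; layer sizes and the attention reduction ratio are invented for illustration:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import RobustScaler

# Median/IQR scaling instead of Z-score, to blunt outliers and burst noise.
X = RobustScaler().fit_transform(np.random.randn(256, 64)).astype(np.float32)

class SelfNormResBlock(nn.Module):
    """Residual block with SELU activations and a squeeze-and-excite style
    channel attention over the feature vector (dimensions illustrative)."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.SELU(),
                                  nn.Linear(dim, dim))
        self.attn = nn.Sequential(nn.Linear(dim, dim // reduction), nn.SELU(),
                                  nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):
        h = self.body(x)
        return nn.functional.selu(x + h * self.attn(h))  # gated residual path

model = nn.Sequential(SelfNormResBlock(64), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Cosine annealing decays the learning rate cyclically, helping the model
# escape local optima and converge faster, as the abstract describes.
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50)
logits = model(torch.from_numpy(X))
```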

31 pages, 4366 KB  
Article
Distributed Multi-Vehicle Cooperative Trajectory Planning and Control for Ramp Merging and Diverging Based on Deep Neural Networks and MPC
by Linhua Nie, Tingyang Zhang, Yunqing Zhao, Yaqiu Li, Haoran Li and Junru Yang
Machines 2026, 14(3), 262; https://doi.org/10.3390/machines14030262 - 25 Feb 2026
Viewed by 362
Abstract
With the deep integration of the modern automotive industry and artificial intelligence technologies, connected and automated vehicles (CAVs) have emerged as a key breakthrough for improving traffic safety and operational efficiency. This study proposes a distributed multi-vehicle cooperative trajectory planning and control framework for ramp merging and diverging scenarios, integrating Deep Neural Networks (DNNs) with Model Predictive Control (MPC). The methodology consists of three key components: first, a distributed cooperative architecture based on dynamic topology is constructed to effectively reduce communication loads; second, a feature point-based Cubic Bézier Curve trajectory generation method is proposed, enabling flexible path planning with reduced reliance on high-precision maps; finally, a DNN-accelerated MPC solving strategy (NN-MPC) is designed. This strategy employs an offline-trained deep neural network to approximate the online optimization process, supplemented by a terminal Safety Check mechanism and a dynamic surrounding vehicle selection algorithm. Experimental results demonstrate that the proposed method successfully reproduces the planning capability of offline high-precision MPC in ramp merging and diverging scenarios while reducing computation time to the millisecond level. It effectively overcomes the myopic decision-making problem of traditional real-time algorithms, achieving smoother conflict resolution and higher traffic efficiency. Notably, quantitative validation confirms that this cooperative framework achieves an approximate 30% reduction in average travel delay compared to the non-cooperative baseline. This study confirms the engineering advantages of the hybrid architecture under dynamic high-density traffic flows, significantly enhancing the system’s real-time response capability while balancing the safety and riding comfort of cooperative driving.
(This article belongs to the Special Issue Control and Path Planning for Autonomous Vehicles)
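
The offline-approximation idea, querying a trained network for the MPC action and then gating it through a terminal safety check with a conservative fallback, might look like the sketch below; the network dimensions, the headway check, and the fallback command are all illustrative assumptions:

```python
import torch
import torch.nn as nn

# Offline-trained policy network imitating the high-precision MPC: maps
# the joint state of the ego vehicle and selected neighbours to an
# (acceleration, steering) command.
policy = nn.Sequential(nn.Linear(12, 128), nn.Tanh(), nn.Linear(128, 2))

def nn_mpc_step(state: torch.Tensor, gap_ahead: float) -> torch.Tensor:
    """One NN-MPC control step with a terminal safety check."""
    with torch.no_grad():
        u = policy(state)          # millisecond-level inference
    accel, steer = float(u[0]), float(u[1])
    # Terminal safety check (illustrative rule): if the proposed command
    # would violate a minimum headway, fall back to a safe deceleration.
    if gap_ahead < 5.0 and accel > 0.0:
        accel = -2.0               # conservative fallback
    return torch.tensor([accel, steer])

u = nn_mpc_step(torch.zeros(12), gap_ahead=4.0)
```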

19 pages, 662 KB  
Article
FPGA Programmable Logic Block Architecture with High-Density MAC for Deep Learning Inference
by Yanlin Wang, Lijiang Gao and Haigang Yang
Electronics 2026, 15(4), 801; https://doi.org/10.3390/electronics15040801 - 13 Feb 2026
Viewed by 381
Abstract
Compared to half- or single-precision floating-point, reducing the precision of Deep Neural Network (DNN) inference accelerators can yield significant efficiency gains with little to no accuracy degradation by enabling more multiplication operations per unit area. The variable precision capabilities of FPGAs are extremely valuable, as a wide range of precisions fall on the Pareto-optimal curve of hardware efficiency versus accuracy, with no single precision dominating. We propose seven variants across three types of logic block designs to improve the area efficiency of multiply-accumulate (MAC) operations implemented in soft logic. Ultimately, we use the COFFE and VTR tools to fully evaluate these enhancements. The 2-bit adder BLE (ADD2_BLE) architecture achieves a 7.3% area optimization with only a 1.7% increase in tile area by improving the fracturability of LUTs in the baseline BLE and adding an additional 1-bit adder. However, this comes at the expense of reduced speed. The 9-bit Compact Multiplier (CMUL) architecture based on ADD2_BLE achieves the greatest optimization among the six CMUL variants, reducing the DAP result by up to 72% on average. Nonetheless, it results in a 13% increase in logic tile area for universal benchmarks that do not use multiplication.
(This article belongs to the Special Issue FPGA-Based Accelerators for Deep Neural Networks)

25 pages, 1705 KB  
Article
A Carbon-Efficient Framework for Deep Learning Workloads on GPU Clusters
by Dong-Ki Kang and Yong-Hyuk Moon
Appl. Sci. 2026, 16(2), 633; https://doi.org/10.3390/app16020633 - 7 Jan 2026
Viewed by 471
Abstract
The explosive growth of artificial intelligence (AI) services has led to massive scaling of GPU computing clusters, causing sharp rises in power consumption and carbon emissions. Although hardware-level accelerator enhancements and deep neural network (DNN) model compression techniques can improve power efficiency, they often encounter deployment barriers and risks of accuracy loss in practice. To address these issues without altering hardware or model architectures, we propose a novel Carbon-Aware Resource Management (CA-RM) framework for GPU clusters. To minimize carbon emissions, the CA-RM framework dynamically adjusts energy usage by combining real-time GPU core frequency scaling with intelligent workload placement, aligning computation with the temporal availability of renewable generation. We introduce a new metric, performance-per-carbon (PPC), and develop three optimization formulations: carbon-constrained, performance-constrained, and PPC-driven objectives that simultaneously respect DNN model training deadlines, inference latency requirements, and carbon emission budgets. Through extensive simulations using real-world renewable energy traces and profiling data collected from an NVIDIA RTX 4090 GPU running representative DNN workloads, we show that the CA-RM framework substantially reduces carbon emissions while satisfying service-level agreement (SLA) targets across a wide range of workload characteristics. Through experimental evaluation, we verify that the proposed CA-RM framework achieves approximately 35% carbon reduction on average, compared to competing approaches, while still ensuring acceptable processing performance across diverse workload behaviors.
(This article belongs to the Section Green Sustainable Science and Technology)
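
The PPC metric reduces to simple arithmetic: useful work delivered per gram of CO2 emitted. The sketch below greedily picks the hour and frequency-scaling level with the best PPC, assuming throughput scales with GPU frequency and carbon intensity varies hourly; all numbers are invented for illustration:

```python
# Performance-per-carbon (PPC): work delivered per unit of carbon emitted.
# Pick the (hour, frequency) pair that maximizes it.
hours = range(24)
# gCO2/kWh, cheaper at midday when solar is available (invented profile).
carbon_intensity = [400 - 250 * (6 <= h <= 18) for h in hours]
freq_levels = {1.0: 300.0, 0.8: 210.0}   # freq scale -> power draw in W (invented)

def ppc(throughput: float, power_w: float, intensity: float) -> float:
    carbon_per_hour = power_w / 1000.0 * intensity   # gCO2 emitted per hour
    return throughput / carbon_per_hour              # samples per gCO2

job_throughput = 1000.0   # samples/hour at full frequency (invented)
best = max(
    ((h, f) for h in hours for f in freq_levels),
    key=lambda hf: ppc(job_throughput * hf[1], freq_levels[hf[1]],
                       carbon_intensity[hf[0]]),
)
print("schedule at hour", best[0], "with frequency scale", best[1])
```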

29 pages, 4094 KB  
Article
Hybrid LSTM–DNN Architecture with Low-Discrepancy Hypercube Sampling for Adaptive Forecasting and Data Reliability Control in Metallurgical Information-Control Systems
by Jasur Sevinov, Barnokhon Temerbekova, Gulnora Bekimbetova, Ulugbek Mamanazarov and Bakhodir Bekimbetov
Processes 2026, 14(1), 147; https://doi.org/10.3390/pr14010147 - 1 Jan 2026
Cited by 1 | Viewed by 580
Abstract
The study focuses on the design of an intelligent information-control system (ICS) for metallurgical production, aimed at robust forecasting of technological parameters and automatic self-adaptation under noise, anomalies, and data drift. The proposed architecture integrates a hybrid LSTM–DNN model with low-discrepancy hypercube sampling using Sobol and Halton sequences to ensure uniform coverage of operating conditions and the hyperparameter space. The processing pipeline includes preprocessing and temporal synchronization of measurements, a parameter identification module, anomaly detection and correction using an ε-threshold scheme, and a decision-making and control loop. In simulation scenarios modeling the dynamics of temperature, pressure, level, and flow (1 min sampling interval, injected anomalies, and measurement noise), the hybrid model outperformed GRU and CNN architectures: a coefficient of determination of R² > 0.92 was achieved for key indicators, MAE and RMSE improved by 7–15%, and the proportion of unreliable measurements after correction decreased to <2% (compared with 8–12% without correction). The experiments also demonstrated accelerated adaptation during regime changes. The scientific novelty lies in combining recurrent memory and deep nonlinear approximation with deterministic experimental design in the hypercube of states and hyperparameters, enabling reproducible self-adaptation of the ICS and increased noise robustness without upgrading the measurement hardware. Modern metallurgical information-control systems operate under non-stationary regimes and limited measurement reliability, which reduces the robustness of conventional forecasting and decision-support approaches. To address this issue, a hybrid LSTM–DNN architecture combined with low-discrepancy hypercube probing and anomaly-aware data correction is proposed. The proposed approach is distinguished by the integration of hybrid neural forecasting, deterministic hypercube-based adaptation, and anomaly-aware data correction within a unified information-control loop for non-stationary industrial processes.
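
Low-discrepancy coverage of a hyperparameter hypercube and an ε-threshold correction step can be sketched with SciPy's QMC module; the parameter ranges and the correction rule are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np
from scipy.stats import qmc

# Sobol sequence: uniformly cover a 3-D hypercube of hyperparameters
# (here: LSTM units, dense units, learning rate), then rescale to bounds.
# qmc.Halton(d=3) can be used the same way.
sampler = qmc.Sobol(d=3, scramble=True)
unit = sampler.random_base2(m=5)                      # 2**5 = 32 points
grid = qmc.scale(unit, [16, 16, 1e-4], [256, 256, 1e-2])

# epsilon-threshold anomaly correction (illustrative rule): replace any
# measurement whose deviation from the model forecast exceeds epsilon.
def correct(measured: np.ndarray, forecast: np.ndarray, eps: float) -> np.ndarray:
    bad = np.abs(measured - forecast) > eps
    out = measured.copy()
    out[bad] = forecast[bad]     # fall back to the forecast value
    return out
```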

23 pages, 2239 KB  
Article
SparseDroop: Hardware–Software Co-Design for Mitigating Voltage Droop in DNN Accelerators
by Arnab Raha, Shamik Kundu, Arghadip Das, Soumendu Kumar Ghosh and Deepak A. Mathaikutty
J. Low Power Electron. Appl. 2026, 16(1), 2; https://doi.org/10.3390/jlpea16010002 - 23 Dec 2025
Viewed by 911
Abstract
Modern deep neural network (DNN) accelerators must sustain high throughput while avoiding performance degradation from supply voltage (VDD) droop, which occurs when large arrays of multiply–accumulate (MAC) units switch concurrently and induce high peak current (ICCmax) transients on the power delivery network (PDN). In this work, we focus on ASIC-class DNN accelerators with tightly synchronized MAC arrays rather than FPGA-based implementations, where such cycle-aligned switching is most pronounced. Conventional guardbanding and reactive countermeasures (e.g., throttling, clock stretching, or emergency DVFS) either waste energy or incur non-trivial throughput penalties. We propose SparseDroop, a unified hardware-conscious framework that proactively shapes instantaneous current demand to mitigate droop without reducing the sustained compute rate. SparseDroop comprises two complementary techniques. (1) SparseStagger, a lightweight hardware-friendly droop scheduler that exploits the inherent unstructured sparsity already present in the weights and activations—it does not introduce any additional sparsification. SparseStagger dynamically inspects the zero patterns mapped to each processing element (PE) column and staggers MAC start times within a column so that high-activity bursts are temporally interleaved. This fine-grain reordering smooths ICC trajectories, lowers the probability and depth of transient VDD dips, and preserves cycle-level alignment at tile/row boundaries—thereby incurring no throughput loss and negligible control overhead. (2) SparseBlock, an architecture-aware, block-wise-structured sparsity induction method that intentionally introduces additional sparsity aligned with the accelerator’s dataflow. By co-designing block layout with the dataflow, SparseBlock reduces the likelihood that all PEs in a column become simultaneously active, directly constraining ICCmax and peak dynamic power on the PDN. Together, SparseStagger’s opportunistic staggering (from existing unstructured weight zeros) and SparseBlock’s structured, layout-aware sparsity induction (added to prevent peak-power excursions) deliver a scalable, low-overhead solution that improves voltage stability, energy efficiency, and robustness, integrates cleanly with the accelerator dataflow, and preserves model accuracy with modest retraining or fine-tuning.
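
The staggering intuition can be illustrated with a toy per-cycle activity model: if columns start a few cycles apart, their high-activity bursts interleave and the peak number of simultaneously switching MACs drops. The sparsity patterns and offsets below are invented, not the paper's scheduler:

```python
import numpy as np

# Toy model: each of 8 PE columns contributes a burst of active MACs per
# cycle; unstructured sparsity makes some cycles much busier than others.
rng = np.random.default_rng(0)
cols = (rng.random((8, 64)) > 0.5).astype(int) * rng.integers(1, 16, (8, 64))

aligned = cols.sum(axis=0)                  # all columns start together
staggered = np.zeros(64 + 8, dtype=int)     # stagger column c by c cycles
for c, col in enumerate(cols):
    staggered[c:c + 64] += col

# A lower peak of simultaneously active MACs means a smaller ICC
# transient on the power delivery network.
print("peak aligned:  ", aligned.max())
print("peak staggered:", staggered.max())
```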

27 pages, 6107 KB  
Article
A High-Dimensional Parameter Identification Method for Pipelines Based on Static Strain and DNN Surrogate Models to Accelerate Langevin Bayesian Inference
by Li Chen, Zhifeng Wu, Yanwen Liu and Zhiyong Li
Buildings 2025, 15(23), 4254; https://doi.org/10.3390/buildings15234254 - 25 Nov 2025
Viewed by 484
Abstract
This study develops a Bayesian parameter identification framework that uses static strain measurements to update pipeline structural models under complex boundary conditions. Because strain responses are directly linked to internal stress states and are much less sensitive to boundary condition uncertainty, the proposed approach retains high identification accuracy where conventional methods based on static displacements or modal data are difficult to apply. The method employs the Metropolis-Adjusted Langevin Algorithm (MALA), a gradient-based MCMC scheme with a Metropolis correction that ensures asymptotically exact sampling, to handle the high-dimensional parameter space, and integrates a deep neural network (DNN) surrogate model to accelerate sampling. A numerical example demonstrates the efficiency of MALA in high-dimensional settings by exploiting the gradient of the log posterior to guide proposals, successfully identifying the stiffness of 30 pipeline segments and showing that combining axial and hoop direction strain data yields more accurate estimates. An experimental case on a real pipeline corroborates the effectiveness of the approach, reducing the mean absolute error (MAE) of predicted strains from 27.3% to 4.2% after updating. Overall, by coupling MALA with a DNN surrogate, the study establishes a static-strain-based Bayesian inference framework for high-dimensional parameter identification in pipelines with complex boundaries, providing a practical route for engineering applications and supporting reliable structural safety assessment.
(This article belongs to the Section Building Structures)
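
A compact MALA step using a differentiable log-posterior; the quadratic below is a stand-in, since the paper's trained strain surrogate is not reproduced here:

```python
import torch

def log_post(theta: torch.Tensor) -> torch.Tensor:
    # Stand-in for the DNN-surrogate log-posterior over segment
    # stiffnesses; a real run would evaluate the trained strain surrogate.
    return -0.5 * (theta ** 2).sum()

def mala_step(theta: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """One Metropolis-Adjusted Langevin step: a Langevin proposal guided by
    the gradient of the log-posterior, plus an exact Metropolis correction."""
    theta = theta.detach().requires_grad_(True)
    lp = log_post(theta)
    (grad,) = torch.autograd.grad(lp, theta)
    prop = theta + 0.5 * eps ** 2 * grad + eps * torch.randn_like(theta)

    prop = prop.detach().requires_grad_(True)
    lp_p = log_post(prop)
    (grad_p,) = torch.autograd.grad(lp_p, prop)
    # Asymmetric proposal densities q(prop | theta) and q(theta | prop).
    fwd = -((prop - theta - 0.5 * eps ** 2 * grad) ** 2).sum() / (2 * eps ** 2)
    bwd = -((theta - prop - 0.5 * eps ** 2 * grad_p) ** 2).sum() / (2 * eps ** 2)
    log_alpha = lp_p - lp + bwd - fwd
    accept = torch.rand(()).log() < log_alpha
    return (prop if accept else theta).detach()

theta = torch.zeros(30)   # e.g., stiffnesses of 30 pipeline segments
for _ in range(1000):
    theta = mala_step(theta)
```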

16 pages, 17815 KB  
Article
Learning to Equalize for Single-Carrier Underwater Acoustic Communications
by Hao Zhao, Kexing Yao, Dan Xiang, Qisen Wang, Yankun Chen and Yan Wang
J. Mar. Sci. Eng. 2025, 13(11), 2209; https://doi.org/10.3390/jmse13112209 - 20 Nov 2025
Viewed by 674
Abstract
Learning-based equalizers for multicarrier communication systems have been widely studied over underwater acoustic (UWA) channels. In this article, a learning-based equalizer is utilized for single-carrier (SC) underwater acoustic communications. A comprehensive comparison is made between existing deep learning (DL)-based approaches and a classical equalizer designed with adaptive filtering principles; this comparison motivates the design of equalization for SC communications over UWA channels. To overcome distortion over the UWA channel, we propose a sliding deep learning-based equalizer that uses a sliding nonlinear network for equalization rather than a single-layer linear method. Moreover, to accelerate convergence during training, we propose a preprocessing-based training phase. To mitigate the impact of time-varying channels, we additionally propose a meta-learning-enhanced adaptive filter algorithm for online adaptive equalization, named Meta-DNN. Based on the proposed DL equalizer, we leverage the pilot and data relationship to perform online transfer and achieve better bit-error-rate (BER) performance. Finally, we evaluate BER performance across reproducible, realistic multi-scenario channels.
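
A sliding nonlinear equalizer in the spirit described above: a small network maps a window of received samples to one symbol estimate and is slid across the frame. The window length, layer sizes, and real-valued toy signal are assumptions:

```python
import torch
import torch.nn as nn

WIN = 15   # sliding window of received samples centred on the target symbol

# Nonlinear per-window equalizer, in contrast to a single-layer linear tap.
eq = nn.Sequential(nn.Linear(WIN, 64), nn.ReLU(), nn.Linear(64, 1))

def equalize(rx: torch.Tensor) -> torch.Tensor:
    """Slide the network across a received frame (real-valued toy signal)."""
    pad = WIN // 2
    rx = nn.functional.pad(rx, (pad, pad))
    windows = rx.unfold(0, WIN, 1)        # shape: (num_symbols, WIN)
    return eq(windows).squeeze(-1)        # one estimate per symbol

est = equalize(torch.randn(1024))
```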

11 pages, 3760 KB  
Article
Enhanced Optical Wireless Communications via Deep Neural Network Assisted Pre-Equalization for Faster-than-Nyquist Transmission
by Xindong Yue, Xingyu Zhang, Zhaoheng Wu, Yue Zhang, Huiqin Wang and Minghua Cao
Photonics 2025, 12(11), 1112; https://doi.org/10.3390/photonics12111112 - 11 Nov 2025
Viewed by 544
Abstract
Faster-than-Nyquist (FTN) transmission is widely used in optical wireless communication (OWC) systems to improve data rates and spectrum efficiency. However, it introduces inter-symbol interference (ISI), which can degrade communication reliability. To address this issue, we propose a pre-equalization algorithm based on a deep neural network (DNN). The performance analysis primarily focuses on the bit-error-rate (BER) under a Gamma-Gamma atmospheric turbulence channel with varying acceleration factors. Simulation results show that our scheme effectively reduces the BER degradation caused by ISI. Additionally, we observe an inverse relationship between the BER performance and the atmospheric refractive index constants as well as the transmission distance, while a direct proportionality exists with respect to the filter roll-off factor and laser wavelength. Furthermore, comparisons with conventional minimum mean square error (MMSE) and zero-forcing (ZF) algorithms highlight the superior performance of our proposal.
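
Pre-equalization trains the network at the transmitter so that, after the ISI-inducing channel, the received samples match the intended symbols. This end-to-end toy uses a fixed short convolution as a differentiable stand-in for FTN-induced ISI; the filter taps, BPSK symbols, and network are invented:

```python
import torch
import torch.nn as nn

# Differentiable stand-in for FTN-induced ISI: a short convolution.
isi = torch.tensor([0.2, 1.0, 0.2]).view(1, 1, 3)

predist = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(predist.parameters(), lr=1e-3)

for _ in range(500):
    sym = torch.randint(0, 2, (256, 1)).float() * 2 - 1   # BPSK symbols
    tx = predist(sym)                                     # pre-distorted samples
    rx = nn.functional.conv1d(tx.view(1, 1, -1), isi, padding=1).view(-1, 1)
    loss = nn.functional.mse_loss(rx, sym)   # received should match intended
    opt.zero_grad(); loss.backward(); opt.step()
```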

22 pages, 1940 KB  
Article
A Comparative Study of Lightweight, Sparse Autoencoder-Based Classifiers for Edge Network Devices: An Efficiency Analysis of Feed-Forward and Deep Neural Networks
by Mi Young Jo and Hyun Jung Kim
Sensors 2025, 25(20), 6439; https://doi.org/10.3390/s25206439 - 17 Oct 2025
Cited by 2 | Viewed by 1658
Abstract
This study proposes a lightweight classification framework for anomaly traffic detection in edge computing environments. Thirteen packet- and flow-level features extracted from the CIC-IDS2017 dataset were compressed into 4-dimensional latent vectors using a Sparse Autoencoder (SAE). Two classifiers were compared under the same pipeline: a Feed-Forward network (SAE-FF) and a Deep Neural Network (SAE-DNN). To ensure generalization, all experiments were conducted with 5-fold cross-validation. Performance evaluation revealed that SAE-DNN achieved superior classification performance, with an average accuracy of 99.33% and an AUC of 0.9993. The SAE-FF model, although exhibiting lower performance (average accuracy of 93.66% and AUC of 0.9758), maintained stable outcomes and offered significantly lower computational complexity (~40 FLOPs) compared with SAE-DNN (~8960 FLOPs). Device-level analysis confirmed that SAE-FF was the most efficient option for resource-constrained platforms such as Raspberry Pi 4, whereas SAE-DNN achieved real-time inference capability on the Coral Dev Board by leveraging Edge TPU acceleration. To quantify this trade-off between accuracy and efficiency, we introduce the Edge Performance Efficiency Score (EPES), a composite metric that integrates accuracy, latency, memory usage, FLOPs, and CPU performance into a single score. The proposed EPES provides a practical and comprehensive benchmark for balancing accuracy and efficiency and supporting device-specific model selection in practical edge deployments. These findings highlight the importance of system-aware evaluation and demonstrate that EPES can serve as a valuable guideline for efficient anomaly traffic classification in resource-limited environments.
(This article belongs to the Section Sensor Networks)
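
The abstract names EPES's five inputs but not its formula, so the sketch below shows one plausible normalized weighted-composite form; the weights, normalizers, and most of the example numbers are invented (only the accuracies and FLOP counts come from the abstract):

```python
def epes(acc, latency_ms, mem_mb, flops, cpu_pct,
         w=(0.4, 0.2, 0.15, 0.15, 0.1)):
    """Hypothetical Edge Performance Efficiency Score: reward accuracy,
    penalize normalized resource costs. Weights and normalizers invented."""
    costs = (latency_ms / 100.0, mem_mb / 512.0, flops / 1e4, cpu_pct / 100.0)
    return w[0] * acc - sum(wi * c for wi, c in zip(w[1:], costs))

# Reported accuracies and FLOPs; latency, memory, and CPU load invented.
print(epes(acc=0.9366, latency_ms=3.0, mem_mb=20.0, flops=40, cpu_pct=30))    # SAE-FF
print(epes(acc=0.9933, latency_ms=1.5, mem_mb=60.0, flops=8960, cpu_pct=55))  # SAE-DNN
```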

20 pages, 1863 KB  
Article
A Novel Analog-Computing-in-Memory Architecture with Scalable Multi-Bit MAC Operations and Flexible Weight Organization for DNN Acceleration
by Ahmet Unutulmaz
Electronics 2025, 14(20), 4030; https://doi.org/10.3390/electronics14204030 - 14 Oct 2025
Viewed by 1460
Abstract
Deep neural networks (DNNs) require efficient hardware accelerators due to the high cost of vector–matrix multiplication operations. Computing-in-memory (CIM) architectures address this challenge by performing computations directly within memory arrays, reducing data movement and improving energy efficiency. This paper introduces a novel analog-domain CIM architecture that enables flexible organization of weights across both rows and columns of the CIM array. A pipelining scheme is also proposed to decouple the multiply-and-accumulate and analog-to-digital conversion operations, thereby enhancing throughput. The proposed architecture is compared with existing approaches in terms of latency, area, energy consumption, and utilization. The comparison emphasizes architectural principles while deliberately avoiding implementation-specific details.
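
The throughput benefit of decoupling MAC evaluation from analog-to-digital conversion is a classic two-stage pipelining argument; a toy timing model makes the claim concrete (the stage latencies are invented):

```python
# Two-stage pipeline: while the ADC digitizes result k, the CIM array
# already computes MAC k+1, so steady-state throughput is set by the
# slower stage rather than the sum of both (latencies invented, in ns).
t_mac, t_adc, n_ops = 4.0, 10.0, 1000

serial = n_ops * (t_mac + t_adc)                 # MAC then ADC, back to back
pipelined = t_mac + n_ops * max(t_mac, t_adc)    # fill once, then overlap
print(f"serial: {serial:.0f} ns, pipelined: {pipelined:.0f} ns")
```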

21 pages, 5465 KB  
Article
Surrogate Modelling and Simulation Approaches for Renal Artery Haemodynamics: Balancing Symmetry in Computational Cost and Accuracy
by Dávid Csonka, Tamás Storcz, András Kaszás, Árpád Forberger and Géza Várady
Symmetry 2025, 17(10), 1681; https://doi.org/10.3390/sym17101681 - 8 Oct 2025
Viewed by 1065
Abstract
Finite element analysis (FEA)-based computational fluid dynamics (CFD) simulations are essential in biomedical engineering for studying haemodynamics, yet their high computational cost limits large-scale parametric studies. This paper presents a comparative analysis of FEA and surrogate modelling techniques applied to renal artery haemodynamics. The aortic–renal bifurcation strongly influences renal perfusion, affecting conditions such as hypertension, infarction, and transplant rejection. This study evaluates GPU-accelerated voxel simulations (Ansys 2024 R2 Discovery), 2D and 3D FEA simulations (COMSOL Multiphysics 6.3), finite volume CFD (Ansys 2020 R2 Fluent), and deep neural networks (DNNs) as surrogate models. Branching angles and blood pressure were systematically varied, and their effects on velocity, pressure, and turbulent kinetic energy were assessed in a time-dependent framework. Fluent provided accurate baseline results, while COMSOL 2D gave sufficient accuracy with much lower runtimes. In contrast, COMSOL 3D required over 160 times longer, making it prohibitive. Surrogate models trained on 6500 or more FEA-derived samples achieved high predictive accuracy (R² > 0.98 for velocity and pressure), balancing training cost and result quality. Cost analysis showed surrogate models become advantageous after 76–93 simulations. Symmetry is expressed in balancing model fidelity and computational efficiency, providing a resource-effective methodology with broad potential in vascular applications.
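
The break-even claim is simple amortization arithmetic: the surrogate pays off once the summed per-query savings exceed the one-off data-generation and training cost. A sketch with invented timings (not the study's measured values):

```python
# Break-even point for a surrogate: one-off cost of generating training
# data plus training, amortized over per-query savings (numbers invented).
t_fea = 120.0          # seconds per full FEA/CFD simulation
t_surrogate = 0.01     # seconds per surrogate prediction
n_train = 6500         # training samples, as in the study
t_cheap_sample = 1.5   # seconds per training sample (e.g., fast 2D runs)
t_training = 600.0     # one-off network training time

fixed = n_train * t_cheap_sample + t_training
n_breakeven = fixed / (t_fea - t_surrogate)
print(f"surrogate pays off after ~{n_breakeven:.0f} queries")
```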

14 pages, 769 KB  
Article
A Novel Low-Power Ternary 6T SRAM Design Using XNOR-Based CIM Architecture in Advanced FinFET Technologies
by Adnan A. Patel, Sohan Sai Dasaraju, Achyuth Gundrapally and Kyuwon Ken Choi
Electronics 2025, 14(18), 3737; https://doi.org/10.3390/electronics14183737 - 22 Sep 2025
Viewed by 1237
Abstract
The increasing demand for high-performance and low-power hardware in artificial intelligence (AI) applications—such as speech recognition, facial recognition, and object detection—has driven the exploration of advanced memory designs. Convolutional neural networks (CNNs) and deep neural networks (DNNs) require intensive computational resources, leading to significant challenges in terms of memory access time and power consumption. Compute-in-Memory (CIM) architectures have emerged as an alternative by executing computations directly within memory arrays, thereby reducing the expensive data transfer between memory and processor units. In this work, we present a 6T SRAM-based CIM architecture implemented using FinFET technology, aiming to reduce both power consumption and access delay. We explore and simulate three different SRAM cell structures—PLNA (P-Latch N-Access), NLPA (N-Latch P-Access), and SE (Single-Ended)—to assess their suitability for CIM operations. Compared to a reference 10T XNOR-based CIM design, our results show that the proposed structures achieve an average power consumption approximately 70% lower, along with significant delay reduction, without compromising functional integrity. A comparative analysis is presented to highlight the trade-offs between the three configurations, providing insights into their potential applications in low-power AI accelerator design.
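
XNOR-based CIM arrays realize binarized multiply-accumulate: with weights and activations in {-1, +1}, each multiply is an XNOR of sign bits and the accumulation is a popcount. A software model of that arithmetic (a sketch of the operation, not of the 6T/10T circuits themselves):

```python
import random

# Binarized MAC as XNOR + popcount: map {-1, +1} -> {0, 1}, XNOR the bit
# vectors, and recover the dot product from the number of matching bits.
n = 64
w = [random.choice([-1, 1]) for _ in range(n)]
a = [random.choice([-1, 1]) for _ in range(n)]

wb = sum((wi > 0) << i for i, wi in enumerate(w))   # pack sign bits
ab = sum((ai > 0) << i for i, ai in enumerate(a))
matches = bin(~(wb ^ ab) & ((1 << n) - 1)).count("1")   # XNOR + popcount
dot = 2 * matches - n                                   # matches minus mismatches

assert dot == sum(wi * ai for wi, ai in zip(w, a))
```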

25 pages, 4235 KB  
Article
A Performance Study of Deep Neural Network Representations of Interpretable ML on Edge Devices with AI Accelerators
by Julian Schauer, Payman Goodarzi, Jannis Morsch and Andreas Schütze
Sensors 2025, 25(18), 5681; https://doi.org/10.3390/s25185681 - 11 Sep 2025
Cited by 3 | Viewed by 2323
Abstract
With the rising adoption of machine learning (ML) and deep learning (DL) applications, the demand for deploying these algorithms closer to sensors has grown significantly, particularly in sensor-driven use cases such as predictive maintenance (PM) and condition monitoring (CM). This study investigated a novel application-oriented approach that represents interpretable ML inference as deep neural networks (DNNs), evaluated in terms of latency and energy efficiency on the edge, to tackle the problem of inefficient, high-effort, and uninterpretable ML implementations. For this purpose, the interpretable deep neural network representation (IDNNRep) was integrated into an open-source interpretable ML toolbox to demonstrate the inference time and energy efficiency improvements. The goal of this work was to enable the utilization of generic artificial intelligence (AI) accelerators for interpretable ML algorithms to achieve efficient inference on edge hardware in smart sensor applications. This novel approach was applied to one regression and one classification task from the field of PM and validated by implementing the inference on the neural processing unit (NPU) of the QXSP-ML81 Single-Board Computer and the tensor processing unit (TPU) of the Google Coral. Different quantization levels of the implementation were tested against common Python and C++ implementations. The novel implementation reduced the inference time by up to 80% and the mean energy consumption by up to 76% at the lowest precision, with only a 0.4% loss of accuracy compared to the C++ implementation. With the successful utilization of generic AI accelerators, the performance was further improved, with a 94% reduction in both inference time and mean energy consumption.
(This article belongs to the Section Intelligent Sensors)
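
Representing an interpretable model as a DNN means re-expressing its inference as dense layers so generic NPU/TPU toolchains can run it. The sketch below folds a fitted PCA projection plus a linear classifier into two Linear layers; the pipeline and data are toy stand-ins, not the toolbox's actual IDNNRep:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Fit an interpretable pipeline: PCA projection + logistic regression.
X = np.random.randn(500, 20).astype(np.float32)
y = (X[:, 0] > 0).astype(int)
pca = PCA(n_components=5).fit(X)
clf = LogisticRegression().fit(pca.transform(X), y)

# Re-express the same inference as two dense layers, ready for export to
# an accelerator toolchain (e.g., via ONNX or TFLite quantization).
proj = nn.Linear(20, 5)
proj.weight.data = torch.tensor(pca.components_, dtype=torch.float32)
proj.bias.data = torch.tensor(-pca.components_ @ pca.mean_, dtype=torch.float32)
head = nn.Linear(5, 1)
head.weight.data = torch.tensor(clf.coef_, dtype=torch.float32)
head.bias.data = torch.tensor(clf.intercept_, dtype=torch.float32)

dnn = nn.Sequential(proj, head)   # logits match clf.decision_function(...)
```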