Search Results (126)

Search Parameters:
Keywords = CPU/GPU architectures

31 pages, 741 KiB  
Article
Inspiring from Galaxies to Green AI in Earth: Benchmarking Energy-Efficient Models for Galaxy Morphology Classification
by Vasileios Alevizos, Emmanouil V. Gkouvrikos, Ilias Georgousis, Sotiria Karipidou and George A. Papakostas
Algorithms 2025, 18(7), 399; https://doi.org/10.3390/a18070399 - 28 Jun 2025
Viewed by 259
Abstract
Recent advancements in space exploration have significantly increased the volume of astronomical data, heightening the demand for efficient analytical methods. Concurrently, the considerable energy consumption of machine learning (ML) has fostered the emergence of Green AI, emphasizing sustainable, energy-efficient computational practices. We introduce the first large-scale Green AI benchmark for galaxy morphology classification, evaluating over 30 machine learning architectures (classical, ensemble, deep, and hybrid) on CPU and GPU platforms using a balanced subset of the Galaxy Zoo dataset. Beyond traditional metrics (precision, recall, and F1-score), we quantify inference latency, energy consumption, and carbon-equivalent emissions to derive an integrated EcoScore that captures the trade-off between predictive performance and environmental impact. Our results reveal that a GPU-optimized multilayer perceptron achieves state-of-the-art accuracy of 98% while emitting 20× less CO2 than ensemble forests, which—despite comparable accuracy—incur substantially higher energy costs. We demonstrate that hardware–algorithm co-design, model sparsification, and careful hyperparameter tuning can reduce carbon footprints by over 90% with negligible loss in classification quality. These findings provide actionable guidelines for deploying energy-efficient, high-fidelity models in both ground-based data centers and onboard space observatories, paving the way for truly sustainable, large-scale astronomical data analysis. Full article
(This article belongs to the Special Issue Artificial Intelligence in Space Applications)
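As a rough illustration of how an integrated score like the EcoScore might trade off predictive quality against energy and emissions, here is a minimal Python sketch; the weights, reference values, and the `eco_score` function itself are assumptions for illustration, not the formula used in the paper.

```python
# Hypothetical EcoScore-style metric: the paper's exact formula is not given
# in the abstract, so this sketch simply combines a quality metric (F1) with
# normalized energy/emission penalties into one score in [0, 1].
def eco_score(f1, energy_wh, co2_g, w_quality=0.5, w_energy=0.25, w_co2=0.25,
              energy_ref=100.0, co2_ref=50.0):
    """Higher is better: high F1, low energy use, low CO2-equivalent."""
    energy_term = 1.0 - min(energy_wh / energy_ref, 1.0)
    co2_term = 1.0 - min(co2_g / co2_ref, 1.0)
    return w_quality * f1 + w_energy * energy_term + w_co2 * co2_term

# e.g., a small GPU-optimized MLP vs. an ensemble with similar accuracy
print(eco_score(f1=0.98, energy_wh=5.0, co2_g=2.0))    # efficient model
print(eco_score(f1=0.97, energy_wh=80.0, co2_g=40.0))  # accurate but costly
```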

13 pages, 1506 KiB  
Article
Edge Artificial Intelligence Device in Real-Time Endoscopy for the Classification of Colonic Neoplasms
by Eun Jeong Gong and Chang Seok Bang
Diagnostics 2025, 15(12), 1478; https://doi.org/10.3390/diagnostics15121478 - 10 Jun 2025
Viewed by 484
Abstract
Objective: Although prior research developed an artificial intelligence (AI)-based classification system predicting colorectal lesion histology, the heavy computational demands limited its practical application. Recent advancements in medical AI emphasize decentralized architectures using edge computing devices, enhancing accessibility and real-time performance. This study aims to construct and evaluate a deep learning-based colonoscopy image classification model for automatic histologic categorization for real-time use on edge computing hardware. Design: We retrospectively collected 2418 colonoscopic images, subsequently dividing them into training, validation, and internal test datasets at a ratio of 8:1:1. Primary evaluation metrics included (1) classification accuracy across four histologic categories (advanced colorectal cancer, early cancer/high-grade dysplasia, tubular adenoma, and nonneoplasm) and (2) binary classification accuracy differentiating neoplastic from nonneoplastic lesions. Additionally, an external test was conducted using an independent dataset of 269 colonoscopic images. Results: For the internal-test dataset, the model achieved an accuracy of 83.5% (95% confidence interval: 78.8–88.2%) for the four-category classification. In binary classification (neoplasm vs. nonneoplasm), accuracy improved significantly to 94.6% (91.8–97.4%). The external test demonstrated an accuracy of 82.9% (78.4–87.4%) in the four-category task and a notably higher accuracy of 95.5% (93.0–98.0%) for binary classification. The inference speed of lesion classification was notably rapid, ranging from 2–3 ms/frame in GPU mode to 5–6 ms/frame in CPU mode. During real-time colonoscopy examinations, expert endoscopists reported no noticeable latency or interference from AI model integration. Conclusions: This study successfully demonstrates the feasibility of a deep learning-powered colonoscopy image classification system designed for the rapid, real-time histologic categorization of colorectal lesions on edge computing platforms. This study highlights how nature-inspired frameworks can improve the diagnostic capacities of medical AI systems by aligning technological improvements with biomimetic concepts. Full article
(This article belongs to the Special Issue Computer-Aided Diagnosis in Endoscopy 2025)
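To make the reported ms/frame figures concrete, the sketch below times per-frame inference of a placeholder PyTorch classifier on CPU and, if available, GPU; the model, input size, and `ms_per_frame` helper are illustrative assumptions, not the authors' code.

```python
import time
import torch

# Minimal latency benchmark: dummy CNN and dummy frame stand in for the
# trained classifier and an endoscopy image.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(16, 4))
frame = torch.randn(1, 3, 224, 224)

def ms_per_frame(model, frame, device, n=100):
    model = model.to(device).eval()
    frame = frame.to(device)
    with torch.no_grad():
        for _ in range(10):                  # warm-up iterations
            model(frame)
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n):
            model(frame)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n * 1e3

print("CPU:", ms_per_frame(model, frame, "cpu"), "ms/frame")
if torch.cuda.is_available():
    print("GPU:", ms_per_frame(model, frame, "cuda"), "ms/frame")
```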

20 pages, 2328 KiB  
Article
Adaptive Multitask Neural Network for High-Fidelity Wake Flow Modeling of Wind Farms
by Dichang Zhang, Christian Santoni, Zexia Zhang, Dimitris Samaras and Ali Khosronejad
Energies 2025, 18(11), 2897; https://doi.org/10.3390/en18112897 - 31 May 2025
Viewed by 374
Abstract
Wind turbine wake modeling is critical for the design and optimization of wind farms. Traditional methods often struggle with the trade-off between accuracy and computational cost. Recently, data-driven neural networks have emerged as a promising solution, offering both high fidelity and fast inference speeds. To advance this field, a novel machine learning model has been developed to predict wind farm mean flow fields through an adaptive multi-fidelity framework. This model extends transfer-learning-based high-dimensional multi-fidelity modeling to scenarios where varying fidelity levels correspond to distinct physical models, rather than merely differing grid resolutions. Built upon a U-Net architecture and incorporating a wind farm parameter encoder, our framework integrates high-fidelity large-eddy simulation (LES) data with a low-fidelity engineering wake model. By directly predicting time-averaged velocity fields from wind farm parameters, our approach eliminates the need for computationally expensive simulations during inference, achieving real-time performance (1.32×10⁻⁵ GPU hours per instance, with a negligible CPU workload). Comparisons against field-measured data demonstrate that the model accurately approximates high-fidelity LES predictions, even when trained with limited high-fidelity data. Furthermore, its end-to-end extensible design allows full differentiability and seamless integration of multiple fidelity levels, providing a versatile and scalable solution for various downstream tasks, including wind farm control co-design. Full article
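A minimal sketch of the multi-fidelity transfer idea described above, assuming a simple MLP surrogate (the paper uses a U-Net with a wind farm parameter encoder) and random placeholder data: pretrain on abundant low-fidelity wake-model outputs, then fine-tune on scarce high-fidelity LES samples.

```python
import torch
from torch import nn

# Illustrative surrogate: wind-farm parameters -> mean-velocity field on a grid.
class WakeSurrogate(nn.Module):
    def __init__(self, n_params=8, grid=(32, 32)):
        super().__init__()
        self.grid = grid
        self.net = nn.Sequential(
            nn.Linear(n_params, 256), nn.ReLU(),
            nn.Linear(256, grid[0] * grid[1]))
    def forward(self, p):
        return self.net(p).view(-1, *self.grid)

def fit(model, x, y, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

model = WakeSurrogate()
x_lo, y_lo = torch.randn(512, 8), torch.randn(512, 32, 32)   # engineering wake model (dummy)
x_hi, y_hi = torch.randn(16, 8), torch.randn(16, 32, 32)     # scarce LES data (dummy)
fit(model, x_lo, y_lo)                        # low-fidelity pretraining
fit(model, x_hi, y_hi, epochs=50, lr=1e-4)    # high-fidelity fine-tuning
```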

15 pages, 2611 KiB  
Article
GPU-Optimized Implementation for Accelerating CSAR Imaging
by Mengting Cui, Ping Li, Zhaohui Bu, Meng Xun and Li Ding
Electronics 2025, 14(10), 2073; https://doi.org/10.3390/electronics14102073 - 20 May 2025
Viewed by 268
Abstract
Directly porting the Range Migration Algorithm to GPUs for three-dimensional (3D) cylindrical synthetic aperture radar (CSAR) imaging struggles to achieve real-time performance, because the architecture and programming models of GPUs differ significantly from those of CPUs. This paper proposes a GPU-optimized implementation for accelerating CSAR imaging. The proposed method first exploits concentric-square-grid (CSG) interpolation to reduce the computational complexity of reconstructing a uniform 2D wave-number domain. Although the CSG method transforms the 2D traversal interpolation into two independent 1D interpolations, the interval search that determines the interpolation positions still imposes a substantial computational burden. Therefore, binary search replaces the traditional point-to-point matching to improve efficiency. Additionally, leveraging the partition independence of the CSG grid distribution, the 360° data are divided into four streams along the diagonals for parallel processing. Furthermore, high-speed shared memory is used instead of high-latency global memory for the Hadamard product in the phase compensation stage. Experimental results demonstrate that the proposed method completes CSAR imaging on a 1440×100×128 dataset in 0.794 s, with an acceleration ratio of 35.09 over the CPU implementation and 5.97 over a conventional GPU implementation. Full article
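The interval-search idea can be sketched in a few lines of NumPy: `np.searchsorted` performs the binary search that replaces a linear point-to-point scan when locating the bracketing interval for each interpolation target. The data below are synthetic, and the linear interpolation only illustrates the resampling step.

```python
import numpy as np

# Synthetic nonuniform samples and a uniform target grid.
k_grid = np.sort(np.random.uniform(0.0, 10.0, 256))   # nonuniform sample positions
values = np.sin(k_grid)                                # samples at those positions
k_new = np.linspace(0.5, 9.5, 1000)                    # uniform target grid

# np.searchsorted performs a binary search (O(log N) per query) to find
# the bracketing interval for every target point at once.
idx = np.searchsorted(k_grid, k_new, side="right") - 1
idx = np.clip(idx, 0, len(k_grid) - 2)
t = (k_new - k_grid[idx]) / (k_grid[idx + 1] - k_grid[idx])
interp = (1 - t) * values[idx] + t * values[idx + 1]   # linear interpolation
```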

16 pages, 2672 KiB  
Article
AI-Powered Convolutional Neural Network Surrogate Modeling for High-Speed Finite Element Analysis in the NPPs Fuel Performance Framework
by Salvatore A. Cancemi, Andrius Ambrutis, Mantas Povilaitis and Rosa Lo Frano
Energies 2025, 18(10), 2557; https://doi.org/10.3390/en18102557 - 15 May 2025
Viewed by 770
Abstract
Convolutional Neural Networks (CNNs) are proposed for use in the nuclear power plant domain as surrogate models to enhance the computational efficiency of finite element analyses in simulating nuclear fuel behavior under varying conditions. The dataset comprises 3D fuel pellet FE models and involves 13 input features, such as pressure, Young’s modulus, and temperature. CNNs predict outcomes like displacement, von Mises stress, and creep strain from these inputs, significantly reducing the simulation time from several seconds per analysis to approximately one second. The data are normalized using local and global min–max scaling to maintain consistency across inputs and outputs, facilitating accurate model learning. The CNN architecture includes multiple dense, reshaping, and transpose convolution layers, optimized through a brute-force hyperparameter tuning process and validated using a 5-fold cross-validation approach. The study employs the Adam optimizer, with a significant reduction in computational time highlighted using a GPU, which outperforms traditional CPUs significantly in training speed. The findings suggest that integrating CNN models into nuclear fuel analysis can drastically reduce computational times while maintaining accuracy, making them valuable for real-time monitoring and decision-making within nuclear power plant operations. Full article
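The layer pattern described in the abstract (dense, reshape, transposed convolutions mapping 13 scalar features to field predictions) can be sketched as follows; the layer sizes, output resolution, and the `PelletSurrogate` class are illustrative assumptions rather than the paper's architecture.

```python
import torch
from torch import nn

# Sketch of a dense -> reshape -> transposed-convolution surrogate that maps
# 13 scalar inputs to a 2D field (e.g., von Mises stress). Sizes are arbitrary.
class PelletSurrogate(nn.Module):
    def __init__(self, n_inputs=13):
        super().__init__()
        self.dense = nn.Sequential(nn.Linear(n_inputs, 64 * 4 * 4), nn.ReLU())
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))              # 8 -> 16
    def forward(self, x):
        return self.deconv(self.dense(x).view(-1, 64, 4, 4))

# Min-max scaling so inputs lie in [0, 1], as mentioned in the abstract.
def minmax(x, lo, hi):
    return (x - lo) / (hi - lo)

raw = torch.rand(8, 13) * 600.0            # dummy pressure/modulus/temperature values
features = minmax(raw, 0.0, 600.0)
print(PelletSurrogate()(features).shape)   # torch.Size([8, 1, 16, 16])
```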

14 pages, 2851 KiB  
Article
Asynchronized Jacobi Solver on Heterogeneous Mobile Devices
by Ziqiang Liao, Xiayun Hong, Yao Cheng, Liyan Chen, Xuan Cheng and Juncong Lin
Electronics 2025, 14(9), 1768; https://doi.org/10.3390/electronics14091768 - 27 Apr 2025
Viewed by 303
Abstract
Many vision and graphics applications involve efficiently solving various linear systems, which has been a popular research topic for decades. As mobile devices become increasingly widespread, designing a high-performance solver tailored for them, to ensure the smooth migration of applications from PC to mobile devices, has become urgent. However, the unique features of mobile devices present new challenges. Mainstream mobile devices are equipped with heterogeneous multiprocessor systems-on-chips (MPSoCs), which consist of processors with different architectures and performance levels. Designing algorithms that push the limits of MPSoCs is attractive yet difficult: different cores are suitable for different tasks, and data sharing among cores can easily neutralize performance gains. Fortunately, the comparable performance of CPUs and GPUs on MPSoCs makes these heterogeneous systems promising compared to their counterparts on PCs. This paper is devoted to a high-performance mobile linear solver for sparse systems with a tailored asynchronous algorithm that fully exploits the computing power of heterogeneous processors on mobile devices while alleviating the data-sharing overhead. Comprehensive evaluations are performed, with an in-depth discussion to shed light on the future design of other numerical solvers. Full article
(This article belongs to the Special Issue Ubiquitous Computing and Mobile Computing)
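For context, a plain synchronous Jacobi sweep looks like the sketch below; the paper's contribution is an asynchronous variant in which CPU and GPU cores of an MPSoC update their own row blocks without global synchronization, which this NumPy toy does not attempt to reproduce.

```python
import numpy as np

# Baseline (synchronous) Jacobi iteration for A x = b.
def jacobi(A, b, iters=200):
    D = np.diag(A)
    R = A - np.diagflat(D)            # off-diagonal part
    x = np.zeros_like(b)
    for _ in range(iters):
        x = (b - R @ x) / D           # x_i^(k+1) = (b_i - sum_{j!=i} a_ij x_j^(k)) / a_ii
    return x

n = 100
A = np.random.rand(n, n) + n * np.eye(n)   # diagonally dominant -> convergence
b = np.random.rand(n)
x = jacobi(A, b)
print(np.linalg.norm(A @ x - b))           # residual should be small
```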

15 pages, 4952 KiB  
Article
Novel Research on a Finite-Difference Time-Domain Acceleration Algorithm Based on Distributed Cluster Graphic Process Units
by Xinbo He, Shenggang Mu, Xudong Han and Bing Wei
Appl. Sci. 2025, 15(9), 4834; https://doi.org/10.3390/app15094834 - 27 Apr 2025
Viewed by 352
Abstract
In computational electromagnetics, the finite-difference time-domain (FDTD) method is recognized for its volumetric discretization approach. However, it can be computationally demanding when addressing large-scale electromagnetic problems. This paper introduces a novel approach by incorporating Graphic Process Units (GPUs) into an FDTD algorithm. It leverages the Compute Unified Device Architecture (CUDA) along with OpenMPI and the NVIDIA Collective Communications Library (NCCL) to establish a parallel scheme for the FDTD algorithm in distributed cluster GPUs. This approach enhances the computational efficiency of the FDTD algorithm by circumventing data relaying by the CPU and the limitations of the PCIe bus. The improved efficiency renders the FDTD algorithm a more practical and efficient solution for real-world electromagnetic problems. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
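A one-dimensional FDTD (Yee) update loop, shown below as a sketch, illustrates the stencil that the paper parallelizes; the real workload is three-dimensional and distributed across cluster GPUs with CUDA, OpenMPI, and NCCL, none of which appears in this toy.

```python
import numpy as np

# 1D FDTD leapfrog update with a normalized Courant number of 0.5.
nz, nt = 400, 800
ez = np.zeros(nz)          # electric field
hy = np.zeros(nz - 1)      # magnetic field (staggered grid)

for t in range(nt):
    hy += 0.5 * (ez[1:] - ez[:-1])                 # update H from the curl of E
    ez[1:-1] += 0.5 * (hy[1:] - hy[:-1])           # update E from the curl of H
    ez[nz // 4] += np.exp(-((t - 30) / 10) ** 2)   # soft Gaussian source
```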

17 pages, 4496 KiB  
Article
Accelerated Method for Simulating the Solidification Microstructure of Continuous Casting Billets on GPUs
by Jingjing Wang, Xiaoyu Liu, Yuxin Li and Ruina Mao
Materials 2025, 18(9), 1955; https://doi.org/10.3390/ma18091955 - 25 Apr 2025
Viewed by 293
Abstract
Microstructure simulations of continuous casting billets are vital for understanding solidification mechanisms and optimizing process parameters. However, the commonly used CA (Cellular Automaton) model is limited by grid anisotropy, which affects the accuracy of dendrite morphology simulations. While the DCSA (Decentered Square Algorithm) reduces anisotropy, its high computational cost, due to the use of fine grids and dynamic liquid/solid interface tracking, hinders large-scale applications. To address this, we propose a high-performance CA-DCSA method on GPUs (Graphics Processing Units). The CA-DCSA algorithm is first refactored and implemented on a CPU–GPU heterogeneous architecture for efficient acceleration. Subsequently, key optimizations, including memory access management and warp divergence reduction, are proposed to enhance GPU utilization. Finally, the simulated results are validated against industrial experiments, with relative errors of 2.5% (equiaxed crystal ratio) and 2.3% (average secondary dendrite arm spacing) in 65# steel, and 2.1% and 0.7% in 60# steel. The maximum temperature difference in 65# steel is 1.8 °C. Compared to the serial implementation, the GPU-accelerated method achieves a 1430× speedup using two GPUs. This work provides a powerful tool for detailed microstructure observation and process parameter optimization in continuous casting billets. Full article

21 pages, 1316 KiB  
Article
Implementing a Hybrid Quantum Neural Network for Wind Speed Forecasting: Insights from Quantum Simulator Experiences
by Ying-Yi Hong and Jay Bhie D. Santos
Energies 2025, 18(7), 1771; https://doi.org/10.3390/en18071771 - 1 Apr 2025
Viewed by 512
Abstract
The intermittent nature of wind speed poses challenges for its widespread utilization as an electrical power generation source. As the integration of wind energy into the power system increases, accurate wind speed forecasting becomes crucial, and the reliable scheduling of wind power generation heavily relies on precise wind speed forecasts. This paper presents extended work on a hybrid model for 24 h ahead wind speed forecasting. The proposed model combines a residual Long Short-Term Memory (LSTM) network with a quantum neural network evaluated on a quantum simulator, leveraging NVIDIA's Compute Unified Device Architecture (CUDA). To ensure the desired accuracy, a comparative analysis is conducted, examining the qubit count and quantum circuit depth of the proposed model. The execution time required for the model is significantly reduced when the simulator runs on a CUDA-enabled GPU, accounting for only 8.29% of the time required by a classical CPU. In addition, different quantum embedding layers with various entangler layers in the quantum neural network are explored. Simulation results on an offshore wind farm dataset demonstrate that a proper choice of qubit count and embedding layer can achieve favorable 24 h ahead wind speed forecasts. Full article
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)
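A representative variational circuit for the quantum part of such a hybrid model can be written in PennyLane as below; the qubit count, depth, the `AngleEmbedding`/`BasicEntanglerLayers` choice, and the default simulator device are assumptions for illustration and do not reproduce the paper's LSTM front-end or its CUDA-accelerated simulator configuration.

```python
import pennylane as qml
import numpy as np

# Angle-embedding layer followed by entangler layers; expectation values of
# Pauli-Z on each wire serve as the layer's outputs.
n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)   # a GPU-backed simulator could be
                                                    # swapped in here (assumption)

@qml.qnode(dev)
def quantum_layer(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weights = np.random.uniform(0, np.pi, size=(n_layers, n_qubits))
features = np.array([0.1, 0.5, 0.9, 1.3])           # e.g., features from the classical front-end
print(quantum_layer(features, weights))
```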

20 pages, 732 KiB  
Article
VCONV: A Convolutional Neural Network Accelerator for FPGAs
by Srikanth Neelam and A. Amalin Prince
Electronics 2025, 14(4), 657; https://doi.org/10.3390/electronics14040657 - 8 Feb 2025
Cited by 2 | Viewed by 1228
Abstract
Field Programmable Gate Arrays (FPGAs), with their wide portfolio of configurable resources such as Look-Up Tables (LUTs), Block Random Access Memory (BRAM), and Digital Signal Processing (DSP) blocks, are the best option for custom hardware designs. Their low power consumption and cost-effectiveness give them an advantage over Graphics Processing Units (GPUs) and Central Processing Units (CPUs) in providing efficient accelerator solutions for compute-intensive Convolutional Neural Network (CNN) models. CNN accelerators are dedicated hardware modules capable of performing compute operations such as convolution, activation, normalization, and pooling with minimal intervention from a host. Designing accelerators for deeper CNN models requires FPGAs with vast resources, which undermines their advantages in terms of power and price. In this paper, we propose the VCONV Intellectual Property (IP), an efficient and scalable CNN accelerator architecture for applications where power and cost are constraints. VCONV, with its configurable design, can be deployed across multiple smaller FPGAs instead of a single large FPGA to provide better control over cost and parallel processing, and it can be distributed across heterogeneous FPGAs depending on the performance requirements of each layer. The IP's performance can be evaluated using embedded monitors to ensure that the accelerator is configured for the best performance. VCONV can be configured for data type format, convolution engine (CE) and convolution unit (CU) configurations, as well as the sequence of operations based on the CNN model and layer. VCONV is interfaced through the Advanced Peripheral Bus (APB) for configuration and the Advanced eXtensible Interface (AXI) stream for data transfers. The IP was implemented and validated on the Avnet Zedboard and tested on the first layer of AlexNet, VGG16, and ResNet18 with multiple CE configurations, demonstrating 100% MAC-unit utilization with no idle time. We also synthesized the multiple VCONV instances required for AlexNet, achieving a BRAM utilization of just 1.64 Mb and delivering a performance of 56 GOPS. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 3rd Edition)
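The multiply-accumulate (MAC) work that a convolution engine/unit performs can be made explicit with a naive loop nest, sketched below in Python; an accelerator such as VCONV maps these loops onto parallel hardware MAC units, which this sketch does not model.

```python
import numpy as np

# Explicit MAC loops of a 2D convolution (no padding, stride 1). Sizes are arbitrary.
def conv2d(x, w):                 # x: (C_in, H, W), w: (C_out, C_in, K, K)
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for co in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                y[co, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[co])  # MAC operations
    return y

x = np.random.rand(3, 8, 8)
w = np.random.rand(4, 3, 3, 3)
print(conv2d(x, w).shape)         # (4, 6, 6)
```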

35 pages, 2222 KiB  
Article
Multithreaded and GPU-Based Implementations of a Modified Particle Swarm Optimization Algorithm with Application to Solving Large-Scale Systems of Nonlinear Equations
by Bruno Silva, Luiz Guerreiro Lopes and Fábio Mendonça
Electronics 2025, 14(3), 584; https://doi.org/10.3390/electronics14030584 - 1 Feb 2025
Viewed by 850
Abstract
This paper presents a novel Graphics Processing Unit (GPU) accelerated implementation of a modified Particle Swarm Optimization (PSO) algorithm specifically designed to solve large-scale Systems of Nonlinear Equations (SNEs). The proposed GPU-based parallel version of the PSO algorithm uses the inherent parallelism of modern hardware architectures. Its performance is compared against both sequential and multithreaded Central Processing Unit (CPU) implementations. The primary objective is to evaluate the efficiency and scalability of PSO across different hardware platforms with a focus on solving large-scale SNEs involving thousands of equations and variables. The GPU-parallelized and multithreaded versions of the algorithm were implemented in the Julia programming language. Performance analyses were conducted on an NVIDIA A100 GPU and an AMD EPYC 7643 CPU. The tests utilized a set of challenging, scalable SNEs with dimensions ranging from 1000 to 5000. Results demonstrate that the GPU accelerated modified PSO substantially outperforms its CPU counterparts, achieving substantial speedups and consistently surpassing the highly optimized multithreaded CPU implementation in terms of computation time and scalability as the problem size increases. Therefore, this work evaluates the trade-offs between different hardware platforms and underscores the potential of GPU-based parallelism for accelerating SNE solvers. Full article
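A serial NumPy sketch of the underlying approach, assuming the standard PSO update rules and a toy two-equation system recast as minimizing ‖F(x)‖²; the paper's modified PSO, its Julia implementation, and the GPU/multithreaded parallelization are not reproduced here.

```python
import numpy as np

def F(x):                                   # toy 2-equation nonlinear system
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])

def fitness(x):
    return np.sum(F(x)**2)                  # PSO minimizes the squared residual

rng = np.random.default_rng(1)
n_particles, dim, iters = 40, 2, 300
w, c1, c2 = 0.7, 1.5, 1.5                   # inertia and acceleration coefficients
pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([fitness(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print(gbest, fitness(gbest))                # approaches a root, e.g., (±√2, ±√2)
```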

33 pages, 19016 KiB  
Article
Multitask Learning-Based Pipeline-Parallel Computation Offloading Architecture for Deep Face Analysis
by Faris S. Alghareb and Balqees Talal Hasan
Computers 2025, 14(1), 29; https://doi.org/10.3390/computers14010029 - 20 Jan 2025
Viewed by 1817
Abstract
Deep Neural Networks (DNNs) have been widely adopted in several advanced artificial intelligence applications due to their accuracy, which is competitive with that of the human brain. Nevertheless, the superior accuracy of a DNN is achieved at the expense of intensive computation and storage complexity, requiring custom expandable hardware, i.e., graphics processing units (GPUs). Interestingly, leveraging the synergy of parallelism and edge computing can significantly improve CPU-based hardware platforms. Therefore, this manuscript explores levels of parallelism techniques along with edge computation offloading to develop an innovative hardware platform that improves the efficacy of deep learning computing architectures. Furthermore, the multitask learning (MTL) approach is employed to construct a parallel multi-task classification network. These tasks include face detection and recognition, age estimation, gender recognition, smile detection, and hair color and style classification. Additionally, both pipeline and parallel processing techniques are utilized to expedite complicated computations, boosting the overall performance of the presented deep face analysis architecture. A computation offloading approach, on the other hand, is leveraged to distribute computation-intensive tasks to the server edge, whereas lightweight computations are offloaded to edge devices, i.e., the Raspberry Pi 4. To train the proposed deep face analysis network architecture, two custom datasets (HDDB and FRAED) were created for head detection and face-age recognition. Extensive experimental results demonstrate the efficacy of the proposed pipeline-parallel architecture in terms of execution time: it requires 8.2 s to provide detailed face detection and analysis for an individual and 23.59 s for an inference containing 10 individuals. Moreover, a speedup of 62.48% is achieved compared to the sequential edge computing architecture, and a 25.96% acceleration is realized when implementing the proposed pipeline-parallel architecture only on the server edge, compared to the sequential server implementation. Considering classification efficiency, the proposed classification modules achieve an accuracy of 88.55% for hair color and style classification and a remarkable prediction outcome of 100% for face recognition and age estimation. To summarize, the proposed approach helps reduce the required execution time and memory capacity by processing all facial tasks simultaneously on a single deep neural network rather than building a CNN model for each task. Therefore, the presented pipeline-parallel architecture can be a cost-effective framework for real-time computer vision applications implemented on resource-limited devices. Full article
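Pipeline parallelism of the kind described can be sketched with two threads and a queue: one stage detects faces while the next analyzes the previous frame's detections. The stage contents below are placeholders, not the paper's models.

```python
import queue
import threading

# Toy two-stage pipeline: detection feeds analysis through a queue, so stage 2
# processes frame i while stage 1 is already working on frame i+1.
q12 = queue.Queue()

def stage1_detect(frames):
    for f in frames:
        q12.put(f"faces of frame {f}")      # placeholder for detection output
    q12.put(None)                           # sentinel: no more work

def stage2_analyze():
    while (item := q12.get()) is not None:
        print("analyzing", item)            # placeholder for age/gender/... heads

t1 = threading.Thread(target=stage1_detect, args=(range(5),))
t2 = threading.Thread(target=stage2_analyze)
t1.start(); t2.start(); t1.join(); t2.join()
```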

23 pages, 1890 KiB  
Article
Physics-Informed Neural Networks for Modal Wave Field Predictions in 3D Room Acoustics
by Stefan Schoder
Appl. Sci. 2025, 15(2), 939; https://doi.org/10.3390/app15020939 - 18 Jan 2025
Cited by 1 | Viewed by 1912
Abstract
The generalization of Physics-Informed Neural Networks (PINNs) used to solve the inhomogeneous Helmholtz equation in a simplified three-dimensional room is investigated. PINNs are appealing since they can efficiently integrate a partial differential equation and experimental data by minimizing a loss function. However, a previous study experienced limitations in acoustics regarding the source term. A challenging but realistic excitation case is a confined (e.g., single-point) excitation area, yielding a spatial wave field that varies periodically with the wavelength. Compared to studies using smooth (unrealistic) sound excitation, the network's generalization capabilities regarding a realistic sound excitation are addressed. Different methods, such as hyperparameter optimization, adaptive refinement, Fourier feature engineering, and locally adaptive activation functions with slope recovery, are tested to tailor the PINN's accuracy to an experimentally validated finite element analysis reference solution computed with openCFS. The hyperparameter study and optimization cover the network depth and width, the learning rate, the activation functions, and the deep learning backends (PyTorch 2.5.1, TensorFlow 2.18.0, and JAX 0.4.39). A modified (feature-engineered) PINN architecture was designed using input feature engineering to include the dispersion relation of the wave in the neural network. For smoothly (unrealistically) distributed sources, it was shown that the standard PINNs and the feature-engineered PINN converge to the analytic solution, with relative errors of 0.28% and 2×10⁻⁴%, respectively. The locally adaptive activation functions with slope recovery lead to a relative error of 0.086% with a source sharpness of s = 1 m, and similar relative errors were obtained for the case s = 0.2 m using adaptive refinement. The feature-engineered PINN significantly outperformed the results of previous studies regarding accuracy. Furthermore, Bayesian hyperparameter optimization reduced the trainable parameters to a fraction (around 5%) of the standard PINN formulation and likewise reduced the training time (to around 3%). By narrowing the excitation towards a single point, the convergence rate and the minimum errors obtained by all presented network architectures increased: the feature-engineered architecture yielded a one-order-of-magnitude lower accuracy of 0.20%, compared to 0.019% for the standard PINN formulation, with a source sharpness of s = 1 m. It outperformed the finite element analysis and the standard PINN in terms of the time needed to obtain the solution, with the FEM needing 15 min and 30 s on an AMD Ryzen 7 Pro 8840HS CPU (AMD, Santa Clara, CA, USA), compared to about 20 min for the standard PINN and just under a minute for the feature-engineered PINN, both trained on a Tesla T4 GPU (NVIDIA, Santa Clara, CA, USA). Full article
(This article belongs to the Special Issue Artificial Intelligence in Acoustic Simulation and Design)
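A skeleton of a PINN residual loss for the inhomogeneous Helmholtz equation is sketched below; the network size, wavenumber `k`, Gaussian source, and the omission of boundary terms are simplifying assumptions, and none of the paper's feature engineering or adaptive activations is included.

```python
import torch
from torch import nn

# PINN residual for ∇²p + k²p = -f in 3D, evaluated at random collocation points.
net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
k = 2.0                                        # wavenumber (assumed)

def source(x):                                 # confined Gaussian excitation (assumed)
    return torch.exp(-((x - 0.5)**2).sum(dim=1, keepdim=True) / 0.01)

def pde_residual(x):
    x = x.requires_grad_(True)
    p = net(x)
    grads = torch.autograd.grad(p, x, torch.ones_like(p), create_graph=True)[0]
    lap = 0.0
    for i in range(3):                         # Laplacian = sum of second derivatives
        lap = lap + torch.autograd.grad(grads[:, i], x, torch.ones_like(grads[:, i]),
                                        create_graph=True)[0][:, i:i + 1]
    return lap + k**2 * p + source(x)          # should vanish at collocation points

x_col = torch.rand(1024, 3)                    # collocation points in the unit room
loss = (pde_residual(x_col)**2).mean()         # + boundary-condition terms in practice
loss.backward()
```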

22 pages, 7428 KiB  
Article
An Integrated Model for Dam Break Flood Including Reservoir Area, Breach Evolution, and Downstream Flood Propagation
by Huiwen Liu, Zhongxiang Wang, Dawei Zhang and Liyun Xiang
Appl. Sci. 2024, 14(23), 10921; https://doi.org/10.3390/app142310921 - 25 Nov 2024
Viewed by 1473
Abstract
The reasonable and efficient prediction of dam failure events is of great significance for emergency rescue operations and for reducing dam failure losses. This work presents a physically based breach model coupled with a multi-architecture (multi-CPU and GPU) open-source two-dimensional flood model driven by high-precision terrain and land use data, with the aim of enhancing the accuracy of dam break flood process simulations. The model uses DEM data as the computational grid and updates it at each time step to reflect breach evolution. Simultaneously, the breach evolution model incorporates an analysis of the stress on sediment particles, establishing the initial erosion state and a lateral expansion model while accounting for seepage. The overflow through the breach is determined by the two-dimensional hydrodynamic model. This approach achieves a robust connection between the upstream reservoir, the dam structure, and the downstream inundation area. The coupled model is used to calculate the failure of earth-rock dams and landslide dams, and a sensitivity analysis is conducted. The Taum Sauk Dam and the Tangjiashan landslide dam were selected to represent earth dam breaks and barrier lake breaks, respectively, which are the main types of dam breaks. The obtained results show strong agreement with the measured data: the relative errors of four important parameters of the application cases (the peak discharge of the breach, the top width of the final breach, the depth of the breach, and the arrival time of the maximum peak discharge) are all within ±10%. Although the relative error of the completion time of the final breach exceeds 10%, it is about 30% smaller than the relative error of the physical model. Full article
(This article belongs to the Section Earth Sciences)

20 pages, 3466 KiB  
Article
Symmetric Tridiagonal Eigenvalue Solver Across CPU Graphics Processing Unit (GPU) Nodes
by Erika Hernández-Rubio, Alberto Estrella-Cruz, Amilcar Meneses-Viveros, Jorge Alberto Rivera-Rivera, Liliana Ibeth Barbosa-Santillán and Sergio Víctor Chapa-Vergara
Appl. Sci. 2024, 14(22), 10716; https://doi.org/10.3390/app142210716 - 19 Nov 2024
Cited by 1 | Viewed by 1096
Abstract
In this work, an improved and scalable implementation of Cuppen's algorithm for diagonalizing symmetric tridiagonal matrices is presented. This approach uses a hybrid-heterogeneous parallelization technique, taking advantage of GPUs and CPUs in a distributed hardware architecture. Cuppen's algorithm is a powerful tool in various scientific and engineering applications and a key player in matrix diagonalization, finding use in Density Functional Theory (DFT) and Spectral Clustering. This highly efficient and numerically stable algorithm computes eigenvalues and eigenvectors of symmetric tridiagonal matrices, making it a crucial component in many computational methods. One of the challenges in parallelizing algorithms for GPUs is their limited memory capacity. However, we overcome this limitation by utilizing multiple nodes with both CPUs and GPUs. This enables us to solve subproblems that fit within the memory of each device in parallel and subsequently combine these subproblems to obtain the complete solution. The hybrid-heterogeneous approach proposed in this work outperforms state-of-the-art libraries while maintaining a high degree of accuracy in terms of orthogonality and quality of the eigenvectors. Furthermore, the sequential version of the algorithm with our approach demonstrates superior performance and potential for practical use. The experiments carried out verify that the implementation's performance scales by 2× when using two graphics cards in the same node. Notably, symmetric tridiagonal eigenvalue solvers are fundamental to solving more general eigenvalue problems, and the divide-and-conquer approach employed in this implementation can be extended to singular value solvers. Given the wide range of eigenvalue problems encountered in scientific and engineering domains, this work is essential in advancing computational methods for efficient and accurate matrix diagonalization. Full article
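The divide step of Cuppen's algorithm can be sketched in a few lines of NumPy: the tridiagonal matrix splits into two smaller tridiagonal blocks plus a rank-one correction, and each block can be diagonalized independently (in the paper, on different CPU/GPU nodes) before the conquer step recombines them via the secular equation. The conquer step is omitted here.

```python
import numpy as np

# Divide step of Cuppen's algorithm on a random symmetric tridiagonal matrix.
rng = np.random.default_rng(0)
n, m = 8, 4                      # matrix size and split point
d = rng.normal(size=n)           # diagonal
e = rng.normal(size=n - 1)       # off-diagonal
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)

b = e[m - 1]                     # coupling element removed by the split
T1 = T[:m, :m].copy();  T1[-1, -1] -= b
T2 = T[m:, m:].copy();  T2[0, 0] -= b
v = np.zeros(n); v[m - 1] = v[m] = 1.0

# T equals blockdiag(T1, T2) plus the rank-one update b * v v^T.
T_rebuilt = np.block([[T1, np.zeros((m, n - m))],
                      [np.zeros((n - m, m)), T2]]) + b * np.outer(v, v)
assert np.allclose(T, T_rebuilt)

# Each block is diagonalized independently; the eigenpairs then feed the
# rank-one update (secular equation) solve, not shown here.
print(np.linalg.eigvalsh(T1), np.linalg.eigvalsh(T2))
```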
