Search Results (72)

Search Parameters:
Keywords = Compute Unified Device Architecture (CUDA)

20 pages, 7789 KB  
Article
Simulation and Analysis of the Second-Order Memristive System in the CUDAynamics Suite
by Alexander Khanov, Maksim Gozhan, Denis Butusov, Yulia Bobrova and Valerii Ostrovskii
Algorithms 2026, 19(5), 402; https://doi.org/10.3390/a19050402 - 17 May 2026
Viewed by 257
Abstract
Cycle-to-cycle variability of switching parameters inherent to memristive devices introduces significant problems in the design of neuromorphic systems and non-volatile memory. This study investigates the dynamics of a second-order memristive system incorporating capacitive effects that model parasitic charge within individual memristors, addressing both the technical need for accurate analysis of complex regimes and the demand for exploratory environments. Simulations were performed using CUDAynamics, an interactive software suite developed by the authors, which utilizes parallel computing, primarily via NVIDIA Compute Unified Device Architecture (CUDA). It integrates multiple analysis tools for dynamical systems, including bifurcation diagrams, the largest Lyapunov exponent and periodicity mapping, and interactive navigation in multidimensional parameter spaces. The memristive system was discretized applying multiple integration methods with a fixed time step and various waveforms of the input signal. Analysis tools revealed well-defined regions of chaotic dynamics in the memristor resistance parameter space as functions of input signal properties. Sinusoidal and triangular waveforms produced topologically similar distributions of dynamical regimes, whereas the square waveform, mimicking digital inputs, generated distinct dynamical patterns while still preserving chaotic trajectories under specific conditions. Interactive visualization capabilities of CUDAynamics effectively demonstrate attractor evolution and hysteresis deformation, providing immediate visual feedback that significantly enhances conceptual comprehension of nonlinear feedback mechanisms. Beyond its practical implications for the design of analog and digital memristive devices, CUDAynamics offers a scalable, open-source toolkit to aid researchers and engineers in exploring complex dynamical phenomena. Full article
(This article belongs to the Special Issue Recent Advances in Numerical Algorithms and Their Applications)

20 pages, 23906 KB  
Article
Improved Depth Imaging of the Chicxulub Impact Crater by GPU-Accelerated Adjoint Reverse Time Migration
by Jesús Antonio Herrera-Pérez, Jose Carlos Ortíz-Alemán, Sebastián López-Juárez, Jhonatan Fernando Eulopa-Hernandez, Carlos Couder-Castañeda, Isaac Medina-Sanchez, Jairo Olguin-Roque and Diego Alfredo Padilla-Pérez
Symmetry 2026, 18(4), 658; https://doi.org/10.3390/sym18040658 - 15 Apr 2026
Viewed by 541
Abstract
Reverse time migration (RTM) exploits time-reversal symmetry and adjoint duality to focus wavefields and reconstruct subsurface reflectivity, but large surveys remain limited by the cost of forward and backward propagation. We present a Graphics Processing Unit (GPU)-accelerated adjoint RTM workflow for depth imaging of the Chicxulub impact structure using the marine A0/A1 composite profile (1996). The processed stacked section contains 14,172 traces with 6.25 m Common Depth Point (CDP) spacing, 1 ms sampling, and 18 s record length. Forward and adjoint wavefields are computed with a staggered-grid finite-difference scheme (fourth order in space, second in time) and Convolutional Perfectly Matched Layers (CPMLs), which provide stable finite-domain simulations while introducing controlled symmetry breaking through absorption. The solver is verified with the Lamb half-space analytical benchmark and applied through five interpretation-guided velocity/density updates. The final depth image improves reflector continuity and interpretability of crater-scale elements, including post-impact sedimentary fill, melt and breccia units, terrace fault blocks, and deep uplift-related structure. Compute Unified Device Architecture (CUDA) acceleration reduces runtime from ∼32.36 h on a CPU baseline to ∼34.10 min on an RTX 3070 (≈56.9×), enabling practical, reproducible iterative RTM on accessible hardware. Full article
(This article belongs to the Special Issue Symmetry/Asymmetry in Numerical Analysis and Scientific Computing)

24 pages, 504 KB  
Article
Feasibility Study of CUDA-Accelerated Homomorphic Encryption and Benchmarking on Consumer-Grade and Embedded GPUs
by Volodymyr Dubetskyy and Maria-Dolores Cano
Big Data Cogn. Comput. 2026, 10(3), 79; https://doi.org/10.3390/bdcc10030079 - 6 Mar 2026
Viewed by 1467
Abstract
Fully Homomorphic Encryption (FHE) provides strong data confidentiality during computation but often suffers from high latency on Central Processing Units (CPUs). This study evaluates Graphics Processing Unit (GPU) acceleration for modern FHE libraries across a laptop (NVIDIA GTX 1650 Ti), a server (NVIDIA RTX 4060), and a Jetson Nano 2 GB embedded GPU. We benchmark key generation, arithmetic operations, Boolean-gate evaluation and scheme-specific tasks such as relinearization and key switching, using library-provided benchmarks with an explicit baseline (operation scope, timing boundaries, and parameter tuples). Moreover, we compare GPU-native libraries (NuFHE, Phantom-FHE, and Troy-Nova) with CPU-oriented ones (Microsoft SEAL, HElib, OpenFHE, Cupcake, and TFHE-rs). Results show GPUs deliver significant speedups for targeted operations. For example, NuFHE’s NVIDIA CUDA (Compute Unified Device Architecture) backend achieves about 1.4× faster Boolean-gate evaluation on the laptop and 3.4× faster on the server compared to its OpenCL backend. Likewise, RLWE (Ring Learning With Errors)-based schemes (BFV, CKKS, and BGV) see marked gains for polynomial arithmetic such as Number Theoretic Transform (NTT) when executed via Phantom-FHE. However, attempts to add CUDA support to Microsoft SEAL reveal four main challenges: high-precision modular arithmetic on GPUs, sequential dependencies in SEAL’s design, limited GPU memory and complex build-system changes. In light of these findings, we propose revised guidelines for GPU-first FHE libraries and practical recommendations for deploying high-throughput, privacy-preserving solutions on modern GPUs. Full article
(This article belongs to the Section Big Data)

22 pages, 5297 KB  
Article
A Space-Domain Gravity Forward Modeling Method Based on Voxel Discretization and Multiple Observation Surfaces
by Rui Zhang, Guiju Wu, Jiapei Wang, Yufei Xi, Fan Wang and Qinhong Long
Symmetry 2026, 18(1), 180; https://doi.org/10.3390/sym18010180 - 19 Jan 2026
Viewed by 791
Abstract
Geophysical forward modeling serves as a fundamental theoretical approach for characterizing subsurface structures and material properties, essentially involving the computation of gravity responses at surface or spatial observation points based on a predefined density distribution. With the rapid development of data-driven techniques such as deep learning in geophysical inversion, forward algorithms are facing increasing demands in terms of computational scale, observable types, and efficiency. To address these challenges, this study develops an efficient forward modeling method based on voxel discretization, enabling the rapid calculation of gravity anomalies and radial gravity gradients on multiple observational surfaces. Leveraging the parallel computing capabilities of graphics processing units (GPUs), together with tensor acceleration, Compute Unified Device Architecture (CUDA) execution, and just-in-time (JIT) compilation strategies, the method achieves high efficiency and automation in the forward computation process. Numerical experiments conducted on several typical theoretical models demonstrate the convergence and stability of the calculated results, indicating that the proposed method significantly reduces computation time while maintaining accuracy, thus being well-suited for large-scale 3D modeling and fast batch simulation tasks. This research can efficiently generate forward datasets with multi-view and multi-metric characteristics, providing solid data support and a scalable computational platform for deep-learning-based geophysical inversion studies. Full article

35 pages, 11134 KB  
Article
Error Classification and Static Detection Methods in Tri-Programming Models: MPI, OpenMP, and CUDA
by Saeed Musaad Altalhi, Fathy Elbouraey Eassa, Sanaa Abdullah Sharaf, Ahmed Mohammed Alghamdi, Khalid Ali Almarhabi and Rana Ahmad Bilal Khalid
Computers 2025, 14(5), 164; https://doi.org/10.3390/computers14050164 - 28 Apr 2025
Viewed by 2173
Abstract
The growing adoption of supercomputers across various scientific disciplines, particularly by researchers without a background in computer science, has intensified the demand for parallel applications. These applications are typically developed using a combination of programming models within languages such as C, C++, and Fortran. However, modern multi-core processors and accelerators necessitate fine-grained control to achieve effective parallelism, complicating the development process. To address this, developers commonly utilize high-level programming models such as Open Multi-Processing (OpenMP), Open Accelerators (OpenACC), Message Passing Interface (MPI), and Compute Unified Device Architecture (CUDA). These models may be used independently or combined into dual- or tri-model applications to leverage their complementary strengths. However, integrating multiple models introduces subtle and difficult-to-detect runtime errors such as data races, deadlocks, and livelocks that often elude conventional compilers. This complexity is exacerbated in applications that simultaneously incorporate MPI, OpenMP, and CUDA, where the origin of runtime errors, whether from individual models, user logic, or their interactions, becomes ambiguous. Moreover, existing tools are inadequate for detecting such errors in tri-model applications, leaving a critical gap in development support. To address this gap, the present study introduces a static analysis tool designed specifically for tri-model applications combining MPI, OpenMP, and CUDA in C++-based environments. The tool analyzes source code to identify both actual and potential runtime errors prior to execution. Central to this approach is the introduction of error dependency graphs, a novel mechanism for systematically representing and analyzing error correlations in hybrid applications.
By offering both error classification and comprehensive static detection, the proposed tool enhances error visibility and reduces manual testing effort. This contributes significantly to the development of more robust parallel applications for high-performance computing (HPC) and future exascale systems. Full article
(This article belongs to the Special Issue Best Practices, Challenges and Opportunities in Software Engineering)

15 pages, 4952 KB  
Article
Novel Research on a Finite-Difference Time-Domain Acceleration Algorithm Based on Distributed Cluster Graphic Process Units
by Xinbo He, Shenggang Mu, Xudong Han and Bing Wei
Appl. Sci. 2025, 15(9), 4834; https://doi.org/10.3390/app15094834 - 27 Apr 2025
Cited by 1 | Viewed by 1482
Abstract
In computational electromagnetics, the finite-difference time-domain (FDTD) method is recognized for its volumetric discretization approach. However, it can be computationally demanding when addressing large-scale electromagnetic problems. This paper introduces a novel approach by incorporating Graphics Processing Units (GPUs) into an FDTD algorithm. It leverages the Compute Unified Device Architecture (CUDA) along with OpenMPI and the NVIDIA Collective Communications Library (NCCL) to establish a parallel scheme for the FDTD algorithm on distributed GPU clusters. This approach enhances the computational efficiency of the FDTD algorithm by circumventing data relaying by the CPU and the limitations of the PCIe bus. The improved efficiency renders the FDTD algorithm a more practical and efficient solution for real-world electromagnetic problems. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

21 pages, 1316 KB  
Article
Implementing a Hybrid Quantum Neural Network for Wind Speed Forecasting: Insights from Quantum Simulator Experiences
by Ying-Yi Hong and Jay Bhie D. Santos
Energies 2025, 18(7), 1771; https://doi.org/10.3390/en18071771 - 1 Apr 2025
Viewed by 1376
Abstract
The intermittent nature of wind speed poses challenges for its widespread utilization as an electrical power generation source. As the integration of wind energy into the power system increases, accurate wind speed forecasting becomes crucial. The reliable scheduling of wind power generation heavily relies on precise wind speed forecasts. This paper presents an extended work that focuses on a hybrid model for 24 h ahead wind speed forecasting. The proposed model combines residual Long Short-Term Memory (LSTM) and a quantum neural network that is studied by a quantum simulator, leveraging the support of NVIDIA Compute Unified Device Architecture (CUDA). To ensure the desired accuracy, a comparative analysis is conducted, examining the qubit count and quantum circuit depth of the proposed model. The execution time required for the model is significantly reduced when the GPU incorporates CUDA, accounting for only 8.29% of the time required by a classical CPU. In addition, different quantum embedding layers with various entangler layers in the quantum neural network are explored. The simulation results utilizing an offshore wind farm dataset demonstrate that the proper number of qubits and embedding layer can achieve favorable 24 h ahead wind speed forecasts. Full article
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)

23 pages, 16714 KB  
Article
A Geographically Weighted Regression–Compute Unified Device Architecture Approach to Explore the Spatial Agglomeration and Heterogeneity in Arable Land Consumption in Southwest China
by Chang Liu, Tingting Xu, Letao Han, Sapu Du and Aohua Tian
Agriculture 2024, 14(10), 1675; https://doi.org/10.3390/agriculture14101675 - 25 Sep 2024
Cited by 2 | Viewed by 2454
Abstract
Arable land loss has become a critical issue in China because of rapid urbanization, industrial expansion, and unsustainable agricultural practices. While previous studies have explored the factors contributing to this loss, they often fall short in addressing the challenges of spatial heterogeneity and large-scale dataset analysis. This research introduces an innovative approach to geographically weighted regression (GWR) for assessing arable land loss in China, effectively addressing these challenges. Focusing on Chongqing, Guizhou, and Yunnan Provinces over the past two decades, it examines spatial autocorrelation with R-squared values exceeding 0.6 and residuals. Eight factors, including environmental elements (rain, evaporation, slope, digital elevation model) and human activities (distance to city, distance to roads, population, GDP), were analyzed. By visualizing and analyzing R² spatial patterns, the results reveal a clear spatial agglomeration distribution, primarily in urban areas with industries, highly urbanized cities, and flat terrains near rivers, influenced by GDP, population, rain, and slope. The novelty of this study is that it significantly enhances GWR computational capabilities for handling extensive datasets by utilizing Compute Unified Device Architecture (CUDA) on a high-performance GPU cloud server. Simultaneously, it conducts comprehensive analyses of the GWR model’s local results through visualization and spatial autocorrelation tools, enhancing the interpretability of the GWR model. Through spatial clustering analysis of local results, this study enables targeted exploration of factors influencing arable land changes in various temporal and spatial dimensions while also evaluating the reliability of the model results. Full article
(This article belongs to the Section Agricultural Economics, Policies and Rural Management)

27 pages, 4031 KB  
Article
Polarization Characteristics of Massive HVI Debris Clouds Using an Improved Monte Carlo Ray Tracing Method for Remote Sensing Applications
by Guangsen Liu, Peng Rao, Yao Li and Wen Sun
Remote Sens. 2024, 16(16), 2925; https://doi.org/10.3390/rs16162925 - 9 Aug 2024
Viewed by 1964
Abstract
As a signature phenomenon of massive hypervelocity impacts (HVIs) in space, debris clouds provide critical optical information for satellite remote sensing and the assessment of large-scale impacts. However, studies of the optical scattering properties of debris clouds remain limited, and existing vector radiative transfer (VRT) methods struggle to accurately simulate the optical characteristics of these complex scatterers. To address this gap, this paper presents an improved Monte Carlo VRT program (PGS–MC) for multicomponent polydisperse scatterers to precisely evaluate the radiation and polarization characteristics of complex scatterers. Based on the Monte Carlo ray tracing (MCRT) method, our program introduces a particle grouping strategy (PGS) to further emphasize the importance of accounting for optical property discrepancies between different materials and particle sizes, thus significantly improving the fidelity of VRT simulations. Moreover, our program, developed using the compute unified device architecture (CUDA), can be run in parallel on graphics processing units (GPUs), which effectively reduces the computational time. The validation results indicated that the developed PGS–MC program can accurately and efficiently simulate the polarization of complex 3D scatterers. A further investigation showed that the polarization characteristics of debris clouds are highly sensitive to parameters such as the angle between the incident and detection directions, number density, particle size distribution, debris material, and wavelength. In addition, the polarization imaging of debris clouds offers distinct advantages over intensity imaging. This study offers guidance for analyzing the VRT properties of massive HVI debris clouds. Additionally, it provides a practical tool and concrete ideas for modeling the polarization characteristics of various complex scatterers, such as aircraft contrails and clouds. Full article

37 pages, 9513 KB  
Article
Parallel Implicit Solvers for 2D Numerical Models on Structured Meshes
by Yaoxin Zhang, Mohammad Z. Al-Hamdan and Xiaobo Chao
Mathematics 2024, 12(14), 2184; https://doi.org/10.3390/math12142184 - 12 Jul 2024
Cited by 1 | Viewed by 1760
Abstract
This paper presents the parallelization of two widely used implicit numerical solvers for the solution of partial differential equations on structured meshes, namely, the ADI (Alternating-Direction Implicit) solver for tridiagonal linear systems and the SIP (Strongly Implicit Procedure) solver for the penta-diagonal systems. Both solvers were parallelized using CUDA (Compute Unified Device Architecture) Fortran on GPGPUs (General-Purpose Graphics Processing Units). The parallel ADI solver (P-ADI) is based on the Parallel Cyclic Reduction (PCR) algorithm, while the parallel SIP solver (P-SIP) uses the wave front method (WF) following a diagonal line calculation strategy. To map the solution schemes onto the hierarchical block-threads framework of the CUDA on the GPU, the P-ADI solver adopted two mapping methods, one block thread with iterations (OBM-it) and multi-block threads (MBMs), while the P-SIP solver also used two mappings, one conventional mapping using effective WF lines (WF-e) with matrix coefficients and solution variables defined on original computational mesh, and a newly proposed mapping using all WF mesh (WF-all), on which matrix coefficients and solution variables are defined. Both the P-ADI and the P-SIP have been integrated into a two-dimensional (2D) hydrodynamic model, the CCHE2D (Center of Computational Hydroscience and Engineering) model, developed by the National Center for Computational Hydroscience and Engineering at the University of Mississippi. This study compared these two parallel solvers and their efficiency for the first time, using examples and applications in complex geometries, which can provide valuable guidance for future uses of these two parallel implicit solvers in computational fluid dynamics (CFD).
Both parallel solvers demonstrated higher efficiency than their serial counterparts on the CPU (Central Processing Unit): 3.73~4.98 speedup ratio for flow simulations, and 2.166~3.648 speedup ratio for sediment transport simulations. In general, the P-ADI solver is faster than but not as stable as the P-SIP solver; and for the P-SIP solver, the newly developed mapping method WF-all significantly improved the conventional mapping method WF-e. Full article
(This article belongs to the Special Issue Mathematical Modeling and Numerical Simulation in Fluids)

25 pages, 9200 KB  
Article
Bounding Volume Hierarchy-Assisted Fast SAR Image Simulation Based on Spatial Segmentation
by Ke Wu, Guowang Jin, Xin Xiong and Quanjie Shi
Appl. Sci. 2024, 14(8), 3340; https://doi.org/10.3390/app14083340 - 16 Apr 2024
Cited by 3 | Viewed by 2252
Abstract
To improve simulation efficiency while preserving the fidelity of simulated synthetic aperture radar (SAR) images, we propose a BVH-assisted fast SAR image simulation method based on spatial segmentation. The beam scanning model is established based on the range Doppler (RD) imaging geometry, and the bounding volume hierarchy (BVH) algorithm is used to assist in obtaining the time-varying latticed radiation and shadow areas within the radar beam, combining them with the real-time position of the sensors to complete the simulation of the electromagnetic (EM) wave transmission. The ray tracing algorithm is used to calculate the multiple backscatter fields of EM waves, including various material properties of the target surface. The SAR spatial traversal is adopted to spatially segment the latticed radiation area, and the compute unified device architecture (CUDA) kernel function is designed using the echo matrix cell method to treat each cell of the target echo matrix as a subfield of the backscattering field, and the position of the echo matrix cell is traversed to obtain the target backscattering field. The target simulated echo is processed by the RD imaging algorithm to obtain the SAR-simulated image. The simulation results show that compared with a CPU single-thread simulation, the simulation speed of the proposed method is significantly improved, and the SAR simulation image has high structural similarity with the real image, which verifies the effectiveness of the proposed method. Full article
(This article belongs to the Special Issue Advances in Radar Imaging and Signal Processing)

20 pages, 2060 KB  
Article
Turbomachinery GPU Accelerated CFD: An Insight into Performance
by Daniel Molinero-Hernández, Sergio R. Galván-González, Nicolás D. Herrera-Sandoval, Pablo Guzman-Avalos, J. Jesús Pacheco-Ibarra and Francisco J. Domínguez-Mota
Computation 2024, 12(3), 57; https://doi.org/10.3390/computation12030057 - 11 Mar 2024
Cited by 2 | Viewed by 5718
Abstract
Driven by the emergence of Graphics Processing Units (GPUs), the solution of increasingly large and intricate numerical problems has become feasible. Yet, the integration of GPUs into Computational Fluid Dynamics (CFD) codes still presents a significant challenge. This study undertakes an evaluation of the computational performance of GPUs for CFD applications. Two Compute Unified Device Architecture (CUDA)-based implementations within the Open Field Operation and Manipulation (OpenFOAM) environment were employed for the numerical solution of a 3D Kaplan turbine draft tube workbench. A series of tests were conducted to assess the fixed-size grid problem speedup in accordance with Amdahl’s Law. Additionally, tests were performed to identify the optimal configuration utilizing various linear solvers, preconditioners, and smoothers, along with an analysis of memory usage. Full article

33 pages, 11426 KB  
Article
Plant Disease Identification Using Machine Learning Algorithms on Single-Board Computers in IoT Environments
by George Routis, Marios Michailidis and Ioanna Roussaki
Electronics 2024, 13(6), 1010; https://doi.org/10.3390/electronics13061010 - 7 Mar 2024
Cited by 30 | Viewed by 6151
Abstract
This paper investigates the usage of machine learning (ML) algorithms on agricultural images with the aim of extracting information regarding the health of plants. More specifically, a custom convolutional neural network is trained on Google Colab using photos of healthy and unhealthy plants. The trained models are evaluated using various single-board computers (SBCs) that demonstrate different essential characteristics. Raspberry Pi 3 and Raspberry Pi 4 are the current mainstream SBCs that use their Central Processing Units (CPUs) for processing and are used for many applications for executing ML algorithms based on popular related libraries such as TensorFlow. NVIDIA Graphics Processing Units (GPUs) have a different rationale and base the execution of ML algorithms on a GPU that uses a different architecture than a CPU. GPUs can also implement high parallelization on the Compute Unified Device Architecture (CUDA) cores. Another current approach involves using a Tensor Processing Unit (TPU) carried by the Google Coral Dev TPU Board, which is an Application-Specific Integrated Circuit (ASIC) specialized for accelerating ML algorithms such as Convolutional Neural Networks (CNNs) via the usage of TensorFlow Lite. This study experiments with all of the above-mentioned devices and executes custom CNN models with the aim of identifying plant diseases. In this respect, several evaluation metrics are used, including knowledge extraction time, CPU utilization, Random Access Memory (RAM) usage, swap memory, temperature, current (mA), voltage (V), and power consumption (mW). Full article

18 pages, 20257 KB  
Technical Note
Fast Digital Orthophoto Generation: A Comparative Study of Explicit and Implicit Methods
by Jianlin Lv, Guang Jiang, Wei Ding and Zhihao Zhao
Remote Sens. 2024, 16(5), 786; https://doi.org/10.3390/rs16050786 - 24 Feb 2024
Cited by 13 | Viewed by 4365
Abstract
A digital orthophoto is an image with geometric accuracy and no distortion. It is acquired through a top view of the scene and finds widespread applications in map creation, planning, and related fields. This paper classifies the algorithms for digital orthophoto generation into two groups: explicit methods and implicit methods. Explicit methods rely on traditional geometric methods, obtaining geometric structure presented with explicit parameters with Multi-View Stereo (MVS) theories, as seen in our proposed Top view constrained Dense Matching (TDM). Implicit methods rely on neural rendering, obtaining implicit neural representation of scenes through the training of neural networks, as exemplified by Neural Radiance Fields (NeRFs). Both of them obtain digital orthophotos via rendering from a top-view perspective. In addition, this paper conducts an in-depth comparative study between explicit and implicit methods. The experiments demonstrate that both algorithms meet the measurement accuracy requirements and exhibit a similar level of quality in terms of generated results. Importantly, the explicit method shows a significant advantage in terms of efficiency, with a time consumption reduction of two orders of magnitude under our latest Compute Unified Device Architecture (CUDA) version TDM algorithm. Although explicit and implicit methods differ significantly in their representation forms, they share commonalities in the implementation across algorithmic stages. These findings highlight the potential advantages of explicit methods in orthophoto generation while also providing beneficial references and practical guidance for fast digital orthophoto generation using implicit methods. Full article

16 pages, 1302 KB  
Article
Acceleration of Hyperspectral Skin Cancer Image Classification through Parallel Machine-Learning Methods
by Bernardo Petracchi, Emanuele Torti, Elisa Marenzi and Francesco Leporati
Sensors 2024, 24(5), 1399; https://doi.org/10.3390/s24051399 - 21 Feb 2024
Cited by 11 | Viewed by 3246
Abstract
Hyperspectral imaging (HSI) has become a very compelling technique in different scientific areas; indeed, many researchers use it in the fields of remote sensing, agriculture, forensics, and medicine. In the latter, HSI plays a crucial role as a diagnostic support and for surgery guidance. However, the computational effort in elaborating hyperspectral data is not trivial. Furthermore, the demand for detecting diseases in a short time is undeniable. In this paper, we take up this challenge by parallelizing three machine-learning methods among those that are the most intensively used: Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGB) algorithms using the Compute Unified Device Architecture (CUDA) to accelerate the classification of hyperspectral skin cancer images. They all showed a good performance in HS image classification, in particular when the size of the dataset is limited, as demonstrated in the literature. We illustrate the parallelization techniques adopted for each approach, highlighting the suitability of Graphical Processing Units (GPUs) to this aim. Experimental results show that parallel SVM and XGB algorithms significantly improve the classification times in comparison with their serial counterparts. Full article