Search Results (254)

Search Parameters:
Keywords = multiple CPUs

16 pages, 5273 KB  
Article
Fog Computing and Graph-Based Databases for Remote Health Monitoring in IoMT Settings
by Karrar A. Yousif, Jorge Calvillo-Arbizu and Agustín W. Lara-Romero
IoT 2025, 6(4), 76; https://doi.org/10.3390/iot6040076 - 3 Dec 2025
Viewed by 204
Abstract
Remote patient monitoring is a promising and transformative pillar of healthcare. However, deploying such systems at scale—across thousands of patients and Internet of Medical Things (IoMT) devices—demands robust, low-latency, and scalable storage systems. This research examines the application of Fog Computing for remote patient monitoring in IoMT settings, where a large volume of data, low latency, and secure management of confidential healthcare information are essential. We propose a four-layer IoMT–Fog–Cloud architecture in which Fog nodes, equipped with graph-based databases (Neo4j), conduct local processing, filtering, and integration of heterogeneous health data before transmitting it to cloud servers. To assess the viability of our approach, we implemented a containerised Fog node and simulated multiple patient-device networks using a real-world dataset. System performance was evaluated using 11 scenarios with varying numbers of devices and data transmission frequencies. Performance metrics include CPU load, memory footprint, and query latency. The results demonstrate that Neo4j can efficiently ingest and query millions of health observations with an acceptable latency of less than 500 ms, even in extreme scenarios involving more than 12,000 devices transmitting data every 50 ms. Resource consumption remained well below critical thresholds, highlighting the suitability of the proposed approach for Fog nodes. Combining Fog computing and Neo4j is a novel approach that meets the latency and real-time data ingestion requirements of IoMT environments. Therefore, it is suitable for supporting delay-sensitive monitoring programmes, where rapid detection of anomalies is critical (e.g., a prompt response to cardiac emergencies or early detection of respiratory deterioration in patients with chronic obstructive pulmonary disease), even at a large scale. Full article
(This article belongs to the Special Issue IoT-Based Assistive Technologies and Platforms for Healthcare)
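
The ingestion pattern sketched below is only an illustration of how a Fog node might write IoMT observations into Neo4j with the official Python driver; the connection URI, credentials, and the Patient/Device/Observation schema are assumptions, not details from the paper.

```python
# Minimal sketch of graph-based ingestion of one IoMT observation into Neo4j.
# The URI, credentials, and Patient/Device/Observation schema are assumptions,
# not the schema used in the paper.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://fog-node:7687", auth=("neo4j", "password"))

INGEST = """
MERGE (p:Patient {id: $patient_id})
MERGE (d:Device  {id: $device_id})
CREATE (o:Observation {type: $obs_type, value: $value, ts: $ts})
MERGE (d)-[:ATTACHED_TO]->(p)
CREATE (d)-[:REPORTED]->(o)
"""

def ingest_observation(patient_id, device_id, obs_type, value, ts):
    with driver.session() as session:
        session.run(INGEST, patient_id=patient_id, device_id=device_id,
                    obs_type=obs_type, value=value, ts=ts)

ingest_observation("p-001", "spo2-17", "SpO2", 97.0, 1733222400000)
```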

24 pages, 15285 KB  
Article
An Efficient and Accurate UAV State Estimation Method with Multi-LiDAR–IMU–Camera Fusion
by Junfeng Ding, Pei An, Kun Yu, Tao Ma, Bin Fang and Jie Ma
Drones 2025, 9(12), 823; https://doi.org/10.3390/drones9120823 - 27 Nov 2025
Viewed by 319
Abstract
State estimation plays a vital role in UAV navigation and control. With the continuous decrease in sensor cost and size, UAVs equipped with multiple LiDARs, Inertial Measurement Units (IMUs), and cameras have attracted increasing attention. Such systems can acquire rich environmental and motion information from multiple perspectives, thereby enabling more precise navigation and mapping in complex environments. However, efficiently utilizing multi-sensor data for state estimation remains challenging. In particular, there is a complex coupling between the IMUs’ biases and the UAV state. To address these challenges, this paper proposes an efficient and accurate UAV state estimation method tailored for multi-LiDAR–IMU–camera systems. Specifically, we first construct an efficient distributed state estimation model. It decomposes the multi-LiDAR–IMU–camera system into a series of single LiDAR–IMU–camera subsystems, reformulating the complex coupling problem as an efficient distributed state estimation problem. Then, we derive an accurate feedback function to constrain and optimize the UAV state using estimated subsystem states, thus enhancing overall estimation accuracy. Based on this model, we design an efficient distributed state estimation algorithm with multi-LiDAR–IMU–camera fusion, termed DLIC. DLIC achieves robust multi-sensor data fusion via shared feature maps, effectively improving both estimation robustness and accuracy. In addition, we design an accelerated image-to-point cloud registration module (A-I2P) to provide reliable visual measurements, further boosting state estimation efficiency. Extensive experiments are conducted on 18 real-world indoor and outdoor scenarios from the public NTU VIRAL dataset. The results demonstrate that DLIC consistently outperforms existing multi-sensor methods across key evaluation metrics, including RMSE, MAE, SD, and SSE. More importantly, our method runs in real time on a resource-constrained embedded device equipped with only an 8-core CPU, while maintaining low memory consumption. Full article
(This article belongs to the Special Issue Advances in Guidance, Navigation, and Control)
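
The paper's feedback function is not reproduced here; the sketch below only illustrates the general idea of combining per-subsystem estimates, using a generic information-weighted fusion as a stand-in.

```python
# Simplified information-weighted fusion of per-subsystem state estimates
# (a generic stand-in for the paper's feedback function, not DLIC itself).
import numpy as np

def fuse_subsystem_states(states, covariances):
    """states: list of (n,) estimates; covariances: list of (n, n) matrices."""
    info = sum(np.linalg.inv(P) for P in covariances)            # total information
    weighted = sum(np.linalg.inv(P) @ x for x, P in zip(states, covariances))
    P_fused = np.linalg.inv(info)
    x_fused = P_fused @ weighted
    return x_fused, P_fused

# Two LiDAR-IMU-camera subsystems reporting slightly different position estimates.
x1, P1 = np.array([1.00, 2.00, 0.50]), np.diag([0.04, 0.04, 0.09])
x2, P2 = np.array([1.02, 1.98, 0.47]), np.diag([0.09, 0.09, 0.04])
print(fuse_subsystem_states([x1, x2], [P1, P2]))
```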

10 pages, 1336 KB  
Article
GWAS Reveals Stable Genetic Loci and Candidate Genes for Grain Protein Content in Wheat
by Yuxuan Zhao, Renjie Wang, Keling Tu, Yi Hong, Feifei Wang, Juan Zhu, Chao Lv, Rugen Xu and Baojian Guo
Curr. Issues Mol. Biol. 2025, 47(12), 981; https://doi.org/10.3390/cimb47120981 - 25 Nov 2025
Viewed by 396
Abstract
Grain protein content (GPC) is a key quality trait in wheat, determining both nutritional value and end-use functionality, yet its genetic architecture is complex and highly influenced by the environment. In this study, a diverse panel of 327 wheat accessions was evaluated for GPC across multiple environments. Significant phenotypic variation was observed, with best linear unbiased estimates (BLUEs) ranging from 12.80% to 18.79%, and a moderate broad-sense heritability (h2 = 0.52) was estimated. Genotype-by-environment interactions were highly significant. Genome-wide association analysis using the FarmCPU model identified seven stable quantitative trait nucleotides (QTNs) associated with GPC on chromosomes 1A, 1B, 2A, 2D, 3B, 5A, and 6A. Among these, QGpc.yzu-2A was consistently detected in three environments. Further analysis of the QGpc.yzu-2A region identified 26 annotated genes, 8 of which were expressed in grains. One gene, TraesCS2A02G473000 (RNA-binding protein), exhibited high nucleotide diversity and is a strong candidate for functional validation. Additionally, QGpc.yzu-6A co-localized with the known TaNAM-6A gene, reinforcing the role of this region in GPC regulation. This study provides valuable insights into the genetic basis of GPC in wheat and offers molecular markers and candidate genes for marker-assisted selection to improve grain protein content in breeding programs. Full article
(This article belongs to the Section Molecular Plant Sciences)
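
For reference, broad-sense heritability on an entry-mean basis can be computed from variance components as sketched below; the component values are made-up placeholders, not estimates from this study.

```python
# Illustrative broad-sense heritability on an entry-mean basis:
# h2 = Vg / (Vg + Vge/e + Ve/(e*r)).  The variance components below are
# made-up numbers, not values from the study.
def broad_sense_heritability(Vg, Vge, Ve, n_env, n_rep):
    return Vg / (Vg + Vge / n_env + Ve / (n_env * n_rep))

print(round(broad_sense_heritability(Vg=0.55, Vge=0.30, Ve=0.80, n_env=3, n_rep=2), 2))
```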

25 pages, 9168 KB  
Article
A Resilient Deep Learning Framework for Mobile Malware Detection: From Architecture to Deployment
by Aysha Alfaw, Mohsen Rouached and Aymen Akremi
Future Internet 2025, 17(12), 532; https://doi.org/10.3390/fi17120532 - 21 Nov 2025
Viewed by 587
Abstract
Mobile devices are frequent targets of malware due to the large volume of sensitive personal, financial, and corporate data they process. Traditional static, dynamic, and hybrid analysis methods are increasingly insufficient against evolving threats. This paper proposes a resilient deep learning framework for Android malware detection, integrating multiple models and a CPU-aware selection algorithm to balance accuracy and efficiency on mobile devices. Two benchmark datasets (i.e., the Android Malware Dataset for Machine Learning and CIC-InvesAndMal2019) were used to evaluate five deep learning models: DNN, CNN, RNN, LSTM, and CNN-LSTM. The results show that CNN-LSTM achieves the highest detection accuracy of 97.4% on CIC-InvesAndMal2019, while CNN delivers strong accuracy of 98.07%, with the lowest CPU usage (5.2%) on the Android Dataset, making it the most practical for on-device deployment. The framework is implemented as an Android application using TensorFlow Lite, providing near-real-time malware detection with an inference time of under 150 ms and memory usage below 50 MB. These findings confirm the effectiveness of deep learning for mobile malware detection and demonstrate the feasibility of deploying resilient detection systems on resource-constrained devices. Full article
(This article belongs to the Special Issue Cybersecurity in the Age of AI, IoT, and Edge Computing)
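
The paper's CPU-aware selection algorithm is not reproduced here; the sketch below only illustrates the accuracy-versus-CPU trade-off it balances, with the CPU budget and some of the figures assumed.

```python
# Sketch of a CPU-aware model selection rule: prefer the most accurate model
# whose measured CPU usage fits the device budget.  The 10% budget is an
# assumption; only the CNN accuracy/CPU figures come from the abstract.
candidates = [
    {"name": "CNN-LSTM", "accuracy": 0.974,  "cpu_pct": 14.0},  # CPU figure assumed
    {"name": "CNN",      "accuracy": 0.9807, "cpu_pct": 5.2},
    {"name": "LSTM",     "accuracy": 0.95,   "cpu_pct": 11.0},  # figures assumed
]

def select_model(candidates, cpu_budget_pct=10.0):
    feasible = [c for c in candidates if c["cpu_pct"] <= cpu_budget_pct]
    pool = feasible or candidates                 # fall back if nothing fits
    return max(pool, key=lambda c: c["accuracy"])

print(select_model(candidates)["name"])           # -> CNN
```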

18 pages, 1628 KB  
Article
VSwap: A New Extension to the Swap Mechanism for Enabling Swap Memory Space Optimization
by Gyupin Moon and Donghyun Kang
Appl. Sci. 2025, 15(22), 12049; https://doi.org/10.3390/app152212049 - 12 Nov 2025
Viewed by 433
Abstract
The memory demand of modern applications has been rapidly increasing with the continuous growth of data volume across industrial and academic domains. As a result, computing devices (e.g., IoT devices, smartphones, and tablets) often experience memory shortages that degrade system performance and quality of service by wasting CPU cycles and energy. Thus, most operating systems rely on the swap mechanism to mitigate memory shortages in advance, even though swap memory fragmentation builds up over time. In this paper, we analyze the fragmentation behavior of the swap memory space within storage devices over time and demonstrate that the latency of swap operations increases significantly under aged conditions. We also propose a new extension of the traditional swap mechanism, called VSwap, that mitigates the swap memory fragmentation problem in advance by introducing two core techniques, virtual migration and address remapping. In VSwap, virtual migration gathers valid swap pages scattered across multiple clusters into contiguous regions within the swap memory space, while address remapping updates the corresponding page table entries to preserve consistency after migration. For experiments, we enable VSwap on the traditional swap mechanism (i.e., kswapd) by implementing it with simple code modifications. To confirm the effectiveness of VSwap, we performed a comprehensive evaluation based on various workloads. Our evaluation results confirm that VSwap clearly outperforms the original swap mechanism. In particular, VSwap improves overall performance by up to 48.18% by harvesting available swap memory space in advance with negligible overhead, performing close to the ideal case. Full article
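
A toy model of the two techniques is sketched below: scattered valid swap pages are gathered into a contiguous region (virtual migration) and an old-to-new slot table is produced for remapping page-table entries; the slot layout is purely illustrative.

```python
# Toy model of VSwap's idea: gather valid swap pages scattered across clusters
# into a contiguous region ("virtual migration") and record old->new slots so
# page-table entries can be remapped.  The slot layout is purely illustrative.
def compact_swap(slots):
    """slots: list where slots[i] is a page id or None (free)."""
    remap, compacted = {}, []
    for old_slot, page in enumerate(slots):
        if page is not None:
            remap[old_slot] = len(compacted)   # address remapping table
            compacted.append(page)
    compacted += [None] * (len(slots) - len(compacted))
    return compacted, remap

fragmented = ["A", None, "B", None, None, "C", "D", None]
compacted, remap = compact_swap(fragmented)
print(compacted)   # ['A', 'B', 'C', 'D', None, None, None, None]
print(remap)       # {0: 0, 2: 1, 5: 2, 6: 3}
```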

20 pages, 2087 KB  
Article
Automatic Sparse Matrix Format Selection via Dynamic Labeling and Clustering on Heterogeneous CPU–GPU Systems
by Zheng Shi, Yi Zou and Xianfeng Song
Electronics 2025, 14(19), 3895; https://doi.org/10.3390/electronics14193895 - 30 Sep 2025
Viewed by 420
Abstract
Sparse matrix–vector multiplication (SpMV) is a fundamental kernel in high-performance computing (HPC) whose efficiency depends heavily on the storage format across central processing unit (CPU) and graphics processing unit (GPU) platforms. Conventional supervised approaches often use execution time as training labels, but our experiments on 1786 matrices reveal two issues: labels are unstable across runs due to execution-time variability, and single-label assignment overlooks cases where multiple formats perform similarly well. We propose a dynamic labeling strategy that assigns a single label when the fastest format shows clear superiority, and multiple labels when performance differences are small, thereby reducing label noise. We further extend feature analysis to multi-dimensional structural descriptors and apply clustering to refine label distributions and enhance prediction robustness. Experiments demonstrate 99.2% accuracy in hardware (CPU/GPU) selection and up to 98.95% accuracy in format prediction, with up to 10% robustness gains over traditional methods. Under cost-aware, end-to-end evaluation that accounts for feature extraction, prediction, conversion, and kernel execution, CPUs achieve speedups up to 3.15× and GPUs up to 1.94× over a CSR baseline. Cross-round evaluations confirm stability and generalization, providing a reliable path toward automated, cross-platform SpMV optimization. Full article
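
The dynamic labeling rule can be sketched as below: a single label when the fastest format wins clearly, multiple labels when several formats are within a tolerance of the best; the 10% tolerance and the timings are assumptions.

```python
# Sketch of the dynamic labeling rule: a single label when the fastest format
# is clearly superior, multiple labels when several formats are close.  The
# 10% tolerance is an assumption; the timings are illustrative.
def dynamic_label(timings, tolerance=0.10):
    """timings: dict mapping format name -> measured SpMV time (seconds)."""
    best_fmt = min(timings, key=timings.get)
    best_t = timings[best_fmt]
    close = [f for f, t in timings.items() if t <= best_t * (1 + tolerance)]
    return close if len(close) > 1 else [best_fmt]

print(dynamic_label({"CSR": 1.00, "ELL": 1.04, "COO": 1.90}))  # ['CSR', 'ELL']
print(dynamic_label({"CSR": 1.00, "ELL": 1.40, "COO": 1.90}))  # ['CSR']
```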

13 pages, 2046 KB  
Article
High-Resolution Hogel Image Generation Using GPU Acceleration
by Hyunmin Kang, Byungjoon Kim and Yongduek Seo
Photonics 2025, 12(9), 882; https://doi.org/10.3390/photonics12090882 - 1 Sep 2025
Viewed by 820
Abstract
A holographic stereogram displays reconstructed 3D images by rearranging multiple 2D viewpoint images into small holographic pixels (hogels). However, conventional CPU-based hogel generation processes these images sequentially, causing computation times to soar as the resolution and number of viewpoints increase, which makes real-time implementation difficult. In this study, we introduce a GPU-accelerated parallel processing method to speed up the generation of high-resolution hogel images and achieve near-real-time performance. Specifically, we implement the pixel-rearrangement algorithm for multiple viewpoint images as a CUDA-based GPU kernel, designing it so that thousands of threads process individual pixels simultaneously. We also optimize CPU–GPU data transfers and improve memory access efficiency to maximize GPU parallel performance. The experimental results show that the proposed method achieves over a 5× speedup compared to the CPU across resolutions from FHD to 8K while maintaining output image quality equivalent to that of the CPU approach. Notably, we confirm near-real-time performance by processing large-scale 8K-resolution content with 16 viewpoints in just tens of milliseconds. This achievement significantly alleviates the computational bottleneck in large-scale holographic image synthesis, bringing real-time 3D holographic displays one step closer to realization. Furthermore, the proposed GPU acceleration technique is expected to serve as a foundational technology for real-time high-resolution hogel image generation in next-generation immersive display devices such as AR/VR/XR. Full article
(This article belongs to the Special Issue Holographic Information Processing)
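
The CUDA kernel itself is not reproduced here; the NumPy sketch below shows one plausible pixel-rearrangement mapping, in which each hogel collects one pixel from every viewpoint image (the exact mapping depends on the display geometry).

```python
# NumPy sketch of a hogel pixel rearrangement (the paper's CUDA kernel is not
# reproduced): each hogel collects one pixel from every viewpoint image.  The
# exact index mapping depends on the display geometry and is assumed here.
import numpy as np

def rearrange_to_hogels(views):
    """views: array of shape (ny, nx, H, W) holding ny*nx viewpoint images."""
    ny, nx, H, W = views.shape
    # hogel (i, j) holds pixel (i, j) of every viewpoint, tiled as an ny x nx block
    return views.transpose(2, 0, 3, 1).reshape(H * ny, W * nx)

views = np.random.rand(4, 4, 270, 480)        # 16 small viewpoint images
hogel_image = rearrange_to_hogels(views)
print(hogel_image.shape)                      # (1080, 1920): FHD-sized output
```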

8 pages, 921 KB  
Proceeding Paper
Design of Complementary Metal–Oxide–Semiconductor Encoder/Decoder with Compact Circuit Structure for Booth Multiplier
by Yu-Nsin Wang and Yu-Cherng Hung
Eng. Proc. 2025, 103(1), 21; https://doi.org/10.3390/engproc2025103021 - 1 Sep 2025
Viewed by 574
Abstract
Multipliers are crucial components in digital processing and the arithmetic logic unit (ALU) of central processing unit (CPU) design. As the data bit length increases, the number of partial products in the multiplication process increases, resulting in an increased summation time for the partial products. Consequently, the speed of the multiplier circuit is adversely affected by increased time delays. In this article, we present a combined radix-4 Booth encoding module that employs metal–oxide–semiconductor (MOS) transistors that share common control signals to reduce the transistor count. In HSPICE simulations, the functionality of the proposed circuit architecture was verified, and the number of transistors used was successfully reduced. Full article
(This article belongs to the Proceedings of The 8th Eurasian Conference on Educational Innovation 2025)
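
A behavioural reference for radix-4 Booth recoding (not the transistor-level circuit) is sketched below: overlapping bit triplets of the multiplier map to digits in {-2, -1, 0, +1, +2}, halving the number of partial products to be summed.

```python
# Behavioural reference for radix-4 Booth encoding (not the transistor-level
# design): overlapping bit triplets of the multiplier are recoded into digits
# in {-2, -1, 0, +1, +2}, halving the number of partial products.
def booth_radix4_digits(y, n_bits=8):
    """Digits for an n_bits two's-complement multiplier y, LSB group first
    (n_bits must be even)."""
    assert n_bits % 2 == 0
    bits = [(y >> i) & 1 for i in range(n_bits)]
    digits, prev = [], 0                      # implicit y[-1] = 0
    for i in range(0, n_bits, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits

def booth_multiply(x, y, n_bits=8):
    # one shifted multiple of x per digit (half as many partial products as radix-2)
    return sum(d * x * (4 ** k) for k, d in enumerate(booth_radix4_digits(y, n_bits)))

assert booth_multiply(23, -17) == 23 * -17
assert booth_multiply(-45, 113) == -45 * 113
```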

16 pages, 1684 KB  
Article
Adaptive Feature- and Scale-Based Object Tracking with Correlation Filters for Resource-Constrained End Devices in the IoT
by Shengjie Li, Kaiwen Kang, Shuai Zhao, Bo Cheng and Junliang Chen
Sensors 2025, 25(16), 5025; https://doi.org/10.3390/s25165025 - 13 Aug 2025
Viewed by 681
Abstract
Sixth-generation (6G) wireless technology has facilitated the rapid development of the Internet of Things (IoT), enabling various end devices to be deployed in applications such as wireless multimedia sensor networks. However, most end devices encounter difficulties when dealing with a large amount of IoT video data due to their lack of computational resources for visual object tracking. Discriminative correlation filter (DCF)-based tracking approaches possess favorable properties for resource-constrained end devices, such as low computational costs and robustness to motion blur and illumination variations. Most current DCF trackers employ multiple features and the spatial–temporal scale space to estimate the target state, both of which may be suboptimal due to their fixed feature dimensions and dense scale intervals. In this paper, we present an adaptive mapped-feature and scale-interval method based on DCF to alleviate the problem of suboptimality. Specifically, we propose an adaptive mapped-feature response based on dimensionality reduction and histogram score maps to integrate multiple features and boost tracking effectiveness. Moreover, an adaptive temporal scale estimation method with sparse intervals is proposed to further improve tracking efficiency. Extensive experiments on the DTB70, UAV112, UAV123@10fps and UAVDT datasets demonstrate the superiority of our method, with a running speed of 41.3 FPS on a low-cost CPU, compared to state-of-the-art trackers. Full article
(This article belongs to the Section Internet of Things)
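
The adaptive mapped-feature tracker is not reproduced here; the minimal single-channel correlation-filter response below (MOSSE-style) only illustrates why DCF tracking is cheap enough for CPU-only end devices.

```python
# Minimal single-channel correlation-filter response (MOSSE-style), only to
# illustrate why DCF tracking is cheap on a CPU; the paper's adaptive
# mapped-feature and scale method is not reproduced here.
import numpy as np

def train_filter(patch, target, lam=1e-2):
    F, G = np.fft.fft2(patch), np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)   # H* in the Fourier domain

def response(filter_conj, patch):
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_conj))

H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W]
gaussian_target = np.exp(-((ys - H / 2) ** 2 + (xs - W / 2) ** 2) / (2 * 4.0 ** 2))
template = np.random.rand(H, W)
H_conj = train_filter(template, gaussian_target)
peak = np.unravel_index(np.argmax(response(H_conj, template)), (H, W))
print(peak)   # close to (32, 32) on the training patch
```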

18 pages, 1582 KB  
Article
Design of an ASIC Vector Engine for a RISC-V Architecture
by Miguel Bucio-Macías, Luis Pizano-Escalante and Omar Longoria-Gandara
Chips 2025, 4(3), 33; https://doi.org/10.3390/chips4030033 - 5 Aug 2025
Viewed by 2550
Abstract
Nowadays, Graphics Processing Units (GPUs) are well suited to Artificial Intelligence (AI) workloads; however, a challenge arises when the inclusion of a GPU is not feasible due to the cost, power consumption, or the size of the hardware. This issue is particularly relevant for portable devices, such as laptops or smartphones, where the inclusion of a dedicated GPU is not the best option. One possible solution to that problem is the use of a CPU with AI capabilities, i.e., parallelism and high performance. In particular, the RISC-V architecture is considered a good open-source candidate to support such tasks. These capabilities are based on vector operations that, by definition, operate over many elements at the same time, allowing for the execution of SIMD instructions that can be used to implement typical AI routines and procedures. In this context, the main purpose of this proposal is to develop a RISC-V-compliant ASIC Vector Engine that implements a minimum set of the Vector Extension capable of the parallel processing of multiple data elements with a single instruction. These instructions operate on vectors and involve addition, multiplication, logical, comparison, and permutation operations. Notably, the multiplication was implemented using the Vedic multiplication algorithm. Contributions include the description of the design, synthesis, and validation processes to develop the ASIC, and a performance comparison between the FPGA implementation and the ASIC using different nanometric technologies, where the 7 nm technology achieved both the best performance (110 MHz) and the smallest silicon area. Full article
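
As a behavioural reference for the Vedic (Urdhva Tiryagbhyam) multiplication mentioned above (implemented in hardware in the paper), the sketch below sums vertical and crosswise digit products per column and then propagates carries.

```python
# Behavioural reference for Vedic (Urdhva Tiryagbhyam) multiplication: vertical
# and crosswise digit products are summed per column and carries propagated.
# This illustrates the scheme only; the paper implements it in hardware.
def vedic_multiply(a_digits, b_digits, base=2):
    """Digit lists are least-significant first; returns the product's digits."""
    cols = [0] * (len(a_digits) + len(b_digits) - 1)
    for i, a in enumerate(a_digits):
        for j, b in enumerate(b_digits):
            cols[i + j] += a * b                  # crosswise partial products
    out, carry = [], 0
    for c in cols:
        carry, digit = divmod(c + carry, base)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, base)
        out.append(digit)
    return out

def to_bits(n, width):                            # least-significant bit first
    return [(n >> i) & 1 for i in range(width)]

bits = vedic_multiply(to_bits(13, 4), to_bits(11, 4))
assert sum(b << i for i, b in enumerate(bits)) == 143
```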

22 pages, 5625 KB  
Article
Computer Vision-Based Multiple-Width Measurements for Agricultural Produce
by Cannayen Igathinathane, Rangaraju Visvanathan, Ganesh Bora and Shafiqur Rahman
AgriEngineering 2025, 7(7), 204; https://doi.org/10.3390/agriengineering7070204 - 1 Jul 2025
Viewed by 1234
Abstract
The most common size measurements for agricultural produce, including fruits and vegetables, are length and width. While the length of any agricultural produce can be unique, the width varies continuously along its length. Single-width measurements alone are insufficient for accurately characterizing varying width profiles, resulting in an inaccurate representation of the shape or mean dimension. Consequently, the manual measurement of multiple mean dimensions is laborious or impractical, and no information in this domain is available. Therefore, an efficient alternative computer vision measurement tool was developed utilizing ImageJ (Ver. 1.54p). Twenty sample sets, comprising fruits and vegetables, with each representing different shapes, were selected and measured for length and multiple widths. A statistically significant minimum number of multiple widths was determined for practical measurements based on an object’s shape. The “aspect ratio” (width/length) was identified to serve as an effective indicator of the minimum multiple width measurements. In general, 50 multiple width measurements are recommended; however, even 15 measurements would be satisfactory (1.0%±0.6% deviation from 50 widths). The developed plugin was fast (734 ms ± 365 ms CPU time/image), accurate (>99.6%), and cost-effective, and it incorporated several user-friendly and helpful features. This study’s outcomes have practical applications in the characterization, quality control, grading and sorting, and pricing determination of agricultural produce. Full article
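
The ImageJ plugin is not reproduced here; the NumPy sketch below illustrates the idea of sampling multiple widths along an object's length from a binary mask, assuming the length runs along the row axis.

```python
# NumPy sketch of multiple-width measurement from a binary mask (the ImageJ
# plugin itself is not reproduced).  The object's length is assumed to run
# along the row axis of the mask.
import numpy as np

def mean_width(mask, n_widths=15):
    """mask: 2-D boolean array; sample n_widths evenly spaced cross-sections."""
    rows = np.where(mask.any(axis=1))[0]          # rows containing the object
    sample_rows = np.linspace(rows[0], rows[-1], n_widths).round().astype(int)
    widths = mask[sample_rows].sum(axis=1)        # foreground pixels per row
    return widths.mean(), widths

# Synthetic ellipse-like produce silhouette, 200 px long and up to 80 px wide.
ys, xs = np.mgrid[0:220, 0:100]
mask = ((ys - 110) / 100.0) ** 2 + ((xs - 50) / 40.0) ** 2 <= 1.0
print(round(mean_width(mask)[0], 1))
```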

26 pages, 1929 KB  
Article
PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud
by Wenwen Fu, Jinli Yan, Jian Zhang, Yinhan Sun, Yong Wang, Ziwen Zhang, Qianming Yang and Yongwen Wang
Electronics 2025, 14(13), 2650; https://doi.org/10.3390/electronics14132650 - 30 Jun 2025
Viewed by 765
Abstract
Integrated security stacks, which offer diverse security function chains in a single device, hold substantial potential to satisfy the security requirements of multiple tenants on a public cloud. However, it is difficult for the software-only or hardware-customized security stack to establish a good tradeoff between performance and flexibility. SmartNIC overcomes these limitations by providing a programmable platform for implementing these functions with hardware acceleration. Significantly, without a professional CPU/SmartNIC co-design, developing security function chains from scratch with low-level APIs is challenging and tedious for network operators. This paper presents PASS, a flexible programmable framework for the fast development of high-performance security stacks with SmartNIC acceleration. In the data plane, PASS provides modular abstractions to extract the shared security logic and eliminate redundant operations by reusing the intermediate results with the customized metadata. In the control plane, PASS offloads the tedious security policy conversion to the proposed security auxiliary plane. With well-defined APIs, developers only need to focus on the core logic instead of labor-intensive shared logic. We built a PASS prototype based on a CPU-FPGA platform and developed three typical security components. Compared to implementation from scratch, PASS reduces the code by 65% on average. Additionally, PASS improves security processing performance by 76% compared to software-only implementations and optimizes the latency of policy translation and distribution by 90% versus the architecture without offloading. Full article
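
The sketch below is a conceptual stand-in, not the PASS API: security functions in a chain share a per-packet metadata record so common work such as header parsing is done once and reused.

```python
# Conceptual sketch (not the PASS API): security functions in a chain share a
# per-packet metadata dict so parsing and other shared logic run once and their
# results are reused, instead of being repeated by every function.
BLOCKED = {("10.0.0.9", 22)}    # (source IP, destination port) pairs to drop
FLOW_COUNTERS = {}

def parse_headers(pkt, meta):
    meta["five_tuple"] = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])

def firewall(pkt, meta):
    src, dst, sport, dport, proto = meta["five_tuple"]   # reuse the parsed result
    return (src, dport) not in BLOCKED

def rate_limiter(pkt, meta, limit=100):
    key = meta["five_tuple"]
    FLOW_COUNTERS[key] = FLOW_COUNTERS.get(key, 0) + 1
    return FLOW_COUNTERS[key] <= limit

def run_chain(pkt, chain):
    meta = {}
    parse_headers(pkt, meta)          # shared logic executed once per packet
    return all(fn(pkt, meta) for fn in chain)

pkt = {"src": "10.0.0.5", "dst": "10.0.0.7", "sport": 5353, "dport": 443, "proto": "tcp"}
print(run_chain(pkt, [firewall, rate_limiter]))   # True -> forward the packet
```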

21 pages, 6865 KB  
Article
Elegante+: A Machine Learning-Based Optimization Framework for Sparse Matrix–Vector Computations on the CPU Architecture
by Muhammad Ahmad, Sardar Usman, Ameer Hamza, Muhammad Muzamil and Ildar Batyrshin
Information 2025, 16(7), 553; https://doi.org/10.3390/info16070553 - 29 Jun 2025
Viewed by 840
Abstract
Sparse matrix–vector multiplication (SpMV) plays a significant role in the computational costs of many scientific applications such as 2D/3D robotics, power network problems, and computer vision. Numerous implementations using different sparse matrix formats have been introduced to optimize this kernel on CPUs and GPUs. However, due to the sparsity patterns of matrices and the diverse configurations of hardware, accurately modeling the performance of SpMV remains a complex challenge. SpMV computation is often a time-consuming process because of its sparse matrix structure. To address this, we propose a machine learning-based tool, namely Elegante+, that predicts optimal scheduling policies by analyzing matrix structures. This approach eliminates the need for repetitive trial and error, minimizes errors, and finds the best solution of the SpMV kernel, which enables users to make informed decisions about scheduling policies that maximize computational efficiency. For this purpose, we collected 1000+ sparse matrices from the SuiteSparse matrix market collection and converted them into the compressed sparse row (CSR) format, and SpMV computation was performed by extracting 14 key sparse matrix features. After creating a comprehensive dataset, we trained various machine learning models to predict the optimal scheduling policy, significantly enhancing the computational efficiency and reducing the overhead in high-performance computing environments. Our proposed tool, Elegante+ (XGB with all SpMV features), achieved the highest cross-validation score of 79% and performed five times faster than the default scheduling policy during SpMV in a high-performance computing (HPC) environment. Full article
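
The sketch below extracts a few CSR structural features of the kind such a predictor consumes; the paper uses 14 features, and these names (and the candidate scheduling policies in the comment) are assumptions.

```python
# Sketch of extracting a few CSR structural features of the kind fed to the
# trained model; the paper uses 14 features, and these names are assumptions.
import numpy as np
from scipy.sparse import random as sparse_random

def csr_features(A):
    nnz_per_row = np.diff(A.indptr)
    return {
        "n_rows": A.shape[0],
        "n_cols": A.shape[1],
        "nnz": A.nnz,
        "density": A.nnz / (A.shape[0] * A.shape[1]),
        "nnz_row_mean": nnz_per_row.mean(),
        "nnz_row_std": nnz_per_row.std(),
        "nnz_row_max": nnz_per_row.max(),
    }

A = sparse_random(5000, 5000, density=0.001, format="csr", random_state=0)
features = csr_features(A)
# A trained classifier (e.g., the XGBoost variant described) would map this
# feature vector to a predicted scheduling policy, e.g. static/dynamic/guided.
print(features)
```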

22 pages, 40818 KB  
Article
Real-Time Cloth Simulation in Extended Reality: Comparative Study Between Unity Cloth Model and Position-Based Dynamics Model with GPU
by Taeheon Kim, Jun Ma and Min Hong
Appl. Sci. 2025, 15(12), 6611; https://doi.org/10.3390/app15126611 - 12 Jun 2025
Cited by 1 | Viewed by 2921
Abstract
This study proposes a GPU-accelerated Position-Based Dynamics (PBD) system for realistic and interactive cloth simulation in Extended Reality (XR) environments, and comprehensively evaluates its performance and functional capabilities on standalone XR devices, such as the Meta Quest 3. To overcome the limitations of traditional CPU-based physics simulations, we designed and optimized highly parallelized algorithms utilizing Unity’s Compute Shader framework. The proposed system achieves real-time performance by implementing efficient collision detection and response handling with complex environmental meshes (RoomMesh) and dynamic hand meshes (HandMesh), as well as capsule colliders based on hand skeleton tracking (OVRSkeleton). Performance evaluations were conducted for both single-sided and double-sided cloth configurations across multiple resolutions. At a 32 × 32 resolution, both configurations maintained stable frame rates of approximately 72 FPS. At a 64 × 64 resolution, the single-sided cloth achieved around 65 FPS, while the double-sided configuration recorded approximately 40 FPS, demonstrating scalable quality adaptation depending on application requirements. Functionally, the GPU-PBD system significantly surpasses Unity’s built-in Cloth component by supporting double-sided cloth rendering, fine-grained constraint control, complex mesh-based collision handling, and real-time interaction with both hand meshes and capsule colliders. These capabilities enable immersive and physically plausible XR experiences, including natural cloth draping, grasping, and deformation behaviors during user interactions. The technical advantages of the proposed system suggest strong applicability in various XR fields, such as virtual clothing fitting, medical training simulations, educational content, and interactive art installations. Future work will focus on extending the framework to general deformable body simulation, incorporating advanced material modeling, self-collision response, and dynamic cutting simulation, thereby enhancing both realism and scalability in XR environments. Full article
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
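
The Compute Shader implementation is GPU code; the NumPy reference below shows the core PBD distance-constraint projection that it parallelizes across cloth particles, with stiffness and iteration details simplified.

```python
# NumPy reference for the PBD distance-constraint projection that the Compute
# Shader parallelizes across cloth particles (stiffness and iteration details
# are simplified relative to the paper's system).
import numpy as np

def project_distance_constraint(p1, p2, w1, w2, rest_length):
    d = p2 - p1
    dist = np.linalg.norm(d)
    if dist < 1e-9 or w1 + w2 == 0.0:
        return p1, p2
    correction = (dist - rest_length) * (d / dist)
    p1 = p1 + (w1 / (w1 + w2)) * correction
    p2 = p2 - (w2 / (w1 + w2)) * correction
    return p1, p2

# Two particles stretched 20% beyond their rest length get pulled back together.
a, b = np.array([0.0, 0.0, 0.0]), np.array([1.2, 0.0, 0.0])
a, b = project_distance_constraint(a, b, w1=1.0, w2=1.0, rest_length=1.0)
print(a, b)   # [0.1 0. 0.] [1.1 0. 0.]
```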

42 pages, 6696 KB  
Article
Design, Implementation and Practical Energy-Efficiency Evaluation of a Blockchain Based Academic Credential Verification System for Low-Power Nodes
by Gabriel Fernández-Blanco, Iván Froiz-Míguez, Paula Fraga-Lamas and Tiago M. Fernández-Caramés
Appl. Sci. 2025, 15(12), 6596; https://doi.org/10.3390/app15126596 - 12 Jun 2025
Cited by 3 | Viewed by 1454
Abstract
The educational system manages extensive documentation and paperwork, which can lead to human errors and sometimes abuse or fraud, such as the falsification of diplomas, certificates or other credentials. In fact, in recent years, multiple cases of fraud have been detected, representing a significant cost to society, since fraud harms the trustworthiness of certificates and academic institutions. To tackle such an issue, this article proposes a solution aimed at recording and verifying academic records through a decentralized application that is supported by a smart contract deployed on the Ethereum blockchain and by a decentralized storage system based on the Inter-Planetary File System (IPFS). The proposed solution is evaluated in terms of performance and energy efficiency, comparing the results obtained with a traditional Proof-of-Work (PoW) consensus protocol and the new Proof-of-Authority (PoA) protocol. The results shown in this paper indicate that the latter is clearly greener and imposes a lower CPU load. Moreover, this article compares the performance of a traditional computer and two Single-Board Computers (SBCs) (a Raspberry Pi 4 and an Orange Pi One), showing that it is possible to use these low-power devices to implement blockchain nodes, but at the cost of higher response latency. Furthermore, the impact of the Ethereum gas limit is evaluated, demonstrating its significant influence on blockchain network performance. Thus, this article provides guidelines, useful practical evaluations and key findings that will help the next generation of green blockchain developers and researchers. Full article
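
As a rough stand-in for the hash-anchoring idea (omitting the Ethereum smart contract, IPFS storage, and gas costs), the sketch below hashes a credential document, records the digest in a plain dictionary that plays the contract's role, and verifies by recomputing the digest.

```python
# Rough stand-in for the hash-anchoring idea behind such a system: the
# credential document is hashed, the digest is recorded in a registry (played
# by the Ethereum smart contract in the paper, by a dict here), and a verifier
# recomputes the digest and checks it.  IPFS storage and gas costs are omitted.
import hashlib

registry = {}   # digest -> issuer metadata; stands in for the smart contract state

def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def issue(document: bytes, issuer: str, student_id: str):
    registry[sha256_bytes(document)] = {"issuer": issuer, "student": student_id}

def verify(document: bytes):
    return registry.get(sha256_bytes(document))    # None means unknown or forged

diploma = b"%PDF-1.7 ... contents of the issued diploma ..."
issue(diploma, issuer="Example University", student_id="s-42")
print(verify(diploma))                             # issuer metadata
print(verify(b"tampered copy"))                    # None
```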
