Search Results (10)

Search Parameters:
Keywords = CPU-GPU hybrid platform

31 pages, 4877 KB  
Article
Fast Fractal Image Compression Using Non-Uniform Partition
by ManLong Li and KinTak U
Electronics 2026, 15(5), 922; https://doi.org/10.3390/electronics15050922 - 25 Feb 2026
Viewed by 433
Abstract
Fractal image compression achieves high compression ratios but suffers from prohibitively long encoding times and limited reconstruction quality. To address these limitations, we propose fast fractal image compression using non-uniform partition (FFICNUP), a hybrid algorithm that adaptively partitions range blocks (R-blocks) and domain blocks (D-blocks) based on local texture and edge content. Smaller R-blocks are employed in texture-rich regions or edge-dense areas to preserve fine details, while larger R-blocks are adopted in smooth regions to accelerate encoding. By integrating a Task-Serial Workflow with Data-Parallel Vectorization and adaptive block partitioning, FFICNUP substantially accelerates both encoding and decoding processes while enhancing reconstruction fidelity and compression ratios. Experimental results demonstrate that the proposed FFICNUP method significantly outperforms conventional fractal image compression (FIC) approaches. By leveraging vectorized parallelization, the proposed FFICNUP achieves state-of-the-art (SOTA) decoding speed with a 14× acceleration, reduces encoding latency by three orders of magnitude, improves the Peak Signal-to-Noise Ratio (PSNR) by up to 5.19 dB, and attains a compression ratio 2.21 times higher than that of conventional FIC. Validated across both CPU and GPU platforms, FFICNUP dynamically balances encoding speed, reconstruction quality, compression ratio, and latency across varying image sizes, demonstrating its suitability for practical engineering applications.
(This article belongs to the Section Artificial Intelligence)
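
The adaptive partitioning is the heart of the method: smooth regions keep large R-blocks, while textured or edge-dense regions are split further. Below is a minimal sketch of such a quadtree-style split driven by local variance; the variance criterion, threshold, and block-size limits are illustrative assumptions, since the abstract does not specify the paper's exact texture/edge measure.

```python
# Variance-driven non-uniform range-block partitioning, in the spirit of
# FFICNUP's adaptive R-block sizing. Threshold and sizes are illustrative.
import numpy as np

def partition_r_blocks(img, x, y, size, min_size=4, var_thresh=50.0, blocks=None):
    """Recursively split a block until it is smooth enough or minimal."""
    if blocks is None:
        blocks = []
    region = img[y:y + size, x:x + size]
    # Smooth regions keep large R-blocks; busy regions are split into quadrants.
    if size <= min_size or region.var() <= var_thresh:
        blocks.append((x, y, size))
    else:
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                partition_r_blocks(img, x + dx, y + dy, half,
                                   min_size, var_thresh, blocks)
    return blocks

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
img[16:32, 16:32] = 128.0  # a smooth patch survives as one 16x16 block
print(len(partition_r_blocks(img, 0, 0, 64)))
```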

43 pages, 2828 KB  
Article
Efficient Hybrid Parallel Scheme for Caputo Time-Fractional PDEs on Multicore Architectures
by Mudassir Shams and Bruno Carpentieri
Fractal Fract. 2025, 9(9), 607; https://doi.org/10.3390/fractalfract9090607 - 19 Sep 2025
Cited by 1 | Viewed by 1097
Abstract
We present a hybrid parallel scheme for efficiently solving Caputo time-fractional partial differential equations (CTFPDEs) with integer-order spatial derivatives on multicore CPU and GPU platforms. The approach combines a second-order spatial discretization with the L1 time-stepping scheme and employs MATLAB parfor parallelization to achieve significant reductions in runtime and memory usage. A theoretical third-order convergence rate is established under smooth-solution assumptions, and the analysis also accounts for the loss of accuracy near the initial time t=t0 caused by weak singularities inherent in time-fractional models. Unlike many existing approaches that rely on locally convergent strategies, the proposed method ensures global convergence even for distant or randomly chosen initial guesses. Benchmark problems from fractional biological models—including glucose–insulin regulation, tumor growth under chemotherapy, and drug diffusion in tissue—are used to validate the robustness and reliability of the scheme. Numerical experiments confirm near-linear speedup on up to four CPU cores and show that the method outperforms conventional techniques in terms of convergence rate, residual error, iteration count, and efficiency. These results demonstrate the method's suitability for large-scale CTFPDE simulations in scientific and engineering applications.
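
For readers unfamiliar with the L1 scheme, the sketch below applies it to the scalar Caputo test problem D^alpha u = -u with u(0) = 1. The step size, order alpha, and test equation are illustrative, and the paper's spatial discretization and parfor parallelism are omitted.

```python
# Minimal L1 time-stepping for a scalar Caputo-fractional test problem.
import numpy as np
from math import gamma

alpha, dt, steps = 0.8, 1e-2, 500
c = 1.0 / (gamma(2.0 - alpha) * dt**alpha)       # L1 prefactor
k = np.arange(steps)
b = (k + 1.0)**(1.0 - alpha) - k**(1.0 - alpha)  # L1 convolution weights

u = np.empty(steps + 1)
u[0] = 1.0
for n in range(1, steps + 1):
    if n > 1:
        # Memory term of the Caputo derivative: weighted past increments.
        incr = u[n - 1:0:-1] - u[n - 2::-1]
        hist = np.dot(b[1:n], incr)
    else:
        hist = 0.0
    # Implicit step: c*(u_n - u_{n-1}) + c*hist = -u_n
    u[n] = c * (u[n - 1] - hist) / (c + 1.0)

print(u[-1])  # decays toward the Mittag-Leffler solution E_alpha(-t^alpha)
```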

21 pages, 3448 KB  
Article
A Welding Defect Detection Model Based on Hybrid-Enhanced Multi-Granularity Spatiotemporal Representation Learning
by Chenbo Shi, Shaojia Yan, Lei Wang, Changsheng Zhu, Yue Yu, Xiangteng Zang, Aiping Liu, Chun Zhang and Xiaobing Feng
Sensors 2025, 25(15), 4656; https://doi.org/10.3390/s25154656 - 27 Jul 2025
Cited by 1 | Viewed by 2052
Abstract
Real-time quality monitoring using molten pool images is a critical focus in research on high-quality, intelligent automated welding. To address interference in molten pool images under complex welding scenarios (e.g., reflected laser spots from spatter misclassified as porosity defects) and the limited interpretability of deep learning models, this paper proposes a multi-granularity spatiotemporal representation learning algorithm based on the hybrid enhancement of handcrafted and deep learning features. A MobileNetV2 backbone network integrated with a Temporal Shift Module (TSM) is designed to progressively capture the short-term dynamic features of the molten pool and integrate temporal information across both low-level and high-level features. A multi-granularity attention-based feature aggregation module is developed to select key interference-free frames using cross-frame attention, generate multi-granularity features via grouped pooling, and apply the Convolutional Block Attention Module (CBAM) at each granularity level. Finally, these multi-granularity spatiotemporal features are adaptively fused. Meanwhile, an independent branch uses Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT) features to extract long-term spatial structural information from historical edge images, enhancing the model's interpretability. The proposed method achieves an accuracy of 99.187% on a self-constructed dataset and an inference latency of 20.983 ms per sample on a hardware platform equipped with an Intel i9-12900H CPU and an RTX 3060 GPU, effectively balancing accuracy, speed, and interpretability.
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
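
The TSM itself is a zero-parameter operation: a fraction of the channels exchanges features with the neighboring frames along the time axis. A minimal NumPy sketch, assuming the 1/8 shift fraction used in the original TSM paper:

```python
# Temporal Shift Module on features of shape (T, C, H, W).
import numpy as np

def temporal_shift(x, shift_div=8):
    """Shift 1/shift_div of channels from the next frame and 1/shift_div
    from the previous frame; leave the rest untouched."""
    t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # pull from the next frame
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # pull from the previous frame
    out[:, 2 * fold:] = x[:, 2 * fold:]              # remaining channels unchanged
    return out

feats = np.random.rand(10, 32, 8, 8)  # e.g., 10 molten-pool frames, 32 channels
print(temporal_shift(feats).shape)
```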

31 pages, 741 KB  
Article
Inspiring from Galaxies to Green AI in Earth: Benchmarking Energy-Efficient Models for Galaxy Morphology Classification
by Vasileios Alevizos, Emmanouil V. Gkouvrikos, Ilias Georgousis, Sotiria Karipidou and George A. Papakostas
Algorithms 2025, 18(7), 399; https://doi.org/10.3390/a18070399 - 28 Jun 2025
Viewed by 1400
Abstract
Recent advancements in space exploration have significantly increased the volume of astronomical data, heightening the demand for efficient analytical methods. Concurrently, the considerable energy consumption of machine learning (ML) has fostered the emergence of Green AI, emphasizing sustainable, energy-efficient computational practices. We introduce the first large-scale Green AI benchmark for galaxy morphology classification, evaluating over 30 machine learning architectures (classical, ensemble, deep, and hybrid) on CPU and GPU platforms using a balanced subset of the Galaxy Zoo dataset. Beyond traditional metrics (precision, recall, and F1-score), we quantify inference latency, energy consumption, and carbon-equivalent emissions to derive an integrated EcoScore that captures the trade-off between predictive performance and environmental impact. Our results reveal that a GPU-optimized multilayer perceptron achieves state-of-the-art accuracy of 98% while emitting 20× less CO2 than ensemble forests, which—despite comparable accuracy—incur substantially higher energy costs. We demonstrate that hardware–algorithm co-design, model sparsification, and careful hyperparameter tuning can reduce carbon footprints by over 90% with negligible loss in classification quality. These findings provide actionable guidelines for deploying energy-efficient, high-fidelity models in both ground-based data centers and onboard space observatories, paving the way for truly sustainable, large-scale astronomical data analysis.
(This article belongs to the Special Issue Artificial Intelligence in Space Applications)
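
The abstract does not spell out the EcoScore formula, so the sketch below shows one plausible form: a weighted blend of F1-score with an inverted, normalized emission term. The weights, the normalization, and the sample numbers are assumptions for illustration only.

```python
# A hypothetical EcoScore-style trade-off metric; not the paper's definition.
def eco_score(f1, co2_g, co2_worst_g, w_perf=0.7):
    """Blend predictive quality with an inverted, normalized emission term."""
    eco = 1.0 - min(co2_g / co2_worst_g, 1.0)   # 1 = cleanest, 0 = dirtiest
    return w_perf * f1 + (1.0 - w_perf) * eco

# An MLP-like model (high F1, low emissions) vs. a forest-like ensemble.
print(eco_score(f1=0.98, co2_g=5.0, co2_worst_g=100.0))    # ~0.971
print(eco_score(f1=0.97, co2_g=100.0, co2_worst_g=100.0))  # ~0.679
```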

20 pages, 2248 KB  
Review
A Review of High-Performance Computing Methods for Power Flow Analysis
by Shadi G. Alawneh, Lei Zeng and Seyed Ali Arefifar
Mathematics 2023, 11(11), 2461; https://doi.org/10.3390/math11112461 - 26 May 2023
Cited by 14 | Viewed by 8061
Abstract
Power flow analysis is critical for power systems due to the development of multiple energy supplies. Ensuring safety, stability, and real-time response in grid operation, grid planning, and power system analysis requires high-performance computing methods that accelerate power flow calculation, obtain the voltage magnitude and phase angle of every bus in the system, and cope with increasingly complex large-scale power systems. This paper provides an overview of the available parallel methods that address these issues. From a hardware perspective, these methods fall into three categories: multi-core, hybrid CPU-GPU architecture, and FPGA. From the perspective of numerical computation, power flow algorithms are generally classified into iterative and direct methods. This review introduces power flow models and hardware computing architectures, and then compares their performance in parallel power flow calculations depending on the parallel numerical methods used on different computing platforms. Furthermore, this paper analyzes the challenges, pros, and cons of these methods and provides guidance on how to exploit parallelism in future power flow applications.
(This article belongs to the Section E: Applied Mathematics)
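
As a concrete reference point for the iterative family the review surveys, here is a minimal Gauss-Seidel power flow iteration for a two-bus system (slack plus PQ bus); the line impedance, load, and tolerance are illustrative values, not drawn from the paper.

```python
# Gauss-Seidel power flow for a 2-bus system: slack bus + one PQ (load) bus.
import numpy as np

z = 0.01 + 0.05j                     # line impedance (p.u.)
y = 1.0 / z
Y = np.array([[y, -y], [-y, y]])     # bus admittance matrix
V = np.array([1.05 + 0j, 1.0 + 0j])  # slack bus fixed, PQ bus initial guess
S_load = 1.0 + 0.4j                  # PQ-bus demand (p.u.)

for it in range(100):
    # Injection at a load bus is negative: S_2 = -S_load.
    # Gauss-Seidel update: V_i = (S_i*/V_i* - sum_{k!=i} Y_ik V_k) / Y_ii
    V_new = ((-S_load).conjugate() / V[1].conjugate() - Y[1, 0] * V[0]) / Y[1, 1]
    if abs(V_new - V[1]) < 1e-10:
        break
    V[1] = V_new

print(abs(V[1]), np.degrees(np.angle(V[1])))  # voltage magnitude, phase angle
```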

20 pages, 4226 KB  
Article
AI-For-Mobility—A New Research Platform for AI-Based Control Methods
by Julian Ruggaber, Kenan Ahmic, Jonathan Brembeck, Daniel Baumgartner and Jakub Tobolář
Appl. Sci. 2023, 13(5), 2879; https://doi.org/10.3390/app13052879 - 23 Feb 2023
Cited by 5 | Viewed by 3169
Abstract
AI-For-Mobility (AFM) is a new research platform for investigating and implementing novel control methods based on Artificial Intelligence (AI) within the Department of Vehicle System Dynamics at the German Aerospace Center (DLR). A production hybrid vehicle serves as the base platform. Since AI-based methods are data-driven, the vehicle is equipped with a wide range of sensors to provide the required data. They measure the vehicle's state holistically and perceive the surrounding environment, while high-performance on-board CPUs and GPUs handle the sensor data. A full by-wire control system enables the vehicle to be used for applications in the field of automated driving. Despite all modifications, it is approved for public road use and retains the driving dynamics of a standard road vehicle. This makes it an attractive research and test platform, both for automotive applications and for technology demonstrations in other scientific fields (e.g., robotics, aviation). This paper presents the vehicle's design and architecture in detail and shows the promising application potential of AFM in the context of AI-based control methods.
(This article belongs to the Special Issue Technology Development of Autonomous Vehicles)

12 pages, 3084 KB  
Communication
A Hybrid GPU and CPU Parallel Computing Method to Accelerate Millimeter-Wave Imaging
by Li Ding, Zhaomiao Dong, Huagang He and Qibin Zheng
Electronics 2023, 12(4), 840; https://doi.org/10.3390/electronics12040840 - 7 Feb 2023
Cited by 8 | Viewed by 3348
Abstract
The range migration algorithm (RMA) based on the Fourier transform is widely applied in millimeter-wave (MMW) close-range imaging because of its low operation count and small approximation error. However, its interpolation stage is inefficient due to the intensive logic control involved, which limits its speed on a graphics processing unit (GPU) platform. Therefore, in this paper, we present an acceleration method based on hybrid GPU and central processing unit (CPU) parallel computation for implementing the RMA. The proposed method exploits the strong logic-control capability of the CPU to assist the GPU with the logic controls of the interpolation stage. The common positions of the wavenumber-domain components to be interpolated are calculated by the CPU and stored in constant memory for broadcast at any time. This avoids the repetitive computation incurred by a GPU-only scheme. The GPU is then responsible for the remaining matrix-related steps and outputs the needed wavenumber-domain values. Imaging experiments verify the acceleration efficiency of the proposed method and demonstrate a speedup of more than 15 times over the CPU-only method and more than 2 times over the GPU-only method.
(This article belongs to the Special Issue High-Performance Computing and Its Applications)
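
The division of labor can be pictured as follows: the CPU runs the branch-heavy step of locating interpolation positions once and caches indices and weights, after which the per-image GPU work reduces to a branch-free vectorized gather. In the sketch below, NumPy stands in for the GPU kernels, and the uniform and warped wavenumber grids are illustrative.

```python
# CPU precomputes interpolation indices/weights; "GPU" side is a pure gather.
import numpy as np

def precompute_interp(src_grid, dst_points):
    """CPU side: locate each target point and cache index + linear weight."""
    idx = np.clip(np.searchsorted(src_grid, dst_points) - 1, 0, len(src_grid) - 2)
    w = (dst_points - src_grid[idx]) / (src_grid[idx + 1] - src_grid[idx])
    return idx, w

def apply_interp(data, idx, w):
    """GPU-style side: gather and blend, no per-element logic controls."""
    return (1.0 - w) * data[idx] + w * data[idx + 1]

kz_uniform = np.linspace(0.0, 1.0, 256)          # target uniform wavenumbers
kz_warped = np.sqrt(np.linspace(0.0, 1.0, 256))  # warped source grid
idx, w = precompute_interp(kz_warped, kz_uniform)  # done once, on the CPU

spectrum = np.random.rand(256) + 1j * np.random.rand(256)
resampled = apply_interp(spectrum, idx, w)       # repeated per range line
print(resampled.shape)
```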

20 pages, 5408 KB  
Article
Hybrid Compression Optimization Based Rapid Detection Method for Non-Coal Conveying Foreign Objects
by Mengchao Zhang, Yanbo Yue, Kai Jiang, Meixuan Li, Yuan Zhang and Manshan Zhou
Micromachines 2022, 13(12), 2085; https://doi.org/10.3390/mi13122085 - 26 Nov 2022
Cited by 7 | Viewed by 2295
Abstract
Conveyor foreign objects pose a serious threat to the service life of conveyor belts and can cause abnormal damage or even tearing, so fast and effective detection of foreign objects is of great significance for ensuring the safe and efficient operation of belt conveyors. Considering that the foreign object detection algorithm must run on edge computing devices, this paper proposes a hybrid compression method that integrates network sparsification, structured pruning, and knowledge distillation to compress the network's parameters and computation. Applied to a YOLOv5 network, three structured pruning strategies are proposed, all of which achieve a good compression effect. The experimental results show that at a pruning rate of 0.9, the three pruning strategies compress the network parameters by more than 95%, the computation by more than 90%, and the network model size by more than 90%. The optimized network accelerates inference on both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) hardware platforms, with maximum speedups of 70.3% on the GPU and 157.5% on the CPU, providing excellent real-time performance but also causing a large accuracy loss. In contrast, at pruning rates of 0.6~0.9, the proposed method strikes a better balance between real-time performance and detection accuracy (>88.2%). Furthermore, to counter the influence of motion blur, a method of introducing prior knowledge is proposed to improve the robustness of the network, thus ensuring the detection effect. The proposed technical solutions are of great significance for promoting the intelligent development of coal mine equipment, ensuring the safe and efficient operation of belt conveyors, and promoting sustainable development.
(This article belongs to the Special Issue Hardware-Friendly Machine Learning and Its Applications)
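
The abstract does not detail the three pruning strategies, but structured pruning of this kind typically ranks a convolution layer's filters by magnitude and drops the weakest, shrinking parameters and computation together. A minimal sketch of one such step, with an illustrative layer shape and the paper's 0.9 pruning rate:

```python
# One structured-pruning step: drop conv filters with the smallest L1 norms.
# The paper additionally uses sparsity training and knowledge distillation.
import numpy as np

def prune_filters(weights, prune_rate):
    """weights: (out_ch, in_ch, k, k). Returns kept weights and kept indices."""
    scores = np.abs(weights).sum(axis=(1, 2, 3))   # L1 norm per output filter
    n_keep = max(1, int(round(len(scores) * (1.0 - prune_rate))))
    keep = np.sort(np.argsort(scores)[-n_keep:])   # strongest filters, in order
    return weights[keep], keep

conv_w = np.random.randn(64, 32, 3, 3)
pruned_w, kept = prune_filters(conv_w, prune_rate=0.9)
print(conv_w.shape, "->", pruned_w.shape)  # (64, 32, 3, 3) -> (6, 32, 3, 3)
```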

18 pages, 3528 KB  
Article
Deep Neural Network Based Reconciliation for CV-QKD
by Jun Xie, Ling Zhang, Yijun Wang and Duan Huang
Photonics 2022, 9(2), 110; https://doi.org/10.3390/photonics9020110 - 15 Feb 2022
Cited by 4 | Viewed by 4426
Abstract
High-speed reconciliation is indispensable for enabling a continuous-variable quantum key distribution (CV-QKD) system to generate secure keys in real time. However, the high complexity and low processing speed of the error correction process limit the reconciliation speed, so reconciliation has become the bottleneck of system performance. In this paper, we propose a high-speed reconciliation scheme that uses a deep neural network to optimize the decoding of the low-density parity-check (LDPC) code. We first introduce a network structure for the decoding implementation based on the deep neural network, which can be applied to decoding algorithms with a parallel strategy and significantly reduces the decoding complexity. We then propose two improved decoding algorithms based on this structure, a linear fitting algorithm and a deep neural network-assisted decoding algorithm. Finally, we introduce a high-speed reconciliation scheme based on a CPU-GPU hybrid platform. Simulation results show that the proposed reconciliation scheme reduces the complexity and enables a high-speed CV-QKD system. Furthermore, the improved decoding algorithms also reduce the frame error rate (FER), thereby increasing the secret key rate.
(This article belongs to the Topic Fiber Optic Communication)
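
The parity-check structure that any LDPC decoder, neural or classical, must traverse can be seen in a minimal hard-decision bit-flipping sketch; the tiny parity-check matrix below is illustrative, and the paper itself optimizes soft-decision decoding with a deep neural network rather than this baseline.

```python
# Hard-decision bit-flipping LDPC decoding on a toy parity-check matrix.
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],   # each row is one parity check
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]], dtype=int)

def bit_flip_decode(bits, H, max_iter=20):
    bits = bits.copy()
    for _ in range(max_iter):
        syndrome = H @ bits % 2      # which parity checks fail
        if not syndrome.any():
            break                    # valid codeword reached
        fails = syndrome @ H         # per-bit count of failed checks
        bits[np.argmax(fails)] ^= 1  # flip the most-suspect bit
    return bits

codeword = np.zeros(6, dtype=int)    # the all-zero word is always a codeword
received = codeword.copy()
received[2] ^= 1                     # inject one channel error
print(bit_flip_decode(received, H))  # recovers the all-zero codeword
```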

13 pages, 2302 KB  
Article
Length-Bounded Hybrid CPU/GPU Pattern Matching Algorithm for Deep Packet Inspection
by Yi-Shan Lin, Chun-Liang Lee and Yaw-Chung Chen
Algorithms 2017, 10(1), 16; https://doi.org/10.3390/a10010016 - 18 Jan 2017
Cited by 182 | Viewed by 6834
Abstract
Because frequent communication between applications takes place over high-speed networks, deep packet inspection (DPI) plays an important role in network application awareness. Signature-based network intrusion detection systems (NIDSs) rely on a DPI technique that examines incoming packet payloads with a pattern matching algorithm, which dominates the overall inspection performance. Existing studies have focused on implementing efficient pattern matching algorithms through parallel programming on software platforms, using either the central processing unit (CPU) or the graphics processing unit (GPU), because of the lower cost and higher scalability of such platforms. Our work focuses on designing a pattern matching algorithm based on cooperation between the CPU and the GPU. In this paper, we present an enhanced design of our previous work, a length-bounded hybrid CPU/GPU pattern matching algorithm (LHPMA). Preliminary experiments comparing it with the previous work show that the LHPMA achieves not only effective CPU/GPU cooperation but also higher throughput than the previous method.
(This article belongs to the Special Issue Networks, Communication, and Computing)
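
The core dispatch idea is simple: payloads no longer than a length bound are matched immediately on the CPU, while longer payloads are batched for the GPU. A minimal sketch, in which the bound, the signatures, and the CPU matcher standing in for the GPU kernel are all illustrative:

```python
# Length-bounded CPU/GPU dispatch for payload pattern matching.
import re

LENGTH_BOUND = 64            # illustrative payload-length threshold (bytes)
SIGNATURES = [b"attack", b"exploit"]                 # toy NIDS patterns
PATTERN = re.compile(b"|".join(re.escape(s) for s in SIGNATURES))

def cpu_match(payload):
    """Short payloads: match immediately on the CPU."""
    return PATTERN.search(payload) is not None

def dispatch(packets):
    gpu_batch, verdicts = [], {}
    for i, payload in enumerate(packets):
        if len(payload) <= LENGTH_BOUND:
            verdicts[i] = cpu_match(payload)
        else:
            gpu_batch.append(i)      # long payloads are queued for the GPU
    # Stand-in for the GPU kernel: batch-match the long payloads.
    for i in gpu_batch:
        verdicts[i] = cpu_match(packets[i])
    return verdicts

packets = [b"hello", b"x" * 100 + b"attack" + b"y" * 100, b"exploit!"]
print(dispatch(packets))     # {0: False, 1: True, 2: True}
```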
