Search Results (119)

Search Parameters:
Keywords = multicore architecture

14 pages, 2150 KB  
Article
A Flexible Multi-Core Hardware Architecture for Stereo-Based Depth Estimation CNNs
by Steven Colleman, Andrea Nardi-Dei, Marc C. W. Geilen, Sander Stuijk and Toon Goedemé
Electronics 2025, 14(22), 4425; https://doi.org/10.3390/electronics14224425 - 13 Nov 2025
Viewed by 364
Abstract
Stereo-based depth estimation is increasingly important in applications such as self-driving vehicles, earth observation, cartography, and robotics. Modern approaches to depth estimation employ artificial intelligence techniques, particularly convolutional neural networks (CNNs). However, stereo-based depth estimation networks involve dual processing paths for the left and right input images, which merge at intermediate layers, posing challenges for efficient deployment on modern hardware accelerators. Specifically, modern depth-first and layer-fused execution strategies, commonly used to reduce I/O communication and on-chip memory demands, are not readily compatible with such non-linear network structures. To address this limitation, we propose a flexible multi-core hardware architecture tailored to stereo-based depth estimation CNNs. The architecture supports layer-fused execution while efficiently managing dual-path computation and its fusion, enabling improved resource utilization. Experimental results demonstrate a latency reduction of up to 24% compared with state-of-the-art depth-first implementations that do not incorporate stereo-specific optimizations. Full article
(This article belongs to the Special Issue Multimedia Signal Processing and Computer Vision)

30 pages, 27762 KB  
Article
An IoV-Based Real-Time Telemetry and Monitoring System for Electric Racing Vehicles: Design, Implementation, and Field Validation
by Andrés Pérez-González, Arley F. Villa-Salazar, Ingry N. Gomez-Miranda, Juan D. Velásquez-Gómez, Andres F. Romero-Maya and Álvaro Jaramillo-Duque
Vehicles 2025, 7(4), 128; https://doi.org/10.3390/vehicles7040128 - 6 Nov 2025
Viewed by 1498
Abstract
The rapid development of Intelligent Connected Vehicles (ICVs) and the Internet of Vehicles (IoV) has paved the way for new real-time monitoring and control systems. However, most existing telemetry solutions remain limited by high costs, reliance on cellular networks, lack of modularity, and insufficient field validation in competitive scenarios. To address this gap, this study presents the design, implementation, and real-world validation of a low-cost telemetry platform for electric race vehicles. The system integrates an ESP32-based data acquisition unit, LoRaWAN long-range communication, and real-time visualization via Node-RED on a Raspberry Pi gateway. The platform supports multiple sensors (voltage, current, temperature, Global Positioning System (GPS), speed) and uses a FreeRTOS multi-core architecture for efficient task distribution and consistent data sampling. Field testing was conducted during Colombia’s 2024 National Electric Drive Vehicle Competition (CNVTE), under actual race conditions. The telemetry system achieved sensor accuracy exceeding 95%, stable LoRa transmission with low latency, and consistent performance throughout the competition. Notably, teams using the system reported up to 12% improvements in energy efficiency compared to baseline trials, confirming the system’s technical feasibility and operational impact under real race conditions. This work contributes to the advancement of IoV research by providing a modular, replicable, and cost-effective telemetry architecture, field-validated for use in high-performance electric vehicles. The architecture generalizes to urban e-mobility fleets for energy-aware routing, predictive maintenance, and safety monitoring. Full article
(This article belongs to the Special Issue Intelligent Connected Vehicles)

22 pages, 1308 KB  
Review
Comparative Review of Multicore Architectures: Intel, AMD, and ARM in the Modern Computing Era
by Raghad H. AlShekh, Shefa A. Dawwd and Farah N. Qassabbashi
Chips 2025, 4(4), 44; https://doi.org/10.3390/chips4040044 - 27 Oct 2025
Viewed by 5052
Abstract
The widespread use of computing infrastructure and information technology has transformed contemporary life. While the effects of this computing revolution are most visible in software applications, less attention has been paid to the hardware that underpins it. The computer chip is the most basic component of computer hardware and powers all digital devices, from mainframes, laptops, cellphones, tablets, and desktop PCs to supercomputers. Although many types of chips exist, the largest producers in this field are AMD (Advanced Micro Devices), Intel, and ARM (Advanced RISC Machines), which make processors for both consumer and business markets. Users have compared their products on factors such as pricing, cache and memory, and design approach. This paper provides a comprehensive comparative analysis of Intel, AMD, and ARM processors, focusing on their architectural characteristics and performance in the context of burgeoning artificial intelligence applications. It presents detailed architectural features, performance evaluation for AI workloads, a comparison of power efficiency and cost, and an analysis of current market trends. By thoroughly examining core architectural elements and key performance factors, this work offers users and developers insights for choosing processors that maximize AI tool utilization in the contemporary era. Full article
(This article belongs to the Special Issue IC Design Techniques for Power/Energy-Constrained Applications)

27 pages, 1008 KB  
Article
Efficient Reliability Block Diagram Evaluation Through Improved Algorithms and Parallel Computing
by Gloria Gori, Marco Papini and Alessandro Fantechi
Appl. Sci. 2025, 15(21), 11397; https://doi.org/10.3390/app152111397 - 24 Oct 2025
Viewed by 803
Abstract
Quantitative reliability evaluation is essential for optimizing control policies and maintenance strategies in complex industrial systems. While Reliability Block Diagrams (RBDs) are a natural formalism for modeling these hierarchical systems, modern applications require highly efficient, online reliability assessment on resource-constrained embedded hardware. This demand presents two fundamental challenges: developing algorithmically efficient RBD evaluation methods that can handle diverse custom distributions while preserving numerical accuracy, and ensuring platform-agnostic performance across diverse multicore architectures. This paper investigates these issues by developing a new version of the librbd open-source RBD library. This version includes advances in efficiency of evaluation algorithms, as well as restructured computation sequences, cache-aware data structures to minimize memory overhead, and an adaptive parallelization framework that scales automatically from embedded processors to high-performance systems. Comprehensive validation demonstrates that these advances significantly reduce computational complexity and improve performance over the original implementation, enabling real-time analysis of substantially larger systems. Full article
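The composition rules at the heart of RBD evaluation can be sketched in a few lines; this is a minimal textbook illustration of the series/parallel formulas such evaluators apply recursively over the block hierarchy, not the librbd implementation:

```python
# Minimal sketch of Reliability Block Diagram (RBD) composition rules.
# Not the librbd library -- just the textbook series/parallel formulas.
from math import prod

def series(reliabilities):
    """A series arrangement works only if every block works: R = prod(R_i)."""
    return prod(reliabilities)

def parallel(reliabilities):
    """A parallel arrangement fails only if every block fails:
    R = 1 - prod(1 - R_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical example: two redundant pumps (0.9 each) feeding one
# controller (0.99), evaluated by nesting the two rules.
system = series([parallel([0.9, 0.9]), 0.99])
```

Nesting these two rules over an arbitrary series/parallel tree gives the exact system reliability; the library's contribution lies in doing this efficiently for custom distributions and on multicore hardware.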
(This article belongs to the Special Issue Uncertainty and Reliability Analysis for Engineering Systems)

10 pages, 532 KB  
Article
3D Non-Uniform Fast Fourier Transform Program Optimization
by Kai Nie, Haoran Li, Lin Han, Yapeng Li and Jinlong Xu
Appl. Sci. 2025, 15(19), 10563; https://doi.org/10.3390/app151910563 - 30 Sep 2025
Viewed by 689
Abstract
MRI (magnetic resonance imaging) maps the internal structure of organisms and is an important application of the Non-Uniform Fast Fourier Transform (NUFFT), which can help doctors quickly locate a patient's lesion site. In practice, however, NUFFT involves heavy computation and is difficult to parallelize. On a multi-core shared-memory architecture, block preprocessing and colored-block scheduling provide a parallel solution for NUFFT convolution interpolation, and a static linked list then addresses the large memory requirements that the parallel solution introduces on top of multithreading. Manual vectorization using short-vector instructions accelerates processing further. Through this series of optimizations, the Random, Radial, and Spiral datasets achieved speedups of 273.8×, 291.8×, and 251.7×, respectively. Full article

43 pages, 2828 KB  
Article
Efficient Hybrid Parallel Scheme for Caputo Time-Fractional PDEs on Multicore Architectures
by Mudassir Shams and Bruno Carpentieri
Fractal Fract. 2025, 9(9), 607; https://doi.org/10.3390/fractalfract9090607 - 19 Sep 2025
Viewed by 835
Abstract
We present a hybrid parallel scheme for efficiently solving Caputo time-fractional partial differential equations (CTFPDEs) with integer-order spatial derivatives on multicore CPU and GPU platforms. The approach combines a second-order spatial discretization with the L1 time-stepping scheme and employs MATLAB parfor parallelization to achieve significant reductions in runtime and memory usage. A theoretical third-order convergence rate is established under smooth-solution assumptions, and the analysis also accounts for the loss of accuracy near the initial time t=t0 caused by weak singularities inherent in time-fractional models. Unlike many existing approaches that rely on locally convergent strategies, the proposed method ensures global convergence even for distant or randomly chosen initial guesses. Benchmark problems from fractional biological models—including glucose–insulin regulation, tumor growth under chemotherapy, and drug diffusion in tissue—are used to validate the robustness and reliability of the scheme. Numerical experiments confirm near-linear speedup on up to four CPU cores and show that the method outperforms conventional techniques in terms of convergence rate, residual error, iteration count, and efficiency. These results demonstrate the method’s suitability for large-scale CTFPDE simulations in scientific and engineering applications. Full article
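The L1 time-stepping scheme named in the abstract is the classical discretization of the Caputo derivative of order 0 &lt; α &lt; 1; a sequential sketch of it (illustrative only, not the authors' hybrid parallel code) is:

```python
# Classical L1 discretization of the Caputo fractional derivative of
# order 0 < alpha < 1:
#   D^a u(t_n) ~ dt^(-a)/Gamma(2-a) * sum_k b_k (u_{n-k} - u_{n-k-1}),
#   b_k = (k+1)^(1-a) - k^(1-a).
from math import gamma

def l1_caputo(u, dt, alpha):
    """Approximate D^alpha u at t_n = n*dt from samples u[0..n]."""
    n = len(u) - 1
    c = dt ** (-alpha) / gamma(2.0 - alpha)
    b = [(k + 1) ** (1 - alpha) - k ** (1 - alpha) for k in range(n)]
    return c * sum(b[k] * (u[n - k] - u[n - k - 1]) for k in range(n))

# Sanity check against a known result: for u(t) = t, the Caputo derivative
# is t^(1-alpha)/Gamma(2-alpha), and the L1 scheme is exact for
# piecewise-linear functions.
alpha, dt, n = 0.5, 0.01, 100
u = [k * dt for k in range(n + 1)]
approx = l1_caputo(u, dt, alpha)
exact = (n * dt) ** (1 - alpha) / gamma(2 - alpha)
```

Each step sums over the whole history, which is exactly the cost that motivates parallelizing the scheme across cores for large-scale CTFPDE simulations.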

14 pages, 1043 KB  
Article
A Dataset and Experimental Evaluation of a Parallel Conflict Detection Solution for Model-Based Diagnosis
by Jessica Janina Cabezas-Quinto, Cristian Vidal-Silva, Jorge Serrano-Malebrán and Nicolás Márquez
Data 2025, 10(9), 139; https://doi.org/10.3390/data10090139 - 29 Aug 2025
Viewed by 930
Abstract
This article presents a dataset and experimental evaluation of a parallelized variant of Junker’s QuickXPlain algorithm, designed to efficiently compute minimal conflict sets in constraint-based diagnosis tasks. The dataset includes performance benchmarks, conflict traces, and solution metadata for a wide range of configurable diagnosis problems based on real-world and synthetic CSP instances. Our parallel variant leverages multicore architectures to reduce computation time while preserving the completeness and minimality guarantees of QuickXPlain. All evaluations were conducted using reproducible scripts and parameter configurations, enabling comparison across different algorithmic strategies. The provided dataset can be used to replicate experiments, analyze scalability under varying problem sizes, and serve as a baseline for future improvements in conflict explanation algorithms. The full dataset, codebase, and benchmarking scripts are openly available and documented to promote transparency and reusability in constraint-based diagnostic systems research. Full article
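Junker's QuickXPlain, whose parallelized variant the dataset benchmarks, can be sketched sequentially as follows; the consistency checker here is a toy stand-in with a seeded conflict {2, 5} (a real deployment would call a CSP solver):

```python
# Sequential sketch of Junker's QuickXPlain for minimal conflict sets.
# `is_consistent` is a toy oracle here; the paper's variant parallelizes
# the two recursive calls while preserving minimality guarantees.

def quickxplain(background, constraints, is_consistent):
    """Return one minimal subset of `constraints` that conflicts with
    `background`, or [] if there is no conflict at all."""
    if is_consistent(background + constraints):
        return []

    def qx(bg, delta, cs):
        # If the last additions (delta) already made bg inconsistent,
        # nothing from cs is needed for this branch of the conflict.
        if delta and not is_consistent(bg):
            return []
        if len(cs) <= 1:
            return list(cs)
        mid = len(cs) // 2
        c1, c2 = cs[:mid], cs[mid:]
        d2 = qx(bg + c1, c1, c2)   # conflict elements inside c2
        d1 = qx(bg + d2, d2, c1)   # conflict elements inside c1
        return d1 + d2

    return qx(background, background, constraints)

# Toy oracle: a set of constraints is inconsistent iff it contains both 2 and 5.
conflict = quickxplain([], [1, 2, 3, 4, 5],
                       lambda cs: not {2, 5}.issubset(cs))
```

The divide-and-conquer structure keeps the number of solver calls logarithmic in the non-conflicting portion, which is why the two recursive calls are the natural target for multicore parallelization.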

15 pages, 3863 KB  
Proceeding Paper
Fast Parallel Gaussian Filter Based on Partial Sums
by Atanaska Bosakova-Ardenska, Hristina Andreeva and Ivan Halvadzhiev
Eng. Proc. 2025, 104(1), 1; https://doi.org/10.3390/engproc2025104001 - 21 Aug 2025
Viewed by 597
Abstract
As a convolutional operation in the spatial domain, Gaussian filtering involves a large number of computational operations, and that number grows with both image size and kernel size. Finding methods to accelerate such computations is therefore significant for overall time complexity, and the current paper proposes the use of partial sums to achieve this acceleration. The MPI (Message Passing Interface) library and the C programming language are used for the parallel program implementation of Gaussian filtering, based on 1D and 2D kernels working with and without partial sums, and a theoretical and practical evaluation of the effectiveness of the proposed implementations is then made. The experimental results indicate a significant acceleration of the computational process when partial sums are used in both sequential and parallel processing. The PSNR (Peak Signal to Noise Ratio) metric is used to assess the filtering quality of the proposed algorithms in comparison with the MATLAB implementation of Gaussian filtering, and their time performance is also evaluated. Full article
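A neutral illustration of the 1D-kernel approach the abstract contrasts with the 2D kernel: a separable Gaussian blur costs O(2N) multiplications per pixel instead of O(N²). This generic sketch is not the paper's MPI/partial-sum implementation; the image and parameters are illustrative:

```python
# Separable Gaussian blur: one horizontal 1D pass, one vertical 1D pass.
import math

def gaussian_kernel_1d(sigma, radius):
    w = [math.exp(-(x * x) / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]   # normalized: flat regions stay unchanged

def convolve_rows(img, k):
    r = len(k) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # clamp-to-edge boundary handling
            out[y][x] = sum(k[i + r] * img[y][min(max(x + i, 0), w - 1)]
                            for i in range(-r, r + 1))
    return out

def gaussian_blur(img, sigma=1.0, radius=2):
    k = gaussian_kernel_1d(sigma, radius)
    tmp = convolve_rows(img, k)              # horizontal pass
    t = [list(row) for row in zip(*tmp)]     # transpose
    t = convolve_rows(t, k)                  # vertical pass
    return [list(row) for row in zip(*t)]

flat = [[10.0] * 8 for _ in range(8)]
blurred = gaussian_blur(flat)   # a constant image is preserved by the blur
```

The partial-sum idea in the paper goes further, reusing sums shared between neighboring windows so that even these 1D passes avoid redundant additions.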

36 pages, 5771 KB  
Article
Improving K-Means Clustering: A Comparative Study of Parallelized Version of Modified K-Means Algorithm for Clustering of Satellite Images
by Yuv Raj Pant, Larry Leigh and Juliana Fajardo Rueda
Algorithms 2025, 18(8), 532; https://doi.org/10.3390/a18080532 - 21 Aug 2025
Cited by 1 | Viewed by 3689
Abstract
Efficient clustering of high-spatial-dimensional satellite image datasets remains a critical challenge, particularly due to the computational demands of spectral distance calculations, random centroid initialization, and sensitivity to outliers in conventional K-Means algorithms. This study presents a comprehensive comparative analysis of eight parallelized variants of the K-Means algorithm, designed to enhance clustering efficiency and reduce computational burden for large-scale satellite image analysis. The proposed parallelized implementations incorporate optimized centroid initialization for better starting-point selection, a dynamic K-Means sharp method for outlier detection to improve cluster robustness, and a Nearest-Neighbor Iteration Calculation Reduction method to minimize redundant computations. These enhancements were applied to a test set of 114 global land cover data cubes, each comprising high-dimensional satellite images of size 3712 × 3712 × 16, and executed on a multi-core CPU architecture to leverage extensive parallel processing capabilities. Performance was evaluated across three criteria: convergence speed (iterations), computational efficiency (execution time), and clustering accuracy (RMSE). The Parallelized Enhanced K-Means (PEKM) method achieved the fastest convergence at 234 iterations and the lowest execution time of 4230 h, while maintaining consistent RMSE values (0.0136) across all algorithm variants. These results demonstrate that targeted algorithmic optimizations, combined with effective parallelization strategies, can improve the practicality of K-Means clustering for high-dimensional satellite image analysis. This work underscores the potential of improving K-Means clustering frameworks beyond hardware acceleration alone, offering scalable solutions for large-scale unsupervised image classification tasks. Full article
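The assign/update loop that all eight variants optimize is the plain Lloyd iteration; a minimal 1D sketch of it follows (the paper's seeding, outlier handling, and redundant-distance reduction are its additions and are not shown; the data here is a made-up example):

```python
# Plain K-Means (Lloyd's algorithm) on 1D points: the baseline loop whose
# assignment step dominates cost on high-dimensional satellite imagery.

def kmeans_1d(points, centroids, max_iter=100):
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda j: (p - centroids[j]) ** 2)
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:      # converged: assignments are stable
            break
        centroids = new
    return centroids, clusters

data = [1.0, 1.2, 0.8, 9.0, 9.4, 8.6]   # two obvious groups
centroids, clusters = kmeans_1d(data, [0.0, 10.0])
```

Because every point is assigned independently, the inner loop parallelizes naturally across cores, which is the lever the compared variants pull on at scale.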
(This article belongs to the Special Issue Algorithms in Multi-Sensor Imaging and Fusion)

22 pages, 3265 KB  
Article
A Novel Multi-Core Parallel Current Differential Sensing Approach for Tethered UAV Power Cable Break Detection
by Ziqiao Chen, Zifeng Luo, Ziyan Wang, Zhou Huang, Yongkang He, Zhiheng Wen, Yuanjun Ding and Zhengwang Xu
Sensors 2025, 25(16), 5112; https://doi.org/10.3390/s25165112 - 18 Aug 2025
Viewed by 816
Abstract
Tethered unmanned aerial vehicles (UAVs) operating in terrestrial environments face critical safety challenges from power cable breaks, yet existing solutions—including fiber optic sensing (cost > USD 20,000) and impedance analysis (35% payload increase)—suffer from high cost or heavy weight. This study proposes a dual innovation: a real-time break detection method and a low-cost multi-core parallel sensing system design based on ACS712 Hall sensors, achieving high detection accuracy (100% with zero false positives in tests). Unlike conventional techniques, the approach leverages current differential (ΔI) monitoring across parallel cores, triggering alarms when ΔI exceeds Irate/2 (e.g., 0.3 A for 0.6 A rated current), corresponding to a voltage deviation ≥ 110 mV (normal baseline ≤ 3 mV). The core innovation lies in the integrated sensing system design: by optimizing the parallel deployment of ACS712 sensors and LMV324-based differential circuits, the solution reduces hardware cost to USD 3 (99.99% lower than fiber optic systems), payload by 18%, and power consumption by 23% compared to traditional methods. Post-fault cable temperatures remain ≤56 °C, ensuring safety margins. The 4-core architecture enhances mean time between failures (MTBF) by 83% over traditional systems, establishing a new paradigm for low-cost, high-reliability sensing systems in terrestrial tethered UAV cable health monitoring. Preliminary theoretical analysis suggests potential extensibility to underwater scenarios with further environmental hardening. Full article
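The detection rule in the abstract reduces to a simple threshold on the spread between per-core currents; a sketch of that rule follows, using the rated-current figures quoted above (the function name and sample readings are illustrative, not the authors' firmware):

```python
# Current-differential break detection, per the abstract's rule:
# alarm when delta-I across parallel cores exceeds Irate/2
# (e.g., 0.3 A for a 0.6 A rated current).

def break_alarm(core_currents_a, rated_current_a):
    delta_i = max(core_currents_a) - min(core_currents_a)
    return delta_i > rated_current_a / 2

# Healthy 4-core cable at 0.6 A rated current: currents nearly balanced.
healthy = break_alarm([0.60, 0.59, 0.61, 0.60], rated_current_a=0.6)
# One core broken: its current drops to ~0, the others pick up the load.
broken = break_alarm([0.00, 0.81, 0.79, 0.80], rated_current_a=0.6)
```

The Irate/2 margin keeps the healthy-state imbalance (millivolt-level in the abstract's sensing terms) far below the alarm threshold, which is what yields zero false positives in the reported tests.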
(This article belongs to the Section Sensor Networks)

18 pages, 1587 KB  
Article
Management of Mobile Resonant Electrical Systems for High-Voltage Generation in Non-Destructive Diagnostics of Power Equipment Insulation
by Anatolii Shcherba, Dmytro Vinnychenko, Nataliia Suprunovska, Sergy Roziskulov, Artur Dyczko and Roman Dychkovskyi
Electronics 2025, 14(15), 2923; https://doi.org/10.3390/electronics14152923 - 22 Jul 2025
Cited by 2 | Viewed by 869
Abstract
This research presents the development and management principles of mobile resonant electrical systems designed for high-voltage generation, intended for non-destructive diagnostics of insulation in high-power electrical equipment. The core of the system is a series inductive–capacitive (LC) circuit characterized by a high quality (Q) factor and operating at high frequencies, typically in the range of 40–50 kHz or higher. Practical implementations of the LC circuit with Q-factors exceeding 200 have been achieved using advanced materials and configurations. Specifically, ceramic capacitors with a capacitance of approximately 3.5 nF and Q-factors over 1000, in conjunction with custom-made coils possessing Q-factors above 280, have been employed. These coils are constructed using multi-core, insulated, and twisted copper wires of the Litzendraht type to minimize losses at high frequencies. Voltage amplification within the system is effectively controlled by adjusting the current frequency, thereby maximizing voltage across the load without increasing the system’s size or complexity. This frequency-tuning mechanism enables significant reductions in the weight and dimensional characteristics of the electrical system, facilitating the development of compact, mobile installations. These systems are particularly suitable for on-site testing and diagnostics of high-voltage insulation in power cables, large rotating machines such as turbogenerators, and other critical infrastructure components. Beyond insulation diagnostics, the proposed system architecture offers potential for broader applications, including the charging of capacitive energy storage units used in high-voltage pulse systems. Such applications extend to the synthesis of micro- and nanopowders with tailored properties and the electrohydropulse processing of materials and fluids. 
Overall, this research demonstrates a versatile, efficient, and portable solution for advanced electrical diagnostics and energy applications in the high-voltage domain. Full article
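A back-of-the-envelope check of the resonance figures quoted above: with the ~3.5 nF ceramic capacitor, the inductance needed to resonate in the stated 40–50 kHz band follows from the standard series-LC relation, and at resonance the load voltage is roughly Q times the drive voltage (an idealized textbook estimate, not the authors' design calculation; the drive voltage is a made-up input):

```python
# Series LC resonance: f0 = 1 / (2*pi*sqrt(L*C)), so the inductance that
# resonates a given capacitance at f0 is L = 1 / ((2*pi*f0)^2 * C).
import math

def resonant_inductance(f0_hz, c_farad):
    return 1.0 / ((2 * math.pi * f0_hz) ** 2 * c_farad)

def load_voltage_at_resonance(v_drive, q_factor):
    # Idealized series-resonant amplification: V_load ~ Q * V_drive.
    return q_factor * v_drive

L = resonant_inductance(45e3, 3.5e-9)          # mid-band 45 kHz, 3.5 nF
v_out = load_voltage_at_resonance(100.0, 200)  # Q > 200 per the abstract
```

The millihenry-scale inductance this yields is consistent with the compact Litzendraht-wound coils described, and the Q-fold amplification is what lets frequency tuning replace a bulky step-up transformer.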
(This article belongs to the Special Issue Energy Harvesting and Energy Storage Systems, 3rd Edition)

22 pages, 2946 KB  
Article
Intelligent Transaction Scheduling to Enhance Concurrency in High-Contention Workloads
by Shuhan Chen, Congqi Shen and Chunming Wu
Appl. Sci. 2025, 15(11), 6341; https://doi.org/10.3390/app15116341 - 5 Jun 2025
Viewed by 1718
Abstract
Concurrency control (CC) schemes based on transaction decomposition have significantly enhanced the concurrency performance of multicore in-memory databases, surpassing traditional CC schemes such as two-phase locking (2PL) and optimistic concurrency control (OCC), particularly in high-contention scenarios. However, this performance improvement introduces new challenges: balancing transaction dependency constraints against concurrency optimization remains a persistent issue, especially as growing numbers of concurrent client requests create complex transaction dependencies. To address these challenges, we propose Dynamic Contention Scheduling (DCoS), a novel method that enhances transaction concurrency via a dual-granularity architecture. DCoS integrates a deep reinforcement learning (DRL)-based executor to schedule high-contention transactions while preserving dependency correctness. DCoS employs a one-shot execution model that enables fine-grained scheduling in high-contention scenarios, while retaining lightweight in-partition execution under low-contention conditions. The experimental results on both micro- and macro-benchmarks demonstrate that DCoS achieves a throughput up to three times higher than state-of-the-art CC protocols under high-contention workloads. Full article
(This article belongs to the Special Issue AI-Based Data Science and Database Systems)

18 pages, 7358 KB  
Article
A Tile-Based Multi-Core Hardware Architecture for Lossless Image Compression and Decompression
by Xufeng Li, Li Zhou and Yan Zhu
Appl. Sci. 2025, 15(11), 6017; https://doi.org/10.3390/app15116017 - 27 May 2025
Cited by 1 | Viewed by 1094
Abstract
Lossless image compression plays a vital role in improving data storage and transmission efficiency without compromising data integrity. However, the throughput of current lossless compression and decompression systems remains limited and is unable to meet the growing demands of high-speed data transfer. To address this challenge, a previously proposed hybrid lossless compression and decompression algorithm has been implemented on an FPGA platform. This implementation significantly improves processing speed and efficiency. A multi-core system architecture is introduced, utilizing the processing system (PS) and programmable logic (PL) of a Xilinx Zynq-706 evaluation board. The PS handles coordination. The PL performs compression and decompression using multiple cores. Each core can process up to eight image tiles at the same time. The compression process is designed with a four-stage pipeline, and decompression is managed by a dynamic state machine to ensure optimized control. The parallel architecture and innovative algorithm design enable high-throughput operation, achieving compression and decompression rates of 480 Msubpixels/s and 372 Msubpixels/s, respectively. Through this work, a practical and high-performance solution for real-time lossless image compression is demonstrated. Full article

35 pages, 11134 KB  
Article
Error Classification and Static Detection Methods in Tri-Programming Models: MPI, OpenMP, and CUDA
by Saeed Musaad Altalhi, Fathy Elbouraey Eassa, Sanaa Abdullah Sharaf, Ahmed Mohammed Alghamdi, Khalid Ali Almarhabi and Rana Ahmad Bilal Khalid
Computers 2025, 14(5), 164; https://doi.org/10.3390/computers14050164 - 28 Apr 2025
Viewed by 1384
Abstract
The growing adoption of supercomputers across various scientific disciplines, particularly by researchers without a background in computer science, has intensified the demand for parallel applications. These applications are typically developed using a combination of programming models within languages such as C, C++, and Fortran. However, modern multi-core processors and accelerators necessitate fine-grained control to achieve effective parallelism, complicating the development process. To address this, developers commonly utilize high-level programming models such as Open Multi-Processing (OpenMP), Open Accelerators (OpenACCs), Message Passing Interface (MPI), and Compute Unified Device Architecture (CUDA). These models may be used independently or combined into dual- or tri-model applications to leverage their complementary strengths. However, integrating multiple models introduces subtle and difficult-to-detect runtime errors such as data races, deadlocks, and livelocks that often elude conventional compilers. This complexity is exacerbated in applications that simultaneously incorporate MPI, OpenMP, and CUDA, where the origin of runtime errors, whether from individual models, user logic, or their interactions, becomes ambiguous. Moreover, existing tools are inadequate for detecting such errors in tri-model applications, leaving a critical gap in development support. To address this gap, the present study introduces a static analysis tool designed specifically for tri-model applications combining MPI, OpenMP, and CUDA in C++-based environments. The tool analyzes source code to identify both actual and potential runtime errors prior to execution. Central to this approach is the introduction of error dependency graphs, a novel mechanism for systematically representing and analyzing error correlations in hybrid applications. 
By offering both error classification and comprehensive static detection, the proposed tool enhances error visibility and reduces manual testing effort. This contributes significantly to the development of more robust parallel applications for high-performance computing (HPC) and future exascale systems. Full article
(This article belongs to the Special Issue Best Practices, Challenges and Opportunities in Software Engineering)

16 pages, 665 KB  
Article
Modeling and Performance Analysis of Task Offloading of Heterogeneous Mobile Edge Computing Networks
by Wenwang Li and Haohao Zhou
Appl. Sci. 2025, 15(8), 4307; https://doi.org/10.3390/app15084307 - 14 Apr 2025
Viewed by 2183
Abstract
Mobile edge computing (MEC) architecture can provide users with low-latency services by integrating computing, storage, and processing capabilities near users and data sources. The topic has therefore attracted intense interest, especially in single-server and homogeneous multi-server scenarios. However, existing studies ignore the impact of network heterogeneity and load fluctuations, and their performance evaluation relies too heavily on statistical mean indicators while overlooking real-time ones. In this paper, we propose a new heterogeneous edge computing network architecture composed of multi-core servers with varying transmission power, computing capabilities, and waiting queue lengths. Since the service performance of MEC must be evaluated and analyzed to guarantee Quality of Service (QoS), we design indicators by solving the probability distribution function of response time, such as average task offloading delay, immediate service probability, and blocking probability. By analyzing the impact of bias factors and network parameters associated with MEC servers on network performance, we provide insights for MEC design, deployment, and optimization. Full article
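As a classical single-station illustration of the blocking-probability indicator mentioned above (not the paper's heterogeneous multi-server model), the Erlang-B formula gives the probability that an arriving task finds all cores busy, computed via its numerically stable recursion:

```python
# Erlang-B blocking probability for c servers and offered load a = lambda/mu,
# via the stable recursion B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)).
# A textbook illustration only; the paper derives richer indicators from
# the response-time distribution of a heterogeneous multi-core system.

def erlang_b(servers, offered_load):
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b

# Hypothetical 4-core MEC server offered 2 Erlangs of traffic: probability
# that an arriving task is blocked because all cores are busy.
p_block = erlang_b(4, 2.0)
```

The complement of such a blocking probability plays the role of the paper's immediate-service probability in this simplified setting.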
