1. Introduction
Embedded systems have become an essential foundation of modern society, supporting applications ranging from consumer electronics to automotive and industrial control [1]. As embedded systems continue to evolve, their computational demands have grown substantially due to the increasing complexity of real-world tasks. To address these challenges, multi-core architectures have been widely adopted, offering the potential to boost overall performance through parallel execution [2]. Recent processors for the embedded domain often feature four or more cores, and some platforms even use many-core CPUs with over a dozen cores [3,4].
However, the actual performance of tasks in a multi-core system is not determined by the hardware specification alone; it is also influenced by the entire software stack, including the operating system, device drivers, and middleware. Furthermore, most embedded systems include jobs with timing constraints, such as real-time control loops. The performance metrics for such systems are therefore not limited to average throughput but must also encompass timing-related factors such as execution delay and deadline satisfaction.
Traditionally, embedded systems, especially those running hard real-time tasks, are built upon MCUs (microcontroller units) with lightweight RTOS (real-time operating system) kernels. To guide system development and support application design, evaluating platform characteristics with comprehensive benchmarking tools is an effective and necessary approach [5]. For RTOS-based multi-core environments, several studies have proposed benchmark methods and tools that examine various factors (e.g., inter-core resource contention, kernel locking mechanisms) and reveal useful insights for performance analysis and bottleneck identification [4,6].
Besides these RTOS-based systems, Linux has also become a major option in embedded environments equipped with multi-core SoCs (systems on a chip) to achieve the high performance and rich functionality required by more complex applications [7]. Although Linux was not originally designed for real-time systems, its community is actively adding capabilities to improve such support. For instance, the latest long-term kernel (6.12 as of writing) has fully integrated the PREEMPT_RT real-time patchset and refined scheduling mechanisms to deliver lower latency and better determinism. However, existing benchmark tools for Linux mainly focus on evaluating overall throughput or computational efficiency rather than real-time performance metrics, leaving timing-related characteristics largely unexplored [8,9]. Technically, it is possible to port the RTOS-oriented benchmark tools mentioned above to Linux, but they primarily employ fixed stress-test scenarios for worst-case assessment, which cannot accurately reflect the performance of realistic application workloads on complex embedded systems. Tools like MiBench [10] and TACLeBench [11] are usually categorized as realistic benchmarks for embedded systems because their workloads are taken from computation kernels and algorithms of real-world applications. However, these programs are implemented as hard-coded static tasks, so they cannot evaluate performance under varying numbers of active cores, timing behaviors, resource accesses, and contention.
In this paper, to fill the gaps left by existing approaches, we propose a new multi-core benchmark framework for Linux that supports the analysis of performance characteristics using diversified tasks generated to simulate realistic computing and contention workloads in embedded systems. The main contributions of our work are summarized as follows:
Benchmark methodology: The proposed methodology integrates a resource contention model into the task-set generation process, bridging the gap between theoretical models and executable benchmark programs and enabling performance evaluation with diverse workloads created from controllable parameters.
Open-source framework: Building on this methodology, we design and implement a portable, open-source benchmark framework that automatically generates, compiles, and executes task sets on Linux-based platforms. It can also systematically iterate through parameter grids to produce highly diversified scenarios for quantitatively analyzing the impacts of related factors. The source files are available for reproducibility and extensibility (see “Data Availability Statement” for the repository).
Empirical case study: Using the proposed framework, we perform an experimental study on a 16-core embedded platform to comparatively examine Linux kernels with different versions and preemption modes. The evaluation results not only demonstrate the effectiveness of our method for multi-core embedded systems but also provide new insights into the real-time performance characteristics of Linux-based platforms.
The remainder of this paper is organized as follows. In Section 2, we review mainstream methodologies for performance evaluation in embedded systems and highlight their limitations in the context of multi-core evaluation. In Section 3, we present the proposed benchmarking methodology in detail, including the design of a contention-oriented task-set model, the pipeline of task-set generation, and the implementation of the benchmarking framework. In Section 4, we describe the experiments that evaluate the performance characteristics of three Linux configurations and comparatively analyze the results to show the usefulness of our benchmark method. Finally, Section 5 concludes the paper and outlines directions for future work.
3. Materials and Methods
The primary goal of our methodology is to design benchmark workloads that resemble realistic tasks in embedded systems, thereby enabling systematic analysis of multi-core performance characteristics. To achieve this goal, we first construct models that capture the common features of tasks in multi-core environments, considering both resource contention and computation behaviors. Based on these models, we design an algorithm that synthesizes executable task sets according to user-specified parameters, enabling the generation of various benchmark scenarios. Finally, we implement a portable and extendable benchmark framework that integrates the generator with automated compilation, execution, and data collection, so the user can effortlessly evaluate and analyze embedded systems under different environment settings.
3.1. Multi-Core Resource Contention Model
We first describe the resource contention model in multi-core systems. As illustrated in Figure 1, multiple tasks running on different cores may attempt to access the same shared resource concurrently. Such concurrent access inevitably introduces contention, which in turn may cause waiting and potential task delays. Most contended resources are kernel objects managed at the OS level. Especially in control-flow-oriented embedded systems, synchronization and communication primitives such as semaphores and data queues are heavily used, thereby becoming major sources of inter-core contention. It is noteworthy that many hardware components, such as caches and the memory bus, are shared implicitly and should also be considered contended resources.
In theory, the delay caused by multi-core contention can be bounded by the worst-case scenario in which all tasks accessing the same resource serialize their entries into the critical section. In practice, however, the observed delays often far exceed this theoretical expectation, particularly as the number of cores increases. Several factors contribute to this phenomenon, including contention in multi-level cache hierarchies, operating-system overheads such as context switching and scheduling, and contention on various system locks.
On the other hand, modern operating systems, including certain real-time operating systems, are increasingly adopting fine-grained locking designs. This trend changes the impact of resource access compared to traditional embedded systems that relied on a single global lock. With a global lock, any resource access may cause contention across all tasks in the system. By contrast, fine-grained locking introduces separate locks for each core or even for individual resources. While this design introduces additional overhead due to lock management, it significantly improves system parallelism. Experience from mature systems such as Linux has shown that fine-grained locking is beneficial for real-time performance, which has motivated several RTOSs to explore similar approaches. Differences in lock design introduce an additional dimension into the resource contention model. In addition to the number of cores and the frequency of resource accesses, one must also consider whether a task accesses private resources, i.e., those confined to its own core, or global resources shared across cores. As the contrast between global and fine-grained locking makes clear, these two types of access behave very differently in terms of contention. We therefore regard this distinction as a key factor in evaluating modern embedded systems that are evolving toward fine-grained locking.
We identify three primary factors that influence multi-core contention in embedded systems: the number of cores, the proportion of resource accesses within parallel tasks, and the type of resource being accessed. In this work, we refer to accesses to private resources as parallel accesses, whereas accesses to global resources are referred to as contended accesses.
3.2. Task-Set Model
Building on the above contention model, we now integrate these factors into a formalized task-set model for multi-core evaluation. In real embedded systems, tasks typically consist of kernel API calls involving synchronization/communication, which explicitly introduce system-level contention, as well as conventional code, such as computation-intensive routines commonly found in control-flow programs. While the latter usually does not directly cause OS-level contention, it may still induce hardware-level contention through access to shared components like caches and memory buses. For clarity of modeling, we therefore abstract each task as a combination of NS (non-synchronization) segments and SY (synchronization) segments, where the NS segments represent typical computation workloads and the SY segments explicitly capture OS-level resource accesses.
It is important to note that not all synchronization segments necessarily lead to resource contention. For example, certain synchronization operations may only access resources that are private to a specific core, thus avoiding inter-core interference. Therefore, to distinguish between different synchronization segments, we explicitly integrate the resource contention model into the task-set model. Specifically, each synchronization segment for a type of kernel resource is associated with two alternative API implementations. The CON (contended-access) version enforces access to a shared resource, thereby maximizing contention among tasks. By contrast, the PAR (parallel-access) version directs tasks to private resources confined to their own cores, enabling concurrent execution under fine-grained locking schemes. This distinction allows the model to reproduce a broader range of multi-core execution scenarios.
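To make the distinction concrete, the following minimal sketch contrasts the two variants for a semaphore primitive. It is written in Python with hypothetical names purely for illustration; the actual fragments in our pools are generated C code invoking Linux primitives.

```python
import threading

NUM_CORES = 4

# Global resource shared by all cores: accesses serialize tasks (CON).
global_sem = threading.Semaphore(1)

# One private resource per core: accesses proceed in parallel (PAR).
private_sems = [threading.Semaphore(1) for _ in range(NUM_CORES)]

def sy_fragment_con():
    """Contended-access (CON) variant: all tasks contend on one resource."""
    with global_sem:
        pass  # critical section on the globally shared resource

def sy_fragment_par(core_id: int):
    """Parallel-access (PAR) variant: each task uses its core-local resource."""
    with private_sems[core_id]:
        pass  # critical section on the core-private resource
```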
Considering the need to generate executable code for real-system evaluation, we further refine each segment into smaller units, referred to as fragments. A fragment represents a complete and self-contained operation. For non-synchronization segments, a fragment may correspond to a function implementing a mathematical computation or algorithmic routine. For synchronization segments, a fragment corresponds to a specific API invocation, such as a semaphore operation or a message queue access. Each segment is constructed by filling its allocated execution slot with an appropriate collection of fragments. We also include common parameters required for task sets in embedded-system analysis, by referring to related works. Specifically, each task is associated with a worst-case execution time (WCET), a period, and a relative deadline.
Incorporating these elements, we formalize the task-set and resource models as follows. A task set is denoted as

$$\Gamma = \{\tau_1, \tau_2, \ldots, \tau_N\}$$

Each task $\tau_i \in \Gamma$ is characterized by the tuple

$$\tau_i = (C_i, T_i, D_i, S_i)$$

with the parameters defined as follows: $C_i$ is the worst-case execution time (WCET), $T_i$ is the period, $D_i$ is the relative deadline, and $S_i = (s_{i,1}, \ldots, s_{i,n_i})$ is the ordered list of segments composing the task. Each segment $s_{i,j}$ is

$$s_{i,j} = (\mathit{type}_{i,j}, \mathit{len}_{i,j}, \mathit{acc}_{i,j}, \mathit{res}_{i,j})$$

with the parameters defined as follows: $\mathit{type}_{i,j} \in \{\mathrm{NS}, \mathrm{SY}\}$ is the segment type, $\mathit{len}_{i,j}$ is the execution-time budget of the segment (with $\sum_{j=1}^{n_i} \mathit{len}_{i,j} = C_i$), and $\mathit{acc}_{i,j}$ and $\mathit{res}_{i,j}$ are the access mode and the bound resource, which are defined only for synchronization segments. For synchronization segments ($\mathit{type}_{i,j} = \mathrm{SY}$), the access mode and resource class are constrained by

$$\mathit{acc}_{i,j} \in \{\mathrm{PAR}, \mathrm{CON}\}, \qquad \mathit{res}_{i,j} \in R$$

where PAR binds the segment to a core-private resource and CON binds it to a globally shared resource. The resource set is

$$R = \{r_1, r_2, \ldots, r_K\}$$

with the parameters defined as follows: each resource $r_k = (\mathit{class}_k, \mathit{scope}_k)$, where $\mathit{class}_k$ is the kernel primitive class (e.g., semaphore, mutex, message queue) and $\mathit{scope}_k \in \{\mathrm{private}, \mathrm{global}\}$ is its sharing scope.
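For readers who prefer code to notation, the model maps directly onto simple data structures. The following Python sketch is one possible transcription; the field names are our own, chosen to mirror the tuples above, and are not the framework's exact definitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class SegType(Enum):
    NS = "non-synchronization"   # computation workload
    SY = "synchronization"       # OS-level resource access

class AccMode(Enum):
    PAR = "parallel-access"      # bound to a core-private resource
    CON = "contended-access"     # bound to a globally shared resource

@dataclass
class Resource:
    klass: str                   # kernel primitive class, e.g., "semaphore"
    scope: str                   # "private" or "global"

@dataclass
class Segment:
    seg_type: SegType
    length_us: int               # execution-time budget len_ij
    acc: Optional[AccMode] = None     # defined only for SY segments
    res: Optional[Resource] = None    # defined only for SY segments

@dataclass
class Task:
    wcet_us: int                 # C_i
    period_us: int               # T_i
    deadline_us: int             # D_i
    segments: list[Segment] = field(default_factory=list)
```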
3.3. Task-Set Generator
The task-set model effectively captures both common realistic embedded-task characteristics and the multi-core contention model. In this section, we design a code generator that produces executable C programs from this model. The workflow comprises four stages:
- (1)
User input
The generator is configured by the following user inputs:
Cores and tasks (M, N): number of active cores M and total number of tasks N.
Segment cap per task ($n_{max}$): an upper bound on the number of segments per task; for each task $\tau_i$, the segment count satisfies $n_i \le n_{max}$.
WCET bounds ($C_{min}$, $C_{max}$): lower/upper bounds for sampling task WCETs from a user-specified distribution with support $[C_{min}, C_{max}]$.
Synchronization share ($\alpha \in [0, 1]$): controls the proportion of synchronization (SY) within each task as an expected share of total segment time.
Access-mode mix within SY ($\beta \in [0, 1]$): controls the ratio between contended (global-resource) and parallel (core-private) APIs inside SY segments. We interpret $\beta$ as the expected fraction instantiated as contended (CON), with $1 - \beta$ instantiated as parallel (PAR).
Remark. Parameters $\alpha$ and $\beta$ jointly shape OS-level contention: $\alpha$ sets how much of a task is synchronization-driven, while $\beta$ sets the CON:PAR mix within those SY portions.
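As an illustration, a complete generator configuration could look as follows. This is a sketch; the key names are hypothetical and simply mirror the inputs listed above.

```python
generator_config = {
    "num_cores": 8,                   # M: number of active cores
    "num_tasks": 16,                  # N: total number of tasks
    "max_segments": 6,                # n_max: segment cap per task
    "wcet_bounds_us": (500, 5000),    # [C_min, C_max] for WCET sampling
    "wcet_distribution": "uniform",   # user-specified distribution
    "sync_share": 0.4,                # alpha: expected SY share per task
    "contended_ratio": 0.5,           # beta: expected CON fraction within SY
}
```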
- (2)
Task-set synthesis
Unlike utilization-driven generators (e.g., UUniFast and related optimizers), we currently focus on contention effects rather than schedulability frontiers. For each task $\tau_i$, we independently sample its worst-case execution time (WCET) $C_i$ from a user-specified distribution with support $[C_{min}, C_{max}]$, and partition $C_i$ into $n_i \le n_{max}$ segments with randomized lengths $\mathit{len}_{i,1}, \ldots, \mathit{len}_{i,n_i}$ such that $\sum_j \mathit{len}_{i,j} = C_i$, where the segment type (NS or SY) of each $s_{i,j}$ is constrained by $\alpha$. We then fix the timing parameters by

$$T_i = D_i = 10\,C_i$$

This “ten-times” heuristic, common in control-oriented embedded workloads where the nominal period substantially exceeds single-core compute time, standardizes load and suppresses utilization-induced schedulability pressure; as a result, observed differences are primarily attributable to inter-core resource contention (e.g., access modes and resource classes) rather than to the utilization distribution itself. In contention-heavy regimes, additional blocking, cache/memory interference, and lock serialization often dominate the baseline compute time and may push response times beyond the nominal budget. For completeness, the generator also provides an optional utilization-driven mode that allocates per-task utilization in a UUniFast-style manner; this mode is included for users who require it, although it is not yet evaluated in this study.
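The synthesis step can be summarized by the following Python sketch, reusing the data structures and configuration keys sketched earlier. It is an illustrative reimplementation under the stated assumptions, not the framework's exact code.

```python
import random

def synthesize_task(cfg: dict) -> Task:
    """Sample one task according to the contention-oriented model."""
    c_min, c_max = cfg["wcet_bounds_us"]
    wcet = random.randint(c_min, c_max)      # C_i from a uniform distribution

    n_segs = random.randint(1, cfg["max_segments"])
    # Randomized partition of the WCET into n_segs positive budgets.
    cuts = sorted(random.sample(range(1, wcet), n_segs - 1)) if n_segs > 1 else []
    bounds = [0] + cuts + [wcet]
    lengths = [b - a for a, b in zip(bounds, bounds[1:])]  # sums to wcet

    segments = []
    for length in lengths:
        # Segment type drawn with probability alpha (approximating the
        # expected SY share); access mode within SY drawn with probability beta.
        if random.random() < cfg["sync_share"]:
            acc = AccMode.CON if random.random() < cfg["contended_ratio"] else AccMode.PAR
            segments.append(Segment(SegType.SY, length, acc))
        else:
            segments.append(Segment(SegType.NS, length))

    # "Ten-times" heuristic: nominal period and deadline are 10x the WCET.
    return Task(wcet_us=wcet, period_us=10 * wcet,
                deadline_us=10 * wcet, segments=segments)
```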
- (3)
Segment realization using fragments
To instantiate the model as executable task sets, for each segment created in the previous step, the generator fills it with a list of concrete fragments drawn from two disjoint pools, according to its segment type. (i) Non-synchronization (NS) pool. We source compute kernels from TACLeBench [11] that emphasize arithmetic and algorithmic routines while avoiding data-dependent I/O and OS interactions. To support accurate time matching, we select compute kernels spanning multiple execution scales (via problem-size parameters). Each compute kernel is wrapped into a side-effect-free function with fixed interfaces and no external I/O, enabling standardized inlining into NS segments. (ii) Synchronization (SY) pool. Building on the analysis of common resource-access APIs in real-time embedded systems [4], we select some of the most representative synchronization/communication primitives (e.g., semaphores, mutexes, message queues) in the Linux kernel. For each primitive, we provide two implementations: a parallel variant bound to core-local (private) resources and a contended variant bound to globally shared resources.
Table 2 summarizes the current NS and SY fragment pools we implemented to demonstrate the generator. The generator includes a microbenchmark to calibrate these fragments. When running on a new system, every candidate fragment in the pools is automatically measured under single-core, interference-free conditions to obtain its baseline execution time. The measurement results are stored as a per-platform calibration map and subsequently used to match segment budgets.
For segment realization, the generator fills each segment’s time budget with a homogeneous set of fragments of the appropriate type. We employ a randomized packing heuristic: fragments are sampled from a size-stratified catalog to promote diversity, and selection proceeds until the cumulative measured cost matches the target duration within a small tolerance. Within SY segments, the fragment version (parallel vs. contended) is selected under the user-specified $\beta$ parameter, which constrains the PAR:CON mix, and the corresponding resource binding (private vs. global) follows from this choice. This pipeline yields the source code for each segment with concrete fragments filled according to the user parameters.
Figure 2 shows an example of an executable task with segments generated and fragments filled.
- (4)
Executable benchmark program generation
The synthesized task-set artifacts from the above pipeline are passed to a Python 3 script that renders a C project template using the Jinja2 engine. The template provides the minimal runtime skeleton required by the target system to compile and build the benchmark program into an executable binary.
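The rendering step may be as simple as the following sketch; the template file name and context layout are assumptions for illustration.

```python
from jinja2 import Environment, FileSystemLoader

def render_project(tasks: list, out_path: str) -> None:
    """Render the C source of a benchmark program from a Jinja2 template."""
    env = Environment(loader=FileSystemLoader("templates"))
    template = env.get_template("taskset_main.c.j2")  # hypothetical template
    source = template.render(tasks=tasks, num_tasks=len(tasks))
    with open(out_path, "w") as f:
        f.write(source)
```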
3.4. Benchmark Framework
In this section, we present the benchmark framework, which leverages the task-set generator above to evaluate across the parameter space and enable a comprehensive system-level perspective. With the generator, we can instantiate task sets for any specified contention configuration $(M, \alpha, \beta)$, namely the number of active cores $M$, the synchronization share within tasks $\alpha$, and the access-mode mix within synchronization $\beta$. Our framework can iterate over the full grid to measure the evaluation metrics explained below for real-time performance analysis.
Figure 3 presents the overall architecture of the benchmark framework. The workflow begins by collecting user inputs, including the chosen evaluation mode and the number of active cores M. Currently, two modes are supported as follows.
Mode 1: systematic profiling. We profile task-set behavior over the parameter space $(\alpha, \beta)$, where $\alpha$ is the synchronization share and $\beta$ is the contended-access ratio. Given user-specified grid resolutions for $\alpha$ and $\beta$, the tool instantiates a predetermined number of task sets at every grid point and aggregates performance metrics per point. Python drivers invoke the task-set generator to instantiate concrete programs and use build-and-run scripts to compile and execute them. The tool calculates and visualizes global summaries and preserves per-run artifacts for inspection and reproduction.
Mode 2: randomized stress testing. This mode includes two variants: (2a) fully randomized per task set; and (2b) fixed M with randomized $\alpha$ and $\beta$. As in Mode 1, the pipeline is implemented in Python and integrates code generation, compilation, and execution, with metrics and raw outputs retained as separate files.
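Conceptually, the Mode 1 driver is a nested sweep over the grid, as in the sketch below; `generate_taskset`, `build`, `run_and_measure`, and `aggregate` are hypothetical helpers standing in for the generator and the build-and-run scripts.

```python
import itertools

def profile_grid(m: int, resolution: float = 0.1, sets_per_point: int = 10):
    """Mode 1: sweep the (alpha, beta) grid and aggregate metrics per point."""
    steps = [round(i * resolution, 2) for i in range(int(1 / resolution) + 1)]
    results = {}
    for alpha, beta in itertools.product(steps, steps):
        metrics = []
        for _ in range(sets_per_point):
            src = generate_taskset(m, alpha, beta)    # invoke the generator
            binary = build(src)                       # compile the C project
            metrics.append(run_and_measure(binary))   # delay/miss statistics
        results[(alpha, beta)] = aggregate(metrics)   # per-point summary
    return results
```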
Performance metrics: Because the task-set design adopts $T_i = D_i = 10\,C_i$, we complement the standard deadline-miss rate with a delay ratio defined per job as

$$\mathit{delay\ ratio}_j = \frac{R_j}{C_i}$$

where $R_j$ is the measured response time of job $j$ of task $\tau_i$, so that a value of 1.0 means the job completes exactly within its nominal WCET. In addition, we define the deadline miss ratio at each grid point by aggregating all jobs across all instantiated task sets and then taking the fraction:

$$\mathit{DMR}(g) = \frac{\sum_{s \in S_g} \left| \{\, j \in J_s : R_j > D_j \,\} \right|}{\sum_{s \in S_g} |J_s|}$$

where $S_g$ is the collection of task-set runs at grid point $g$, $J_s$ is the multiset of all jobs released during run $s$, and $R_j$ and $D_j$ denote the response time and relative deadline of job $j$. A “job” denotes one instance of a periodic task over a period (one release–deadline pair).
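Given per-job records from the runs, both metrics reduce to a few lines of aggregation, as in this sketch; the record format is an assumption.

```python
def grid_point_metrics(runs: list[list[dict]]) -> tuple[float, float]:
    """Compute mean delay ratio and deadline miss ratio for one grid point.

    runs: one list of job records per task-set run; each record holds the
    measured response time and the task's nominal WCET and deadline (us).
    """
    jobs = [job for run in runs for job in run]      # union of all J_s
    delay_ratios = [j["response_us"] / j["wcet_us"] for j in jobs]
    missed = sum(1 for j in jobs if j["response_us"] > j["deadline_us"])
    return sum(delay_ratios) / len(jobs), missed / len(jobs)
```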
The variation of the delay ratio and deadline miss ratio over $(M, \alpha, \beta)$ yields the system’s multi-core performance signature. In Mode 2, we report (i) the mean delay ratio and deadline miss ratio across all randomized task sets and (ii) their per-task-set distributions, which together summarize system capability under randomized workloads. The tool records all statistics and raw outputs (e.g., summaries, per-run logs) into standalone files.
The entire design and implementation of the benchmark framework are based on hardware-independent abstractions of the proposed models, which ensures a high degree of portability across platforms. The benchmark tool itself is fully implemented in Python: given user-specified parameters, it generates the required contention grid points defined by $(M, \alpha, \beta)$, invokes the task-set generator to produce the corresponding number of task sets, and subsequently compiles and executes them to collect the measurement data. This allows the user to easily reproduce benchmarking across different environments for comparative analysis.
For Mode 1, the only required inputs are the grid resolution, the number of active cores to be tested, and the number of task sets to instantiate per grid point. For Mode 2, the user only specifies whether the number of active cores is fixed and the total number of task sets to be generated. The benchmark then creates a task-set directory, records execution results for each run, and aggregates global metrics. The summarized data files (grid-based metrics for Mode 1 and overall task-set statistics for Mode 2) are stored as CSV files in the working directory. In addition, the framework provides visualization utilities: 3D surface plots are used to illustrate multi-core performance signatures, while histograms depict the distribution of metrics across task sets.
In this way, we have constructed a benchmark framework grounded in a task-set model that integrates both realistic embedded-task characteristics and a multi-core contention model. Mode 1 of our framework mainly enables performance characterization of a specific platform, by profiling the overall behavior of a target system under varied contention scenarios. Mode 2, in contrast, randomly generates many task sets under constrained conditions, which can be used together with Mode 1 to facilitate cross-platform comparisons and highlight performance differences among these platforms. We believe that this framework not only fills the gap in multi-core embedded evaluation by explicitly accounting for inter-core contention and examining the performance characteristics of Linux, but also provides useful support for future research and development aimed at understanding and improving embedded systems.
4. Results
4.1. Evaluation Experiments
To validate the design of the proposed benchmark methodology and investigate the open question regarding multi-core capabilities across different Linux configurations, we conduct comparative experiments on the same hardware under multiple kernel settings. The environment is summarized in Table 3.
The LX2160 is a 64-bit Armv8-based SoC for embedded systems released by NXP, integrating 16 Cortex-A72 cores with a maximum clock frequency of 2.2 GHz. Its state-of-the-art shared cache hierarchy and high core count make it a representative platform for evaluating the impact of Linux kernel design under various contention scenarios.
Because the internal lock design and synchronization mechanisms of Linux are very complex, characterizing multi-core behaviors across different kernel configurations is a central task for understanding and optimizing Linux-based platforms. In this study, we first measure Linux 5.15 PREEMPT (i.e., the Low-Latency Desktop preemption model) as the baseline, which is the official kernel of the LX2160 SDK and the final long-term version of the 5.x series. We then compare the baseline with Linux 6.12 PREEMPT, the latest long-term kernel, to assess the impact of kernel evolution from 5.x to 6.x on multi-core performance. In addition, Linux 6.12 PREEMPT_RT (i.e., the Fully Preemptible Kernel) is also evaluated to examine the practical effectiveness of the real-time patches.
To ensure consistency across all experiments, the LX2160 platform is fixed to performance mode at 2 GHz, and all code is compiled at the -O2 optimization level. Task priorities and the scheduling policy are uniformly set to SCHED_FIFO. Tasks within each task set are bound to cores 0–15 in a round-robin manner, according to their generation order, using the affinity function. Other aspects, such as timing measurement and code warm-up, are also standardized. In this implementation, we deliberately focus on reproducing contention scenarios, so configurations related to IRQ handling and similar kernel parameters are retained at their default settings. This choice is close to typical deployment environments, thereby improving the practical relevance of our evaluation.
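For reference, the core binding and scheduling setup described above can be reproduced with standard Linux system calls. The following Python sketch mirrors the policy; the generated C code uses the equivalent pthread/sched interfaces, and the priority value here is an assumption. Note that setting SCHED_FIFO requires appropriate privileges.

```python
import os

def pin_tasks(pids: list[int], num_cores: int = 16, priority: int = 50) -> None:
    """Bind tasks to cores 0..num_cores-1 round-robin and set SCHED_FIFO."""
    for idx, pid in enumerate(pids):
        os.sched_setaffinity(pid, {idx % num_cores})   # round-robin binding
        os.sched_setscheduler(pid, os.SCHED_FIFO,
                              os.sched_param(priority))
```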
We use the microbenchmarks in the generator to measure the baseline execution time of all fragments. To eliminate hardware-related interference and avoid over-optimistic values, we explicitly clear caches, TLBs, and branch predictors after each execution. The results for each fragment (average values over 1000 iterations) are summarized in Table 4.
After establishing the baseline, our experiments employ both Mode 1 and Mode 2 provided by the benchmark framework as follows.
Mode 1: We generate 16 three-dimensional plots, each corresponding to a different number of active cores, showing how the delay ratio and deadline miss ratio vary across grid points defined by $\alpha$ and $\beta$. The grid resolution is set to 0.1, and 10 task sets are generated per grid point to ensure that randomness does not bias the results. Each task set is forced to execute for one second before termination, guaranteeing that every task runs at least 200 times and thereby providing sufficient diversity in contention.
Mode 2: We evaluate task-set distributions under randomized conditions. In Mode 2a, 3000 task sets are generated for core counts of 4, 8, 12, and 16, allowing us to examine the distributions of delay ratios and deadline miss ratios under these configurations. In Mode 2b, 5000 task sets are generated with all variables randomized. Histograms are then used to characterize the distributions of delay ratios and deadline miss ratios in both cases.
After task sets are generated by the framework, we use the identical benchmark programs to evaluate Linux 5.15 PREEMPT, Linux 6.12 PREEMPT and Linux 6.12 PREEMPT_RT to ensure a fair cross-version comparison.
4.2. Linux 5.15 Performance Characteristics
In this section, we first present the results of systematic profiling (i.e., Mode 1) measured under Linux 5.15 PREEMPT to show how real-time performance is impacted by the task sets generated from the $(\alpha, \beta)$ parameters. Figure 4 compares the average delay ratio under different M (active cores) using three-dimensional plots, and the observations are as follows.
Observation 1: The benchmark framework can correctly generate tasks from the specified model parameters. In the single-core configuration (M = 1), since no inter-core contention is expected, the overall timing behavior should purely correspond to the generated fragments. The workloads exhibit an average delay ratio close to 1.0, which means the execution times of the task sets are consistent with the target WCET parameter. Although it is hardly noticeable in the figure, some task sets actually have a delay ratio around 0.9 (i.e., 10% faster than WCET). This is expected, since the WCETs of baseline fragments are measured in a cold state while the real tasks can benefit from cache hits.
Observation 2: Linux shows excellent scalability when the number of active cores is small. In the quad-core configuration (M = 4), the average delay ratio remains close to the single-core baseline, which indicates that the performance of the target system scales efficiently with minimal overhead. The maximum delay ratio does not exceed 1.2, even under the most extreme contended condition ($\alpha = \beta = 1$). This result suggests that the current fine-grained locking design of the Linux kernel can effectively mitigate the impact of inter-core contention, at least for the many budget embedded systems with four or fewer cores, which implies a possible advantage over RTOSs that use coarse-grained locks.
Observation 3: Performance begins to suffer under heavy synchronization workloads as the number of cores further increases. In the 8-core configuration (M = 8), the delay is still negligible in most cases. However, unlike the quad-core setting, the ratio rises sharply to about 10 under synchronization-intensive conditions ($\alpha$ approaching 1). This indicates that, even though fine-grained locking is employed, other factors (e.g., context switching, memory bus bandwidth, cache coherency) start to dominate the execution-time delay when tasks on multiple cores frequently invoke system calls for synchronization. Even so, the overall scalability remains acceptable, since many systems do not require a large amount of synchronization.
Observation 4: Linux encounters severe performance bottlenecks in many-core environments. In the 16-core configuration (M = 16), significant delays emerge even under moderate contention levels. This suggests that inter-core interference, shared-resource contention, and OS-level synchronization overhead collectively limit scalability beyond a certain point. The system’s efficiency declines rapidly in the many-core regime, revealing inherent bottlenecks in the operating system and hardware components.
In addition to delays, the average deadline miss ratio is also a key metric for evaluating real-time embedded systems; it is shown in Figure 5 for comparison.
Observation 5: Techniques for mitigating cross-core interference are important for supporting real-time systems. The tendency of the results is very similar to that of the average delay ratio, in which performance problems begin to appear under extremely contended scenarios with 8 cores. However, in high-end embedded systems with many cores, most cores are typically used to process computing-intensive tasks, while the share of critical tasks with firm deadlines is relatively small. Therefore, it is common practice to ensure real-time performance by statically dedicating a few cores to running only those critical tasks. If we allocate up to 4 cores for them to minimize deadline misses, the computing-intensive workloads can run on at least 12 cores of our target system, which is already enough to cause performance degradation according to Observations 3 and 4. Under those circumstances, techniques that prevent such degradation from interfering with the critical tasks become key to meeting deadline requirements, as also noted in recent studies [16].
Figure 6 summarizes the evaluation results measured in Mode 2 (i.e., randomly picking parameters to generate task sets) by plotting the distributions of the delay ratio and deadline miss ratio. For the delay ratio, we plot the percentiles to emphasize how the tendency changes. For the deadline miss ratio, cumulative values are used since they are important to the design of real-time systems.
Observation 6: The benchmark framework provides an intuitive way to visualize the overall performance characteristics of the target system. While Mode 1 supports detailed analysis by producing 3D plots for each core count, it also has two main limitations: (1) task sets in which all tasks share the same $\alpha$ and $\beta$ parameters are rare in real applications; (2) it is hard to summarize the overall performance from many individual graphs. On the other hand, Mode 2 enables large-scale randomized experiments across various parameters and can present the results in a single figure. It not only provides a comprehensive understanding of the current system under diverse workload conditions but is also useful for comparing overall performance trends on different platforms.
4.3. Comparison of Different Linux Configurations
In this section, we focus on evaluating the target system under three kernel configurations: Linux 5.15 PREEMPT, Linux 6.12 PREEMPT and Linux 6.12 PREEMPT_RT. The purpose is to examine the practical multi-core performance changes caused by version upgrades and preemption modes, which has rarely been explored in prior studies to the best of our knowledge.
We first compare Linux 5.15 PREEMPT with Linux 6.12 PREEMPT to examine the impact of the version upgrade on multi-core performance, using heatmaps to capture the differences across grid nodes, as shown in Figure 7.
Observation 7: Linux 6.12 shows a minor improvement in many-core scenarios compared to Linux 5.15. When the number of cores is relatively small, contention remains limited and the two kernels exhibit nearly identical behavior, with Linux 6.12 PREEMPT performing slightly worse. This minor degradation may come from the increased overhead associated with the more complex design of the newer kernel. As the core count increases, however, Linux 6.12 PREEMPT shows a modest advantage, suggesting that its multi-core mechanisms are more effective.
We then compare Linux 6.12 PREEMPT with Linux 6.12 PREEMPT_RT to quantify the effect of the popular real-time patches, with the results shown in Figure 8.
Observation 8: The PREEMPT_RT kernel remarkably outperforms PREEMPT under contention. Linux 6.12 PREEMPT_RT consistently beats Linux 6.12 PREEMPT across most contention scenarios, demonstrating substantially lower delays. However, in the extreme 16-core case where contention intensity reaches its maximum, the performance of PREEMPT_RT is slightly worse than that of PREEMPT. This degradation is likely due to the overhead introduced by the real-time patches becoming dominant once contention is fully saturated. Nevertheless, taken as a whole, PREEMPT_RT exhibits significantly better performance in multi-core contention scenarios and provides clear advantages over its non-RT counterpart.
Finally, we use Mode 2 to generate 5000 task sets in a fully randomized manner and employ these identical task sets to compare the overall performance of the three configurations. Figure 9 presents the distributions of deadline miss ratio with zero values (i.e., task executions that never miss a deadline) excluded.
Observation 9: The PREEMPT_RT kernel effectively reduces overall deadline misses in most cases. Across the 5000 generated task sets, we observe that in the vast majority of cases, specifically where the deadline miss ratio is below 95%, the chance of a deadline miss is noticeably lower with PREEMPT_RT than with the other two configurations. This provides further evidence of the superior performance of PREEMPT_RT under non-extreme conditions. However, in the extreme tail of the distribution, where the ratio exceeds 95%, PREEMPT_RT exhibits a higher frequency of deadline misses. This finding aligns with the earlier observations, indicating that under saturated contention the intrinsic overhead of the real-time patches may dominate, leading to increased delays.
4.4. Additional Experiments
As described in Section 3.3, the default deadline parameter is set to ten times the WCET (i.e., 10% utilization). For reference, we have also compared the deadline miss ratio of the three kernel configurations using deadline parameters from 2× to 10× under Mode 2b, with all 16 cores active and 5000 task sets generated for each parameter level. According to the results summarized in Figure 10, all configurations show a significant increase in deadline miss ratio as the utilization level rises. When the deadline parameter is 2× WCET (i.e., 50% utilization), even PREEMPT_RT misses the deadlines of 35% of all jobs. This indicates that, at least in our 16-core environment, it is difficult for current Linux to achieve reasonable real-time performance if all cores run tasks with relatively tight deadline requirements. Meanwhile, under the 10× deadline setting, all three configurations show miss rates below 4%, which should be acceptable for many soft real-time use cases. Therefore, we believe 10× is a good default value for generating useful task sets when evaluating systems with a large number of cores.
The purpose of our benchmark tool is to generate task sets that can trigger performance bottlenecks due to various hardware and software factors; the tool itself does not integrate functionality to analyze such factors. However, the user can always profile these task sets with well-established tools such as the perf command for detailed analysis. To demonstrate this approach, we used perf to record software events and hardware performance counters for over 2000 task sets. Figure 11 plots three events (branch-misses, context-switches, and cache-misses), each showing a different correlation pattern with the delay ratio. The results indicate that branch-miss events have almost no impact on performance, while cache-miss events show a clear logarithmic-like correlation. As for context-switch events, they can introduce a medium amount of delay but are less relevant to the task sets with very high delay. This experiment confirms that our benchmark tool can easily cooperate with existing profilers to analyze the factors causing performance bottlenecks.
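Such integration requires no changes to the framework; a thin Python wrapper around perf stat suffices, as sketched below. The event list mirrors the experiment, while the output parsing is simplified for illustration.

```python
import subprocess

def perf_events(binary: str) -> dict[str, int]:
    """Run a generated task-set binary under perf stat and parse counters."""
    events = "branch-misses,context-switches,cache-misses"
    proc = subprocess.run(
        ["perf", "stat", "-e", events, "-x", ",", binary],
        capture_output=True, text=True, check=True)
    counters = {}
    # With -x ',' perf stat writes CSV lines like "<value>,,<event>,..."
    # to stderr; keep only lines whose first field is a plain counter value.
    for line in proc.stderr.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and fields[0].isdigit():
            counters[fields[2]] = int(fields[0])
    return counters
```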
5. Discussion
This study has proposed a benchmark methodology and framework for embedded systems that integrates realistic task-set modeling with an explicit multi-core contention representation. Unlike conventional benchmark suites, which mainly provide fixed sequential or application-level workloads, our approach provides a controllable means to reproduce and analyze inter-core contention behaviors under realistic scenarios, and is especially suitable for soft real-time systems such as compute-dominated IoT devices with mixed-criticality and core-shielding designs. By embedding OS-level synchronization primitives into parameterized task sets, the framework captures the combined effects of computation and contention, bridging the gap between theoretical models and executable performance-evaluation workloads.
Through experiments on the 16-core NXP LX2160 SoC, we applied this benchmark to assess the scalability and real-time characteristics of three representative Linux configurations (Linux 5.15 PREEMPT, Linux 6.12 PREEMPT, and Linux 6.12 PREEMPT_RT). The results demonstrated several key insights. First, the benchmark reliably created task sets whose performance trends are consistent with the designed parameters, showing the high reproducibility of the random generator. Second, Linux exhibited strong scalability up to moderate core counts but suffered increasing performance degradation in heavy synchronization scenarios, particularly in many-core configurations. Finally, PREEMPT_RT remarkably reduced delays and deadline misses compared to the non-RT kernel configurations, confirming its effectiveness in mitigating interference and contention, though at the cost of slight overhead under saturated workloads.
An additional strength of the framework lies in its portability and extensibility. Because the task model itself is platform-agnostic by design, it can easily be migrated to other environments by substituting the OS-specific fragments and the system template. This flexibility enables not only cross-version Linux comparisons but also potential extensions to RTOS-based platforms. For a POSIX-like RTOS, the framework should run after simply replacing the Linux-dependent service calls in the fragments with equivalent ones. For RTOSs with other API specifications, porting additionally requires implementing interfaces for task creation, multi-core scheduler parameters, resource allocation, and timer control. Such cross-OS evaluations can provide further valuable insights into the impact of factors like locking granularity, scheduling policies, and heterogeneous architectures on real-time performance.
Nevertheless, several limitations remain within this work. The current task-set model focuses on OS-level contention and does not directly incorporate detailed micro-architectural factors such as cache coherence, bus arbitration, or memory controller latency. Moreover, while the evaluation across Linux versions has uncovered performance bottlenecks, root-cause analysis (e.g., whether the cause lies in kernel design or in hardware limits) is still required to address them. Finally, the number and diversity of benchmark fragments should also be expanded to further enhance the generality of the evaluation results.
Future work will address the limitations above in the following directions. We first plan to use performance counters to analyze the relationship between benchmark workloads and hardware events, and to refine the task-set model and generator to integrate these hardware factors. With more performance metrics added to the framework, we can further improve the analysis and visualization tools to assist in rapidly identifying the root causes of performance issues. By collecting and characterizing trace data from industrial use cases, it is possible to expand the fragment pools with more representative workloads. To enhance the usefulness of our framework, we will also port it to RTOS-based platforms, since they are preferred over Linux in many energy-efficient and mission-critical embedded systems.