1. Introduction
Constructing bifurcation diagrams for various dynamic systems is of great practical importance, as it allows one to identify different regimes depending on changes in key parameters [1,2,3]. One such dynamic system, in our opinion, is the Selkov oscillator, owing to its importance in applied problems, for example, in biology for describing glycolytic reactions [4,5] or in seismology for describing microseisms [6]. From a mathematical perspective, the Selkov oscillator is a dynamic system consisting of two nonlinear first-order ordinary differential equations that describes various oscillatory regimes, including self-oscillations.
A generalization of the Selkov oscillator to the case of memory effects is the fractional Selkov oscillator [7]. Memory effects here mean that the current state of the system depends on its previous states, i.e., on its prehistory [8,9,10]. These effects can be described within the theory of fractional calculus using fractional derivatives [11,12,13]. In this case, the Selkov fractional oscillator is a nonlinear system of two ordinary differential equations of fractional orders. The introduction of fractional derivatives also entails introducing an additional parameter, a characteristic time scale, to match the dimensions of the right- and left-hand sides of the model equations, which is important in the study of dynamic regimes [14].
In previous work [7,15], a quantitative and qualitative analysis of the Selkov fractional oscillator was conducted for cases where the orders of the fractional derivatives were constant and the characteristic time scale was chosen to be unity. There, the system’s equilibrium points were studied, the Adams–Bashforth–Moulton numerical algorithm was implemented to construct oscillograms and phase trajectories, and regular and chaotic regimes were investigated using the maximum Lyapunov exponents [16].
Further development of the Selkov fractional oscillator involved introducing derivatives of fractional variable order [17,18,19], which also resulted in non-constant coefficients in the model equations [20,21,22]. Quantitative and qualitative analyses were conducted using an adapted Adams–Bashforth–Moulton algorithm, various oscillatory modes were investigated, and 2D and 3D bifurcation diagrams were constructed. These algorithms were implemented in the ABMSelkovFracSim 2.0 software package using the Python programming language and the PyCharm 2025.2 environment [23,24].
Constructing bifurcation diagrams is a computationally intensive task, as it requires repeatedly solving a system of fractional differential equations numerically, which leads to significant computation time [21,22]. Therefore, a parallel version of the bifurcation diagram calculation, depending on the characteristic time scale, was developed using Python tools. However, the efficiency of this parallel algorithm as a function of the number of CPU worker processes has not been studied. In this paper, we address this gap and examine the algorithm's efficiency with respect to the available number of CPU processes, similar to the approach in [25].
The novelty and contribution of this work are as follows:
We present a reproducible performance analysis (benchmarking) of a parallel MapReduce algorithm applied to the computationally demanding task of constructing bifurcation diagrams for the Selkov fractional oscillator (SFO), a fractional-order system with memory. The analysis uses the TAECO metric set (Time, Acceleration, Efficiency, Cost, Cost Optimality Index).
We provide practical guidelines for configuring parallel computations for similar parametric studies in fractional dynamics, including determining the optimal number of worker processes (found to be equal to the number of physical CPU cores in our test system).
The parallel algorithm is integrated into the user-friendly ABMSelkovFracSim 2.0 software package, making high-performance bifurcation analysis accessible to researchers without expertise in parallel programming.
This study highlights the limitations and trade-offs of the applied parallelization strategy, offering insights for future developments, such as hybrid CPU-GPU implementations.
Thus, this work contributes to computational practice and software engineering for scientific computing, providing a validated tool and methodology for accelerating the analysis of complex fractional dynamical systems [26,27,28].
The remainder of this article is structured as follows:
Section 2 introduces the necessary background and key concepts.
Section 3 presents the problem statement.
Section 4 details the Adams–Bashforth–Moulton numerical algorithm for solving the SFO system.
Section 5 provides an overview of the ABMSelkovFracSim 2.0 software package.
Section 6 describes the parallel algorithm for constructing bifurcation diagrams, including its detailed configuration and practical recommendations.
Section 7 presents a performance analysis of the parallel algorithm with respect to the number of CPU worker processes. Finally,
Section 8 concludes this paper.
2. Preliminaries
Definition 1. The Gerasimov–Caputo fractional derivative of variable order $q(t)$ of a function $x(t)$ has the following form [19]:
$$\partial_{0t}^{q(t)}x(t)=\frac{1}{\Gamma(1-q(t))}\int_{0}^{t}\frac{\dot{x}(s)\,ds}{(t-s)^{q(t)}},\quad 0<q(t)<1,\quad t\in[0,T], \quad (1)$$
where $x(t)$ is a function from the class $C^{1}[0,T]$, and $\Gamma(\cdot)$ is the gamma function.

Remark 1. In the case where the variable order is constant, we arrive at the Gerasimov–Caputo fractional derivative, which has been studied quite well [29,30].

Remark 2. Here, we will not dwell on the properties of the fractional derivative (1); they can be studied in the review articles [17,19] and the literature cited therein.

Definition 2. A bifurcation diagram is a graphical representation of changes in the structure of the solutions of a dynamic system as its parameters change. It shows how the stable and unstable states of the system change depending on the values of the parameters.
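To make Definition 1 concrete, the following sketch numerically evaluates a variable-order Gerasimov–Caputo derivative on a uniform grid using the standard L1 (piecewise-linear) approximation, with the order frozen at the current node. This is an illustration only, not part of the ABMSelkovFracSim 2.0 code: the function names are ours, and the test case (a constant order applied to x(t) = t) is chosen solely because its derivative is known in closed form.

```python
import math
import numpy as np

def gc_derivative_l1(x, h, q_of_t):
    """L1 approximation of the Gerasimov-Caputo derivative of variable order
    q(t) in (0, 1) on the uniform grid t_j = j*h.
    x       : samples x(t_0), ..., x(t_N)
    q_of_t  : callable returning the order at a given time
    Returns d with d[n] ~ D^{q(t_n)} x(t_n); d[0] = 0 by convention."""
    d = np.zeros(len(x))
    dx = np.diff(x)                                  # x_{j+1} - x_j
    for n in range(1, len(x)):
        q = q_of_t(n * h)                            # order frozen at the current node
        j = np.arange(n)
        w = (n - j) ** (1.0 - q) - (n - j - 1) ** (1.0 - q)
        d[n] = h ** (-q) / math.gamma(2.0 - q) * w.dot(dx[:n])
    return d

# Check: for x(t) = t and constant q = 0.5 the derivative is 2*sqrt(t/pi),
# and the L1 rule is exact for a linear function.
h, N = 1e-3, 1000
t = np.linspace(0.0, N * h, N + 1)
num = gc_derivative_l1(t, h, lambda s: 0.5)
print(abs(num[-1] - 2.0 * np.sqrt(t[-1] / np.pi)))   # ~ machine precision
```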
3. Statement of the Problem
Consider the following dynamic system:
where
are the unknown solution functions;
,
,
are functions from the class
;
is a parameter with time dimension;
are given constants;
is the current process time;
is the simulation time; and
are positive constants responsible for the initial conditions.
The fractional derivative operators of variable orders in the dynamic system (2) are understood in the sense of Gerasimov–Caputo (1). Following the work in [21], we choose the following four types of functions for the variable orders:
- 1.
Constant fractional orders: where the fractional orders take constant values. Note that when these values are equal to one, we arrive at the classical Selkov oscillator proposed in [4].
- 2.
Cosine dependence: where the parameters are, respectively, the initial order values, the oscillation amplitudes, the oscillation frequencies, and the initial phases.
- 3.
Exponential dependence: where the parameters are, respectively, the limiting values as t grows, the initial deviations, and the rates of exponential decay.
- 4.
Linear dependence: where the parameters are, respectively, the initial values at t = 0, the coefficients of linear decrease, and the lower boundary imposed for computational stability.
Definition 3. The dynamic system (2) will be called a Selkov fractional oscillator with variable coefficients and memory, or simply a Selkov fractional oscillator (SFO).

Remark 3. It should be noted that in [31,32], a special case of the SFO (2) was investigated for a unit characteristic time scale. However, the characteristic time scale, as further studies of bifurcation diagrams [20,21,22] showed, plays an important role in the dynamics of the SFO. Therefore, as test examples, we will use the calculation of bifurcation diagrams depending on the values of this parameter.
4. Adams–Bashforth–Moulton Method
To study the SFO (2), we use the Adams–Bashforth–Moulton numerical method (ABM) from the family of predictor–corrector methods. The ABM has been studied and discussed in detail in [33,34,35,36]. We adapt this method to solve the SFO (2). To carry this out, on a uniform grid of N points with a given step we introduce grid functions, which are determined by the Adams–Bashforth formula (predictor):
For the corrector (Adams–Moulton formula), we obtain the following:
where the weight coefficients in (3) are determined by the following formula:
Formulas (3) and (4) constitute a modified ABM method adapted for the fractional-order SFO system with variable exponents and coefficients. A key advantage of the ABM method is that it avoids solving nonlinear systems of equations at each step, thereby reducing the computational cost. However, the primary computational burden arises from the need to store the entire history (memory) of the solution vectors, a characteristic feature of fractional integro-differential methods.
Remark 4. The presented ABM method is explicit in its implementation, despite the use of the implicit Adams–Moulton Formula (4). Therefore, the ABM method is conditionally stable, like most explicit methods. It should also be noted that the order of accuracy of the ABM method is higher than that of the nonlocal explicit finite-difference scheme. The properties of the numerical ABM method (3) and (4), as well as convergence and stability issues, can be studied in [20,21,22].

Based on Remark 4, the variable-order functions in the test examples for this article were chosen to be monotonically decreasing or not rapidly changing on the interval (0, 1), and the sampling step was chosen small enough to prevent the solution from exhibiting unphysical oscillations or error growth.
The numerical ABM algorithm (3) and (4) was implemented in the ABMSelkovFracSim 2.0 software package for the four types of variable-order functions. We will refer to these variants as ABMSelkovFracConst, ABMSelkovFracCos, ABMSelkovFracExp, and ABMSelkovFracLine.
5. ABMSelkovFracSim 2.0 Software Package
The authors of [22] provide a detailed description of the ABMSelkovFracSim software package. Here, we focus on the module for calculating bifurcation diagrams (the Bifurcation Analysis mode, Figure 1).
Figure 1 shows a screenshot of the main window in Bifurcation Analysis mode. The interface allows the user to input values for the key SFO parameters, select one of the four calculation methods based on the types of variable-order functions, and define the range and step size for the key parameter. Additionally, the user can specify the number of CPU worker processes to be used for the efficient parallel computation of the bifurcation diagram.
The left side of the ABMSelkovFracSim 2.0 interface in the Bifurcation Analysis mode shows visualizations of the calculation results in the form of bifurcation diagrams and graphs of the variable-order functions. In Figure 1, the bifurcation diagram shows two ranges of parameter values within which different dynamic SFO modes exist [21].
Remark 5. It should be noted that the following useful information is displayed in the interface for the user of the ABMSelkovFracSim 2.0 software package: the time taken to calculate the bifurcation diagram, the number of points processed during the simulation, a progress bar for the algorithm’s operation, and any errors that occurred during the calculation.
Next, we will discuss in more detail the implementation of the parallel part of the bifurcation diagram calculation algorithm in the ABMSelkovFracSim 2.0 software.
6. Parallel Implementation of the Bifurcation Diagram Calculation Algorithm
Parallel implementation of algorithms is an important and relevant task for reducing their execution time. There are many approaches and technologies for parallelizing algorithms; these can be studied, for example, in [37,38]. In this section, we describe a parallelization method based on the built-in libraries of the Python 3.13 programming language [23].
Because the numerical ABM algorithm (3) and (4) cannot be parallelized at the level of individual time steps due to data dependencies between the solution grid functions, we adopt a parametric parallelization approach. Specifically, we parallelize the multiple independent computations required for each discrete value of the parameter within a specified interval. The operating principle of our parallel algorithm thus consists of the following steps: a separate independent task is created for each value of the parameter; the tasks are executed in parallel on multiple processor cores; and the results are collected after all calculations are completed. This is the classic “MapReduce” approach, where Map is the distribution of calculations across processes and Reduce is the collection and merging of the results. This approach is well suited to parametric studies, where each calculation is independent of the others.
To implement this approach, we use the native Python multiprocessing facilities from the standard library, specifically concurrent.futures.ProcessPoolExecutor (Algorithm 1).
| Algorithm 1 Bifurcation analysis with parallel computing |
|---|
| 1: Step 1: The user runs the bifurcation analysis |
| 2: Step 2: Create a list of parameter values |
| 3: Step 3: for i = 1 to n do |
| 4:   Create a task to evaluate the solution for the i-th parameter value |
| 5: end for |
| 6: Step 4: Initialize ProcessPoolExecutor with N processes |
| 7: Step 5: Assign tasks to processes |
| 8: Step 6: ▷ Parallel execution |
| 9: for j = 1 to N in parallel do |
| 10:   Independently compute results for assigned tasks |
| 11: end for |
| 12: Step 7: Collect all results from processes |
| 13: Step 8: Combine results into a single data structure |
| 14: Step 9: Build a common bifurcation graph |
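A minimal Python sketch of Algorithm 1 with ProcessPoolExecutor might look as follows; the worker function, its arguments, and the parameter range are placeholders for illustration, not the actual ABMSelkovFracSim 2.0 code.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def solve_sfo_for_theta(theta):
    """Worker task: integrate the SFO for one parameter value and return the
    quantities plotted on the bifurcation diagram.  The solver call is a
    placeholder; in the real package this is the ABM method of Section 4."""
    # x, y = abm_solver(theta, ...)     # hypothetical solver name
    x_final, y_final = 0.0, 0.0          # placeholder result
    return theta, x_final, y_final

def bifurcation_analysis(theta_min, theta_max, n_tasks, n_workers):
    """Map: one task per theta value, distributed over a pool of worker
    processes; Reduce: gather the results into a single list for plotting."""
    thetas = np.linspace(theta_min, theta_max, n_tasks)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(solve_sfo_for_theta, thetas))   # keeps input order
    return results

if __name__ == "__main__":               # guard required by multiprocessing on Windows
    diagram_data = bifurcation_analysis(0.5, 2.0, n_tasks=200, n_workers=8)
```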
Remark 6. Let us highlight the key features of Algorithm 1:
Data parallelism: each parameter value is processed independently.
Process-based parallelism: the implementation uses separate operating system processes, not threads, to circumvent Python’s Global Interpreter Lock (GIL) and achieve true concurrent execution.
Master–worker pattern: the master process distributes tasks, and workers perform computations.
Fault tolerance: an error in one process does not affect others.
Dynamic load balancing: ProcessPoolExecutor automatically distributes tasks.
Remark 7. It is important to note that we use Python multiprocessing, not threading, for computationally intensive tasks. This is due to the GIL, which makes threading ineffective for numerical calculations. Each Python process has its own interpreter and memory, allowing for full utilization of multi-core processors. Therefore, in this paper, the term “worker processes” refers to operating system processes created via ProcessPoolExecutor.
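The effect described in Remark 7 can be observed with a small self-contained experiment; the workload below is arbitrary and serves only to contrast a thread pool with a process pool on a CPU-bound task.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n):
    """A purely computational task: threads cannot run it concurrently
    under the GIL, while separate processes can."""
    s = 0.0
    for i in range(n):
        s += i ** 0.5
    return s

if __name__ == "__main__":
    work = [5_000_000] * 8
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        with executor_cls(max_workers=8) as ex:
            list(ex.map(cpu_bound, work))
        print(executor_cls.__name__, round(time.perf_counter() - start, 2), "s")
    # On a multi-core CPU the process pool is expected to finish several times
    # faster, because the thread pool is serialized by the GIL.
```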
We do not use other CPU parallelization technologies here for the following reasons: OpenMP is used for parallelization within a single process [39], which is not possible in our case; MPI is intended for distributed computing on clusters, which is also not necessary in our case [40]; and Python threads are constrained by the GIL and therefore not suitable for numerical calculations [41].
6.1. Parallel Configuration
The parallel configuration of the ABMSelkovFracSim software suite comprises the following elements:
Partitioning strategy: The data parallelism strategy is used. The initial set of values of the key parameter is partitioned into n independent tasks.
Task granularity: Each task is coarse-grained: it comprises a full calculation of the system dynamics (solving the system of fractional differential equations) for a single parameter value using the Adams–Bashforth–Moulton method. This granularity is effective because the computation time of a single task significantly exceeds the overhead of process creation and interprocess communication.
Process binding/pinning: Not used in the current implementation. Processes are scheduled by the operating system. We acknowledge that pinning processes to specific physical cores (e.g., using taskset on Linux or libraries like numexpr and affinity) could reduce the overhead of process migration between cores and improve cache utilization, especially on NUMA systems. This optimization is planned for future releases.
Result aggregation method: the MapReduce model is used.
- -
Map: The main process creates a pool of worker processes and distributes tasks (theta values) among them using ProcessPoolExecutor.map() or submit(). These calls provide dynamic load balancing.
- -
Reduce: Upon completion of the computation, the main process collects the results (the x(theta) and y(theta) values at the final point in time). The results are returned in the order in which the tasks were completed (as_completed), but to plot a correct bifurcation diagram they are then sorted in ascending order of theta (see the update_bifurcation_plots function in the code). This ensures that the graph is plotted correctly regardless of the order in which the calculations finished; a sketch of this Reduce step is given below.
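A sketch of this collection step, assuming a picklable worker function such as the solve_sfo_for_theta placeholder from the previous sketch (the real package performs the final sorting and plotting in its update_bifurcation_plots routine), is as follows:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def reduce_bifurcation_results(worker, thetas, n_workers):
    """Collect worker results as they finish and restore ascending theta order
    before plotting.  'worker' must be a picklable top-level function."""
    results = []
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = {pool.submit(worker, th): th for th in thetas}
        for fut in as_completed(futures):     # completion order, not submission order
            results.append(fut.result())      # each result is (theta, x_final, y_final)
    results.sort(key=lambda r: r[0])          # sort by theta for a correct diagram
    return results
```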
6.2. Practical Recommendations and Limitations
Based on our performance analysis, we provide the following practical recommendations for users of the ABMSelkovFracSim 2.0 package and for researchers implementing similar parallel parametric studies:
The optimal number of worker processes is typically equal to the number of physical CPU cores. In our tests, eight processes (matching eight physical cores) yielded the best efficiency–cost trade-off. Using more processes than physical cores leads to diminishing returns due to increased overhead (see the sketch after this list).
For memory-intensive fractional-order problems, the coarse-grained parallelism (one process per parameter value) is effective because the computation time for each task is large compared to inter-process communication costs.
The TAECO metrics provide a comprehensive view of parallel performance. Users should monitor not only the speedup A but also the efficiency E and the cost optimality index O to assess the true benefit of parallelization.
Limitations: The current implementation is designed for shared-memory systems (multi-core CPUs). It does not support distributed memory (cluster) environments. Also, parallel efficiency is limited by the inherently serial parts of the algorithm (Amdahl’s law). For future work, hybrid CPU-GPU parallelism could be explored to accelerate the internal ABM solver itself.
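As a starting point for the first recommendation above, the number of worker processes can be derived from the number of physical cores. The sketch below uses the third-party psutil package when it is available and falls back to a simple heuristic otherwise; it is an illustration, not part of the package.

```python
import os

try:
    import psutil                                        # optional dependency
    n_physical = psutil.cpu_count(logical=False)
except ImportError:
    n_physical = None

if not n_physical:
    # Heuristic fallback: os.cpu_count() reports logical CPUs (16 on the test
    # machine), so halve it when SMT/Hyper-Threading is enabled.
    n_physical = max(1, (os.cpu_count() or 2) // 2)

n_workers = n_physical                                    # recommended starting point
print("Suggested number of worker processes:", n_workers)
```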
We now turn to the main goal of our article: studying the efficiency of a parallel algorithm for calculating bifurcation diagrams.
7. Analysis of Algorithm Efficiency Based on Average Execution Time
7.1. Methodology of Performance Evaluation and Fairness of Comparison
In this study, we adhere to the following methodology:
- 1.
Identical computational code: The sequential version of the algorithm (p = 1) uses exactly the same code for the numerical solution of the SFO (ABM method, Formulas (3) and (4)) as each task in the parallel pool. This guarantees that identical numerical work is compared for each parameter value. The difference lies only in the way the loop over the parameter values is organized: a sequential for loop versus task distribution across processes using ProcessPoolExecutor.
- 2.
Accounting for all stages: The measured execution time includes all stages common to both versions: data initialization, the loop (or task scheduling) over the parameter values, numerical integration of system (2) for each value, and collection of the result arrays. Visualization time (plotting) is excluded from the measurements, as it is performed single-threaded after the calculations are completed and is identical in both cases.
- 3.
Overhead of parallelization: The parallel version inevitably incurs additional costs: creating and managing a process pool, serializing/deserializing data for inter-process communication, and synchronization. The presence of these costs is not a drawback of the method but an inherent property of it. Precisely to account for them and to analyze the total costs, we use the extended TAECO metric set, not just the speedup. The decrease in efficiency and the increase in the cost optimality index as p grows (Figure 2, Figure 3 and Figure 4) are direct evidence of the growing role of these overheads.
- 4.
Discussion of possible optimizations: We acknowledge that the sequential Python implementation is not highly optimized for single-core performance. It could potentially be accelerated through vectorization of operations (NumPy libraries), just-in-time (JIT) compilation (Numba), or a C/C++ implementation of the computational kernel. However, such optimizations are beyond the scope of this paper, which focuses on evaluating the effectiveness of the MapReduce parallelization strategy for parametric studies of fractional calculus problems. The chosen baseline sequential code is typical and easily reproducible, allowing us to clearly isolate and evaluate the gains from parallelization itself. Optimizing performance on a single-core system is an important but separate task that can be considered for future development of the ABMSelkovFracSim package.
- 5.
Time measurement methodology: All execution time measurements were conducted using wall-clock time by calling the high-precision time.perf_counter() timer from the Python standard library. This allows for accounting for all factors affecting the actual speedup of a parallel algorithm, including the overhead of organizing parallel computations. To ensure measurement stability, a single warm-up run was performed before each series of experiments (for a fixed number of processes p and problem size N), the results of which were not taken into account. All experiments were conducted on a dedicated workstation with unnecessary background processes disabled to minimize the impact of external workloads on the results.
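A minimal timing harness following this methodology might look as follows; the wrapped callable is an assumed name standing for the computation at fixed p and N, not part of the package API.

```python
import statistics
import time

def benchmark(run, repeats=10, warmup=1):
    """Wall-clock timing of a zero-argument callable 'run': one discarded
    warm-up run, then 'repeats' measured runs; returns mean and std. dev."""
    for _ in range(warmup):
        run()                                        # warm-up, result discarded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Example (hypothetical wrapper around the computation for fixed p and N):
# mean_t, std_t = benchmark(lambda: bifurcation_analysis(0.5, 2.0, 200, n_workers=8))
```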
Thus, the presented comparison is valid for the stated goal: demonstrating that the parallel strategy based on ProcessPoolExecutor provides significant and measurable acceleration for the computationally intensive task of constructing bifurcation diagrams of the SFO, with optimal use of computational resources (a number of worker processes equal to the number of physical cores).
7.2. Experimental Setup and Reproducibility
To ensure transparency and facilitate reproducibility of our results, we provide a detailed description of the experimental setup. All performance measurements were conducted on a workstation with the following hardware and software configuration:
CPU: AMD Ryzen 7 7800X3D (8 physical cores, 16 logical threads due to Simultaneous Multithreading), base clock 4.2 GHz, boost frequency up to 5.0 GHz, L2 cache: 8 MB (1 MB per core), L3 cache: 96 MB (64 MB 3D V-Cache + 32 MB standard).
RAM: 32 GB DDR5.
Operating system: Microsoft Windows 10 Pro 64-bit (build 22H2).
Python environment: Python 3.13.0. The main libraries used were the standard Python multiprocessing module and NumPy for numerical operations. A full list of dependencies for the ABMSelkovFracSim 2.0 package is available from the authors upon reasonable request.
System settings: The Windows power plan was set to “High Performance.” No specific CPU affinity (core pinning) was enforced for the Python processes; they were managed by the default OS scheduler. Background applications were minimized during testing.
Reproducibility statement: The ABMSelkovFracSim 2.0 software package, which contains the complete implementation of the sequential and parallel algorithms described in this paper, is proprietary software developed within the framework of the state assignment of IKIR FEB RAS. Due to institutional policies and licensing restrictions, the source code cannot be publicly distributed at this time. However, to enable verification and reproduction of the key performance results, the authors provide the following:
- 1.
- 2.
The exact mathematical formulation of the problem (Section 3) and the numerical method (Section 4).
- 3.
- 4.
The raw measurement data (average execution time and standard deviation) for all experiments, presented in Table 1, Table 2 and Table 3.
The computational experiments can be reproduced by implementing the Adams–Bashforth–Moulton method (Equations (3) and (4)) for the SFO system (Equation (2)) and applying the parallel MapReduce pattern (Algorithm 1) using Python’s ProcessPoolExecutor with the parameters specified in the test examples. For access to the compiled software package for verification purposes, or for any additional information required to replicate this study, interested researchers are encouraged to contact the corresponding author directly.
Remark 8. It should be noted that the obtained values for the optimal number of processes directly correspond to the number of physical cores of the test processor. This is an expected result for problems with coarse-grained parallelism and intensive memory usage, typical of parametric studies of dynamic systems. However, the performance and the optimal efficiency point of a parallel algorithm may be sensitive to the specific hardware architecture: the processor topology (e.g., the number of cores per die, the presence of simultaneous multithreading technology, SMT/HT), the cache hierarchy and size, as well as the memory subsystem configuration (NUMA vs. UMA). On systems with a different configuration (e.g., with a higher number of cores but a smaller shared L3 cache, or in server multiprocessor NUMA configurations), the balance between speedup and overhead may shift, which can change the empirically determined optimal value of p. Therefore, the tuning recommendation provided (starting with a number of processes equal to the number of physical cores) should be considered a starting point for fine-tuning in a particular computing environment.
All timing measurements presented in this section were performed for the complete parallel computation cycle of the bifurcation diagram within a single call of the Bifurcation analysis function. The measured execution time includes (1) initialization of the process pool (ProcessPoolExecutor); (2) task distribution (Map): serialization of parameters and their transfer to worker processes; (3) parallel execution: solving system (2) for each value of the parameter using the Adams–Bashforth–Moulton numerical method (3) and (4)—the most computationally intensive part; (4) collection of results (Reduce): deserialization and aggregation of the results from all processes. The measurement does not include stages unrelated to parallel computations: initialization of the graphical user interface (GUI), loading parameters from input fields, and, most importantly, final visualization (plotting). Plotting is performed single-threaded after all computations and synchronization are completed. This separation allows us to evaluate the performance of the parallel computational component of the algorithm in isolation.
Let us analyze the efficiency of the parallel computation algorithm of the ABMSelkovFracSim 2.0 software package in the mode of calculating and constructing a bifurcation diagram (Bifurcation analysis).
7.3. Performance Metrics and Methodology
Since each run of the bifurcation analysis yields slightly different execution times T (in seconds), and the number of experiments is finite, we treat T as a discrete random variable characterized by a distribution function and an expected value:
$$\bar{T}(p,N)=\frac{1}{L}\sum_{i=1}^{L}T_{i}(p,N), \quad (5)$$
where i is the index of the Bifurcation analysis numerical experiment, L is the sample size, p is the number of worker processes, and N is the number of nodes in the uniform grid of the numerical method, i.e., the size of the input data.
The standard deviation of the execution time for a given number of worker processes p and problem size N is calculated as follows:
$$\sigma(p,N)=\sqrt{\frac{1}{L-1}\sum_{i=1}^{L}\bigl(T_{i}(p,N)-\bar{T}(p,N)\bigr)^{2}}. \quad (6)$$
The values of $\bar{T}(p,N)$ and $\sigma(p,N)$ for the test examples are presented in Table 1, Table 2 and Table 3.
The efficiency of a parallel algorithm measures how effectively it utilizes the available computational resources (worker processes) compared to the sequential version. It is defined as the ratio of the achieved speedup to the number of processes used. However, as the number of worker processes increases, hardware and software limitations on parallel speedup become more pronounced. These limitations include the high time costs of parallelizing the task and reassembling all computational results [42], known as Amdahl’s law [43,44]. To represent the data on average execution time, we will use the following parameters, in [sec]:
$T_1(N)$—the execution time of a test example of size N required by the sequential Bifurcation analysis algorithm;
$T_p(N)$—the execution time of a test example of size N required by the parallel Bifurcation analysis algorithm on a machine with p worker processes.
We calculate $T_p(N)$ for different numbers of worker processes p, in [units], with a step of 1.
To obtain efficiency estimates, the data on average execution time are considered in terms of the TAECO metric set [25], applicable to algorithms: T (execution time), A (acceleration, speedup), E (efficiency), C (cost), and O (cost-optimal indicator).
$A(p)$—the speedup of the algorithm, in [units], provided by the parallel version of the algorithm compared to the sequential one, calculated as follows:
$$A(p)=\frac{T_1(N)}{T_p(N)}. \quad (7)$$
$E(p)$—the efficiency of the algorithm, in [units/process], of using a given number p of worker processes, defined by the following ratio:
$$E(p)=\frac{A(p)}{p}; \quad (8)$$
moreover, by definition, the sequential algorithm has the greatest efficiency, $E(1)=1$. Therefore, for a parallel algorithm, it is better if $E(p)$ is as close to 1 as possible.
$C_p(N)$—the cost of the algorithm, in [sec. × process], which is determined by the product of a given number p of worker processes and the execution time $T_p(N)$ of the parallel algorithm:
$$C_p(N)=p\,T_p(N). \quad (9)$$
$O(p)$—the cost-optimal indicator (COI) of the algorithm, in [units × process], characterized by a cost proportional to the complexity of the most efficient sequential algorithm:
$$O(p)=\frac{C_p(N)}{T_1(N)}, \quad (10)$$
and the closer this value is to 1, the better, i.e., the cheaper the use of the parallel algorithm in terms of engaging worker processes.
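For convenience, the indicators (7)–(10) can be computed directly from the measured average times; the helper below is our illustration, shown here with a few values taken from Table 1.

```python
def taeco(times_by_p):
    """Compute the indicators (7)-(10) from a dict {p: average time T_p(N)};
    the entry for p = 1 is taken as the sequential baseline T_1(N)."""
    t1 = times_by_p[1]
    rows = []
    for p, tp in sorted(times_by_p.items()):
        a = t1 / tp              # speedup A(p)
        e = a / p                # efficiency E(p)
        c = p * tp               # cost C_p(N), [s x process]
        o = c / t1               # cost-optimality index O(p)
        rows.append((p, tp, a, e, c, o))
    return rows

# A few values from Table 1 (p = 1, 7, 8):
for row in taeco({1: 4765, 7: 775, 8: 759}):
    print("p=%2d  T=%6.0f  A=%5.2f  E=%5.2f  C=%7.0f  O=%5.2f" % row)
```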
7.4. Test Examples
The calculations of the efficiency indicators (5)–(10) and their visualization were performed in the MATLAB R2025b computer mathematics system.
Example 1. For the first example, we choose the ABMSelkovFracCos method and a fixed set of parameter values for the SFO (2). In the Bifurcation Analysis mode, with all the same parameters, the ABMSelkovFracCos calculation method is used over the specified range and step of the bifurcation parameter. To obtain the average execution time according to (5), we fix the sample size L, which allows for smoother graphs for both the average time and the efficiency analysis. The results are shown in Table 1 and Figure 2 below.
Table 1.
Average execution time and standard deviation of Test Example 1 with a varying number of worker processes.
| p, [Processes] | $\bar{T}(p,N)$, [s] | $\sigma(p,N)$, [s] |
|---|---|---|
| 1 | 4765 | 70.0003 |
| 2 | 2612 | 53.1686 |
| 3 | 1645 | 56.8393 |
| 4 | 1251 | 33.7665 |
| 5 | 1023 | 19.4882 |
| 6 | 868 | 8.9697 |
| 7 | 775 | 4.9542 |
| 8 | 759 | 11.2071 |
| 9 | 716 | 8.8097 |
| 10 | 629 | 7.6884 |
| 11 | 601 | 4.5814 |
| 12 | 589 | 13.8840 |
| 13 | 568 | 2.0790 |
| 14 | 558 | 3.5340 |
| 15 | 544 | 1.8409 |
| 16 | 536 | 1.8974 |
Figure 2 presents the results of the efficiency analysis of the parallel Bifurcation Analysis algorithm for the ABMSelkovFracCos method compared to its sequential counterpart. The results are presented as histograms for greater clarity. On this particular computer, parallelization allows for a maximum speedup of up to 8.9 times.
Figure 2 shows a dependence close to exponential of the average execution time on the number of worker processes. Increasing the number of worker processes at small p gives a large performance gain compared to the sequential algorithm, but adding worker processes at larger p yields less and less additional gain. The COI estimate shows the opposite trend: the cost of using the parallel algorithm changes little at first and then increases roughly linearly and proportionally to the growth in the number of worker processes. Consequently, at p = 8 we obtain the best balance between the efficiency coefficient E of the parallel algorithm and the cost O of using it, while achieving a speedup of 6.28 times. As a result, we can say that the Bifurcation Analysis mode is optimally parallelized on 7–8 processes, which, depending on the maximum number of CPUs of a particular computer, also leaves headroom for running several copies of the ABMSelkovFracSim 2.0 software package for simultaneous work in several windows.
Figure 2.
Average execution time and efficiency estimation of a parallel algorithm implementing Bifurcation analysis (Example 1), panels (a–d).
An analysis of the standard deviation values for the parallel algorithm for calculating bifurcation diagrams (Example 1, ABMSelkovFracCos method) allows us to draw the following conclusions:
- 1.
In all cases, the standard deviation is no more than a few percent of the corresponding average execution time. For example, for p = 1, σ ≈ 70 s at an average time of 4765 s, which is approximately 1.4%. Such a relatively small spread indicates high stability and reproducibility of the measurements, confirming the reliability of the presented average values for the efficiency analysis.
- 2.
A tendency toward a decrease in the absolute value of the standard deviation is observed as the number of processes p increases:
- –
For sequential execution (p = 1), σ ≈ 70 s is the highest value.
- –
At maximum utilization (p = 16), σ ≈ 1.9 s; the value has decreased by more than 35 times.
This may be due to two factors:
- –
Smoothing out random system impacts: During parallel execution, multiple independent computing processes distributed over time statistically “average out” the impact of random fluctuations in system load (background processes, interrupts, caching), which can lead to more noticeable deviations in a sequential run.
- –
Reducing task execution time: With a larger p, the overall execution time is significantly reduced. During this shorter period of time, the system is less likely to be subject to significant background impacts, which reduces variability.
- 3.
The most important range from a practical perspective is p from 6 to 8 (where the best trade-off between speedup and efficiency is observed, see Figure 2). In this range, the standard deviation is 8–11 s, which is relatively small. This means that the recommendation for choosing the optimal number of processes is statistically robust—with repeated runs, the execution time will fluctuate only slightly, and the expected speedup will be achieved with a high probability.
- 4.
User interpretation: The small standard deviation, especially in the region of the optimal p, gives the user of the ABMSelkovFracSim 2.0 package high confidence in the predictability of computation times. This is important for planning computational experiments. The presented standard deviation data confirm that the execution time measurements are statistically reliable. The observed pattern of decreasing with increasing p is consistent with the expected behavior of the parallel version of the algorithm and further substantiates the possibility of choosing the optimal configuration not only based on the average speedup criterion but also on the criterion of low result variability.
Example 2. To evaluate the efficiency of the parallel version of the calculation algorithm, we choose the ABMSelkovFracExp method. We select the parameter values from Example 1; to obtain the average time (5), the sample size L is fixed. In the case of Test Example 2 for the ABMSelkovFracExp method, parallelization on this particular computer allows for a maximum speedup of ≈9 times. The estimates (Figure 3) show the same trends as for Test Example 1 (Figure 2). However, for a moderate number of processes, the cost of using the parallel algorithm remains close to that of the sequential one, and therefore the COI stays close to 1, i.e., the parallel algorithm is almost as cost-effective as the sequential one while providing a roughly 5-fold performance increase.
Table 2 shows that the absolute values of σ decrease with increasing p: for p = 1, σ ≈ 82.5 s; for p = 16, σ ≈ 1.1 s; a decrease of more than 70 times is observed, which is even more pronounced than in Example 1. The relative standard deviation is small: in the region of the optimal p = 7–8, it is below 1%. This indicates very high measurement stability.
Example 3. Let us consider the parallel version of the algorithm for the ABMSelkovFracLinear method. For simplicity, one of the parameters of the linear dependence is fixed, and the remaining parameter values are left unchanged.
Table 2.
Average execution time and standard deviation of Test Example 2 with a varying number of worker processes.
| p, [Processes] | $\bar{T}(p,N)$, [s] | $\sigma(p,N)$, [s] |
|---|---|---|
| 1 | 4766 | 82.5486 |
| 2 | 2439 | 43.2883 |
| 3 | 1660 | 23.0169 |
| 4 | 1260 | 20.8180 |
| 5 | 1024 | 6.1364 |
| 6 | 862 | 7.9589 |
| 7 | 780 | 6.5693 |
| 8 | 748 | 4.3512 |
| 9 | 658 | 1.9889 |
| 10 | 628 | 6.1427 |
| 11 | 604 | 5.5538 |
| 12 | 577 | 6.4326 |
| 13 | 560 | 1.8886 |
| 14 | 550 | 5.5538 |
| 15 | 538 | 1.5811 |
| 16 | 531 | 1.0750 |
Figure 3.
Average execution time and efficiency estimation of a parallel algorithm implementing Bifurcation Analysis (Example 2), panels (a–d).
The results are shown in Table 3 and Figure 4 below.
Table 3.
Average execution time and standard deviation of Test Example 3 with a varying number of worker processes.
| p, [Processes] | $\bar{T}(p,N)$, [s] | $\sigma(p,N)$, [s] |
|---|---|---|
| 1 | 4660 | 71.4165 |
| 2 | 2462 | 34.7890 |
| 3 | 1681 | 29.0366 |
| 4 | 1262 | 15.8328 |
| 5 | 1032 | 11.1679 |
| 6 | 876 | 6.3034 |
| 7 | 785 | 8.0994 |
| 8 | 717 | 13.2665 |
| 9 | 657 | 1.5776 |
| 10 | 631 | 6.0882 |
| 11 | 603 | 4.1846 |
| 12 | 573 | 3.2728 |
| 13 | 563 | 2.1108 |
| 14 | 553 | 1.2649 |
| 15 | 545 | 1.2649 |
| 16 | 540 | 1.1972 |
Figure 4.
Average execution time and efficiency estimation of a parallel algorithm implementing Bifurcation Analysis (Example 3), panels (a–d).
From Table 3, we can draw the following conclusions:
The standard deviation also decreases with increasing p:
- –
for p = 1, σ ≈ 71.4 s;
- –
for p = 16, σ ≈ 1.2 s;
- –
a decrease of approximately 60 times.
In the optimal region of p = 7–8, the relative standard deviation is about 1–1.9%. This is slightly higher than in Examples 1 and 2, but still less than 2%, indicating acceptable stability.
For Test Example 3 with ABMSelkovFracLinear, parallelization allows for a maximum speedup of ≈8.6 times. The efficiency estimates (Figure 4) show trends similar to those for Test Example 2 (Figure 3).
8. Conclusions
This article presents a performance analysis and practical implementation of a parallel algorithm, implemented with Python’s multiprocessing library, for constructing bifurcation diagrams of the fractional Selkov oscillator with variable coefficients and memory. The main contribution is a reproducible benchmarking study that validates the effectiveness of a coarse-grained parallelization strategy (MapReduce) for this computationally intensive class of problems. Using the TAECO metric set (Time, Acceleration, Efficiency, Cost, Cost Optimality), we evaluated the algorithm for three ABM variants: ABMSelkovFracCos, ABMSelkovFracExp, and ABMSelkovFracLinear. On a test system supporting up to sixteen worker processes, all three methods exhibited similar performance trends. The parallel algorithm achieved a speedup of 8–9 times compared to the sequential version, with an optimal resource allocation of 8 worker processes. Therefore, the implementation of the parallel algorithm for calculating bifurcation diagrams of the fractional oscillator in the ABMSelkovFracSim software package is justified. This study provides practical guidelines for configuring parallel computations and highlights the limitations of the current approach.
The presented performance analysis and conclusions are valid for the specific hardware and software configuration used in this study (see Section 7.2). The observed optimal number of processes corresponds to the number of physical cores in the test CPU. While the coarse-grained MapReduce parallelism strategy is generally effective for parametric studies, its efficiency metrics (speedup, optimal process count) are inherently dependent on the system architecture, workload characteristics (grid size N, parameter range), and software environment. To generalize these findings, cross-platform benchmarking on different CPU architectures (Intel, ARM), operating systems, and with varying problem sizes is necessary. Such future work would help develop adaptive configuration guidelines for a broader range of computational environments.
To formulate more general recommendations and to test the algorithm’s robustness to different architectures, we plan, as a further development of this research, to conduct a comprehensive analysis of the algorithm’s scalability and efficiency on heterogeneous hardware, including the following:
Intel Core i7/i9 processors of various generations with a cache hierarchy different from AMD.
Server platforms based on AMD EPYC or Intel Xeon processors with NUMA architecture and a large number of computing cores.
Systems with various RAM and I/O subsystem configurations.
Such a study will help identify common patterns and dependencies, minimize the risk of algorithm overfitting to a single hardware topology, and develop adaptive strategies for configuring parallel computing for a wide range of fractional dynamics problems.
Further development of research on this topic may be associated with the use of heterogeneous parallel programming structures, for example, the CPU-GPU architecture, similar to [45,46,47].
We note that another promising direction for further research is a detailed comparative analysis of the accuracy and stability of various numerical schemes for the class of fractional-order systems with variable exponents and coefficients, which includes SFO. Such research will allow us to optimize the choice of a computing kernel for parametric calculations and will be the subject of our separate future work.
Author Contributions
Conceptualization, R.P. and D.T.; methodology, R.P. and D.T.; software, R.P. and D.T.; validation, R.P. and D.T.; formal analysis, R.P. and D.T.; investigation, D.T.; data curation, D.T.; writing—original draft preparation, R.P. and D.T.; writing—review and editing, R.P.; visualization, D.T.; supervision, R.P. All authors have read and agreed to the published version of the manuscript.
Funding
This work was carried out within the framework of the state assignment of IKIR FEB RAS (reg. No. 124012300245-2).
Data Availability Statement
The original contributions presented in this study are included in this article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| ABM | Adams–Bashforth–Moulton Numerical Method |
| CPU | Central Processing Unit |
| GIL | Global Interpreter Lock |
| GPU | Graphics Processing Unit |
| HT | Hyper-Threading |
| MPI | Message Passing Interface |
| NUMA | Non-Uniform Memory Access |
| OpenMP | Open Multi-Processing |
| RAM | Random Access Memory |
| SFO | Selkov Fractional Oscillator |
| SMT | Simultaneous Multithreading |
| UMA | Uniform Memory Access |
References
- Sinet, S.; Bastiaansen, R.; Kuehn, C.; von der Heydt, A.S.; Dijkstra, H.A. Approximating the bifurcation diagram of weakly and strongly coupled leading-following systems. Chaos Interdiscip. J. Nonlinear Sci. 2025, 35, 063135. [Google Scholar] [CrossRef]
- Li, X.; Liu, G.; Liu, J.; Chen, Y. Unveiling Pseudo-Period-Tripling Bifurcations in Nonlinear Dynamical Systems. Int. J. Bifurc. Chaos 2025, 35, 2550156. [Google Scholar] [CrossRef]
- Gong, R.; Xu, J.; Liu, T.; Qin, Y.; Wei, Z. Bifurcation and Chaos in DCM Voltage-Fed Isolated Boost Full-Bridge Converter. Electronics 2025, 14, 260. [Google Scholar] [CrossRef]
- Sel’kov, E.E. Self-Oscillations in Glycolysis 1. A Simple Kinetic Model. Eur. J. Biochem. 1968, 4, 79–86. [Google Scholar] [CrossRef] [PubMed]
- Dhatt, S.; Chaudhury, P. Study of oscillatory dynamics in a Selkov glycolytic model using sensitivity analysis. Indian J. Phys. 2025, 96, 1649–1654. [Google Scholar] [CrossRef]
- Makovetsky, V.I.; Dudchenko, I.P.; Zakupin, A.S. Auto oscillation model of microseism’s sources. Geosist. Pereh. Zon 2017, 4, 37–46. [Google Scholar]
- Parovik, R.I. Studies of the fractional Selkov dynamical system for describing the self-oscillatory regime of microseisms. Mathematics 2022, 10, 4208. [Google Scholar] [CrossRef]
- Volterra, V. Functional Theory, Integral and Integro-Differential Equations; Dover Publications: Mineola, NY, USA, 2005. [Google Scholar]
- Nyerere, N.; Edward, S. Modeling Chlamydia transmission with caputo fractional derivatives: Exploring memory effects and control strategies. Model. Earth Syst. Environ. 2025, 11, 307. [Google Scholar] [CrossRef]
- Awadalla, M.; Sharif, A.A. A Fractional Calculus Approach to Energy Balance Modeling: Incorporating Memory for Responsible Forecasting. Mathematics 2026, 14, 223. [Google Scholar] [CrossRef]
- Nakhushev, A.M. Fractional Calculus and Its Applications; Fizmatlit: Moscow, Russia, 2003. [Google Scholar]
- Kilbas, A.A.; Srivastava, H.M.; Trujillo, J.J. Theory and Applications of Fractional Differential Equations; Elsevier: Amsterdam, The Netherlands, 2006. [Google Scholar]
- García, J.J.R.; Escalante-Martínez, J.E.; Rojano, F.A.G.; Fuentes, J.C.M.; Torres, L. Advances in Fractional Calculus; Springer: Cham, Switzerland, 2025. [Google Scholar]
- Gómez Aguilar, J.F.; Córdova-Fraga, T.; Tórres-Jiménez, J.; Escobar-Jiménez, R.F.; Olivares-Peregrino, V.H.; Guerrero-Ramírez, G.V. Nonlocal Transport Processes and the Fractional Cattaneo-Vernotte Equation. Math. Probl. Eng. 2016, 2016, 7845874. [Google Scholar]
- Parovik, R.I.; Rakhmonov, Z.; Rakhim Zunnunov, R. Study of Chaotic and Regular Modes of the Fractional Dynamic System of Selkov. EPJ Web Conf. 2021, 254, 02014. [Google Scholar] [CrossRef]
- Zhou, S.; Zhang, Q.; He, S.; Zhang, Y. What is the lowest cost to calculate the Lyapunov exponents from fractional differential equations? Nonlinear Dyn. 2025, 113, 14825–14871. [Google Scholar] [CrossRef]
- Sun, H.; Chang, A.; Zhang, Y.; Chen, W. A Review on Variable-Order Fractional Differential Equations: Mathematical Foundations, Physical Models, Numerical Methods and Applications. Fract. Calc. Appl. Anal. 2019, 22, 27–59. [Google Scholar] [CrossRef]
- Ortigueira, M.D.; Valério, D.; Machado, J.T. Variable order fractional systems. Commun. Nonlinear Sci. Numer. Simul. 2019, 71, 231–243. [Google Scholar] [CrossRef]
- Patnaik, S.; Hollkamp, J.P.; Semperlotti, F. Applications of variable-order fractional operators: A review. Proc. R. Soc. A Math. Phys. Eng. Sci. 2020, 476, 20190498. [Google Scholar] [CrossRef]
- Parovik, R.I. Selkov’s Dynamic System of Fractional Variable Order with Non-Constant Coefficients. Mathematics 2025, 13, 372. [Google Scholar] [CrossRef]
- Parovik, R.I. Study of dynamic modes of fractional Selkov oscillator with variable coefficients using bifurcation diagrams. Comput. Math. Model. 2025. [Google Scholar] [CrossRef]
- Parovik, R.I. ABMSelkovFracSim 2.0 software package for quantitative and qualitative analysis of the Selkov fractional oscillator. Vestn. KRAUNC. Fiz.-Mat. Nauk. 2025, 53, 75–92. [Google Scholar]
- Shaw, Z.A. Learn Python the Hard Way; Addison-Wesley Professional: Boston, MA, USA, 2024. [Google Scholar]
- Van Horn, B.M.; Nguyen, Q. Hands-On Application Development with PyCharm: Build Applications Like a Pro with the Ultimate Python Development Tool; Packt Publishing Ltd.: Birmingham, UK, 2023. [Google Scholar]
- Tverdyi, D.A. An Analysis of the Computational Complexity and Efficiency of Various Algorithms for Solving a Nonlinear Model of Radon Volumetric Activity with a Fractional Derivative of a Variable Order. Computation 2025, 13, 252. [Google Scholar] [CrossRef]
- Kulczycki, P.; Józef, K.; Janusz, K. Fractional Dynamical Systems: Methods, Algorithms and Applications; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
- Munoz-Pacheco, J.M.; Wei, Z.; Volos, C.; Sambas, A. Future challenges in the fractional-order dynamical systems: From mathematics to applications. Front. Appl. Math. Stat. 2023, 9, 1324660. [Google Scholar] [CrossRef]
- Gao, B.; Shukur, A.; Marwan, M.; Wang, N. On the complex dynamics of simple integer to fractional-order sine chaotic oscillators. Fractals 2026, 34, 2550115. [Google Scholar] [CrossRef]
- Novozhenova, O.G. Life And Science of Alexey Gerasimov, One of the Pioneers of Fractional Calculus in Soviet Union. Fract. Calc. Appl. Anal. 2017, 20, 790–809. [Google Scholar] [CrossRef]
- Caputo, M.; Fabrizio, M. On the notion of fractional derivative and applications to the hysteresis phenomena. Meccanica 2017, 52, 3043–3052. [Google Scholar] [CrossRef]
- Parovik, R.I. Qualitative analysis of Selkov’s fractional dynamical system with variable memory using a modified Test 0–1 algorithm. Vestn. KRAUNC. Fiz.-Mat. Nauk. 2023, 45, 9–23. [Google Scholar]
- Parovik, R.I. Selkov dynamic system with variable heredity for describing Microseismic regimes. In Solar-Terrestrial Relations and Physics of Earthquake Precursors: Proceedings of the XIII International Conference; Dmitriev, A., Lichtenberger, J., Mandrikova, O., Nahayo, E., Eds.; Springer Nature: Cham, Switzerland, 2023; pp. 166–178. [Google Scholar]
- Diethelm, K.; Ford, N.J.; Freed, A.D. A Predictor-Corrector Approach for the Numerical Solution of Fractional Differential Equations. Nonlinear Dyn. 2002, 29, 3–22. [Google Scholar] [CrossRef]
- Yang, C.; Liu, F. A computationally effective predictor-corrector method for simulating fractional order dynamical control system. ANZIAM J. 2005, 47, 168. [Google Scholar] [CrossRef]
- Garrappa, R. Numerical solution of fractional differential equations: A survey and a software tutorial. Mathematics 2018, 6, 16. [Google Scholar] [CrossRef]
- Naveen, S.; Parthiban, V. Qualitative analysis of variable-order fractional differential equations with constant delay. Math. Methods Appl. Sci. 2024, 47, 2981–2992. [Google Scholar] [CrossRef]
- Sino, M.; Domazet, E. Scalable Parallel Processing: Architectural Models, Real-Time Programming, and Performance Evaluation. Eng. Proc. 2025, 104, 60. [Google Scholar]
- Meng, X.; He, X.; Hu, C.; Lu, X.; Li, H. A Review of Parallel Computing for Large-scale Reservoir Numerical Simulation. Arch. Comput. Methods Eng. 2025, 32, 4125–4162. [Google Scholar] [CrossRef]
- Zhou, X.; Wang, C. Research on optimization of data mining algorithm based on OpenMP. SPIE 2025, 13560, 850–857. [Google Scholar]
- Himstedt, K. Multiple execution of the same MPI application: Exploiting parallelism at hotspots with minimal code changes. GEM-Int. J. Geomathem. 2025, 16, 1–28. [Google Scholar] [CrossRef]
- Wicaksono, D.; Soewito, B. Application of the multi-threading method and Python script for the network automation. J. Syntax Lit. 2024, 9, 3614. [Google Scholar]
- Rauber, T.; Runger, G. Parallel Programming for Multicore and Cluster Systems; Springer: New York, NY, USA, 2013. [Google Scholar]
- Al-hayanni, M.A.N.; Xia, F.; Rafiev, A.; Romanovsky, A.; Shafik, R.; Yakovlev, A. Amdahl’s Law in the Context of Heterogeneous Many-core Systems: A Survey. IET Comput. Digit. Tech. 2020, 14, 133–148. [Google Scholar] [CrossRef]
- Poolla, C.; Saxena, R. On extending Amdahl’s law to learn computer performance. Microprocess. Microsystems 2023, 96, 104745. [Google Scholar] [CrossRef]
- Skorych, V.; Dosta, M. Parallel CPU–GPU computing technique for discrete element method. Concurr. Comput. Pract. Exp. 2022, 34, e6839. [Google Scholar] [CrossRef]
- Alaei, M.; Yazdanpanah, F. A survey on heterogeneous CPU–GPU architectures and simulators. Concurr. Comput. Pract. Exp. 2025, 37, e8318. [Google Scholar] [CrossRef]
- Vaithianathan, M. The Future of Heterogeneous Computing: Integrating CPUs GPUs and FPGAs for High-Performance Applications. Int. J. Emerg. Trends Comput. Sci. Inf. Technol. 2025, 1, 12–23. [Google Scholar] [CrossRef]