Article

Research on High-Reliability Energy-Aware Scheduling Strategy for Heterogeneous Distributed Systems

1 School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan University of Science and Technology, Wuhan 430065, China
3 Electronic Information School, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2025, 9(6), 160; https://doi.org/10.3390/bdcc9060160
Submission received: 21 April 2025 / Revised: 31 May 2025 / Accepted: 13 June 2025 / Published: 17 June 2025

Abstract

With the demand for workflow processing driven by edge computing in the Internet of Things (IoT) and cloud computing growing at an exponential rate, task scheduling in heterogeneous distributed systems has become a key challenge for meeting real-time constraints in resource-constrained environments. Existing studies attempt to achieve the best balance among time constraints, energy efficiency, and system reliability in Dynamic Voltage and Frequency Scaling (DVFS) environments. This study proposes a two-stage collaborative optimization strategy. With the help of an innovative algorithm design and theoretical analysis, the multi-objective optimization challenges mentioned above are systematically addressed. First, based on a reliability-constrained model, we propose a topology-aware dynamic priority scheduling algorithm (EAWRS). This algorithm constructs a node priority function by incorporating in-degree/out-degree weighting factors and critical path analysis to enable multi-objective optimization. Second, to address the time-varying reliability characteristics introduced by DVFS, we propose a Fibonacci search-based dynamic frequency scaling algorithm (SEFFA). This algorithm effectively reduces energy consumption while ensuring task reliability, achieving near-optimal processor energy adjustment. The collaborative mechanism of EAWRS and SEFFA effectively addresses the DAG-based dynamic scheduling challenge in heterogeneous multi-core processor systems in IoT environments. Experimental evaluations conducted at various scales show that, compared with three state-of-the-art scheduling algorithms, the proposed strategy reduces energy consumption by an average of 14.56% (up to 58.44% under high-reliability constraints) and shortens the makespan by 2.58–56.44% while strictly meeting reliability requirements.

1. Introduction

In recent advancements, processor technology has experienced a rapid evolution, transitioning from the initial single-core designs to sophisticated multi-core architectures. This progression has witnessed a shift from homogeneous to heterogeneous multi-core systems [1]. With the continuous increase in architectural complexity, the performance capabilities of systems have significantly improved. As a result, heterogeneous distributed systems have been widely adopted in several critical domains, particularly in environments with high demands for computational power and efficiency, such as artificial intelligence, drones, smart homes, and edge computing scenarios in the Internet of Things (IoT) [2,3,4]. These systems typically integrate multiple types of processors, with significant variations in task execution efficiency among different processors, enhancing the flexibility of resource utilization. Particularly in large-scale, real-time IoT applications, the adoption trend of heterogeneous distributed systems has become increasingly prominent [5]. In IoT environments, typical computational tasks include signal processing (e.g., fast Fourier transform), data analysis (e.g., Gaussian Elimination), and others. These tasks are often mapped to Directed Acyclic Graphs (DAGs) to facilitate parallelized scheduling and optimization [6,7]. The DAG structure can clearly represent dependencies between tasks, making it conducive to achieving efficient scheduling and resource allocation on multi-core heterogeneous platforms [8].
In the recent past, energy consumption has become a critical challenge in heterogeneous distributed systems, particularly for battery-powered end devices such as new energy vehicles, edge sensors, and drones [9]. To alleviate the energy bottleneck, researchers have explored various technical solutions, including Dynamic Voltage and Frequency Scaling (DVFS) [10], virtualization-based consolidation [11], and task replication strategies [12,13]. Among these, DVFS has gained prominence as a mainstream technique for enhancing processor energy efficiency by dynamically adjusting the voltage and frequency levels. However, applying DVFS in heterogeneous IoT multi-processor systems introduces considerable complexity into task scheduling. The scheduling algorithms must now address multiple dimensions simultaneously, such as task prioritization, processor assignment, and power management. Since different processors support different voltage-frequency ranges, the same task may exhibit significant variations in both execution time and energy consumption depending on the core it is assigned to. Consequently, achieving energy-efficient execution while optimizing task completion time and maintaining system reliability has become an urgent and challenging research problem [14,15,16]. Energy-aware scheduling is of critical importance in IoT and edge computing. First, battery-powered devices need to operate for long periods under strict energy budgets—for instance, a 10% reduction in energy consumption can triple the lifespan of devices in environmental monitoring systems [17]. Second, edge servers handling real-time workflows must meet deadlines while avoiding thermal throttling, and dynamic frequency scaling can reduce peak power by 18–22% [18]. Third, large-scale distributed energy optimization can collectively save megawatt-level power each year. Our work meets these needs through reliability-guaranteed DVFS control.
This paper introduces two novel scheduling algorithms designed for parallel applications in heterogeneous distributed systems, with the dual objectives of minimizing energy consumption and operational timelines while adhering to reliability standards. The principal contributions of this paper are delineated as follows:
  • We propose a novel search-based optimal energy frequency adjustment algorithm (SEFFA). Predicated on the assumption that all nodes are pre-allocated, this algorithm strives to minimize energy consumption while fulfilling reliability prerequisites through strategic frequency adjustments.
  • We introduce an energy-aware scheduling algorithm based on the weight method under reliability constraints (EAWRS). This methodology integrates both the out-degree and in-degree of the DAG and employs a combination of task completion times, energy consumption metrics, and reliability factors, expressed through normalized and linear combinations, to optimize node allocation.
  • The efficacy of the proposed algorithms is validated through comprehensive experimental simulations. The findings demonstrate that these methods outperform existing techniques in terms of energy consumption and makespan.

2. Related Work

In recent years, with the continuous advancement in computing capabilities and the widespread deployment of large-scale workflows in IoT edge computing scenarios, the demand for high-performance computing has exhibited a significant upward trend [17,18,19]. In parallel, heterogeneous distributed systems are facing increasingly stringent performance requirements in terms of reliability, energy efficiency, and task completion time [20]. These challenges are especially pronounced in resource-constrained environments with high real-time demands, characteristics typical of many IoT applications, where they have become critical bottlenecks for ensuring system stability and achieving efficient task scheduling. For reliability, the objective is to mitigate the occurrence of failures during task execution, which are predominantly related to hardware issues, temperature fluctuations, and other factors. These failures are categorized into transient and permanent types, with transient failures occurring much more frequently than permanent ones. Consequently, most research has focused on transient failures [21,22,23]. A widely accepted reliability model uses $1 - e^{-\lambda t}$ to represent the probability of a transient failure occurring within a time interval $t$, which implies that reliability increases as execution time decreases. Therefore, a common approach to enhancing reliability is to reduce task execution time.
Energy consumption presents another significant challenge within heterogeneous distributed systems [24]. Current research on optimizing energy usage during task scheduling in such systems predominantly utilizes Dynamic Voltage and Frequency Scaling (DVFS) technology [10]. Since high-frequency operation is a major cause of high energy consumption, DVFS can be employed to reduce power consumption by lowering the operational voltage and frequency during the runtime. In terms of reducing completion times, current research aims to minimize the makespan and the total completion time. The heterogeneous earliest finish time (HEFT) algorithm is one of the most popular high-performance scheduling algorithms and is widely regarded as capable of achieving the shortest completion times in many scenarios [25]. Additionally, there are approaches that consider both energy consumption and reliability, such as the low energy consumption (LEC) [26] and minimum reliability (MR) methods [27]. Research has also proposed scheduling algorithms that consider both reliability and energy consumption, such as optimal dynamic scheduling (ODS) and the self-optimizing evolutionary algorithm (SOEA), and has combined these algorithms to optimize reliability, energy use, and scheduling duration, thereby achieving the optimization of energy consumption, reliability, and makespan [28].
The latest advancements in energy-aware computing and IoT security further highlight the need for efficient resource scheduling. Schieber et al. demonstrated a polynomial-time algorithm for interleaving energy harvesting tasks with real-time jobs in battery-free IoT devices, achieving maximum throughput under intermittent power supply—similar to our dynamic scheduling based on DVFS (Dynamic Voltage and Frequency Scaling) in resource-constrained environments [29]. In the field of IoT security, hyperdimensional computing (HDC) has shown remarkable efficiency: Ghajari et al. [30] achieved 99.54% intrusion detection accuracy on NSL-KDD with low computational overhead, while [31] reported 91.55% accuracy for unknown attack patterns, emphasizing the value of lightweight algorithms for edge devices. Similarly, Rastgoo et al. proposed an intelligent control method for electric vehicle charging microgrids, which stabilizes voltage through reactive power compensation while reducing the charging time—paralleling our processor-level energy–time-reliability trade-offs [32]. Collectively, these works validate that cross-layer optimization (from job scheduling to security) is critical for IoT systems. Our EAWRS + SEFFA provides a unified framework for heterogeneous task scheduling.
Although existing studies have made significant progress in understanding the trade-off between energy and time, there are still key issues that remain unaddressed: most algorithms overlook the correlation between reliability and energy consumption, and their performance is suboptimal. Our EAWRS + SEFFA strategy explicitly addresses these gaps through energy-aware priority scheduling and adaptive frequency search.
In summary, as heterogeneous distributed systems find widespread applications across various sectors, enhancing system reliability, reducing energy consumption, and shortening completion times have become focal points of research. The improved scheduling algorithms proposed in this paper aim to optimize scheduling strategies, thereby further reducing energy consumption and shortening completion times while meeting reliability requirements, offering more efficient solutions for the application of heterogeneous distributed systems.

3. Models

In this section, we discuss the mathematical models. The reader can refer to Table 1 for the notation and list of definitions used in this work.

3.1. System Model

We conceptualize heterogeneous distributed systems as a model comprising a set of heterogeneous processors $U = \{u_1, u_2, \ldots, u_m\}$, where $m$ denotes the number of processors. An application is represented by a Directed Acyclic Graph (DAG), designated as $G = (N, E)$. Here, $N$ signifies the set of tasks, and $E$ represents the collection of directed edges within the DAG. Each edge $e_{i,j} \in E$ illustrates the dependency between tasks $t_i$ and $t_j$, indicating that $t_i$ is a prerequisite task for $t_j$, with the latter commencing only upon the completion of the former. Additionally, each edge $e_{i,j}$ carries a weight $c_{i,j}$, denoting the communication time between tasks $t_i$ and $t_j$. If the two tasks are assigned to the same processor, their intercommunication time is considered to be zero. The execution times of the tasks are represented by the matrix $W$, where $\omega_{i,j}$ denotes the execution time of task $t_i$ on processor $u_j$. The average execution cost of task $t_i$ can be defined as
$\bar{\omega}_i = \frac{\sum_{j=1}^{m} \omega_{i,j}}{m}$
Through this approach, we can effectively evaluate the execution times of tasks, thereby optimizing scheduling strategies to enhance the overall performance of the system.
Figure 1 illustrates an example of a Directed Acyclic Graph (DAG) with ten tasks. In this DAG, tasks $t_2, t_3, t_4, t_5$, and $t_6$ can only commence after the completion of task $t_1$. Table 2 provides the execution times for this example when operated at the maximum frequency across three processors. Specifically, the value "14" in the table denotes that the execution time of task $t_1$ on processor $u_1$ is 14 units. Table 3 lists all the processor parameters, such as the frequency-independent power $P_{k,ind}$; these will be explained in more detail in subsequent sections.
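To make the system model concrete, the following minimal Python sketch encodes a fragment of such a DAG together with an execution-time table and evaluates the average execution cost defined above. Apart from the value 14 for task $t_1$ on $u_1$ quoted in the text, all numbers and the names edges and omega are placeholders of ours, not the actual entries of Table 2.

# Hypothetical fragment of the application model: dependency edges carry
# communication costs c_{i,j}; omega[i][k] is the execution time of task t_i
# on processor u_k at maximum frequency.
edges = {(1, 2): 18, (1, 3): 12, (1, 4): 9, (1, 5): 11, (1, 6): 14}
omega = {1: [14, 16, 9], 2: [13, 19, 18], 3: [11, 13, 19]}

def avg_exec_time(task_id):
    # Average execution cost of a task over the m heterogeneous processors
    row = omega[task_id]
    return sum(row) / len(row)

print(avg_exec_time(1))  # 13.0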

3.2. Energy Model

In this study, we employed an advanced system-level power model, which has been widely applied in various contexts. Within the framework of Dynamic Voltage and Frequency Scaling (DVFS) technology, voltage and frequency are interrelated in a nearly linear relationship. Consequently, adjustments in frequency under DVFS inherently necessitate corresponding alterations in voltage. The system power calculation model utilized in this research is articulated as follows:
$P(f) = P_s + h(P_{ind} + P_d) = P_s + h\left(P_{ind} + C_{k,ef} f^{m_k}\right)$
In this model, $P_s$ represents the static power consumption, which is present except when the system is in an idle state. $P_{ind}$ denotes the frequency-independent power consumption, while $P_d$ corresponds to the dynamic power consumption that varies with frequency. The parameter $h$ symbolizes the system state ($h = 1$ when the system is active), and $C_{k,ef}$ represents the effective capacitance of processor $u_k$.
Given that dynamic power consumption constitutes the primary component within the system ( h = 1 ) and managing static power consumption poses challenges, this study opts to exclude static energy consumption from consideration. Therefore, the power consumption formula utilized in this research simplifies to the following expression:
$P(f) = P_{ind} + C_{k,ef} f^{m_k}$
Here, $m_k$ represents a characteristic constant of processor $u_k$ itself. At the same time, due to the heterogeneity of the processors, each processor has its own set of operational parameters. We define the following parameter sets to accommodate these variations:
The set of $P_{ind}$: $\{P_{1,ind}, P_{2,ind}, \ldots, P_{|U|,ind}\}$;
The set of $P_d$: $\{P_{1,d}, P_{2,d}, \ldots, P_{|U|,d}\}$;
The set of $C_{ef}$: $\{C_{1,ef}, C_{2,ef}, \ldots, C_{|U|,ef}\}$;
The set of $m$: $\{m_1, m_2, \ldots, m_{|U|}\}$.
The actual effective frequency sets for the processors are defined as follows:
$\{f_{1,low}, f_{1,\alpha}, \ldots, f_{1,max}\},\ \{f_{2,low}, f_{2,\alpha}, \ldots, f_{2,max}\},\ \ldots,\ \{f_{|U|,low}, f_{|U|,\alpha}, \ldots, f_{|U|,max}\}$
Based on the definitions of the parameters provided, the execution time of task $t_i$ on processor $u_k$ at frequency $f_{k,h}$ can be calculated using the following formula:
$\omega_{i,k,h} = \omega_{i,k} \times \frac{f_{k,max}}{f_{k,h}}$
Here, $f_{k,max}$ represents the maximum frequency of processor $u_k$.
Therefore, we can compute the energy consumption of task $t_i$ on processor $u_k$ operating at frequency $f_i$ as follows:
$E(t_i, u_k, f_i) = P_{k,h} \times W_i \times \frac{1}{f_i} = \left(P_{k,ind} + C_{k,ef} f_i^{m_k}\right) \times \frac{W_i}{f_i}$
The total energy consumption of application G   can be represented as follows:
$E(G) = \sum_{i=1}^{|N|} E(t_i, u_k, f_{k,h})$
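As a quick illustration of the energy model, the following minimal Python sketch evaluates the per-task energy $(P_{ind} + C_{ef} f^{m}) \cdot W_i / f$ and sums it over a task set. The numeric values in the example call are hypothetical and only meant to show the calculation.

def task_energy(w_i, f_i, p_ind, c_ef, m_k):
    # Execution time scales as W_i / f_i; power is P_ind + C_ef * f^m
    exec_time = w_i / f_i
    power = p_ind + c_ef * (f_i ** m_k)
    return power * exec_time

def dag_energy(tasks):
    # Total energy of a DAG: sum over (W_i, f_i, P_ind, C_ef, m_k) tuples
    return sum(task_energy(*t) for t in tasks)

print(dag_energy([(14.0, 1.0, 0.6, 1.0, 2.8), (13.0, 0.7, 0.5, 0.9, 2.9)]))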

3.3. Reliability Model

There are two primary types of faults within computing systems: transient faults (also known as random hardware faults) and permanent faults. Once a permanent fault occurs, the affected processor cannot recover autonomously and must be replaced. Transient faults, on the other hand, appear and last only briefly, disappearing without causing lasting damage to the processor. Therefore, this research primarily focuses on transient faults. Typically, in applications based on Directed Acyclic Graphs (DAGs), the occurrence of transient faults in tasks follows a Poisson distribution. The reliability of an event within time interval t can be expressed as
$R(t) = e^{-\lambda t}$
The symbol λ represents the total fault rate per time unit for a processor. We use λ k to denote the constant fault rate per time unit on processor u k . Thus, the fault rate can be expressed as
$\lambda_k(f_i) = \lambda_F \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}}$
where λ F is the average number of faults per second at the maximum frequency.
For the task t i executed on processor u k , the reliability during its execution time is expressed as
$R(t_i, u_k, f_i) = e^{-\lambda_k(f_i)\,\omega_{i,k}} = e^{-\lambda_k(f_i)\frac{W_i}{f_i}} = e^{-\lambda_F \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}} \cdot \frac{W_i}{f_i}}$
In the formula, W i represents the execution time of task t i at the maximum frequency. From this, the failure probability of task t i can be expressed as
$1 - R(t_i, u_k) = 1 - e^{-\lambda_k \omega_{i,k}}$
So, the reliability of a parallel application with priority constraints is given by
$R(G) = \prod_{t_i \in N} R(t_i)$
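The reliability chain above is short enough to check numerically. The minimal Python sketch below evaluates the frequency-dependent fault rate, the per-task reliability, and their product over a DAG; the parameter values in the example call are hypothetical, merely drawn from the ranges later listed in Section 6.2.

import math

def fault_rate(f, lambda_f, d, f_min):
    # lambda_k(f) = lambda_F * 10^(d * (1 - f) / (1 - f_min))
    return lambda_f * 10 ** (d * (1.0 - f) / (1.0 - f_min))

def task_reliability(w_i, f, lambda_f, d, f_min):
    # R(t_i) = exp(-lambda_k(f) * W_i / f), with W_i the execution time at f_max
    return math.exp(-fault_rate(f, lambda_f, d, f_min) * w_i / f)

def dag_reliability(task_params):
    # R(G) is the product of the individual task reliabilities
    r = 1.0
    for w_i, f, lambda_f, d, f_min in task_params:
        r *= task_reliability(w_i, f, lambda_f, d, f_min)
    return r

print(dag_reliability([(14.0, 1.0, 5e-6, 2, 0.3), (13.0, 0.7, 5e-6, 2, 0.3)]))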

3.4. Problem Description

This research focuses on optimizing task allocation and frequency levels within heterogeneous distributed systems to minimize both the makespan and energy consumption. Using the model described above, we can formulate our problem as follows: given a Directed Acyclic Graph $G = (N, E)$ and a set of processors $U = \{u_1, u_2, \ldots, u_m\}$ that support Dynamic Voltage and Frequency Scaling (DVFS) technology, our objective is to find a scheduling algorithm that minimizes both the makespan $Makespan(G)$ and the energy consumption $E(G)$ while also meeting the reliability requirement $R(G) \geq R_{req}(G)$.
$\text{Minimize: } Makespan(G),\ E(G)$
$\text{subject to: } R(G) \geq R_{req}(G)$

4. Energy-Aware Weighted Scheduling Algorithm with Reliability Constraints

In this section, we introduce the energy-aware weighted scheduling algorithm with reliability constraints. The algorithm is structured into two main phases: the task priority ordering phase and the processor allocation phase.

4.1. The Task Priority Ordering Phase

In task scheduling for heterogeneous distributed systems, the first step involves determining the execution order of the tasks. For this purpose, we employ a widely used method, specifically the HEFT (heterogeneous earliest finish time) algorithm. This algorithm utilizes the calculation of the upward rank value ( r a n k u p ), which is computed as follows:
$rank_{up}(t_i) = \begin{cases} \bar{\omega}_i, & \text{if } t_i = t_{exit} \\ \bar{\omega}_i + \max_{t_j \in succ(t_i)} \left( \bar{c}_{i,j} + rank_{up}(t_j) \right), & \text{if } t_i \neq t_{exit} \end{cases}$
In the formula, $t_{exit}$ represents the exit task, and $succ(t_i)$ is the set of direct successors of task $t_i$. The term $\bar{c}_{i,j}$ refers to the average communication cost between tasks $t_i$ and $t_j$. After calculating the upward rank values ($rank_{up}$), they are sorted in descending order. A higher $rank_{up}$ value signifies a higher priority for the task.
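For concreteness, a minimal Python sketch of the upward-rank recursion is given below. It assumes the DAG is supplied as a successor map succ, average execution costs avg_w, and average communication costs avg_c; these names and the tiny example graph are ours, not the paper's.

def upward_ranks(succ, avg_w, avg_c):
    # rank_up(t_i) = avg_w[i] + max over successors j of (avg_c[(i, j)] + rank_up(j));
    # exit tasks (no successors) receive rank_up = avg_w[i]
    ranks = {}

    def rank(i):
        if i not in ranks:
            ranks[i] = avg_w[i] + max(
                (avg_c[(i, j)] + rank(j) for j in succ.get(i, [])),
                default=0.0,
            )
        return ranks[i]

    for i in avg_w:
        rank(i)
    return ranks

# Tiny example: t1 precedes t2 and t3, which are both exit tasks
succ = {1: [2, 3]}
avg_w = {1: 13.0, 2: 14.3, 3: 11.0}
avg_c = {(1, 2): 18.0, (1, 3): 12.0}
print(sorted(upward_ranks(succ, avg_w, avg_c).items(), key=lambda kv: -kv[1]))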

4.2. The Processor Allocation Phase

Considering task reliability, energy consumption, and completion times, this paper introduces a new method for allocating processors. The complexity of scheduling in a Directed Acyclic Graph (DAG) primarily arises from the dependencies among its nodes, where the in-degree and out-degree of node tasks and their execution times are crucial to the scheduling outcomes. While the HEFT algorithm uses the upward rank values ($rank_{up}$) to guide scheduling decisions, it does not account for these other influential factors, thereby potentially lengthening the DAG's makespan.
Here, we have designed a metric to evaluate the criticality of the current node, referred to as the index of execution ( I O E ). The design of this metric is as follows:
$IOE(t_i) = \frac{ID(t_i) \times \bar{\omega}_i + OD(t_i) \times \bar{\omega}_i}{2}$
In this formulation, $ID(t_i)$ represents the in-degree of task $t_i$, and $OD(t_i)$ represents its out-degree. The index of execution ($IOE$) employs a weighted approach to holistically consider the dependencies before and after a task, as well as the impact of execution times on scheduling outcomes. This optimization explicitly prioritizes high-degree DAG nodes on the critical path and directly adjusts task allocation based on the topological characteristics of the graph. The main steps of the energy-aware weighted scheduling algorithm under reliability constraints (EAWRS), presented in Algorithm 1, are as follows:
Algorithm 1: EAWRS
Input: $G = (N, E)$, $R_{given}$
Output: $E(G)$, $Makespan(G)$, $R(G)$
1: Calculate $rank_{up}$ of all tasks
2: Calculate $IOE(t_i)$ according to Equation (14)
3: Push $rank_{up}$ values into queue $urv$
4: Push $IOE$ values into queue $VQ$ in decreasing order
5: for (∀ $i$, $t_i \in urv$) do
6:   if $t_i \in [VQ(0), VQ(l))$ then
7:     Assign $t_i$ to the processor satisfying:
8:     $u_k \leftarrow \min_{1 \le k \le m} \left[ \alpha \cdot \frac{E_{max}(t_i) - E(t_i, u_k)}{E_{max}(t_i) - E_{min}(t_i)} + (1 - \alpha) \cdot \frac{Ft(t_i, u_k) - Ft_{min}(t_i)}{Ft_{max}(t_i) - Ft_{min}(t_i)} \right]$
9:   else
10:    Assign $t_i$ to the processor satisfying:
11:    $u_k \leftarrow \min_{1 \le k \le m} E(t_i, u_k)$
12:  end if
13: end for
14: return $E(G)$, $Makespan(G)$, $R(G)$
  • Calculate the $IOE$ value for each task.
  • Place the calculated $IOE$ values in queue $VQ$ in decreasing order.
  • If task $t_i$ falls within the range $[VQ(0), VQ(l))$, it has a high criticality level and is assigned to the processor $u_k$ with
    $\min_{1 \le k \le m} \left[ \alpha \cdot \frac{E_{max}(t_i) - E(t_i, u_k)}{E_{max}(t_i) - E_{min}(t_i)} + (1 - \alpha) \cdot \frac{Ft(t_i, u_k) - Ft_{min}(t_i)}{Ft_{max}(t_i) - Ft_{min}(t_i)} \right]$
    where $Ft(t_i, u_k)$ represents the completion time of task $t_i$ on processor $u_k$.
  • If task $t_i$ falls within the range $[VQ(l+1), VQ(n-1)]$, it has a lower criticality level and is assigned to the processor $u_k$ with $\min_{1 \le k \le m} E(t_i, u_k)$.
  • Calculate the upward rank values ($rank_{up}$) for each task and assign tasks to the appropriate processors according to Formulas (14) and (15). The final result is a scheduling solution.
Here, the parameter $\alpha$ ($0 \le \alpha \le 1$) is used to balance completion time and energy consumption. When $l = n$ and $\alpha = 0$, we obtain the maximum-reliability allocation. We vary $\alpha$ from 0.0 to 1.0 with a step size of 0.1 so as to gradually find a near-optimal solution. The time complexity of EAWRS is $O(n m^3 L_\alpha)$, where $L_\alpha$ is related to the considered frequency range and represents the accuracy of the reliability constraints. $L_\alpha$ is the number of iterations over $\alpha$, and the number of iterations over $l$ equals the number of subtasks $n$.
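To make the allocation rule concrete, the following minimal Python sketch implements only the processor-selection step for one ready task: the weighted normalized score for critical tasks and the energy-only rule for non-critical ones. The dictionaries energy and finish, holding $E(t_i, u_k)$ and $Ft(t_i, u_k)$ for each candidate processor, and the critical flag are assumed inputs produced by the rest of EAWRS, which is omitted here.

def select_processor(candidates, energy, finish, alpha, critical):
    # candidates: processor ids; energy[k] = E(t_i, u_k); finish[k] = Ft(t_i, u_k)
    if not critical:
        # Tasks outside [VQ(0), VQ(l)) simply minimize energy
        return min(candidates, key=lambda k: energy[k])
    e_min, e_max = min(energy.values()), max(energy.values())
    ft_min, ft_max = min(finish.values()), max(finish.values())
    e_rng = (e_max - e_min) or 1.0   # guard against division by zero
    ft_rng = (ft_max - ft_min) or 1.0

    def score(k):
        # Normalized weighted score; alpha trades energy against finish time
        return (alpha * (e_max - energy[k]) / e_rng
                + (1.0 - alpha) * (finish[k] - ft_min) / ft_rng)

    return min(candidates, key=score)

# Hypothetical values for one task on three processors
energy = {1: 40.0, 2: 55.0, 3: 35.0}
finish = {1: 120.0, 2: 90.0, 3: 140.0}
print(select_processor([1, 2, 3], energy, finish, alpha=0.5, critical=True))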

5. Search-Based Optimal Energy Frequency Adjustment Algorithm

After selecting suitable processors for each task using the EAWRS in the previous section, the next step is to find the optimal frequency that satisfies the reliability constraint while minimizing energy consumption, thereby achieving further optimization. In this subsection, we introduce a frequency search algorithm, SEFFA (search-based optimal energy frequency adjustment algorithm), which aims to minimize energy consumption under reliability constraints. Utilizing the EAWRS (energy-aware weighted scheduling algorithm with reliability constraints), we obtain an initial processor allocation and scheduling result. We then adjust the frequencies to further reduce energy consumption. Given that the energy-optimal frequency may differ across heterogeneous processors, our objective now becomes
$\text{Minimize: } E(G)$
$\text{subject to: } R(G) \geq R_{req}(G)$
In this context, we can employ the Karush–Kuhn–Tucker (KKT) conditions to solve the problem. Therefore, we can construct a Lagrangian function as follows:
$L(f, \sigma) = E(G, f) + \sigma \left( R_{req}(G) - R(G, f) \right)$
In this formulation, $f = (f_1, f_2, \ldots, f_n)$ represents the vector of task frequencies; $E(G, f)$ denotes the total energy consumption of the DAG, expressed as $E(G, f) = \sum_{i=1}^{n} E(t_i, f_i)$, and $R(G, f)$ indicates the reliability of the DAG, given by $R(G, f) = \prod_{i=1}^{n} R(t_i, f_i)$.
Next, we take the derivative with respect to $f_i$:
$\frac{\partial L(f, \sigma)}{\partial f_i} = \frac{\partial E(G, f)}{\partial f_i} - \sigma \frac{\partial R(G, f)}{\partial f_i}$
The energy function E G , f is the sum of the energy consumption of each subtask, while R G , f is the product of the reliability of each task.
Next, the derivative of the energy function E t i , u k , f i with respect to f i is
$\frac{\partial E(t_i, u_k, f_i)}{\partial f_i} = \frac{\partial}{\partial f_i}\left[ W_i \left( \frac{P_{ind}}{f_i} + C_{k,ef} f_i^{m_k - 1} \right) \right] = W_i \left[ (m_k - 1)\, C_{k,ef}\, f_i^{m_k - 2} - \frac{P_{ind}}{f_i^2} \right]$
The derivative of the reliability function R t i , u k , f i with respect to f i is
$\frac{\partial R(t_i, u_k, f_i)}{\partial f_i} = -R(t_i, u_k, f_i) \cdot \frac{\partial}{\partial f_i}\left( \lambda_k(f_i) \frac{W_i}{f_i} \right)$
where $\lambda_k(f_i) = \lambda_F \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}}$; its derivative is calculated as follows:
$\frac{\partial \lambda_k(f_i)}{\partial f_i} = -\lambda_F \ln 10 \cdot \frac{d}{1 - f_{min}} \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}}$
So:
$\frac{\partial R(t_i, u_k, f_i)}{\partial f_i} = R(t_i, u_k, f_i)\, W_i \left( \frac{\lambda_F \ln 10 \cdot d}{(1 - f_{min})\, f_i} \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}} + \frac{\lambda_k(f_i)}{f_i^2} \right)$
The derivative of total reliability is
$\frac{\partial R(G, f)}{\partial f_i} = \left( \prod_{j=1,\, j \neq i}^{n} R(t_j, u_k, f_j) \right) \frac{\partial R(t_i, u_k, f_i)}{\partial f_i}$
For $f_i$:
$\frac{\partial R(G, f)}{\partial f_i} = R(G, f)\, W_i \left( \frac{\lambda_F \ln 10 \cdot d}{(1 - f_{min})\, f_i} \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}} + \frac{\lambda_k(f_i)}{f_i^2} \right)$
Simultaneously solving the equations by substituting the derivatives of the energy function and the reliability function into the KKT conditions yields the following:
$W_i \left[ (m_k - 1)\, C_{k,ef}\, f_i^{m_k - 2} - \frac{P_{ind}}{f_i^2} \right] = \sigma\, R(G, f)\, W_i \left( \frac{\lambda_F \ln 10 \cdot d}{(1 - f_{min})\, f_i} \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}} + \frac{\lambda_k(f_i)}{f_i^2} \right)$
$(m_k - 1)\, C_{k,ef}\, f_i^{m_k - 2} - \frac{P_{ind}}{f_i^2} = \sigma\, R(G, f) \left( \frac{\lambda_F \ln 10 \cdot d}{(1 - f_{min})\, f_i} \cdot 10^{\frac{d(1 - f_i)}{1 - f_{min}}} + \frac{\lambda_k(f_i)}{f_i^2} \right)$
As seen from Equation (26), the left-hand side is an increasing function of $f_i$, while the right-hand side is a decreasing function of $f_i$. Therefore, within the domain of $f_i$ there exists only one intersection point, i.e., a unique solution, which demonstrates the existence of a unique optimal frequency $f_i$. Owing to the monotonicity of the energy and reliability functions with respect to frequency, the KKT condition indicates the existence of a unique global optimum. Although directly solving the KKT system is computationally expensive, we use a Fibonacci search as a numerically efficient method for locating the minimum of the energy function under the reliability constraint. The concave–convex structure of this problem ensures that the Fibonacci search converges, to a predefined precision, to the same solution as the KKT-based optimum.
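Before presenting the full algorithm, the inner step can be made concrete: for a fixed multiplier σ, Equation (26) pits an increasing left-hand side against a decreasing right-hand side, so the crossing can be located by simple bisection. The Python sketch below does exactly that for one processor, with hypothetical parameter values drawn from the ranges in Section 6.2 and with $R(G, f)$ held at an assumed constant of 0.95 purely for illustration.

import math

# Hypothetical per-processor constants within the ranges of Section 6.2
P_IND, C_EF, M_K = 0.6, 1.0, 2.9
LAMBDA_F, D, F_MIN = 5e-6, 2, 0.3

def lhs(f):
    # Left-hand side of Equation (26): increasing in f
    return (M_K - 1) * C_EF * f ** (M_K - 2) - P_IND / f ** 2

def rhs(f, sigma, r_g=0.95):
    # Right-hand side of Equation (26): decreasing in f (R(G, f) approximated as constant)
    ten_pow = 10 ** (D * (1 - f) / (1 - F_MIN))
    lam = LAMBDA_F * ten_pow
    term = LAMBDA_F * math.log(10) * D / ((1 - F_MIN) * f) * ten_pow
    return sigma * r_g * (term + lam / f ** 2)

def solve_frequency(sigma, lo=F_MIN, hi=1.0, eps=1e-6):
    # Bisection on the unique crossing of lhs and rhs within [f_min, f_max]
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if lhs(mid) < rhs(mid, sigma):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(solve_frequency(sigma=1e4))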
In this study, we have designed a simple Fibonacci search algorithm to find the set of frequencies for f i that satisfies R G , f = R g i v e n , thereby identifying the optimal frequency. As shown in Algorithm 2, the steps are as follows:
Algorithm 2: SEFFA
Input: $G = (N, E)$, $U$, $R_{given}$
Output: $f_1, f_2, \ldots, f_n$
1: Initialize a Fibonacci sequence $F_n$ of sufficient length
2: Compute the upper and lower bounds $ub$ and $lb$ of the search region for $\sigma$
3: while $ub - lb > \varepsilon$ do
4:   $n$ = length of $F_n$
5:   Compute $\sigma_1$ and $\sigma_2$ according to Equations (27) and (28)
6:   for each processor $u_k$ do
7:     $f_{lb} = f_{k,min}$, $f_{ub} = f_{k,max}$
8:     while $f_{ub} - f_{lb} > \varepsilon$ do
9:       $f = (f_{lb} + f_{ub})/2$
10:      if the left-hand side of Equation (26) at $f$ is smaller than its right-hand side then
11:        $f_{lb} = f$
12:      else
13:        $f_{ub} = f$
14:      end if
15:    end while
16:    for each task $t_i$ allocated to processor $u_k$ do
17:      $f_i = f$
18:    end for
19:  end for
20:  Compute $R(G, f)$
21:  if $R(G, f) < R_{given}$ then
22:    $lb = \sigma_1$
23:  else
24:    $ub = \sigma_2$
25:  end if
26:  Update $F_n$ for the next iteration
27: end while
28: return $f_1, f_2, \ldots, f_n$
Generating a Fibonacci Sequence: Generate a sufficiently long Fibonacci sequence $F_n$ to determine the search interval. The sequence is defined as $F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-1} + F_{n-2}$. Choose an appropriate $n$ to ensure that the length of the sequence meets the needs of the search interval.
Initializing the Search Interval: Set the initial search interval as [a, b], covering the possible range of values for the multiplier σ. Typically, a = 0, while b is a sufficiently large value.
Fibonacci search steps: Use the Fibonacci sequence to progressively narrow the search interval until the desired precision is achieved. First, two initial points are calculated; then the corresponding frequency $f_i$ is calculated for each σ value; total reliability is then calculated and compared with the reliability requirement, and the iteration stops when the requirement is met. The algorithm returns the optimal frequency setting for each task. In Step 3, the two initial points are calculated as follows:
$\sigma_1 = a + \frac{F_{n-2}}{F_n}(b - a)$
$\sigma_2 = a + \frac{F_{n-1}}{F_n}(b - a)$
The complexity of SEFFA is $O(mn \log L)$, and the overall scheduling process is shown in Figure 2.
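The interval-narrowing mechanics of the Fibonacci search can also be illustrated in isolation. The minimal Python sketch below minimizes a generic unimodal scalar function g over [a, b] using interior points of the form given in Equations (27) and (28); coupling this search over σ with the per-processor frequency solve of Algorithm 2 is omitted, and the helper name fibonacci_search is our own.

def fibonacci_search(g, a, b, tol=1e-4):
    # Minimize a unimodal function g on [a, b]; interior points follow Eqs. (27)-(28)
    fib = [1, 1]
    while len(fib) < 4 or fib[-1] < (b - a) / tol:
        fib.append(fib[-1] + fib[-2])
    k = len(fib) - 1
    x1 = a + fib[k - 2] / fib[k] * (b - a)
    x2 = a + fib[k - 1] / fib[k] * (b - a)
    g1, g2 = g(x1), g(x2)
    while k > 2:
        k -= 1
        if g1 > g2:                      # minimum lies in [x1, b]
            a, x1, g1 = x1, x2, g2
            x2 = a + fib[k - 1] / fib[k] * (b - a)
            g2 = g(x2)
        else:                            # minimum lies in [a, x2]
            b, x2, g2 = x2, x1, g1
            x1 = a + fib[k - 2] / fib[k] * (b - a)
            g1 = g(x1)
    return 0.5 * (a + b)

# Toy usage: the minimizer of (sigma - 2)^2 on [0, 10] is recovered to within tol
print(fibonacci_search(lambda s: (s - 2.0) ** 2, 0.0, 10.0))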

6. Experiments

In this section, we integrate the proposed EAWRS (energy-aware weighted scheduling algorithm with reliability constraints) and the SEFFA (search-based energy frequency adjustment algorithm) to address the scheduling of Directed Acyclic Graphs (DAGs) on heterogeneous distributed systems. Simultaneously, we consider constraints on reliability, as well as objectives related to energy consumption and makespan under these constraints. This subsection will evaluate the performance from various perspectives, including the diversity of DAG applications.

6.1. Comparative Algorithms

We compare our EAWRS + SEFFA approach with the state-of-the-art methods ODS + SOEA, MR + SEFFA, and HEFT + SEFFA. Since MR and HEFT focus solely on task allocation, we employ the SEFFA to assist in determining processor frequencies for a fairer comparison.

6.2. Experimental Platform and DAG Applications

The probability of task failures in a real system is very small and difficult to control, so, similar to most existing studies, we ran simulations. These experiments were carried out on a laptop with 16 GB of RAM and a 12th Gen Intel(R) Core(TM) i7-12650H processor (Intel Corporation, Santa Clara, CA, USA) using Python version 3.9.12. The processor parameters were randomly set within the following ranges: $P_{ind} \in [0.4, 0.8]$, $C_{k,ef} \in [0.8, 1.3]$, $f \in [0.3, 1.0]$, $m_k \in [2.7, 3.0]$, $\lambda_F \in [0.1 \times 10^{-5}, 1.0 \times 10^{-5}]$, and $d \in [1, 3]$. These ranges reflect the characteristics of real-world processors, such as the Intel Mobile Pentium III (Intel Corporation, Santa Clara, CA, USA) and ARM Cortex-A9 (ARM Holdings, Cambridge, UK). The purpose is to ensure that the experimental environment can simulate the diversity of real heterogeneous computing platforms. It is important to note that this work does not consider redundancy that can be deployed to achieve extremely high reliability. For actual deployment on other systems, these parameters can be adjusted based on the specific hardware configuration and reliability requirements of the target platform while preserving the core scheduling logic of our algorithm. The adjustments include calibrating processor-specific parameters and fine-tuning frequency adjustment thresholds to align with the actual DVFS capabilities of the deployed system.
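For reproducibility, a minimal sketch of how such a heterogeneous processor set can be sampled is given below. The dictionary keys and the fixed normalized frequency range [0.3, 1.0] are our assumptions about how the ranges above are instantiated; they are not the exact generator used in the experiments.

import random

def random_processor(rng=random):
    # Draw one processor's parameters from the ranges listed above (Section 6.2)
    return {
        "P_ind": rng.uniform(0.4, 0.8),
        "C_ef": rng.uniform(0.8, 1.3),
        "m": rng.uniform(2.7, 3.0),
        "lambda_F": rng.uniform(0.1e-5, 1.0e-5),
        "d": rng.uniform(1, 3),
        "f_min": 0.3,   # assumed normalized frequency range [0.3, 1.0]
        "f_max": 1.0,
    }

processors = [random_processor() for _ in range(3)]
print(processors[0])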
We evaluated the performance under three different DAG applications: fast Fourier transform (FFT), Gaussian Elimination (GE), and random DAGs. Each has distinct characteristics:
Fast Fourier Transform (FFT): This is an efficient algorithm for computing discrete Fourier transforms, significantly reducing the number of multiplications required. FFT applications exhibit high levels of parallelism. An example of a parallel FFT application DAG for ρ = 4 is shown in Figure 3. The parameter ρ is used to denote the computational scale of the FFT and determines the number of FFT tasks $N$, where $\rho = 2^x$ and $x$ is an integer. The calculation formula is as follows:
$N = 2\rho - 1 + \rho \log_2 \rho = (2 + \log_2 \rho)\,\rho - 1$
Gaussian Elimination (GE) is primarily used for solving systems of linear equations and can also be used to compute the rank of a matrix or its inverse, making it an essential algorithm in linear algebra. The method applies a series of row transformations to convert a matrix into an upper triangular form, followed by back substitution to obtain particular solutions or a set of solutions. GE has a lower degree of parallelism than FFT. An illustration of the Gaussian Elimination DAG for ρ = 5 is shown in Figure 4. The parameter ρ indicates the computational scale of GE, and the number of GE tasks is given by $N = \frac{\rho^2 + \rho - 2}{2}$, where ρ is an integer.
For the random task sets, DAG task graphs are generated randomly subject to constraints within specified parameter ranges. These task sets exhibit random parallelism, and their computational and communication costs are also determined randomly.
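Assuming the task-count formulas as reconstructed above, the application scales used later in Experiments 1 and 3 can be reproduced with a few lines of Python:

import math

def fft_task_count(rho):
    # N = 2*rho - 1 + rho*log2(rho), with rho a power of two
    return 2 * rho - 1 + rho * int(math.log2(rho))

def ge_task_count(rho):
    # N = (rho^2 + rho - 2) / 2
    return (rho * rho + rho - 2) // 2

print([fft_task_count(2 ** x) for x in (3, 4, 5, 6)])  # [39, 95, 223, 511]
print([ge_task_count(r) for r in (12, 16, 20, 32)])    # [77, 135, 209, 527]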

6.3. Evaluation Metrics

To ensure the fairness of the comparison, we designed the following evaluation scenario: reliability is treated as a hard constraint for all four methods, and diverse scenarios are constructed by adjusting its value. The three competing algorithms are executed under this hard reliability constraint to pursue optimal energy consumption and makespan. Subsequently, we compare their results, in terms of makespan and energy consumption, with the best solution in the solution set generated by the EAWRS + SEFFA method.

6.4. Results and Analysis

This section presents the experiments and result analysis based on real parallel applications. The MR algorithm, the ODS algorithm, and the HEFT algorithm are selected as benchmark algorithms for evaluating scheduling performance. Since HEFT and MR only perform task allocation, the SEFFA is employed to determine processor operating frequencies. The application data used in the experiments consist of real parallel applications, including parallel implementations of FFT and GE. Five experiments were conducted. Experiments 1 and 3 evaluate scalability by changing the number of tasks in the FFT and GE applications, respectively, while Experiments 2 and 4 assess the trade-offs among energy, completion time, and reliability under fixed workloads but varying constraints. Experiment 5 uses unpredictable parallel random DAGs to test generalization ability. These experiments collectively verify the adaptability of EAWRS + SEFFA to different workload sizes (Experiments 1 and 3), reliability-critical scenarios (Experiments 2 and 4), and unstructured workflows (Experiment 5).
Experiment 1: Analysis of energy consumption and makespan under different task numbers in the FFT application: In the DAG application generated by FFT, the EAWRS + SEFFA method is compared with three existing methods: HEFT + SEFFA, MR + SEFFA, and ODS + SOEA. The number of subtasks in the FFT application is controlled by the parameter ρ, with values of 3, 4, 5, and 6, corresponding to 39, 95, 223, and 511 subtasks, respectively. As shown in Figure 5, an increase in the number of tasks leads to a rise in makespan for all algorithms. The EAWRS + SEFFA method achieves competitive makespan values across all scenarios. Specifically, when the number of tasks is 39, 95, 223, and 511, the makespan values for EAWRS + SEFFA are 514.81, 708.19, 867.55, and 1158.84, respectively. On average, it reduces the makespan by 2.58% compared to HEFT + SEFFA, by 56.44% compared to MR + SEFFA, and by 4.76% compared to ODS + SOEA.
Figure 6 illustrates the energy consumption of the FFT application under different numbers of tasks. The EAWRS + SEFFA method demonstrates outstanding energy efficiency across all task scales. When the number of tasks increases from 39 to 511, the energy consumption of EAWRS + SEFFA is 960.75, 3066.83, 6414.95, and 14,134.37, respectively. Compared with HEFT + SEFFA, the algorithm achieves an average energy reduction of 14.56%; compared with MR + SEFFA, it reduces energy consumption by an average of 8.47%; and compared with the energy-optimized ODS + SOEA method, it still achieves an average reduction of 2.92%. These results collectively demonstrate the significant energy efficiency advantage of EAWRS + SEFFA under dynamic task loads.
Taken together, in the FFT application, the EAWRS + SEFFA method demonstrates clear advantages in energy consumption control compared to the other three algorithms as the number of tasks varies.
Experiment 2: Energy consumption and makespan under different reliability constraints in FFT applications: According to Figure 7, the makespan performance of each algorithm under different reliability constraints is examined in the FFT application. The EAWRS + SEFFA method exhibits complex yet competitive performance. When the reliability constraint varies between 0.9 and 0.99, at certain values such as 0.93, the makespan of EAWRS + SEFFA is 665.64, compared with 536.61 for HEFT + SEFFA. Compared to MR + SEFFA, the advantage of EAWRS + SEFFA is far more pronounced: across the entire range of reliability constraints, the makespan of EAWRS + SEFFA is significantly lower than that of MR + SEFFA. For example, with a reliability constraint of 0.95, the makespan of MR + SEFFA reaches 2657.56, while that of EAWRS + SEFFA is only 667.67, demonstrating remarkable efficiency. Compared to ODS + SOEA, EAWRS + SEFFA performs similarly in most cases but slightly better in some instances. For example, at a reliability constraint of 0.93, EAWRS + SEFFA achieves a makespan of 665.64, which is lower than ODS + SOEA's 668.07. The results indicate that EAWRS + SEFFA delivers strong makespan performance under different reliability constraints, especially under high-reliability constraints, where its advantages become more prominent.
Figure 8 presents the energy consumption results under different reliability constraints. The EAWRS + SEFFA method achieves outstanding energy efficiency. Compared to HEFT + SEFFA, the average energy consumption level is reduced by approximately 38.38%. For instance, at a reliability constraint of 0.9, HEFT + SEFFA consumes 1495.76, whereas EAWRS + SEFFA only consumes 930.71, reducing energy consumption by 37.77%. Compared to MR + SEFFA, EAWRS + SEFFA achieves an average energy reduction of approximately 31.78%. At a reliability constraint of 0.97, EAWRS + SEFFA exhibits remarkable energy efficiency, with an energy reduction of up to 58.44%. Even when compared to the energy-efficient ODS + SOEA method, EAWRS + SEFFA still maintains a slight advantage, achieving an average energy reduction of approximately 2.91%.
Collectively, under different reliability constraints, the EAWRS + SEFFA method demonstrates significant advantages in energy consumption control compared to the other three algorithms, effectively reducing energy consumption and exhibiting superior energy-saving characteristics.
Experiment 3: Energy consumption and makespan under different numbers of tasks in GE applications: In GE-generated DAG applications, the proposed EAWRS + SEFFA method is compared with the same three existing methods. The number of tasks is controlled by the parameter ρ, which is set to 12, 16, 20, and 32, corresponding to subtask counts $N$ of 77, 135, 209, and 527, respectively. Each subtask's computation and communication costs are randomly generated within the range of [10, 100].
As shown in Figure 9, the makespan results of the GE application reveal the differentiated performance of the EAWRS + SEFFA method. Under lower task volumes (77 and 135 tasks), EAWRS + SEFFA achieves slightly shorter makespans than HEFT + SEFFA, with reductions of 0.58% and 3.57%, respectively. However, under higher task volumes (209 and 527 tasks), its makespan increases by 2.16% and 5.37% compared to HEFT + SEFFA, with the gap widening as the task volume grows. In contrast, EAWRS + SEFFA significantly outperforms MR + SEFFA across all task volumes, achieving an average makespan reduction of 39.03%, with a maximum reduction of 46.36% at 209 tasks. Compared to ODS + SOEA, EAWRS + SEFFA maintains a consistent advantage, with an average makespan reduction of 8.39%, though the improvement is less pronounced than that over MR + SEFFA.
Figure 10 presents the energy consumption results for the GE application, demonstrating the sustained energy efficiency advantage of the EAWRS + SEFFA method. Compared to HEFT + SEFFA, the algorithm achieves significant energy savings across all task volumes, notably reducing energy consumption by 32.75% at 135 tasks (from 5131.68 to 3450.83) and by 28.65% at 527 tasks (from 23,251.71 to 16,589.36), with an average energy reduction of 19.51%. Compared to MR + SEFFA, it achieves an average energy saving of 3.75%, with a maximum of 7.46% for 135 tasks. Even when compared with the energy-optimized ODS + SOEA method, EAWRS + SEFFA maintains an average energy advantage of 2.68%, with a typical case being a 0.86% reduction at 527 tasks (from 16,733.47 to 16,589.36).
To sum up, under different task counts in the GE application, EAWRS + SEFFA performs well in terms of both energy consumption and makespan.
Experiment 4: Energy consumption and makespan results under different reliability constraints in the GE application: Figure 11 presents the makespan performance of EAWRS + SEFFA in the GE application under different reliability constraints. The results indicate that compared to the MR algorithm, EAWRS + SEFFA demonstrates a significant advantage, achieving a consistently shorter makespan across various reliability constraint levels. This suggests that EAWRS + SEFFA is more efficient than MR + SEFFA under specific conditions. When compared to ODS + SOEA, the two algorithms exhibit similar performance, with EAWRS + SEFFA achieving slight improvements in certain scenarios. Overall, in the GE application, EAWRS + SEFFA demonstrates competitive makespan optimization under different reliability constraints, showing clear advantages over MR + SEFFA and maintaining comparable performance with ODS + SOEA.
Figure 12 illustrates the energy consumption performance of EAWRS + SEFFA under different reliability constraints, where the algorithm exhibits a clear advantage in energy efficiency. Compared to HEFT + SEFFA, EAWRS + SEFFA achieves an average energy reduction of approximately 42.99%. At various reliability levels, such as 0.91, HEFT + SEFFA consumes 3819.81, whereas EAWRS + SEFFA reduces energy consumption to 2015.81, a reduction of 47.23%, demonstrating its effectiveness in minimizing energy consumption when compared to HEFT. Against the MR algorithm, EAWRS + SEFFA continues to excel, with an average energy reduction of 32.46%. For example, at a reliability level of 0.93, the energy consumption of MR + SEFFA is 3783.66, while EAWRS + SEFFA reduces it to 2006.22, a reduction of 46.98%, further proving its energy-saving efficiency. Even when compared to ODS + SOEA, which is known for its energy efficiency, EAWRS + SEFFA still achieves a slight improvement, with an average energy reduction of 3.73%. For instance, at a reliability level of 0.94, ODS + SOEA consumes 1790.98, while EAWRS + SEFFA reduces it to 1724.24.
Through this experiment, it is evident that EAWRS + SEFFA demonstrates outstanding energy-saving performance across different reliability constraints in the GE application, making it highly competitive in energy efficiency.
Experiment 5: Experimental results and analysis based on random DAG task sets: This experiment was conducted using randomly generated DAG task graphs, with the number of tasks set to 100. The results indicate that as reliability constraints become more stringent, EAWRS + SEFFA effectively balances the makespan and energy consumption.
Figure 13 illustrates the makespan performance of EAWRS + SEFFA under different reliability constraints for random task graphs. Compared to HEFT, the makespan of EAWRS + SEFFA is, on average, 27.91% longer. In most reliability constraint conditions, HEFT + SEFFA achieves faster task completion. For example, at a reliability level of 0.94, the makespan of HEFT is 558.6, whereas EAWRS + SEFFA increases it to 791.31, an increase of 41.66%. However, when compared to MR, EAWRS + SEFFA shows a significant advantage, reducing the makespan by an average of 54.75%. At a reliability level of 0.99, MR + SEFFA results in a makespan of 3270.84, whereas EAWRS + SEFFA reduces it to 835.75, an impressive 74.45% reduction, demonstrating the effectiveness of EAWRS + SEFFA in handling high-reliability requirements. When compared to ODS + SOEA, the two algorithms exhibit highly similar performances, with only minor variations across different reliability constraints. Overall, while EAWRS + SEFFA lags behind HEFT in terms of makespan, it significantly outperforms MR + SEFFA and remains competitive against ODS + SOEA.
Figure 14 presents the energy consumption performance of EAWRS + SEFFA under different reliability constraints in random task graphs, highlighting its remarkable energy-saving capabilities. Compared to HEFT + SEFFA, EAWRS + SEFFA reduces energy consumption by an average of 35.56%. At various reliability levels, such as 0.91, HEFT + SEFFA consumes 3847.39, whereas EAWRS + SEFFA reduces it to 2303.04, achieving a 40.14% reduction. Against MR + SEFFA, EAWRS + SEFFA also demonstrated superior performance, reducing energy consumption by an average of 30.23%. For instance, at a reliability level of 0.92, MR + SEFFA consumes 4443.08, whereas EAWRS + SEFFA lowers it to 2483.09, achieving a 44.11% reduction and showcasing the algorithm’s ability to effectively save energy under varying reliability constraints. Even when compared to ODS + SOEA, while the reduction is relatively small, EAWRS + SEFFA still achieves a noticeable improvement, with an average energy reduction of 1.20%. At a reliability level of 0.95, ODS + SOEA consumes 2668.64, while EAWRS + SEFFA reduces it to 2580.42, a 3.30% reduction.
In brief, EAWRS + SEFFA performs exceptionally well in energy consumption reduction under different reliability constraints for random DAG task graphs, proving its strong energy-saving effectiveness.
Summary: EAWRS + SEFFA demonstrates robust energy optimization capabilities across varying reliability constraints. It effectively reduces the makespan while ensuring reliability, showcasing strong comprehensive scheduling performance. This makes it particularly suitable for heterogeneous distributed systems with stringent requirements for both reliability and energy efficiency. It is worth noting that we repeated each experiment 10 times and took the average value. The standard deviation of the results is always lower than 10% of the average value, indicating that our algorithm has high stability.

6.5. Key Limitations

This study simplifies hardware heterogeneity by simulating processor parameters within a limited range (e.g., the DVFS scope), which may not fully capture the behavior of systems with extreme heterogeneity, such as those containing both ultra-low-power and high-performance cores. Additionally, the reliability model only considers transient faults and does not include permanent faults that may significantly impact the long-term stability of the system.

7. Conclusions

This paper introduces two algorithms: the energy-aware scheduling algorithm based on the weight method under reliability constraints (EAWRS), which considers the in-degree and out-degree of DAG nodes and optimizes execution time and energy consumption within a unified range via normalization, and the optimal energy frequency adjustment algorithm based on Fibonacci search (SEFFA), which further optimizes energy consumption under reliability constraints. Experimental results demonstrate that combining these two algorithms achieves superior performance, outperforming comparative algorithms in terms of energy efficiency and completion time.
The contributions of this work provide valuable insights and guidance for the energy-efficient design of parallel applications in heterogeneous distributed systems. In our study, we only considered scenarios where a single application runs within a heterogeneous distributed system, whereas, in real-world settings, systems typically operate multiple applications simultaneously. Although our methods provide feasible and reasonable solutions within a relatively short period, we are interested in exploring more effective techniques to enhance the overall problem-solving performance. To this end, future research will focus on developing multi-objective optimization models, such as the joint optimization of energy consumption, latency, and reliability, as well as enhancing task allocation strategies for IoT edge nodes and designing cross-platform heterogeneous resource scheduling algorithms. At the same time, we will evaluate the adaptability and practical effectiveness of the proposed algorithms in representative IoT application scenarios, including smart manufacturing, environmental monitoring, and intelligent transportation systems, to facilitate their deployment and practical implementation in real-world systems.

Author Contributions

Conceptualization, Z.C.; methodology, Z.C. and L.C.; software, Z.C.; validation, Z.C.; formal analysis, Z.C.; data curation, L.C.; writing—original draft preparation, Z.C.; writing—review and editing, J.W.; visualization, L.C.; supervision, J.W. and T.T.; project administration, J.W.; funding acquisition, J.W. and T.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Program of Hubei Province (Nos. 2022BCA035).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rinaldi, M.; Wang, S.; Geronel, R.S.; Primatesta, S. Application of Task Allocation Algorithms in Multi-UAV Intelligent Transportation Systems: A Critical Review. Big Data Cogn. Comput. 2024, 8, 177. [Google Scholar] [CrossRef]
  2. Ibrahim, I.M. Task scheduling algorithms in cloud computing: A review. Turk. J. Comput. Math. Educ. TURCOMAT 2021, 12, 1041–1053. [Google Scholar]
  3. Liu, Y.; Liu, S.; Wang, Y.; Zhao, H.; Liu, S. Video steganography: A review. Neurocomputing 2019, 335, 238–250. [Google Scholar] [CrossRef]
  4. Li, F.; Feng, R.; Han, W.; Wang, L. Ensemble model with cascade attention mechanism for high-resolution remote sensing image scene classification. Opt. Express 2020, 28, 22358–22387. [Google Scholar] [CrossRef] [PubMed]
  5. Zhang, C.; Zheng, Z. Task migration for mobile edge computing using deep reinforcement learning. Future Gener. Comput. Syst. 2019, 96, 111–118. [Google Scholar] [CrossRef]
  6. Liu, F.; He, Y.; He, J.; Gao, X.; Huang, F. Optimization of Big Data Parallel Scheduling Based on Dynamic Clustering Scheduling Algorithm. J. Signal Process. Syst. 2022, 94, 1243–1251. [Google Scholar] [CrossRef]
  7. Wu, J. Energy-efficient scheduling of real-time tasks with shared resources. Future Gener. Comput. Syst. 2016, 56, 179–191. [Google Scholar] [CrossRef]
  8. Gao, K.; Huang, Y.; Sadollah, A.; Wang, L. A review of energy-efficient scheduling in intelligent production systems. Complex Intell. Syst. 2020, 6, 237–249. [Google Scholar] [CrossRef]
  9. Casini, D.; Biondi, A.; Nelissen, G.; Buttazzo, G. A holistic memory contention analysis for parallel real-time tasks under partitioned scheduling. In Proceedings of the 2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Sydney, NSW, Australia, 21–24 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 239–252. [Google Scholar]
  10. Xie, G.; Xiao, X.; Peng, H.; Li, R.; Li, K. A survey of low-energy parallel scheduling algorithms. IEEE Trans. Sustain. Comput. 2021, 7, 27–46. [Google Scholar] [CrossRef]
  11. Ghafari, R.; Kabutarkhani, F.H.; Mansouri, N. Task scheduling algorithms for energy optimization in cloud environment: A comprehensive review. Clust. Comput. 2022, 25, 1035–1093. [Google Scholar] [CrossRef]
  12. Azizi, S.; Shojafar, M.; Abawajy, J.; Buyya, R. Deadline-aware and energy-efficient IoT task scheduling in fog computing systems: A semi-greedy approach. J. Netw. Comput. Appl. 2022, 201, 103333. [Google Scholar] [CrossRef]
  13. Baciu, M.D.; Capota, E.A.; Stângaciu, C.S.; Curiac, D.-I.; Micea, M.V. Multi-Core Time-Triggered OCBP-Based Scheduling for Mixed Criticality Periodic Task Systems. Sensors 2023, 23, 1960. [Google Scholar] [CrossRef] [PubMed]
  14. Taghinezhad-Niar, A.; Pashazadeh, S.; Taheri, J. Energy-efficient workflow scheduling with budget-deadline constraints for cloud. Computing 2022, 104, 601–625. [Google Scholar] [CrossRef]
  15. Deng, Z.; Cao, D.; Shen, H.; Yan, Z.; Huang, H. Reliability-aware task scheduling for energy efficiency on heterogeneous multiprocessor systems. J. Supercomput. 2021, 77, 11643–11681. [Google Scholar] [CrossRef]
  16. Wen, Y.; Wang, Z.; O’boyle, M.F.P. Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. In Proceedings of the 2014 21st International Conference on High Performance Computing (HiPC), Goa, India, 17–20 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–10. [Google Scholar]
  17. Babaei, M.A.; Hasanzadeh, S.; Karimi, H. Cooperative energy scheduling of interconnected microgrid system considering renewable energy resources and electric vehicles. Electr. Power Syst. Res. 2024, 229, 110167. [Google Scholar] [CrossRef]
  18. Qin, Y.; Zeng, G.; Kurachi, R.; Matsubara, Y.; Takada, H. Execution-variance-aware task allocation for energy minimization on the big.LITTLE architecture. Sustain. Comput. Inform. Syst. 2019, 22, 155–166. [Google Scholar]
  19. Tang, X.; Fu, Z. CPU–GPU utilization aware energy-efficient scheduling algorithm on heterogeneous computing systems. IEEE Access 2020, 8, 58948–58958. [Google Scholar] [CrossRef]
  20. Stewart, R.; Raith, A.; Sinnen, O. Optimising makespan and energy consumption in task scheduling for parallel systems. Comput. Oper. Res. 2023, 154, 106212. [Google Scholar] [CrossRef]
  21. Hosseinioun, P.; Kheirabadi, M.; Tabbakh, S.R.K.; Ghaemi, R. A new energy-aware tasks scheduling approach in fog computing using hybrid meta-heuristic algorithm. J. Parallel Distrib. Comput. 2020, 143, 88–96. [Google Scholar] [CrossRef]
  22. Abd Elaziz, M.; Xiong, S.; Jayasena, K.P.N.; Li, L. Task scheduling in cloud computing based on hybrid moth search algorithm and differential evolution. Knowl.-Based Syst. 2019, 169, 39–52. [Google Scholar] [CrossRef]
  23. Zhang, Y.W.; Zheng, H. Energy-aware fault-tolerant scheduling for imprecise mixed-criticality systems with semi-clairvoyance. J. Syst. Archit. 2024, 151, 103141. [Google Scholar] [CrossRef]
  24. Quan, Z.; Wang, Z.-J.; Ye, T.; Guo, S. Task scheduling for energy consumption constrained parallel applications on heterogeneous computing systems. IEEE Trans. Parallel Distrib. Syst. 2019, 31, 1165–1182. [Google Scholar] [CrossRef]
  25. Topcuoglu, H.; Hariri, S.; Wu, M.-Y. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 2002, 13, 260–274. [Google Scholar] [CrossRef]
  26. Xie, G.; Chen, Y.; Xiao, X.; Xu, C.; Li, R.; Li, K. Energy-efficient fault-tolerant scheduling of reliable parallel applications on heterogeneous distributed embedded systems. IEEE Trans. Sustain. Comput. 2017, 3, 167–181. [Google Scholar] [CrossRef]
  27. Xie, G.; Zeng, G.; Chen, Y.; Bai, Y.; Zhou, Z.; Li, R.; Li, K. Minimizing redundancy to satisfy reliability requirement for a parallel application on heterogeneous service-oriented systems. IEEE Trans. Serv. Comput. 2017, 13, 871–886. [Google Scholar] [CrossRef]
  28. Huang, J.; Li, R.; Jiao, X.; Jiang, Y.; Chang, W. Dynamic DAG scheduling on multiprocessor systems: Reliability, energy, and makespan. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2020, 39, 3336–3347. [Google Scholar] [CrossRef]
  29. Schieber, B.; Samineni, B.; Vahidi, S. Interweaving real-time jobs with energy harvesting to maximize throughput. In Proceedings of the International Conference and Workshops on Algorithms and Computation, Hsinchu, Taiwan, 22–24 March 2023; Springer Nature: Cham, Switzerland, 2023; pp. 305–316. [Google Scholar]
  30. Ghajari, G.; Ghajari, E.; Mohammadi, H.; Amsaad, F. Intrusion Detection in IoT Networks Using Hyperdimensional Computing: A Case Study on the NSL-KDD Dataset. arXiv 2025, arXiv:2503.03037. [Google Scholar]
  31. Ghajari, G.; Ghimire, A.; Ghajari, E.; Amsaad, F. Network Anomaly Detection for IoT Using Hyperdimensional Computing on NSL-KDD. arXiv 2025, arXiv:2503.03031. [Google Scholar]
  32. Rastgoo, S.; Mahdavi, Z.; Nasab, M.A.; Zand, M.; Padmanaban, S. Using an intelligent control method for electric vehicle charging in microgrids. World Electr. Veh. J. 2022, 13, 222. [Google Scholar] [CrossRef]
Figure 1. An example of the DAG application model.
Figure 2. Scheduling process.
Figure 3. Example of FFT parallel application with ρ = 4.
Figure 4. Example of GE parallel application with ρ = 5.
Figure 5. Makespan under different numbers of tasks in FFT applications.
Figure 6. Energy consumption under different numbers of tasks in FFT applications.
Figure 7. Makespan under different reliability constraints in FFT applications.
Figure 8. Energy consumption under different reliability constraints in FFT applications.
Figure 9. Makespan under different numbers of tasks in GE applications.
Figure 10. Energy consumption under different numbers of tasks in GE applications.
Figure 11. Makespan under different reliability constraints in GE applications.
Figure 12. Energy consumption under different reliability constraints in GE applications.
Figure 13. Makespan under different reliability constraints in random applications.
Figure 14. Energy consumption under different reliability constraints in random applications.
Table 1. Notation and list of definitions.

Notation         Definition
ω_{i,j}          Execution time of task t_i on processor u_j
e_{i,j}          The edge between task t_i and task t_j
c_{i,j}          Communication time from t_i to t_j
ω̄_i             Average execution time of task t_i
P_ind            Dynamic power consumption independent of the processor frequency
P_s              Static power consumption
h                System state
P_d              Power consumption dependent on the processor frequency
C_{k,ef}         Effective capacitance of processor u_k
m_k              Dynamic power exponent of processor u_k
E(G)             Energy consumption of application G
R(t_i, u_k)      Reliability value of task t_i on processor u_k
R(G)             Reliability value of application G
rank_up(t_i)     Upward rank value of task t_i
succ(t_i)        The successors of task t_i
Ft(t_i, u_k)     Finish time of task t_i executed on processor u_k
Ft_min(t_i)      Minimum finish time of task t_i
Ft_max(t_i)      Maximum finish time of task t_i
R_req            Reliability requirement of application G
R_min(t_i)       Reliability value of task t_i under the minimum task redundancy
R_max(t_i)       Reliability value of task t_i under the maximum task redundancy
W_i              Execution time of task t_i on a processor at its maximum frequency
λ_k              Failure rate of processor u_k
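For orientation, the symbols above are typically combined as follows in DVFS-based reliability/energy scheduling. This is a hedged reconstruction consistent with the notation, not the paper's verbatim equations (which may, for example, use a frequency-dependent failure rate):

```latex
% Conventional forms, assumed here for illustration only:
P_k(f) = P_{k,\mathrm{ind}} + C_{k,\mathrm{ef}}\, f^{\,m_k}
\qquad
E(G) = \sum_{t_i \in G} P_k(f_i)\,\frac{W_i}{f_i}
\qquad
R(t_i,u_k) = e^{-\lambda_k\, \omega_{i,k}}
\qquad
R(G) = \prod_{t_i \in G} R(t_i,u_k)
```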
Table 2. The execution time of tasks on processors.

Task    u_1    u_2    u_3
t_1      14     16      9
t_2      13     19     18
t_3      11     13     19
t_4      13      8     17
t_5      12     13     10
t_6      13     16      9
t_7       7     15     11
t_9      18     12     20
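Table 1 lists the upward rank rank_up(t_i), whose classic definition comes from HEFT [25]: a task's average execution time plus the costliest communication-plus-rank path through its successors. The sketch below computes it from the average times in Table 2; the edge set and communication costs c_{i,j} are hypothetical placeholders (the paper's example DAG is shown in Figure 1), and EAWRS further weights this rank with in-/out-degree factors that are not reproduced here.

```python
# Average execution times (mean over u_1, u_2, u_3) taken from Table 2.
avg_time = {
    "t1": (14 + 16 + 9) / 3, "t2": (13 + 19 + 18) / 3, "t3": (11 + 13 + 19) / 3,
    "t4": (13 + 8 + 17) / 3, "t5": (12 + 13 + 10) / 3, "t6": (13 + 16 + 9) / 3,
    "t7": (7 + 15 + 11) / 3, "t9": (18 + 12 + 20) / 3,
}

# Hypothetical successor edges with communication times c_{i,j}; NOT the paper's DAG.
succ = {
    "t1": {"t2": 18, "t3": 12, "t4": 9},
    "t2": {"t9": 19}, "t3": {"t9": 16}, "t4": {"t9": 27},
    "t5": {"t9": 23}, "t6": {"t9": 15}, "t7": {"t9": 11},
    "t9": {},
}

def rank_up(task: str) -> float:
    """Classic HEFT upward rank: average execution time plus the most
    expensive (communication + successor rank) path towards the exit task."""
    children = succ[task]
    if not children:
        return avg_time[task]
    return avg_time[task] + max(c + rank_up(child) for child, c in children.items())

# Higher upward rank means higher scheduling priority.
for t in sorted(avg_time, key=rank_up, reverse=True):
    print(t, round(rank_up(t), 2))
```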
Table 3. Power parameters of processors.

u_k    P_{k,ind}    C_{k,ef}    m_k    f_{k,low}    f_{k,max}
u_1    0.03         0.8         2.9    0.26         1.0
u_2    0.04         0.8         2.5    0.26         1.0
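The parameters in Table 3 plug into the conventional DVFS energy model, and the paper's SEFFA stage selects frequencies with a Fibonacci search. Below is a generic, assumption-labeled sketch: energy(f) = (P_ind + C_ef·f^m)·(W/f) with frequencies normalized so that f_max = 1, minimized over [f_low, f_max] by a textbook Fibonacci search for a unimodal function. It is not the authors' SEFFA, which additionally couples frequency selection to per-task reliability constraints.

```python
# Power parameters of u_1 and u_2 taken from Table 3.
PROCS = {
    "u1": {"p_ind": 0.03, "c_ef": 0.8, "m": 2.9, "f_low": 0.26, "f_max": 1.0},
    "u2": {"p_ind": 0.04, "c_ef": 0.8, "m": 2.5, "f_low": 0.26, "f_max": 1.0},
}

def energy(proc: dict, w_max: float, f: float) -> float:
    """Task energy under the conventional DVFS model:
    (P_ind + C_ef * f^m) * (w_max / f), frequencies normalized to f_max = 1."""
    return (proc["p_ind"] + proc["c_ef"] * f ** proc["m"]) * (w_max / f)

def fib_search_min(fn, a: float, b: float, n: int = 25) -> float:
    """Textbook Fibonacci search for the minimizer of a unimodal function on [a, b]."""
    fib = [1.0, 1.0]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])
    x1 = a + fib[n - 2] / fib[n] * (b - a)
    x2 = a + fib[n - 1] / fib[n] * (b - a)
    f1, f2 = fn(x1), fn(x2)
    for k in range(1, n - 1):
        if f1 < f2:                       # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + fib[n - k - 2] / fib[n - k] * (b - a)
            f1 = fn(x1)
        else:                             # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + fib[n - k - 1] / fib[n - k] * (b - a)
            f2 = fn(x2)
    return (a + b) / 2

# Illustrative use: energy-minimizing frequency for a task with W_i = 14 on u_1
# (14 is t_1's execution time on u_1 from Table 2).
u1 = PROCS["u1"]
best_f = fib_search_min(lambda f: energy(u1, 14, f), u1["f_low"], u1["f_max"])
print(round(best_f, 3), round(energy(u1, 14, best_f), 3))
```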
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
