ET: A Metaheuristic Optimization Algorithm for Task Mapping in Network-on-Chip

Li, Ke; Shao, Jingbo; Song, Yan

doi:10.3390/electronics14142846


by Ke Li ¹, Jingbo Shao ¹,* and Yan Song ²

¹ College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China

² School of Mathematical Sciences, Mudanjiang Normal University, Mudanjiang 157011, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(14), 2846; https://doi.org/10.3390/electronics14142846

Submission received: 10 June 2025 / Revised: 12 July 2025 / Accepted: 15 July 2025 / Published: 16 July 2025


Abstract

In Network-on-Chip (NoC) research, the task mapping problem has attracted considerable attention as a core issue influencing system performance. As an NP-hard problem, it remains challenging, and existing algorithms exhibit limitations in both mapping quality and computational efficiency. To address this, a method named ET (Enhanced Coati Optimization Algorithm) is proposed, which leverages the nature-inspired Coati Optimization Algorithm (COA) for task mapping. An incremental hill-climbing strategy is integrated to improve local search capabilities, and a dynamic mechanism for adjusting the exploration–exploitation ratio is designed to better balance global and local searches. Additionally, an initial mapping strategy based on spectral clustering is introduced, which utilizes inter-task communication strength to cluster tasks, thereby improving the quality of the initial population. To evaluate the effectiveness of the proposed algorithm, the performance of the ET algorithm is compared and analyzed against various existing algorithms in terms of communication cost, energy consumption, and latency, using both real benchmark task maps and randomly generated task maps. Experimental results demonstrate that the ET algorithm consistently outperforms the compared algorithms across all performance metrics, thereby confirming its superiority in addressing the NoC task mapping problem.

Keywords: network-on-chip; task mapping; communication cost; metaheuristic algorithm

1. Introduction

With the continuous advancement of hardware process technology and the increasing core integration in System-on-Chip (SoC) devices, traditional bus-based communication methods have increasingly revealed issues such as limited scalability and low resource utilization. To address these challenges, Network-on-Chip (NoC) has emerged as a key infrastructure for internal interconnection within SoCs. NoC provides an efficient communication infrastructure among multiple processing units, memory components, and other intellectual property (IP) cores. In this context, an IP core refers to a reusable software or hardware functional module or logic unit, commonly employed in SoC design to facilitate the standardization and reuse of design components. As a result, NoC has emerged as a critical approach for enhancing the performance of SoC architectures. In NoC research, the task mapping problem has become a prominent research topic due to its significant impact on system performance. The objective of this problem is to rationally allocate IP cores to network nodes with known communication graphs, network topology, and IP core libraries, while positioning IP cores with frequent communication or high data exchange as closely as possible. This optimization improves key performance metrics such as communication cost, energy consumption, and latency.

Among various NoC topologies, the two-dimensional mesh topology (2D-mesh) is the most widely used in current research, owing to its diverse routing algorithms and strong scalability. This paper adopts the 2D-mesh as the network structure and focuses on the task mapping problem within NoC. Several methods have been proposed under this topology to reduce communication costs and enhance system performance.

As the number of cores increases, the size of the search space for task mapping grows exponentially, making it a typical NP-hard problem. To address this challenge, current research primarily employs two types of methods: one is the exact solution strategy based on mathematical programming, which is able to obtain the optimal solution but has extremely high computational complexity, making it applicable only to small-scale problems; the other is the heuristic algorithm based on intelligent search, which can typically provide a good solution within a reasonable timeframe, although it is difficult to guarantee global optimality. As a key branch of heuristic algorithms, metaheuristic algorithms simulate natural phenomena or group intelligence behaviors and efficiently search for approximate optimal solutions by guiding the search process. These algorithms are not dependent on the specific structure of the problem, making them both universal and flexible. Successful metaheuristic algorithms must establish an effective balance between “exploration” and “exploitation”. The former expands the search space to avoid local optima, while the latter performs in-depth optimization based on the current solution. This balance mechanism directly impacts the algorithm’s performance: excessive exploration reduces convergence speed, whereas excessive exploitation can lead to premature convergence to local optima. The core distinction between metaheuristic algorithms lies in the design of this balance strategy.

Various metaheuristic algorithms have been applied to NoC task mapping research, including genetic algorithm (GA), simulated annealing (SA) [1], and particle swarm optimization (PSO) [2], among others. While these algorithms have achieved some progress, they still exhibit the following limitations: some algorithms suffer from high computational overhead due to their complexity, while others are prone to local optima in certain scenarios. Therefore, selecting an appropriate algorithm and balancing computational complexity with optimization performance are critical for effectively solving NoC mapping problems.

Building on the research background outlined above, this paper focuses on the Coati Optimization Algorithm (COA) [3], proposed in 2022. This algorithm simulates the two-stage foraging behavior of coatis in nature, involving prey capture and predator evasion, which correspond to the global exploration and local exploitation phases in the algorithm. COA effectively balances global search and local optimization without the need for complex parameter adjustments. Building upon this framework, the paper introduces an improved version tailored for NoC mapping problems, referred to as the ET algorithm.

The ET algorithm incorporates an incremental hill-climbing strategy, which significantly enhances the optimization capability during the local exploitation phase. Particularly when the number of iterations is limited, the ET algorithm can escape local optima more efficiently and conduct a more refined search in the solution’s neighborhood, thereby accelerating convergence toward the global optimum. Additionally, the ET algorithm features a dynamic adjustment mechanism for the exploration–exploitation ratio, further improving search efficiency. While avoiding premature convergence, it also maintains a strong global search capability. This algorithm not only retains the simplicity and bio-inspired characteristics of COA in parameter design but also tailors the approach to the specific requirements of the NoC mapping problem for targeted optimization.

To enhance the quality of the initial population, this paper proposes an initial mapping strategy based on spectral clustering. The strategy captures the communication intensity relationships between tasks by constructing an inter-task bandwidth affinity matrix and evaluating clustering quality using silhouette coefficients. This approach dynamically determines a suitable number of clusters, effectively grouping communication-intensive tasks. The resulting high-quality initial mapping population provides a solid foundation for subsequent optimization.

Experimental results demonstrate that, across multiple real-world benchmark applications, the ET algorithm achieves the best communication cost with fewer iterations than many existing algorithms. In tests using randomly generated task graphs, the ET algorithm also performs effectively, outperforming existing comparison algorithms in terms of communication energy consumption, average network latency, and communication overhead. The main contributions of this paper are as follows:

An initial mapping strategy based on spectral clustering is designed, providing a solid foundation for subsequent algorithms.
The ET algorithm is proposed, which effectively enhances both local optimization and global search efficiency through the introduction of an incremental hill-climbing strategy and a dynamic search mechanism.
The ET algorithm is comprehensively evaluated using real-world benchmark task graphs and randomly generated task graphs, demonstrating superior performance compared to existing mapping algorithms across multiple performance metrics.

This paper is organized as follows: Section 2 reviews related work; Section 3 defines key terms and presents the optimization model; Section 4 provides a detailed discussion of the proposed mapping technique, including the initial mapping strategy and the ET algorithm; Section 5 analyzes the experimental results; and Section 6 concludes the paper and outlines directions for future work.

2. Related Work

The primary objective of NoC mapping algorithms is to optimize key metrics, such as communication cost, energy consumption, and latency, in order to improve the overall system performance. Existing task mapping algorithms can be broadly classified into three categories: exact methods, systematic search methods, and heuristic search methods, each offering distinct advantages depending on the applicable scenarios and performance requirements.

Exact methods solve for the optimal mapping scheme through mathematical modeling, with representative approaches including integer linear programming (ILP) and dynamic programming (DP). Although the ILP method theoretically guarantees optimal solutions, its computational complexity grows exponentially with problem size, severely limiting its practical applicability. Research by Tosun et al. [4] demonstrates that ILP methods can obtain optimal solutions for graphs with 25 or fewer nodes; however, for graphs containing 36 or more nodes, no solutions are found within 8 h. This limitation stems not from inadequate hardware performance but from the NP-hard nature of the mapping problem that the ILP formulation must solve exactly. Consequently, the development of efficient heuristic and metaheuristic algorithms is of considerable practical importance.

Systematic search methods aim to balance solution quality and search efficiency through regularized traversal strategies, such as branch and bound and dynamic pruning. Although traditional methods, such as branch and bound, perform well in small- to medium-scale scenarios, they encounter efficiency bottlenecks when applied to large-scale task graphs. To enhance the intelligence of the algorithm, researchers have recently incorporated data-driven techniques to assist in search path optimization. Choudhary et al. [5] proposed a machine learning-based system mapping framework, termed FANC. By leveraging historical mapping data to guide the optimization process, the framework achieved an average reduction in communication cost of 266% compared to the baseline (the percentage exceeds 100% because the baseline method incurs significantly higher communication costs than FANC). Despite this improvement, the approach is heavily dependent on large volumes of historical data for training and suffers from substantial training overhead. Weng et al. [6] proposed a method to reduce communication latency and energy consumption by constructing symmetry-free, low-complexity NoC mapping datasets. These datasets incorporate data augmentation and multi-label machine learning models to predict optimal mapping sequences. Experimental results show that the model achieves at least 99.6% accuracy on the dataset, with an average mapping accuracy of 96.3%. However, the offline training mechanism may limit its applicability in real-time scenarios.

Heuristic methods quickly approach suboptimal solutions by designing empirical rules or bio-inspired strategies, offering good scalability. Specifically, these methods can be categorized into traditional heuristic methods, metaheuristic methods, and hybrid enhancement strategies. Traditional methods focus on rule design based on problem characteristics. Tang and Kumar [7] proposed a two-stage genetic algorithm. In the first stage, a coarse-grained average edge delay model is used to optimize the mapping of task graph vertices to IP types, while the second stage employs a fine-grained exact edge delay model to map vertices to specific nodes. This approach reduces the search space by decoupling the optimizations for vertex delay and edge delay, significantly improving convergence speed. Tosun et al. [1] proposed SA- and GA-based mapping algorithms for mesh-based NoCs and compared their performance with ILP, CastNet, and random methods. Experimental results showed that both SA and GA achieve near-optimal energy consumption for multimedia benchmarks. However, GA requires significantly more computation time than SA. For small-scale problems, ILP is preferred for optimality, while SA offers a better trade-off between accuracy and runtime for large-scale systems. Although CastNet is faster, its performance degrades as the graph size increases. The study highlights the need to balance accuracy and computational complexity based on problem scale and time constraints. The improved discrete particle swarm optimization method proposed by Sahu et al. [2] significantly enhances the mapping accuracy of small-scale task graphs through multi-particle swarm parallelism and a deterministic initial population strategy. For large-scale task mappings, although their method outperforms existing techniques in terms of communication cost, latency, and energy consumption, its computational complexity increases significantly with problem size, which may limit its real-time performance in large-scale scenarios.

To enhance global search capabilities, bio-inspired strategies have gradually become a prominent research focus. Aravindhan et al. [8] proposed an improved bat algorithm (MBA), which integrates a clustering mechanism and introduces parallel computing to improve search efficiency. Aravindhan et al. [9] also introduced an adaptive chicken swarm optimization algorithm (SCSO) for two-dimensional and three-dimensional NoC mapping problems, combined with a shared K-nearest neighbor clustering method for cognitive modeling, thereby reducing power consumption in NoC and achieving more efficient mapping. Saleha et al. [10] proposed a task mapping method based on the sailfish optimization algorithm (SFOA), which minimizes power consumption through experience-driven optimization and the introduction of the K-nearest neighbor clustering method. Boroumand et al. [11] developed an improved hybrid frog leaping algorithm, tailored to the two-dimensional torus network structure, which significantly reduces communication cost and energy consumption by optimizing the adjacency strategy of high-communication tasks. Mehmood et al. [12] applied the Andean Condor Algorithm (ACA) with a clustering-based initialization technique to enhance the balance between exploration and exploitation in task mapping. This approach dynamically adjusts the search strategy based on population fitness, resulting in significant improvements in communication cost, latency, and energy consumption. Amin et al. [13] designed the Grey Wolf optimization framework, GNoC, which reduces power consumption by an average of 19.7%, improves energy efficiency by 17%, and lowers computational overhead by 40% through cluster initialization and polynomial regression methods.

The hybrid strategy further improves the mapping effect by combining multiple algorithms. The iHPSA algorithm, proposed by Amin et al. [14], integrates and enhances both the PSO and SA algorithms, using the Elbow method to adaptively determine the number of K-means clusters, effectively optimizing communication cost and latency. Mohiz et al. [15] employed a greedy algorithm to pre-allocate high-communication tasks and utilized the cuckoo search (CSO) algorithm combined with Lévy flight to accelerate convergence, effectively reducing communication delay. The hybrid IWOA-IGA algorithm, proposed by Saleem et al. [16], combines the Improved Whale Optimization Algorithm (IWOA) with the Enhanced Genetic Algorithm (IGA), significantly outperforming conventional methods in terms of communication cost, power consumption, and latency by optimizing the initial mapping, directional crossover, and hybrid iterative strategies.

3. Problem Definition

Tasks or IP cores within a chip often require frequent data exchange. If the mapping results in a high volume of long-distance communication, data transmission delays may increase, thereby reducing the overall operational efficiency of the SoC. Therefore, this study focuses on effectively reducing communication cost, average network latency, and energy consumption during the IP core mapping process. This section first defines the relevant terms, then proposes a modeling methodology for energy consumption and latency, and concludes by constructing an optimization model.

3.1. Definition of Terms

To clearly describe the IP core mapping problem, the following terms are defined in this paper [17,18,19]:

Definition 1.

Communication Graph (CG): The IP communication graph is represented as a directed graph $G(C, A)$, where $C = \{c_{1}, c_{2}, c_{3}, \dots, c_{n}\}$ represents the set of IP cores and $A$ represents the set of communication edges. Each directed edge $a_{i,j} \in A$ represents the transmission of data from IP core $c_{j}$ to IP core $c_{i}$, and the edge weight $w_{i,j}$ represents the communication bandwidth requirement from $c_{j}$ to $c_{i}$.

Definition 2.

Topology Graph (TG): The NoC topology graph is represented as a directed graph $T(R, L)$, where $R = \{r_{1}, r_{2}, r_{3}, \dots, r_{n}\}$ represents the set of NoC nodes and $L$ represents the set of physical links. Each link $l_{i,j} \in L$ represents the physical connection from node $r_{j}$ to node $r_{i}$.

Definition 3.

IP Mapping Problem: The IP mapping problem aims to determine a mapping function $\mathrm{map}()$ that assigns each IP core $c_{i}$ to a NoC node $r_{k}$, subject to $|C| \leq |R|$. When $|C| < |R|$, dummy nodes are added to the communication graph to fill the gap, such that $|C| = |R| = n$, thereby ensuring a complete match.

Definition 4.

Communication Cost (CC): The communication cost is defined as the total number of bits transmitted per second in the NoC [20], and is calculated as follows:

$$CC = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} \times H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})} \tag{1}$$

Here, $H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})}$ represents the hop distance from IP core $c_{j}$ to $c_{i}$, which quantifies the length of the communication path. It is defined as the Manhattan distance between the nodes hosting the two IP cores on the grid:

$$H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})} = |x_{i} - x_{j}| + |y_{i} - y_{j}| \tag{2}$$

Here, $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$ represent the two-dimensional coordinates of IP cores $c_{i}$ and $c_{j}$, respectively. The number of hops between them is determined by the routing algorithm. This study employs the XY routing algorithm [21], a static and deadlock-free scheme widely used in 2D-mesh networks. Data packets are first transmitted along the x-axis (horizontal direction) until the destination column is reached, and then along the y-axis (vertical direction) until the destination row is reached. As a result, the message path forms an “L” shape. XY routing is selected in this study for its deterministic path planning and ease of implementation, both of which contribute to the simplicity and reproducibility of the experiments.
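Equations (1) and (2) and the XY routing rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based data layout, and the toy bandwidth values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a mesh node


def hop_distance(a: Coord, b: Coord) -> int:
    """Manhattan distance between two mesh nodes, as in Equation (2)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Nodes visited under XY routing: travel along x first, then along y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def communication_cost(weights: Dict[Tuple[int, int], float],
                       placement: Dict[int, Coord]) -> float:
    """Equation (1): bandwidth times hop distance, summed over all edges."""
    return sum(w * hop_distance(placement[i], placement[j])
               for (i, j), w in weights.items())


# Hypothetical 4-core example on a 2x2 mesh (values are illustrative only).
weights = {(0, 1): 70.0, (1, 2): 30.0, (2, 3): 10.0}
placement = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
cc = communication_cost(weights, placement)  # every edge spans one hop
```

The length of an XY route, counted in links, always equals the Manhattan distance, which is why Equation (2) can stand in for the routed hop count.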

3.2. Energy Model

This paper employs the energy consumption model widely used in the literature [19,22]. The energy consumed to transmit a single bit from IP core $c_{j}$ to $c_{i}$ is:

$$E_{i,j}^{\mathrm{bit}} = H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})} \times E_{L}^{\mathrm{bit}} + (H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})} + 1) \times E_{S}^{\mathrm{bit}} \tag{3}$$

Here, $E_{S}^{\mathrm{bit}}$ represents the energy consumed by data passing through a switch node, and $E_{L}^{\mathrm{bit}}$ represents the energy consumed by data passing through the link [23]. The overall NoC energy consumption is then calculated as follows:

$$E_{\mathrm{total}} = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} \times E_{i,j}^{\mathrm{bit}} \tag{4}$$

Substituting Equation (3) and collecting the terms that match the definition of $CC$ in Equation (1) gives:

$$E_{\mathrm{total}} = E_{L}^{\mathrm{bit}} \times CC + E_{S}^{\mathrm{bit}} \times CC + E_{S}^{\mathrm{bit}} \times \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} \tag{5}$$

Here, $\sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j}$ represents the total bandwidth requirement, which is a constant inherent to the application itself. Therefore, the energy consumption is linearly related to the communication cost.
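The equivalence of Equations (4) and (5) can be checked numerically. The sketch below is only illustrative: the per-bit energy values, the edge weights, and the hop counts are made-up numbers, and the function names are assumptions.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def bit_energy(hops: int, e_link: float, e_switch: float) -> float:
    """Equation (3): a bit crosses `hops` links and `hops + 1` switches."""
    return hops * e_link + (hops + 1) * e_switch


def total_energy_direct(weights: Dict[Edge, float], hops: Dict[Edge, int],
                        e_link: float, e_switch: float) -> float:
    """Equation (4): bandwidth-weighted sum of per-bit energies."""
    return sum(w * bit_energy(hops[e], e_link, e_switch)
               for e, w in weights.items())


def total_energy_from_cc(cc: float, total_bandwidth: float,
                         e_link: float, e_switch: float) -> float:
    """Equation (5): energy as a linear function of the communication cost."""
    return e_link * cc + e_switch * cc + e_switch * total_bandwidth


# Illustrative values only (not taken from the paper).
weights = {(0, 1): 70.0, (1, 2): 30.0}
hops = {(0, 1): 1, (1, 2): 2}
e_link, e_switch = 0.5, 0.25

cc = sum(w * hops[e] for e, w in weights.items())
direct = total_energy_direct(weights, hops, e_link, e_switch)
linear = total_energy_from_cc(cc, sum(weights.values()), e_link, e_switch)
```

Both routes give the same total, so any mapping that lowers $CC$ lowers the modeled energy by a proportional amount.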

3.3. Latency Model

The average network latency $T_{\mathrm{av}}$ is calculated as follows [24]:

$$T_{\mathrm{av}} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} T_{i,j}}{\sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_{i,j}} \tag{6}$$

Here, $\lambda_{i,j}$ represents the number of flits (where a flit is a transmission unit in NoC) required to transmit $w_{i,j}$ bits from IP core $c_{j}$ to IP core $c_{i}$, and $T_{i,j}$ represents the latency required to transmit $\lambda_{i,j}$ flits from IP core $c_{j}$ to IP core $c_{i}$. $T_{i,j}$ can be approximated as follows:

$$T_{i,j} = \frac{w_{i,j}}{F} \times H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})} \times D_{L}^{\mathrm{flit}} + \frac{w_{i,j}}{F} \times (H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})} + 1) \times (D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}) \tag{7}$$

Here, $D_{S}^{\mathrm{flit}}$ represents the latency of the flit passing through the switching node, $D_{L}^{\mathrm{flit}}$ represents the latency of the flit passing over the link, $D_{\mathrm{con}}$ represents the average latency caused by network congestion, and $F$ represents a constant related to the packet size and the switching method. The total communication latency can be expressed as follows:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} T_{i,j} = CC \times \frac{D_{L}^{\mathrm{flit}} + D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}}{F} + \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} \times \frac{D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}}{F} \tag{8}$$

This formula demonstrates that the total communication latency is linearly related to the communication cost. Consequently, the average network latency is also linearly related to the communication cost.
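The step from Equation (7) to Equation (8), and from there to the average latency, can be written out explicitly. The derivation below is a sketch under two assumptions that the text implies but does not state: $\lambda_{i,j} = w_{i,j}/F$, and $D_{\mathrm{con}}$ is treated as a constant that does not depend on the mapping. The shorthand $H_{i,j}$ for the hop distance and $W$ for the total bandwidth is introduced here for brevity.

```latex
% Shorthand (introduced for this sketch):
%   H_{i,j} = H_{map(c_i), map(c_j)},   W = \sum_{i}\sum_{j} w_{i,j}
\begin{align*}
T_{i,j}
  &= \frac{w_{i,j}}{F}\Bigl[H_{i,j}\,D_{L}^{\mathrm{flit}}
     + (H_{i,j}+1)\bigl(D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}\bigr)\Bigr] \\
  &= \frac{w_{i,j}\,H_{i,j}}{F}\bigl(D_{L}^{\mathrm{flit}} + D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}\bigr)
     + \frac{w_{i,j}}{F}\bigl(D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}\bigr) \\[4pt]
\sum_{i}\sum_{j} T_{i,j}
  &= \frac{CC}{F}\bigl(D_{L}^{\mathrm{flit}} + D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}\bigr)
     + \frac{W}{F}\bigl(D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}\bigr) \\[4pt]
T_{\mathrm{av}}
  &= \frac{\sum_{i}\sum_{j} T_{i,j}}{W/F}
   = \frac{D_{L}^{\mathrm{flit}} + D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}}{W}\,CC
     + \bigl(D_{S}^{\mathrm{flit}} + D_{\mathrm{con}}\bigr)
\end{align*}
```

Because $W$ is fixed by the application, the last line is an affine function of $CC$ with a positive slope, which is the linear relationship the section relies on.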

3.4. Optimization Model

From the energy consumption and latency models described above, it can be observed that both energy consumption and average network latency are linearly related to $CC$. $CC$ is defined as the weighted sum of the bandwidth requirements and the number of hops between IP cores (refer to Section 3.1 for the definition and computation of hop count). Thus, this study indirectly optimizes both energy consumption and latency by minimizing $CC$ [25]. The optimization model for this study is expressed as follows:

$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} \times H_{\mathrm{map}(c_{i}), \mathrm{map}(c_{j})} \tag{9}$$

Constraints are as follows:

$$\forall c_{i} \in C,\ \mathrm{map}(c_{i}) = r_{k} \in R \tag{10}$$

$$\forall c_{j} \in C,\ \mathrm{map}(c_{j}) = r_{l} \in R \tag{11}$$

$$\forall c_{i} \neq c_{j},\ r_{k} \neq r_{l} \tag{12}$$

$$|C| = |R| = n \tag{13}$$

Equation (9) represents the objective of minimizing the communication cost, while the constraints ensure a one-to-one mapping between the IP cores and the NoC nodes. The mapping function, $\mathrm{map}()$, can be derived by solving these constraints.
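Constraints (10) to (13) say that a feasible mapping is a permutation of the mesh nodes, so for very small instances Equation (9) can be solved by enumerating all $n!$ permutations. The sketch below does exactly that as a reference point; it is not part of the ET algorithm, the function name and toy weights are assumptions, and the approach stops being practical quickly (a 4×4 mesh already has 16! = 20,922,789,888,000 candidate mappings).

```python
from itertools import permutations
from typing import Dict, List, Tuple


def exhaustive_mapping(weights: Dict[Tuple[int, int], float],
                       width: int, height: int) -> Tuple[float, List[int]]:
    """Try every one-to-one assignment of cores 0..n-1 to mesh nodes.

    assignment[i] is the index of the node hosting core i; using a
    permutation enforces the one-to-one constraints by construction.
    Assumes the number of cores equals width * height (after padding).
    """
    nodes = [(x, y) for y in range(height) for x in range(width)]
    best_cost = float("inf")
    best_assignment: List[int] = []
    for assignment in permutations(range(len(nodes))):
        cost = 0.0
        for (i, j), w in weights.items():
            (xi, yi), (xj, yj) = nodes[assignment[i]], nodes[assignment[j]]
            cost += w * (abs(xi - xj) + abs(yi - yj))
        if cost < best_cost:
            best_cost, best_assignment = cost, list(assignment)
    return best_cost, best_assignment


# Illustrative 4-core ring on a 2x2 mesh: the ring fits with every edge
# spanning a single hop, so the optimum equals the total bandwidth.
ring = {(0, 1): 70.0, (1, 2): 30.0, (2, 3): 10.0, (3, 0): 5.0}
best_cost, best_assignment = exhaustive_mapping(ring, 2, 2)
```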

4. Mapping Based on ET Algorithm

This section provides a comprehensive discussion of the initial mapping strategy based on spectral clustering and the mapping algorithm developed using the COA approach. An efficient metaheuristic mapping algorithm, named ET, is introduced for bandwidth-constrained conditions. The ET algorithm is designed to minimize communication overhead while reducing the number of iterations in a two-dimensional grid NoC architecture. This method effectively avoids premature convergence and stagnation near the global optimum, thereby accelerating the optimization process. Figure 1 illustrates the overall framework of the proposed method.

4.1. Initialization Strategy Based on Spectral Clustering

To improve the quality of the initial solution in NoC task mapping, this paper proposes an initialization strategy based on spectral clustering. The method generates high-quality initial mapping solutions by leveraging the communication bandwidth and structural features between tasks, providing a solid starting point for subsequent optimization algorithms.

In this study, the XY routing strategy is adopted, which maintains symmetry in the hop count of paths. To simplify the calculation, the directed edge weights of the IP communication graph are converted into undirected edge weights. Based on the bandwidth information between task pairs in the communication graph, a bandwidth matrix is constructed to quantify the communication intensity between tasks. To further capture the structural similarity between tasks, the Shared k-Nearest Neighbor method is introduced. This method counts the number of shared neighbors between task pairs and constructs an affinity matrix reflecting their structural intimacy. In this process, the structural relationships between neighboring nodes are also considered, with a weighted addition strategy applied to the number of shared neighbors and their positions. The resulting affinity matrix is then normalized using maximum normalization. Once the affinity matrix is obtained, the spectral clustering algorithm is used to partition the task set.
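One possible reading of the affinity construction is sketched below. The text does not give the exact weighting, so several choices here are assumptions made only for illustration: directed weights are summed to make them undirected, each task's neighbor list is its top-`k` partners by bandwidth, the shared-neighbor count is blended with normalized bandwidth through a hypothetical `snn_weight`, and the position-based weighting mentioned above is omitted.

```python
from typing import List, Set

Matrix = List[List[float]]


def symmetric_bandwidth(w: Matrix) -> Matrix:
    """Undirected bandwidth: sum of both directions (an assumption)."""
    n = len(w)
    return [[w[i][j] + w[j][i] if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def top_k_neighbors(b: Matrix, k: int) -> List[Set[int]]:
    """For each task, the k partners with the highest bandwidth."""
    n = len(b)
    result = []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i and b[i][j] > 0),
                        key=lambda j: -b[i][j])
        result.append(set(ranked[:k]))
    return result


def affinity_matrix(w: Matrix, k: int = 2, snn_weight: float = 0.5) -> Matrix:
    """Blend normalized bandwidth with the shared-neighbor fraction,
    then apply maximum normalization so the largest entry is 1."""
    b = symmetric_bandwidth(w)
    nbrs = top_k_neighbors(b, k)
    n = len(b)
    b_max = max(max(row) for row in b) or 1.0
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = len(nbrs[i] & nbrs[j])
                raw[i][j] = b[i][j] / b_max + snn_weight * shared / k
    peak = max(max(row) for row in raw) or 1.0
    return [[v / peak for v in row] for row in raw]


# Illustrative chain 0 -> 1 -> 2 -> 3 with decreasing bandwidth.
w = [
    [0, 70, 0, 0],
    [0, 0, 30, 0],
    [0, 0, 0, 10],
    [0, 0, 0, 0],
]
affinity = affinity_matrix(w)
```

In this toy chain, tasks 0 and 2 never communicate directly but still receive a nonzero affinity because they share task 1 as a neighbor, which is the kind of structural closeness the shared-neighbor term is meant to capture.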

To avoid an unreasonable number of clusters, a dynamically adjustable constraint range $[k_{min}, k_{max}]$ is imposed on the number of clusters $k$. Specifically, the minimum number of clusters is fixed at $k_{min} = 2$, ensuring that at least two clusters are formed and thereby preserving the semantic meaning of clustering. The maximum number of clusters is dynamically defined as $k_{max} = \min(8, \lfloor n/2 \rfloor)$, which both limits the computational complexity of spectral clustering and keeps the average cluster size at two tasks or more. This prevents overly fine-grained clustering and fragmentation. Within this range, the optimal number of clusters is evaluated based on silhouette coefficients. Specifically, the silhouette coefficient measures how cohesive each cluster is relative to its separation from the other clusters, and its calculation formula is as follows:

$$S(k) = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{14}$$

Here, $a(i)$ denotes the average dissimilarity between task $i$ and the other tasks in its own cluster, while $b(i)$ represents the average dissimilarity between task $i$ and the tasks of its nearest neighboring cluster. The optimal number of clusters, $k^{*}$, is the value of $k$ that maximizes the silhouette coefficient $S(k)$, and it is selected as the final number of clusters. This dynamic adjustment mechanism ensures that the initial clustering aligns with the task scale, thereby avoiding poor adaptability and the risk of overfitting that may arise from manually specifying a fixed number of clusters.
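The selection of $k^{*}$ can be sketched as follows, assuming a pairwise dissimilarity matrix is available (for instance one minus the normalized affinity, which is an assumption) and that some clustering routine returns labels for a given $k$. The `cluster_fn` callback is a hypothetical stand-in for the spectral clustering step, and the function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence


def cluster_range(n: int) -> range:
    """Candidate cluster counts: k_min = 2, k_max = min(8, floor(n / 2)).
    The range is empty for n < 4, so at least four tasks are assumed."""
    return range(2, min(8, n // 2) + 1)


def silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette value, as in Equation (14), from a dissimilarity matrix."""
    members: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)
    if len(members) < 2:
        return 0.0
    total = 0.0
    for i, lab_i in enumerate(labels):
        own = [j for j in members[lab_i] if j != i]
        if not own:
            continue  # a singleton contributes 0 by the usual convention
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in idxs) / len(idxs)
                for lab, idxs in members.items() if lab != lab_i)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


def choose_k(n: int, dist: Sequence[Sequence[float]],
             cluster_fn: Callable[[int], Sequence[int]]) -> int:
    """Return the k in the allowed range with the highest silhouette."""
    scores = {k: silhouette(dist, cluster_fn(k)) for k in cluster_range(n)}
    return max(scores, key=scores.get)


# Illustrative data: six tasks forming three tight pairs.
dist = [[0.0 if i == j else (0.1 if i // 2 == j // 2 else 1.0)
         for j in range(6)] for i in range(6)]
labelings = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
best_k = choose_k(6, dist, lambda k: labelings[k])
```

With three well-separated pairs, the three-cluster labeling scores higher than the two-cluster one, so `best_k` follows the natural grouping of the data.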

Given that the maximum task scale considered in this study is $n = 64$, the setting range of $k_{min}$ and $k_{max}$ is relatively narrow and empirically grounded. This dynamic adjustment range is sufficient to support effective clustering initialization in practical implementations. Even if the initial clustering is not globally optimal (e.g., when the optimal number of clusters falls outside the predefined range), the proposed ET optimization algorithm is capable of compensating for suboptimal initial mappings through its strong global exploration and local exploitation capabilities during subsequent iterations. Comprehensive experimental results demonstrate that the proposed method consistently outperforms baseline algorithms across several performance metrics, including communication cost, energy consumption, and latency. These findings validate the effectiveness and robustness of the proposed methodological framework.

Subsequently, based on the clustering results, the task with the highest total communication bandwidth within each cluster is selected as the core task. This task is prioritized for placement at the center region of the NoC topology. The remaining tasks are then mapped outward, extending to the periphery of the topology through a breadth-first search, based on the original order of the clusters. Positions adjacent to the core tasks are filled first, in order to minimize communication distances and reduce the communication overhead caused by inter-cluster data exchanges. The process of the initial mapping is illustrated in Algorithm 1.
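The placement step described above can be approximated with a breadth-first ordering of mesh positions starting at the center. The sketch below is a simplified reading, not a reproduction of Algorithm 1: it places all core tasks first and then the remaining tasks cluster by cluster, and the function names, the tie-breaking order of neighbors, and the example data are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def bfs_positions(width: int, height: int) -> List[Coord]:
    """Mesh nodes in breadth-first order, starting from a central node."""
    start = (width // 2, height // 2)
    seen = {start}
    order: List[Coord] = []
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        order.append((x, y))
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return order


def initial_mapping(clusters: List[List[int]], total_bw: Dict[int, float],
                    width: int, height: int) -> Dict[int, Coord]:
    """Core task of each cluster (highest total bandwidth) goes nearest the
    center; remaining tasks follow in the original cluster order."""
    cores = [max(c, key=lambda t: total_bw[t]) for c in clusters]
    rest = [t for c, core in zip(clusters, cores) for t in c if t != core]
    return dict(zip(cores + rest, bfs_positions(width, height)))


# Illustrative input: five tasks in two clusters on a 3x3 mesh.
clusters = [[0, 1, 2], [3, 4]]
total_bw = {0: 10.0, 1: 50.0, 2: 20.0, 3: 5.0, 4: 40.0}
mapping = initial_mapping(clusters, total_bw, 3, 3)
```

Because breadth-first order visits nodes by increasing distance from the center, tasks placed earlier end up closer to the core tasks, which matches the intent of filling positions adjacent to the cores first.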

Algorithm 1: Initial Mapping

4.2. Mapping Optimization Using ET

COA is a metaheuristic algorithm inspired by the biological behaviors of coatis in their natural environment, specifically their behavior when hunting iguanas and avoiding predators. These behaviors correspond to the exploration and exploitation phases of the algorithm, respectively. During the exploration phase, COA performs a global search by simulating the cooperative hunting behavior of coati groups targeting iguanas. In the exploitation phase, individual coatis rapidly flee to known safe areas and make fine adjustments. Through this behavioral pattern, COA achieves both global exploration and a local search within the solution space of the optimization problem.

After generating the initial population and achieving a certain level of solution quality, this study, while preserving the original advantages of COA, proposes an improved version for the NoC mapping problem: the ET algorithm. To enhance the efficiency and scope of a global search, the ET algorithm introduces a dynamic and adaptive global search strategy through two behavioral models: tree pursuit and ground encirclement. The tree pursuit strategy simulates the behavior of coati individuals climbing trees and quickly approaching iguanas, while the ground encirclement strategy simulates the group’s collaborative process of ambushing iguanas on the ground. The allocation ratio between the two strategies is controlled by the dynamic phase ratio, with its calculation formula given by:

$$\mathrm{phaseRatio}(t) = \max\left(0.3,\ 0.9 - 0.6 \cdot \frac{t}{T_{max}}\right) \tag{15}$$

In this context, $t$ represents the current generation, and $T_{max}$ is the maximum number of iterations. This ratio enables the algorithm to initially assign 90% of the individuals to the tree pursuit strategy, leveraging information from high-quality individuals to accelerate convergence toward the optimal solution and enhance global search capability. As the iteration progresses, the proportion gradually decreases to 30%, with the remaining individuals transitioning to the ground encirclement strategy. This shift improves population diversity and strengthens local exploitation, thereby reducing the risk of premature convergence to local optima. The change rate is set to 0.6, which governs the smooth transition from exploration to exploitation. This parameter was determined empirically through multiple rounds of experimental tuning.
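Equation (15) and the resulting population split can be sketched as below. The function names are illustrative, and computing the floor in integer arithmetic is a design choice of this sketch (it avoids floating-point rounding right at the boundary), not something the paper specifies.

```python
def phase_ratio(t: int, t_max: int) -> float:
    """Equation (15): share of individuals assigned to tree pursuit."""
    return max(0.3, 0.9 - 0.6 * t / t_max)


def tree_pursuit_count(n: int, t: int, t_max: int) -> int:
    """floor(N * phaseRatio(t)), evaluated exactly with integers.

    max(0.3, 0.9 - 0.6 t / T) equals max(3T, 9T - 6t) / (10T).
    """
    return (n * max(3 * t_max, 9 * t_max - 6 * t)) // (10 * t_max)


# Split of a 50-individual population over a 100-iteration run.
start_split = tree_pursuit_count(50, 0, 100)     # 90% at the start
middle_split = tree_pursuit_count(50, 50, 100)   # 60% halfway through
final_split = tree_pursuit_count(50, 100, 100)   # 30% at the end
```

Since $0.9 - 0.6 \cdot t / T_{max}$ reaches 0.3 exactly at $t = T_{max}$, the lower bound in the `max` only acts as a safeguard for iterations at or beyond the nominal limit.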

In each iteration, the ET algorithm selects the first

\lfloor N \cdot \mathrm{phaseRatio}(t) \rfloor

individuals to perform the tree pursuit. This is carried out by randomly selecting one to three task positions to swap, replacing the original individual if the new solution is better, and updating

M_{best}

. The formula is:

\begin{matrix} M_{i}^{new} \leftarrow \mathrm{Swap}(M_{i}, M_{best}, k_{swap}), \quad k_{swap} \sim U\{1, 3\} \end{matrix}

(16)
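The text does not spell out how the Swap operator in Equation (16) uses the best individual, so the following Python sketch rests on an assumption: at each of the randomly chosen positions, the entry found in the best mapping is brought to the same position of the current mapping by an internal swap. A mapping is represented here as a list indexed by tile, and `cost` is any callable returning the communication cost. Both the representation and the name `tree_pursuit_step` are hypothetical.

```python
import random


def tree_pursuit_step(m_i, m_best, cost, rng=random):
    """One tree-pursuit move under the assumption stated in the lead-in."""
    k_swap = min(rng.randint(1, 3), len(m_i))       # k_swap ~ U{1, 3}
    candidate = list(m_i)
    for pos in rng.sample(range(len(candidate)), k_swap):
        target = m_best[pos]                        # entry the best mapping holds here
        src = candidate.index(target)               # where the candidate holds it now
        candidate[pos], candidate[src] = candidate[src], candidate[pos]
    # Greedy replacement: keep the candidate only if it is strictly better.
    return candidate if cost(candidate) < cost(m_i) else list(m_i)


if __name__ == "__main__":
    # Toy cost: total displacement of each task from the tile with the same index.
    toy_cost = lambda m: sum(abs(task - tile) for tile, task in enumerate(m))
    print(tree_pursuit_step([5, 4, 3, 2, 1, 0], [0, 1, 2, 3, 4, 5], toy_cost))
```

Because every change is a swap inside the candidate, the result remains a valid one-to-one mapping, and the greedy test guarantees that the returned cost is never worse than that of the original individual.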

Compared to the tree pursuit strategy, the ground encirclement strategy emphasizes diversity and exploration within the search space. This strategy simulates a coati’s ambush on the ground, waiting for the iguana to jump after the raid. Its primary purpose is to expand the search boundary and increase the coverage of the solution space. In this phase, for the remaining

N - \lfloor N \cdot \mathrm{phaseRatio}(t) \rfloor

individuals, the ET algorithm introduces a diversified search driven by a random reference solution

M_{iguana}

.

M_{iguana}

effectively enhances population diversity and assists the algorithm in escaping local optima. Compared with traditional fixed local perturbation strategies,

M_{iguana}

introduces greater stochasticity and feasibility guarantees, transforming the ground encirclement strategy from a rigid local search into a more adaptive and exploratory mechanism. For each individual

M_{i}

, a randomized legitimate solution

M_{iguana}

is generated as a temporary reference target. If the fitness of

M_{iguana}

is better than that of the current individual, that is, if its cost C is lower, the alignment operation is performed; otherwise, the differential perturbation (divergence) operation is applied. The formula is as follows:

\begin{matrix} M_{i}^{new} = \begin{cases} \mathrm{Align}(M_{i}, M_{iguana}), & \text{if } C(M_{iguana}) < C(M_{i}) \\ \mathrm{Diverge}(M_{i}, M_{iguana}), & \text{otherwise} \end{cases} \end{matrix}

(17)

The alignment operation identifies the set of positions where the task mappings of the current solution differ from those in

M_{iguana}

. These positions represent potential improvement points in the current individual. Up to 25% of these positions are randomly selected. For each selected position, the algorithm locates the corresponding task in the current solution based on the task at that position in the reference solution, and then swaps the two tasks in the current solution. This exchange ensures that the local structure of the current solution becomes more similar to the reference, without simply replacing tasks. Instead, it subtly adjusts the local task distribution while preserving the global structure of the individual. This mechanism helps the algorithm avoid premature convergence near a single optimal solution.
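The alignment operation can be sketched as follows. This is a minimal illustration rather than the authors' code: a mapping is assumed to be a list indexed by tile whose entries are task identifiers (or `None` for an idle tile), both mappings are assumed to contain the same tasks, and "up to 25%" is rounded down. The name `align` and its parameters are illustrative.

```python
import random


def align(current, reference, rng=random, fraction=0.25):
    """Pull `current` toward `reference` using swaps inside `current` only."""
    new = list(current)
    differing = [p for p in range(len(new)) if new[p] != reference[p]]
    k = int(len(differing) * fraction)              # up to 25% of differing positions
    for p in rng.sample(differing, k):
        wanted = reference[p]                       # task the reference places on tile p
        if wanted is None or new[p] == wanted:      # nothing to fetch, or already aligned
            continue
        q = new.index(wanted)                       # tile currently hosting that task
        new[p], new[q] = new[q], new[p]             # swap keeps the mapping one-to-one
    return new


if __name__ == "__main__":
    cur = list(range(16))                           # 16 tasks on a 4 x 4 mesh
    ref = cur[::-1]
    out = align(cur, ref, random.Random(0))
    print(sum(a == b for a, b in zip(out, ref)), "tiles now agree with the reference")
```

Each executed swap places one task where the reference has it without disturbing positions aligned earlier, so the individual moves closer to the reference while the rest of its structure is preserved.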

In contrast, the differential perturbation operation is employed when the reference solution is not necessarily superior to the current individual. Its purpose is to enhance diversity and expand the search space. This operation begins by detecting the positions where the current and reference solutions share the same task mappings. No more than 25% of these positions are randomly selected to define the perturbation range. For the tasks at these positions, the algorithm identifies all idle cores in the current solution and randomly exchanges the selected tasks with blank positions on these idle cores. This operation disrupts structural similarity between solutions, promotes diversity, and enables more flexible task migration. By leveraging idle areas in the core set, the algorithm enhances its exploratory potential.
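A possible reading of the differential perturbation is sketched below in Python. It assumes a tile-indexed list with `None` marking idle tiles, rounds "no more than 25%" down, and uses each idle tile at most once per call. The name `diverge` and these details are assumptions, not the authors' implementation.

```python
import random


def diverge(current, reference, rng=random, fraction=0.25):
    """Move some tasks that sit on the same tile in both mappings to idle tiles."""
    new = list(current)
    shared = [p for p in range(len(new))
              if new[p] is not None and new[p] == reference[p]]
    idle = [p for p in range(len(new)) if new[p] is None]
    k = min(int(len(shared) * fraction), len(idle))  # no more than 25% of shared tiles
    sources = rng.sample(shared, k)
    targets = rng.sample(idle, k)
    for src, dst in zip(sources, targets):
        new[dst], new[src] = new[src], None          # task migrates; old tile turns idle
    return new


if __name__ == "__main__":
    cur = [0, 1, 2, 3, 4, 5, 6, 7, None, None, None, None]
    print(diverge(cur, list(cur), random.Random(0)))
```

When the mapping has no idle tiles, `k` is zero and the individual is returned unchanged, so this operator only contributes diversity on meshes that are larger than the task graph.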

Although the ET algorithm demonstrates strong global exploration capabilities, improving local accuracy remains crucial for reducing communication costs in the NoC mapping problem. To address this, the ET algorithm first performs a rapid local perturbation on the individual, simulating the quick, small-scale position adjustments made by coatis when avoiding predators. This strategy controls the perturbation intensity through random task exchanges, which not only enhances the global jumping ability in the early stages but also prevents excessive deviation from the potentially optimal region in the later stages, thereby balancing the breadth and accuracy of the search.

Subsequently, the ET algorithm introduces a hill-climbing optimization strategy based on local perturbation and incremental evaluation, in contrast to the traditional hill-climbing strategy, which relies on large perturbations and global cost re-evaluation. Specifically, in each round of local optimization, the ET algorithm randomly selects 50% of the mapped tasks from the current solution to form a candidate task set

T_{cand}

, serving as potential sources for migration. Simultaneously, 90% of the available positions among the current idle cores are randomly selected to form a candidate position set

P_{cand}

, representing potential mapping targets. This strategy maintains perturbation diversity while avoiding the high computational cost of global traversal, thereby enabling controllable local perturbation intensity.

To efficiently support the rapid evaluation of a large number of perturbation candidate pairs, the ET algorithm internally constructs two hash-based mapping structures with constant time complexity: a task-to-location index table and a location-to-task reverse lookup table. These two structures are kept synchronized throughout the state update process, ensuring that operations such as task location queries, mapping modifications, and validity checks can be performed in

O (1)

time. This mechanism enables efficient high-frequency perturbation evaluation during a local search.
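The two synchronized lookup tables can be illustrated with a small Python class built on dictionaries, which offer average-case constant-time access. The class name `Mapping` and its methods are hypothetical; the sketch only shows how both tables are updated together when a task migrates.

```python
class Mapping:
    """Task-to-tile and tile-to-task tables that are always updated together."""

    def __init__(self, task_to_tile: dict[int, int]):
        self.tile_of = dict(task_to_tile)                       # task -> tile
        self.task_at = {tile: task for task, tile in task_to_tile.items()}  # tile -> task
        if len(self.task_at) != len(self.tile_of):
            raise ValueError("two tasks share a tile")

    def is_idle(self, tile: int) -> bool:
        """Validity check: a tile is free if no task is registered on it."""
        return tile not in self.task_at

    def move(self, task: int, new_tile: int) -> None:
        """Migrate a task to an idle tile, keeping both tables consistent."""
        if not self.is_idle(new_tile):
            raise ValueError(f"tile {new_tile} is occupied")
        old_tile = self.tile_of[task]
        del self.task_at[old_tile]
        self.task_at[new_tile] = task
        self.tile_of[task] = new_tile


if __name__ == "__main__":
    m = Mapping({0: 0, 1: 1, 2: 5})
    m.move(0, 6)
    print(m.tile_of, m.task_at)
```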

For each candidate perturbation pair

(t, p_{new})

, the algorithm attempts to migrate task t from its current mapping position

p_{old}

to the new position

p_{new}

, and evaluates its impact on the overall communication cost using an incremental computation method. The change in communication cost,

\Delta C

, is determined by the edge weights

w_{tt'}

between task t and its communication neighbors, as well as the change in hop distances. This relationship can be expressed as:

\begin{matrix} \Delta C = \sum_{t' \in N(t)} w_{tt'} \cdot \left(d(p_{new}, p_{t'}) - d(p_{old}, p_{t'})\right) \end{matrix}

(18)

Here,

N (t)

represents the set of direct communication neighbors of task t. The primary advantage of this incremental evaluation mechanism is that it requires computing only the local path changes associated with the edges affected by the migration. This significantly reduces unnecessary global recalculations while preserving computational accuracy.
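The following Python sketch shows Equation (18) next to a full cost recomputation so that the two can be compared. It assumes that d is the Manhattan hop distance on a row-major 2D mesh, that the task graph is stored as a dictionary of weighted edges without self-loops, and that the target tile is idle. All function names and the small example graph are illustrative and carry no data from the paper.

```python
def hops(a: int, b: int, width: int) -> int:
    """Manhattan distance between tiles a and b on a row-major mesh."""
    return abs(a % width - b % width) + abs(a // width - b // width)


def total_cost(edges, tile_of, width):
    """Full evaluation: sum of weight * hop distance over every edge."""
    return sum(w * hops(tile_of[u], tile_of[v], width) for (u, v), w in edges.items())


def build_neighbors(edges):
    """Adjacency lists N(t) with edge weights, built once per task graph."""
    nbrs = {}
    for (u, v), w in edges.items():
        nbrs.setdefault(u, []).append((v, w))
        nbrs.setdefault(v, []).append((u, w))
    return nbrs


def delta_cost(task, p_new, nbrs, tile_of, width):
    """Eq. (18): cost change if `task` migrates from its tile to idle tile p_new."""
    p_old = tile_of[task]
    return sum(w * (hops(p_new, tile_of[other], width) - hops(p_old, tile_of[other], width))
               for other, w in nbrs.get(task, []))


if __name__ == "__main__":
    edges = {(0, 1): 4, (1, 2): 2, (0, 3): 5}       # hypothetical weights
    tile_of = {0: 0, 1: 1, 2: 5, 3: 4}              # tasks placed on a 4 x 4 mesh
    nbrs = build_neighbors(edges)
    before = total_cost(edges, tile_of, 4)
    delta = delta_cost(0, 6, nbrs, tile_of, 4)
    tile_of[0] = 6
    print(before + delta == total_cost(edges, tile_of, 4))
```

Only the edges incident to the migrated task are visited by `delta_cost`, whereas `total_cost` touches every edge; the final comparison checks that both routes give the same result.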

To avoid ineffective fluctuations caused by local disturbances, the ET algorithm incorporates multiple control mechanisms to enhance the convergence and stability of the hill-climbing phase. First, in terms of acceptance criteria, only perturbations that yield a reduction in communication cost greater than 1% are accepted. This prevents frequent switching and unnecessary computational overhead due to negligible improvements. Second, with respect to iteration rounds, the algorithm performs a maximum of two hill-climbing iterations by default. If no effective improvement is observed in consecutive rounds, the process is terminated early to avoid redundant oscillations. These strategies reflect the algorithm’s emphasis on robustness and computational efficiency. Figure 2 illustrates the overall framework of the ET algorithm and highlights the implementation details of the alignment operation, perturbation mechanism, and hill-climbing optimization strategy. Algorithm 2 illustrates the process of ET algorithm mapping.
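The control mechanisms of the hill-climbing phase can be summarized in the Python sketch below. Several details are assumptions made for illustration: a move is accepted as soon as its gain exceeds 1% of the current cost (first improvement), the candidate sets are redrawn in every round, a single round without improvement ends the search, and the cost change is supplied by a caller-provided function `delta_fn(task, new_tile, tile_of)`. None of these names come from the authors' implementation.

```python
import random


def hill_climb(tile_of, idle, cost, delta_fn, rng=random,
               max_rounds=2, min_gain=0.01, task_frac=0.5, tile_frac=0.9):
    """Incremental hill climbing; updates tile_of and idle in place, returns new cost."""
    for _ in range(max_rounds):
        # Candidate sets: 50% of mapped tasks and 90% of idle tiles, drawn at random.
        t_cand = rng.sample(sorted(tile_of), int(len(tile_of) * task_frac))
        p_cand = rng.sample(sorted(idle), int(len(idle) * tile_frac))
        improved = False
        for task in t_cand:
            for p_new in p_cand:
                if p_new not in idle:               # taken by an earlier accepted move
                    continue
                delta = delta_fn(task, p_new, tile_of)
                if -delta > min_gain * cost:        # accept only reductions above 1%
                    idle.remove(p_new)
                    idle.add(tile_of[task])         # vacated tile becomes idle
                    tile_of[task] = p_new
                    cost += delta
                    improved = True
        if not improved:
            break                                   # early termination
    return cost


if __name__ == "__main__":
    # Toy problem: the cost is the sum of tile indices, so lower tiles are better.
    mapping, free = {0: 8, 1: 9}, set(range(8))
    toy_delta = lambda task, p, m: p - m[task]
    print(hill_climb(mapping, free, 17, toy_delta, random.Random(1)), mapping)
```

Because the cost is updated with the incremental difference after every accepted move, no global re-evaluation is needed inside the loop.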

Algorithm 2: ET Algorithm

4.3. Parameter Settings and Complexity Analysis

4.3.1. Main Parameter Settings

The main control parameters and their configurations for this study are presented in Table 1.

4.3.2. Complexity Analysis

Considering a scenario where the number of cores is n, the computational complexity of the eigendecomposition in spectral clustering is approximately

O (n^{3})

. Therefore, the time complexity of the initialization phase is

O (n^{3} + s \cdot m^{2})

, where m is the number of tiles, and s is the number of initial solutions. The corresponding space complexity for this phase is

O (n^{2} + s \cdot m)

.

In the hill-climbing strategy component of the ET optimization algorithm, the time complexity is

O (m \cdot e)

, where e denotes the number of communication edges. For the overall optimization process, taking into account the number of iterations g and the population size p, the total time complexity is

O (g \cdot p \cdot m \cdot e)

, and the space complexity is

O (p \cdot m + e)

.

5. Simulation Results

This section presents the simulation results of the ET algorithm, which are analyzed and compared with existing mapping methods.

5.1. Simulation Settings and Scenarios

To perform a comparative analysis of different mapping algorithms, this study utilizes the NoCTweak simulator [26]. NoCTweak is a simulation tool designed for NoC research, enabling the evaluation and optimization of communication architecture performance in multi-core processors or SoCs. It allows researchers to flexibly configure NoC parameters and analyze the impact of various designs on key metrics such as latency, throughput, and power consumption. The ET algorithm proposed in this study is implemented in Python (version 3.12) and generates a task mapping scheme, which is then converted into the app format supported by NoCTweak. During the simulation phase, a Python script invokes the NoCTweak executable with the path to the mapping file. The simulator loads the mapping configuration and executes the corresponding simulation, producing the key performance indicators that serve as the basis for the subsequent performance evaluation. Table 2 presents the detailed simulation environment settings used in NoCTweak. NoCTweak supports a wide range of parameterized configurations for simulating various NoC design schemes [26]; only a subset of these parameters is annotated and explained in this study. All experiments, covering both real-world benchmark applications and randomly generated task graphs based on TGFF, were conducted on a computer equipped with a 2.6 GHz Intel Core i7 processor and 8 GB of main memory.

5.2. Analysis and Determination of the Optimal Initial Mapping Location

To demonstrate the effectiveness of the ET algorithm for various practical applications and facilitate a comparative analysis with similar algorithms, this experiment utilizes typical benchmark applications, including PIP, VOPD [27], MPEG-4 [18], MWD, MP3encMP3dec, 263encMP3dec, 263decMP3dec, and 80211ARX. These applications have distinct task decomposition structures and varying numbers of task nodes. Table 3 presents the details of the actual benchmark applications. To illustrate the initial core selection process more specifically, a 4 × 4 2D grid is used as an example. In a 4 × 4 grid network, the first task is randomly assigned to a specific position. However, due to the asymmetry in the core locations within the grid structure, different starting points result in different communication costs. Therefore, this study systematically analyzes all grid locations by initially mapping each position as a starting point and calculating its communication cost separately. By comparing the communication costs of each starting point, the location with the lowest communication overhead is selected as the initial mapping point, which is subsequently optimized by the ET algorithm proposed in this study. Figure 3 illustrates this analysis process, where the green color represents the starting location with the optimal communication cost.

In addition to systematically analyzing the communication overhead associated with each initial position in the grid structure, this study further investigates the impact of the initialization strategy on algorithm performance by selecting the VOPD task graph as a test case. To isolate the effect of initialization, two scenarios were compared while keeping all other algorithm parameters unchanged: one using the proposed initialization strategy and the other employing random initialization. The communication cost during the optimization process was recorded across iterations. The corresponding convergence curves are presented in Figure 4, which clearly shows that the algorithm converges more rapidly when the initial mapping is applied. This result demonstrates that a high-quality initial mapping contributes positively to the convergence efficiency of the ET algorithm.

5.3. Comparative Analysis of Communication Costs

The primary objective of this study is to minimize communication cost, thereby reducing communication energy consumption and average network latency. This section provides a detailed evaluation of the communication cost, comparing results based on real-world benchmark applications and randomly generated task graphs.

5.3.1. Analysis Based on Real-World Benchmarking Applications

In this study, the proposed ET algorithm is compared with existing mapping algorithms, including GA, SA, PSO, and CASTNET, as well as with the ILP-based exact mapping method for 2D NoCs. To ensure a fair comparison, all algorithms were configured with the same population size, and the same maximum number of iterations was used as the stopping criterion. The parameters of the comparison algorithms are set as follows: the initial temperature of the SA algorithm is set to 100, which governs the cooling schedule and the solution acceptance probability; the crossover rate of the GA algorithm is set to 0.6 to promote population diversity; the inertia weight

ω

in the PSO algorithm is set to 0.3 to balance global exploration and local exploitation; and the number of independent mapping attempts in the CASTNET algorithm is dynamically adjusted based on the size of the task graph. It should be noted that the optimal values of algorithm parameters are often closely related to the size and characteristics of the task graph. Therefore, parameter configurations may need to be tuned for different mapping scenarios to achieve optimal performance. While the parameter settings used in this study perform well under the current experimental conditions, their applicability and effectiveness may vary across different task graphs or application scenarios, and further adjustment may be necessary depending on the specific context.

The ILP provides the optimal solution for communication cost estimation [28]. Table 4 compares the communication costs of the ET algorithm and five comparison algorithms across eight real-world benchmark applications. For the 80211ARX task graph containing 24 nodes, when mapped onto a 5 × 5 grid, the theoretical search space comprises

25! / (25 - 24)! \approx 1.5 \times 10^{25}

possible combinations. This enormous search space prevents ILP from finding the optimal solution within a reasonable timeframe. Consequently, optimal communication cost data could not be obtained for comparison, and ILP comparison results for the 80211ARX task are omitted from the table. However, results from the remaining benchmark tests demonstrate that the ET algorithm achieves optimal communication costs for the real-world benchmark applications examined.
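The size of this search space can be checked with a short calculation. The sketch below uses Python's standard `math.perm`, which counts ordered placements of k items chosen from n; the helper name `placements` is illustrative.

```python
import math


def placements(tiles: int, tasks: int) -> int:
    """Ways to place `tasks` distinct tasks on `tiles` distinct tiles, one per tile."""
    return math.perm(tiles, tasks)                  # tiles! / (tiles - tasks)!


if __name__ == "__main__":
    n = placements(25, 24)                          # 80211ARX on a 5 x 5 mesh
    print(f"{n:.2e}")                               # prints 1.55e+25
```

Since (25 - 24)! = 1, the count equals 25!, which is about 1.55 × 10^25 and agrees with the order of magnitude quoted above.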

To verify the stability and reliability of the algorithmic results, this study conducted 20 independent repeated experiments for each algorithm and reported the results in the form of “mean ± standard deviation.” This approach provides a more comprehensive evaluation of each algorithm’s performance in task mapping while ensuring the robustness of the results. Since the ILP algorithm is deterministic, its output is unique and stable; therefore, repeated runs were not necessary, and standard deviation values are not reported. The CASTNET algorithm adopts a static priority-based mapping strategy in this implementation, resulting in fixed outputs without variability, and thus no standard deviation is shown. As shown in Table 4, the standard deviation of the ET algorithm is generally smaller than those of the other heuristic algorithms. This indicates that the ET algorithm not only achieves the lowest communication cost for the real-world benchmark applications but also exhibits small performance fluctuations across runs, reflecting high stability and repeatability.
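For reference, the "mean ± standard deviation" summary of repeated runs can be produced as sketched below. The sketch assumes the sample standard deviation (n - 1 in the denominator), since the text does not state which variant was used, and the input values in the example are placeholders rather than results from Table 4.

```python
import statistics


def summarize(costs):
    """Format the costs of repeated runs as 'mean ± standard deviation'."""
    mean = statistics.mean(costs)
    # Sample standard deviation; a single run has no spread to report.
    std = statistics.stdev(costs) if len(costs) > 1 else 0.0
    return f"{mean:.2f} ± {std:.2f}"


if __name__ == "__main__":
    placeholder_runs = [1.0, 2.0, 3.0]              # made-up values for illustration
    print(summarize(placeholder_runs))
```

A deterministic method returns the same cost in every run, so its standard deviation is zero, which is why no deviation is reported for ILP and CASTNET.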

This section also analyzes the number of iterations required for four algorithms (ET, GA, SA, and PSO) to reach convergence on six real-world benchmark applications (VOPD, MPEG-4, MWD, MP3encMP3dec, 263encMP3dec, and 263decMP3dec). The experimental results, illustrated in Figure 5, show that the ET algorithm requires significantly fewer iterations to converge than GA, SA, and PSO on all six applications. For instance, in the VOPD benchmark, the ET algorithm achieved an optimal communication cost within approximately 50 iterations, whereas the other algorithms required more than 200 iterations to reach comparable performance. These findings demonstrate the faster convergence of the ET algorithm.

5.3.2. Analysis of Randomly Generated Task Graphs Based on TGFF

In this study, the TGFF (Task Graphs For Free) tool [29] is used to randomly synthesize task graphs for evaluating the performance of the ET algorithm at different grid sizes. TGFF is widely used in academic research for task graph generation due to its simplicity and flexibility. By assuming that each task is mapped to a single core, the generated task graph represents a one-to-one correspondence between tasks and cores, making it suitable for use as a core graph. To meet the requirements of this study, the TGFF tool was modified to generate task graphs for 25, 32, and 64 cores. The task graph for 25 cores is mapped to a 5 × 5 grid, the task graph for 32 cores to a 6 × 6 grid, and the task graph for 64 cores to an 8 × 8 2D grid architecture. A unified naming convention is adopted in this study for randomly generated task graphs. Specifically, G25_1 and G25_2 denote two task graphs, each containing 25 cores; G32_3 and G32_4 represent task graphs with 32 cores; and G64_5 and G64_6 correspond to task graphs with 64 cores. This naming convention is consistently applied in the subsequent figures (Figure 6, Figure 7 and Figure 8).

Since the randomly generated task graphs cannot provide the exact optimal communication cost, this study defines an evaluation criterion for these task graphs by fixing the number of iterations of the algorithm. The proposed ET algorithm is compared and analyzed with the PSO, SA, GA, and CASTNET algorithms. The communication cost comparison, presented in Table 5 and Figure 6, demonstrates that the ET algorithm outperforms the other algorithms across different core sizes.

For 25 cores, the ET algorithm reduces the communication cost by 2.57% compared to GA, 5.51% compared to SA, 11.25% compared to PSO, and 11.22% compared to CASTNET. For 32 cores, the ET algorithm achieves a 6% reduction compared to GA, 7.41% compared to SA, 25.54% compared to PSO, and 4.48% compared to CASTNET. With 64 cores, the ET algorithm saves 7.62% compared to GA, 2.33% compared to SA, 41.11% compared to PSO, and 4.33% compared to CASTNET.

Overall, the savings in communication cost are most pronounced relative to PSO and grow with the number of cores, from 11.25% at 25 cores to 25.54% at 32 cores and 41.11% at 64 cores. Even with a smaller number of cores, the ET algorithm consistently shows an advantage in communication cost over the compared algorithms, particularly in comparison to PSO and CASTNET.

5.4. Performance Comparison in Energy Consumption Optimization

This section analyzes and compares the results of energy consumption. Based on the mapping schemes obtained from different algorithms, the corresponding communication energy consumption is calculated. The ET algorithm reduces energy consumption by 5.14%, 4.83%, 25.09%, and 6.36% compared to GA, SA, PSO, and CASTNET, respectively. Overall, these results demonstrate that the proposed mapping technique performs well in terms of energy efficiency. Figure 7 presents the energy consumption of the different algorithms.

5.5. Performance Comparison in Latency Optimization

This section analyzes and compares the results of average network latency. Latency, defined as the time required for a packet to travel from the source node to the destination node, is a crucial metric for evaluating mapping algorithms. In this study, the NoCTweak simulator [26] is used to simulate the average network latency under different mapping schemes. The evaluation results indicate that the ET algorithm reduces latency by 2.14% compared to GA, 1.71% compared to SA, 12.13% compared to PSO, and 2.54% compared to CASTNET. Overall, these results demonstrate that the proposed mapping technique performs effectively in latency optimization. The comparison results are presented in Figure 8.

5.6. Parameter Sensitivity Analysis

To evaluate the stability of the algorithm under critical parameter configurations, this study conducted sensitivity analyses on two key parameters: the number of hill-climbing rounds and the number of clustering neighbors. During the experiments, all other parameters were held constant while the target parameters were varied independently.

The number of hill-climbing rounds determines the depth of a local search. Too few rounds may lead to insufficient exploration of local optima, while too many rounds may significantly increase runtime, reducing overall efficiency. Therefore, the number of rounds was varied from one to four. As shown in Figure 9a, the lowest communication cost and a reasonable runtime were achieved when the number of rounds was set to two. Although additional rounds provided marginal improvements in communication cost, the gains were offset by a sharp increase in runtime. As a result, two rounds were selected as the default setting in this study.

The number of clustering neighbors affects the quality of task partitioning. An insufficient number of neighbors may lead to poor or imbalanced clustering, whereas excessive neighbors can increase computational overhead. To evaluate this trade-off, the algorithm was tested with neighbor counts of two, four, six, and eight. As illustrated in Figure 9b, a neighbor count of six yielded the lowest communication cost while maintaining a moderate runtime, thus achieving a good balance between clustering effectiveness and computational efficiency. Accordingly, six neighbors were chosen as the default configuration for clustering.

Based on the results of the sensitivity analysis, the number of hill-climbing rounds is set to two and the number of clustering neighbors to six in this study, in order to achieve an optimal balance between communication cost and runtime.

6. Conclusions

This paper proposes a two-stage optimization framework that combines the spectral clustering initialization strategy with the ET algorithm for the NoC mapping problem. Through systematic experimental validation, the framework outperforms existing mainstream algorithms in key metrics such as communication cost, energy consumption, and latency.

In this study, an initial mapping strategy based on spectral clustering is employed to provide a strong starting point for the main algorithm. Subsequently, the ET algorithm is applied to perform the mapping. Initially, the ET algorithm allocates 90% of the individuals to the tree pursuit strategy; this share gradually decreases to 30% as the remaining individuals switch to the ground encirclement strategy, and this adaptive switching mechanism helps prevent premature convergence. Additionally, an incremental hill-climbing strategy is introduced to enhance local search capabilities, thereby improving the efficiency of local solution optimization.

To comprehensively evaluate the effectiveness of the proposed algorithm, this paper presents an in-depth analysis based on multiple real-world benchmark applications and randomly generated task graphs, comparing the ET algorithm with several existing mapping algorithms. The experimental results demonstrate that the ET algorithm achieves the optimal communication cost with fewer iterations in experiments involving real-world benchmark applications. In the case of randomly generated task graphs, the ET algorithm not only outperforms the other algorithms in terms of communication cost and energy consumption but also reduces latency. Notably, the algorithm reduces the communication cost by up to 41.11% relative to PSO in the 64-core random task graph experiment.

In summary, this paper proposes an efficient optimization algorithm for the task mapping problem in NoC systems. Extensive experiments conducted on both real-world benchmark applications and randomly generated task graphs demonstrate that the proposed method outperforms several mainstream mapping strategies in terms of communication overhead, adaptability, and mapping quality. These results highlight the algorithm’s performance advantages.

The spectral clustering-based initialization strategy proposed in this study is not only effective within the current algorithmic framework but also has the potential to be integrated into other mainstream task mapping algorithms. Future work will explore the development of generalized initialization strategies that are applicable across different algorithms, aiming to further enhance the effectiveness of task mapping in diverse scenarios. Additionally, while the proposed strategy performs well under conditions of relatively balanced task loads and regular topological structures, its effectiveness in minimizing communication overhead may be limited in cases involving irregular topologies or highly non-uniform task distributions. To address this limitation, future research will focus on designing adaptive initialization strategies that incorporate features of the underlying topology and task communication behavior, thereby improving the adaptability and robustness of the algorithm.

In addition to the XY routing algorithm, NoC systems employ a variety of other routing strategies, such as least-hop routing (e.g., West-First, North-Last) and adaptive routing. These algorithms can dynamically select communication paths based on network conditions, offering the potential to further reduce latency. Future research will investigate the impact of these routing strategies on task mapping quality. This study primarily focuses on mapping task graphs to a 2D-Mesh. Future work will aim to extend the proposed algorithm to support more complex NoC topologies, such as ring-based structures (e.g., Torus) and 3D-Mesh, to better accommodate the diverse requirements of NoC designs.

Author Contributions

Writing—original draft, K.L.; Writing—review & editing, J.S. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Heilongjiang Province under Grant No. LH2021F055 and LH2019F027.

Data Availability Statement

The data presented in this study are available in this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Tosun, S.; Ozturk, O.; Ozkan, E.; Ozen, M. Application mapping algorithms for mesh-based network-on-chip architectures. J. Supercomput. 2015, 71, 995–1017.
2. Sahu, P.K.; Shah, T.; Manna, K.; Chattopadhyay, S. Application mapping onto mesh-based network-on-chip using discrete particle swarm optimization. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2013, 22, 300–312.
3. Dehghani, M.; Montazeri, Z.; Trojovská, E.; Trojovský, P. Coati optimization algorithm: A new bio-inspired metaheuristic algorithm for solving optimization problems. Knowl.-Based Syst. 2023, 259, 110011.
4. Tosun, S. Cluster-based application mapping method for network-on-chip. Adv. Eng. Softw. 2011, 42, 868–874.
5. Choudhary, J.; Sudarsan, C.S. A performance-centric ML-based multi-application mapping technique for regular network-on-chip. Mem.-Mater. Devices, Circuits Syst. 2023, 4, 100059.
6. Weng, X.; Liu, Y.; Xu, C.; Lin, X.; Zhan, L.; Wang, S.; Yang, Y. A machine learning mapping algorithm for NOC optimization. Symmetry 2023, 15, 593.
7. Lei, T.; Kumar, S. A two-step genetic algorithm for mapping task graphs to a network on chip architecture. In Proceedings of the Euromicro Symposium on Digital System Design, 2003. Proceedings, Belek-Antalya, Turkey, 1–6 September 2003; IEEE: Piscataway, NJ, USA, 2003; pp. 180–187.
8. Alagarsamy, A.; Gopalakrishnan, L. MBA: A new cluster based bandwidth and power aware mapping for 2D NoC. In Proceedings of the 2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET), Kottayam, India, 21–22 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–4.
9. Alagarsamy, A.; Gopalakrishnan, L.; Mahilmaran, S.; Ko, S.B. A self-adaptive mapping approach for network on chip with low power consumption. IEEE Access 2019, 7, 84066–84081.
10. Sikandar, S.; Baloch, N.K.; Hussain, F.; Amin, W.; Zikria, Y.B.; Yu, H. An optimized nature-inspired metaheuristic algorithm for application mapping in 2D-NoC. Sensors 2021, 21, 5102.
11. Boroumand, B.; Yaghoubi, E.; Barekatain, B. An enhanced cost-aware mapping algorithm based on improved shuffled frog leap in network on chips. J. Supercomput. 2021, 77, 498–522.
12. Mehmood, F.; Baloch, N.K.; Hussain, F.; Amin, W.; Hossain, M.S.; Zikria, Y.B.; Yu, H. An efficient and cost effective application mapping for network-on-chip using Andean condor algorithm. J. Netw. Comput. Appl. 2022, 200, 103319.
13. Amin, W.; Hussain, F.; Anjum, S.; Saleem, S.; Baloch, N.K.; Zikria, Y.B.; Yu, H. Efficient application mapping approach based on grey wolf optimization for network on chip. J. Netw. Comput. Appl. 2023, 219, 103729.
14. Amin, W.; Hussain, F.; Anjum, S. iHPSA: An improved bio-inspired hybrid optimization algorithm for task mapping in network on chip. Microprocess. Microsystems 2022, 90, 104493.
15. Mohiz, M.J.; Baloch, N.K.; Hussain, F.; Saleem, S.; Zikria, Y.B.; Yu, H. Application mapping using cuckoo search optimization with Lévy flight for NoC-based system. IEEE Access 2021, 9, 141778–141789.
16. Saleem, S.; Hussain, F.; Baloch, N.K. IWO-IGA—A hybrid whale optimization algorithm featuring improved genetic characteristics for mapping real-time applications onto 2D network on chip. Algorithms 2024, 17, 115.
17. Amin, W.; Hussain, F.; Anjum, S.; Khan, S.; Baloch, N.K.; Nain, Z.; Kim, S.W. Performance evaluation of application mapping approaches for network-on-chip designs. IEEE Access 2020, 8, 63607–63631.
18. Sahu, P.K.; Chattopadhyay, S. A survey on application mapping strategies for network-on-chip design. J. Syst. Archit. 2013, 59, 60–76.
19. Chen, Q.; Huang, W.; Huang, Y. The learnable model-based genetic algorithm for the IP mapping problem. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 42, 2350–2363.
20. Wang, X.; Choi, T.M.; Yue, X.; Zhang, M.; Du, W. An effective optimization algorithm for application mapping in network-on-chip designs. IEEE Trans. Ind. Electron. 2019, 67, 5798–5809.
21. Fang, J.; Zong, H.; Zhao, H.; Cai, H. Intelligent mapping method for power consumption and delay optimization based on heterogeneous NoC platform. Electronics 2019, 8, 912.
22. Hu, J.; Marculescu, R. Energy- and performance-aware mapping for regular NoC architectures. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2005, 24, 551–562.
23. Wenbiao, Z.; Zhang, Y.; Shenzhen, G.; Mao, Z.; Harbin, H. Link-load balance aware mapping and routing for NoC. Architecture 2007, 4, 6.
24. Xu, C.; Liu, Y.; Li, P.; Yang, Y. Unified multi-objective mapping for network-on-chip using genetic-based hyper-heuristic algorithms. IET Comput. Digit. Tech. 2018, 12, 158–166.
25. Khan, S.; Anjum, S.; Gulzari, U.A.; Afzal, M.K.; Umer, T.; Ishmanov, F. An efficient algorithm for mapping real-time embedded applications on NoC architecture. IEEE Access 2018, 6, 16324–16335.
26. Tran, A.T.; Baas, B. NoCTweak: A Highly Parameterizable Simulator for Early Exploration of Performance and Energy of Networks On-Chip (Tech. Rep. ECE-VCL-2012-2); VLSI Computation Lab, ECE Department, University of California: Davis, CA, USA, 2012.
27. Aravindhan, A.; Lakshminarayanan, G. SAT: A new application mapping method for power optimization in 2D–NoC. In Proceedings of the 20th IEEE International Symposium on VLSI Design and Test, Guwahati, India, 24–27 May 2016; IIT Guwahati: Guwahati, India, 2016; pp. 1–6.
28. Tosun, S.; Ozturk, O.; Ozen, M. An ILP formulation for application mapping onto network-on-chips. In Proceedings of the 2009 International Conference on Application of Information and Communication Technologies, Baku, Azerbaijan, 14–16 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–5.
29. Dick, R.P.; Rhodes, D.L.; Wolf, W. TGFF: Task graphs for free. In Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE’98), Washington, DC, USA, 15–18 March 1998; IEEE: Piscataway, NJ, USA, 1998; pp. 97–101.

Figure 1. The overall framework of the algorithm.

Figure 2. Schematic diagram of the overall framework and key strategy implementation process of the ET algorithm.

Figure 3. Analysis to find the best starting position for initial mapping on a 4 × 4 2D mesh.

Figure 4. Comparison of convergence curves between initial mapping and random initialization strategies.

Figure 5. Comparison of the number of iterations of ET and three algorithms.

Figure 6. Comparison of communication costs for randomly generated task graphs.

Figure 7. Comparison of energy consumption of different algorithms.

Figure 8. Comparison of the average network latency of different algorithms.

Figure 9. Sensitivity analysis of key parameters of the ET algorithm.

Table 1. Main control parameters.

Parameter	Set Value	Explanation
Population Size	150	Controls the search breadth and the coverage of the solution space.
Maximum Number of Iterations	150	Number of global optimization iterations.
Number of Cluster Neighbors	6	Number of neighbors per node used to construct the shared adjacency matrix.
Maximum Rounds of Hill Climbing	2	Maximum number of iterations in each local search phase.
Minimum Cost Increase Threshold	≥1%	Filters out weak or ineffective perturbations.

To enhance the adaptability of the algorithm across task graphs of varying sizes, certain parameters (e.g., Population Size) are dynamically adjusted based on the size of the task graph during actual experiments. The table lists the commonly used parameter settings.
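As an illustration, the settings in Table 1 could be collected into a single configuration object. The sketch below is a minimal Python example under that assumption; the class name `ETParams` and its field names are hypothetical, and because the rule used to scale the population size with the task graph is not given here, the override value shown is only a placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ETParams:
    """Commonly used ET settings from Table 1 (field names are hypothetical)."""
    population_size: int = 150        # search breadth / solution space coverage
    max_iterations: int = 150         # global optimization iterations
    cluster_neighbors: int = 6        # neighbors per node for the adjacency matrix
    max_hill_climb_rounds: int = 2    # iterations per local search phase
    min_cost_change_ratio: float = 0.01  # the ">=1%" threshold from Table 1


DEFAULTS = ETParams()

# Placeholder override for a larger task graph; 200 is NOT a value from the paper.
larger_graph_params = replace(DEFAULTS, population_size=200)
```

Keeping the object immutable means a per-graph adjustment produces a new instance and leaves the tabulated defaults untouched.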

Table 2. Detailed settings of the simulation environment.

Settings	Details
Network Type	2D Mesh
Platform Type	Embedded
Packet Delivery Type	Without ACK
Sending ACK Policy	Send ACK Optimally
Packet Distribution	Exponential
Fixed Packet Length	8 (flits)
Flit Injection Rate	0.1 (flits/cycle/node)
Routing Algorithm	XY DIMENSION-ORDER
Output Channel Selection	XY-ORDER
Buffer Size	8 (flits)
Pipeline Type	8
Pipeline Stages	4
Operating Clock Frequency	10,000 (MHz)
Warm-Up Time	5000 cycles

“Pipeline type” refers to the category of simulation processing flow used in the system.
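Table 2 lists XY dimension-order routing on a 2D mesh. For readers unfamiliar with it, the following Python sketch shows the usual behavior: a packet first travels along the X dimension until its column matches the destination, then along the Y dimension. The function name `xy_route` and the (x, y) coordinate convention are assumptions made for illustration, not details of the simulator.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh (assumed convention)


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers visited from src to dst, X dimension first, then Y."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the destination column is reached.
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    # Phase 2: move along Y until the destination row is reached.
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
hops = len(route) - 1  # equals the Manhattan distance between src and dst
```

Because the path is fully determined by the source and destination, the hop count between two mapped tasks is simply their Manhattan distance.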

Table 3. Details of actual benchmark applications.

Benchmark	Nodes	Edges	Mesh Size
PIP	8	8	3 × 3
VOPD	16	21	4 × 4
MPEG-4	12	26	4 × 4
MWD	12	13	4 × 4
Mp3EncMp3Dec	13	14	4 × 4
263encMP3dec	12	12	4 × 4
263decMP3dec	14	15	4 × 4
80211ARX	24	42	5 × 5
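The mesh sizes in Table 3 are consistent with choosing the smallest square mesh that offers at least one tile per task. The Python sketch below checks that reading against the table; the function name `smallest_square_mesh` is hypothetical, and the rule is inferred from the listed values rather than stated explicitly.

```python
import math


def smallest_square_mesh(num_tasks: int) -> int:
    """Side length k of the smallest k x k mesh with at least num_tasks tiles."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be positive")
    side = math.isqrt(num_tasks)
    if side * side < num_tasks:
        side += 1
    return side


# (number of nodes, mesh side) pairs as listed in Table 3.
TABLE3 = {
    "PIP": (8, 3),
    "VOPD": (16, 4),
    "MPEG-4": (12, 4),
    "MWD": (12, 4),
    "Mp3EncMp3Dec": (13, 4),
    "263encMP3dec": (12, 4),
    "263decMP3dec": (14, 4),
    "80211ARX": (24, 5),
}

matches = {name: smallest_square_mesh(n) == k for name, (n, k) in TABLE3.items()}
```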

Table 4. Comparison of communication cost for real-world benchmark applications.

Algorithm	PIP	MWD	MPEG-4	VOPD	MP3enc	263enc	263dec	80211ARX
ILP	640	1120	3567	4119	17.021	230.407	19.823	-
CASTNET	640	1120	3567	4135	17.021	230.407	19.823	12,737.625
GA	640	1120	3567	4119	17.021	230.407	19.823	12,736.700
GA (Mean ± Std)	640 ± 0	1242 ± 71	3605 ± 53	4188 ± 89	17.189 ± 0.26	230.457 ± 0.12	19.989 ± 0.11	13,311.329 ± 577.58
SA	640	1120	3567	4119	17.021	230.407	19.823	12,735.650
SA (Mean ± Std)	640 ± 0	1155 ± 52	3622 ± 76	4138 ± 8	17.084 ± 0.12	230.419 ± 0.01	19.959 ± 0.10	12,977.043 ± 331.69
PSO	640	1120	3567	4125	17.021	230.407	19.823	12,780.925
PSO (Mean ± Std)	640 ± 0	1317 ± 88	3695 ± 101	4253 ± 157	17.318 ± 0.31	234.279 ± 8.73	20.159 ± 0.29	13,845.923 ± 747.85
Proposed ET	640	1120	3567	4119	17.021	230.407	19.823	12,733.975
Proposed ET (Mean ± Std)	640 ± 0	1120 ± 0	3567 ± 0	4120 ± 3	17.021 ± 0	230.408 ± 0.002	19.837 ± 0.01	12,776.904 ± 41.02

Table 5. ET communication cost saving (%) over other algorithms.

Task Graph	Over GA	Over SA	Over PSO	Over CASTNET
25 CORES	2.57%	5.51%	11.25%	11.22%
32 CORES	6.00%	7.41%	25.54%	4.48%
64 CORES	7.62%	2.33%	41.11%	4.33%
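A saving percentage of the kind reported in Table 5 is commonly computed as the cost reduction relative to the competing algorithm's cost. The Python sketch below assumes that definition; the function name `cost_saving_percent` is hypothetical. Since the raw costs behind Table 5 are not listed here, the usage line applies the formula to the 80211ARX values of PSO and ET from Table 4 purely as an illustration.

```python
def cost_saving_percent(baseline_cost: float, et_cost: float) -> float:
    """Relative saving of ET over a baseline, in percent (assumed definition)."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - et_cost) / baseline_cost * 100.0


# 80211ARX communication costs from Table 4: PSO 12,780.925 vs. ET 12,733.975.
saving_over_pso = cost_saving_percent(12780.925, 12733.975)
```

Under this definition a positive value means ET found a cheaper mapping than the baseline, and zero means the two costs are equal.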

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
