An Optimized Nature-Inspired Metaheuristic Algorithm for Application Mapping in 2D-NoC

Mapping application task graphs onto intellectual property (IP) cores in a network-on-chip (NoC) is a non-deterministic polynomial-time (NP)-hard problem. Network performance depends mainly on an effective and efficient mapping technique and on the optimization of performance and cost metrics, chiefly power, reliability, area, thermal distribution and delay. This paper introduces a mapping technique for NoC based on the sailfish optimization algorithm (SFOA). The proposed algorithm minimizes the power dissipation of the NoC by building an empirical base for the initial mapping with a shared k-nearest neighbor clustering approach, and it produces mappings faster on the six standard benchmarks considered. The experimental results indicate that the proposed technique outperforms existing nature-inspired metaheuristic approaches, especially for large application task graphs.


Introduction
The overall performance and scalability of the system-on-chip (SoC) degrade as the number of intellectual property (IP) cores embedded on the SoC increases. To improve the overall performance and flexibility of the SoC, the network-on-chip (NoC) has been proposed as a promising solution [1]. An NoC is an on-chip, packet-switched communication network designed for interaction between the IP cores of SoC designs [2]. Routers (the switch fabric) are linked in a standard topology to enable communication among IP cores, and every IP core in an NoC has its own router. Because the router is the basic building block of the NoC architecture, a fault-resilient router architecture is necessary for reliable on-chip communication. The authors of [3][4][5][6] modified existing NoC router designs to propose a reliable on-chip network communication infrastructure. Data are exchanged between IP cores by message passing. In multi-core systems, the NoC accounts for around 40% of the total system power consumption, and it plays a vital role in network performance [1,7]. The power, latency and area of NoC-based systems are strongly affected by the choice of on-chip interconnection architecture [7]. Numerous standard topologies have been established for NoC interconnection networks, and the best known of these conventional topologies is the mesh [8].
The mesh topology provides short communication paths between IP cores and a high bisection width. Its interconnect structure is regular and fixed, and all links have the same length. In this context, various application mapping techniques have been proposed using search-based and exact optimization methods. In addition, analytical models have been investigated to reduce the area, latency and power of NoCs.
Application mapping is known to be a non-deterministic polynomial-time (NP)-hard problem: the computation time needed to solve it exactly grows rapidly with the size of the application to be mapped. Search-based optimization techniques have therefore been used to obtain optimal or near-optimal solutions for NoC performance metrics, and the quality of the solution depends strongly on the choice of heuristic or metaheuristic technique.
In practical systems, resources are limited, so using the available resources efficiently is a critical issue. Optimization techniques are employed in a wide range of areas, including engineering, finance, resource planning and Internet routing. Metaheuristic algorithms, often built on mathematical models of natural, social or political processes, provide effective ways to solve such optimization problems. They can obtain global solutions by combining high-level search strategies with local improvement methods.
Furthermore, a metaheuristic algorithm is efficient only if it strikes a good balance between exploration and exploitation on the given optimization problem. Intensification (exploitation) is associated with local search, while diversification (exploration) is associated with global search. Diversification seeks diverse solutions across the whole search space, whereas intensification searches local regions using knowledge of the current best solution in that region. Global search requires no initial solution, while local search starts from an initial candidate solution. During exploration, the movement of candidate solutions should be as random as possible; exploitation, in contrast, involves a thorough investigation of the promising area(s). In general, current metaheuristic algorithms differ mainly in how they balance exploration and exploitation. For this reason, sailfish optimization (SFO) is considered in this study.
SFO provides a suitable balance between intensification (exploitation) and diversification (exploration), which helps to avoid premature convergence. SFO, the metaheuristic optimization algorithm used in this paper to optimize NoC performance metrics, was introduced in [9]. The SFO algorithm is modeled on a group of sailfish targeting a school of sardines in a series of attacks. First, SFO uses two populations, predators (sailfish) and prey (sardines), to replicate group hunting. Second, the algorithm breaks down the collective defense of the grouped prey through alternating attacks. Third, the prey can move across the search region, allowing the hunter to capture the right prey and improve its fitness. The effectiveness of the SFO algorithm is verified by examining the optimal mapping for six NoC benchmarks on the two-dimensional (2D) mesh topology.
The remainder of the paper is structured as follows. Section 2 reviews related work. Section 3 describes the inspiration for the sailfish optimization algorithm. Sections 4-6 describe the mapping using SFO, the models used to analyze the metrics and the proposed algorithm, respectively. Section 7 presents and analyzes the experimental setup, the benchmarks considered and the results. Section 8 concludes the paper.

Related Work
In [10], Araki and Yoshihiro presented a multi-path routing strategy that extends reliable distance-vector routing (RDV) with multiple paths to improve communication performance, reduce delivery delay, improve load balancing and increase network capacity. Compared with RDV, it also offers greater fault tolerance against topology changes. In [4], Rashid et al. proposed a reliable on-chip network communication architecture by improving existing NoC router designs. In [11], a router controller design based on a finite-state machine (FSM) is presented that minimizes error propagation while using few logic resources.
In [12], Wu and Cai presented a Fibonacci tree optimization strategy (FTOS) for query scheduling in wireless sensor networks; the algorithm reduced energy consumption and optimized detection efficiency. In [13], Rhee et al. combined an artificial neural network (ANN) model with the genetic algorithm (GA) for the cost-effective operation of a silo. The combined technique improved the accuracy of predicting the silo's internal level and determined an efficient number of sensors and their installation positions. In [14], the authors presented a comprehensive overview of machine learning algorithms for embedded systems and mobile computing. In [15], the authors presented a heuristic technique based on the moth-flame optimization (MFO) algorithm to address the weak exploration of the k-means data clustering algorithm.
Application mapping has attracted the attention of the research community because of the rapid growth of NoC. Tosun et al. [8] proposed integer linear programming (ILP) as an exact mapping method for mesh-based 2D NoC that minimizes energy. In [16], Hu and Marculescu presented a branch and bound (BB) solution that places IP cores on an NoC platform to minimize total energy consumption under link bandwidth constraints. In [17], Lei et al. presented a delay-based two-step GA for NoC communication whose main objective in scheduling and mapping IPs was to minimize the overall execution time. In [18], Murali and Micheli proposed a heuristic algorithm for mapping cores onto a 2D mesh topology under bandwidth reservation constraints. In [19], Lu et al. presented a clustering algorithm based on simulated annealing that reduces the simulation time of the annealing process for large systems; clustering sacrificed some optimality but shortened the computation time. In [20], Radu and Vintan proposed an optimized simulated annealing (OSA) algorithm for 2D mesh mapping that tunes the parameters of the annealing process to produce optimal results in less time than conventional simulated annealing schemes. Ascia et al. [21] presented a multi-objective GA for mapping IP cores onto a 2D mesh topology that optimizes power consumption and network performance. In [22], Jena and Sharma presented a heuristic-search-based multi-objective GA for mapping IP cores onto a 2D mesh topology that optimizes link bandwidth, network performance and power dissipation. Sepúlveda et al. [23] also presented a multi-objective adaptive immune algorithm (MAIA) for the application mapping problem of NoC architectures.
In [24], Harmanani and Farah proposed a simulated-annealing-based algorithm for assigning tasks to the nodes of a 2D mesh network. Hu et al. [25] proposed a bandwidth-constrained task mapping technique for the NoC architecture. This energy-aware technique shortened the run time of task mapping, but at the cost of some network performance.
Ye et al. [26] derived power models for interconnect wires, switches and internal buffers. In [27], the authors provided a widely accepted mathematical formulation of 2D NoC interconnect energy. Kahng et al. [28] and Ost et al. [29] developed a practical power model for 2D NoC as a follow-up to the one in [27]. The power model in [28] takes into account architecture-level power as well as area modeling and router capacity. Ost et al. [29] validated the power model of [28]. The authors of [30] evaluated the efficiency of mesh-based 2D and 3D NoCs in terms of the energy consumed between the cores and in the routing area. The survey by Sahu and Chattopadhyay [31] comprehensively reviews mapping techniques for NoC and examines the mapping methods proposed in the preceding years. According to [31], heuristic-based mapping strategies give better results in optimizing network performance metrics.
In [32], a simulated annealing (SA) algorithm is used as a metaheuristic to create an efficient mapping for 2D NoC with IP connectivity specifications as a constraint. The authors of [33] performed mapping and scheduling with an ant colony optimization (ACO) approach for 2D NoC. In [34], particle swarm optimization (PSO) is used as a mapping technique for both 2D and 3D NoCs, with the connectivity metric as the objective function. To tackle the problem addressed in [32], Alagarsamy and Gopalakrishnan [35] proposed a power-aware mapping technique for 2D NoC that combines SA with tabu search (SAT). In [36], a mapping technique for 2D NoC is presented whose main objective is to build a chain of linked cores from which a new mapping can be constructed; compared with similar techniques, it aims to use less bandwidth. In [37], Tosun presented a heuristic approach for 2D mesh NoC that builds a priority list based on the total and average communication bandwidth.
In [38], a reliability-aware technique is presented in which the application graph is partitioned into two sub-graphs so that traffic between them is reduced, while traffic within each sub-graph increases. Niknam and Amiri [39] presented a novel hybrid PSO-based approach to the clustering problem that incorporates ACO and k-means for better performance. The approach was tested on various publicly available datasets with encouraging preliminary results, and the hybrid was shown to converge to the optimal solution in most instances. Junior et al. [40] also presented an ACO-based approach for finding and optimizing routes in a mesh-based NoC; routes were optimized by reducing the total packet transmission delay between tasks, and the reported results showed the efficiency of the ACO-based technique. In addition, Xie et al. [41] proposed an online mapping approach that refines task mapping to minimize communication power consumption. First, the run-time communication behavior of applications was investigated; second, the method evaluated candidate mapping assignments and performed the mapping in real time.

Sailfish Optimizer
In this section, the key inspiration for the SFO algorithm (SFOA) is discussed. The suggested algorithm and mathematical models are then thoroughly explained.

Inspiration
Shadravan et al. [9] recently introduced a new metaheuristic technique called SFO, which models the behavior of both a predatory group of sailfish and a prey group of sardines. The sailfish is known as a social predator because it attacks and catches its prey in groups. Predators use various killing techniques in cooperative hunting; sailfish, for example, are distinguished by alternating attacks. Each member of the group attacks the school of prey (sardines) alone at a given time, injuring or catching some of them while the other group members conserve their strength. Whenever a sailfish attacks a school of prey, it updates its location relative to the prey. The sailfish also update their locations to occupy the vacant space around the prey school, simulating the encircling of the prey. When a member of the sardine group (prey) is wounded, the group changes direction to avoid the sailfish's subsequent attacks. The general procedure of the sailfish optimizer algorithm is defined in the subsections that follow.
Group hunting is an intriguing example of collective activity in communities of invertebrates, fishes, birds and mammals. Compared with hunting alone, predators that hunt in groups need less energy to kill their prey.
In the most basic type of group hunting, predators aim to finish off the prey through step-by-step planning of the attack, whereas in more sophisticated group hunting, predators take specialized positions to mob and capture the prey [42]. Alternating attacks is one of the most complex group hunting techniques. This tactic allows a hunter to save strength while other predators injure the prey. Sailfish hunting in groups that alternate attacks on schooling sardines are an example of this kind of method [43,44].
Sailfish are the fastest fish in the ocean and can reach speeds of up to 62 miles per hour. They hunt in groups, herding schools of smaller fish, such as sardines, near the surface. During the attack, the sardines' mobility and speed make them difficult for the sailfish to catch. The sailfish either slashes several sardines with its rostrum or taps a single sardine, destabilizing it. Because the sailfish has one of the fastest accelerations ever observed in a swimming animal, sardines cannot swim quickly enough to dodge the tip of its rostrum and cannot respond effectively to this group hunting. Experimental observations of sardine behavior indicate that wounded sardines become isolated from the shoal and cannot travel with it, which leads to their capture by the sailfish [42].
Most sailfish attacks do not kill sardines, and only a small percentage of sardines are caught directly. However, as sailfish attacks become more frequent, an increasing number of sardines are injured. This form of hunting is more common among animals that hunt in groups, such as wolves; sailfish groups, however, split up and regroup with new members daily. During an attack, a sailfish keeps its large dorsal fin and pelvic fins upright to stabilize its body. Often, right before an attack, sailfish change their body color from their usual bluish silver, with parallel stripes, to nearly black. The purpose of this color change is unclear, but it appears to be a form of communication between sailfish [42]: sailfish use these body changes to signal which individual should move first, allowing them to avoid being injured by a companion. The attack-alternation technique of sailfish group hunting is the key inspiration for the SFO algorithm. The natural behavior of sailfish and sardines is represented mathematically in the following subsection, and an optimization approach based on this model is developed.

Problem Formulation
In NoC, an application is represented by a directed task graph, which a scheduler then assigns to the available IP cores, yielding a directed core graph. The core graph is in turn mapped onto the NoC topology, represented by an architecture graph, using an effective mapping method.

Definition 1. Directed Task Graph (DTG):
The task graph of the network is a directed acyclic graph DTG(P, E), where every node represents a computational task of the application and every directed edge (link) represents the communication, or data volume, between two communicating tasks.
Here, P is the set of nodes (processes or tasks) and E is the set of edges (links), with $p_i \in P$ and $e_{i,j} \in E$ for i, j = 1, 2, 3, ….

Definition 2. Directed Core Graph (DCG):
The core graph of the NoC architecture is a directed graph DCG(C, D), where every node represents an IP core in the topology and every directed edge represents direct communication between two nodes (i.e., IP cores $c_i$ and $c_j$).
Here, C is the set of IP cores (processing elements) and D is the set of directed edges (links) in the architecture graph, with $c_i \in C$ and $d_{i,j} \in D$ for i, j = 1, 2, 3, ….
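As a rough illustration of Definition 1, the following Python sketch stores a small DTG as a mapping from directed edges to communication volumes and checks that it is acyclic with Kahn's algorithm. The task names and volumes are hypothetical and chosen only for illustration; the paper does not specify a data structure.

```python
from collections import deque

# Hypothetical toy DTG: (source task, destination task) -> communication volume.
DTG_EDGES = {
    ("t0", "t1"): 70,
    ("t1", "t2"): 362,
    ("t2", "t3"): 362,
    ("t0", "t3"): 49,
}

def nodes_of(edges):
    """Collect the node set P from the edge set E."""
    nodes = set()
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
    return nodes

def is_acyclic(edges):
    """Kahn's algorithm: a directed graph is a DAG iff every node can be
    removed in topological order."""
    nodes = nodes_of(edges)
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        successors[src].append(dst)
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed == len(nodes)

print(sorted(nodes_of(DTG_EDGES)))  # ['t0', 't1', 't2', 't3']
print(is_acyclic(DTG_EDGES))        # True
```

The same edge-to-weight dictionary can describe the DCG, with IP cores in place of tasks; only the DTG is required to be acyclic.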

SFOA for NoC Mapping
The initial sailfish and sardine populations are generated from the initial mapping and the weights of the task graph at time t = 0. Given the parameter settings of the proposed algorithm, the fitness value, i.e., the communication cost (CC) of the best sailfish (mapping solution), is computed (CC is defined in Equation (8) in Section 5). The positions of the sailfish and sardines are then updated according to the attack power (AP) (see Equations (21) and (27) for the sailfish and sardine position updates, respectively, and Equation (24) in Section 6 for AP). After the positions are updated, the optimized mapping (sailfish) is obtained.

Parameters Setting for SFOA
The proposed algorithm requires a few basic parameters to be set to verify the efficiency of group hunting. The fitness function is the communication cost, denoted by CC. For application mapping on 2D NoC, the population size is set to 300, the number of iterations to 150, and pp, the ratio between the sailfish and sardine populations (defined as the fraction of the sardine population that forms the initial sailfish population), to 0.1. These values were chosen based on the number of iterations run and the optimization achieved in reaching an optimal solution, and they may differ depending on the properties of the application being mapped.
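A minimal sketch of the parameter settings above as a Python dataclass. It assumes that the population size of 300 counts sardines and that the initial sailfish population is pp times that size; the text does not state the split explicitly, so `initial_sizes` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SFOAParams:
    population: int = 300  # population size reported in the text
    iterations: int = 150  # number of iterations reported in the text
    pp: float = 0.1        # fraction of sardines forming the initial sailfish population

    def initial_sizes(self):
        """Return (n_sailfish, n_sardine), assuming `population` counts sardines
        and the sailfish population is pp * population, rounded."""
        n_sardine = self.population
        n_sailfish = max(1, round(self.pp * n_sardine))
        return n_sailfish, n_sardine

params = SFOAParams()
print(params.initial_sizes())  # (30, 300)
```

Freezing the dataclass keeps a run's settings fixed once chosen; tuned values for a different application can be passed as keyword arguments.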
Two models are used in this work to analyze NoC performance parameters such as energy, power and communication cost, together with latency and average throughput: the bit energy model and the CMOS cell library model. Their mathematical expressions are explained in detail in the next section.

Models Used for Analysis of Metrics
Two models are considered in this work for analyzing the performance metrics of an NoC [25,43]. In this study, SFOA provides an effective trade-off between fast mapping on the 2D mesh and the performance metrics of the NoC.

Bit Energy Model
The energy model of [25] is used to estimate the power consumed by the routers in the network. In this model, $E_B$ is the energy consumed in transferring 1 bit of data from the source node to the destination node; it comprises the switch energy ($E_{SB}$) and the link energy ($E_{LB}$) of the NoC. The average network energy $E_{B(p_i,p_j)}$ for transferring 1 bit of data from a source node $p_i$ to a destination node $p_j$ depends on $H_{count}$, the Manhattan distance between the source node $(a_i, a_j)$ and the destination node $(b_i, b_j)$. The total energy consumption of the network ($E_T$) is then calculated from the average network energy and the link bandwidth $BW_{(p_i,p_j)}$ between nodes $p_i$ and $p_j$.
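For reference, one common way of writing the bit energy model of [25] with the symbols defined above is sketched below, with the sums running over communicating task pairs. It assumes that $H_{count}$ counts the links on an XY route, so a bit crosses $H_{count} + 1$ switches; with $n_{hops} = H_{count} + 1$ this is the familiar $n_{hops}E_{SB} + (n_{hops} - 1)E_{LB}$ form. The paper's own equations may use a different hop convention.

```latex
\begin{aligned}
H_{count} &= \lvert a_i - b_i \rvert + \lvert a_j - b_j \rvert \\
E_{B(p_i,p_j)} &= (H_{count} + 1)\,E_{SB} + H_{count}\,E_{LB} \\
E_T &= \sum_{(p_i,p_j)} BW_{(p_i,p_j)}\,E_{B(p_i,p_j)}
     = \sum_{(p_i,p_j)} BW_{(p_i,p_j)}\big[(H_{count} + 1)\,E_{SB} + H_{count}\,E_{LB}\big] \\
CC &= \sum_{(p_i,p_j)} BW_{(p_i,p_j)}\,H_{count}
\end{aligned}
```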
Substituting Equation (4) into Equation (6) expresses $E_T$ directly in terms of the hop count, and the communication cost (CC) is defined by Equation (8). Different mappings yield different energy and cost values, so the main goal is to find a mapping that minimizes the cost for the whole network. In this work, the communication cost is used as the performance measure for the different applications.
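The communication cost and total energy can be sketched in Python for a 4 × 4 mesh with row-major tile numbering. The per-bit energies `E_SB` and `E_LB` are placeholder values, not figures from the paper, and the energy function assumes that a route of H links crosses H + 1 switches.

```python
# Placeholder per-bit energies in joules (assumed values, not from the paper).
E_SB = 0.43e-12  # switch energy per bit
E_LB = 0.45e-12  # link energy per bit

MESH_COLS = 4  # 4 x 4 mesh, tiles numbered row-major 0..15

def coords(tile):
    """Row-major tile index -> (x, y) coordinates on the mesh."""
    return tile % MESH_COLS, tile // MESH_COLS

def hop_count(src_tile, dst_tile):
    """Manhattan distance, i.e. the number of links on an XY route."""
    (ax, ay), (bx, by) = coords(src_tile), coords(dst_tile)
    return abs(ax - bx) + abs(ay - by)

def communication_cost(edges, mapping):
    """CC = sum over task-graph edges of bandwidth * hop count."""
    return sum(bw * hop_count(mapping[s], mapping[d]) for (s, d), bw in edges.items())

def total_energy(edges, mapping):
    """E_T with H links and H + 1 switches per route (assumed convention)."""
    total = 0.0
    for (s, d), bw in edges.items():
        h = hop_count(mapping[s], mapping[d])
        total += bw * ((h + 1) * E_SB + h * E_LB)
    return total

edges = {("t0", "t1"): 70, ("t1", "t2"): 362, ("t0", "t2"): 49}
mapping = {"t0": 0, "t1": 1, "t2": 5}       # tiles (0,0), (1,0), (1,1)
print(communication_cost(edges, mapping))  # 70*1 + 362*1 + 49*2 = 530
```

Because CC depends only on bandwidths and hop counts, it can be evaluated for every candidate mapping without choosing energy constants, which is why it is convenient as the fitness function.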

CMOS Cell Library Model
The proposed SFO algorithm uses the standard CMOS cell library model [43] to calculate the network power, latency, packet energy consumption and throughput of an NoC system. In this model, the average network latency is computed from the packet latencies, where N is the total number of processors or cores in the network, $N_i$ is the total number of packets received by core $i$ and $Lat_{(i,j)}$ is the latency of packet $j$ at destination node $i$. The average network throughput, $TP_{avg}$, is evaluated using $T_W$, the warm-up time of the simulation, and $T_S$, the simulation time.
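The latency and throughput definitions can be sketched as follows. Because the equations themselves are not reproduced in this text, the averaging below is an assumption: latency is averaged per core and then across the cores that received packets, and throughput is the number of received packets per core per cycle after the warm-up time $T_W$, with $T_S$ taken as the total simulated time.

```python
def average_latency(latencies_per_core):
    """latencies_per_core[i] lists Lat(i, j) for every packet j received by core i.
    Averages per core first, then across cores that received packets (assumed form)."""
    per_core = [sum(lats) / len(lats) for lats in latencies_per_core if lats]
    return sum(per_core) / len(per_core)

def average_throughput(received_per_core, t_warmup, t_sim):
    """Packets received per core per cycle after warm-up (assumed form)."""
    n_cores = len(received_per_core)
    return sum(received_per_core) / (n_cores * (t_sim - t_warmup))

lat = [[10, 12], [8], [], [14, 16, 18]]
print(average_latency(lat))                           # (11 + 8 + 16) / 3, about 11.67
print(average_throughput([2, 1, 0, 3], 1000, 11000))  # 6 / (4 * 10000) = 0.00015
```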
The average network power, $P_{Navg}$, is computed from $\alpha_{(i,k)}$, the probability that component $k$ in router $i$ is active after $T_W$, together with $P_{N(act,k)}$ and $P_{N(inact,k)}$, the post-layout active and inactive power of component $k$. Finally, the average network energy consumed per packet is obtained using N, the total number of cores in the network, and $N_{pack} = \sum_{i=1}^{N} N_i$, the total number of packets injected into the network. For a given set of experiments, N stays the same, whereas $N_{pack}$ changes as the packet injection rate is increased or decreased.
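A similar sketch for the power and per-packet energy metrics, assuming that $P_{Navg}$ sums the activity-weighted active and inactive power of every component in every router, and that the energy per packet is $P_{Navg}$ multiplied by the measurement time and divided by $N_{pack}$. The component powers and activity values in the example are made up.

```python
def network_average_power(alpha, p_active, p_inactive):
    """alpha[i][k]: probability that component k of router i is active after warm-up.
    p_active[k], p_inactive[k]: post-layout active/inactive power of component k (W).
    Assumed form: sum over routers and components of the activity-weighted power."""
    total = 0.0
    for router_alpha in alpha:
        for k, a in enumerate(router_alpha):
            total += a * p_active[k] + (1.0 - a) * p_inactive[k]
    return total

def energy_per_packet(p_navg, measured_time_s, received_per_core):
    """Average energy per packet: P_Navg * measurement time / N_pack (assumed form)."""
    n_pack = sum(received_per_core)  # N_pack = sum over cores of N_i
    return p_navg * measured_time_s / n_pack

# Two routers with two components each (e.g. buffer and crossbar); made-up numbers.
alpha = [[0.5, 0.2], [1.0, 0.0]]
p_act, p_inact = [2e-3, 1e-3], [1e-4, 5e-5]
p = network_average_power(alpha, p_act, p_inact)
print(p)
print(energy_per_packet(p, 1e-5, [100, 150]))
```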

The Proposed Algorithm: SFOA
The proposed SFOA takes as inputs the application graph (the directed core graph, DCG) and the directed network graph (DNG), and maps the tasks onto the cores of the 2D NoC topology.

Empirical Base for Initial Mapping
To create the empirical base for the initial mapping, the following five steps of the self-adaptive chicken swarm optimization (SCSO) algorithm [44] are used. Figure 1 shows the flowchart of the initial mapping procedure.

• Step 1: From the DCG, randomly select an IP core, $\mathrm{Rand}(c_i)$, $c_i \in C$. (13)
• Step 2: Use the DC matrix to determine whether the selected core has a direct connection with each other core.
• Step 3: Calculate the average CC ($A_i$) and the weight ($W_i$) of each core $c_i$, where $w_{ij}$ is the weight between cores $c_i$ and $c_j$ and $N(c_i)$ is the open neighborhood of $c_i$, i.e., the set of cores directly connected to $c_i$.
• Step 4: Identify the hop counts between each source node $c_i$ and sink node $c_j$ using the hop-count matrix H, whose $(i, j)$ element $H_{ij}$ gives the minimum number of links needed for communication between the source and sink nodes. If $d(c_i, c_j)$ is the shortest path between cores $c_i$ and $c_j$, $N(c_i, c_j)$ is the number of hops on that path.
• Step 5: Form clusters using the shared K-nearest neighbor (SNN) clustering approach. An edge exists between $c_i$ and $c_j$ if each appears in the other's list of K nearest neighbors. The strength of this edge is computed from K, the size of the neighbor list, and from o and p, the positions of a shared near neighbor in the lists of $c_i$ and $c_j$, respectively; that is, $c_{io}$, the shared near neighbor in the list of $c_i$, is the same core as $c_{jp}$, the shared near neighbor in the list of $c_j$.

After Step 5, an empirical base is created with the clustered DCG. Figures 2-4 show the standard NoC video object plane decoder (VOPD) benchmark, the clustering of the VOPD task graph and its initial mapping on a 4 × 4 mesh, respectively.
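Steps 4 and 5 can be sketched in Python. The hop-count matrix is filled by breadth-first search on an adjacency list of the core graph, and the SNN edge strength uses the standard shared-nearest-neighbour form, summing $(K+1-o)(K+1-p)$ over neighbours that appear in both lists; that this is exactly the paper's strength expression is an assumption, as are the example graph and neighbour lists.

```python
from collections import deque

def hop_matrix(adjacency):
    """H[i][j] = number of hops on a shortest path from core i to core j
    (BFS on an unweighted directed graph); -1 marks unreachable pairs."""
    n = len(adjacency)
    H = [[-1] * n for _ in range(n)]
    for src in range(n):
        H[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if H[src][v] == -1:
                    H[src][v] = H[src][u] + 1
                    queue.append(v)
    return H

def snn_strength(knn_i, knn_j, i, j):
    """Shared-nearest-neighbour edge strength between cores i and j.
    knn_i, knn_j: K-NN lists ordered nearest first (equal length K assumed).
    An edge exists only if i and j are in each other's lists; its strength
    sums (K + 1 - o) * (K + 1 - p) over neighbours shared at positions o and p."""
    if j not in knn_i or i not in knn_j:
        return 0
    k = len(knn_i)
    strength = 0
    for o, c in enumerate(knn_i, start=1):
        if c in knn_j:
            p = knn_j.index(c) + 1
            strength += (k + 1 - o) * (k + 1 - p)
    return strength

mesh_2x2 = [[1, 2], [0, 3], [0, 3], [1, 2]]  # bidirectional 2 x 2 mesh
print(hop_matrix(mesh_2x2)[0][3])            # 2
print(snn_strength([1, 2, 3], [0, 2, 3], 0, 1))  # (4-2)*(4-2) + (4-3)*(4-3) = 5
```

Neighbours shared near the front of both lists contribute the most, so cores whose closest neighbours coincide end up in the same cluster.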

Video Object Plane Decoder
The video object plane decoder (VOPD) is an application composed of several sub-tasks, such as a run-length decoder, a downsampler and a quantizer. These sub-tasks communicate with one another at the rates (in MB/s) specified on the edges between them. Figure 5 shows the architectural diagram of the VOPD, and Figure 2 shows the graphical representation of the VOPD tasks. VOPD consists of 16 sub-tasks connected by 21 edges, each labeled with its communication bandwidth.
In the initial phase of mapping, a random procedure is adopted as the mapping strategy. The outcome of this initial mapping serves as the input to the proposed SFOA, which minimizes the power consumption and communication cost of the 2D NoC. Figure 6 shows the flowchart of SFOA.
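A minimal sketch of the random initial mapping, which places each task on a distinct tile of the mesh. Because SFO moves agents in a continuous space, the sketch also shows random-key decoding, a common way of turning a real-valued position vector into a permutation of tiles; the paper does not say how positions are decoded into mappings, so `decode_random_keys` is a hypothetical helper.

```python
import random

def random_mapping(tasks, n_tiles, rng):
    """Assign each task to a distinct tile chosen uniformly at random."""
    if len(tasks) > n_tiles:
        raise ValueError("more tasks than tiles")
    tiles = rng.sample(range(n_tiles), len(tasks))
    return dict(zip(tasks, tiles))

def decode_random_keys(keys, tasks):
    """Random-key decoding (assumption): tiles are ranked by their key, and the
    r-th task is placed on the tile with the r-th smallest key.
    Requires len(keys) (number of tiles) >= len(tasks)."""
    order = sorted(range(len(keys)), key=lambda t: keys[t])
    return {task: order[r] for r, task in enumerate(tasks)}

rng = random.Random(42)
tasks = [f"t{i}" for i in range(12)]  # e.g. a 12-task application on 16 tiles
m = random_mapping(tasks, 16, rng)
print(decode_random_keys([0.3, 0.1, 0.2], ["a", "b", "c"]))  # {'a': 1, 'b': 2, 'c': 0}
```

Any real-valued position vector decodes to a valid one-to-one mapping, so continuous SFO updates never produce infeasible placements under this scheme.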

Aristocracy
A sailfish hunts sardines while exploring the search region, updating its location to find a better solution. When the location of a sailfish (the search agent in this algorithm) is updated, better solutions may be lost, because the updated position can be worse than the previous one. Elitism (aristocracy) is therefore applied.
Aristocracy identifies the best sailfish (by fitness value) and the best injured sardine, and copies these best solutions unchanged into the next generation. The best position of the search agent (sailfish) is kept in every iteration and stored as the Elite. The best, or fittest, sailfish found so far is the Elite sailfish; it affects the maneuverability and acceleration of the sardines during an attack. The location of the injured sardine is also saved in every iteration, and the sailfish treat it as the best target for group hunting.
Next, the fitness of each sailfish and sardine in the population is calculated using the fitness function (CC in the proposed algorithm). Based on these values, the Elite (the best sailfish) and the injured sardine are identified. The best sailfish is the one with the smallest fitness value at iteration itr.
$XSF^{itr}_{best} = \{XSF^{itr} \mid \text{sailfish with the smallest fitness value}\}$
Similarly, the injured sardine is the sardine that has been attacked and injured by the sailfish and has the smallest CC value.
$XS^{itr}_{inj} = \{XS^{itr} \mid \text{sardine injured by the best sailfish}\}$
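Elitism as described above amounts to taking the minimum-CC individual from each population and keeping it unchanged. In this sketch, positions are plain lists and `fitness` stands in for the CC evaluation of a decoded mapping; both are assumptions made for illustration.

```python
def select_elite(sailfish, sardines, fitness):
    """Return copies of the best sailfish (smallest CC) and of the best sardine,
    used here as the injured sardine; both are carried over unchanged."""
    best_sf = min(sailfish, key=fitness)
    injured = min(sardines, key=fitness)
    return list(best_sf), list(injured)

def toy_cc(x):
    """Stand-in fitness for the example: sum of squares (lower is better)."""
    return sum(v * v for v in x)

sailfish = [[1, 2], [0, 1], [3, 3]]
sardines = [[2, 2], [0, 0]]
print(select_elite(sailfish, sardines, toy_cc))  # ([0, 1], [0, 0])
```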

Attack-Alternation Technique
Sailfish increase their hunting success rate by coordinating their attacks. They chase and herd their prey and adjust their own positions according to the positions of the other hunting sailfish, without communicating directly with each other. Through this attack-alternation technique, sailfish injure more sardines in the first phase of hunting, which leads to a higher success rate in capturing prey in the later phases of group hunting.
Afterward, the termination condition is checked. If it is not satisfied, the position of each sailfish is updated using Equation (21), where $XSF^{itr}_{new}$ is the updated position of the sailfish, $XSF^{itr}_{best}$ is the position of the best sailfish, $\delta_{itr}$ is a coefficient at iteration itr, $\phi$ is a random number between 0 and 1, $XS^{itr}_{inj}$ is the position of the injured sardine and $XSF^{itr}_{old}$ is the current position of the sailfish.
The coefficient $\delta_{itr}$ depends on the prey density P, which represents the quantity of prey at each iteration. P is an important factor when updating the positions of the sailfish because the number of prey (sardines) declines during group hunting; it is computed from nSF and nS, the numbers of sailfish and sardines at each iteration. After Equation (21) is used to update the positions of the sailfish, the attack power of the sailfish, AP, at iteration itr is calculated with Equation (24), where C and ε are coefficients that make AP decrease linearly.
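The update rules referred to above can be sketched using the formulas of the original SFO by Shadravan et al. [9], written with this section's symbols: $XSF_{new} = XSF_{best} - \delta(\phi(XSF_{best} + XS_{inj})/2 - XSF_{old})$, $\delta = 2\phi P - P$, $P = 1 - nSF/(nSF + nS)$ and $AP = C(1 - 2\,itr\,\varepsilon)$. Whether this paper modifies these forms is not visible in the text, and the defaults C = 4 and ε = 0.001 are taken from the original SFO rather than from this paper.

```python
import random

def prey_density(n_sailfish, n_sardine):
    """P = 1 - nSF / (nSF + nS), form from the original SFO [9]."""
    return 1.0 - n_sailfish / (n_sailfish + n_sardine)

def delta_coefficient(p, rng):
    """delta = 2 * phi * P - P, with phi uniform in [0, 1)."""
    return 2.0 * rng.random() * p - p

def update_sailfish(x_old, x_best, x_injured, p, rng):
    """XSF_new = XSF_best - delta * (phi * (XSF_best + XS_inj) / 2 - XSF_old)."""
    delta = delta_coefficient(p, rng)
    phi = rng.random()
    return [b - delta * (phi * (b + s) / 2.0 - o)
            for o, b, s in zip(x_old, x_best, x_injured)]

def attack_power(itr, c=4.0, eps=0.001):
    """AP = C * (1 - 2 * itr * eps); decreases linearly with the iteration count.
    Defaults follow the original SFO (assumed, not stated in this paper)."""
    return c * (1.0 - 2.0 * itr * eps)

rng = random.Random(0)
p = prey_density(30, 300)
new_position = update_sailfish([0.2, 0.8], [0.5, 0.5], [0.4, 0.6], p, rng)
print(new_position, attack_power(150))
```

Because $\delta$ ranges over $[-P, P)$ and P shrinks as sardines are caught, sailfish steps around the elite become smaller over time, which is how SFO shifts from exploration to exploitation.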

Hunting Prey
At the beginning of group hunting, sardines are very rarely killed outright: in more than 90% of cases, the sailfish's strike only removes scales from the sardine's body. At the start of the hunt, the sailfish have plenty of energy to hunt and catch their prey, and the sardines are not yet exhausted or injured, so the sardines have a high escape speed and high maneuverability. The attack power of the sailfish declines steadily over the course of the hunt.
The position of every sardine in the population is also updated at every iteration based on the current position of the sailfish and AP, where $XS^{itr}_{new}$ and $XS^{itr}_{old}$ are the updated and previous positions of the sardine and $\phi_1$ is a random number between 0 and 1.
If the attack power AP of the sailfish is less than 0.5, only S sardines have their positions updated; otherwise, the positions of all sardines are updated. Next, the fitness value (CC) of every sardine and sailfish is recalculated from its updated position, and the population is sorted.
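A sketch of the sardine update, following the original SFO form $XS_{new} = \phi_1(XSF - XS_{old} + AP)$. The rule for choosing S when AP < 0.5 is an assumption modeled on the original SFO, where the number of updated sardines is proportional to AP; the original also updates only a subset of dimensions, which is omitted here.

```python
import random

def update_sardines(sardines, x_sailfish, ap, rng):
    """XS_new = phi1 * (XSF - XS_old + AP), following the original SFO [9].
    If AP < 0.5, only S = round(len(sardines) * AP) randomly chosen sardines are
    updated (assumed rule); otherwise every sardine is updated."""
    n = len(sardines)
    if ap < 0.5:
        s = max(0, min(n, round(n * ap)))
        chosen = rng.sample(range(n), s)
    else:
        chosen = range(n)
    updated = [list(x) for x in sardines]  # untouched sardines keep their positions
    for idx in chosen:
        phi1 = rng.random()
        updated[idx] = [phi1 * (f - x + ap) for x, f in zip(sardines[idx], x_sailfish)]
    return updated

sards = [[0.0, 0.0] for _ in range(8)]
partial = update_sardines(sards, [1.0, 1.0], 0.25, random.Random(1))  # 2 of 8 move
full = update_sardines(sards, [1.0, 1.0], 1.0, random.Random(1))      # all 8 move
```

Limiting the update to a few sardines late in the hunt keeps most of the prey population stable, which mirrors the exhausted, less mobile sardines described above.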

Catching Prey
Along with the decreasing attack power of the sailfish, the energy of the sardines decreases because of the repeated powerful attacks. The attacks also reduce the sardines' maneuverability by impairing their ability to sense the direction of the sailfish. As a result, sardines slashed by the sailfish's rostrum are pulled away from the school and are quickly captured.
In the last phase of hunting, the sardines that have been pulled away are quickly captured by the sailfish. In this algorithm, if a sardine becomes fitter than a sailfish, i.e., $CC(S^{itr}) < CC(SF^{itr})$, where $CC(S^{itr})$ and $CC(SF^{itr})$ denote the fitness (CC) values of the sardine and the sailfish at iteration itr, the sardine is removed from its population and the sailfish moves to the position of the corresponding sardine. The positions of the best sailfish and the injured sardine are then updated at every iteration.
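The capture step can be sketched as a replacement rule. Pairing each sailfish with the currently best remaining sardine is a design choice made for this illustration, not something the text specifies, and `toy_cc` is a hypothetical stand-in for CC.

```python
def catch_prey(sailfish, sardines, fitness):
    """If a sardine has a smaller CC than a sailfish, the sailfish takes the
    sardine's position and that sardine is removed from its population."""
    sailfish = [list(x) for x in sailfish]
    remaining = sorted((list(x) for x in sardines), key=fitness)  # best sardine first
    for i, sf in enumerate(sailfish):
        if remaining and fitness(remaining[0]) < fitness(sf):
            sailfish[i] = remaining.pop(0)
    return sailfish, remaining

def toy_cc(x):
    """Stand-in fitness for the example (lower is better)."""
    return sum(x)

new_sf, left = catch_prey([[3, 3], [0, 0]], [[1, 1], [2, 2]], toy_cc)
print(new_sf, left)  # [[1, 1], [0, 0]] [[2, 2]]
```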

Deducing Optimal Sailfish
An injured sardine that has drifted away from the school is quickly captured. In SFOA, when a sardine becomes weak, the corresponding sailfish captures it. The hunted sardine's position replaces the sailfish's position, increasing the probability of hunting new prey. Once the termination condition is satisfied, the best sailfish is returned together with its fitness value (CC).

Results and Discussion
This section presents the performance analysis of SFOA for 2D NoC on the six standard NoC benchmarks listed in Table 1. For a fair comparison with previous state-of-the-art works, the same 4 × 4 network size is used for all benchmarks. The VOPD application consists of 16 sub-tasks, which occupy all 16 routers of the 4 × 4 mesh. For MPEG4, MWD, MP3encMP3dec, 263encMP3dec and 263decMP3dec, however, 4, 4, 3, 4 and 1 routers remain idle, respectively.

Experimental Setup
To evaluate the performance of SFOA, several experiments were conducted on different standard NoC benchmarks. SFOA was compared on 2D NoC with other nature-inspired algorithms such as ACO, PSO, GA, SA and CSO. SFOA was implemented in Python and evaluated on the NoCTweak simulator [43]. All experiments were run on a PC with an Intel(R) Core(TM) i7 processor at 2.30 GHz and 16 GB of RAM. Table 2 lists the NoCTweak platform settings used for simulation.

Average Power Dissipation Analysis
To evaluate the efficiency of SFOA, a power minimization analysis was also performed; it shows that SFOA outperforms the other existing mapping techniques. Table 3 gives the total power consumption, in watts (W), of a 2D 4 × 4 mesh for the six standard NoC benchmarks. According to Table 3, the average improvement in power minimization achieved by SFOA is 3.63%, 23.7%, 18.70%, 22.14%, 27.25%, 18.66%, 12.08% and 4.73% over ILP, ACO, PSO, SAT, SA, GA, BA and CSO, respectively.
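The section does not state how these percentages are computed. A common definition, assumed here, is the relative reduction (P_other − P_SFOA) / P_other × 100, averaged over the benchmarks. The helper names and the power values below are hypothetical placeholders, not the data of Table 3.

```python
from statistics import mean


def percent_improvement(sfoa_value: float, other_value: float) -> float:
    """Relative reduction (in %) of SFOA's value with respect to another method."""
    return (other_value - sfoa_value) / other_value * 100.0


def average_improvement(sfoa_values, other_values) -> float:
    """Mean of the per-benchmark improvements (one possible way to average)."""
    return mean(percent_improvement(s, o) for s, o in zip(sfoa_values, other_values))


# Hypothetical power values in watts for two benchmarks; NOT the data of Table 3.
sfoa_power = [0.90, 0.45]
other_power = [1.00, 0.50]
print(round(average_improvement(sfoa_power, other_power), 2))  # 10.0
```

Averaging the per-benchmark percentages weights every benchmark equally. Computing the improvement of the summed power instead would weight the larger applications more heavily, so the two conventions can give different figures.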

Communication Cost and Computation Time Analysis
This section compares the execution of SFOA with that of other existing nature-inspired mapping algorithms. Table 4 reports the average communication cost (hop count × bandwidth), computed with Equation (8), for the VOPD [34] and MPEG4 [30] standard NoC benchmarks on a 2D NoC.
Because ILP [8] is regarded as one of the most effective exact mapping methods for communication cost estimation, SFOA is also compared explicitly with ILP, in addition to the other algorithms. As Table 4 shows, SFOA achieves the same communication cost as ILP. Some values in Tables 4-6 are missing for certain benchmarks because they were not reported in the base papers of ACO [33] and SA [32]. Table 5 shows the percentage deviation of the heuristic-based mapping techniques from the exact ILP-based mapping for 2D NoC. Table 6 lists the computation time (in seconds) and communication cost (in MB/s) of a two-dimensional 4 × 4 mesh for the six standard NoC benchmarks. These results show that SFOA gives the best results among the nature-inspired algorithms and takes 69% less computation time than the other existing mapping techniques.
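Equation (8) is not reproduced in this section. The sketch below assumes the usual form of the communication cost: the sum, over all task-graph edges, of the edge bandwidth multiplied by the minimal hop count between the tiles the two tasks are mapped to. On a 2D mesh, that hop count is the Manhattan distance. The task graph, its bandwidths and both mappings are hypothetical and do not come from any of the six benchmarks.

```python
from typing import Dict, List, Tuple

MESH_COLS = 4  # 4 x 4 mesh, tiles numbered row-major: tile = row * 4 + col


def hops(tile_a: int, tile_b: int, cols: int = MESH_COLS) -> int:
    """Minimal hop count between two tiles on a 2D mesh (Manhattan distance)."""
    ra, ca = divmod(tile_a, cols)
    rb, cb = divmod(tile_b, cols)
    return abs(ra - rb) + abs(ca - cb)


def communication_cost(edges: List[Tuple[str, str, float]],
                       mapping: Dict[str, int]) -> float:
    """Sum of bandwidth x hop count over all task-graph edges."""
    return sum(bw * hops(mapping[src], mapping[dst]) for src, dst, bw in edges)


# Hypothetical task graph: (source task, destination task, bandwidth in MB/s).
edges = [("A", "B", 100.0), ("B", "C", 300.0), ("C", "D", 50.0)]
good = {"A": 0, "B": 1, "C": 5, "D": 4}   # every communicating pair is adjacent
bad = {"A": 0, "B": 15, "C": 3, "D": 12}  # communicating pairs far apart
print(communication_cost(edges, good))  # 450.0
print(communication_cost(edges, bad))   # 1800.0
```

A mapping algorithm such as SFOA searches for the task-to-tile assignment that minimizes this quantity. Placing heavily communicating tasks on neighbouring tiles, as in `good`, is what lowers the cost.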

Average Network Latency Analysis
To further analyze the performance of SFOA, its average network latency was examined under different traffic patterns on the mesh topology. Two traffic patterns were considered: uniform random and tornado. A traffic pattern defines the communication between the IP cores of the NoC, that is, which destinations each source sends packets to.
Uniform random traffic distributes packets uniformly and balances the load: each source is equally likely to communicate with each destination. Tornado traffic was devised as an adversarial pattern for torus topologies.
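The two traffic patterns can be sketched as destination functions on a k × k network. The tornado definition used here, which shifts each coordinate by ⌈k/2⌉ − 1 modulo k, is the standard textbook form. Excluding the source from the uniform random choice is a common simulator convention but an assumption here. NoCTweak's own implementation is not reproduced.

```python
import math
import random

K = 4  # 4 x 4 network; node id = y * K + x


def uniform_random_dest(src: int, rng: random.Random, k: int = K) -> int:
    """Every node other than the source is an equally likely destination."""
    dst = rng.randrange(k * k - 1)
    return dst if dst < src else dst + 1  # skip over the source id


def tornado_dest(src: int, k: int = K) -> int:
    """Tornado: shift each coordinate by ceil(k/2) - 1, modulo k."""
    y, x = divmod(src, k)
    shift = math.ceil(k / 2) - 1
    return ((y + shift) % k) * k + (x + shift) % k


print([tornado_dest(s) for s in range(4)])  # [5, 6, 7, 4]
rng = random.Random(42)
print([uniform_random_dest(5, rng) for _ in range(8)])  # random ids, never 5
```

For k = 4 the tornado shift is one hop per dimension. On a mesh, which has no wrap-around links, the nodes on the last row or column still have to send their packets across the whole network.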
The performance of the 4 × 4 mesh-based NoC was analyzed with the XY routing algorithm in the NoCTweak simulator [43]. The average network latency of SFOA was evaluated under the two traffic patterns described above and compared with that of other existing nature-inspired heuristic algorithms. Figure 7 plots the average network latency against the injection rate under uniform random traffic; SFOA outperforms PSO, GA, BA and CSO by 11.23%, 16.40%, 8.65% and 4.42%, respectively. Figure 8 plots the average network latency against the injection rate under tornado traffic; here SFOA outperforms PSO, GA, BA and CSO by 24.06%, 25.45%, 13.89% and 5.82%, respectively.
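XY routing, used in the latency experiments, is dimension-ordered: a packet first travels along the X dimension until its column matches the destination, then along Y. The sketch below computes the route as a list of router coordinates. It is a minimal illustration of the routing rule, not NoCTweak's implementation, and the function name `xy_route` is hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-ordered XY routing on a 2D mesh.

    Resolves the whole X offset first, then the Y offset.
    Returns every router visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 3))
print(route)           # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
print(len(route) - 1)  # 5
```

Because XY routing is minimal, the number of hops always equals |Δx| + |Δy|. This is why a mapping with a low total hop count also tends to give low latency.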
With its minimum hop count mapping, SFOA achieves the lowest latency among the nature-inspired algorithms considered (PSO, GA, BA and CSO). Overall, the mapping results indicate that SFOA is more efficient than the other existing nature-inspired algorithms: the figures and tables show reductions in average network power consumption, computation time, communication cost and average network latency.

Conclusions
This paper presents a state-of-the-art nature-inspired metaheuristic algorithm, SFOA, which offers two main advantages. The first is fast convergence, achieved by strengthening the search process around the best sailfish group. The second is robust optimization, achieved by strengthening the exploration of the search space through the diversity of the sardine population. SFOA is used for the optimized mapping of application task graphs onto a two-dimensional NoC with mesh topology. The efficiency of the proposed approach was assessed through performance analysis parameters on six standard NoC benchmarks, and SFOA was compared in multiple experiments with other heuristic algorithms, namely ACO, PSO, SA, GA, BA and CSO. The results in the previous section show that the average improvement in power minimization achieved by SFOA is 3.63%, 23.7%, 18.70%, 22.14%, 27.25%, 18.66%, 12.08% and 4.73% over ILP, ACO, PSO, SAT, SA, GA, BA and CSO, respectively. Compared with other existing mapping techniques, SFOA takes 69% less computation time. The average network latency graphs show that SFOA outperforms PSO, GA, BA and CSO under two standard NoC traffic patterns: by 11.23%, 16.40%, 8.65% and 4.42%, respectively, for uniform random traffic, and by 24.06%, 25.45%, 13.89% and 5.82%, respectively, for tornado traffic. Overall, the experimental results show that SFOA outperforms the other nature-inspired algorithms in power consumption, computation time, communication cost and latency. This work can be extended in several ways; for example, hybrid algorithms could be introduced to further reduce computation time, and the algorithm could be applied to 2D and 3D NoC architectures with different topologies.

Short Biography of Authors
Saleha Sikandar completed her BS in Electronic Engineering at International Islamic University Islamabad (IIUI), Pakistan, in 2017. She is currently pursuing her MSc in Computer Engineering at the University of Engineering & Technology (UET) Taxila, Pakistan. She has extensive research and development experience at various embedded systems companies, where she worked on the development of many embedded and chip design systems. Her research interests include embedded systems design, Network on Chip (NoC), and reconfigurable systems design. She is currently working on low-cost application mapping on NoC.
Naveed Khan Baloch received his BSc degree in Computer Engineering from the University of Engineering and Technology, Taxila, Pakistan, in 2007. He worked in multinational companies as an embedded system designer from 2007 to 2010. He joined the university as a lecturer after completing his MS degree at UET Taxila, and he recently completed his Ph.D. in Computer Engineering at the same university. He has published many research papers in his field and has experience in embedded system design, fault-tolerant systems and reconfigurable computing; he is currently working on self-healing digital systems. He is now an Assistant Professor in the Computer Engineering Department, UET Taxila. During his tenure in academia, he has collaborated extensively with industry and foreign universities in the fields of on-chip networks, embedded vision, and reconfigurable computing.