1. Introduction
When dealing with the application requirements of large data size and high real-time requirements, the system platform of multi-core processors working together shows great advantages compared with the single core processor system platform, but it also poses a great challenge to the on-chip system communication capabilities. The traditional System-on-Chip (SoC) communication usually uses the bus structure [
1]. The traditional network communication method is adopted to solve the continuous growth of communication demand between the cores on multi-core chip and to implement a better flow control mechanism [
2] to cope with the very large scale. A new method to the design of integrated circuits due to factors such as size and communication requirements is needed [
3,
4,
5].
The design of network-on-chip (NoC) usually contains four aspects [
6]: the mapping belongs to the category of modeling and optimization in which the application is applied. The NoC processing element mapping is an NP-hard problem [
7]. Therefore, a serious challenge for NoC designers is how to create a preferable algorithm to achieve mapping solutions with low-power consumption and low-latency [
8].
Recently, several NoC mapping algorithms for optimizing energy consumption or time delay have been proposed. A multi-application fault-tolerant mapping algorithm was proposed by Khalili et al. [
9], which maps the application kernel map to the idle error-free core by using an heuristic algorithm, and then places other idle kernels on other idle error-free kernels. This algorithm is more functional and energy efficient than previous solutions, but it will cause local crowding around the faulty node and forming hotspot congestion. Murali et al. [
10] proposed a method to ensure quality of service (QoS) in mapping physical planning process by satisfying real-time restrain and delay restrain of traffic flow. Compared to traditional solutions, this design can save up to five times the bandwidths. To minimize the power consumption of different benchmarks, Tosun et al [
11] suggested a new ILP formula for mapping applications to grid-based NoCs, but this algorithm still has room for improvement, as the consumptions of run time become enormous when the tasks increase accordingly. Macron et al. [
12] compares some NoC mapping algorithms for obtaining a low energy consumption communication weighted model. Although the NoC mapping schemes finally achieved by these algorithms have lower power consumption or higher performance, these papers discuss single optimization targets of NoC, and are not applicable to the multi-target mapping problem of NoC heterogeneous processing units.
The contribution of this paper is three-fold. First, we propose a two-populations-with-enhanced-initial-population based on genetic algorithm (TI_GA) mapping scheme for heterogeneous multi-core NoC. The results of the experiment show that the scheme is better than the existing mapping scheme, reducing power consumption and delay. Secondly, we propose to use the method of optimizing initial population and dual population inheritance to compensate for the shortcomings of genetic algorithm (GA) random initialization and often get into the local optimal solution. Finally, the proposed scheme provides a new perspective for heterogeneous multi-objective optimization problems.
This paper is structured as follows: In
Section 2, the concept of Two-Step mapping is proposed, and a formal description of the mapping problem for power consumption and delay is made. In
Section 3, a mathematical model of power consumption and delay is established, and in
Section 4, TI_GA is proposed, and the algorithms introduced in detail. See
Section 5 for simulation results and discussion. Finally, we draw a conclusion in
Section 6.
2. Two-Step of NoC Mapping
In more complex applications, there are many cooperative tasks, which are generally different in nature. Tasks are usually assigned to heterogeneous IP cores that are good at running such tasks. Compared with isomorphic systems, heterogeneous systems can achieve better system performance and lower power consumption. This paper divides the problem of heterogeneous multi-core NoC mapping into two steps: the first step, as shown in
Figure 1, assigns tasks in the application task graph to the IP core in a specific strategy according to the task characteristics. The second step, as shown in
Figure 2, assigns IP cores associated with tasks in the first step to NoC network resource nodes with some mapping strategy. In these two steps, different allocation and mapping strategies have different effects on delay and energy consumption. In task graphs with a large number of tasks, factors such as area cost are usually taken into account, and multiple tasks are often assigned to an IP core for execution.
The NoC platform used in the mapping algorithm research in this paper is a two-dimensional mesh (2D-Mesh) to realize the connection between nodes. As shown in
Figure 2 (right), its routing node (R) and resource node only correspond to each other. Each routing node and resource node transmit data through network interface (NI). Since the study of the mapping algorithm involves the length of the packet transmission path, in the selection of the routing algorithm [
13,
14], the NoC platform is specified to utilize the static shortest path routing algorithm, which is easy to address and the conclusion is easily extended to other complex platform architecture.
The chart above shows the process of mapping tasks to NoC platform resource nodes. The following is the basic concepts of this problem.
Definition 1. Given the task communication graphis a weighted graph, wheredenotes the subtasks in the application anddenotes the traffic betweenand.
Definition 2. Topology diagram , where denotes a resource node in the topology, usually an IP core, and denotes the communication link between and .
Based on the above definition, the application mapping algorithm for low power consumption and low latency can be described as: given an and an , a mapping function is found to minimize the overall power consumption and delay.
3. Establishment of Mapping Evaluation
This paper will give a calculation model of NoC system energy consumption and delay without considering the number of task nodes and the number of links, so as to evaluate the quality of NoC mapping scheme from the index of energy consumption and delay.
3.1. Power Model
The energy consumed in NoC system can be divided into two parts, one is the computing power consumed in the process of computing and processing IP cores in NoC network, the other is the communication power consumed in data transmission in network links. We can express the total energy consumption of NoC system as following formula:
For data processing of the same size, the energy consumed in IP processing is only related to the type of IP core. The power consumption of processing
bit size data on an IP core can be expressed as:
where
is a constant related to IP core type.
The power consumption in a NoC platform network can be divided into two parts: the router power consumption in NoC and the link energy consumption between NoC routing nodes. At present, many NoC communication power models have been proposed. Here we adopt the commonly accepted communication power model [
15]. The specific formula can be described as follows:
where,
denotes the power consumption of transmitting unit data from one routing node to another,
denotes the power consumption of the cross-switch,
denotes the power consumption of unit data in the buffer area of the routing node,
denotes the power consumption of internal lines, and
denotes the power consumption of transmitting unit data in the communication interconnection link of the NoC network. Among them, the values of
,
, and
will not change with the communication environment and conditions in the network, because their values are only related to the internal design of the router, so they can be regarded as an approximate constant. Therefore, we use
to express them uniformly, so the above formula can be described as:
where,
denotes the power consumption of transmitting unit data from node
to node
, and
denotes the number of routing nodes that pass through in the process of transmitting unit data, that is, the Manhattan distance. Therefore, the power consumption of communication between node
and node
can be calculated by formula (5).
where
represents the traffic between node
and node
.
3.2. Delay Model
The delay of the NoC platform includes the time consumed by data processed by the IP core and the time consumed by data transmission between routing nodes in the network. The delay
D in the NoC system can be expressed as:
where
denotes the time spent on task processing on the
i-th IP core and
denotes the time spent on data transmission over the
j-th link. The specific type of task, the type of IP core and the amount of data in the process of task processing determine the time spent on data processing by IP core. The delay
of processing
bit data on a specific IP core can be expressed as:
where
is a constant related to the IP core type.
The time consumption of data transmission mainly comes from the physical delay on the connection line and the delay of data packet parsing [
16].
The data delay from the source node to the destination node consists of three parts: the delay on the source node NI, the delay on the routers in the transmission path, and the delay on the transmission path interconnection line.
If
bit data is transmitted from network node
through path
to network node
, the network is a 2D-Mesh topology. Under the condition of using shortest path routing, the calculation formula of time
spent in the transmission process is as follows:
where,
,
,
are the constants of NI, interconnection line and router about transmission time, respectively. Formula (8) assumes that the NI at source and destination nodes is of the same type.
is the Manhattan distance between nodes
a and
b.
3.3. Multi-Objective Mapping Optimization Model
Traditional network-on-chip mapping adopts a single-objective optimization mapping strategy. With the increase of tasks, network scale, application complexity and so on, the goal of mapping is no longer limited to the optimization of communication energy consumption. If the pursuit of power consumption is only one-sided, it may lead to local congestion in the NoC network, which will cause the delay of executing the application to become longer, thus affecting the quality of service.
Therefore, it is impossible to optimize all the indexes by using the method of optimization mapping. It is a multi-objective optimization problem that all the optimization indexes must be compromised. In this paper, two very important performance indicators in many design constraints of NoC are discussed: system power consumption and system delay. The evaluation function design in multi-objective programming mainly includes linear weighted summation method, ideal point method, maximum–minimum method, square weighted method and so on. Linear weighted summation can highlight the optimization direction by setting each objective weight, and its realization is simpler. Therefore, the linear weighted summation method will be used in the implementation of the algorithm in this paper.
Assuming that the mapping scheme
belongs to the set
containing all mapping schemes, for multi-objective optimization problems with energy consumption and delay:
The linear weighted summation method needs to determine the constant and in advance according to the importance of energy consumption and delay, and satisfy , , . When , only power optimization is done, , only delay optimization is done, and when , , the tradeoff between the two performance is optimized.
The evaluation function is defined as:
The multi-objective programming problem is transformed into .
5. Performance Analysis
Our experiment ws carried out on the structure of 4 × 4, 5 × 5, 6 × 6 2D-Mesh. The routing algorithm adopted the XY routing algorithm. Five examples were used to analyze the scheme, verifying the performance of the proposed mapping scheme.
The first example is MPEG-4 decoding system, the second one is VOPD application, the third one is MWD application, the fourth one is 263enc application, and the fifth one is 263dec application. The attributes of the applied feature graph are shown in
Table 4. The mapping was accomplished by TI_GA-based mapping scheme search algorithm. The parameters of the experiment were defined and the corresponding values are displayed in
Table 5. We compared the effect of traditional genetic algorithm and ant colony algorithm with TI_GA. In order to obtain more accurate data, we used BookSim [
21] simulator. BookSim could obtain our custom traffic, simulate the operation of NoC and obtain energy consumption and delay data. Our experiments were carried out separately from single-objective optimization and multi-objective optimization.
Firstly, we carried out the experiment of single-objective optimization. Setting
to 1 and
to 0, we caould get the single-objective optimization with power consumption as the goal. As shown in
Figure 6, we got the optimal power consumption solutions of the three algorithms when the network scale was 4 × 4, 5 × 5 and 6 × 6, respectively. Among them, the
X-axis represented five different applications in different network scales, and the
Y-axis was the power evaluation value from the traditional genetic algorithm. The smaller the value, the better the corresponding performance. Compared to the traditional genetic algorithm, the power consumption of TI_GA in the MPEG-4 application chart decreased by 6.23% and 4.35% compared with the ant colony algorithm on the experimental platform with the network size of 4 × 4. In the MWD application chart, the power consumption of TI_GA decreased by 23% compared to the traditional genetic algorithm and 24.14% compared to the ant colony algorithm. In the VOPD application chart, the power consumption of TI_GA decreased by 12.3% compared to the traditional genetic algorithm. Compared to the ant colony algorithm, the power consumption decreased by 14%. In the 263enc application chart, the power consumption of TI_GA decreased by 16% compared to the traditional genetic algorithm. Compared to the ant colony algorithm, the power consumption decreased by 15.21%. In the 263dec application chart, the power consumption of TI_GA decreased by 12.67% compared to the traditional genetic algorithm, and the power consumption decreased by 13.81% compared to the ant colony algorithm. In the experimental platform with network scale of 5 × 5 and 6 × 6, in five test cases, TI_GA had different power consumption reductions compared to the traditional genetic algorithm and ant colony algorithm.
Setting
to 1 and
to 0, we could get the single-objective optimization with delay as the goal. As shown in
Figure 7, we got the optimal delay consumption solutions of the three algorithms when the network scale was 4 × 4, 5 × 5 and 6 × 6, respectively. Among them, the
X-axis represents five different applications in different network scales, and the
Y-axis is the delay evaluation value based on the traditional genetic algorithm. The smaller the value, the better the corresponding performance. In the experimental platform with a network size of 4 × 4, the delay of TI_GA in the MPEG-4 application graph was 14.2% lower than that of the traditional genetic algorithm. Compared to the ant colony algorithm, the delay decreased by about 8%. In the MWD application chart, the delay of TI_GA was 15.97% lower than that of the traditional genetic algorithm. Compared to the ant colony algorithm, the delay decreased by 18.10%. In the VOPD application chart, TI_GA had a delay of 23.9% compared to the traditional genetic algorithm. Compared to the ant colony algorithm, the delay was decreased by 19.92%. In the 263enc application chart, the delay of TI_GA was 14.55% lower than that of the traditional genetic algorithm. Compared to the ant colony algorithm, the delay was 14.27%. In the 263dec application chart, the delay of TI_GA was 10.95% lower than that of the traditional genetic algorithm. Compared to the ant colony algorithm, the delay was 13.16%. In the experimental platform with network scale of 5 × 5 and 6 × 6, TI_GA has different degrees of delay in the five test cases compared to the traditional genetic algorithm and ant colony algorithm.
Next, we carried out multi-objective optimization with power consumption and delay as the optimization index under the experimental platform of 4 × 4 network scale. First, we adjusted the values of
and
to make
. This ratio represents a fair tendency to optimize power consumption and latency.
Figure 8 shows the optimal power consumption of the three algorithms. The
X-axis illustrates different applications, and the
Y-axis is the evaluation from the power consumption achieved by the traditional genetic algorithm. Compared to the traditional genetic algorithm, the power consumption in the MPEG-4 application chart decreased by 3.2%. Compared to the traditional genetic algorithm, the power consumption in the MWD application chart decreased by 13.2%. Compared to the traditional genetic algorithm, the power consumption in the VOPD application chart decreased by 7.6%. As shown in
Figure 9, the optimal delay solution obtained by the three algorithms was obtained. The
X-axis illustrates different applications, and the
Y-axis is the evaluation from the delay obtained by the traditional genetic algorithm. Compared to the traditional genetic algorithm, the delay in the MPEG-4 application graph decreased by 8.02%. Compared to the traditional genetic algorithm, the delay in the MWD application graph decreased by 7.25%. Compared to the traditional genetic algorithm, the delay in the VOPD application graph decreased by 13.54%.
We continued to adjust the values of
and
to make
. This ratio represented the direction of optimization for multi-objective optimization, focusing more on power optimization, as shown in
Figure 10. The
X-axis illustrates different applications, and the
Y-axis is the evaluation based on the power consumption obtained by the traditional genetic algorithm. Compared with the traditional genetic algorithm, the power consumption in the MPEG-4 application chart decreased by 3.75%. Compared with the traditional genetic algorithm, the power consumption in the MWD application chart decreased by 16.15%. Compared with the traditional genetic algorithm, the power consumption in the VOPD application chart decreased by 9.89%. As shown in
Figure 11, the optimal solution of the delay obtained by the three algorithms was obtained. The
X-axis illustrates different applications, and the
Y-axis is the evaluation based on the delay results of the traditional genetic algorithm. Compared with the traditional genetic algorithm, the delay in the MPEG-4 application graph decreased by 5.49%. Compared to the traditional genetic algorithm, the delay in the MWD application graph decreased by 4.56%. Compared to the traditional genetic algorithm, the delay in the VOPD application graph decreased by 8.23%.
We continued to adjust the values of
and
to make
. This ratio represented the multi-objective optimization which focused more on the optimization of delay. As shown in
Figure 12, the optimal power consumption of the three algorithms was obtained. The
X-axis illustrates different applications, and the
Y-axis is the evaluation based on the power consumption results obtained by the traditional genetic algorithm. Compared to the traditional genetic algorithm, the power consumption in the MPEG-4 application chart decreased by 2.45%. Compared to the traditional genetic algorithm, the power consumption in the MWD application chart decreased by 8.03%. Compared to the traditional genetic algorithm, the power consumption in the VOPD application chart decreasesdby 5.20%. Compared to the case of
, the optimization of power consumption was not as obvious as the previous group of experiments. As shown in
Figure 13, the optimal delay solution by the three algorithms was obtained. The
X-axis illustrates different applications, and the
Y-axis is the delay evaluation value based on the traditional genetic algorithm. Compared to the traditional genetic algorithm, the delay in the MPEG-4 application graph decreased by 9.91%; compared to the traditional genetic algorithm, the delay in the MWD application graph decreased by 9.38%; compared to the traditional genetic algorithm, the delay in the VOPD application graph decreased by 18.32%. Compared to the case of
, the optimization efficiency of delay was improved obviously.
Illustrating the influence of
and
values on multi-objective optimization, we continued to carry out two groups of experiments on the experimental platform with network scale of 4 × 4. As shown in
Figure 14, the
X-axis illustrated the value of
, and the
Y-axis illustrated the percentage improvement of power optimization performance based on the traditional genetic algorithm. As shown in
Figure 15, the
X-axis illustrates the value of
, and the
Y-axis illustrates the delay optimization based on the traditional genetic algorithm. With the increase of
and
, the percentage of power consumption and delay optimization of TI_GA was higher and higher than that of the traditional genetic algorithm. We could weigh the weights between
and
according to the needs of applications to maximize power consumption and delay to meet our needs.