Torus-Connected Toroids: An Efficient Topology for Interconnection Networks

Bossard, Antoine

doi:10.3390/computers12090173


Torus-Connected Toroids: An Efficient Topology for Interconnection Networks^†

by

Antoine Bossard

Graduate School of Science, Kanagawa University, 3-27-1 Rokkakubashi, Yokohama 221-8686, Kanagawa, Japan

^†

This paper is an extended version of a conference paper “Torus-Connected Toroids: An interconnection network for massively parallel systems”. In Proceedings of the 19th Applied Computing International Conference (AC), Lisbon, Portugal, 8–10 November 2022; pp. 21–28.

Computers 2023, 12(9), 173; https://doi.org/10.3390/computers12090173

Submission received: 13 June 2023 / Revised: 25 August 2023 / Accepted: 25 August 2023 / Published: 29 August 2023


Abstract

Recent supercomputers comprise hundreds of thousands of compute nodes, and sometimes millions; as such, they are massively parallel systems. Node interconnection is thus critical to maximising computing performance, and the torus topology has emerged as a popular solution to this crucial issue. This is the case, for example, for the interconnection network of the Fujitsu Fugaku, which was ranked world no. 1 until May 2022 and is the world no. 2 at the time of writing. The number of dimensions used by the network topology of such torus-based interconnects stays rather low: it is equal to three for the Fujitsu Fugaku’s interconnect. As a result, it is necessary to greatly increase the arity of the underlying torus topology to be able to connect the numerous compute nodes involved, and this eventually comes at the cost of a higher network diameter. To avoid such a dramatic diameter rise, topologies can also combine several layers: such interconnects are called hierarchical interconnection networks (HIN). In this paper, which extends an earlier study, we propose a novel interconnect topology for massively parallel systems, torus-connected toroids (TCT). Its advantage over existing topologies is that, while it retains the torus topology for its desirable properties, it combines it with an additional layer, toroids, in order to significantly lower the network diameter. We evaluate our proposal both theoretically and empirically and quantitatively compare it to conventional approaches, which the TCT topology is shown to outperform.

Keywords:

node; cluster; dependability; supercomputer; high-performance; graph; routing; diameter; degree

1. Introduction

The performance of modern supercomputers is summarised by the TOP500 world rankings, which are published twice a year. By going through this list of high performance machines, which also gives computer architecture information, one can see that the torus topology is a popular interconnection network structure. This is the case, for instance, for the Fujitsu Fugaku supercomputer, which was ranked world no. 1 until May 2022 and is the world no. 2 at the time of writing [1]. Other highly ranked supercomputers also rely on this topology, for example, the IBM BlueGene/Q, rated in the 2010s the “Greenest Supercomputer in the World” [2,3], and the Fujitsu K, a former world number one supercomputer, which are both based on the torus network topology [4,5,6]. An overview of the interconnection networks of the fastest supercomputers is given in Table 1 (data taken from the TOP500 list of June 2023).

Together with this adoption of the torus topology for interconnects, another trend is the popularity of hierarchical interconnection networks (HINs) [8,9,10,11,12,13,14]. Because HINs enable one to lower the diameter of the compute node network, they are key to reducing the network cost, which is defined as the product network diameter × network degree [15,16]. The network cost is a major measure in establishing how efficient an interconnect topology is. The interconnection network of the Fujitsu supercomputers K and Fugaku, called Tofu, is an HIN that combines two layers: the higher layer consists of a 3-dimensional torus and the lower layer is a 3-dimensional $(2, 3)$-ary torus [4]. Nodes of the higher layer are said to be “meta-nodes”, consisting of several compute nodes. As mentioned previously, the network cost, induced by the network degree and diameter, is now critical given the large number of compute nodes that equip the TOP500 machines, in the order of one million; a low diameter and a small degree are two very desirable topological properties of a modern interconnect. Approaches similar to that of Tofu are also mentioned in [17,18].

In this paper, acknowledging on the one hand the torus topology and HIN trends of the high-performance computing industry, and aiming on the other hand at further lowering the diameter and degree of the network, for instance, compared to the Tofu interconnect, we propose the torus-connected toroids (TCT) interconnection network topology for supercomputers.

In order to have simple routing methodologies, proposing an HIN that links meta-nodes according to the n-dimensional torus topology demands that one meta-node be made of $2 n$ compute nodes. Based on older topologies such as the seminal cube-connected cycles (CCC) [19], the torus-connected cycles (TCC) network topology was introduced as a first attempt to this end [20]. However, because a meta-node in a TCC is isomorphic to a cycle (ring), the network diameter grows at a fast pace as the dimension of the network increases; thus, the network cost rapidly rises. The meta-nodes of the TCT topology proposed in this paper hold a greater number of compute nodes than their counterparts in the Tofu topology; consequently, when numerous processing nodes are to be connected, as is the case with modern supercomputers, the arity of the higher layer torus is reduced in comparison to that of Tofu. Precisely, and as detailed hereinafter, a meta-node (i.e., toroid) of an n-dimensional k-ary TCT consists of $2 n$ nodes, which is to be compared with the constant order twelve of a meta-node of Tofu.

The rest of this paper is organised as follows. After recalling notations and definitions, toroids are formally defined and evaluated in Section 2. Then, the TCT topology is described in Section 3. It is evaluated theoretically in Section 4 and empirically in Section 5. Finally, Section 6 concludes this paper.

2. Toroids

The lower layer of a torus-connected toroids network consists of toroids. So, as explained, we shall consider in this paper one toroid in a TCT network as one meta-node (a.k.a. one cluster).

2.1. Preliminaries

Several notations and definitions used hereinafter are recalled in this section.

The two symbols ∧ and ∨ conventionally designate the logical conjunction and the logical disjunction, respectively. A path is an alternating sequence of distinct nodes and edges, for instance $p : u_{1}, \{u_{1}, u_{2}\}, u_{2}, \dots, \{u_{k - 1}, u_{k}\}, u_{k}$. The same path p can also be conveniently denoted with right-pointing arrows, in which case such arrows denote path concatenation between two paths, between one path and one node, or between two nodes, for instance, $p : u_{1} \to u_{2} \to \dots \to u_{k}$. The length of a path is its number of edges: the length of p is thus equal to $k - 1$. A path between two nodes u and v can also be abbreviated to $u ⇝ v$ when it is not ambiguous.

It is recalled that the graph order is its number of nodes, that the degree of a node is its number of adjacent nodes (a.k.a. neighbours), and that the diameter of a graph is the maximum length of a shortest path between any two nodes. In addition, the network cost is defined as the product of the average degree (i.e., network degree) by the diameter. For other graph theory notations and definitions, we generally abide by [21].

Next, we recall the definition of the torus network topology, which is essential to our proposal.

Definition 1.

An n-dimensional $(k_{0}, k_{1}, \dots, k_{n - 1})$-ary torus is an undirected graph that consists of $\prod_{i = 0}^{n - 1} k_{i}$ nodes, which are of the form $(u_{0}, u_{1}, \dots, u_{n - 1})$ with $0 \leq u_{i} \leq k_{i} - 1$ and $0 \leq i \leq n - 1$. Two nodes $u = (u_{0}, u_{1}, \dots, u_{n - 1})$ and $v = (v_{0}, v_{1}, \dots, v_{n - 1})$ are adjacent if and only if $\exists j, \forall i \neq j, u_{i} = v_{i} \land u_{j} = v_{j} \pm 1 \pmod{k_{j}}$ holds, where $0 \leq i, j \leq n - 1$.

In a torus, we call wrap-around edges the ones that do not exist in the corresponding mesh [22].
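
As an illustration of Definition 1, the following minimal Python sketch enumerates the nodes of a torus and the neighbours of a node. The function names `torus_nodes` and `torus_neighbours` are hypothetical, introduced only for this example; a set is used so that the two directions coincide when an arity equals 2.

```python
from itertools import product


def torus_nodes(arities):
    """All nodes of the (k_0, ..., k_{n-1})-ary torus, as coordinate tuples."""
    return list(product(*(range(k) for k in arities)))


def torus_neighbours(u, arities):
    """Neighbours of u per Definition 1: exactly one coordinate j changes
    by +1 or -1 modulo its arity k_j."""
    result = set()
    for j, k in enumerate(arities):
        for step in (1, -1):
            v = list(u)
            v[j] = (u[j] + step) % k
            if tuple(v) != u:  # an arity of 1 would map u onto itself
                result.add(tuple(v))
    return result
```

For instance, `torus_neighbours((0, 0), (4, 4))` contains `(3, 0)`, which is reached through a wrap-around edge.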

2.2. Definition

A toroid is formally defined as follows.

Definition 2.

An n-toroid consists of $2 n$ nodes, where $n \geq 1$. Let V be the set of the nodes of an n-toroid and define P the Cartesian product $\{0, \min \{1, n - 1\}\} \times \{0, 1\} \times \{0, 1, \dots, ⌊ (n - 1) / 2 ⌋\}$. Define $V = P$ when n is even and $V = P ∖ \{(1, 0, (n - 1) / 2), (1, 1, (n - 1) / 2)\}$ when n is odd. Two nodes $u = (u_{x}, u_{y}, u_{z})$ and $v = (v_{x}, v_{y}, v_{z})$ of V are adjacent if and only if one of the following conditions holds:

$u_{x} = v_{x} \land u_{z} = v_{z}$
$u_{y} = v_{y} \land u_{z} = v_{z}$
$u_{x} = v_{x} \land u_{y} = v_{y} \land u_{z} = v_{z} \pm 1 \bmod ⌈ n / 2 ⌉$
$n \text{ odd} \land u_{y} = v_{y} \land u_{x} \neq v_{x} \land u_{z} = v_{z} \pm 1 \bmod (n + 1) / 2 \land (u_{z} = (n - 1) / 2 \lor v_{z} = (n - 1) / 2)$

The n-toroids where $1 \leq n \leq 6$ are illustrated in Figure 1.

Next, we make the following two remarks. First, when n is even, an n-toroid is isomorphic to a $(2, 2, n / 2)$-torus. Second, when n is odd, an n-toroid is isomorphic to a $(2, 2, (n + 1) / 2)$-torus but with its two edges $\{(0, 0, (n - 1) / 2), (1, 0, (n - 1) / 2)\}$ and $\{(0, 1, (n - 1) / 2), (1, 1, (n - 1) / 2)\}$ being contracted [21].

Regarding notations, since a toroid is structured according to three dimensions, they are naturally named X, Y, and Z, and they correspond to the coordinates $u_{x}$, $u_{y}$, and $u_{z}$, respectively, of node $u = (u_{x}, u_{y}, u_{z})$. In addition, in an n-toroid with $n \geq 5$, the edges that link a node of Z coordinate 0 and one of Z coordinate $⌊ (n - 1) / 2 ⌋$ are said to be wrap-around edges.

Finally, in an n-toroid, the addresses of the adjacent nodes of node u can be expressed as a function of n and u. We give the detailed expressions of the adjacent nodes of node u in an n-toroid.

The neighbour of u on the Y dimension is as follows:

n_{1} (u) = (u_{x}, u_{y} + 1 mod 2, u_{z})

Next, we distinguish between the two cases n even and n odd. First, we consider the case when n is even. The neighbours of u on the Z dimension are as follows:

Case $n \geq 4$ .
- $n_{2} (u) = (u_{x}, u_{y}, u_{z} + 1 mod n / 2)$
- $n_{3} (u) = (u_{x}, u_{y}, u_{z} - 1 mod n / 2)$

Then, the neighbour of u on the X dimension is as follows:

n_{4} (u) = (u_{x} + 1 mod 2, u_{y}, u_{z})

Next, we consider the case when n is odd. The neighbours of u on the Z dimension are conditionally defined as follows:

Case $u_{x} = 0$ or $0 < u_{z} < (n - 3) / 2$ .
- $n_{2} (u) = (u_{x}, u_{y}, u_{z} + 1 mod (n + 1) / 2)$
- $n_{3} (u) = (u_{x}, u_{y}, u_{z} - 1 mod (n + 1) / 2)$
Case $u_{x} = 1$ and $u_{z} = 0$ .
- $n_{2} (u) = (0, u_{y}, (n - 1) / 2)$
- $n_{3} (u) = (u_{x}, u_{y}, u_{z} + 1 mod (n - 1) / 2)$
Case $u_{x} = 1$ and $u_{z} = (n - 3) / 2$ .
- $n_{2} (u) = (0, u_{y}, (n - 1) / 2)$
- $n_{3} (u) = (u_{x}, u_{y}, u_{z} - 1 mod (n - 1) / 2)$

Then, the neighbours of u on the X dimension are as follows:

Case $n \geq 3$ and $u_{z} < (n - 1) / 2$ .
- $n_{4} (u) = (u_{x} + 1 mod 2, u_{y}, u_{z})$
Case $n \geq 3$ and $u_{z} = (n - 1) / 2$ .
- $n_{4} (u) = (1, u_{y}, u_{z} - 1)$
- $n_{5} (u) = (1, u_{y}, 0)$

Finally, it should be noted that overlapping can occur: for instance, when n is even and the arity $n / 2$ of the Z dimension equals 2 (i.e., $n = 4$), we have $n_{2} (u) = n_{3} (u)$. This is why the above $n_{i} (u)$ ($1 \leq i \leq 5$) expressions induce a set of adjacent nodes for node u, a set whose cardinality is at most five.

2.3. Topological Properties

First, we consider the degree of a toroid. The following property can be directly derived from Definition 2.

Property 1.

The degree of node $u = (u_{x}, u_{y}, u_{z})$ in an n-toroid ($n \geq 1$) is as follows:

$$d_{τ_{n}} (u) = \begin{cases} 3 & \text{if } n = 4 \\ 5 & \text{if } n \text{ odd} \land n \geq 5 \land u_{z} = (n - 1) / 2 \\ \min \{n, 4\} & \text{if } (n \text{ even} \land n \neq 4) \lor (n \text{ odd} \land (n \leq 3 \lor u_{z} < (n - 1) / 2)) \end{cases}$$

Second, we introduce the following theorem for the diameter of a toroid.

Theorem 1.

The diameter $τ (n)$ of an n-toroid ($n \geq 1$) is as follows:

$$τ (n) = \begin{cases} 1 & \text{if } n = 1 \\ ⌊ n / 4 ⌋ + 2 & \text{if } n \text{ is even} \\ ⌊ (n - 1) / 4 ⌋ + 2 & \text{if } n \text{ is odd and } n \geq 3 \end{cases}$$

Proof.

It is first recalled that the diameter is the maximal length of a shortest path between any two nodes.

Assume that n is even. Ignoring the wrap-around edges, the maximal distance on the Z dimension is equal to $n / 2 - 1$, since the Z coordinate ranges over $0, 1, \dots, n / 2 - 1$. Then, considering this time the wrap-around edges, the maximal distance on the Z dimension becomes $⌊ n / 4 ⌋$. Moreover, the maximal distance on the X dimension is equal to one, and it is also equal to one on the Y dimension. Hence, in an n-toroid with n even, the maximal distance is equal to $⌊ n / 4 ⌋ + 2$.

Assume that n is odd. The case

n = 1

is trivial since the graph consists of one single edge. So, we can assume that

n \geq 3

. Ignoring the wrap-around edges, the maximal distance on the Z dimension is equal to

(n - 1) / 2

. Then, considering this time the wrap-around edges but ignoring the two nodes

(0, y, z)

with

0 \leq y \leq 1

and

z = (n - 1) / 2

, that is, considering a

(2, 2, (n - 1) / 2)

-torus, the maximal distance on the Z dimension is equal to

⌊ (n - 1) / 4 ⌋

. Lastly, a maximum of two edges are needed to either move on both the X and Y dimensions, or to both reach a node of Z coordinate

(n - 1) / 2

and move on the Y dimension (no move is possible on the X dimension from, or to, a node of Z coordinate

(n - 1) / 2

). Hence, in an n-toroid with n odd, the maximal distance is equal to

⌊ (n - 1) / 4 ⌋ + 2

. □
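
Definition 2 and Theorem 1 can be cross-checked by brute force on small toroids. The sketch below is only an illustration and not the routing algorithm of this paper: it transcribes the node set and the four adjacency conditions of Definition 2, computes the diameter with a breadth-first search from every node, and compares it with the closed form $τ (n)$. All function names are hypothetical.

```python
from collections import deque
from itertools import product


def toroid_nodes(n):
    """Node set V of an n-toroid, following Definition 2."""
    z_max = (n - 1) // 2
    nodes = set(product({0, min(1, n - 1)}, (0, 1), range(z_max + 1)))
    if n % 2 == 1:
        # For n odd, the two nodes (1, y, (n - 1) / 2) are removed.
        nodes -= {(1, 0, z_max), (1, 1, z_max)}
    return nodes


def toroid_adjacent(n, u, v):
    """Adjacency test transcribing the four conditions of Definition 2."""
    if u == v:
        return False
    (ux, uy, uz), (vx, vy, vz) = u, v
    m = (n + 1) // 2  # equals ceil(n / 2), and (n + 1) / 2 when n is odd
    z_step = (uz - vz) % m in (1 % m, -1 % m)
    if ux == vx and uz == vz:             # condition 1 (Y dimension)
        return True
    if uy == vy and uz == vz:             # condition 2 (X dimension)
        return True
    if ux == vx and uy == vy and z_step:  # condition 3 (Z dimension)
        return True
    # Condition 4: extra edges at Z coordinate (n - 1) / 2 when n is odd.
    return (n % 2 == 1 and uy == vy and ux != vx and z_step
            and (n - 1) // 2 in (uz, vz))


def toroid_diameter(n):
    """Diameter obtained by breadth-first search from every node."""
    nodes = toroid_nodes(n)
    adj = {u: [v for v in nodes if toroid_adjacent(n, u, v)] for u in nodes}
    best = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def tau(n):
    """Closed-form diameter of an n-toroid (Theorem 1)."""
    if n == 1:
        return 1
    return (n // 4 if n % 2 == 0 else (n - 1) // 4) + 2
```

Comparing `toroid_diameter(n)` with `tau(n)` for small values of n gives a quick sanity check of a transcription of the definition before it is used for routing.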

2.4. Shortest-Path Routing in an n-Toroid

Below, we first introduce a shortest-path routing algorithm in a toroid and next quantitatively evaluate it. This algorithm is then used for routing in a TCT as described in Section 3.

2.4.1. Algorithm Description

The main idea of this routing method is to first distinguish the two cases n even and n odd. Then, dimension-order routing with support for wrap-around edges is applied, although it is adjusted for the Z dimension in the case n odd. Concretely, this adjustment is about further distinguishing cases depending on the value of the X coordinate of the source node. The details of the algorithms for an n-toroid are provided with pseudo-code in the following: Algorithm 1 is used for routing on either the X or Y dimension in an n-toroid, while Algorithm 2 is used for routing on the Z dimension. Algorithm 3 is the main function that returns a shortest path between any two nodes.

Algorithm 1: route-xy(u, v)

Input: two coordinates u, v of either the X or Y dimension, say dimension δ
Output: a sequence of coordinates u ⇝ v on the dimension δ
if u = v then return u;
else return u → v;

Algorithm 2: route-z(n, u, v)

Algorithm 3: toroid-spr(n, u, v)

2.4.2. Empirical Evaluation

In order to confirm the maximum path length induced by the described shortest-path routing algorithm in an n-toroid and to inspect its practical behaviour, we implemented it and ran a computer experiment. The experimental conditions were as follows. First, the toroid parameter n ranged from 1 to 8. For each value of n, 1000 random problem instances were automatically generated, that is, a source node and a destination node were randomly selected from the set of nodes of the corresponding toroid. Then, the routing algorithm was applied to select a path between the source node and the destination node.

The results of this simulation experiment are illustrated in Figure 2: for each value of n, the maximum length and the average length of the selected 1000 paths are calculated and plotted. The standard deviation with respect to the average path length is shown with vertical bars below and above the corresponding length. In addition, the theoretically established diameter in the corresponding toroid is also plotted for reference.

From these experimental results, it can be observed that the empirically measured maximum path length confirms the theoretically established toroid diameter. Precisely, the length of a selected path is at most

⌊ n / 4 ⌋ + 2

in the case n is even, and it is at most

⌊ (n - 1) / 4 ⌋ + 2

in the case n is odd. So, considering that the length of some of the selected paths equals the network diameter, this means that the source and destination nodes of some of the random problem instances were diagonally opposed.

3. Torus-Connected Toroids

3.1. Definition

In this section, the TCT network topology is formally defined and exemplified.

Definition 3.

An n-dimensional k-ary torus-connected toroids network, denoted as $TCT (n, k)$, consists of $k^{n}$ n-toroids ($n \geq 1$, $k \geq 1$). Furthermore, two nodes $u = ((u_{0}, u_{1}, \dots, u_{n - 1}), (u_{x}, u_{y}, u_{z}))$ and $v = ((v_{0}, v_{1}, \dots, v_{n - 1}), (v_{x}, v_{y}, v_{z}))$ are adjacent if and only if one of the following two conditions holds:

1. $\forall i, 0 \leq i \leq n - 1, u_{i} = v_{i} \land τ_{n} ((u_{x}, u_{y}, u_{z}), (v_{x}, v_{y}, v_{z}))$
2. $\exists j, \forall i, 0 \leq i, j \leq n - 1, i \neq j, u_{i} = v_{i} \land v_{j} = u_{j} \pm 1 \bmod k \land u_{x} = v_{x} \land u_{y} = v_{y} + 1 \bmod 2 \land u_{z} = v_{z}$

where the predicate $τ_{n} (a, b)$ is satisfied if and only if the two nodes a, b are adjacent in an n-toroid (see Definition 2).

An illustration of a sample torus-connected toroids network is given in Figure 3. Condition 1 of Definition 3 induces what we call internal edges, and Condition 2 induces what we call external edges. It should be noted that considering each toroid of a

TCT (n, k)

as one meta-node, external edges connect such meta-nodes according to an n-dimensional torus whose arities are all equal to k.

Regarding the topological properties of a TCT, it can be deduced from Definition 3 that the degree of a node of a

TCT (n, k)

is as detailed in the following property.

Property 2.

The degree of node $u = ((u_{0}, u_{1}, \dots, u_{n - 1}), (u_{x}, u_{y}, u_{z}))$ in a $TCT (n, k)$ ($n \geq 1$, $k \geq 1$) is as follows:

$$\begin{cases} d_{τ_{n}} ((u_{x}, u_{y}, u_{z})) & \text{if } k = 1 \\ d_{τ_{n}} ((u_{x}, u_{y}, u_{z})) + 1 & \text{if } k \geq 2 \end{cases}$$

The diameter of a

TCT (n, k)

will be discussed in Section 4.

Finally, the neighbours of node $u = ((u_{0}, u_{1}, \dots, u_{n - 1}), (u_{x}, u_{y}, u_{z})) \in TCT (n, k)$ can be expressed as a function of n, k and u as follows:

The internal neighbours are of the form $((u_{0}, u_{1}, \dots, u_{n - 1}), t (u_{x}, u_{y}, u_{z}))$ , where $t (u_{x}, u_{y}, u_{z})$ represents each of the (at most five) neighbours of node $(u_{x}, u_{y}, u_{z})$ in an n-toroid (see Section 2).
The unique external neighbour of u is

$n_{0} (u) = ((u_{0}, u_{1}, \dots, u_{α - 1}, u_{α} + {(- 1)}^{u_{y} + 1} \bmod k, u_{α + 1}, \dots, u_{n - 1}), (u_{x}, u_{y} + 1 \bmod 2, u_{z}))$

where $α = u_{x} (n - 1) + u_{z} {(- 1)}^{u_{x}}$

.
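
The expression of the external neighbour $n_{0} (u)$ can be sketched in Python as follows. This is a minimal illustration: the function name `external_neighbour` and the representation of a node as a pair (torus coordinates, toroid coordinates) are assumptions made for this example.

```python
def external_neighbour(n, k, u):
    """External neighbour n_0(u) of node u in a TCT(n, k).

    u is a pair (torus, (ux, uy, uz)) where torus is the tuple
    (u_0, ..., u_{n-1}) identifying the meta-node (hypothetical encoding).
    """
    torus, (ux, uy, uz) = u
    # Torus dimension served by this toroid node.
    alpha = ux * (n - 1) + uz * (-1) ** ux
    # The Y coordinate selects the direction: -1 when uy = 0, +1 when uy = 1.
    step = (-1) ** (uy + 1)
    moved = list(torus)
    moved[alpha] = (torus[alpha] + step) % k
    return tuple(moved), (ux, (uy + 1) % 2, uz)
```

Applying the function twice returns the original node, which reflects the fact that an external edge is undirected.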

3.2. Point-to-Point Routing Algorithm

The main idea of this point-to-point routing algorithm in a

TCT (n, k)

is to first calculate a path of meta-nodes (i.e., clusters), that is, to first route inside the torus layer (which makes the higher layer of the TCT topology) and to then convert this torus path into a path in a TCT by routing inside each meta-node. The input of the algorithm is one source node s, one destination node d, and the dimension n and arity k of the TCT network.

So, first, the torus path can be obtained with a dimension-order routing algorithm that supports wrap-around edges applied to the torus layer of the TCT [22]. This algorithm is applied to select a path in a torus between the meta-node that includes the source node s and the meta-node that includes the destination node d. Let this path be

p : m_{0} (∋ s) \to m_{1} \to \dots \to m_{l} (∋ d)

, where each

m_{i}

(

0 \leq i \leq l

) is one node of this torus path, that is, one meta-node of the TCT.
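
The selection of the meta-node path can be sketched as below. This is a generic dimension-order routing over a k-ary torus that takes the shorter direction around each dimension; it is given under that assumption and is not necessarily identical to the algorithm of [22]. The function name `torus_dor_path` is hypothetical.

```python
def torus_dor_path(src, dst, k):
    """Dimension-order path of meta-nodes from src to dst in a k-ary torus.

    Dimensions are corrected one after the other; in each dimension the
    shorter direction is taken, using wrap-around edges when beneficial.
    """
    path = [tuple(src)]
    cur = list(src)
    for j in range(len(cur)):
        forward = (dst[j] - cur[j]) % k   # hops when increasing coordinate j
        backward = k - forward            # hops when decreasing coordinate j
        step = 1 if forward <= backward else -1
        for _ in range(min(forward, backward)):
            cur[j] = (cur[j] + step) % k
            path.append(tuple(cur))
    return path
```

With this choice of direction, at most $⌊ k / 2 ⌋$ hops are made per dimension, so the meta-node path has at most $n ⌊ k / 2 ⌋$ edges.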

Then, the torus path p is converted to a TCT path as follows. Inside each meta-node

m_{i}

(

1 \leq i \leq l - 1

), apply the toroid shortest-path routing algorithm that was presented in Section 2 to select a path between a node

u_{i} \in m_{i}

that is incident with an external edge connected to a node

u_{i - 1}^{'}

of the meta-node

m_{i - 1}

and a node

u_{i}^{'} \in m_{i}

that is incident with an external edge connected to a node

u_{i + 1}

of the meta-node

m_{i + 1}

.

Routing inside the first (i.e.,

m_{0}

) and last (i.e.,

m_{l}

) meta-node is conducted similarly: inside

m_{0}

, select a path between the source node s and a node

u_{0}^{'} \in m_{0}

that is incident with an external edge connected to a node

u_{1}

of the meta-node

m_{1}

. And inside

m_{l}

, select a path between the destination node d and a node

u_{l} \in m_{l}

that is incident with an external edge connected to a node

u_{l - 1}^{'}

of the meta-node

m_{l - 1}

. An illustration of this point-to-point routing method is given in Figure 4.

The position of a node

u_{i}^{'}

of

m_{i}

that is adjacent with a node

u_{i + 1}

of

m_{i + 1}

is directly calculated as follows. Given two adjacent meta-nodes

m_{i} = (u_{0}, u_{1}, \dots, u_{n - 1})

and

m_{i + 1} = (v_{0}, v_{1}, \dots, v_{n - 1})

with

u_{α} \neq v_{α}

(

0 \leq α \leq n - 1

; it is recalled that only one such

α

exists), we define

u_{i}^{'} = (m_{i}, (u_{x}, u_{y}, u_{z}))

where

$u_{x} = ⌊ 2 α / n ⌋$
$u_{y} = \min \{(v_{α} - u_{α} + 1) \bmod k, 1\}$
$u_{z} = u_{x} (n - 1) + α {(- 1)}^{u_{x}} = ⌊ 2 α / n ⌋ (n - 1) + α {(- 1)}^{⌊ 2 α / n ⌋}$

The relation between

α

and the node position

(u_{x}, u_{y}, u_{z})

inside a toroid is illustrated in Figure 5.
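
The three expressions above can be sketched in Python as follows. The function name `gateway_position` is hypothetical, and the modulo in the expression of $u_{y}$ is assumed to apply to the whole sum $v_{α} - u_{α} + 1$.

```python
def gateway_position(n, k, m_i, m_next):
    """Toroid position (ux, uy, uz) of the node of meta-node m_i that is
    incident with the external edge leading to the adjacent meta-node m_next."""
    differing = [a for a in range(n) if m_i[a] != m_next[a]]
    if len(differing) != 1:
        raise ValueError("meta-nodes must differ in exactly one coordinate")
    alpha = differing[0]
    ux = (2 * alpha) // n
    # Assumption: the modulo covers the whole sum v_alpha - u_alpha + 1.
    uy = min((m_next[alpha] - m_i[alpha] + 1) % k, 1)
    uz = ux * (n - 1) + alpha * (-1) ** ux
    return ux, uy, uz
```

For example, in a $TCT (4, 3)$, moving from meta-node (0, 0, 0, 0) to meta-node (0, 0, 2, 0) changes coordinate $α = 2$ in the negative direction, and the sketch returns the position (1, 0, 1).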

Finally, one should note that although the path of meta-nodes is shortest when considering the torus layer of the TCT, the path selected with this routing algorithm is not guaranteed to be a shortest path inside the TCT. The advantage of this routing algorithm lies in its simplicity and thus robustness. We analyse, in Section 4 below, the maximum length of a path selected with this TCT routing algorithm.

3.3. Routing Strategy for Fault-Tolerant (i.e., Adaptive) Routing and Parallel Routing (e.g., One-to-Many and Many-to-Many)

Taking into account the status of the network (e.g., unavailable nodes and links) when selecting paths for data communication is an important issue: this is adaptive (a.k.a. dynamic) routing. Such advanced routing can be achieved as detailed below.

First, it is recalled that a TCT consists of two layers: toroids (i.e., the lower layer) are connected according to the torus topology (i.e., the higher layer). Consider a meta-node (i.e., a toroid) as being faulty as soon as it includes at least one faulty node of the TCT. Then, apply a fault-tolerant routing algorithm in a torus (e.g., [23]) to select a fault-free path in the torus induced by the toroids of the TCT.

Of course, this discussion applies to overloaded nodes as well: a node can be treated as faulty when it is simply overloaded, not only when it is physically broken. Furthermore, this strategy can be extended to faulty links: as soon as a link is not available, the nodes that are incident with it are treated as being faulty. Or, the faulty link and the two nodes incident to it can even be defined as a faulty cluster, of diameter one. Then, a cluster-fault-tolerant routing algorithm in a torus can be applied to select a fault-free path in the higher layer of the TCT [24].

Once a fault-free path in the higher layer of the TCT has been obtained as just explained, this path can be converted back to a TCT path as detailed in Section 3.2.

Moreover, this approach can also be applied to more advanced routing scenarios such as disjoint-path one-to-many (a.k.a. node-to-set) routing, many-to-many (a.k.a. set-to-set) routing, and pairwise routing (i.e., a special case of many-to-many routing): it suffices to spread source nodes into distinct adjacent meta-nodes and similarly for destination nodes. Then, the meta-node of each source (resp. destination) node is considered as a source (resp. destination) node for routing in the higher layer (i.e., in the torus layer). The performance of the proposed TCT topology in the case of such advanced routing algorithms is discussed in Section 4 below.

4. Theoretical Evaluation

We conduct, in this section, a formal and quantitative evaluation of the proposed TCT topology by comparing it to the Tofu interconnection network of the two Fujitsu supercomputers K and Fugaku.

The Tofu topology—precisely, that of Tofu 1 [4], Tofu 2 [25], and Tofu D [5], indistinctly—connects meta-nodes according to a 3-dimensional torus network, with every meta-node being isomorphic to a

(3, 2, 2)

-torus. The network order of each meta-node is thus twelve. So, first, the degree of a TCT node is smaller than that of a Tofu node; precisely, in a TCT the node degree is at most six (refer to Properties 1 and 2), whereas it is ten for a Tofu node.

Next, we consider the network diameter of TCT and Tofu. On the one hand, the topology of each meta-node of Tofu does not depend on any variable as just explained, and on the other hand the network dimension of the higher layer of Tofu is constant: it is equal to three. As a result, there is no other choice with Tofu but to increase the arity of the network to augment the number of compute nodes included in the network. And this subsequently forces the network diameter to increase rapidly.

To further investigate this diameter issue, we compare the maximum path length induced by a dimension-order routing algorithm, which supports torus wrap-around edges, for both TCT and Tofu. This algorithm has been detailed for TCT in Section 3.2, and the same approach can be applied to Tofu as well considering that meta-nodes are connected in the higher layer according to the torus topology (i.e., external edges) in both cases (i.e., Tofu and TCT), with a meta-node path being eventually transformed to a Tofu path after routing inside meta-nodes (i.e., internal edges). Let this algorithm be denoted by R.

We start with the Tofu topology in the following lemma.

Lemma 1.

In a k-ary Tofu, algorithm R induces a path of length at most

3 ⌊ k / 2 ⌋ + 3

.

Proof.

By definition, the Tofu topology requires a maximum of

⌊ k / 2 ⌋

external edges for each dimension of the higher layer torus. Because this higher layer torus is 3-dimensional, in total a maximum of

3 ⌊ k / 2 ⌋

external edges are selected. Lastly, three is added to count the internal edges required inside the meta-node of either the source node or the destination node; it is recalled that one single such routing inside a meta-node is required in the case of Tofu. □

Next, we consider the TCT topology in the following theorem.

Theorem 2.

In a

TCT (n, k)

, algorithm R induces a path of length at most

2 n ⌊ k / 2 ⌋ + 2 τ (n) + n - 2

.

Proof.

For each dimension, a maximum of $⌊ k / 2 ⌋$ external edges are required. Since there are n dimensions, the algorithm selects a maximum of $n ⌊ k / 2 ⌋$ external edges in total. Regarding internal edges, a maximum of two are required inside a toroid at a dimension change, a maximum of $τ (n)$ inside the toroid of the source node and that of the destination node, and one for each of all the other toroids (i.e., meta-nodes) included in the path. Moreover, since there are n dimensions, there are a maximum of $n - 1$ toroids inside which routing is needed for a dimension change. Hence, a maximum of

$$2 (n - 1) + 2 τ (n) + [(n ⌊ k / 2 ⌋ + 1) - (n - 1) - 2] = 2 τ (n) + n (⌊ k / 2 ⌋ + 1) - 2$$

internal edges are selected in total. Therefore, the length of a selected path is at most $2 n ⌊ k / 2 ⌋ + 2 τ (n) + n - 2$

. □

Corollary 1.

In a

TCT (n, k)

, the worst-case time complexity of algorithm R is

O (n k)

.

Proof.

As explained in Section 3.2, each edge of the path can be selected in constant time

O (1)

. Moreover, from Theorem 1, it is clear that

τ (n)

is

O (n)

. Hence, from Theorem 2 and considering that n and k are positive integers, the worst-case time complexity of algorithm R is

O (n k)

. □

Corollary 2.

The value

2 n ⌊ k / 2 ⌋ + 2 τ (n) + n - 2

is an upper bound on the diameter of a

TCT (n, k)

.

Proof.

From Theorem 2, a path of length at most

l = 2 n ⌊ k / 2 ⌋ + 2 τ (n) + n - 2

exists between any two nodes of a

TCT (n, k)

. Hence, the diameter of a

TCT (n, k)

network is at most equal to l. □

To further clarify the evolution of the network order and maximum path length depending on the TCT parameters n and k, and to investigate with actual values how TCT compares to Tofu, we consider below sample dimension and arity values. It should be noted that for the comparison to be fair, both networks, that is, TCT and Tofu, should be of the same or nearly the same order. It is recalled that a k-ary Tofu includes $12 k^{3}$ compute nodes whereas a $TCT (n, k)$ has $2 n k^{n}$.

For example, on the one hand in a

TCT (8, 3)

the network order is equal to 104,976 and the maximum path length induced by algorithm R is 30. On the other hand, for the same maximum path length 30, a Tofu network only connects a maximum of 82,308 compute nodes (i.e., Tofu arity

k = 19

). A Tofu network, with a maximum path length of 36, can accommodate a maximum of 146,004 compute nodes (i.e., Tofu arity

k = 23

). This is significantly less than the 1,180,980 compute nodes supported by a TCT network with the same maximum path length (i.e.,

TCT (10, 3)

). In order to link at least that same number of compute nodes, a Tofu network has a maximum path length of at least 72, for instance, a 46-ary Tofu has 1,168,032 nodes and a 47-ary Tofu 1,245,876, with these two cases both inducing a maximum path length of 72. Although large, these network orders remain definitely realistic: modern supercomputers include hundreds of thousands, and in some cases millions, of compute nodes (e.g., the Fujitsu Fugaku, based on Tofu D, includes 7,630,848 and the Sunway Taihulight 10,649,600). In order to complete the illustration of the network order–maximum path length evolution trend and comparison, we give additional values in Table 2.
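
The sample values quoted above follow directly from the network orders, Lemma 1, Theorem 1, and Theorem 2. The short Python sketch below recomputes them; the function names are hypothetical and chosen only for this illustration.

```python
def tau(n):
    """Diameter of an n-toroid (Theorem 1)."""
    if n == 1:
        return 1
    return (n // 4 if n % 2 == 0 else (n - 1) // 4) + 2


def tct_order(n, k):
    """Number of compute nodes of a TCT(n, k): k^n toroids of 2n nodes."""
    return 2 * n * k ** n


def tct_max_path(n, k):
    """Maximum path length induced by algorithm R in a TCT(n, k) (Theorem 2)."""
    return 2 * n * (k // 2) + 2 * tau(n) + n - 2


def tofu_order(k):
    """Number of compute nodes of a k-ary Tofu: k^3 meta-nodes of 12 nodes."""
    return 12 * k ** 3


def tofu_max_path(k):
    """Maximum path length induced by algorithm R in a k-ary Tofu (Lemma 1)."""
    return 3 * (k // 2) + 3


if __name__ == "__main__":
    for n, k in [(8, 3), (10, 3)]:
        print(f"TCT({n}, {k}): order {tct_order(n, k)}, max path {tct_max_path(n, k)}")
    for k in [19, 23, 46, 47]:
        print(f"{k}-ary Tofu: order {tofu_order(k)}, max path {tofu_max_path(k)}")
```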

Next, we focus on the bisection bandwidth to compare the proposed TCT network topology to Tofu.

Definition 4

([22]). The bisection bandwidth of an interconnection network is the minimum number of edges to remove in order to split the network into two sets of nodes of the same size.

On the one hand, it is recalled, for instance in [26,27], that the bisection bandwidth of an n-dimensional k-ary torus network is equal to

2 k^{n - 1}

when

k > 2

and to

k^{n - 1} = 2^{n - 1}

when

k = 2

. Hence, given that an n-dimensional k-ary TCT network connects its meta-nodes, that is, its toroids, according to the torus topology, its bisection bandwidth is that of an n-dimensional k-ary torus.

On the other hand, the bisection bandwidth of the Tofu interconnection network can be derived from its topology [4,5,28] as follows. Each node has degree 10, out of which four links are used for connection inside the meta-node and six links are used to connect meta-nodes together according to a 3-dimensional k-ary torus network. Since a Tofu meta-node consists of 12 nodes, each contributing one link per torus direction, there are in total 12 links between any two adjacent meta-nodes of Tofu. Therefore, with $n = 3$, the bisection bandwidth of Tofu is $2k^{n-1} \times 12 = 24k^{2}$ when $k > 2$ and $2^{n-1} \times 12 = 48$ when $k = 2$.

As a result, the bisection bandwidth of the proposed TCT network topology is greater than that of Tofu when $2k^{n-1} > 24k^{2} \Leftrightarrow k^{n-3} > 12$ (case $k > 2$) and when $2^{n-1} > 48 \Leftrightarrow n > 6$ (case $k = 2$). Hence, as soon as the dimension and arity of the network increase (cf. Table 2 for sample dimension and arity values), the bisection bandwidth of a TCT exceeds that of Tofu. Assuming a link bandwidth of 5 GB/s [5], we obtain the bisection bandwidth values detailed in Table 3.
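The bisection bandwidth expressions above translate directly into a few lines of code. The sketch below is illustrative only (the function and constant names are not from the paper); it uses the torus bisection formulas recalled from [26,27], the factor of 12 links between adjacent Tofu meta-nodes, and the assumed link bandwidth of 5 GB/s.

```python
LINK_BANDWIDTH_GBPS = 5  # link bandwidth in GB/s, as assumed in the text [5]


def torus_bisection_links(n: int, k: int) -> int:
    """Bisection width (in links) of an n-dimensional k-ary torus, k >= 2."""
    return 2 ** (n - 1) if k == 2 else 2 * k ** (n - 1)


def tct_bisection_gbps(n: int, k: int) -> int:
    """Bisection bandwidth of TCT(n, k): that of its higher-layer torus."""
    return torus_bisection_links(n, k) * LINK_BANDWIDTH_GBPS


def tofu_bisection_gbps(k: int) -> int:
    """Bisection bandwidth of a k-ary Tofu: 3-dimensional torus, 12 links per adjacency."""
    return torus_bisection_links(3, k) * 12 * LINK_BANDWIDTH_GBPS


def tct_exceeds_tofu(n: int, k: int) -> bool:
    """Same-arity comparison: k^(n-3) > 12 when k > 2, n > 6 when k = 2."""
    return tct_bisection_gbps(n, k) > tofu_bisection_gbps(k)


print(tct_bisection_gbps(8, 5), tofu_bisection_gbps(80))  # last row of Table 3
print(tct_exceeds_tofu(6, 2), tct_exceeds_tofu(7, 2))     # threshold n > 6 for k = 2
```

Note that `tct_exceeds_tofu` compares both topologies at the same arity, as in the inequalities above, whereas Table 3 pairs TCT and Tofu instances of different arities.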

Finally, we consider the more advanced routing scenarios described earlier, namely fault-tolerant routing and parallel routing with disjoint paths. The latter covers one-to-many (a.k.a. node-to-set) routing, many-to-many (a.k.a. set-to-set) routing, and pairwise routing (i.e., a special case of many-to-many routing).

First, it is recalled from [23] that in an n-dimensional k-ary torus with a maximum of $2n - 1$ faulty nodes, a fault-free path of length at most $\Lambda(n, k) + 1$ can be selected in $O(n^{2})$ time, where $\Lambda(n, k) = n \lfloor k/2 \rfloor$ is the diameter of an n-dimensional k-ary torus, as recalled and explained in the proof of Theorem 2. Hence, by applying the routing strategy described in Section 3.3, in an n-dimensional k-ary TCT network with a maximum of $2n - 1$ faulty nodes, a fault-free path of length at most $\Lambda(n, k) + 1$ in the higher layer can be selected in $O(\Lambda(n, k) \cdot n^{2}) = O(kn^{3})$ time. This is to be compared to Tofu: since it is based on a 3-dimensional torus, only a maximum of $2 \times 3 - 1 = 5$ faulty nodes are tolerated with this approach, and we have already discussed the diameter and path length issues for TCT and Tofu, with the former performing better than the latter.
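For the reader's convenience, the simplification of the time complexity can be spelled out as follows. This is only a worked step based on the torus diameter $\Lambda(n, k) = n \lfloor k/2 \rfloor$; the numerical instance uses $n = 8$ and $k = 5$, sample values taken from Table 2.

```latex
\[
  \Lambda(n, k) \cdot n^{2}
  \;=\; n \left\lfloor \tfrac{k}{2} \right\rfloor \cdot n^{2}
  \;\le\; \tfrac{k}{2}\, n^{3}
  \;=\; O(k n^{3}).
\]
% Illustrative instance (n = 8, k = 5 are sample values from Table 2):
\[
  2n - 1 = 15, \qquad
  \Lambda(8, 5) + 1 = 8 \left\lfloor \tfrac{5}{2} \right\rfloor + 1 = 17 .
\]
```

In other words, for these sample values, up to 15 faulty nodes are tolerated with a higher-layer path of length at most 17, against 5 faulty nodes for the 3-dimensional torus underlying Tofu.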

We can then consider tolerance to faulty clusters: it has been shown that in an n-dimensional k-ary torus with a maximum of $2n - 1$ faulty clusters of diameter one, a fault-free path of length at most $n(2k + \lfloor k/2 \rfloor - 2)$ can be selected in $O(n^{2} k^{2} |F|)$ time, with F the set of faulty nodes induced by the faulty clusters [24]. So, by applying the routing strategy described in Section 3.3, in an n-dimensional k-ary TCT network with a maximum of $2n - 1$ faulty clusters of diameter one, a fault-free path of length at most $n(2k + \lfloor k/2 \rfloor - 2)$ in the higher layer can be selected. This is to be compared to Tofu: again, only a maximum of $2 \times 3 - 1 = 5$ faulty nodes are tolerated with this approach, and, as discussed previously, with a large value of k (cf. Table 2) the path in the higher layer would be significantly longer, thus inducing a greater maximum path length in Tofu.

It is essential to note here that, unlike Tofu, the proposed TCT topology enables efficient reuse of existing routing algorithms such as those for fault-tolerant point-to-point and node-to-set routing [23], set-to-set routing [29], pairwise routing [30], and cluster-fault-tolerant routing [24]. Precisely, “efficient” here stands for increased fault tolerance (a maximum of $2n - 1$ faulty nodes in the case of TCT and only 5 in the case of Tofu), an increased number of selectable parallel paths, and a reduced maximum path length, as further detailed below.

In the case of fault-tolerant one-to-many routing, it is recalled from [23] that in an n-dimensional k-ary torus with a maximum of $2n - p$ faulty nodes, $p \leq 2n$ fault-free disjoint paths of lengths at most $\Lambda(n, k) + 1$ can be selected in $O(n^{3})$ time. Therefore, by applying the routing strategy described in Section 3.3, in an n-dimensional k-ary TCT network with a maximum of $2n - p$ faulty nodes, p fault-free disjoint paths of lengths at most $\Lambda(n, k) + 1$ in the higher layer can be selected in $O(\Lambda(n, k) \cdot n^{3}) = O(kn^{4})$ time. This is to be compared to Tofu: since it is based on a 3-dimensional torus as explained previously, only a maximum of $6 - p$ faulty nodes are tolerated with this approach, with $p \leq 6$. The maximum number of selectable parallel paths for the disjoint-path one-to-many routing scenario is thus significantly reduced compared to that in the case of the proposed TCT topology given its higher values of n (cf. Table 2).

Regarding many-to-many routing, it is known that in an n-dimensional k-ary torus, $2n$ disjoint paths of lengths at most $2(k + 1)n$ can be selected in $O(kn^{3} + n^{3} \log n)$ time [29]. Hence, by applying the routing strategy described in Section 3.3, in an n-dimensional k-ary TCT network, $2n$ disjoint paths of lengths at most $2(k + 1)n$ in the higher layer can be selected. This is once again to be compared to Tofu, which would allow only six disjoint paths to be selected and used in parallel, and, as discussed previously, with a large value of k (cf. Table 2) the path in the higher layer would be significantly longer, thus inducing a greater maximum path length in Tofu.

Finally, regarding pairwise routing, it has been established that in an n-dimensional k-ary torus, n disjoint paths of lengths at most $\lfloor k/2 \rfloor n + (\lceil 3k/2 \rceil - 2)(n - 1)$ can be selected in $O(n^{4} + kn^{2})$ time [30]. Therefore, by applying the routing strategy described in Section 3.3, in an n-dimensional k-ary TCT network, n disjoint paths can be selected and used in parallel, whereas this number would be reduced to only three paths in the case of Tofu. Once again, as discussed previously, with a large value of k (cf. Table 2) the path in the higher layer would be significantly longer in the case of Tofu, thus inducing a greater maximum path length.
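The higher-layer bounds recalled in this section for the different routing scenarios can be gathered in one helper for side-by-side comparison. The sketch below is merely illustrative: the function names and the dictionary layout are assumptions, and only the bounds stated above (from [23], [24], [29] and [30]) are encoded. The Tofu column is obtained by evaluating the same bounds with $n = 3$.

```python
def torus_diameter(n: int, k: int) -> int:
    """Diameter of an n-dimensional k-ary torus: n * floor(k / 2)."""
    return n * (k // 2)


def higher_layer_bounds(n: int, k: int) -> dict:
    """Number of selectable paths and maximum higher-layer path length per scenario.

    For one-to-many routing, "paths" is the largest admissible p (no faulty node).
    """
    diameter = torus_diameter(n, k)
    ceil_3k_half = (3 * k + 1) // 2  # integer form of ceil(3k / 2)
    return {
        "fault_tolerant": {"paths": 1, "max_length": diameter + 1},
        "cluster_fault_tolerant": {"paths": 1, "max_length": n * (2 * k + k // 2 - 2)},
        "one_to_many": {"paths": 2 * n, "max_length": diameter + 1},
        "many_to_many": {"paths": 2 * n, "max_length": 2 * (k + 1) * n},
        "pairwise": {"paths": n, "max_length": (k // 2) * n + (ceil_3k_half - 2) * (n - 1)},
    }


tct = higher_layer_bounds(8, 5)    # sample TCT dimension and arity from Table 2
tofu = higher_layer_bounds(3, 80)  # Tofu: 3-dimensional torus, sample arity from Table 2
for scenario in tct:
    print(scenario, tct[scenario], tofu[scenario])
```

With these sample values, the number of selectable disjoint paths is 16 (one-to-many, many-to-many) and 8 (pairwise) in the TCT case, against 6 and 3 for Tofu, in line with the discussion above.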

5. Empirical Evaluation

In this section, we quantitatively compare the proposed TCT topology with the previously introduced TCC topology.

5.1. Routing in a TCT

We have established, in Section 4, the theoretical maximum path length induced by a dimension-order routing algorithm that supports torus wrap-around edges (i.e., algorithm R): in an n-dimensional k-ary TCT network, it is equal to $2n \lfloor k/2 \rfloor + 2\tau(n) + n - 2$, with $\tau(n)$ the diameter of an n-toroid as established in Section 2. In this section, we aim at investigating the average length of a path induced by the same routing algorithm.
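As a quick sanity check of this expression against Table 2, the bound can be evaluated as sketched below. The toroid diameter $\tau(n)$ is passed as an argument because its closed form is given in Section 2 and is not reproduced here; the value $\tau(3) = 2$ used in the example is an assumption, back-derived from the maximum lengths listed in Table 2. The function name is illustrative.

```python
def tct_max_path_length(n: int, k: int, toroid_diameter: int) -> int:
    """Theoretical maximum path length 2n*floor(k/2) + 2*tau(n) + n - 2 in TCT(n, k).

    toroid_diameter is tau(n), the diameter of an n-toroid (see Section 2).
    """
    return 2 * n * (k // 2) + 2 * toroid_diameter + n - 2


# tau(3) = 2 is an assumed value, back-derived from Table 2.
print(tct_max_path_length(3, 5, toroid_diameter=2))  # 17, as in Table 2 for TCT(3, 5)
print(tct_max_path_length(3, 7, toroid_diameter=2))  # 23, as in Table 2 for TCT(3, 7)
```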

To this end, we implemented a dimension-order routing algorithm that supports torus wrap-around edges in a torus-connected toroids network and ran a computer experiment as a simulation. The experimental conditions were as follows: the dimension n of the TCT network ranged from 1 to 8, and the network arity k was arbitrarily fixed to five (i.e., the considered network order goes up to 6,250,000 nodes). For each value of the dimension n, 1000 random problem instances were automatically generated, that is, a source node and a destination node were randomly selected from the set of nodes of the corresponding TCT network. Then, the routing algorithm was applied to select a path between the source node and the destination node.
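The structure of such an experiment can be sketched as follows. This is not the implementation used for the reported results: it is a simplified illustration that routes in a plain n-dimensional k-ary torus only (i.e., the higher layer of a TCT, without the toroids), and the function names and the random seed are assumptions made for reproducibility of the sketch.

```python
import random


def ring_steps(src: int, dst: int, k: int) -> list:
    """Unit steps (+1 or -1) along one dimension, using the wrap-around edge when shorter."""
    forward = (dst - src) % k
    backward = k - forward
    if forward <= backward:
        return [1] * forward
    return [-1] * backward


def dimension_order_route(src: tuple, dst: tuple, k: int) -> list:
    """Path from src to dst in a k-ary torus, correcting one dimension at a time."""
    path = [src]
    current = list(src)
    for dim in range(len(src)):
        for step in ring_steps(current[dim], dst[dim], k):
            current[dim] = (current[dim] + step) % k
            path.append(tuple(current))
    return path


def run_experiment(n: int, k: int, instances: int = 1000, seed: int = 0) -> tuple:
    """Return (maximum, average) path length in edges over random source/destination pairs."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(instances):
        src = tuple(rng.randrange(k) for _ in range(n))
        dst = tuple(rng.randrange(k) for _ in range(n))
        lengths.append(len(dimension_order_route(src, dst, k)) - 1)
    return max(lengths), sum(lengths) / len(lengths)


for n in range(1, 9):
    longest, average = run_experiment(n, k=5)
    print(n, longest, round(average, 2))
```

A full TCT simulation would additionally route inside each traversed toroid, which is where the $2\tau(n) + n - 2$ part of the theoretical bound comes from.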

The obtained results are given in Figure 6: for each dimension n, the maximum length and the average length of the selected 1000 paths are calculated and plotted. The standard deviation with respect to the average path length is shown with vertical bars below and above the corresponding length. In addition, the theoretical maximum path length in the corresponding TCT network is also plotted for reference.

From these experimental results, it can be observed that the value of the maximum path length obtained empirically is equal to that of the theoretical maximum path length when the TCT dimension satisfies $n \in \{2, 3\}$. For greater dimensions, the value of the maximum path length obtained empirically remains below that of the theoretical maximum path length, which tends to indicate that our theoretical estimation of the maximum path length is slightly pessimistic but is also a good indicator of the performance of the TCT topology and the described routing algorithm. Moreover, the difference between the theoretical maximum path length and the empirically measured one seems to grow with n, which is yet another positive performance indicator. In addition, for the considered dimensions (i.e., values of n), the average path length is approximately 44% shorter than the maximum path length, which is a good indicator of the performance of the proposed TCT network topology.

5.2. TCT and TCC Routing Comparison

As mentioned in the introduction, an earlier attempt was made to define a hierarchical interconnection network that connects smaller networks (clusters) according to the torus topology: torus-connected cycles (TCC) connect clusters that are isomorphic to a ring according to the torus topology.

It is pertinent to compare the proposed TCT topology to the existing TCC topology for the following two reasons. First, both topologies are determined by the same two parameters: the dimension and the arity of the network. These two parameters induce the exact same network order in the case of TCT and TCC, that is, $|\mathrm{TCT}(n, k)| = |\mathrm{TCC}(n, k)|$. Second, the fundamental concept of the routing algorithm discussed in Section 3.2 for a TCT can also be applied to a TCC. Both topologies share a common approach for routing in the higher layer, namely torus shortest-path routing. The only distinction between them lies in the internal routing within clusters: in-cluster routing in the case of TCT is done with the shortest-path routing algorithm in a toroid described in Section 2, whereas in-cluster routing in the case of TCC is done with shortest-path routing inside a cycle, which is trivial. Hence, this comparison of TCT and TCC routing is fair.

We reproduce the experimental conditions described previously in Section 5.1: for each value of the dimension n in the range $[1, 8]$ and with the arity k set to five, 1000 random routing problem instances are solved, each obtained by randomly selecting one source node and one destination node from the set of network nodes. This simulation experiment is conducted for both the TCT and TCC topologies. The obtained results are illustrated in Figure 7.

One can easily confirm from these experimental results that the maximum path length in the case of TCT(n, k) is significantly shorter than that in the case of TCC(n, k), and the same observation can be made for the average path length. Furthermore, it can be noticed that the path length difference between TCT and TCC becomes larger as n increases, which is yet another indicator of the advantage of the TCT topology over the TCC one. This increasing path length difference can be explained by the different diameters of the meta-nodes (i.e., cycles in the case of a TCC and toroids in the case of a TCT): the meta-node diameter is equal to n in the case of a TCC(n, k) and to $\tau(n)$ in the case of a TCT(n, k).
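The meta-node diameter gap can be made concrete with a small computation. The sketch below assumes that a TCC(n, k) cluster is a cycle of $2n$ nodes, which is consistent with the equal network orders and the cycle diameter n stated above. The toroid diameters $\tau(n)$ are not computed: the listed values are assumptions, back-derived from the maximum path lengths of Table 2, and the names are illustrative.

```python
def cycle_distance(u: int, v: int, size: int) -> int:
    """Shortest-path distance between nodes u and v of a cycle with `size` nodes."""
    gap = abs(u - v) % size
    return min(gap, size - gap)


def cycle_diameter(size: int) -> int:
    """Largest shortest-path distance in a cycle, by exhaustive search from node 0."""
    return max(cycle_distance(0, v, size) for v in range(size))


# Toroid diameters tau(n) back-derived from Table 2 (assumed values, not computed here).
ASSUMED_TOROID_DIAMETER = {3: 2, 4: 3, 5: 3, 6: 3, 7: 3, 8: 4}

for n, tau in ASSUMED_TOROID_DIAMETER.items():
    tcc_cluster = cycle_diameter(2 * n)  # cycle of 2n nodes, diameter n
    print(n, tcc_cluster, tau, tcc_cluster - tau)
```

Under these assumptions, the gap between the two meta-node diameters goes from 1 for $n = 3$ to 4 for $n = 8$, which agrees with the widening path length difference observed in Figure 7.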

6. Conclusions

The TOP500 list is a ranking of the fastest supercomputers in the world. It can be inferred from this list that, first, the number of compute nodes included in these machines passed the one million mark a few years ago and, second, the torus topology is a popular interconnect. Moreover, in order to maximize the performance of these massively parallel systems, the network diameter and degree, in other words the network cost, need to stay as low as possible. In this paper, acknowledging both the torus topology and hierarchical interconnect trends of the supercomputing industry, we have described the torus-connected toroids (TCT) network topology as a solution to the aforementioned network cost issue. The TCT topology has been formally defined and both theoretically and empirically evaluated. Notably, we have shown that the TCT topology significantly reduces the network cost when compared to conventional approaches such as torus-connected cycles and the Tofu topology that is used by the two major Fujitsu supercomputers K and Fugaku, thus paving the way for further increased performance.

The formal establishment of the diameter of a TCT(n, k) network remains an open problem. Given the inherent difficulty of this problem, an empirical estimation could serve as a valuable intermediate step. To discuss the performance of the interconnect topology, a theoretical upper bound on the maximum path length, as established in this paper, is generally a satisfactory approach.

Funding

This work was partly supported by a Grant-in-Aid for Scientific Research (C) of the Japan Society for the Promotion of Science under grant no. 19K11887. The APC was funded by Kanagawa University.

Data Availability Statement

Not applicable.

Acknowledgments

The author is sincerely grateful towards the four reviewers for their insightful comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. TOP500. Still Waiting for Exascale: Japan’s Fugaku Outperforms All Competition Once Again. 2021. Available online: https://top500.org/news/still-waiting-exascale-japans-fugaku-outperforms-all-competition-once-again/ (accessed on 28 August 2023).
2. IBM. IBM100—Icons of Progress—Blue Gene. Available online: https://www.ibm.com/ibm/history/ibm100/us/en/icons/bluegene/ (accessed on 28 August 2023).
3. Scogland, T.; Subramaniam, B.; Feng, W.C. The Green500 list: Escapades to exascale. Comput. Sci. Res. Dev. 2013, 28, 109–117.
4. Ajima, Y.; Sumimoto, S.; Shimizu, T. Tofu: A 6D mesh/torus interconnect for exascale computers. Computer 2009, 42, 36–40.
5. Ajima, Y.; Kawashima, T.; Okamoto, T.; Shida, N.; Hirai, K.; Shimizu, T.; Hiramoto, S.; Ikeda, Y.; Yoshikawa, T.; Uchida, K.; et al. The Tofu interconnect D. In Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK, 10–13 September 2018; pp. 646–654.
6. Chen, D.; Eisley, N.A.; Heidelberger, P.; Senger, R.M.; Sugawara, Y.; Kumar, S.; Salapura, V.; Satterfield, D.L.; Steinmacher-Burow, B.; Parker, J.J. The IBM Blue Gene/Q interconnection network and message unit. In Proceedings of the International Conference for High Performance Computing Networking, Storage and Analysis (SC), Seattle, WA, USA, 12–18 November 2011; pp. 1–10.
7. De Sensi, D.; Di Girolamo, S.; McMahon, K.H.; Roweth, D.; Hoefler, T. An in-depth analysis of the Slingshot interconnect. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Atlanta, GA, USA, 9–19 November 2020; pp. 1–14.
8. Malluhi, Q.M.; Bayoumi, M.A. The hierarchical hypercube: A new interconnection topology for massively parallel systems. IEEE Trans. Parallel Distrib. Syst. 1994, 5, 17–30.
9. Wu, J.; Sun, X.-H. Optimal cube-connected cube multicomputers. J. Microcomput. Appl. 1994, 17, 135–146.
10. Ghose, K.; Desai, K.R. Hierarchical cubic networks. IEEE Trans. Parallel Distrib. Syst. 1995, 6, 427–435.
11. Li, Y.; Peng, S.; Chu, W. Efficient collective communications in dual-cube. J. Supercomput. 2004, 28, 71–90.
12. Li, Y.; Peng, S.; Chu, W. Metacube—A versatile family of interconnection networks for extremely large-scale supercomputers. J. Supercomput. 2010, 53, 329–351.
13. Li, Y.; Peng, S.; Chu, W. Disjoint-paths and fault-tolerant routing on recursive dual-net. Int. J. Found. Comput. Sci. 2011, 22, 1001–1018.
14. Li, Y.; Peng, S.; Chu, W. Hierarchical Dual-Net: A flexible interconnection network and its routing algorithm. Int. J. Netw. Comput. 2012, 2, 234–250.
15. Bossard, A.; Kaneko, K. A routing algorithm solving the container problem in a hypercube with bit constraint. Int. J. Networked Distrib. Comput. 2015, 3, 202–213.
16. Bossard, A.; Kaneko, K. A node-to-set disjoint paths routing algorithm in Torus-Connected Cycles. Int. J. Comput. Their Appl. 2015, 22, 22–30.
17. Rahman, M.H.; Inoguchi, Y.; Sato, Y.; Horiguchi, S. TTN: A high performance hierarchical interconnection network for massively parallel computers. IEICE Trans. Inf. Syst. 2009, E92.D, 1062–1078.
18. Faisal, F.A.; Rahman, M.M.H.; Inoguchi, Y. HFBN: An energy efficient high performance hierarchical interconnection network for exascale supercomputer. IEEE Access 2022, 10, 3088–3104.
19. Preparata, F.P.; Vuillemin, J. The Cube-Connected Cycles: A versatile network for parallel computation. Commun. ACM 1981, 24, 300–309.
20. Bossard, A.; Kaneko, K. Torus–Connected Cycles: A simple and scalable topology for interconnection networks. Int. J. Appl. Math. Comput. Sci. 2015, 25, 723–735.
21. Diestel, R. Graph Theory, 5th ed.; Springer: Heidelberg, Germany, 2016.
22. Duato, J.; Yalamanchili, S.; Ni, L. Interconnection Networks—An Engineering Approach; Revised printing; Morgan Kaufmann: San Francisco, CA, USA, 2003.
23. Gu, Q.P.; Peng, S. Fault tolerant routing in toroidal networks. IEICE Trans. Inf. Syst. 1996, E79-D, 1153–1159.
24. Bossard, A.; Kaneko, K. Cluster-fault tolerant routing in a torus. Sensors 2020, 20, 3286.
25. Ajima, Y.; Inoue, T.; Hiramoto, S.; Uno, S.; Sumimoto, S.; Miura, K.; Shida, N.; Kawashima, T.; Okamoto, T.; Moriyama, O.; et al. Tofu interconnect 2: System-on-chip integration of high-performance interconnect. In Lecture Notes in Computer Science, Proceedings of the International Supercomputing Conference (ISC), Leipzig, Germany, 22–26 June 2014; Kunkel, J.M., Ludwig, T., Meuer, H.W., Eds.; Springer: Cham, Switzerland, 2014; Volume 8488, pp. 498–507.
26. Aroca, J.A.; Anta, A.F. Bisection (band)width of product networks with application to data centers. IEEE Trans. Parallel Distrib. Syst. 2014, 25, 570–580.
27. Li, S.; Huang, P.C.; Banks, D.; DePalma, M.; Elshaarany, A.; Hemmert, S.; Rodrigues, A.; Ruppel, E.; Wang, Y.; Ang, J.; et al. Low latency, high bisection-bandwidth networks for exascale memory systems. In Proceedings of the Second International Symposium on Memory Systems (MEMSYS), Alexandria, VA, USA, 3–6 October 2016; pp. 62–73.
28. Ajima, Y.; Inoue, T.; Hiramoto, S.; Shimizu, T. Tofu: Interconnect for the K computer. Fujitsu Sci. Tech. J. 2012, 48, 280–285.
29. Kaneko, K.; Bossard, A. A set-to-set disjoint paths routing algorithm in tori. Int. J. Netw. Comput. 2017, 7, 173–186.
30. Kaneko, K.; Nguyen, S.V.; Binh, H.T.T. Pairwise disjoint paths routing in tori. IEEE Access 2020, 8, 192206–192217.

Figure 1. Drawings of the n-toroids with $1 \leq n \leq 6$.

Figure 2. Experimental results of the empirical evaluation of the described shortest-path routing algorithm in an n-toroid. The maximum length and the average length of a selected path have been measured (the formally established diameter is also plotted for reference).

Figure 3. (a) Interconnection of toroids (i.e., clusters) according to TCT dimensions; the case $n = 3$. (b) Excerpt of a TCT(3, 3): only the nodes of one single value of the third dimension are shown.

Figure 4. Point-to-point routing inside a TCT: the path $m_{i-1} \to m_{i} \to m_{i+1}$ inside the higher layer torus is transformed into the TCT path $u_{i-1} \leadsto u_{i-1}' \to u_{i} \leadsto u_{i}' \to u_{i+1} \leadsto u_{i+1}'$.

Figure 5. An illustration of the relation between $\alpha$ and the node position $(u_{x}, u_{y}, u_{z})$ inside a toroid.

Figure 6. Experimental results of the empirical evaluation of the described routing algorithm in an n-dimensional 5-ary TCT. The maximum length and the average length of a selected path have been measured. In addition, the theoretical maximum path length is also plotted for reference.

Figure 7. Comparing the maximum and average path lengths induced by the same routing algorithm approach in the case of the TCT and TCC topologies (the arity k is again set to five).

Table 1. An overview of the interconnection networks of the fastest supercomputers (data taken from the TOP500 list of June 2023).

No	Supercomputer	Interconnect	Cores	Year	Manufacturer
1	Frontier	Slingshot-11 (supports the torus topology [7])	8,699,904	2021	HPE
2	Supercomputer Fugaku	Tofu interconnect D	7,630,848	2020	Fujitsu
3	LUMI	Slingshot-11 (supports the torus topology [7])	2,220,288	2023	HPE
4	Leonardo	Quad-rail NVIDIA HDR100 Infiniband	1,824,768	2023	EVIDEN
5	Summit	Dual-rail Mellanox EDR Infiniband	2,414,592	2018	IBM
6	Sierra	Dual-rail Mellanox EDR Infiniband	1,572,480	2018	IBM/NVIDIA/Mellanox
7	Sunway TaihuLight	Sunway	10,649,600	2016	NRCPC
8	Perlmutter	Slingshot-10 (supports the torus topology [7])	761,856	2021	HPE

Table 2. A comparison of the network order and the maximum path length induced by algorithm R in the case of TCT and Tofu with sample dimensions and arities.

Torus-Connected Toroids (TCT)				Tofu
$n$	$k$	Network Order	Maximum Length	$k$	Network Order	Maximum Length
3	5	750	17	3	324	6
3	7	2058	23	4	768	9
4	5	5000	24	10	12,000	18
5	5	31,250	29	20	96,000	33
5	6	77,760	39	30	324,000	48
5	7	168,070	39	40	768,000	63
6	4	49,152	34	50	1,500,000	78
6	5	187,500	34	60	2,592,000	93
7	5	1,093,750	39	70	4,116,000	108
8	5	6,250,000	46	80	6,144,000	123

Table 3. A comparison of the bisection bandwidth in the case of TCT and Tofu with sample dimensions and arities. A link bandwidth of 5 GB/s is assumed.

Torus-Connected Toroids (TCT)			Tofu
$n$	$k$	Bisection Bandwidth (GB/s)	$k$	Bisection Bandwidth (GB/s)
3	5	250	3	1080
3	7	490	4	1920
4	5	1250	10	12,000
5	5	6250	20	48,000
5	6	12,960	30	108,000
5	7	24,010	40	192,000
6	4	10,240	50	300,000
6	5	31,250	60	432,000
7	5	156,250	70	588,000
8	5	781,250	80	768,000


© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

