Abstract
Clos networks and their folded versions, fat trees, are widely adopted in interconnection network designs for data centers and supercomputers. There are two main types of Clos networks: strictly nonblocking Clos networks and rearrangeably nonblocking Clos networks. Strictly nonblocking Clos networks can connect an idle input to an idle output without interfering with existing connections. Rearrangeably nonblocking Clos networks can connect an idle input to an idle output, possibly after rearranging existing connections. Traditional strictly nonblocking Clos networks have two drawbacks. One drawback is the use of crossbars with different numbers of input and output ports, whereas currently available switches are square crossbars with the same number of input and output ports. Another drawback is that every connection goes through a fixed number of stages, increasing the length of the communication path. A drawback of traditional fat trees is that the root stage uses crossbar switches of a different size from those in the other stages. To solve these problems, this paper proposes an Identical Strictly NonBlocking folded Clos (ISNBC) network that uses equally sized square crossbars for all switches. Correspondingly, this paper also proposes an Identical Rearrangeably NonBlocking folded Clos (IRNBC) network. Both ISNBC and IRNBC networks can have any number of stages, can use equally sized square crossbars with no unused switch ports, and can utilize shortcut connections to reduce communication path lengths. Moreover, both ISNBC and IRNBC networks have a lower switch crosspoint cost ratio relative to a single crossbar than their corresponding traditional Clos networks. Specifically, ISNBC networks use 46.43% to 87.71% of the crosspoints of traditional strictly nonblocking folded Clos networks, and IRNBC networks use 53.85% to 60.00% of the crosspoints of traditional rearrangeably nonblocking folded Clos networks.
1. Introduction
Clos networks [1] and fat trees [2] have been widely used in interconnection network designs for modern data centers and supercomputers such as Google Jupiter [3], Tianhe [4], TaihuLight [5], Frontera [6], Summit and Sierra [7], and the Multi-Plane Fat Tree for DeepSeek-V3 [8].
The traditional unidirectional nonblocking Clos network was originally designed for telecommunications. It is a type of multistage circuit-switching network that replaces a single large crossbar to reduce hardware costs in terms of crosspoints.
A three-stage traditional unidirectional Clos network topology [1] is parameterized by two integers, n and m, where n is the number of sources connected to an ingress-stage crossbar switch (or the number of destinations connected to an egress-stage crossbar switch) and m is the number of crossbar switches in the middle stage. The ingress stage has n crossbar switches, and the egress stage also has n crossbar switches. Therefore, the total number of sources is $n^2$, and the total number of destinations is also $n^2$. A switch in the ingress stage is an $n \times m$ (n inputs and m outputs) crossbar, a switch in the egress stage is an $m \times n$ crossbar, and a switch in the middle stage is an $n \times n$ crossbar. There is exactly one connection between each ingress-stage switch and each middle-stage switch, and exactly one connection between each middle-stage switch and each egress-stage switch. The condition $m \geq 2n - 1$ indicates a strictly nonblocking network, meaning that the network can connect a free source to a free destination without interfering with existing connections. The condition $m \geq n$ indicates a rearrangeably nonblocking network, meaning that the network can connect a free source to a free destination, possibly after rearranging existing connections.
Traditional unidirectional strictly nonblocking Clos networks have two drawbacks. One drawback is that all connections from sources to destinations pass through a fixed number of stages. The source is connected to a switch in the ingress stage, and the destination is connected to a switch in the egress stage. Let s be the number of stages of a Clos network. Then, the path length from the ingress stage to the egress stage is $s - 1$. Counting the link from the source to the ingress-stage switch and the link from the egress-stage switch to the destination, the path length is $s + 1$. For example, the path length in a three-stage unidirectional Clos network is always 4, so the locality of source–destination pairs cannot be exploited. Another drawback is that the network uses $n \times m$ and $m \times n$ crossbars with different numbers of input and output ports, whereas currently available switches are square crossbars with the same number of input and output ports. If the same square crossbars are used for all switches, there will be many unused ports.
A fat tree is a folded version of a Clos network [9]. It merges the corresponding ingress and egress switches. Then, the merged stage is called a leaf stage, and the middle stage is called a root stage. A fat tree can utilize shortcut connections, that is, a connection does not have to go through all stages. For example, if the source and destination are connected to the same leaf switch, the connection does not need to go to the root stage. Thus, the path length is 2 instead of 4. A drawback of fat-tree networks is that the root stage uses differently sized crossbar switches than the other stages.
A k-ary n-tree [10] is a parametric fat tree, where k is the arity (the number of links of a switch that connect to the previous or next stage) and n is the number of stages; that is, the switch radix is $2k$. A k-ary n-tree Clos network can be constructed with two back-to-back k-ary n-fly butterflies [11]. A 2-ary n-tree Clos network is also called a Beneš network [12]. A k-ary fat tree [13] is a bidirectional $(k/2)$-ary n-tree Clos network where k is even.
A packet can be routed to an arbitrary middle switch in a rearrangeably nonblocking Clos network, or to an arbitrary root switch in a rearrangeably nonblocking folded Clos network, and then to its ultimate destination. Providing this path diversity increases hardware cost and packet latency. Mirrored k-ary n-tree networks [14] and peer k-ary n-tree networks [15] aim to increase network capacity while reducing hardware cost and packet latency.
A strictly nonblocking folded Clos network using same-sized crossbar switches was proposed in [16]. The proposed network has a two-stage structure and uses multiple links between a leaf switch and a root switch. It may have unused switch ports. The number of unused switch ports is reduced by adjusting the number of leaf switches, but the existence of unused switch ports increases the hardware cost. A flexible folded Clos network was proposed in [17]. To reduce the blocking probability, a second group of switches is added to the root stage. Ref. [18] extended the number of groups from two to a general number (S). All these networks have only two stages, making it difficult to scale the network.
The existing issues are summarized below. Traditional unidirectional strictly nonblocking Clos networks use switches with different numbers of input and output ports and route packets through a fixed number of stages. Fat trees, the folded version of Clos networks, use differently sized switches at the root stage and the other stages. Recently proposed folded Clos networks with same-sized switches have only two stages and introduce unused switch ports. This paper attempts to solve these problems while keeping the crosspoint count low relative to a single crossbar.
The contributions of this paper are summarized as follows. An Identical Strictly NonBlocking folded Clos (ISNBC) network and an Identical Rearrangeably NonBlocking folded Clos (IRNBC) network are proposed. Both ISNBC and IRNBC networks can have any number of stages to increase the system’s scalability, can use equally sized square crossbars with no unused switch ports to accommodate currently available switches at low costs, and can utilize shortcut connections to reduce communication path lengths. Moreover, both ISNBC and IRNBC networks have a lower crosspoint ratio relative to a single crossbar than their corresponding traditional nonblocking Clos networks.
The rest of this paper is organized as follows. Section 2 reviews some related multistage interconnection networks. Section 3 proposes identical strictly and rearrangeably nonblocking folded Clos networks consisting of equally sized square crossbars. Section 4 evaluates the hardware cost from the crosspoint perspective and shows that the costs of the proposed networks are lower than those of their corresponding traditional networks. Finally, Section 5 concludes the paper and suggests some future research topics.
2. Related Works
There are many different types of multistage interconnection networks. This section reviews some related multistage interconnection networks.
2.1. Traditional Nonblocking Clos Networks
A traditional unidirectional strictly nonblocking Clos network [1] has an odd number of switch stages. Consider a traditional unidirectional strictly nonblocking Clos network with three switch stages. The ingress stage has n switches, and each switch is an $n \times m$ (n inputs and m outputs) crossbar. The middle stage has m switches, and each switch is an $n \times n$ crossbar. The egress stage has n switches, and each switch is an $m \times n$ crossbar. The m outputs of an ingress switch are connected to the m middle switches, each output to a different middle switch. The n outputs of a middle switch are connected to the n egress switches, each output to a different egress switch. That is, there is exactly one connection between each ingress-stage switch and each middle-stage switch, and exactly one connection between each middle-stage switch and each egress-stage switch. The total number of switches is $2n + m$, and the total number of compute nodes is $n^2$. Crossbar switches with different numbers of input and output ports are thus used at the ingress and egress stages.
Assume that some connections have already been built and that an ingress switch still has an idle input and an egress switch still has an idle output. It is desirable to build a connection from the idle input to the idle output without interfering with existing connections; a network that can always do so is called strictly nonblocking. Suppose that the idle input is at switch i in the ingress stage and the idle output is at switch j in the egress stage, so that at most $n - 1$ connections are built into switch i and at most $n - 1$ connections are built into switch j. In the worst case, these connections use $2n - 2$ distinct switches in the middle stage. If there is one more switch in the middle stage, a connection from the idle input to the idle output can be built through that switch without interfering with existing connections. Therefore, the condition required to achieve strict nonblocking is $m \geq 2n - 1$.
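The counting argument above can be checked empirically. The following minimal sketch uses a hypothetical helper, `simulate_greedy_clos`, which is not from the paper: it randomly sets up and tears down connections in a three-stage Clos network with n ingress switches, routes each new request through the first middle switch whose link from the ingress switch and whose link to the egress switch are both free, and never rearranges existing connections. With $m = 2n - 1$, it should never report a blocked request.

```python
import random


def simulate_greedy_clos(n: int, m: int, steps: int = 20000, seed: int = 0) -> int:
    """Randomly add and remove connections in a three-stage Clos network with
    n ingress switches (n x m), m middle switches (n x n), and n egress
    switches (m x n). Return how many requests found no usable middle switch.
    Existing connections are never rearranged (first-fit routing)."""
    rng = random.Random(seed)
    # busy_in[i]: middle switches whose link from ingress switch i is in use.
    # busy_out[j]: middle switches whose link to egress switch j is in use.
    busy_in = [set() for _ in range(n)]
    busy_out = [set() for _ in range(n)]
    idle_inputs = [(i, p) for i in range(n) for p in range(n)]
    idle_outputs = [(j, q) for j in range(n) for q in range(n)]
    active = []  # (input, output, middle switch) for each live connection
    blocked = 0
    for _ in range(steps):
        if active and (not idle_inputs or rng.random() < 0.5):
            # Tear down a randomly chosen existing connection.
            inp, out, k = active.pop(rng.randrange(len(active)))
            busy_in[inp[0]].discard(k)
            busy_out[out[0]].discard(k)
            idle_inputs.append(inp)
            idle_outputs.append(out)
            continue
        # Request a connection from a random idle input to a random idle output.
        inp = idle_inputs.pop(rng.randrange(len(idle_inputs)))
        out = idle_outputs.pop(rng.randrange(len(idle_outputs)))
        free = [k for k in range(m)
                if k not in busy_in[inp[0]] and k not in busy_out[out[0]]]
        if not free:
            blocked += 1
            idle_inputs.append(inp)
            idle_outputs.append(out)
            continue
        k = free[0]
        busy_in[inp[0]].add(k)
        busy_out[out[0]].add(k)
        active.append((inp, out, k))
    return blocked
```

Calling it with a smaller m, such as $m = n$, can produce blocked requests, because first-fit routing without rearrangement is then no longer guaranteed to find a free middle switch.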
An $x \times y$ (x inputs and y outputs) crossbar has $xy$ crosspoints. Let $m = 2n - 1$. Then, there are $n \cdot nm + m \cdot n^2 + n \cdot mn = 3n^2(2n - 1)$ crosspoints in a traditional unidirectional strictly nonblocking Clos network. There are $n^2$ inputs and $n^2$ outputs, so using a single crossbar requires $n^4$ crosspoints. For $n \geq 6$, the Clos network requires fewer crosspoints than the $n^4$ crosspoints of the single-crossbar implementation.
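The crosspoint counts above can be tabulated directly. This is a small sketch with hypothetical function names; it assumes $m = 2n - 1$ and the three-stage structure described in this subsection, and finds the smallest n at which the Clos network becomes cheaper than a single crossbar.

```python
def clos3_strict_crosspoints(n: int) -> int:
    """Crosspoints of a three-stage strictly nonblocking Clos network:
    n ingress switches (n x m), m middle switches (n x n), and
    n egress switches (m x n), with m = 2n - 1."""
    m = 2 * n - 1
    ingress = n * (n * m)
    middle = m * (n * n)
    egress = n * (m * n)
    return ingress + middle + egress


def single_crossbar_crosspoints(n: int) -> int:
    """Crosspoints of one N x N crossbar with N = n^2 inputs and outputs."""
    big_n = n * n
    return big_n * big_n


def smallest_n_where_clos_is_cheaper(limit: int = 1000) -> int:
    for n in range(1, limit):
        if clos3_strict_crosspoints(n) < single_crossbar_crosspoints(n):
            return n
    raise ValueError("no such n below limit")


# smallest_n_where_clos_is_cheaper() returns 6 (1188 vs. 1296 crosspoints).
```

The crossover follows from $3n^2(2n - 1) < n^4 \iff n^2 - 6n + 3 > 0$, which first holds at $n = 6$.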
A folded version of a traditional unidirectional strictly nonblocking Clos network can be constructed by combining each pair of corresponding ingress and egress switches into a leaf switch. The middle switches become root switches. A leaf switch is an $(n + m) \times (n + m)$ crossbar, whereas a root switch is still an $n \times n$ crossbar. Thus, differently sized crossbar switches are used at the leaf and root stages.
A Clos network is rearrangeably nonblocking if and only if $m \geq n$. In such a case, an idle input of an ingress switch can always be connected to an idle output of an egress switch through some middle switch, possibly after rearranging existing connections. A folded version of a traditional unidirectional rearrangeably nonblocking Clos network can be constructed by combining ingress and egress switches.
Traditional unidirectional strictly nonblocking Clos networks use crossbars with different numbers of input and output ports. The root stage of the folded version of a traditional unidirectional strictly and rearrangeably nonblocking Clos network uses differently sized crossbar switches than the other stages.
2.2. K-Ary N-Tree Clos Networks
A k-ary n-tree Clos network can be created from k-ary n-fly butterfly networks. A k-ary n-fly butterfly network [11] has $k^n$ compute nodes. It has n stages. Each stage has $k^{n-1}$ switches, and each switch has k input ports and k output ports ($2k$ is the radix of the switch). A k-ary 1-fly butterfly network has a single $k \times k$ switch with k input ports and k output ports; each of the k compute nodes is connected to an input port and an output port. A k-ary 2-fly butterfly network has two stages, stage 0 and stage 1, and each stage has k switches. The $k^2$ compute nodes are connected to the input ports of the switches in stage 0 and to the output ports of the switches in stage 1. Each output port of a switch in stage 0 is connected to an input port of a different switch in stage 1. An example k-ary n-fly butterfly network is given in Figure A1 in Appendix A.
A butterfly network minimizes the network diameter and reduces the network cost. However, there is a lack of path diversity because there is only one path between the source node and the destination node. A butterfly network is a blocking network. In addition, it cannot exploit the locality of traffic because all packets must traverse the diameter of the network [9].
Multistage rearrangeably nonblocking Clos networks are also called k-ary n-tree Clos networks. A k-ary n-tree Clos network can be created by combining two k-ary n-fly butterfly networks [11] back to back, where the two back stages are fused [9]. There are $2n - 1$ stages. A three-ary three-tree Clos network, built from three-ary three-fly butterfly networks, is given in Figure A2 in Appendix A.
When $k = 2$, a k-ary n-tree Clos network is also called a Beneš network [12]. There are $2\log_2 N - 1$ stages, where N is the number of compute nodes. Each stage contains $N/2$ switches, and each switch is a $2 \times 2$ crossbar. For example, a three-stage Beneš network has four compute nodes, a five-stage Beneš network has eight compute nodes, and a seven-stage Beneš network has sixteen compute nodes.
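The size relations in this subsection can be collected into a few helper functions. This is a sketch with hypothetical names that simply encodes the counts stated above: a k-ary n-fly has $k^n$ nodes and n stages of $k^{n-1}$ switches, the k-ary n-tree Clos network fuses two of them into $2n - 1$ stages, and the Beneš network is the $k = 2$ case.

```python
def butterfly(k: int, n: int) -> dict:
    """k-ary n-fly: k^n compute nodes, n stages of k^(n-1) k x k switches."""
    return {"nodes": k ** n, "stages": n, "switches_per_stage": k ** (n - 1)}


def kary_ntree_clos(k: int, n: int) -> dict:
    """Two k-ary n-fly butterflies back to back with the two back stages
    fused: 2n - 1 stages, each with k^(n-1) switches."""
    fly = butterfly(k, n)
    return {"nodes": fly["nodes"], "stages": 2 * n - 1,
            "switches_per_stage": fly["switches_per_stage"]}


def benes(num_nodes: int) -> dict:
    """Benes network: the 2-ary n-tree Clos network with N = 2^n nodes."""
    if num_nodes < 2 or num_nodes & (num_nodes - 1):
        raise ValueError("N must be a power of two with N >= 2")
    n = num_nodes.bit_length() - 1  # n = log2(N)
    return kary_ntree_clos(2, n)


for size in (4, 8, 16):
    net = benes(size)
    print(size, net["stages"], net["switches_per_stage"])
```

For 4, 8, and 16 compute nodes this gives 3, 5, and 7 stages with $N/2$ switches per stage, matching the examples in the text.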
In a k-ary n-tree Clos network, a packet needs to be routed first to an arbitrary middle-stage switch and then to its ultimate destination. It is a rearrangeably nonblocking network. The cost and latency of a k-ary n-tree Clos network are nearly double those of a k-ary n-fly butterfly network with equal node capacity [9].
In a unidirectional k-ary n-tree Clos network, the nodes on the input side and the nodes on the output side in the same row are the same compute nodes, and the switch ports are unidirectional. The network can therefore be folded, with each pair of unidirectional switches combined into a bidirectional switch and bidirectional links connecting the switch ports. This is called a k-ary n-tree folded Clos network or a k-ary n-tree fat tree [10]. It has n stages and $k^n$ compute nodes.
A k-ary n-tree fat-tree network can be thought of as a bidirectional k-ary n-fly butterfly network with $k^n$ compute nodes connected to stage 0. In contrast to a unidirectional k-ary n-tree Clos network, a k-ary n-tree fat-tree network can exploit traffic locality because a packet needs to be routed only to the nearest common ancestor (NCA) of the source and destination, then to its ultimate destination. This means that packets may no longer need to be routed to the root switch, reducing the path they take.
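The effect of NCA routing on path length can be sketched as follows. The helper `fat_tree_path_length` and the node labeling are assumptions for illustration: nodes are numbered as n-digit base-k addresses, nodes that differ only in the least significant digit share a leaf switch, and two nodes whose most significant differing digit is digit i meet at a switch in stage i.

```python
def fat_tree_path_length(k: int, n: int, src: int, dst: int) -> int:
    """Links traversed between compute nodes src and dst in a k-ary n-tree
    fat tree when packets turn around at the nearest common ancestor (NCA).
    Assumed (hypothetical) labeling: the most significant base-k digit in
    which src and dst differ is the stage number of their NCA."""
    total = k ** n
    if not (0 <= src < total and 0 <= dst < total):
        raise ValueError("node index out of range")
    if src == dst:
        return 0
    nca_stage = 0
    for digit in range(n - 1, -1, -1):
        if (src // k ** digit) % k != (dst // k ** digit) % k:
            nca_stage = digit
            break
    # Node -> leaf, nca_stage hops up, nca_stage hops down, leaf -> node.
    return 2 * (nca_stage + 1)
```

Nodes on the same leaf switch are 2 links apart, and the longest path, $2n$ links, equals the fixed path length $s + 1$ of the unidirectional k-ary n-tree Clos network with $s = 2n - 1$ stages.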
The switches in a unidirectional k-ary n-tree Clos network can instead be redesigned to have bidirectional ports. Bidirectional links are also used, and the number of compute nodes is doubled (the left nodes and right nodes are distinct compute nodes). Such a network has $2n - 1$ stages and $2k^n$ compute nodes. All switches in a bidirectional k-ary n-tree Clos network have the same radix, namely $2k$.
When $n = 3$ and k is even, a bidirectional $(k/2)$-ary n-tree Clos network is also called a k-ary fat-tree network [13]. It has three fixed stages (layers). The root stage is called the core (spine) layer. There are k pods below the core layer, and each pod contains two layers: an aggregation layer in the middle stage and an edge layer in the leaf stage.
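The correspondence can be checked by counting. The sketch below (hypothetical function names) counts switches and hosts in a bidirectional $(k/2)$-ary 3-tree Clos network, assuming its five stages map to the layers as edge, aggregation, core, aggregation, edge. It reproduces the well-known figures of a k-ary fat tree built from k-port switches: $k^3/4$ hosts, $(k/2)^2$ core switches, and k pods, each with $k/2$ aggregation and $k/2$ edge switches.

```python
def bidirectional_kary_ntree_clos(k: int, n: int) -> dict:
    """Bidirectional k-ary n-tree Clos network: 2n - 1 stages of k^(n-1)
    switches, each of radix 2k, with 2 * k^n compute nodes."""
    return {"stages": 2 * n - 1, "switches_per_stage": k ** (n - 1),
            "radix": 2 * k, "nodes": 2 * k ** n}


def k_ary_fat_tree(k: int) -> dict:
    """View a bidirectional (k/2)-ary 3-tree Clos network as a three-layer
    k-ary fat tree (assumed stage-to-layer mapping: edge, aggregation,
    core, aggregation, edge)."""
    if k % 2:
        raise ValueError("k must be even")
    net = bidirectional_kary_ntree_clos(k // 2, 3)
    per_stage = net["switches_per_stage"]
    edge = 2 * per_stage          # stages 0 and 4
    aggregation = 2 * per_stage   # stages 1 and 3
    core = per_stage              # stage 2, the middle stage
    return {"hosts": net["nodes"], "radix": net["radix"], "core": core,
            "aggregation": aggregation, "edge": edge,
            "pods": edge // (k // 2)}  # k/2 edge switches per pod
```

For example, `k_ary_fat_tree(4)` yields 16 hosts, 4 core switches, 8 aggregation switches, 8 edge switches, and 4 pods, all built from 4-port switches.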
2.3. Mirrored and Peer K-Ary N-Tree Networks
To implement nonblocking routing, the network must provide a high level of path diversity so that a packet can be routed to an arbitrary middle-stage switch in a Clos network or an arbitrary root switch in a fat-tree network, then to its ultimate destination. This approximately doubles the number of switches and links, resulting in a high hardware cost and a high level of packet latency [9]. Both mirrored k-ary n-tree (MiKANT) networks [14] and peer k-ary n-tree networks [15] focus on increasing network capacity and reducing hardware costs and packet latency.
A k-ary n-tree fat-tree network has n stages. If the number 0 represents the leaf stage, then the root stage is numbered $n - 1$. A MiKANT network [14] consists of two k-ary n-tree fat-tree networks joined back to back, where the switches in stage $n - 2$ of one fat tree serve as root switches for the other fat tree; that is, the MiKANT network has $2n - 2$ stages. Fat tree 0 and fat tree 1 denote the two fat trees. Compared to a k-ary n-tree fat-tree network, MiKANT doubles the number of compute nodes. Compared to a bidirectional k-ary n-tree Clos network, MiKANT uses fewer switches with equal node capacity. If the source and destination nodes belong to the same fat tree, MiKANT behaves like a k-ary n-tree fat-tree network. MiKANT reduces the path length and path diversity when the source and destination nodes belong to different fat trees.
In a k-ary n-tree fat-tree network, the root switches have a radix of k, and the other switches have a radix of $2k$. If $2k$-radix switches are used in the root stage, k compute nodes can be connected to each root switch, so the root switches become identical to the leaf switches. This is called a peer k-ary n-tree or a peer fat tree [15] because there is no difference between roots and leaves. The peer fat tree reduces both hardware cost and average distance and, meanwhile, provides nonblocking routing functionality for half of the source–destination node pairs. Note that a peer k-ary n-tree network is not a bidirectional k-ary n-fly butterfly network. A bidirectional k-ary n-fly butterfly network connects compute nodes to either stage 0 or stage $n - 1$, whereas a peer k-ary n-tree network connects compute nodes to both stage 0 and stage $n - 1$. Therefore, the number of compute nodes in a peer k-ary n-tree network is twice that in a bidirectional k-ary n-fly butterfly network. In fact, a bidirectional k-ary n-fly butterfly network is the same as a k-ary n-tree fat-tree network.
2.4. Twisted-and-Folded Clos Networks
A two-stage twisted-and-folded Clos network using equally sized square crossbar switches was proposed in [16]. In this network, there are multiple links between a leaf switch and a root switch. It has k switches in the leaf stage and m switches in the root stage. Let n be the number of ports connected to compute nodes at a leaf switch and v be the number of links between a leaf switch and a root switch; [16] gives the condition on m, n, and v required to achieve strict nonblocking. The network has $kn$ compute nodes in total. A leaf switch needs $n + mv$ ports and a root switch needs $kv$ ports, and the number of leaf switches (k) is chosen so that $kv$ is as close as possible to $n + mv$, which keeps the number of unused ports as low as possible. If $kv = n + mv$, there is no unused port; otherwise, $|kv - (n + mv)|$ ports are unused on every switch in either the leaf stage (if $kv > n + mv$) or the root stage (if $kv < n + mv$).
Consider the examples presented in [16], in which n, v, and m are fixed and there are two possible values of k, as shown below.
- With the smaller value of k, $kv$ is smaller than $n + mv$. The network uses $k + m$ switches, and each switch is an $(n + mv) \times (n + mv)$ crossbar. There is one unused port in each root switch. The number of compute nodes is $kn$.
- With the larger value of k, $kv$ is larger than $n + mv$. The network uses $k + m$ switches, and each switch is a $kv \times kv$ crossbar. There is one unused port in each leaf switch. The number of compute nodes is $kn$.
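The port bookkeeping in these examples can be sketched as follows. The function `tf_clos_options` is hypothetical, and the values $n = 3$, $m = 3$, $v = 2$ are illustrative only: they are not the example of [16], and no claim is made that they satisfy its nonblocking condition. The sketch tries the values of k for which $kv$ is closest to $n + mv$ and reports the unused ports per switch.

```python
def tf_clos_options(n: int, m: int, v: int) -> list:
    """Hypothetical port bookkeeping for a two-stage twisted-and-folded Clos
    network: k leaf switches (n node ports plus v links to each of m roots)
    and m root switches (v links to each of k leaves), all using the same
    square crossbar size."""
    leaf_ports = n + m * v
    k_low = leaf_ports // v
    candidates = [k_low] if leaf_ports % v == 0 else [k_low, k_low + 1]
    options = []
    for k in candidates:
        root_ports = k * v
        size = max(leaf_ports, root_ports)  # common square crossbar size
        options.append({
            "k": k,
            "crossbar": size,
            "switches": k + m,
            "unused_per_leaf": size - leaf_ports,
            "unused_per_root": size - root_ports,
            "compute_nodes": k * n,
        })
    return options


for option in tf_clos_options(3, 3, 2):
    print(option)
```

With these illustrative values, $k = 4$ leaves one unused port on each root switch and $k = 5$ leaves one unused port on each leaf switch, mirroring the two cases above.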
A flexible twisted-and-folded Clos network was proposed in [17]. It eliminates the strict nonblocking condition of [16]. To reduce the resulting blocking probability, a second group of switches is added to the root stage. If a connection cannot be established through a root switch in group 1, a root switch in group 2 is tried (two-step model).
The number of groups can be extended to a general number (S in an S-step model), as presented in [18]. Using this network as a building block, a multiple-plane twisted-and-folded Clos network can be constructed [19]. A plane-selection layer consisting of $1 \times M$ and $M \times 1$ crossbars is inserted between the terminals and the input–output layer, where M is the number of planes. This multiple-plane network uses crossbar switches of various sizes. The twisted-and-folded Clos networks introduced in [16,17,18,19] have only two switch stages.
3. Proposed Identical Nonblocking Folded Clos Networks
This section first presents Unidirectional Strictly NonBlocking Clos (USNBC) networks and Unidirectional Rearrangeably NonBlocking Clos (URNBC) networks. Based on these unidirectional networks, the construction methods of identical strictly nonblocking folded Clos (ISNBC) networks and identical rearrangeably nonblocking folded Clos (IRNBC) networks, as listed in Table 1, are presented. Note that the number of stages in the unidirectional networks is odd, namely $2s - 1$ for $s \geq 2$, where s is the number of stages of the corresponding identical folded network. Only the cases $s = 2$, 3, and 4 are shown here, but USNBC, URNBC, ISNBC, and IRNBC networks with $s > 4$ can be constructed similarly.
Table 1.
Proposed nonblocking Clos networks.
3.1. Proposed Identical Strictly NonBlocking Folded Clos (ISNBC) Networks
As mentioned before, traditional strictly nonblocking Clos networks have two drawbacks. One is the use of $n \times m$ and $m \times n$ crossbars with unequal numbers of input and output ports. The other is that all connections pass through a fixed number of stages. A drawback of traditional nonblocking folded Clos networks is that the root stage uses differently sized crossbar switches than the other stages. These problems can be solved by using an ISNBC network consisting of equally sized square crossbars. This subsection presents USNBC networks and the corresponding ISNBC networks.
3.1.1. Two-Stage ISNBC Networks
To construct a two-stage ISNBC network, a three-stage USNBC network is constructed first. In a three-stage USNBC network, let n be the number of inputs per switch in the ingress stage, m be the number of switches in the middle stage, and r be the number of switches in the ingress and egress stages. Let $m \geq 2n - 1$ to ensure the strictly nonblocking property. Let $r = n + m$ to ensure that the ISNBC network uses equally sized square crossbars.
In the ingress stage, there are r switches, and each switch is an $n \times m$ crossbar (n inputs and m outputs); in the middle stage, there are m switches, and each switch is an $r \times r$ crossbar; and in the egress stage, there are r switches, and each switch is an $m \times n$ crossbar. Each output of a switch in the ingress stage is connected to an input of a different switch in the middle stage. Since there are r switches in the ingress stage, a switch in the middle stage must have r inputs, one for each ingress switch. Each output of a switch in the middle stage is connected to an input of a different switch in the egress stage. Because a middle switch also has r outputs, its outputs are connected to the r egress switches, one each. Then, the number of compute nodes is $N = nr$.
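The linking rule above can be written down at port level. In this sketch, `clos_links` is a hypothetical helper and the port numbering is an assumption: output j of ingress switch i goes to input i of middle switch j, and output i of middle switch j goes to input j of egress switch i. Each middle switch thus receives exactly one link from every ingress switch, and each egress switch receives exactly one link from every middle switch.

```python
def clos_links(r: int, m: int) -> tuple:
    """Hypothetical port-level wiring of a three-stage USNBC network.
    Each link is ((switch, port), (switch, port))."""
    # Ingress switch i, output j -> middle switch j, input i.
    ingress_to_middle = [((i, j), (j, i)) for i in range(r) for j in range(m)]
    # Middle switch j, output i -> egress switch i, input j.
    middle_to_egress = [((j, i), (i, j)) for j in range(m) for i in range(r)]
    return ingress_to_middle, middle_to_egress


up, down = clos_links(6, 4)
print(len(up), len(down))  # r * m links in each direction
```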
In summary, to construct a three-stage USNBC network, m and r are determined as shown in Formula (1), where N is the number of compute nodes.
To construct a folded version of a three-stage USNBC network, the switches at corresponding positions in the ingress and egress stages are merged and expanded so that each combined switch has $n + m$ inputs and $n + m$ outputs. These switches therefore have the same number of inputs and outputs, namely $n + m$. The switch in the middle stage has r inputs and r outputs with $r = n + m$. Therefore, all switches in the folded Clos network use equally sized $(n + m) \times (n + m)$ crossbars.
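The parameter choices and the folding step can be summarized in a short sketch. `two_stage_isnbc` is a hypothetical helper; it assumes the smallest strictly nonblocking value $m = 2n - 1$ when m is not given, sets $r = n + m$, and reports each stage as (number of switches, (inputs, outputs)).

```python
from typing import Optional


def two_stage_isnbc(n: int, m: Optional[int] = None) -> dict:
    """Parameters of a three-stage USNBC network and its folded two-stage
    ISNBC version. Each stage entry is (number of switches, (inputs, outputs))."""
    if m is None:
        m = 2 * n - 1  # smallest m that is strictly nonblocking
    if m < 2 * n - 1:
        raise ValueError("strict nonblocking requires m >= 2n - 1")
    r = n + m  # makes the r x r middle switches match the merged leaves
    usnbc = {
        "ingress": (r, (n, m)),
        "middle": (m, (r, r)),
        "egress": (r, (m, n)),
    }
    isnbc = {
        "leaf": (r, (n + m, n + m)),  # ingress + egress merged and expanded
        "root": (m, (r, r)),          # middle switches are unchanged
    }
    return {"r": r, "nodes": n * r, "usnbc": usnbc, "isnbc": isnbc}


params = two_stage_isnbc(2, 4)
print(params["nodes"], params["isnbc"])
```

For $n = 2$ and $m = 4$, this gives six $6 \times 6$ leaf switches, four $6 \times 6$ root switches, and 12 compute nodes.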
Figure 1a shows an example three-stage USNBC network, which has $N = nr$ compute nodes. A two-stage ISNBC network, the folded version of this three-stage USNBC network, is shown in Figure 1b. It uses equally sized square crossbars for all switches and can utilize shortcut connections to reduce communication path lengths. For example, if the source and destination nodes are connected to the same leaf switch, the communication does not need to go through the root switch.
Figure 1.
Proposed strictly nonblocking Clos networks. (a) A 3-stage USNBC network. (b) A 2-stage ISNBC network composed of equally sized square crossbars.
The root switch in Figure 1b and the middle switch in Figure 1a are the same: both are $r \times r$ crossbar switches. However, the leaf switch in Figure 1b is not simply a combination of the ingress switch and the egress switch in Figure 1a. A leaf switch is an $(n + m) \times (n + m)$ square crossbar.
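Referring to Figure 2, for $n = 2$ and $m = 4$, Figure 2a shows a $2 \times 4$ crossbar and a $4 \times 2$ crossbar. The number of crosspoints is $2 \times 8 = 16$. There are no paths from the two inputs of the $2 \times 4$ crossbar, which connect to compute nodes, to the two outputs of the $4 \times 2$ crossbar, which also connect to compute nodes. Similarly, there are no paths from the four inputs of the $4 \times 2$ crossbar to the four outputs of the $2 \times 4$ crossbar. Figure 2b shows a $6 \times 6$ crossbar. The number of crosspoints is $6^2 = 36$. Any of the six inputs can be routed to any of the six outputs. A crosspoint can be implemented using two two-to-one multiplexers, as shown in Figure 2c.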
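The difference between the two designs in Figure 2 can be expressed as sets of reachable (input, output) pairs. The port numbering here is a hypothetical convention shared by both designs: ports $0, \dots, n-1$ face compute nodes and ports $n, \dots, n+m-1$ face the root stage. The sketch counts the pairs that only the merged crossbar can connect.

```python
def separate_pairs(n: int, m: int) -> set:
    """(input, output) pairs reachable when an n x m ingress crossbar and an
    m x n egress crossbar are kept separate."""
    node_side = range(n)
    root_side = range(n, n + m)
    up = {(i, o) for i in node_side for o in root_side}    # ingress part
    down = {(i, o) for i in root_side for o in node_side}  # egress part
    return up | down


def merged_pairs(n: int, m: int) -> set:
    """Every input reaches every output in one (n+m) x (n+m) crossbar."""
    p = n + m
    return {(i, o) for i in range(p) for o in range(p)}


n, m = 2, 4
missing = merged_pairs(n, m) - separate_pairs(n, m)
# Node-side input to node-side output: the pairs that enable shortcuts.
shortcuts = {(i, o) for (i, o) in missing if i < n and o < n}
print(len(separate_pairs(n, m)), len(merged_pairs(n, m)), len(missing), len(shortcuts))
```

The 20 extra pairs are the $n^2 = 4$ node-to-node pairs, which allow shortcut connections within a leaf switch, plus the $m^2 = 16$ pairs between root-side ports.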
Figure 2.
Merging and expanding an $n \times m$ crossbar switch and an $m \times n$ crossbar switch into a big square crossbar switch for $n = 2$ and $m = 4$. (a) A $2 \times 4$ crossbar switch and a $4 \times 2$ crossbar switch, each with 8 crosspoints. (b) A $6 \times 6$ square crossbar switch with 36 crosspoints. The red line shows an example path. (c) Crosspoint states and implementation using two 2-to-1 multiplexers.
Figure 2b is a $6 \times 6$ crossbar switch. Each input port and the output port with the same index form a bidirectional port. An example of a shortcut is shown in Figure 3, where two bidirectional ports of leaf switch 1 connect to compute nodes 1 and 2, respectively. Node 1 can then send packets to node 2 through leaf switch 1 alone, thereby reducing the communication path length.
Figure 3.
A shortcut in an ISNBC network. When a packet is sent from a source (node 1) to a destination (node 2), it does not need to go through the root switch.
Figure 4 shows another three-stage USNBC network, with r ingress switches, r egress switches, and m middle switches. It has $N = nr$ compute nodes.
Figure 4.
Another 3-stage USNBC network.
A two-stage ISNBC network, the folded version of the three-stage USNBC network in Figure 4, is shown in Figure 5. It uses equally sized square crossbars for all switches. It can utilize shortcut connections to reduce communication path lengths.
Figure 5.
A 2-stage ISNBC network composed of equally sized square crossbar switches (folded version of Figure 4).
Table 2 and Table 3 show the differences between this work and existing work. Table 2 compares the proposed three-stage USNBC network with a traditional three-stage unidirectional strictly nonblocking Clos network. Table 3 compares the proposed two-stage ISNBC network with a traditional two-stage strictly nonblocking folded Clos network. Each entry in the switch column gives the number of switches, z, together with the size $x \times y$ of each crossbar. N is the total number of compute nodes in the network. The ISNBC network uses equally sized square crossbar switches in the leaf and root stages.
Table 2.
Comparison of three-stage unidirectional strictly nonblocking Clos networks.
Table 3.
Comparison of two-stage strictly nonblocking folded Clos networks.
3.1.2. Three-Stage ISNBC Networks
To construct a three-stage ISNBC network, a five-stage USNBC network is constructed first, using three-stage USNBC networks as building blocks. For given n and m, a three-stage USNBC network has $n(n + m)$ compute nodes. As a building block, its compute nodes are removed, so that the three-stage USNBC network has $n(n + m)$ free inputs and $n(n + m)$ free outputs.
The building blocks can be thought of as virtual $n(n + m) \times n(n + m)$ crossbars. m such building blocks are arranged in the middle stage, so in total there are $mn(n + m)$ inputs and $mn(n + m)$ outputs in the middle stage. Correspondingly, the same number of outputs in the ingress stage and the same number of inputs in the egress stage are arranged. Let r be the number of switches in the ingress and egress stages; then, $rm$ must be equal to $mn(n + m)$. Therefore, $r = n(n + m)$.
In summary, to construct a five-stage USNBC network, given n as the number of inputs per switch in the ingress stage, there are $r = n(n + m)$ switches in the ingress stage, and each switch is an $n \times m$ crossbar with $m \geq 2n - 1$. The middle stage has m building blocks, and each building block is a three-stage USNBC network with its compute nodes removed. The egress stage has r switches, and each switch is an $m \times n$ crossbar, as shown in Formula (2). The linking method is similar to that of the three-stage USNBC network: each output of a switch in the ingress stage is connected to an input of a different building block in the middle stage, and each output of a building block in the middle stage is connected to an input of a different switch in the egress stage. Because $r = n(n + m)$, the number of compute nodes is $N = nr = n^2(n + m)$.
Figure 6 shows an example five-stage USNBC network, which has $N = nr = n^2(n + m)$ compute nodes. There are four building blocks in the middle stage, and each building block is a three-stage USNBC network with its compute nodes removed. The detailed network of a building block is shown at the bottom of the figure, which also shows how the switches are linked together.
Figure 6.
A five-stage USNBC network. There are four building blocks (3-stage USNBC, Figure 1a) in the middle stage.
A three-stage ISNBC network, the folded version of the five-stage USNBC network in Figure 6, is shown in Figure 7. It uses equally sized square crossbars for all switches. Four building blocks are depicted in the two switch columns on the right, and each building block is a two-stage ISNBC network with its compute nodes removed, as shown in Figure 1b. It can utilize shortcut connections to reduce communication path lengths.
Figure 7.
A three-stage ISNBC network composed of equally sized square crossbars (folded version of Figure 6).
3.1.3. Four-Stage ISNBC Networks
To construct a four-stage ISNBC network, a seven-stage USNBC network is constructed first. Similarly, a seven-stage USNBC network can be constructed by using five-stage USNBC networks as building blocks. For given n and m, a five-stage building block has $n^2(n + m)$ inputs and $n^2(n + m)$ outputs. m such building blocks are arranged in the middle stage, so in total there are $mn^2(n + m)$ inputs and outputs in the middle stage. Correspondingly, $r = n^2(n + m)$ switches are arranged in the ingress stage and r switches in the egress stage.
In summary, to construct a seven-stage USNBC network, given n as the number of inputs per switch in the ingress stage, there are $r = n^2(n + m)$ switches in the ingress stage, and each switch is an $n \times m$ crossbar with $m \geq 2n - 1$. There are m building blocks in the middle stage, and each building block is a five-stage USNBC network with its compute nodes removed. There are r switches in the egress stage, and each switch is an $m \times n$ crossbar, as shown in Formula (3). The linking method is similar to that in the five-stage USNBC network.
Figure 8 shows an example seven-stage USNBC network, which has $N = nr = n^3(n + m)$ compute nodes. There are four building blocks in the middle stage, and each building block is a five-stage USNBC network with its compute nodes removed. The detailed networks of the building blocks are shown at the bottom of the figure.
Figure 8.
A 7-stage USNBC network. There are four building blocks (5-stage USNBC, Figure 6) in the middle stage.
A four-stage ISNBC network, the folded version of the seven-stage USNBC network in Figure 8, is shown in Figure 9. It uses equally sized square crossbars for all switches. Four three-stage ISNBC networks are shown in the three switch columns on the right. It can utilize shortcut connections to reduce communication path lengths.
Figure 9.
A four-stage ISNBC network with $n = 2$ and $m = 4$ composed of equally sized square crossbars (folded version of Figure 8).
Table 4 lists the numbers of compute nodes and switches of two-, three-, and four-stage ISNBC networks. Let s be the number of stages. Then, the number of compute nodes is $3n^s$, and the number of switches is $(2^{s+1}-3)\,n^{s-1}$. The derivation of the formula is given in Theorem A1 in Appendix B. The crossbar size ($3n \times 3n$) is listed in the right column.
Table 4.
The numbers of compute nodes and switches in ISNBC networks.
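As a cross-check on the switch count, the recursion implied by the construction (an s-stage ISNBC network consists of $3n^{s-1}$ leaf switches plus $m = 2n$ copies of the $(s-1)$-stage network) can be compared with the closed form $(2^{s+1}-3)\,n^{s-1}$. This is a minimal sketch assuming $m = 2n$ and $3n \times 3n$ crossbars; all function names are hypothetical.

```python
def isnbc_switches(n: int, s: int) -> int:
    """Switch count of an s-stage ISNBC network, computed recursively."""
    if s == 1:
        return 1  # innermost block: a single 3n x 3n root switch
    leaves = 3 * n ** (s - 1)  # leaf switches added at the outer level
    return leaves + 2 * n * isnbc_switches(n, s - 1)  # plus m = 2n blocks


def isnbc_switches_closed(n: int, s: int) -> int:
    """Closed form (2^(s+1) - 3) * n^(s-1)."""
    return (2 ** (s + 1) - 3) * n ** (s - 1)


def isnbc_nodes(n: int, s: int) -> int:
    """n compute nodes on each of the 3n^(s-1) outer leaf switches."""
    return 3 * n ** s


for s in (2, 3, 4):
    print(s, isnbc_nodes(2, s), isnbc_switches(2, s))
```

For $n = 2$ this gives 10, 52, and 232 switches for two, three, and four stages, matching the closed form.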
3.2. Proposed Identical Rearrangeably NonBlocking Folded Clos (IRNBC) Networks
This subsection presents a URNBC network and the corresponding IRNBC network composed of square crossbars of the same size.
3.2.1. Two-Stage IRNBC Networks
To construct a two-stage IRNBC network, a three-stage URNBC network is constructed first. In a three-stage URNBC network, let n be the number of inputs per switch in the ingress stage, m be the number of switches in the middle stage, and r be the number of switches in the ingress and egress stages. Let $m = n$ to ensure the rearrangeably nonblocking property. Let $r = n + m = 2n$ to ensure that the IRNBC network uses equally sized square crossbars.
In the ingress stage, there are r switches, and each switch is an $n \times m$ crossbar (n inputs and m outputs); in the middle stage, there are m switches, and each switch is an $r \times r$ crossbar; and in the egress stage, there are r switches, and each switch is an $m \times n$ crossbar. Each output of an ingress-stage switch is connected to an input of a different middle-stage switch. Since there are r switches in the ingress stage, each middle-stage switch must have r inputs, one for each ingress-stage switch. Likewise, each output of a middle-stage switch is connected to an input of a different egress-stage switch; because a middle-stage switch also has r outputs, it is connected to every one of the r egress-stage switches. The number of compute nodes is therefore $nr = 2n^2$. In summary, to construct a three-stage URNBC network, m and r are determined as shown in Formula (4).
To fold a three-stage URNBC network, the ingress-stage and egress-stage switches in corresponding positions are merged into one switch with $n + m$ inputs and $n + m$ outputs. These merged switches are therefore square, with $n + m = 2n$ ports on each side. The middle-stage switch has r inputs and r outputs with $r = n + m = 2n$. Therefore, all switches in the folded Clos network use equally sized $2n \times 2n$ crossbars.
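The linking rule can be made concrete with a small sketch that wires a three-stage URNBC network and checks the port counts that make folding possible. It assumes $m = n$ and $r = 2n$, numbers switches and ports from 0, and connects output j of ingress switch i to input i of middle switch j (and output k of middle switch j to input j of egress switch k); this port-numbering convention and the function names are illustrative assumptions, not taken from the figures.

```python
from collections import Counter


def wire_urnbc3(n: int):
    """Wire a three-stage URNBC network with m = n and r = 2n (sketch).

    Links are ((switch, port), (switch, port)) pairs, numbered from 0.
    """
    m, r = n, 2 * n
    # Output j of ingress switch i -> input i of middle switch j.
    up = [((i, j), (j, i)) for i in range(r) for j in range(m)]
    # Output k of middle switch j -> input j of egress switch k.
    down = [((j, k), (k, j)) for j in range(m) for k in range(r)]
    return m, r, up, down


def check_urnbc3(n: int) -> bool:
    m, r, up, down = wire_urnbc3(n)
    # Every middle switch receives exactly one link from each ingress switch ...
    into_middle = Counter((dst[0], src[0]) for src, dst in up)
    ok_up = len(into_middle) == m * r and set(into_middle.values()) == {1}
    # ... and sends exactly one link to each egress switch.
    out_middle = Counter((src[0], dst[0]) for src, dst in down)
    ok_down = len(out_middle) == m * r and set(out_middle.values()) == {1}
    # Folding merges ingress i with egress i: n node ports plus m middle ports
    # per side, which equals the r ports of a middle (root) switch.
    return ok_up and ok_down and n + m == r


print(check_urnbc3(2))
```

Choosing $r = n + m$ is exactly what makes the merged leaf switch and the root switch the same size.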
Figure 10a shows a three-stage URNBC network with $n = 2$, $m = 2$, and $r = 4$. It has $nr = 8$ compute nodes. A two-stage IRNBC network, the folded version of this three-stage URNBC network, is shown in Figure 10b. It uses equally sized square crossbars for all switches and can utilize shortcut connections to reduce communication path lengths. For example, if the source and destination nodes are connected to the same leaf switch, the communication does not need to go through a root switch.
Figure 10.
Proposed rearrangeably nonblocking Clos networks ($n = 2$ and $m = 2$). (a) A 3-stage URNBC network with $n = 2$, $m = 2$, and $r = 4$. (b) A 2-stage IRNBC network composed of equally sized square crossbars.
Figure 11 illustrates blocking and rearrangement in a rearrangeably nonblocking Clos network. In Figure 11a, a new connection from source node 2 to destination node 3 cannot be built: the existing connection of node 1 (attached to the same switch as node 2) and the existing connection of node 4 (attached to the same switch as node 3) occupy different middle switches, so no middle switch has both a free link from the source switch and a free link to the destination switch. Figure 11b shows the folded version, in which a bidirectional link consists of two oppositely oriented unidirectional links. Figure 11c–f show that the new connection can be built after the existing connections are rearranged.
Figure 11.
Blocking and rearrangements in a rearrangeably nonblocking Clos network. (a) Two connections were built; the connection from node 2 to node 3 cannot be built. (b) The folded version of (a). Note that a bidirectional link consists of two oppositely oriented unidirectional links. (c) Two connections were built; the connection from node 2 to node 3 can also be built after rearranging the connections in (a). (d) The folded version of (c). (e) Two connections were built; the connection from node 2 to node 3 can also be built after rearranging the connections in (a). (f) The folded version of (e).
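Rearrangeability with only $m = n$ middle switches can be illustrated by routing a complete permutation from scratch. The sketch below uses the classic argument behind the Slepian–Duguid theorem: the ingress-to-egress demands of a permutation form an n-regular bipartite multigraph, which splits into n perfect matchings, and each matching is sent through its own middle switch. It targets a unidirectional three-stage network and is not the specific rearrangement shown in Figure 11; nodes are numbered from 0, node x is assumed to attach to switch `x // n`, and the function names are hypothetical.

```python
import random


def route_permutation(n: int, r: int, perm: list[int]) -> list[int]:
    """Return the middle switch (0..n-1) used by each connection src -> perm[src]."""
    # demands[a][b]: sources on ingress switch a bound for egress switch b.
    demands = [[[] for _ in range(r)] for _ in range(r)]
    for src, dst in enumerate(perm):
        demands[src // n][dst // n].append(src)

    middle = [-1] * len(perm)
    for mid in range(n):
        owner = [-1] * r  # owner[b] = ingress switch matched to egress switch b

        def augment(a: int, seen: list[bool]) -> bool:
            # Kuhn's augmenting-path step for bipartite matching.
            for b in range(r):
                if demands[a][b] and not seen[b]:
                    seen[b] = True
                    if owner[b] == -1 or augment(owner[b], seen):
                        owner[b] = a
                        return True
            return False

        for a in range(r):
            # The remaining demand graph is (n - mid)-regular, so by Hall's
            # theorem a perfect matching always exists.
            if not augment(a, [False] * r):
                raise ValueError("perm must be a permutation of n * r nodes")
        for b, a in enumerate(owner):
            middle[demands[a][b].pop()] = mid  # this pair uses middle switch mid
    return middle


def is_conflict_free(n: int, perm: list[int], middle: list[int]) -> bool:
    """No ingress or egress switch uses the same middle switch twice."""
    used_in, used_out = set(), set()
    for src, dst in enumerate(perm):
        key_in, key_out = (src // n, middle[src]), (dst // n, middle[src])
        if key_in in used_in or key_out in used_out or not 0 <= middle[src] < n:
            return False
        used_in.add(key_in)
        used_out.add(key_out)
    return True


random.seed(1)
n, r = 2, 4  # three-stage URNBC size with m = n and r = 2n
perm = random.sample(range(n * r), n * r)
print(perm, route_permutation(n, r, perm))
```

Because every permutation can be routed this way, any single blocked request can always be served by rerouting the existing connections, which is what Figure 11c–f illustrate for one case.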
Figure 12a shows another three-stage URNBC network, and Figure 12b shows the two-stage IRNBC network obtained by folding it, which also uses equally sized square crossbars for all switches.
Figure 12.
Proposed rearrangeably nonblocking Clos networks ( and ). (a) A 3-stage URNBC network with , , and . (b) A 2-stage IRNBC network composed of equally sized square crossbars.
Table 5 and Table 6 show the differences between this work and the original (traditional) designs. Table 5 compares the proposed three-stage URNBC network with a traditional three-stage unidirectional rearrangeably nonblocking Clos network, and Table 6 compares the proposed two-stage IRNBC network with a traditional two-stage rearrangeably nonblocking folded Clos network. Each expression in the switch column gives the number of switches z together with the crossbar size of each switch. N is the total number of compute nodes in the network. The IRNBC network uses equally sized $2n \times 2n$ square crossbar switches in both the leaf and root stages.
Table 5.
Comparison of 3-stage unidirectional rearrangeably nonblocking Clos networks.
Table 6.
Comparison of 2-stage rearrangeably nonblocking folded Clos networks.
3.2.2. Three-Stage IRNBC Networks
To construct a three-stage IRNBC network, a five-stage URNBC network is constructed first, using three-stage URNBC networks as building blocks. With $m = n$ and $r = 2n$, a three-stage URNBC network has $nr = 2n^2$ compute nodes. When it is used as a building block, the compute nodes are removed, so the three-stage URNBC network has $2n^2$ inputs and $2n^2$ outputs. Arranging m such building blocks in the middle stage gives $m \cdot 2n^2 = 2n^3$ inputs and $2n^3$ outputs in the middle stage. Correspondingly, the ingress stage must provide the same number of outputs, and the egress stage the same number of inputs. Let r be the number of switches in the ingress and egress stages; since each ingress-stage switch has m outputs, $rm$ must equal $2n^3$. Therefore, $r = 2n^3/m = 2n^2$.
In summary, to construct a five-stage URNBC network, given n, the number of inputs per switch in the ingress stage, there are $r = 2n^2$ switches in the ingress stage, and each switch is an $n \times m$ crossbar with $m = n$. There are m building blocks in the middle stage, and each building block is a three-stage URNBC network with its compute nodes removed. There are r switches in the egress stage, and each switch is an $m \times n$ crossbar, as shown in Formula (5). The linking method is the same as in the three-stage URNBC network: each output of an ingress-stage switch is connected to an input of a different building block in the middle stage, and each output of a building block in the middle stage is connected to an input of a different egress-stage switch.
Figure 13 shows a five-stage URNBC network with $n = 2$, $m = 2$, and $r = 8$. It has $nr = 16$ compute nodes. There are $m = 2$ building blocks in the middle stage, and each building block is a three-stage URNBC network with its compute nodes removed. The detailed network of a building block is shown at the bottom of the figure.
Figure 13.
A 5-stage URNBC network with $n = 2$, $m = 2$, and $r = 8$. There are two building blocks (3-stage URNBC, Figure 10a) in the middle stage.
A three-stage IRNBC network, the folded version of the five-stage URNBC network with $n = 2$, $m = 2$, and $r = 8$, is shown in Figure 14. It uses equally sized square crossbars for all switches. The two switch columns on the right contain two two-stage IRNBC networks (Figure 10b).
Figure 14.
A 3-stage IRNBC network with $n = 2$ and $m = 2$ composed of equally sized square crossbars (folded version of Figure 13).
3.2.3. Four-Stage IRNBC Networks
To construct a four-stage IRNBC network, a seven-stage URNBC network is constructed first. Similarly, by using five-stage URNBC networks as building blocks, a seven-stage URNBC network can be constructed. In summary, to construct a seven-stage URNBC network, m and r are determined as shown in Formula (6). The linking method is similar to the five-stage URNBC network.
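For reference, the URNBC parameters implied by the recursive construction can be written out explicitly. This is a derivation sketch under the assumptions used in the text ($m = n$ at every level and m outputs per ingress switch), with $N_{2k+1}$ denoting the number of compute nodes of the $(2k+1)$-stage network; it is a reconstruction from the construction rather than a quotation of Formulas (4)–(6).

```latex
\begin{aligned}
  m &= n, \qquad r_3 = n + m = 2n, \qquad N_3 = n\,r_3 = 2n^2,\\
  r_{2k+1}\,m &= m\,N_{2k-1} \;\Longrightarrow\; r_{2k+1} = N_{2k-1},
    \qquad N_{2k+1} = n\,r_{2k+1} \quad (k \ge 2),\\
  r_5 &= 2n^2,\quad N_5 = 2n^3, \qquad r_7 = 2n^3,\quad N_7 = 2n^4 .
\end{aligned}
```

The first relation on the second line simply says that the ingress stage must supply one output for every input of the m middle-stage building blocks.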
Figure 15 shows a seven-stage URNBC network with $n = 2$, $m = 2$, and $r = 16$. It has $nr = 32$ compute nodes. There are $m = 2$ building blocks in the middle stage, and each building block is a five-stage URNBC network with its compute nodes removed. The detailed networks of the building blocks are shown at the bottom of the figure.
Figure 15.
A 7-stage URNBC network with $n = 2$, $m = 2$, and $r = 16$. There are two building blocks (5-stage URNBC, Figure 13) in the middle stage.
A four-stage IRNBC network, the folded version of the seven-stage URNBC network with $n = 2$, $m = 2$, and $r = 16$, is shown in Figure 16. It uses equally sized square crossbars for all switches. The three switch columns on the right contain two three-stage IRNBC networks (Figure 14).
Figure 16.
A 4-stage IRNBC network with $n = 2$ and $m = 2$ composed of equally sized square crossbars (folded version of Figure 15).
Table 7 lists the numbers of compute nodes and switches of two-, three-, and four-stage IRNBC networks. Let s be the number of stages. Then, the number of compute nodes is $2n^s$, and the number of switches is $(2s-1)\,n^{s-1}$. The derivation of the formula is given in Theorem A2 in Appendix B. The crossbar size ($2n \times 2n$) is listed in the right column.
Table 7.
The numbers of compute nodes and switches in the IRNBC networks.
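One way to check the closed form $(2s-1)\,n^{s-1}$ is by induction on s. Assuming, as in the construction, that an s-stage IRNBC network consists of $2n^{s-1}$ leaf switches plus $m = n$ copies of the $(s-1)$-stage network, and treating a single root switch as the one-stage case, the switch count $S_s$ satisfies:

```latex
\begin{aligned}
  S_1 &= 1, \qquad S_s = 2n^{s-1} + n\,S_{s-1} \quad (s \ge 2),\\
  S_s &= 2n^{s-1} + n\,(2s-3)\,n^{s-2} = (2s-1)\,n^{s-1}
        \quad \text{(using } S_{s-1} = (2s-3)\,n^{s-2}\text{)},\\
  S_2 &= 3n, \qquad S_3 = 5n^2, \qquad S_4 = 7n^3 .
\end{aligned}
```

Since every switch is a $2n \times 2n$ crossbar, multiplying by $4n^2$ gives $4(2s-1)\,n^{s+1}$ crosspoints in total.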
4. Cost Evaluations
This section evaluates the hardware cost for USNBC, ISNBC, URNBC, and IRNBC networks from the perspective of switch crosspoints and compares them to the corresponding traditional Clos networks. Here, we evaluate the costs of the 24 networks listed in Table 8.
Table 8.
Nonblocking Clos networks for cost evaluations.
4.1. Cost Evaluations of Strictly Nonblocking Clos Networks
This subsection investigates the crosspoint ratios relative to a single crossbar for unidirectional strictly nonblocking Clos (USNBC) networks and identical strictly nonblocking folded Clos (ISNBC) networks. These ratios are compared to those of the corresponding traditional Clos networks.
4.1.1. Cost Evaluations of USNBC Networks
An $n \times m$ crossbar (n inputs and m outputs) has $nm$ crosspoints. In the proposed strictly nonblocking Clos networks, $m = 2n$. Referring to Figure 1a, in a three-stage USNBC network, there are $3n$ switches in the ingress stage, and each switch is an $n \times 2n$ crossbar. There are m switches in the middle stage, and each switch is a $3n \times 3n$ crossbar. There are $3n$ switches in the egress stage, and each switch is a $2n \times n$ crossbar. The total number of crosspoints is $C_3 = 3n \cdot 2n^2 + 2n \cdot 9n^2 + 3n \cdot 2n^2 = 30n^3$. There are $3n^2$ inputs and $3n^2$ outputs, so a single $3n^2 \times 3n^2$ crossbar requires $9n^4$ crosspoints. The crosspoint ratio of the three-stage USNBC network relative to a single crossbar is $30n^3/(9n^4) = 10/(3n)$, which is less than 1 if $n \ge 4$. For example, when $n = 4$, the three-stage USNBC network requires $30 \cdot 4^3 = 1920$ crosspoints, which is less than the $48^2 = 2304$ crosspoints of a single-crossbar implementation. In contrast, a traditional strictly nonblocking Clos network requires $n \ge 6$, as mentioned in Section 2.
Referring to Figure 6, in the five-stage case, there are $3n^2 \cdot 2n^2 = 6n^4$ crosspoints in the ingress stage, $m C_3 = 2n \cdot 30n^3 = 60n^4$ crosspoints in the middle stage, and $6n^4$ crosspoints in the egress stage, where $C_3$ is the number of crosspoints in a three-stage USNBC network, as derived above. The total number of crosspoints is $C_5 = 72n^4$. There are $3n^3$ inputs and $3n^3$ outputs, so a single crossbar requires $9n^6$ crosspoints. The crosspoint ratio of the five-stage USNBC network relative to a single crossbar is $72n^4/(9n^6) = 8/n^2$, which is less than 1 if $n \ge 3$.
Referring to Figure 8, in the seven-stage case, there are $3n^3 \cdot 2n^2 = 6n^5$ crosspoints in the ingress stage, $m C_5 = 2n \cdot 72n^4 = 144n^5$ crosspoints in the middle stage, and $6n^5$ crosspoints in the egress stage, where $C_5$ is the number of crosspoints in a five-stage USNBC network, as derived above. The total number of crosspoints is $C_7 = 156n^5$. There are $3n^4$ inputs and $3n^4$ outputs, so a single crossbar requires $9n^8$ crosspoints. The crosspoint ratio of the seven-stage USNBC network relative to a single crossbar is $156n^5/(9n^8) = 52/(3n^3)$, which is less than 1 if $n \ge 3$.
The number of crosspoints for a traditional unidirectional strictly nonblocking Clos network [1] is examined below. A three-stage traditional unidirectional strictly nonblocking Clos network has n switches in the ingress stage, $2n-1$ switches in the middle stage, and n switches in the egress stage. An ingress-stage switch is an $n \times (2n-1)$ crossbar, a middle-stage switch is an $n \times n$ crossbar, and an egress-stage switch is a $(2n-1) \times n$ crossbar. Then, the total number of crosspoints is $T_3 = 2n^2(2n-1) + (2n-1)n^2 = 3n^2(2n-1)$. The total number of compute nodes is $n^2$, so a single crossbar requires $n^4$ crosspoints. The crosspoint ratio of a three-stage traditional unidirectional strictly nonblocking Clos network relative to a single crossbar is $3(2n-1)/n^2$. To guarantee that the ratio is less than 1, $n \ge 6$ is needed.
Consider the five-stage case. There are $n^2 \cdot n(2n-1) = n^3(2n-1)$ crosspoints in the ingress stage, $(2n-1)T_3 = 3n^2(2n-1)^2$ crosspoints in the middle stage, and $n^3(2n-1)$ crosspoints in the egress stage, where $T_3$ is the number of crosspoints in a three-stage traditional unidirectional strictly nonblocking Clos network, as derived above. The total number of crosspoints is $T_5 = 2n^3(2n-1) + 3n^2(2n-1)^2 = n^2(2n-1)(8n-3)$. The total number of compute nodes is $n^3$, so a single crossbar requires $n^6$ crosspoints. The crosspoint ratio of a five-stage traditional unidirectional strictly nonblocking Clos network relative to a single crossbar is $(2n-1)(8n-3)/n^4$. To guarantee that the ratio is less than 1, $n \ge 4$ is needed.
Consider the seven-stage case. There are $n^3 \cdot n(2n-1) = n^4(2n-1)$ crosspoints in the ingress stage, $(2n-1)T_5$ crosspoints in the middle stage, and $n^4(2n-1)$ crosspoints in the egress stage, where $T_5$ is the number of crosspoints in a five-stage traditional unidirectional strictly nonblocking Clos network, as derived above. The total number of crosspoints is $T_7 = 2n^4(2n-1) + (2n-1)T_5 = n^2(2n-1)(18n^2-14n+3)$. The total number of compute nodes is $n^4$, so a single crossbar requires $n^8$ crosspoints. The crosspoint ratio of a seven-stage traditional unidirectional strictly nonblocking Clos network relative to a single crossbar is $(2n-1)(18n^2-14n+3)/n^6$. To guarantee that the ratio is less than 1, $n \ge 3$ is needed.
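The recursive crosspoint counts for both unidirectional families can be checked numerically with a short sketch. It assumes the structure described above: level k has $r = r_0 n^{k-1}$ ingress and egress switches of size $n \times m$ and $m \times n$ plus m copies of the level-$(k-1)$ network, and the innermost network is a single $r_0 \times r_0$ crossbar; USNBC uses $m = 2n$, $r_0 = 3n$, and the traditional network uses $m = 2n - 1$, $r_0 = n$. Function names are hypothetical.

```python
from fractions import Fraction


def uni_crosspoints(n: int, levels: int, m: int, r0: int) -> tuple[int, int]:
    """Crosspoints and compute nodes of a (2*levels + 1)-stage recursive Clos."""
    total = r0 * r0                        # innermost r0 x r0 middle crossbar
    for k in range(1, levels + 1):
        r = r0 * n ** (k - 1)              # ingress (= egress) switches at level k
        total = 2 * r * n * m + m * total  # n x m and m x n switches + m blocks
    return total, r0 * n ** levels         # nodes = n * r at the outer level


def ratio(n: int, levels: int, proposed: bool) -> Fraction:
    m, r0 = (2 * n, 3 * n) if proposed else (2 * n - 1, n)
    total, nodes = uni_crosspoints(n, levels, m, r0)
    return Fraction(total, nodes * nodes)  # relative to one nodes x nodes crossbar


def threshold(levels: int, proposed: bool) -> int:
    """Smallest n >= 2 whose ratio drops below 1."""
    n = 2
    while ratio(n, levels, proposed) >= 1:
        n += 1
    return n


print([threshold(k, True) for k in (1, 2, 3)])   # USNBC, 3/5/7 stages: [4, 3, 3]
print([threshold(k, False) for k in (1, 2, 3)])  # traditional: [6, 4, 3]
```

Using `Fraction` keeps the ratios exact, so the thresholds are not affected by floating-point rounding near 1.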
Table 9 summarizes the crosspoint ratio relative to a single crossbar for traditional unidirectional strictly nonblocking Clos networks and USNBC networks.
Table 9.
Crosspoint ratio for unidirectional strictly nonblocking Clos networks.
Figure 17 plots the crosspoint ratio relative to a single crossbar for unidirectional strictly nonblocking Clos networks, showing that USNBC networks have a lower crosspoint cost than traditional strictly nonblocking Clos networks.
Figure 17.
Crosspoint ratio relative to a single crossbar in unidirectional strictly nonblocking Clos networks.
4.1.2. Cost Evaluations of ISNBC Networks
The number of crosspoints for an ISNBC network that uses equally sized $3n \times 3n$ square crossbars for $m = 2n$ is examined below. Referring to Figure 1b, an ISNBC network based on the three-stage USNBC network has two stages. There are $3n$ leaf switches and m root switches. The total number of switches is $3n + 2n = 5n$, and each switch is a $3n \times 3n$ square crossbar. Then, the total number of crosspoints is $5n \cdot 9n^2 = 45n^3$. The total number of compute nodes is $3n^2$.
A single crossbar requires $9n^4$ crosspoints. The crosspoint ratio of a two-stage ISNBC network relative to a single crossbar is $45n^3/(9n^4) = 5/n$. To guarantee that the ratio is less than 1, $n \ge 6$ is needed.
Consider a three-stage ISNBC network. Referring to Figure 7, there are $3n^2$ switches in the leaf stage; there are m building blocks, and each building block is a two-stage ISNBC network with $5n$ switches, as derived above. The total number of switches is $3n^2 + 2n \cdot 5n = 13n^2$, and each switch is a $3n \times 3n$ square crossbar. Then, the total number of crosspoints is $13n^2 \cdot 9n^2 = 117n^4$. The total number of compute nodes is $3n^3$.
A single crossbar requires $9n^6$ crosspoints. The crosspoint ratio of a three-stage ISNBC network relative to a single crossbar is $117n^4/(9n^6) = 13/n^2$. To guarantee that the ratio is less than 1, $n \ge 4$ is needed.
Consider a four-stage ISNBC network. Referring to Figure 9, there are $3n^3$ switches in the leaf stage; there are m building blocks, and each building block is a three-stage ISNBC network with $13n^2$ switches, as derived above. The total number of switches is $3n^3 + 2n \cdot 13n^2 = 29n^3$, and each switch is a $3n \times 3n$ square crossbar. Then, the total number of crosspoints is $29n^3 \cdot 9n^2 = 261n^5$. The total number of compute nodes is $3n^4$.
A single crossbar requires $9n^8$ crosspoints. The crosspoint ratio of a four-stage ISNBC network relative to a single crossbar is $261n^5/(9n^8) = 29/n^3$. To guarantee that the ratio is less than 1, $n \ge 4$ is needed.
Table 10 lists the crosspoints of ISNBC networks. The “Crossbar” column shows the number of crosspoints in a single crossbar, and the “ISNBC” column shows the number of crosspoints in an ISNBC network. The ISNBC network needs fewer crosspoints than a single crossbar when $n \ge 6$ for the two-stage ISNBC network and when $n \ge 4$ for the three- and four-stage ISNBC networks.
Table 10.
The numbers of crosspoints in ISNBC networks.
The number of crosspoints for a traditional strictly nonblocking folded Clos network that uses crossbars of different sizes is examined below. A two-stage traditional strictly nonblocking folded Clos network has n switches in the leaf stage and $2n-1$ switches in the root stage. A leaf switch is a $(3n-1) \times (3n-1)$ crossbar (n ports to compute nodes and $2n-1$ ports to root switches). A root switch is an $n \times n$ crossbar. Then, the total number of crosspoints is $n(3n-1)^2 + (2n-1)n^2 = 11n^3 - 7n^2 + n$. The total number of compute nodes is $n^2$.
A single crossbar requires $n^4$ crosspoints. The crosspoint ratio of a two-stage traditional strictly nonblocking folded Clos network relative to a single crossbar is $(11n^2 - 7n + 1)/n^3$. To guarantee that the ratio is less than 1, $n \ge 11$ is needed.
Consider the three-stage case. There are $n^2$ switches in the leaf stage, and each switch is a $(3n-1) \times (3n-1)$ crossbar. There are $2n-1$ building blocks, and each building block has $11n^3 - 7n^2 + n$ crosspoints, as derived above. Then, the total number of crosspoints is $n^2(3n-1)^2 + (2n-1)(11n^3 - 7n^2 + n) = 31n^4 - 31n^3 + 10n^2 - n$. The total number of compute nodes is $n^3$. A single crossbar requires $n^6$ crosspoints. The crosspoint ratio of a three-stage traditional strictly nonblocking folded Clos network relative to a single crossbar is $(31n^3 - 31n^2 + 10n - 1)/n^5$. To guarantee that the ratio is less than 1, $n \ge 6$ is needed.
Consider the four-stage case. There are $n^3$ switches in the leaf stage, and each switch is a $(3n-1) \times (3n-1)$ crossbar. There are $2n-1$ building blocks, and each building block has $31n^4 - 31n^3 + 10n^2 - n$ crosspoints, as derived above. Then, the total number of crosspoints is $n^3(3n-1)^2 + (2n-1)(31n^4 - 31n^3 + 10n^2 - n) = 71n^5 - 99n^4 + 52n^3 - 12n^2 + n$. The total number of compute nodes is $n^4$. A single crossbar requires $n^8$ crosspoints. The crosspoint ratio of a four-stage traditional strictly nonblocking folded Clos network relative to a single crossbar is $(71n^4 - 99n^3 + 52n^2 - 12n + 1)/n^7$. To guarantee that the ratio is less than 1, $n \ge 4$ is needed.
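Similarly, both folded strictly nonblocking families can be generated from one recursion: an s-stage network has $L n^{s-2}$ leaf switches of size $p \times p$ plus m copies of the $(s-1)$-stage network, and the innermost network is a single $L \times L$ root switch. This is a sketch assuming ISNBC uses $m = 2n$, $L = 3n$, $p = 3n$ and the traditional network uses $m = 2n - 1$, $L = n$, $p = 3n - 1$; the function names are hypothetical.

```python
from fractions import Fraction


def folded_ratio(n: int, s: int, proposed: bool) -> Fraction:
    """Crosspoint ratio of an s-stage strictly nonblocking folded Clos network
    relative to a single crossbar with one port per compute node (sketch)."""
    m, leaf0, ports = (2 * n, 3 * n, 3 * n) if proposed else (2 * n - 1, n, 3 * n - 1)
    total = leaf0 * leaf0               # innermost root switch: leaf0 x leaf0
    for k in range(2, s + 1):
        leaves = leaf0 * n ** (k - 2)   # leaf switches added at level k
        total = leaves * ports * ports + m * total
    nodes = n * leaf0 * n ** (s - 2)    # n compute nodes per outer leaf switch
    return Fraction(total, nodes * nodes)


def threshold(s: int, proposed: bool) -> int:
    """Smallest n >= 2 whose ratio drops below 1."""
    n = 2
    while folded_ratio(n, s, proposed) >= 1:
        n += 1
    return n


print([threshold(s, True) for s in (2, 3, 4)])   # ISNBC: [6, 4, 4]
print([threshold(s, False) for s in (2, 3, 4)])  # traditional: [11, 6, 4]
print(float(folded_ratio(2, 4, True) / folded_ratio(2, 4, False)))
```

The last line evaluates the ISNBC-to-traditional crosspoint ratio for a four-stage network at $n = 2$, which comes out at about 0.877.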
Table 11 summarizes the crosspoint ratio relative to a single crossbar for traditional strictly nonblocking folded Clos networks and ISNBC networks. The general formula for the ISNBC crosspoint ratio relative to a single crossbar is $(2^{s+1}-3)/n^{s-1}$, where s is the number of stages and the number of compute nodes is $3n^s$.
Table 11.
Crosspoint ratio for strictly nonblocking folded Clos networks.
Table 12 lists the crosspoint ratios relative to a single crossbar for strictly nonblocking folded Clos networks. The ratios are calculated based on the formulas in Table 11.
Table 12.
Crosspoint ratio relative to a single crossbar in strictly nonblocking folded Clos networks.
Figure 18 plots the crosspoint ratio relative to a single crossbar for strictly nonblocking folded Clos networks, showing that ISNBC networks have a lower crosspoint cost than traditional strictly nonblocking folded Clos networks. Also note that ISNBC networks use equally sized square crossbars for all switches in the network.
Figure 18.
Crosspoint ratio relative to a single crossbar in strictly nonblocking folded Clos networks.
Figure 19 plots the crosspoint ratio relative to a single crossbar for strictly nonblocking folded Clos networks, showing that ISNBC networks have a lower crosspoint cost than the strictly nonblocking folded Clos network proposed in [16]. The network in [16], labeled “Redesign” in the figure, supports only two stages and uses multiple parallel links between a leaf switch and a root switch. If that network is constructed from equally sized crossbar switches, unused ports remain on the leaf switches, the root switches, or both.
Figure 19.
Crosspoint ratio relative to a single crossbar in strictly nonblocking folded Clos networks.
4.2. Cost Evaluations of Rearrangeably Nonblocking Clos Networks
This subsection investigates the crosspoint ratios relative to a single crossbar for unidirectional rearrangeably nonblocking Clos (URNBC) networks and identical rearrangeably nonblocking folded Clos (IRNBC) networks. These ratios are compared to those of the corresponding traditional Clos networks. Comparing a rearrangeably nonblocking Clos network with a single crossbar is not entirely fair, since a single crossbar is strictly nonblocking; the ratios are presented here only to make the difference between the proposed and traditional networks easier to see.
4.2.1. Cost Evaluations of URNBC Networks
As described in the previous section, in URNBC networks, $m = n$ and $r = 2n$. Referring to Figure 10a, in a three-stage URNBC network, there are $2n$ switches in the ingress stage, and each switch is an $n \times n$ crossbar; there are m switches in the middle stage, and each switch is a $2n \times 2n$ crossbar; and there are $2n$ switches in the egress stage, and each switch is an $n \times n$ crossbar. The total number of crosspoints is $2n \cdot n^2 + n \cdot 4n^2 + 2n \cdot n^2 = 8n^3$. There are $2n^2$ compute nodes, so a single crossbar requires $4n^4$ crosspoints. The crosspoint ratio of a three-stage URNBC network relative to a single crossbar is $8n^3/(4n^4) = 2/n$.
Referring to Figure 13, in a five-stage URNBC network, there are $2n^2$ switches in the ingress stage, and each switch is an $n \times n$ crossbar; there are m three-stage URNBC networks, and each has $8n^3$ crosspoints, as derived above; and there are $2n^2$ switches in the egress stage, and each switch is an $n \times n$ crossbar. The total number of crosspoints is $2n^4 + n \cdot 8n^3 + 2n^4 = 12n^4$. There are $2n^3$ compute nodes, so a single crossbar requires $4n^6$ crosspoints. The crosspoint ratio of a five-stage URNBC network relative to a single crossbar is $12n^4/(4n^6) = 3/n^2$.
Referring to Figure 15, in a seven-stage URNBC network, there are $2n^3$ switches in the ingress stage, and each switch is an $n \times n$ crossbar; there are m five-stage URNBC networks, and each has $12n^4$ crosspoints, as derived above; and there are $2n^3$ switches in the egress stage, and each switch is an $n \times n$ crossbar. The total number of crosspoints is $2n^5 + n \cdot 12n^4 + 2n^5 = 16n^5$. There are $2n^4$ compute nodes, so a single crossbar requires $4n^8$ crosspoints. The crosspoint ratio of a seven-stage URNBC network relative to a single crossbar is $16n^5/(4n^8) = 4/n^3$.
The number of crosspoints for a traditional unidirectional rearrangeably nonblocking Clos network is examined below. In a three-stage traditional unidirectional rearrangeably nonblocking Clos network, $m = n$ and $r = n$, so there are n ingress-stage, n middle-stage, and n egress-stage switches, each an $n \times n$ crossbar. The total number of crosspoints is $R_3 = 3n^3$. With $n^2$ compute nodes, a single crossbar requires $n^4$ crosspoints. Therefore, the crosspoint ratio of a three-stage traditional unidirectional rearrangeably nonblocking Clos network relative to a single crossbar is $3/n$.
In the five-stage case, $m = n$ and $r = n^2$. The total number of crosspoints is $R_5 = 2n^2 \cdot n^2 + n R_3 = 5n^4$, where $R_3$ is the number of crosspoints in a three-stage traditional unidirectional rearrangeably nonblocking Clos network, as derived above. With $n^3$ compute nodes, a single crossbar requires $n^6$ crosspoints. Therefore, the crosspoint ratio of a five-stage traditional unidirectional rearrangeably nonblocking Clos network relative to a single crossbar is $5/n^2$.
In the seven-stage case, $m = n$ and $r = n^3$. The total number of crosspoints is $R_7 = 2n^3 \cdot n^2 + n R_5 = 7n^5$, where $R_5$ is the number of crosspoints in a five-stage traditional unidirectional rearrangeably nonblocking Clos network, as derived above. With $n^4$ compute nodes, a single crossbar requires $n^8$ crosspoints. Therefore, the crosspoint ratio of a seven-stage traditional unidirectional rearrangeably nonblocking Clos network relative to a single crossbar is $7/n^3$.
Table 13 summarizes the crosspoint ratio relative to a single crossbar for traditional unidirectional rearrangeably nonblocking Clos networks and URNBC networks. The cost ratios of the proposed URNBC networks relative to traditional networks are $2/3$, $3/5$, and $4/7$ for three-stage, five-stage, and seven-stage networks, respectively.
Table 13.
Crosspoint ratio for unidirectional rearrangeably nonblocking Clos networks.
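The three cost ratios follow one pattern. Writing $U_k$ and $R_k$ for the crosspoints of the $(2k+1)$-stage URNBC and traditional networks, and assuming the recursions used in this subsection ($m = n$; $2n^k$ versus $n^k$ outer switches of size $n \times n$ on each side; an innermost $2n \times 2n$ versus $n \times n$ crossbar), a short induction gives the following sketch:

```latex
\begin{aligned}
  U_k &= 2\,(2n^{k})\,n^2 + n\,U_{k-1} = 4(k+1)\,n^{k+2}, && U_0 = 4n^2,\\
  R_k &= 2\,n^{k}\,n^2 + n\,R_{k-1} = (2k+1)\,n^{k+2}, && R_0 = n^2,\\
  \frac{U_k/(2n^{k+1})^2}{R_k/(n^{k+1})^2} &= \frac{(k+1)/n^{k}}{(2k+1)/n^{k}} = \frac{k+1}{2k+1}.
\end{aligned}
```

For $k = 1, 2, 3$ this yields $2/3$, $3/5$, and $4/7$, and the relative cost decreases toward $1/2$ as k grows.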
Figure 20 plots the crosspoint ratio relative to a single crossbar for unidirectional rearrangeably nonblocking Clos networks, showing that the URNBC networks have a lower crosspoint cost than traditional rearrangeably nonblocking Clos networks.
Figure 20.
Crosspoint ratio relative to a single crossbar in unidirectional rearrangeably nonblocking Clos networks.
4.2.2. Cost Evaluations of IRNBC Networks
The number of crosspoints for IRNBC networks that use equally sized $2n \times 2n$ square crossbars for $m = n$ is examined below. Referring to Figure 10b, in a two-stage IRNBC network, there are $2n$ switches in the leaf stage and n switches in the root stage. The total number of crosspoints is $I_2 = 3n \cdot 4n^2 = 12n^3$. There are $2n^2$ compute nodes, so a single crossbar requires $4n^4$ crosspoints. The crosspoint ratio of a two-stage IRNBC network relative to a single crossbar is $12n^3/(4n^4) = 3/n$.
Referring to Figure 14, in a three-stage IRNBC network, the total number of crosspoints is $I_3 = 2n^2 \cdot 4n^2 + n I_2 = 20n^4$, where $I_2$ is the number of crosspoints in a two-stage IRNBC network, as derived above. There are $2n^3$ compute nodes, so a single crossbar requires $4n^6$ crosspoints. The crosspoint ratio of a three-stage IRNBC network relative to a single crossbar is $20n^4/(4n^6) = 5/n^2$.
Referring to Figure 16, in a four-stage IRNBC network, the total number of crosspoints is $I_4 = 2n^3 \cdot 4n^2 + n I_3 = 28n^5$, where $I_3$ is the number of crosspoints in a three-stage IRNBC network, as derived above. There are $2n^4$ compute nodes, so a single crossbar requires $4n^8$ crosspoints. The crosspoint ratio of a four-stage IRNBC network relative to a single crossbar is $28n^5/(4n^8) = 7/n^3$.
Table 14 lists the crosspoints of IRNBC networks. The “Crossbar” column shows the number of crosspoints in a single crossbar, and the “IRNBC” column shows the number of crosspoints in an IRNBC network. The IRNBC network needs fewer crosspoints than a single crossbar when $n \ge 4$ for the two-stage IRNBC network, $n \ge 3$ for the three-stage IRNBC network, and $n \ge 2$ for the four-stage IRNBC network.
Table 14.
The number of crosspoints in IRNBC networks.
The number of crosspoints for traditional rearrangeably nonblocking folded Clos networks with $m = n$ is examined below. In a two-stage traditional rearrangeably nonblocking folded Clos network, there are n leaf switches, each a $2n \times 2n$ crossbar, and n root switches, each an $n \times n$ crossbar, so the total number of crosspoints is $F_2 = n \cdot 4n^2 + n \cdot n^2 = 5n^3$. There are $n^2$ compute nodes, so a single crossbar requires $n^4$ crosspoints. The crosspoint ratio of a two-stage traditional rearrangeably nonblocking folded Clos network relative to a single crossbar is $5/n$.
In the three-stage case, the total number of crosspoints is $F_3 = n^2 \cdot 4n^2 + n F_2 = 9n^4$, where $F_2$ is the number of crosspoints in a two-stage traditional rearrangeably nonblocking folded Clos network, as derived above. There are $n^3$ compute nodes, so a single crossbar requires $n^6$ crosspoints. The crosspoint ratio of a three-stage traditional rearrangeably nonblocking folded Clos network relative to a single crossbar is $9/n^2$.
In the four-stage case, the total number of crosspoints is $F_4 = n^3 \cdot 4n^2 + n F_3 = 13n^5$, where $F_3$ is the number of crosspoints in a three-stage traditional rearrangeably nonblocking folded Clos network, as derived above. There are $n^4$ compute nodes, so a single crossbar requires $n^8$ crosspoints. The crosspoint ratio of a four-stage traditional rearrangeably nonblocking folded Clos network relative to a single crossbar is $13/n^3$.
Table 15 summarizes the crosspoint ratio relative to a single crossbar for traditional rearrangeably nonblocking folded Clos networks and IRNBC networks. The cost ratios of the proposed IRNBC networks relative to traditional networks are $3/5$, $5/9$, and $7/13$ for two-stage, three-stage, and four-stage networks, respectively. The general formula for the IRNBC crosspoint ratio relative to a single crossbar is $(2s-1)/n^{s-1}$, where s is the number of stages and the number of compute nodes is $2n^s$.
Table 15.
Crosspoint ratio for rearrangeably nonblocking folded Clos networks.
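The folded rearrangeable comparison can be checked numerically as well. The sketch assumes the recursions used above: an s-stage network has $L n^{s-2}$ leaf switches of size $2n \times 2n$ plus $m = n$ copies of the $(s-1)$-stage network, and the innermost root switch is $L \times L$, with $L = 2n$ for IRNBC and $L = n$ for the traditional network. The function name is hypothetical.

```python
from fractions import Fraction


def rnb_folded_ratio(n: int, s: int, leaf0: int) -> Fraction:
    """Crosspoint ratio, relative to a single crossbar, of an s-stage
    rearrangeably nonblocking folded Clos network with m = n (sketch)."""
    total = leaf0 * leaf0                  # innermost leaf0 x leaf0 root switch
    for k in range(2, s + 1):
        leaves = leaf0 * n ** (k - 2)      # 2n x 2n leaf switches at level k
        total = leaves * (2 * n) ** 2 + n * total
    nodes = n * leaf0 * n ** (s - 2)
    return Fraction(total, nodes * nodes)


n = 8
for s in (2, 3, 4):
    irnbc = rnb_folded_ratio(n, s, leaf0=2 * n)
    traditional = rnb_folded_ratio(n, s, leaf0=n)
    print(s, irnbc, traditional, irnbc / traditional)
```

The relative ratio printed in the last column equals $(2s-1)/(4s-3)$, that is, $3/5$, $5/9$, and $7/13$, for every n; it approaches $1/2$ as the number of stages grows.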
Table 16 lists the crosspoint ratios relative to a single crossbar for rearrangeably nonblocking folded Clos networks. The ratios are calculated based on the formulas in Table 15.
Table 16.
Crosspoint ratio relative to a single crossbar in rearrangeably nonblocking folded Clos networks.
Figure 21 plots the crosspoint ratio relative to a single crossbar for rearrangeably nonblocking folded Clos networks, showing that IRNBC networks have a lower crosspoint cost than traditional rearrangeably nonblocking folded Clos networks.
Figure 21.
Crosspoint ratio relative to a single crossbar in rearrangeably nonblocking folded Clos networks.
From the above discussion, the crosspoint ratios of ISNBC and IRNBC networks are lower than those of their corresponding traditional folded Clos networks. The crosspoint ratio of IRNBC networks is also lower than that of ISNBC networks. This is expected: IRNBC networks are rearrangeably nonblocking and need only $m = n$ root-side connections per leaf switch (instead of $m = 2n$), at the price of possibly rearranging existing connections to make a new connection, whereas ISNBC networks are strictly nonblocking and never require such rearrangements.
The proposed ISNBC and IRNBC networks are summarized in Table 17, where the “Node” column shows the number of compute nodes, the “Crossbar” column shows the number of crosspoints in a single crossbar, the “Crosspoint” column shows the number of crosspoints in the proposed network, and the “Switch” column shows the number of switches in the proposed network. Both networks use square crossbar switches.
Table 17.
Summary of the proposed ISNBC and IRNBC networks, where n is the number of compute nodes connected to a leaf switch and s is the number of stages.
Figure 22 compares the crosspoint ratios of ISNBC and IRNBC networks for different numbers of compute nodes. Based on this figure, the number of stages can be selected for a given number of compute nodes such that the system has a low crosspoint ratio.
Figure 22.
Crosspoint ratios versus the numbers of compute nodes in the proposed networks.
A limitation of the proposed identical nonblocking folded Clos networks is that ISNBC networks must use $3n \times 3n$ square crossbar switches, whereas IRNBC networks must use $2n \times 2n$ square crossbar switches, where n is the number of compute nodes connected to a leaf switch. Therefore, the proposed networks cannot use an arbitrary $k \times k$ square crossbar switch without leaving some switch ports unused. For example, with practical $64 \times 64$ crossbar switches, an IRNBC network can be constructed with no unused switch ports ($2n \times 2n$ with $n = 32$). An ISNBC network built from $64 \times 64$ crossbar switches leaves one port on each switch unused ($3n \times 3n$ with $n = 21$). In contrast, an ISNBC network constructed from $48 \times 48$ crossbar switches ($n = 16$) has no unused switch ports.
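The port-fitting arithmetic behind this limitation can be sketched as follows, assuming only the switch sizes ($3n \times 3n$ for ISNBC, $2n \times 2n$ for IRNBC) and compute-node counts of $3n^s$ and $2n^s$ for an s-stage network; the function name `fit_radix` is hypothetical.

```python
def fit_radix(k: int, s: int) -> dict:
    """For k x k crossbars, pick the largest usable n per design (sketch).

    Each entry is (n, unused ports per switch, compute nodes of an s-stage network).
    """
    n_isnbc = k // 3  # ISNBC needs 3n x 3n switches
    n_irnbc = k // 2  # IRNBC needs 2n x 2n switches
    return {
        "ISNBC": (n_isnbc, k - 3 * n_isnbc, 3 * n_isnbc ** s),
        "IRNBC": (n_irnbc, k - 2 * n_irnbc, 2 * n_irnbc ** s),
    }


for k in (48, 64):
    print(k, fit_radix(k, s=2))
```

With 64-port switches, for instance, the ISNBC design leaves one port idle per switch, while the IRNBC design uses every port; with 48-port switches both designs use every port.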
5. Conclusions
Currently available switches are square crossbars with the same number of input and output ports. Traditional unidirectional strictly nonblocking Clos networks use switches with different numbers of input and output ports and route packets through a fixed number of stages. Fat trees, the folded version of Clos networks, use differently sized switches at the root stage and the other stages. Recently proposed Clos networks have only two stages and introduce unused ports on the switches.
To address these issues, this paper proposed two new folded Clos variants: an identical strictly nonblocking folded Clos (ISNBC) network and an identical rearrangeably nonblocking folded Clos (IRNBC) network. These designs use equally sized square crossbar switches across all stages, can avoid unused switch ports, increase system scalability by accommodating any number of stages, and reduce communication path lengths by supporting shortcut connections. Both ISNBC and IRNBC networks have lower switch crosspoint costs than their traditional counterparts. Specifically, ISNBC networks use 46.43% to 87.71% of the crosspoints of traditional strictly nonblocking folded Clos networks, and IRNBC networks use 53.85% to 60.00% of the crosspoints of traditional rearrangeably nonblocking folded Clos networks.
The main limitation is that ISNBC networks require $3n \times 3n$ square crossbar switches, and IRNBC networks require $2n \times 2n$ square crossbar switches, where n is the number of compute nodes connected to a leaf switch.
Future work should develop load-balancing adaptive routing algorithms and fault-tolerant routing algorithms for the proposed identical strictly and rearrangeably nonblocking folded Clos networks and evaluate performance through simulations.
Funding
This research received no external funding.
Data Availability Statement
Data is contained within the article.
Conflicts of Interest
The author declares no conflicts of interest.
Appendix A. K-Ary N-Fly Butterfly and K-Ary N-Tree Clos Networks
This section describes k-ary n-fly butterfly and k-ary n-tree Clos networks. Figure A1 shows a k-ary n-fly butterfly network with $k = 3$ and $n = 3$. It has 27 compute nodes. By convention, source and destination nodes are drawn logically on the left and right, respectively, but physically, the two nodes in the same row are the same node. The links that connect the switch ports are unidirectional.
Figure A1.
A 3-ary 3-fly butterfly network (unidirectional links).
Generally, a switch in a k-ary n-fly butterfly network is labeled as , where s is the stage with and is the switch inside stage s with for . In stage s for , a switch connects to switches , where . For example, in Figure A1, switch in the 3-ary 3-fly butterfly network connects to switches , , and .
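To make the stage-to-stage wiring concrete, the Python sketch below builds a k-ary n-fly with n stages of $k^{n-1}$ switches. Each switch is labeled by its stage and an (n−1)-digit radix-k index. The wiring rule is an assumption, one common convention that may differ from the labeling used in Figure A1: output port p of a stage-s switch leads to the stage-(s+1) switch whose s-th digit (most significant first) is replaced by p. The helper names are hypothetical. The sketch checks the property behind the butterfly's lack of path diversity, namely that exactly one path joins any first-stage switch to any last-stage switch.

```python
def replace_digit(w: int, k: int, width: int, pos: int, p: int) -> int:
    """Replace the pos-th radix-k digit (most significant first) of w with p."""
    place = k ** (width - 1 - pos)
    digit = (w // place) % k
    return w + (p - digit) * place


def butterfly_links(k: int, n: int) -> dict[tuple[int, int], list[int]]:
    """Map (stage, switch) to its k next-stage switches (assumed convention)."""
    width = n - 1
    return {
        (s, w): [replace_digit(w, k, width, s, p) for p in range(k)]
        for s in range(n - 1)
        for w in range(k ** width)
    }


def count_paths(links, n: int, src: int, dst: int) -> int:
    """Count paths from first-stage switch src to last-stage switch dst."""
    ways = {src: 1}
    for s in range(n - 1):
        nxt: dict[int, int] = {}
        for w, c in ways.items():
            for v in links[(s, w)]:
                nxt[v] = nxt.get(v, 0) + c
        ways = nxt
    return ways.get(dst, 0)


k, n = 3, 3
links = butterfly_links(k, n)
per_stage = k ** (n - 1)
print("compute nodes:", k * per_stage)  # k^n
print("unique paths:", all(count_paths(links, n, a, b) == 1
                           for a in range(per_stage) for b in range(per_stage)))
```

For the 3-ary 3-fly case, the sketch reports 27 compute nodes and a single path between every pair of first-stage and last-stage switches.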
A k-ary n-tree Clos network can be created by combining two k-ary n-fly butterfly networks back to back, where the last stages of the two butterflies are fused into a single middle stage [9]. There are $2n - 1$ stages. The stages from the left up to the middle stage form an input network, and the stages from the right back to the middle stage form an output network. A k-ary n-tree Clos network is rearrangeably nonblocking, and it addresses the lack of path diversity in butterfly networks.
The input network can route from any source compute node to any middle-stage switch. The output network can route from any middle-stage switch to any destination compute node. As in a k-ary n-fly butterfly network, the links in a k-ary n-tree Clos network are unidirectional. Figure A2 shows a k-ary n-tree Clos network with $k = 3$ and $n = 3$. It has $2n - 1 = 5$ stages and $k^n = 27$ compute nodes.
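The gain in path diversity can be counted directly. The Python sketch below unfolds the back-to-back construction into $2n - 1$ stages. The input half replaces one radix-k digit per stage, and the output half mirrors it. This digit-replacement convention is an assumption and may not match the wiring drawn in Figure A2. The function names are hypothetical. Each of the $k^{n-1}$ middle-stage switches yields exactly one route, so any first-stage switch reaches any last-stage switch over $k^{n-1}$ distinct paths, compared with one path in the butterfly.

```python
def replace_digit(w: int, k: int, width: int, pos: int, p: int) -> int:
    """Replace the pos-th radix-k digit (most significant first) of w with p."""
    place = k ** (width - 1 - pos)
    digit = (w // place) % k
    return w + (p - digit) * place


def successors(k: int, n: int, stage: int, w: int) -> list[int]:
    """Next-stage switches of switch w at `stage` in the (2n-1)-stage network.

    Assumed convention: the input half (stages 0..n-2) replaces digit `stage`;
    the output half mirrors it and replaces digit 2n-3-stage.
    """
    pos = stage if stage < n - 1 else 2 * n - 3 - stage
    return [replace_digit(w, k, n - 1, pos, p) for p in range(k)]


def count_paths(k: int, n: int, src: int, dst: int) -> int:
    """Count paths from first-stage switch src to last-stage switch dst."""
    ways = {src: 1}
    for stage in range(2 * n - 2):  # 2n-1 stages, so 2n-2 inter-stage hops
        nxt: dict[int, int] = {}
        for w, c in ways.items():
            for v in successors(k, n, stage, w):
                nxt[v] = nxt.get(v, 0) + c
        ways = nxt
    return ways.get(dst, 0)


k, n = 3, 3
per_stage = k ** (n - 1)
print("stages:", 2 * n - 1, "compute nodes:", k ** n)
print("path counts:", {count_paths(k, n, a, b)
                       for a in range(per_stage) for b in range(per_stage)})
```

For $k = n = 3$, the sketch reports 5 stages, 27 compute nodes, and 9 paths between every pair of first-stage and last-stage switches.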
Figure A2.
A 3-ary 3-tree Clos network (unidirectional links).
Appendix B. Deriving the Number of Switches in ISNBC and IRNBC Networks
This appendix presents the derivation of the general formulas for calculating the number of switches in ISNBC and IRNBC networks.
Theorem A1.
The number of switches in an ISNBC network is $(2^{s+1} - 3)\, n^{s-1}$, where n is the number of compute nodes connected to a leaf switch and s is the number of stages.
Proof.
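For $s = 1$ (one switch stage), there is one switch connecting n compute nodes. Writing $N_s$ for the number of switches in an s-stage ISNBC network, we have
$$N_1 = 1.$$
For $s = 2$ (two switch stages), there are r switches in the leaf stage and m switches in the root stage, with $r = 3n$ and $m = 2n$. Then,
$$N_2 = r + m = 3n + 2n = 5n.$$
For $s = 3$ (three switch stages), there are $3n^2$ switches in the leaf stage and $2n$ building blocks in the root stage, and each building block has $N_2 = 5n$ switches. Then,
$$N_3 = 3n^2 + 2n \cdot 5n = 13n^2.$$
For $s = 4$ (four switch stages), there are $3n^3$ switches in the leaf stage and $2n$ building blocks in the root stage, and each building block has $N_3 = 13n^2$ switches. Then,
$$N_4 = 3n^3 + 2n \cdot 13n^2 = 29n^3.$$
Generally, for s stages, there are $3n^{s-1}$ switches in the leaf stage and $2n$ building blocks in the root stage, and each building block has $N_{s-1}$ switches. Then,
$$N_s = 3n^{s-1} + 2n\, N_{s-1}.$$
In other words, we have a recursive equation:
$$N_1 = 1, \qquad N_s = 2n\, N_{s-1} + 3n^{s-1} \quad (s \ge 2). \tag{A1}$$
Now, we derive the general formula for $N_s$ from the recursive equation. Dividing the left and right sides of Equation (A1) by $n^{s-1}$, we obtain
$$\frac{N_s}{n^{s-1}} = 2\,\frac{N_{s-1}}{n^{s-2}} + 3. \tag{A2}$$
Let $a_s = N_s / n^{s-1}$; then, Equation (A2) becomes
$$a_s = 2a_{s-1} + 3. \tag{A3}$$
To eliminate the constant 3, we write Equation (A3) for $s + 1$,
$$a_{s+1} = 2a_s + 3, \tag{A4}$$
and subtract Equation (A3) from Equation (A4):
$$a_{s+1} - a_s = 2\,(a_s - a_{s-1}). \tag{A5}$$
Let $d_s = a_{s+1} - a_s$; then, Equation (A5) becomes
$$d_s = 2\, d_{s-1}. \tag{A6}$$
This is a geometric progression with a common ratio of 2. Now, we calculate the first term. Since $a_1 = N_1 / n^0 = 1$ and $a_2 = N_2 / n = 5$,
$$d_1 = a_2 - a_1 = 4.$$
It is well known that the general ith term of a geometric progression is given by $g_i = g_1 q^{i-1}$, where $g_1$ is the first term, q is the common ratio, and i is the term number. Then, in our case, the sth term of the geometric progression (A6) is given by
$$d_s = d_1 \cdot 2^{s-1} = 4 \cdot 2^{s-1} = 2^{s+1}. \tag{A7}$$
Considering $d_s = a_{s+1} - a_s$, we have
$$\begin{aligned} a_2 - a_1 &= 2^2,\\ a_3 - a_2 &= 2^3,\\ &\;\;\vdots\\ a_s - a_{s-1} &= 2^s. \end{aligned}$$
The sum of the left sides of the above equations is
$$(a_2 - a_1) + (a_3 - a_2) + \cdots + (a_s - a_{s-1}) = a_s - a_1,$$
and the sum of the right sides of the above equations is
$$2^2 + 2^3 + \cdots + 2^s = 2^{s+1} - 4.$$
Then, we have $a_s - a_1 = 2^{s+1} - 4$ (left sum = right sum), or $a_s - 1 = 2^{s+1} - 4$, that is,
$$a_s = 2^{s+1} - 3. \tag{A8}$$
Because $a_s = N_s / n^{s-1}$, we have $N_s = (2^{s+1} - 3)\, n^{s-1}$. □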
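As a numerical sanity check on Theorem A1, the Python sketch below evaluates the recursion (A1) and compares it with the closed form $(2^{s+1} - 3)\, n^{s-1}$. It also confirms that the successive differences of $a_s = N_s / n^{s-1}$ equal $2^{s+1}$. Here $N_s$ denotes the number of switches in an s-stage network. The construction model is an assumption inferred from the proof: $3n \times 3n$ switches, n compute-node ports and 2n up-links per leaf switch, and 2n root-stage building blocks. Under this model, the number of leaf switches equals the number of leaf-facing ports of one building block. The function names are hypothetical.

```python
from fractions import Fraction


def isnbc_build(n: int, s: int) -> tuple[int, int]:
    """Return (switches, leaf-facing ports) of an s-stage ISNBC network.

    Hypothetical model inferred from the proof of Theorem A1: every switch is
    3n x 3n, a leaf switch has n compute-node ports and 2n up-links, and the
    root stage holds 2n building blocks, each an (s-1)-stage network.
    A single switch (s == 1) exposes all 3n ports when used as a building block.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    if s == 1:
        return 1, 3 * n
    block_switches, block_ports = isnbc_build(n, s - 1)
    leaf_switches = block_ports                         # one leaf per block port
    switches = leaf_switches + 2 * n * block_switches   # recursion (A1)
    return switches, leaf_switches * n                  # n node ports per leaf


def isnbc_closed_form(n: int, s: int) -> int:
    """Closed form stated in Theorem A1."""
    return (2 ** (s + 1) - 3) * n ** (s - 1)


def derivation_holds(n: int, s_max: int) -> bool:
    """Check the closed form and d_s = a_(s+1) - a_s = 2^(s+1) for s = 1..s_max."""
    a = {s: Fraction(isnbc_build(n, s)[0], n ** (s - 1)) for s in range(1, s_max + 2)}
    for s in range(1, s_max + 1):
        if a[s + 1] - a[s] != 2 ** (s + 1):
            return False
        if isnbc_build(n, s)[0] != isnbc_closed_form(n, s):
            return False
    return True


for s in range(1, 5):
    print(s, isnbc_build(4, s)[0], isnbc_closed_form(4, s))
# prints: 1 1 1 / 2 20 20 / 3 208 208 / 4 1856 1856 (one line each)
```

Exact integer and `Fraction` arithmetic keeps the check free of floating-point rounding, even for large $n^{s-1}$.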
Theorem A2.
The number of switches in an IRNBC network is $(2s - 1)\, n^{s-1}$, where n is the number of compute nodes connected to a leaf switch and s is the number of stages.
Proof.
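For $s = 1$ (one switch stage), there is one switch connecting n compute nodes. Writing $N_s$ for the number of switches in an s-stage IRNBC network, we have
$$N_1 = 1.$$
For $s = 2$ (two switch stages), there are r switches in the leaf stage and m switches in the root stage, with $r = 2n$ and $m = n$. Then,
$$N_2 = r + m = 2n + n = 3n.$$
For $s = 3$ (three switch stages), there are $2n^2$ switches in the leaf stage and $n$ building blocks in the root stage, and each building block has $N_2 = 3n$ switches. Then,
$$N_3 = 2n^2 + n \cdot 3n = 5n^2.$$
For $s = 4$ (four switch stages), there are $2n^3$ switches in the leaf stage and $n$ building blocks in the root stage, and each building block has $N_3 = 5n^2$ switches. Then,
$$N_4 = 2n^3 + n \cdot 5n^2 = 7n^3.$$
Generally, for s stages, there are $2n^{s-1}$ switches in the leaf stage and $n$ building blocks in the root stage, and each building block has $N_{s-1}$ switches. Then,
$$N_s = 2n^{s-1} + n\, N_{s-1}.$$
In other words, we have a recursive equation:
$$N_1 = 1, \qquad N_s = n\, N_{s-1} + 2n^{s-1} \quad (s \ge 2). \tag{A9}$$
Now, we derive the general formula for $N_s$ from the recursive equation. Dividing the left and right sides of Equation (A9) by $n^{s-1}$, we obtain
$$\frac{N_s}{n^{s-1}} = \frac{N_{s-1}}{n^{s-2}} + 2. \tag{A10}$$
Let $a_s = N_s / n^{s-1}$; then, Equation (A10) becomes
$$a_s = a_{s-1} + 2. \tag{A11}$$
This is an arithmetic progression with a common difference of 2, and its first term is 1:
$$a_1 = N_1 / n^0 = 1.$$
It is well known that the general ith term of an arithmetic progression is given by $g_i = g_1 + (i - 1)\, d$, where $g_1$ is the first term, d is the common difference, and i is the term number. Then, in our case, the sth term of the arithmetic progression (A11) is given by
$$a_s = 1 + 2(s - 1) = 2s - 1. \tag{A12}$$
Because $a_s = N_s / n^{s-1}$, we have $N_s = (2s - 1)\, n^{s-1}$. □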
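A corresponding check for Theorem A2 is sketched below in Python. It evaluates the recursion (A9), $N_1 = 1$ and $N_s = n\, N_{s-1} + 2n^{s-1}$, and compares the result with the closed form $(2s - 1)\, n^{s-1}$. It also verifies that $a_s = N_s / n^{s-1}$ starts at 1 and grows by exactly 2 per stage, as in (A11). The construction model is an assumption inferred from the proof: $2n \times 2n$ switches, n compute-node ports and n up-links per leaf switch, and n root-stage building blocks. The function names are hypothetical.

```python
from fractions import Fraction


def irnbc_switches(n: int, s: int) -> int:
    """N_s from recursion (A9): N_1 = 1, N_s = n * N_(s-1) + 2n^(s-1).

    Hypothetical model inferred from the proof of Theorem A2: 2n x 2n
    switches, n root-stage building blocks, and 2n^(s-1) leaf switches.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    count = 1
    for t in range(2, s + 1):
        count = n * count + 2 * n ** (t - 1)
    return count


def irnbc_closed_form(n: int, s: int) -> int:
    """Closed form stated in Theorem A2."""
    return (2 * s - 1) * n ** (s - 1)


def is_arithmetic_with_difference_two(n: int, s_max: int) -> bool:
    """Check that a_s = N_s / n^(s-1) starts at 1 and increases by exactly 2."""
    a = [Fraction(irnbc_switches(n, s), n ** (s - 1)) for s in range(1, s_max + 1)]
    return a[0] == 1 and all(later - earlier == 2
                             for earlier, later in zip(a, a[1:]))


for s in range(1, 5):
    print(s, irnbc_switches(3, s), irnbc_closed_form(3, s))
```

For $n = 3$, both columns give 1, 9, 45, and 189 switches for one to four stages.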
References
1. Clos, C. A study of non-blocking switching networks. Bell Syst. Tech. J. 1953, 32, 406–424.
2. Leiserson, C.E. Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Trans. Comput. 1985, C-34, 892–901.
3. Singh, A.; Ong, J.; Agarwal, A.; Anderson, G.; Armistead, A.; Bannon, R.; Boving, S.; Desai, G.; Felderman, B.; Germano, P.; et al. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network. ACM SIGCOMM Comput. Commun. Rev. 2015, 45, 183–197.
4. Liao, X.K.; Pang, Z.B.; Wang, K.F.; Lu, Y.T.; Xie, M.; Xia, J.; Dong, D.Z.; Suo, G. High Performance Interconnect Network for Tianhe System. J. Comput. Sci. Technol. 2015, 30, 259–272.
5. Fu, H.; Liao, J.; Yang, J.; Wang, L.; Song, Z.; Huang, X.; Yang, C.; Xue, W.; Liu, F.; Qiao, F.; et al. The Sunway TaihuLight supercomputer: System and applications. Sci. China Inf. Sci. 2016, 59, 1869–1919.
6. Stanzione, D.; West, J.; Evans, R.T.; Minyard, T.; Ghattas, O.; Panda, D.K. Frontera: The Evolution of Leadership Computing at the National Science Foundation. In Proceedings of the Practice and Experience in Advanced Research Computing 2020: Catch the Wave (PEARC ’20), New York, NY, USA, 27–31 July 2020; pp. 106–111.
7. Stunkel, C.B.; Graham, R.L.; Shainer, G.; Kagan, M.; Sharkawi, S.S.; Rosenburg, B.; Chochia, G.A. The high-speed networks of the Summit and Sierra supercomputers. IBM J. Res. Dev. 2020, 64, 3:1–3:10.
8. Zhao, C.; Deng, C.; Ruan, C.; Dai, D.; Gao, H.; Li, J.; Zhang, L.; Huang, P.; Zhou, S.; Ma, S.; et al. Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures. In Proceedings of the 52nd Annual International Symposium on Computer Architecture, New York, NY, USA, 21–25 June 2025; pp. 1731–1745.
9. Abts, D.; Kim, J. High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities; Morgan and Claypool: San Rafael, CA, USA, 2011.
10. Petrini, F.; Vanneschi, M. k-ary n-trees: High performance networks for massively parallel architectures. In Proceedings of the 11th International Parallel Processing Symposium, Geneva, Switzerland, 1–5 April 1997; pp. 87–93.
11. Dally, W.J.; Towles, B.P. Principles and Practices of Interconnection Networks; The Morgan Kaufmann Series in Computer Architecture and Design; Elsevier: Amsterdam, The Netherlands, 2004; Available online: https://books.google.co.jp/books?id=oOqpcB5191sC (accessed on 1 July 2025).
12. Beneš, V.E. Permutation groups, complexes, and rearrangeable connecting networks. Bell Syst. Tech. J. 1964, 43, 1619–1640.
13. Al-Fares, M.; Loukissas, A.; Vahdat, A. A scalable, commodity data center network architecture. ACM SIGCOMM Comput. Commun. Rev. 2008, 38, 63–74.
14. Li, Y.; Chu, W. MiKANT: A Mirrored K-Ary N-Tree for Reducing Hardware Cost and Packet Latency of Fat-Tree and Clos Networks. In Proceedings of the 18th IEEE International Conference on Scalable Computing and Communications, Washington, DC, USA, 8–12 October 2018; pp. 1643–1650.
15. Li, Y.; Chu, W. Fault Tolerance and Packet Latency of Peer Fat-Trees. In Proceedings of the Parallel and Distributed Computing, Applications and Technologies, Sendai, Japan, 7–9 December 2022; pp. 413–425.
16. Mano, T.; Inoue, T.; Mizutani, K.; Akashi, O. Redesigning the Nonblocking Clos Network to Increase Its Capacity. IEEE Trans. Netw. Serv. Manag. 2023, 20, 2558–2574.
17. Taka, H.; Inoue, T.; Oki, E. Twisted and Folded Clos-Network Design Model with Two-Step Blocking Probability Guarantee. IEEE Netw. Lett. 2024, 6, 60–64.
18. Taka, H.; Inoue, T.; Oki, E. Design model of a twisted and folded Clos network with multi-step grouped intermediate switches guaranteeing admissible blocking probability. J. Opt. Commun. Netw. 2024, 16, 328–341.
19. Oki, E.; Taniguchi, R.; Anazawa, K.; Inoue, T. Design of Multiple-Plane Twisted and Folded Clos Network Guaranteeing Admissible Blocking Probability. IEEE Trans. Netw. Serv. Manag. 2025, 22, 2278–2294.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).