A Primer on Design Aspects and Recent Advances in Shuffle Exchange Multistage Interconnection Networks

Interconnection networks provide an effective means by which components of a system, such as processors and memory modules, communicate to provide reliable connectivity. This facilitates the realization of highly efficient network designs suitable for computation-intensive applications. In particular, the use of multistage interconnection networks has unique advantages, as the addition of extra stages helps to improve network performance. However, this comes with challenges and trade-offs, which motivates researchers to explore various design options and architectural models to improve its performance. A particular class of these networks is the shuffle exchange network (SEN), which involves a symmetric N-input and N-output architecture built in stages of N/2 switching elements each. This paper presents recent advances in multistage interconnection networks with emphasis on SENs, discussing pertinent issues related to their design aspects and taking lessons from past and current literature. To achieve this objective, applications, motivating factors, architectures, shuffle exchange networks, and some performance evaluation techniques, as well as their merits and demerits, are discussed. Then, to capture the latest research trends in this area not covered in contemporary literature, this paper reviews very recent advancements in shuffle exchange multistage interconnection networks within the last few years and provides design guidelines as well as recommendations for future consideration.


Introduction
An interconnection network is an economical and attractive way to enhance communication among components of a system [1]. It is usually deployed to provide reliable, fast on-chip communication between processors and embedded modules while performing tasks in large-scale complex computing systems [2]. It is used to facilitate concurrent processing in real-time applications such as weather prediction, radar tracking, airframe design and image processing. These applications use many processors since their requirements go beyond what a single processor can deliver [3]. Thus, a concurrent processing system is constructed using large scale integration (LSI) technology by connecting (i.e., interconnecting) hundreds or thousands of network memory modules and off-the-shelf processors [4].
A multistage interconnection network (MIN) is realized by deploying multiple stages of switching elements and connecting links; an example is the shuffle exchange network (SEN). Computing systems are built using modules/processing elements that can communicate with each other [2] to take advantage of multiprocessing. However, this becomes more challenging as the number of processors increases [3]. This paper reviews recent advances in shuffle exchange networks within the last few years. Prior to that, insights are provided on the design aspects of MINs with emphasis on reliability and fault-tolerance properties. It is worthy of note that the focus is to bring up new developments in this area as regards the reliability property as well as other design aspects while providing useful recommendations and guidelines for future work.

Contribution
The contributions of this paper are as follows:
• It reviews the recent technical contributions towards improving fault-tolerance, analysis, assessment, and modelling of the reliability property of shuffle exchange MINs while comparing some of their key metrics, objectives and features. Topologies for enhanced SEN designs are also provided to facilitate a deeper understanding of the reviewed works.
• It highlights further recommendations towards the design and analysis of shuffle exchange networks based on lessons learnt from prior art and current research trends.
The rest of this paper is structured as follows: Section 2 provides the background to MINs. In Sections 3 and 4, further details on key considerations and recent advances in the design aspects of shuffle exchange MINs are discussed in terms of the fault tolerance (Section 3) and reliability (Section 4) properties of MINs. Future recommendations in this area are discussed in Section 5 while Section 6 concludes this paper.

Multistage Interconnection Networks
MINs involve switching elements linked in stages between the input and output terminals [30] based on a specific topology. Generally, the topology of a MIN follows the architecture provided in Figure 1. Switching elements store received packets and forward them using input and output buffers [31]. Thus, the performance of MINs is greatly affected by the internal buffer design. Each stage partly satisfies the input-output connection requirements [32]. These stages of switching elements and links provide high-bandwidth and cost-effective communication among multiprocessors. MINs are suitable for communication between units of processors and memory whose operations are tightly coupled [5]. They provide fast and efficient communication between processors of high capacity [3]. The advantages of parallel computing have made MINs a practical way to achieve high-performance computing [1].

Architecture of MINs
A MIN with N inputs and N outputs (where N is assumed to be a power of two for simplicity, i.e., N = 2^n) is built in several stages, and each stage consists of N/2 switching elements. Successive stages are linked such that every input can be connected to as many outputs as possible. Via the n stages, a path can be provided between any of the N inputs and each of the N outputs [32]. Hence, communication is facilitated between processors and memory modules using MINs [33,34]. This is essential for building efficient and scalable parallel systems [21], especially in supercomputers and multi-processor systems consisting of thousands of processors [35]. MINs can be described using the switch box (an interchange having more than one input and output), the network topology and the control structure. Figure 1 depicts a generalized MIN where several switches are deployed in every stage and inter-stage connections are fixed. Here, it is possible to dynamically set the switches to establish the desired connection [31].
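To make the inter-stage wiring concrete, the perfect-shuffle connection that underlies SEN-style MINs can be expressed as a left rotation of the n-bit port address. The snippet below is an illustrative sketch (the function name and the N = 8 example are our own, not drawn from the cited works):

```python
def perfect_shuffle(i, n):
    """Perfect shuffle on n-bit port addresses: left-rotate the bits of i.
    Output port i of one stage connects to input port shuffle(i) of the next."""
    N = 1 << n
    return ((i << 1) | (i >> (n - 1))) & (N - 1)

n = 3            # N = 8 terminals: log2(N) = 3 stages of N/2 = 4 switches each
N = 1 << n
print([perfect_shuffle(i, n) for i in range(N)])
```

Ports 0..3 map to the even positions and ports 4..7 interleave into the odd positions, which is exactly the card-shuffle pattern that gives the topology its name.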
Trade-offs: A crucial aspect of the network architecture is to determine the stage at which it is necessary to add more switching elements and where to add extra links. This is due to the trade-off involved when more switching elements are added. MINs without extra switching elements and links are non-redundant. Thus, they are fragile, since a single component failure can render the system faulty [30]. Extra links improve fault tolerance and performance but at the cost of a higher level of hardware overhead. Moreover, a complicated routing algorithm may be required. Such a complicated algorithm may not be a favorable option, especially when configuration time is considered [30].
Peculiar aspects: Aside from being attractive, MINs are fast and cost-effective [34,36]. They facilitate effective communication among system components [12]. MINs also have the potential to increase the reliability of multiprocessor systems with shared memory. MINs can play a vital role in telecommunications and parallel processing systems [12,37]. As such, fast computation applications such as chemical reaction simulation, air traffic control, robotic vision, aerodynamic simulations, ballistic missile defence and seismic data processing can all be facilitated [7]. Recently, the design of network-on-chip architectures has also benefited immensely from MINs [25].

Categories of MINs
MINs for multiprocessor systems are of different kinds such as Benes, Omega, Clos, Indirect binary cube, to mention a few [38]. MINs are either single-path or multi-path [39]. The former is usually not fault-tolerant since there exists only a single path to the destination while the latter is more fault-tolerant because of its multi-path architecture. MINs can also be categorized as static and dynamic. Dynamic MINs can be further classified into three: regular, irregular and hybrid. Regular dynamic MINs usually have the same number of switching elements in every stage while this is not the case in irregular MINs. Hybrid MINs combine the features of regular and irregular MINs [39]. A summary of these categorizations is given in Figure 2.
Furthermore, MINs are either one-sided or two-sided. One-sided MINs have both input and output ports on the same side, while two-sided ones have different sides: one for the input and the other for the output. Two-sided MINs can be further divided into blocking, re-arrangeable and non-blocking classes. When the simultaneous connection of more than one terminal pair causes conflicts in the use of communication links, the network is considered a blocking network. On the other hand, if all possible connection combinations can be handled simultaneously without blocking, the network is non-blocking, e.g., a crossbar network. In re-arrangeable networks, all the possible input-output connections can be established by rearranging existing connections [31]. A non-blocking MIN can implement a full permutation where each connection can be routed in any order. Generally, non-blocking and re-arrangeable networks are more expensive than blocking networks. Blocking networks, however, have reduced latency, switching and wiring costs [40].
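Blocking can be illustrated with destination-tag routing in an Omega/shuffle-exchange style network: the unique path of a source-destination pair visits one intermediate address per stage, and two circuits conflict if they need the same switch output (intermediate address) at the same stage. The helper names and the N = 8 example below are assumptions for illustration, not taken from the cited works:

```python
def omega_path(s, d, n):
    """Intermediate addresses of the unique path s -> d in an N = 2^n
    Omega/shuffle-exchange network under destination-tag routing: after
    stage k the address holds the last (n-k) bits of s followed by the
    first k bits of d."""
    N = 1 << n
    return [((s << k) | (d >> (n - k))) & (N - 1) for k in range(1, n + 1)]

def conflicts(p1, p2):
    """Two circuits block each other if they need the same intermediate
    address at the same stage."""
    return any(a == b for a, b in zip(p1, p2))

n = 3
# Two distinct pairs whose paths collide in the early stages:
print(omega_path(0, 1, n), omega_path(4, 0, n))
print(conflicts(omega_path(0, 1, n), omega_path(4, 0, n)))
```

Here (0 → 1) and (4 → 0) both need address 0 in the first stage, so this pair of circuits cannot be set up simultaneously in a blocking network, whereas a crossbar would route both.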

Design Aspects of MINs
Designing economical and fault-tolerant architectures is fundamental to MINs [37] to ensure their reliability. Hence, packet collision and blocking, which occur when several sources attempt to send data [41], should be well managed. One fundamental issue in MINs is determining how an increase in the number of stages impacts reliability. This is because the improvement achieved by adding more switching stages is limited in large-scale systems. An alternative approach towards improving network performance in such scenarios is to reduce the number of nodes [42]. Similarly, the design of MINs to cope with different types of traffic, such as hotspot and multicast, is essential. The challenge of finding the exact number of passes to achieve a MIN permutation is traditionally NP-complete [33].
Factors that should be put into consideration in the design of MINs include support for handling interrupts, data routing, synchronization and coherence, network latency, implementation cost and hardware complexity, bandwidth, and scalability in terms of the modular expandability of MINs [31]. These factors should not be ignored for simplicity of analysis, especially when the assumptions are not spelt out by the network designer. This helps to reduce the disparity between pen-and-paper network performance predictions and the expectations and performance of the actual hardware implementation.
Additionally, determining the minimum number of passes is essential for cost-effective routing in MINs [33]. This ensures that data can be transferred successfully from the source to the destination with the least number of possible passes for efficient transmission. Moreover, whenever the network load is heavy or some lines are faulty, the data transfer rate becomes adversely affected and as such, it may be difficult to establish links within the network [32].

Improving Efficiency in MINs
Whenever a switching element receives two or more requests for the same output in interconnection networks, the switching element randomly selects a packet to forward to the next stage. This kind of connection is rejected in circuit-switched networks. On the other hand, a packet-switched network stores the unselected packet in an internal buffer. As such, these buffers must be well designed to ensure efficient performance [31]. Due to the importance of delivering multicast traffic in many modern systems, it is imperative that efficient MINs are designed to cater for the routing requirements of such systems.
One way towards an efficient MIN design is the use of replication. A typical MIN can be replicated L times to form L layers, giving what is usually referred to as a replicated MIN. Increasing the number of paths improves the reliability and multicast performance in replicated MINs due to the existence of multiple paths between each source-destination pair. However, replication in the initial stages may be unnecessary; it is replication in the last stages that provides the needed switching power. Such unnecessary replication limits the network performance relative to the cost involved [43] as well as the path length [41].
Multilayer MINs can improve the performance of MINs for data transfer from different aspects [43]. Two options are considered for restricting the amount of chip space and the number of layers in multilayer MINs: (a) beginning replication only in the later stages and (b) stopping replication early to reduce the complexity of the network [43]. The U-type MIN is another type of MIN capable of handling different traffic types efficiently, which improves the performance of the network [24]. However, much is yet to be studied on the potential of this type of network.

Shuffle Exchange MINs
SEN is a basic topology of MINs having N inputs and N outputs [36]. Figure 3 shows an example of 12 switching elements linked in three stages based on the SEN topology. Note that several other topologies exist in the literature, such as omega, gamma, delta, and hypercube. The configuration of a SEN is dependent on its switches, number of stages, and its network interconnection [12]. An N × N SEN built from 2 × 2 crossbar switching elements has N/2 switching elements in every stage and a total of log2 N stages. However, there is only one single path between the source (input) and the destination (output) [44]. SEN is considered a unipath MIN having the lowest cost [36] and can be enhanced in several ways to suit reliability and routing requirements. SENs are thus considered highly propitious for the design of efficient and reliable supercomputers [23]. One way that has been used to improve the performance of SENs is increasing the number/size of switching elements to increase the number of redundant paths, thereby improving the network performance [36]. SEN has many unique characteristics. It is a practical and popularly used MIN because of its simple topology and low cost. The topology is uncomplicated since its basic building blocks use the smallest switching elements. In fact, its topology is equivalent to the Cube, Baseline and Omega networks, which are regular MINs having unique paths. Moreover, similar methods can be used to enhance the reliability of these networks.
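The cost and routing structure described above can be illustrated in a few lines. The helper names below are our own, not drawn from [36] or [44]; the routing rule is the standard destination-tag scheme for unique-path networks:

```python
import math

def sen_cost(N):
    """Switch count of an N x N SEN: log2(N) stages of N/2 2x2 switches."""
    n = int(math.log2(N))
    assert 1 << n == N, "N must be a power of two"
    return n * (N // 2)          # stages x switches-per-stage

def routing_tag(d, n):
    """Destination-tag routing: at stage k the switch forwards on its
    upper output if bit k of d (MSB first) is 0, lower output if 1 --
    one fixed decision per stage, hence a single unique path."""
    return [(d >> (n - 1 - k)) & 1 for k in range(n)]

print(sen_cost(8))           # 3 stages x 4 switches = 12, as in Figure 3
print(routing_tag(5, 3))     # route to destination 101 in binary
```

The fixed tag per destination makes plain why the basic SEN is a unipath network: once the destination is chosen, every switch setting along the route is forced.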
A lot of researchers have invested much time and effort to improve this network in terms of reliability and fault tolerance [16]. This is because these two properties are of major concern in the design of SEN variants. Next, we discuss some of the pertinent aspects of fault tolerance and reliability with a highlight of the recent advances towards achieving these objectives. A classification (Figure 4) and a comparison of the metrics, objectives and unique features (Table 1) of some of these works are presented to provide a first look into some of the features and limitations of these works.

Fault Tolerance
The ability of a network to function in the presence of component failures is referred to as fault tolerance. The techniques involved in achieving this objective come at the cost of considerable degradation in system performance [45]. Fault tolerance can be achieved in multistage interconnection networks by providing disjoint paths between the source and destination [5]. Hence, latency is reduced [12] in the network. When a non-catastrophic fault occurs in a fault-tolerant system, there would not be a total system shutdown, albeit with a reduction in system capacity. Such graceful degradation depends not only on the performance of the processor but also on that of the MIN [46].
In unique-path MINs, only one path exists between a source and its destination. Therefore, the network can fail when a single switch or link fails. Researchers have put in effort to improve reliability using fault avoidance and fault-tolerant schemes. The use of fault tolerance is corrective in the sense that action is only taken after a fault occurs [47]. Thus, fault-tolerant MINs are designed to ensure that the system continues functioning (at a reduced rate) in the presence of failures. This is because, in most cases, it is not possible to provide immediate repair or replacement of components that fail.
It is imperative to know that a fault-tolerant system operating with faulty components is not performing optimally [46]. For this reason, the performance is lower (e.g., longer response times) when compared with a system without faults. Fault tolerance can also increase the system cost, design complexity, space requirements, and weight. The other technique, fault avoidance, is preventive in the sense that components with high reliability and quality are used. This makes it affordable as compared to fault tolerance techniques where new switches and links may be required [47].

Improving Fault Tolerance
Fault tolerance is regarded as a core aspect of parallel multiprocessor systems [32]. In binary hypercubes and MINs, which are among the most popular architectures due to their commercial availability, a fault in a single processor may make the entire network unavailable. This happens especially if the topology dictates the functionality of the algorithm, which makes it important to embed fault tolerance into such architectures [48]. Fault tolerance can be improved by creating more redundant paths between the source and the destination [49]. Systems having a higher level of fault tolerance use more system components, which causes an increase in system complexity and cost. This is because a higher number of system components increases the total network cost [16].
Fault tolerance is a matter of concern in SEN because it is a unique-path network. In order to improve the fault tolerance of SENs, many variants have been proposed by adding or reducing the number of stages. At most single-fault tolerance can be provided at the input/output stage [23]. These variants of SEN have their unique characteristics and weaknesses. For instance, SEN- [50] improves the network performance through its reduced number of stages; however, it has problems of partial connectivity for source-destination pairs. Most of the advancements in SEN architecture have been made by either reducing the number of inputs and outputs or increasing the number of input and output stages [36]. It is also important to bear in mind that adding extra stages calls for re-wiring terminals in practice, which makes it imperative to minimize the number of steps involved [51].

Recent Advances
One of the prior works on improving the performance of SENs [50] proposed a scheme with higher (terminal, broadcast and network) reliability performance, requiring fewer switches but characterized by partial connectivity. In other words, not all sources are connected and there are only two destinations. Additionally, there is a single path between source and destination. This motivated the authors of [36] to propose a new SEN-MIN which improves on [50].
As shown in Figure 5, the use of a multiplexer (MUX) and demultiplexer (DEMUX) was considered in [36] to achieve full connectivity by providing disjoint paths between the source and the destination. The architecture achieves fault tolerance because each source-destination pair has two paths between them. Additionally, there is a lower path distance compared to the other SEN variants considered, and there are distinct paths for reliability enhancement. The key advantage of deploying this approach is the provision of redundancy, as there are totally disjoint source-destination paths. According to the authors, the use of MUX and DEMUX at the input and output stages is beneficial with respect to reliability, delay and cost, although it requires one stage of switching elements. The authors of [27] proposed the Shuffle Exchange Gamma Interconnection Network Minus (SEGIN-Minus) to achieve interconnection network reliability and fault tolerance using a fewer number of stages. MUX and DEMUX were included at the input and output terminals, respectively, while reducing the number of intermediate stages by one. To achieve fault tolerance, disjoint paths are formed between each pair of source and destination, which improves the network reliability. The evaluation and analysis are obtained using reliability block diagrams (RBDs) for terminal, broadcast and network reliability. Results show that the proposed architecture achieves a more favourable cost, higher reliability and better fault tolerance when compared to other SEGINs in the literature.
Similarly, Gholizadeh et al. [28] proposed a new technique to enhance the reliability of fault-tolerant MINs based on shuffle exchange and shuffle exchange gamma interconnection. This was achieved by adding more MUXes and DEMUXes to the first and last stages to increase the number of paths, which in turn increases the reliability of the system. Interestingly, by adding these extra modules, a bypass path is created for the source or destination node, which reduces the problem of single-point failure. The proposed scheme shows promising results with respect to reliability. However, this is traded off against cost. Since prior works achieve only single-fault tolerance at the input/output stage, Ref. [23] proposed a fault-tolerant SEN- (Figure 6) to address the issue of fault tolerance with a higher number of fully disjoint paths. A comparison of two-terminal reliability showed that the proposed scheme provides better reliability.
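The reliability benefit of placing two fully disjoint paths behind a MUX/DEMUX pair can be sketched with a simple independent-failure model. This is an illustrative sketch only: the component reliabilities and the product-form expression are our assumptions for demonstration, not the exact RBD formulas of [36], [27], [28] or [23]:

```python
def path_rel(r_switch, k):
    """Reliability of one series path of k switching elements."""
    return r_switch ** k

def two_disjoint_rel(r_switch, k, r_mux=0.995, r_demux=0.995):
    """Terminal reliability with a MUX fanning out into two fully disjoint
    k-switch paths and a DEMUX fanning in, assuming independent failures:
    both MUX and DEMUX are in series with the parallel pair of paths."""
    p = path_rel(r_switch, k)
    return r_mux * (1 - (1 - p) ** 2) * r_demux

r, k = 0.95, 3
print(path_rel(r, k))          # single unique path
print(two_disjoint_rel(r, k))  # two disjoint paths behind MUX/DEMUX
```

Even after paying the (assumed) MUX/DEMUX reliability penalty, the redundant configuration is noticeably more reliable than the single series path, which is the qualitative point the reviewed works exploit.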

Reliability
Reliability relates to the ability of the network to successfully perform specific functions [34]. It is a crucial performance measurement parameter that is considered by many researchers as a frontier towards efficient network performance. It is of concern, especially in multiprocessor systems [43]. Reliability becomes more important as the network size and complexity increase [34]. For instance, a reliable MIN would provide multiple paths to achieve the desired performance even if some switching elements fail [23]. This is because the implication of a failed network can be very detrimental considering the diverse applications of communication networks in financial-critical, daily and safety-critical applications [52]. As such, it is critical in interconnection networks as well as mobile ad hoc, electrical, gas and wireless mesh networks, to mention a few [47].
Reliability in MINs is affected by several factors such as network configuration, topology and number of stages [1]. Reliability constraints go a long way in determining whether having more processors improves system reliability. This is because the need for synchronization and the additional complexity involved in achieving error recovery should be well considered [53]. Similarly, a connection of multiple MINs is possible using MUX and DEMUX; however, this increases the system complexity and hardware cost [29,41]. An example of MINs that achieve higher reliability and reduced packet latency is the U-type MIN [24].
From the above, it becomes clear that reliability remains a major challenge. Some of the challenges associated with the definition and evaluation of reliability in communication networks include an adequate definition of reliability, determining possible network states and determining how failures impact reliability especially when there exist several network elements and multilayer protection [54]. Note that fault diagnosis and fault detection are important aspects of reliability [4]. Aside from reliability, another important feature of interconnection systems to be considered is expandability. This feature helps to facilitate the addition of more elements in a manner that does not affect the existing system structure [55].
Some enhancements in reliability can be achieved by adding more MIN stages, which increases the number of distinct source-destination paths. However, this approach increases cost and network complexity. Additionally, multiplexers and demultiplexers can be used to improve the network, since several MINs can be arranged and connected in parallel. However, the cost and failure rate increase as switching becomes more complex. Another approach is to replicate the network multiple times so that multiple paths are created between source and destination pairs. This makes its reliability higher than that of a MIN with a single path. However, the cost, as well as the path length, increases [41]. Nevertheless, replicated MINs are suitable for multicasting at higher reliability [43].
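The replication argument can be made concrete with the textbook parallel-redundancy formula, assuming independent layers and ignoring the shared input/output hardware (an idealisation for illustration, not the exact model of [41] or [43]):

```python
def replicated_rel(R_single, L):
    """Terminal reliability of an L-layer replicated MIN, assuming the
    layers fail independently and any one surviving layer suffices."""
    return 1 - (1 - R_single) ** L

R = 0.9 ** 3   # e.g. one 3-stage unique path of switches with reliability 0.9
for L in (1, 2, 3):
    print(L, replicated_rel(R, L))
```

Each added layer multiplies the remaining unreliability by (1 - R), so the gain per layer diminishes while hardware cost grows linearly, which is the trade-off the text describes.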

Reliability Measurement
Evaluating network reliability is of paramount importance in the design of fault-tolerant MINs. Usually, the main objective in the course of reliability analysis is computing the failure distribution of the whole system based on that of its components, which is NP-hard. Reliability analysis can be carried out on extra-stage MINs, parallel MINs, multilayer MINs and Banyan-type MINs. The measurement of reliability could be with respect to the mean time to failure (upper and lower bounds), calculated using series-parallel probabilistic combinations [39]. Reliability analysis exposes lapses in system design. For instance, a system with components connected in series will fail if any of its components fails. It is thus an inductive approach to analyzing network and system performance [1]. Note that system reliability depends on component reliability; as such, properly measuring the reliability of system components is key to measuring the reliability of the entire system.
Reliability measurement can take the form of terminal, broadcast, and network reliability [36]. The probability that there exists a minimum of one fault-free path between the nodes in the source and destination pair under a particular working environment is known as terminal reliability [22,29]. If the scenario considered is an input switch (where one source connects to all destinations [43]), broadcast reliability is evaluated. This is the probability that such a switch can send data broadcasts or connect to the switches at the output end [22]. The network reliability is related to the successful communication probability between all source nodes and destinations within the network. This, as well as broadcast reliability, can be calculated using RBDs [43].
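For a unique-path SEN with statistically independent switches of reliability r and perfectly reliable links, the three measures reduce to simple series expressions: the terminal path crosses one switch per stage, a broadcast fans out through a binary tree of N - 1 switches, and network reliability requires every switch to work. This is a standard textbook simplification, not the detailed RBDs of the cited works:

```python
import math

def sen_reliabilities(N, r):
    """Closed-form (terminal, broadcast, network) reliabilities of a
    unique-path N x N SEN with independent switches of reliability r,
    assuming perfectly reliable links."""
    n = int(math.log2(N))
    terminal = r ** n                 # one switch per stage on the unique path
    broadcast = r ** (N - 1)          # binary fan-out tree uses N - 1 switches
    network = r ** (n * N // 2)       # all n * N/2 switches must work
    return terminal, broadcast, network

print(sen_reliabilities(8, 0.95))
```

As expected, terminal reliability is the least demanding measure and network reliability the most demanding, which is why enhancement schemes are usually benchmarked on all three.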

Recent Advances
Terminal reliabilities for existing MINs were estimated and validated in [29]. The authors showed that 4D-GIN-3 (which is a gamma architecture with four disjoint paths where every tag value has six alternative paths) [56] and 4D-GIN-2 (another gamma architecture with four disjoint paths for each individual source-to-destination pair, which can also accommodate three link/switch failures in intermediate layers) [57] had high network complexity and terminal reliability. As for 4D-GIN-1 [57] (which is similar to 4D-GIN-2 but with reversed links and an interchange of both input and output stages), its terminal reliability is low while its complexity is high. It was concluded that 4D-GIN-2 and 4D-GIN-3 [56] perform better with respect to terminal reliability. The SEN with an extra stage (SEN+) was shown to exhibit better terminal reliability and hardware cost relative to several other architectures compared. However, the network and broadcast reliabilities of the studied schemes were not analyzed [26]; thus, this is open for future consideration.
While investigating the reliability of MINs with respect to mean time to failure (upper and lower bounds), Arya and Singh [39] proposed a new network, a modified irregular augmented SEN, having [(log2 N) − 1] stages. In the first, last and middle stages, there are N/2, N/2 and N/8 switching elements, respectively. After analysis, results show that the proposed network is more reliable and fault-tolerant compared to the traditional irregular augmented SEN. Yunus et al. [20] proposed a replicated SEN topology (see Figure 7) and investigated the performance of non-blocking MINs in terms of terminal reliability. Results revealed that this proposed topology had the highest reliability performance due to the redundant paths created by replication.
The authors in [58] studied the impact of increasing the number of switching stages on the reliability of SENs. They showed that the SEN with one extra stage (SEN+) has higher reliability for small-scale systems in addition to its lower hardware cost. However, the SEN with two extra stages (SEN+2) yields a better result than SEN and SEN+ when the size of the network is increased significantly. Besides the aforementioned, the authors provided a time-dependent reliability analysis of SENs, which was largely unexplored in prior research. Furthermore, the network availability was studied using the hierarchical composition method due to the difficulty posed by the traditional Markov model for analyzing fault-tolerant MINs. They also studied the impact of increasing the number of stages in SENs on broadcast and network reliabilities and showed that this has very little impact on improving the network performance.
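A time-dependent view follows once a lifetime distribution is assumed for each switch. The sketch below assumes exponentially distributed switch lifetimes with constant rate lam, so the terminal reliability of an n-stage unique path is R(t) = exp(-n·lam·t) and its MTTF is 1/(n·lam); the numerical integration is included only as a cross-check and is our own illustration, not the method of [58]:

```python
import math

def terminal_rel_t(t, lam, n):
    """Time-dependent terminal reliability of a unique-path SEN whose
    switches fail independently with constant rate lam (exponential
    lifetimes) -- a common modelling assumption, not the only one."""
    return math.exp(-n * lam * t)

def mttf(lam, n, dt=0.1, t_max=10000.0):
    """MTTF = integral of R(t) dt over [0, t_max], via the trapezoid
    rule; for the series path this approaches 1 / (n * lam)."""
    steps = int(t_max / dt)
    total = 0.0
    for i in range(steps):
        t0, t1 = i * dt, (i + 1) * dt
        total += 0.5 * (terminal_rel_t(t0, lam, n) + terminal_rel_t(t1, lam, n)) * dt
    return total

lam, n = 0.001, 3        # 3 stages, failure rate 1e-3 per hour (assumed)
print(mttf(lam, n), 1 / (n * lam))
```

The agreement between the numerical and closed-form values shows why adding stages, which increases n, directly shortens the MTTF of the unique path unless redundancy is added elsewhere.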

Reliability Modelling
Reliability analysis requires the selection of appropriate mathematical modeling and analysis techniques (for both quantitative and qualitative evaluations). The modeling technique must be able to effectively capture the important parameters of the real system, and the analysis technique should be capable of providing insights into the system behavior without running (or executing) the real system [52]. One of the most popular approaches to reliability modelling is the use of an RBD. It is a graphical representation/visualization of the system components and their relationships, used to determine the overall reliability of the system [22]. In order to have a better picture of the reliability of a system, an accurate modelling and evaluation structure should be assumed. The model should be very close to real scenarios, even if this involves very complex series-parallel relationships [42].
Typical examples of RBDs are shown in Figure 8 (adapted from [52]), where connector lines join blocks that represent system components. The input and output of the network are located at different ends, and each system functions properly only if a minimum of a single path of functional components between source and destination exists. In Figure 8a, none of the components may fail if the entire system is to remain functional. Figure 8b-d show series-parallel arrangements with active redundancy. In Figure 8b, all the redundant components are arranged in parallel, whereas in Figure 8c,d a combination of series-parallel connections is considered. To achieve active redundancy in such configurations, it should be borne in mind that the links in parallel provide redundancy and as such should remain active [52].
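Series-parallel RBDs like those in Figure 8 can be evaluated recursively: series blocks multiply component reliabilities, while parallel (actively redundant) blocks combine as one minus the product of unreliabilities. The nested-tuple encoding below is our own illustrative representation, assuming independent failures:

```python
def rbd(block):
    """Evaluate a nested series/parallel RBD given as
    ('series', [...]) / ('parallel', [...]) / a bare component reliability.
    Assumes independent failures and active (hot) redundancy."""
    if isinstance(block, (int, float)):
        return float(block)
    kind, parts = block
    if kind == 'series':             # all parts must work
        prod = 1.0
        for p in parts:
            prod *= rbd(p)
        return prod
    if kind == 'parallel':           # at least one part must work
        prod = 1.0
        for p in parts:
            prod *= 1.0 - rbd(p)
        return 1.0 - prod
    raise ValueError(kind)

# A series-parallel mix in the spirit of Figure 8c/d (values assumed):
system = ('series', [0.99, ('parallel', [('series', [0.9, 0.9]),
                                         ('series', [0.9, 0.9])])])
print(rbd(system))
```

Truly series-parallel structures evaluate in linear time this way; it is the non-series-parallel cases mentioned below that force more complex RBD techniques.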
An RBD provides an a priori evaluation of the design in a manner that allows changes to be made easily. It is used to evaluate terminal reliability. Additionally, network analysts can gain a better pictorial understanding of the system design (see some RBD configurations in Figure 8). However, to clearly visualize the system, it is imperative to break it into an exhaustive set of components. This is because some systems cannot be modelled as true series or parallel branches; thus, complex RBDs are needed [59]. Moreover, an estimate of the reliability of all elements may not be easily available. It is also important that accurate analysis is made to derive fine-grained performance insights [42].
Some common assumptions in RBDs are that links are reliable, switches are either in good or bad condition (working or failed), faults occur independently, and re-routing is done through alternative routes [29]. RBDs have limitations, especially when the network state is dynamic with respect to time (as in the case of complex failure and repair mechanisms) [60]. Therefore, it is essential to statistically estimate the reliability of network sub-components to effectively deploy RBDs for evaluating the overall system. However, this becomes a bottleneck when the data related to the reliability of sub-components is not available [52]. Switching elements in MINs are more susceptible to failure than links; thus, switching elements are denoted as nodes in the equivalent reliability logic diagram [22].

Recent Advances
The author of [1] proposed a replicated-enhanced augmented SEN (Figure 9), which improves on the enhanced augmented SEN topology (Figure 10) by replicating the network to create a redundant path while providing secondary/auxiliary links. This improves reliability performance. The analysis of terminal reliability reveals that the replicated network achieves higher reliability. Although the use of loops increases the number of paths between a source and its destination, it is worth noting that looping between switch pairs in the SEN architecture increases the size of the switch, which can degrade reliability performance in large-sized networks [6]. Stergiou [24] carried out a reliability study of U-type multistage fabrics/topologies that can be used to construct parallel systems. The authors showed that, compared to unidirectional fabrics/topologies, U-type fabrics can achieve high reliability and low latency, especially when deployed for a high percentage of local-type traffic services. The authors in [25] presented an on-chip SEN-based architecture with a functional switching element having two inputs and two outputs. This is designed in quantum-dot cellular automata (QCA) nanotechnology and can be configured to form 1 × 2 and 2 × 2 connections. They deployed the switch to implement the nanoscale interconnection structure of a multistage SEN with multicast and broadcast capabilities. With this, one-to-one, one-to-multiple and one-to-all communications can be performed. They also examined the hardware complexity and compared it with other QCA-based network architectures.

Reliability Assessment
Proper reliability assessment is important to determine which approach best captures the role played by each system component [47]. A few steps must be followed to carry out reliability assessment in communication networks. These include developing a conceptual model of the system to be studied, selecting the reliability modeling technique, selecting the reliability analysis technique to be used, and calculating the reliability and availability metrics [52]. Note that reliability analysis should take time dependence and independence into proper consideration. Broadly speaking, two approaches are used for calculating the reliability of complex networks: analytical and simulation methods [22].
In analytical methods, exact solutions for system reliability can be derived using redundant paths and information transition time. A system's reliability equation can be used to derive the mean time to failure, the failure rate at specific points in time and exact reliability values. Additionally, efforts for design improvement can be reinforced using reliability optimization and fault-tolerance techniques [43]. Note that the performance of a system is greatly influenced by how its components are interconnected. Thus, analyzing network performance becomes more difficult as the number of interconnection paths increases.
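As a minimal illustration of how a closed-form reliability equation yields these quantities, the sketch below assumes the standard exponential model with a constant failure rate (an assumption for illustration only, not a model used in the works cited here): R(t) = exp(−λt), with mean time to failure equal to 1/λ.

```python
import math

# Assumed constant failure rate per hour (illustrative value only).
lam = 1e-4

def reliability(t):
    # Exponential model: R(t) = exp(-lambda * t).
    return math.exp(-lam * t)

# Mean time to failure is the integral of R(t) over [0, infinity);
# for the exponential model this evaluates to 1/lambda.
mttf = 1.0 / lam

# Exact reliability at a specific point in time, e.g. 1000 hours.
r_1000 = reliability(1000)
```

More general models (e.g., Weibull) follow the same pattern: the reliability equation is fixed first, and the failure rate and MTTF are then derived from it.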
For larger-scale networks, it is usually useful to consider lower bounds on reliability [34]. Mathematical modelling techniques can yield accurate results. However, a large-scale system is prone to errors since its analysis involves manual manipulation and simplification. Additionally, the analyst may make several assumptions without documenting them. Since these assumptions are not conveyed to the design engineers, they are not considered in the system implementation, thereby yielding design errors. Moreover, manual methods cannot be used for large-scale system analysis, as they become tedious [52].
The simulation method is scalable since computers can manipulate large system models and simulation tools have user-friendly interfaces. However, this comes at the cost of commercial tools that require substantial computational resources. Additionally, absolute correctness cannot be achieved given that pseudo-random numbers and numerical methods are used [52]. Simulation is easy to implement but requires a large number of runs for comprehensive analysis. Moreover, very large samples are required, owing to the statistical nature of simulation models, to reach an acceptable convergence level for probability estimates (small or large) [43]. This is time-consuming and often challenged by issues of repeatability and reproducibility. Compared to analytical results, the range of results produced by simulations is smaller [35].
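The trade-off between sample size and convergence can be made concrete with a Monte Carlo sketch. The example below estimates the reliability of a small series-parallel arrangement (two parallel branches, each a series of two components) and compares it with the analytical value; the component reliability and trial count are assumed illustrative values.

```python
import random

def simulate_once(r, rng):
    # One random trial: the system works if at least one of two
    # branches works, where each branch is two components in series.
    branch_a = rng.random() < r and rng.random() < r
    branch_b = rng.random() < r and rng.random() < r
    return branch_a or branch_b

def monte_carlo_reliability(r, trials, seed=1):
    # Estimate system reliability as the fraction of successful trials.
    rng = random.Random(seed)
    successes = sum(simulate_once(r, rng) for _ in range(trials))
    return successes / trials

r = 0.9                        # assumed component reliability
exact = 1 - (1 - r * r) ** 2   # analytical value for comparison
estimate = monte_carlo_reliability(r, trials=200_000)
```

The standard error of such an estimate shrinks only with the square root of the number of trials, which is precisely why very large samples are needed for tight probability estimates.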

Recent Advances
Most works on increasing the reliability of MINs involve increasing the number of intermediate stages, which changes the architecture. However, packet collision and blocking also occur when several sources attempt to send data. Since this problem cannot be completely avoided, multiple access techniques are needed, and in such scenarios time division multiple access (TDMA) plays a critical role. Thus, the authors in [41] consider an adaptive slot allocation approach where a Monte Carlo random sampling method is applied to SEN and SEN+. Simulations in Network Simulator 2 reveal that the reliability and throughput of these SENs are better than those of the regular types. However, a major challenge arises in large-scale networks, where it becomes difficult to synchronize nodes using TDMA.
Additionally, Panda et al. [22] proposed a method based on graph theory to calculate the terminal and broadcast reliability of MINs. In the proposed method, the MIN is viewed as a reliability logic graph that is reduced to a subgraph having minimal path sets/broadcast trees. The minimal cut set of the subgraph is generated to avoid searching for and validating the minimal cut set of the whole reliability logic graph. New algorithms are introduced to support the proposed method by calculating both terminal and broadcast reliability. The algorithms' performance is compared with reliability values obtained by other algorithms and analytical methods. The results show that the algorithms can estimate the reliability of MINs.
Stergiou and Garofalakis [21] proposed an interconnect architecture that accommodates several internal paths and a fan-out at the end. Particularly, the authors combined both multipath and multilayer-based architectures to yield a network of size N × N with a total of (log_k N − S) stages (where S refers to the number of stages of a segment with a single layer) and 3N/4 switches in each stage. The proposed architecture is flexible and can address blocking. Similarly, it helps to achieve uniform load dispersion due to the presence of MUXs in front of the network. The authors presented a qualitative analysis showing that the proposed system improves reliability and fault-tolerance, and as such can be commercialized and used in practice.
Abedin et al. [26] also proposed a parallel SEN (PSEN) architecture (Figure 11) in which the terminal, network and broadcast reliabilities of SENs are enhanced. The proposed architecture is also used as a framework into which SENs with one and two extra stages can be plugged. In Figure 11, the green line (below) indicates a non-faulty path that provides redundancy for the red-colored path (above). Using RBD, the authors showed that the proposed method outperforms the SEN in terms of reliability and is cost-effective compared to other forms of MIN. They also concluded that parallelizing SENs improves performance more than adding extra stages to the traditional SEN architecture.
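The intuition behind parallelization can be sketched with a simplified terminal-reliability model. The sketch below is an assumption-laden illustration, not the actual model of Abedin et al.: it assumes every 2 × 2 switch has the same reliability r, that a SEN of size N × N has a unique source-destination path through one switch per stage (log2 N stages), and that a parallelized SEN behaves as two independent planes in parallel.

```python
import math

def sen_terminal_reliability(n, r):
    # Unique path through log2(n) switches, one per stage (assumed model).
    stages = int(math.log2(n))
    return r ** stages

def psen_terminal_reliability(n, r):
    # Simplified view of parallelization: two independent SEN planes;
    # the pair fails only if both source-destination paths fail.
    rs = sen_terminal_reliability(n, r)
    return 1 - (1 - rs) ** 2

r, n = 0.95, 16            # assumed switch reliability and network size
sen = sen_terminal_reliability(n, r)
psen = psen_terminal_reliability(n, r)
```

Under these assumptions the parallel form always dominates, and the gap widens with network size, since the series path reliability r^(log2 N) decays as N grows while the parallel combination partially recovers it.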

Configuration
Interconnection network performance is highly influenced by its configuration; thus, an efficient design of the interconnection between network components is sought. Proper design decisions also have to be made with respect to topology, mode of operation and network control. Future work could incorporate queues and modified control algorithms, in addition to multiple passes, to enrich the network, since such techniques can augment the network's ability to perform permutations. The literature reviewed shows that replication, the use of multiplexers and demultiplexers, parallelism, stage reduction, support for broadcast and multicast, etc., are being used to improve network performance. Some of these techniques are based on established approaches to improving network reliability. However, adaptation to application-specific scenarios (such as on-chip communication), as well as the trade-offs involved in these configurations, should be considered in future works. Similarly, the performance obtained when some of these recently proposed architectures are merged with other types of MINs can be studied. As regards modelling, cost-efficient models are essential.

Cost-Effectiveness
Cost-effectiveness depends on the hardware implementation, the number of processors and inter-processor data transfer, amongst others. Cost is particularly important because expensive schemes are impractical to deploy. Therefore, proposed enhancements to existing architectures should be compared thoroughly with respect to cost, reliability and complexity [21]. One way to achieve a lower link cost is through a hierarchical architecture, since the merit of local communication in parallel applications can be leveraged. Such an architecture can be realized by link replication and the use of interface nodes [69].
With regard to reliability-cost trade-offs, designers should be provided with sufficient information to make proper decisions on the architecture to deploy. Reducing the number of network components and the power consumed is one way designers aim to reduce system cost. However, effectively reducing network components while maintaining computational power can be achieved either by increasing the number of end nodes attached to switches or by designing a more appropriate fabric. Nevertheless, both options increase link utilization, which makes the network prone to congestion. This motivates the need for congestion management schemes.

Congestion
The problem of congestion is well known in interconnection systems [66]. It occurs when the same resource is requested by different packets. In buffered interconnection networks, persistent contention leads to buffers filling with blocked packets. When packet dropping is not allowed, the flow control mechanism ensures that other switches do not send packets to congested parts, which could otherwise make congestion spread throughout the network. Congestion management is hardly mentioned in studies of the various SEN variants, despite being a matter of concern. As such, efforts should be made towards proposing congestion management schemes, especially for bursty traffic. Note that congestion is considerably more difficult to handle in a lossless network than in a lossy one. The problem can be tackled in either a proactive or a reactive manner. Proactive schemes require knowledge of the network state as well as application requirements. However, this knowledge is not always available, especially in large-scale networks. The reactive approach has scalability issues with hotspot traffic as a result of the increased delay between traffic detection and response time.

Traffic
Models should also be developed to study metrics such as latency for different traffic patterns (such as broadcast and multicast). This will aid designers in understanding how the network performs in various scenarios as traffic type depends on applications and it affects synchronization, message pattern, etc. For instance, bursty traffic emphasizes the need for proper congestion management. This makes it evident that not all traffic is homogeneous and balanced. The reliability equations for such scenarios should also be formulated. By studying the impact of traffic patterns (uniform and non-uniform, e.g., hotspot), the impact of faults on the system performance can easily be predicted. Efficient flow control protocols that cater for the diverse and increasing network traffic patterns should be given proper consideration and incorporated into the network design, analysis and component fabrication. Similarly, system analysis should consider network traffic with constant and varying packet sizes.
For non-uniform traffic, the use of buffers can help improve the performance of MINs. In this respect, buffers that suit different types of applications and latency constraints should be developed. It is also imperative that buffers are incorporated into network design and performance evaluation for developed architectures as it makes the system assumptions more realistic. For example, buffered MINs perform well under uniform traffic while they degrade when the traffic pattern becomes non-uniform [70]. Thorough simulation and fine-grained analysis are required to capture the impact of buffers in newly developed SEN variants. With this in mind, the development of special simulators that can simulate the various traffic types for complex architectures is important. Since buffers (with finite capacities) are required in practice to temporarily store packets, it is important to avoid buffer overflow [71] in MINs.
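The interaction between bursty arrivals, finite buffer capacity and packet loss can be sketched with a toy discrete-time model. The function below is a hypothetical illustration (all parameter names and values are assumptions, not drawn from a cited model): packets arrive in bursts, a fixed number are served per step, and arrivals to a full buffer are counted as overflow losses.

```python
import random

def run_buffer(capacity, arrival_prob, burst_len, service_rate, steps, seed=7):
    # Toy discrete-time model of a finite switch buffer under bursty
    # traffic: with probability arrival_prob a burst of 1..burst_len
    # packets arrives; service_rate packets leave per step; packets
    # arriving to a full buffer are dropped (overflow).
    rng = random.Random(seed)
    queue, dropped, served = 0, 0, 0
    for _ in range(steps):
        if rng.random() < arrival_prob:
            for _ in range(rng.randint(1, burst_len)):
                if queue < capacity:
                    queue += 1
                else:
                    dropped += 1      # buffer overflow: packet lost
        take = min(queue, service_rate)
        queue -= take
        served += take
    return served, dropped
```

Running such a model with an offered load above the service rate shows how quickly a small buffer saturates under bursts, which is the regime where buffer dimensioning and overflow avoidance matter most.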
Multicore systems should support both multicast and local traffic [67]. Multicast has the potential to cause network saturation [72]. Multicast traffic occurs if a core changes a shared variable held in the caches of other cores. On the other hand, the need to process many applications within a system distributed over cores might imply processing applications in parallel, which motivates the study of local traffic [67]. Applying MINs in multicore processors can also give rise to multicast traffic when packets are copied within the switching elements before they enter the network. It is essential to consider traffic densities when designing MINs suitable for multicasting in parallel computers and multicore processors. Problems such as out-of-order packet sequences may occur in replicated MINs due to the existence of multiple source-destination paths. To avoid packet disordering, packets belonging to the same message can be assigned to the same layer [68].

Scalability
Scalability is a fundamental issue in the design of parallel systems (see [73], for instance, where scalability and availability are considered positive aspects of the Tofu interconnect network). However, it is seldom discussed with respect to SEN variants in most of the literature reviewed in this paper. For instance, if the network size is increased, does a particular architecture still provide the same reliability gains relative to other works, and to what extent does it scale well? Hence, this should be borne in mind during performance evaluation and analysis. In this respect, models that can use the performance analysis of small-scale systems to predict larger-scale deployments are essential (especially for hardware-related scalability in MINs) [74]. This will reveal the impact of increasing the network size on proposed architectures and the level of performance gains that can be achieved. However, note that studying a representative small-sized network only simplifies the analysis; it is essential to know how the performance of the topology scales with an increase in hardware. In the course of network design, support for maximum bandwidth and speedy access to data stored in memory should also be considered.

Practical Assumptions
It is essential that assumptions properly match the studied scenario in practice. Therefore, more practical assumptions should be made while the network analyses of proposed models should be as tractable as possible. As such, more fine-grained models that can reveal detailed system performance prediction closer to those experienced with real hardware are needed. Furthermore, highly realistic reliability models are important to ensure that the performances of newly developed architectures are well-predicted. Efficient routing techniques should be developed to allow different forms of communication between system components. It is also essential to ensure that congestion is well-managed within the interconnection network system as well as factored into different aspects of the network design.
When intermediate stages are added to enhance throughput and reliability, it should be noted that such architectures are prone to collisions when several sources begin to send data. For this reason, proper MAC designs are required to solve this issue. To handle MINs analytically, it is necessary to lower the model complexity, which can be achieved if each stage in the network is reduced to a single switching element. However, this holds only under certain assumptions, such as equal packet sizes, random packet conflict resolution, uniform distribution of multicast traffic among network outputs and the same load being sent to all inputs. Under these assumptions, buffers can be handled equally, with one buffer taking the position of all buffers in the same network stage [72].

Switching
Newly developed architectures should follow the trends and advances in switching technology, both in the design and analysis stages. Towards more reliable topologies, the heterogeneous nature of switching elements should be incorporated into network design and evaluation. It is common to find that a single reliability value is assumed for all system components (e.g., switches), which does not fully conform to what is obtainable in practice. The use of heterogeneous switching elements is closer to practice, since switches might be of different sizes and might have to be analyzed with different reliability values. However, this complicates the analysis, and adding more switches also increases the network complexity. Another issue is the performance analysis when links are less reliable than switching elements [63]. Furthermore, the type of traffic handled, compatible routing schemes, etc., should all be considered. Proper load distribution is another important issue to be incorporated into system design to accommodate the failure of a switch. This is essential in developing internetwork architectures.
For efficient network performance, it is essential to properly manage switches within the network to increase reliability. Switches can connect in various forms to achieve different network target objectives. As such, the end result obtained from diverse types of connections would differ. Duplicated switches can be considered in the network design. Proper evaluation of bounds on the number of permutation passes is also required. The inclusion of more stages in the network to improve the fault tolerance would also require proper load balancing in the presence of a failure. Moreover, the re-arrangement of links and proper resource scheduling are core issues to be borne in mind for efficient design in future proposals.

Performance Analysis
Performance modelling helps designers make accurate decisions among alternatives in interconnection network system design [75]. Recent research (see [58]) has shown that there exists a gap in the current literature with regard to time-dependent reliability analysis. Furthermore, the event that a terminal node (source/destination) fails should be incorporated into future analyses to make them more comprehensive [58]. It is also essential to perform numerical analysis for very large-scale networks. The study of network availability is another aspect that should be taken up in future research. It is also important to study the impact of increasing the number of stages with respect to replication, looping (extra links) and parallelism for the various SEN variants proposed. A full consideration of reliability aspects such as broadcast, terminal and network reliability should also be given in this context, because it is common to find that only terminal reliability is studied, which does not give a clear picture of broadcast and network reliability performance.

Network Optimization
Network optimization is another aspect to be considered, as different parameters and configurations affect the network, and optimal values may change with changes in topology and parameters. For instance, the number of stages, path length, etc., can all be optimized for higher fault tolerance and improved latency. The design should be simple enough that even an increase in network size does not degrade latency performance. Service provisioning optimization and optimal routing are important target objectives. With respect to network architecture, the traditional shuffle exchange topology can be extended to cater for multi-priority MINs with varying quality of service, considering application demands (such as file transfer and multimedia streaming) [64]. A generalized class of priority MINs can also be studied in future work.

Deadlock Avoidance
Another potential direction is the consideration of multicast MINs with deadlock avoidance applied to the design of parallel architectures and high-speed networks. Besides techniques for handling interrupts, creating multiple paths, using routing tags (for cyclic MINs) and adding auxiliary links are very handy in the design of reliable and fault-tolerant architectures. However, trade-offs with respect to cost and hardware complexity should be spelt out. The integration of MINs as an underlay/backbone for wireless network-on-chip communication has only recently begun to receive attention as a means of improving network performance (see [76]). It is expected that more architectures will be adapted to network-on-chip (NOC) communication in the future.

Application to NOC
NOC is considered a scalable architecture, especially for multicore chips [77]. MINs are potential topologies for use in the design of NOCs [68]. The interconnection architecture is important to the overall performance of modern embedded systems. In particular, MINs can ensure efficient communication for multiprocessor systems-on-chip [78]. NOCs have the potential to improve system efficiency as well as scalability. However, the system/network throughput is affected by chip faults. Generally, topologies and routing algorithms are designed to detect and correct faults. NOC requires that data between processing elements be transferred over a well-designed, efficient network in which packets are routed via reliable and safe paths. A suitable topology and an optimized routing algorithm can improve network performance [79]. Additionally, it is essential to achieve very low latency with reduced transmit power in NOCs, even when there are faulty links.
The application of shuffle exchange-based architectures to the NOC domain to form a suitable communication structure under application-specific traffic conditions also opens avenues for further studies where communication speed, reliability, fault-tolerance, throughput and delay are highly essential.

Conclusions
Interconnection networks facilitate effective communication between components such as processors and memory modules for performing tasks within large-scale computing systems. They have diverse applications in different fields of human knowledge, such as geography, engineering and biology, where supercomputers are used to execute complex computations and simulations. This paper provides an insight into some of the main design considerations within shuffle exchange interconnection network systems. In particular, recent works in this area have been highlighted and trends have been discussed.
Some of the findings include the need to integrate interconnection network architectures into newly developed technologies such as network-on-chip. Although a considerable effort has been made on improving the reliability of interconnection networks, more can be done to cater for the demand of providing ultra-reliable fast communication and connectivity for newly developed computational intensive applications. Additionally, effective fault-tolerant routing techniques are required to improve the performance of multistage interconnection networks. More precise analytical models and realistic assumptions are required to advance the research and understanding of proposed network architectures. The nature of the traffic handled by the network should also be borne in mind in the design of interconnection network systems.