1. Introduction
System reliability has long been a widespread concern, and reliable systems are required by many applications [1,2,3,4,5,6,7,8]. If the reliability of a system is not guaranteed, failures can have serious consequences and may even threaten human life. However, as silicon technology scales down, nanoscale components are becoming less reliable. Hence, how to build reliable systems out of unreliable components is an unavoidable problem. To address it, researchers have investigated several redundancy-based fault-tolerant techniques, such as N-tuple modular redundancy (e.g., triple modular redundancy) [9,10] and reconfiguration [11,12,13]. However, these techniques do not yield high fault tolerance for nanocomputers because of the extremely high device density and the high percentage of faulty components. Von Neumann's multiplexing technique, by contrast, uses faulty components as its building blocks and therefore treats them as an integral part of the system. As a result, it has received renewed attention [14]. Many papers have analyzed the performance of the multiplexing technique; among them, most attention has been paid to NAND multiplexing [15,16,17,18] and majority multiplexing [19,20,21], both first proposed by von Neumann. The multiplexing technique has also been studied as an effective fault-tolerant technique for protection against the increasing number of transient faults in nanoelectronic circuits [22,23,24,25]. In addition to these two schemes, researchers have considered other types of multiplexing, e.g., NOR-2 multiplexing [26].
None of these multiplexing schemes addresses how to realize the XOR or XNOR function; in fact, they are unable to achieve it. Because XOR and XNOR gates are widely used in integrated circuits, it is necessary to study XOR or XNOR multiplexing. In this paper, we present the XOR multiplexing technique for nanocomputers, which makes the study of multiplexing techniques more comprehensive. The newly designed architecture is composed of XOR gates and NAND gates: the XOR gates constitute the executive unit, which performs the desired logic function, and the NAND gates constitute the restoring organs, which perform error correction. First, we analyze the error distributions in the XOR multiplexing unit and compare them with those of von Neumann's NAND multiplexing unit. Then, we present the error distributions of the multiple-stage system and the corresponding comparison. Last, we evaluate the system performance of the architecture through its fault-tolerant ability, which is characterized by two thresholds: the gate error threshold, i.e., the maximum gate error probability at which the system still works properly, and the input signal error threshold, i.e., the maximum input signal error probability that the system can tolerate. The results show that the XOR multiplexing unit is more reliable than the NAND multiplexing unit, and that the technique has a high fault-tolerant ability and a unique feature, which we call the critical point property, that indicates the fault-tolerant ability of the system.
The rest of the paper is organized as follows. In Section 2, we present the error distributions in the XOR executive unit and the XOR multiple-stage multiplexing system, and compare them with those of the NAND executive unit and the NAND multiple-stage multiplexing system. In Section 3, we discuss the bifurcation analysis, which is followed in Section 4 by the fault-tolerant ability analysis of the XOR multiplexing system. Section 5 concludes the paper.
3. Bifurcation Analysis of the XOR Multiplexing System
Multiplexed systems contain two types of organs. The first type is the executive organ, which performs the desired basic operations on the bundles. The second type is the restoring organ, which uses the redundant information available in the input bundle to provide more reliable information on the output bundle. Logic gates such as NAND, NOR, AND and OR gates effectively alternate critical inputs (which produce critical errors) and subcritical inputs (which produce subcritical errors), thereby performing error correction. Among them, the NAND gate restoring organ is the first two-layer restoring organ with effective error correction ability. As shown in Figure 5, the XOR multiplexing system is composed of the XOR executive unit and NAND restoring organs. To make the system stable, multiple restoring organs are necessary. Note that an odd number of stages is necessary to preserve the XOR function.
To derive the error threshold for two-input XOR gates and two-input NAND gates, we use the circuit schematic shown in Figure 6. As can be seen, the schematic is a binary tree of cascaded two-input unreliable XOR gates and NAND gates [14,23]. Assume that the XOR gates and NAND gates have the same probability $\varepsilon$ of making a von Neumann error, and that their input and output lines function reliably. Let us denote the probabilities that the two inputs of the XOR gate are stimulated by X and Y. Since there are no feedback loops or fan-out in the circuit, the two inputs can be treated as independent. Then, the probability Z of the output of the XOR gate being stimulated is

$$Z = (1-\varepsilon)\,[X(1-Y) + Y(1-X)] + \varepsilon\,[XY + (1-X)(1-Y)]. \tag{5}$$
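As an illustration, the stimulated-output probability of a single unreliable XOR gate can be sketched in Python. The sketch assumes the usual von Neumann error model (the gate inverts its ideal output with probability `eps`) and independent inputs; the function names and the Monte Carlo cross-check are illustrative and are not taken from the paper.

```python
import random


def xor_output_prob(x, y, eps):
    """Probability that an unreliable XOR gate outputs 1 (is stimulated).

    x, y: probabilities that each (independent) input is stimulated.
    eps:  assumed probability that the gate inverts its ideal output.
    """
    p_ideal = x * (1 - y) + y * (1 - x)  # error-free XOR outputs 1
    return (1 - eps) * p_ideal + eps * (1 - p_ideal)


def simulate_xor(x, y, eps, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability, for cross-checking."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.random() < x
        b = rng.random() < y
        out = a != b              # ideal XOR of the two sampled inputs
        if rng.random() < eps:    # von Neumann error: invert the output
            out = not out
        hits += out
    return hits / trials
```

For example, `xor_output_prob(0.9, 0.1, 0.05)` and `simulate_xor(0.9, 0.1, 0.05)` should agree to within sampling noise.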
In the following analysis, we first assume that this circuit is a discrete-time system. We further assume that all inputs to the XOR gates are independent and have the same probabilities, X and Y, of being stimulated. This structure guarantees that the inputs to all NAND gates at an arbitrary stage n are independent and have equal probabilities of being stimulated, which we denote by $Z_n$ [14]. Thus, for the second stage, which is the first stage of NAND gates, the probability of the output being stimulated is

$$Z_3 = (1-\varepsilon)(1 - Z_2^2) + \varepsilon Z_2^2, \tag{6}$$

where $Z_2$ is the output probability of the XOR gate. For such a construction, Equation (6) reduces to a simple nonlinear map

$$Z_{n+1} = (1-\varepsilon) - (1-2\varepsilon)\,Z_n^2. \tag{7}$$
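The stage-to-stage propagation through the NAND restoring organs can be sketched as follows. This is a minimal sketch, assuming each NAND gate receives two independent inputs that are stimulated with the same probability `z` and inverts its ideal output with probability `eps`; `nand_stage` and `iterate_stages` are hypothetical helper names.

```python
def nand_stage(z, eps):
    """Stimulated-output probability of one stage of unreliable NAND gates.

    z:   probability that each of the two independent inputs is stimulated.
    eps: assumed probability that the gate inverts its ideal output.
    """
    p_ideal = 1 - z * z  # an error-free NAND outputs 0 only for inputs (1, 1)
    return (1 - eps) * p_ideal + eps * (1 - p_ideal)


def iterate_stages(z_first, eps, n_stages):
    """Return the input probability followed by the output of each NAND stage."""
    trajectory = [z_first]
    for _ in range(n_stages):
        trajectory.append(nand_stage(trajectory[-1], eps))
    return trajectory
```

Because each NAND stage inverts the signal, the trajectory alternates between a high and a low value for small `eps`, which is consistent with the requirement of an odd total number of stages.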
To reveal the dynamic behavior of the map, we apply bifurcation analysis to Equation (7) [23]. For any fixed $\varepsilon$, we choose arbitrary initial conditions X and Y and then iterate Equation (7) until, after a sufficient number of iterations, it converges to an attractor. The attractors are then plotted against each $\varepsilon$ [14,23]. The resulting plot is called a bifurcation diagram and is shown in Figure 7 ($\Delta\varepsilon$ = 0.001). The map has two kinds of attractors: fixed-point solutions and periodic motion. A period-doubling bifurcation occurs at the bifurcation point $\varepsilon^*$. When $\varepsilon > \varepsilon^*$, the system has a stable fixed-point solution; by solving the equation $Z_0 = (1-\varepsilon) - (1-2\varepsilon)Z_0^2$, we get

$$Z_0 = \frac{-1 + \sqrt{1 + 4(1-\varepsilon)(1-2\varepsilon)}}{2(1-2\varepsilon)}. \tag{8}$$
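By stable, we mean that for any initial input conditions X and Y, the output $Z_n$ converges to $Z_0$ when n is large. In other words, in this region, the system no longer functions as XOR. When $\varepsilon < \varepsilon^*$, the fixed point $Z_0$ loses stability and the system exhibits periodic motion with period 2. We denote the two points of this motion by $Z_+$ and $Z_-$. At the nth stage, when $Z_+$ is input, $Z_-$ is output, and vice versa [14,23]. That means

$$Z_- = (1-\varepsilon) - (1-2\varepsilon)\,Z_+^2, \tag{9}$$

$$Z_+ = (1-\varepsilon) - (1-2\varepsilon)\,Z_-^2. \tag{10}$$

From Equations (9) and (10), one obtains

$$Z_\pm = \frac{1 \pm \sqrt{4(1-\varepsilon)(1-2\varepsilon) - 3}}{2(1-2\varepsilon)}. \tag{11}$$

Clearly, when $Z_+ = Z_-$, we have $4(1-\varepsilon)(1-2\varepsilon) = 3$, and it can be derived that the bifurcation point is $\varepsilon^* = (3-\sqrt{7})/4 \approx 0.0886$. Now, it is easy to see that the error probability interval in which the system functions is $0 \le \varepsilon < \varepsilon^*$. When $\varepsilon > \varepsilon^*$, the outputs converge to the stable fixed point regardless of the initial inputs. Hence, the gate error threshold is the bifurcation point $\varepsilon^*$.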
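The bifurcation procedure described above can be reproduced numerically. The following is a sketch under stated assumptions: the NAND map has the form z -> (1 - eps) - (1 - 2*eps)*z^2, which follows from the von Neumann error model for a NAND gate with two identically distributed independent inputs, and the closed forms for the fixed point and the bifurcation point are derived from that map. The burn-in length and the starting value are arbitrary choices.

```python
import math


def nand_stage(z, eps):
    """Assumed NAND map: z -> (1 - eps) - (1 - 2*eps) * z**2."""
    return (1 - eps) - (1 - 2 * eps) * z * z


def fixed_point(eps):
    """Positive root of (1 - 2*eps)*z**2 + z - (1 - eps) = 0, for eps < 0.5."""
    a = 1 - 2 * eps
    return (-1 + math.sqrt(1 + 4 * a * (1 - eps))) / (2 * a)


def attractor(eps, z_start=0.9, burn_in=5000):
    """Iterate the map, then return two consecutive iterates as (low, high).

    The two values coincide at a stable fixed point and differ on a 2-cycle.
    """
    z = z_start
    for _ in range(burn_in):
        z = nand_stage(z, eps)
    z_next = nand_stage(z, eps)
    return min(z, z_next), max(z, z_next)


# Period-doubling point obtained from 4*(1 - eps)*(1 - 2*eps) = 3.
EPS_STAR = (3 - math.sqrt(7)) / 4


def bifurcation_data(eps_max=0.1, step=0.001):
    """Attractors on a grid of eps values, ready to be plotted against eps."""
    n = round(eps_max / step)
    return [(i * step, attractor(i * step)) for i in range(n + 1)]
```

Plotting the pairs returned by `bifurcation_data()` against `eps` gives two branches that merge into a single branch near `EPS_STAR`. Convergence is slow very close to the bifurcation point, so a longer burn-in may be needed there.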
Fixing the error probability $\varepsilon$ at values from 0 to 0.1 and plotting the 3-D diagrams of X, Y and Z for the XOR multiplexing system leads to Figure 8. From Figure 8, we can clearly observe the transformation of the output from two distinct states to a fixed point as the fixed error probability $\varepsilon$ goes from 0 to 0.1, with $\varepsilon^* \approx 0.0886$ as the bifurcation point.
4. Fault-Tolerant Ability Analysis
In the last section, we analyzed the tolerance of the system to gate errors (the gate error threshold). Now let us analyze its tolerance to input signal errors (the input signal error threshold). To map each output probability to a logic state, we need a threshold $Z_{th}$. According to Figure 7, it is not difficult to see that the fixed point $Z_0$ at the bifurcation point is a good choice for $Z_{th}$: it is simple and effective. Substituting the bifurcation point $\varepsilon^* = (3-\sqrt{7})/4$ into Equation (8), we have

$$Z_{th} = \frac{1+\sqrt{7}}{6} \approx 0.6076. \tag{12}$$
Below, we interpret an output probability below $Z_{th}$ as the non-stimulated state and an output probability above $Z_{th}$ as the stimulated state. When the inputs are fixed at the values indicated in Figure 9, we obtain the 3-D diagrams shown there. Clearly, the XOR multiplexing system has a higher fault-tolerant ability when the inputs are both stimulated or both non-stimulated. Figure 9 also shows that this threshold is effective.
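The threshold and the resulting logic-state decision can be written out as a short sketch. It assumes that the threshold is the fixed point of the NAND map evaluated at the bifurcation point, which gives about 0.6076 and matches the interval bounds 0.3924 and 0.6076 quoted later in this section; `Z_TH` and `logic_state` are hypothetical names.

```python
import math

# Assumed bifurcation point of the NAND map z -> (1 - eps) - (1 - 2*eps) * z**2.
EPS_STAR = (3 - math.sqrt(7)) / 4


def fixed_point(eps):
    """Positive root of (1 - 2*eps)*z**2 + z - (1 - eps) = 0, for eps < 0.5."""
    a = 1 - 2 * eps
    return (-1 + math.sqrt(1 + 4 * a * (1 - eps))) / (2 * a)


# Output threshold: the fixed point at the bifurcation point.
Z_TH = fixed_point(EPS_STAR)


def logic_state(z, threshold=Z_TH):
    """Map an output probability to a logic state using the threshold."""
    return "stimulated" if z > threshold else "non-stimulated"
```

Evaluating `Z_TH` numerically should agree with the closed form (1 + sqrt(7)) / 6.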
Note that for each fixed Y, there is a value of X (which we call the critical point and denote by $X_c$) that divides the output into two states when the gate error probability is below the gate error threshold. For a given fixed Y, the output is non-stimulated on one side of $X_c$ and stimulated on the other. Calculating the critical point helps us understand the fault-tolerant ability of the system more intuitively. When n is large enough and the gate error probability is below the gate error threshold, the output depends only on the input condition: whether inputs X and Y have the same logic state (both stimulated or both non-stimulated) or different logic states (one stimulated and the other non-stimulated). Let us denote the probability that the two inputs have different logic states by $P_d$, and the probability that they have the same logic state by $P_s$. The ratio of $P_d$ to $P_s$ is therefore a key parameter in determining whether the final output $Z_n$ is larger than the output threshold $Z_{th}$. Since X and Y are the probabilities of the inputs being stimulated, 1 − X and 1 − Y are the probabilities of the inputs being non-stimulated. $P_d$ and $P_s$ are as follows:

$$P_d = X(1-Y) + Y(1-X), \tag{13}$$

$$P_s = XY + (1-X)(1-Y). \tag{14}$$

If the output is to be stimulated, $P_d/P_s$ must be larger than a specific value that is greater than one. Since the output logic state is determined by the output threshold $Z_{th}$, this value is a function of $Z_{th}$, and the relation is

$$\frac{P_d}{P_s} > \frac{Z_{th}}{1 - Z_{th}}. \tag{15}$$

Clearly, $P_d + P_s = 1$; hence Equation (15) is equivalent to

$$P_d > Z_{th}. \tag{16}$$

If Equation (16) holds, the final output is larger than the threshold (stimulated). When the inequality becomes

$$P_d < Z_{th}, \tag{17}$$

the final output is smaller than the threshold (non-stimulated). Hence, the critical point $X_c$ for each fixed Y is obtained by solving the equality

$$X_c(1-Y) + Y(1-X_c) = Z_{th}. \tag{18}$$
The critical point $X_c$ is a function of Y; the critical points are plotted against Y (ΔY = 0.01) in Figure 10. The diagram has two regions, and $X_c$ takes a different value for each Y. There is also a parameter interval, approximately 0.3924 < Y < 0.6076, in which the system no longer functions even though it is fault-free. If the value of one of the inputs lies in this interval, the output of XOR multiplexing will always be non-stimulated.
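The critical-point calculation and the non-functional interval can be checked with a short sketch. It assumes a fault-free XOR stage, a threshold of (1 + sqrt(7)) / 6 (about 0.6076), and that the critical point solves x*(1 - y) + y*(1 - x) = threshold; these assumptions reproduce the interval bounds quoted above. The function names are illustrative.

```python
import math

Z_TH = (1 + math.sqrt(7)) / 6  # assumed output threshold, about 0.6076


def critical_point(y, z_th=Z_TH):
    """Solve x*(1 - y) + y*(1 - x) = z_th for x.

    Returns None when no solution lies in [0, 1], i.e. when the XOR output
    probability cannot reach the threshold for this value of y.
    """
    denom = 1 - 2 * y
    if abs(denom) < 1e-12:
        return None  # y = 0.5: the XOR output probability is 0.5 for every x
    x_c = (z_th - y) / denom
    return x_c if 0.0 <= x_c <= 1.0 else None


def dead_interval(step=0.01):
    """Smallest and largest grid values of y that have no critical point."""
    ys = [i * step for i in range(round(1 / step) + 1)]
    dead = [y for y in ys if critical_point(y) is None]
    return min(dead), max(dead)
```

On a grid with step 0.01, `dead_interval()` brackets the analytic interval (1 - Z_TH, Z_TH), which is approximately (0.3924, 0.6076).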
To demonstrate the tolerance of the system to input signal errors more intuitively, we extracted several fixed values of Y and the corresponding critical points from Figure 10; these are listed in Table 1. Take Y = 0.7 as an example: when input Y has a 70% probability of being stimulated (i.e., a 30% error probability), any stimulated probability of the other input X smaller than 23.1% is acceptable. That is to say, the system can tolerate error probabilities of 30% and 23.1% for the inputs Y and X, respectively. Other cases are similar, so we omit them here. It also follows that the maximum input signal error probability that the system can tolerate is 0.3924 (39.24%); namely, the input signal error threshold is 0.3924.
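As a worked check of the Y = 0.7 example, the critical point can be computed by hand. The derivation below assumes a threshold of about 0.6076 and a fault-free XOR stage whose critical point satisfies X(1 − Y) + Y(1 − X) = threshold; the symbols $X_c$ and $Z_{th}$ are notation chosen for this sketch.

```latex
% Solve X_c (1 - Y) + Y (1 - X_c) = Z_{th} for X_c, with Y \neq 1/2:
X_c = \frac{Z_{th} - Y}{1 - 2Y}
% Worked example for Y = 0.7 and Z_{th} \approx 0.6076:
X_c \approx \frac{0.6076 - 0.7}{1 - 1.4} = \frac{-0.0924}{-0.4} = 0.231
% A solution with 0 \le X_c \le 1 exists only for Y \le 1 - Z_{th} or Y \ge Z_{th},
% so the largest tolerable input error probability is
1 - Z_{th} \approx 0.3924
```

Both values agree with the 23.1% and 39.24% figures stated in the text.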
5. Conclusions
To make systems built from unreliable nanoelectronic devices reliable, it is necessary to design fault-tolerant architectures, and this paper is part of that effort. We have studied a new fault-tolerant architecture for nanocomputers: XOR multiplexing. This technique, based on massive duplication of imperfect devices and randomized interconnections, was studied comprehensively. We analyzed the error distributions of the XOR multiplexing unit and of the multiple-stage XOR multiplexing system, and compared them with those of the NAND multiplexing technique. The analysis shows that the XOR multiplexing system has more stages with which to improve fault tolerance. The comparison shows that the XOR multiplexing unit is more reliable, since it produces fewer faulty outputs than the NAND multiplexing unit. The fault-tolerant ability analysis shows that the system has a high tolerance to gate errors and is expected to work at an acceptable reliability level when the inputs have different logic states, and at a much higher reliability level when the inputs have the same logic state. Although the proposed architecture requires a rather large number of redundant components, which makes it inefficient for protection against permanent faults, it might be a system-level solution for the ultra-large-scale integration of highly unreliable nanometer-scale devices in which transient errors dominate. Hence, this architecture is potentially effective in protecting systems based on such devices against transient faults. In future work, we aim to improve this technique so that it achieves better fault-tolerant performance with lower redundancy.